One of our customers was porting an application on to .NETCF from native code and observed that on ARM v7 the floating point performance of .NET CF was way below that of native code. In particular he was trying out sine function. Their micro-benchmark showed that doing sine over couple of million times was 27 times slower in .NETCF as compared to native code.
The reason behind this is that native code targets the Floating Point Unit (FPU) when available. However, .NETCF targets the lowest common denominator of ARMV4i and doesn’t use FPU even when run on a higher ARM version where FPU is available. It actually uses SW emulation for floating point operations and hence is slower.
If floating point math performance is critical for your application then the way to address that would be to write the math operation in native code (a native dll) and P/Invoke into it.
However, the cost of making P/Invoke calls is high (~5-6x slower than normal method calls) and you need to ensure that it doesn’t offset the savings in using the FPU. A common way to do that is to use bulking and reduce chattiness of P/Invoke calls. E.g., if you are evaluating an expression of the form func(a) + func(b) + func(c) instead of creating a P/Invokable function func and calling it thrice, its best to put the whole expression inside the native dll and call that one function which does the whole operation. If it is possible you can even try to go one step ahead and pass a buffer of input and output and get the native dll to do all the operation and return the processed buffer in one go.
Trivia: The .NET Compact Framework that runs on the Zune (which uses ARM as well) does use the FPU on it. So we do have the support in code and hopefully at some point in the future it will make it to the main-stream .NET Compact Framework.