One of the most popular techniques for developing DSP systems is to simulate the system in C on a general-purpose microprocessor and then port the C code to a DSP device. For many applications the C code gives perfectly acceptable performance; in others, however, some of the routines must be coded in hand-optimized assembler. Many of these functions can be optimized quite easily by compiling the C to assembler and then hand-optimizing the compiler's output.
The choice between fixed-point and floating-point devices is fundamental to an application. Fixed-point devices generally run at higher clock speeds because they are smaller and do not carry the extra complexity of floating-point hardware. They are also produced in larger quantities, so they benefit from the cost savings of mass production and can sell for a lower price.
Floating-point devices give a higher dynamic range, which allows more complex applications to be implemented. For example, a large 2D FFT can require a very large dynamic range despite the limited resolution of the input, and in this case it is almost certain that fixed-point devices would not be suitable. One of the problems with fixed-point devices is that the programmer must always be aware that the device can overflow its numerical bounds; if this is likely, the numbers should be scaled appropriately before the process continues. The scaling can be quite time consuming and, as a rule of thumb, an FFT of more than about 1024 or 2048 points is quicker on a floating-point device, despite the faster clock speed of fixed-point DSPs. Note that although floating-point devices give a larger dynamic range, a floating-point device with a given mantissa length gives no greater precision than a fixed-point device with the same word length.
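The overflow and scaling issue described above can be sketched in portable C. This is a minimal illustration only, assuming 16-bit Q15 data (values in the range -1.0 to just under +1.0 stored in an int16_t); the function names q15_add_sat and q15_block_scale are hypothetical and are not taken from any particular DSP library. Real fixed-point devices usually provide saturating arithmetic and shifts in hardware or as compiler intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add two Q15 values, clamping the result to the
 * 16-bit range instead of letting it wrap around on overflow. */
static int16_t q15_add_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;   /* a 32-bit sum cannot overflow */

    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Hypothetical helper: scale a block down by 2^shift before a processing
 * stage that may grow the values (for example an FFT butterfly stage).
 * Division is used rather than >> so that the behaviour for negative
 * samples is defined by the C standard (truncation toward zero). */
static void q15_block_scale(int16_t *x, size_t n, unsigned shift)
{
    size_t i;

    assert(shift < 15);                      /* keep the divisor within int16 range */
    for (i = 0; i < n; i++)
        x[i] = (int16_t)(x[i] / (1 << shift));
}
```

Adding 0.75 and 0.5 directly (24576 and 16384 in Q15) exceeds the representable range, so the saturating add clamps the result and information is lost. Scaling both operands by one bit first keeps the sum in range, at the cost of one bit of resolution and the extra pass over the data, which is the overhead the rule of thumb above refers to.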
When coding real-time routines in C there are some rules that can be followed to make the code as efficient as possible. First, function parameters and local variables are, by default, typically placed on the stack and so they are accessed indirectly. This can often be overcome by using compilers that allocate variables to registers, but the code can frequently be optimized further by careful choice of register variables and by using global variables, which are placed in statically allocated memory (not on the stack) and can therefore be addressed directly. When implementing interrupt service routines (ISRs), all registers that are used must be pushed onto the stack on entry and restored on exit, to prevent side effects in the interrupted code. In some cases using higher levels of optimization, and hence extra registers, for an ISR may actually slow it down because of the extra stack manipulation required at both the start and the end of the ISR.
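As a rough sketch of the register-variable advice above, the inner loop of a filter can keep its accumulator and pointers in locals marked with the register keyword. The function name fir_q15 and its parameters are hypothetical, chosen only for illustration. The register keyword is a hint only: many modern optimizing compilers allocate registers themselves and ignore it, so the generated assembler should always be inspected to see what the compiler actually did.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Q15 FIR kernel: returns the dot product of n_taps
 * coefficients and samples as a 32-bit accumulator (Q30 format).
 * The caller is assumed to keep n_taps and the data small enough
 * that the 32-bit accumulator cannot overflow. */
int32_t fir_q15(const int16_t *coeffs, const int16_t *samples, size_t n_taps)
{
    register int32_t acc = 0;              /* hint: hold the accumulator in a register */
    register const int16_t *c = coeffs;    /* hint: hold both pointers in registers    */
    register const int16_t *s = samples;

    assert(n_taps > 0);

    while (n_taps--)
        acc += (int32_t)(*c++) * (int32_t)(*s++);

    return acc;                            /* single write-back at the end */
}
```

Keeping the running sum in a local and returning it once avoids a memory access on every iteration. The same function called from an ISR would need every register it uses to be saved and restored, which is why a smaller register footprint can sometimes be faster there.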
Numerix-DSP Libraries : http://www.numerix-dsp.com/eval/