Benchmarking code on a traditional DSP or embedded microcontroller is relatively simple because those architectures are deterministic. Doing the same on a Pentium PC is much trickier. I recently tried two of the most common techniques for benchmarking DSP algorithms on a Pentium PC:
Reading the Pentium Time Stamp Counter Register : http://stackoverflow.com/questions/9887839/clock-cycle-count-wth-gcc
Instruction counting using GDB : http://stackoverflow.com/questions/21628002/counting-machine-instructions-using-gdb
The Time Stamp Counter option has significant disadvantages in a multi-tasking OS: the counter keeps running while the OS switches to other tasks, so any task switch during a measurement is counted against the code under test.
The initial overhead calculation can itself be inflated by a task switch, so subtracting the overhead of calling the timer functions can leave a final benchmark of less than 0 cycles.
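As a rough sketch of one way to soften both problems, the code below reads the counter with the `__rdtsc()` intrinsic (GCC/Clang, x86 only), keeps the smallest of many overhead measurements so that a single task switch cannot inflate it, and clamps the final result at zero. The names `read_tsc`, `tsc_overhead`, `bench_min_cycles` and `sum_workload` are my own placeholders, not taken from the linked answers, and the workload is only a stand-in for a real filter function.

```c
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(): GCC/Clang on x86 only */

/* Hypothetical callback type for the code under test. */
typedef void (*bench_fn)(void *ctx);

static inline uint64_t read_tsc(void)
{
    return __rdtsc();
}

/* Smallest observed cost of two back-to-back counter reads.
 * Taking the minimum discards runs inflated by a task switch.
 * Assumes trials >= 1. */
uint64_t tsc_overhead(int trials)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t start = read_tsc();
        uint64_t stop  = read_tsc();
        uint64_t delta = stop - start;
        if (delta < best)
            best = delta;
    }
    return best;
}

/* Smallest observed cycle count for fn, with the overhead removed.
 * The result is clamped at zero instead of wrapping around.
 * Assumes trials >= 1. */
uint64_t bench_min_cycles(bench_fn fn, void *ctx, int trials)
{
    uint64_t overhead = tsc_overhead(trials);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t start = read_tsc();
        fn(ctx);
        uint64_t stop  = read_tsc();
        uint64_t delta = stop - start;
        if (delta < best)
            best = delta;
    }
    return best > overhead ? best - overhead : 0;
}

/* Stand-in workload (hypothetical): sums 64 ints through a volatile sink
 * so the compiler cannot remove the loop. */
void sum_workload(void *ctx)
{
    const int *data = ctx;
    volatile int sink = 0;
    for (int i = 0; i < 64; i++)
        sink += data[i];
}
```

Calling `bench_min_cycles(sum_workload, data, 1000)` with `data` pointing at an array of 64 ints returns the best-case cycle count. This sketch does not serialise the instruction stream around the counter reads, so out-of-order execution can still blur very short measurements.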
The GDB solution counts instructions but does not account for pipelining, caching, or run-time parallel instruction execution, so the count does not translate directly into clock cycles.
So far, the only option I have come up with is to analyse the results statistically to get an approximation of how efficient an algorithm is.
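One possible form of that analysis, sketched here under my own assumptions, is to collect many cycle counts per function and report the minimum, median and mean together. The `cycle_stats` struct and `summarize` function are hypothetical names introduced for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical summary of a set of cycle-count samples. */
typedef struct {
    uint64_t min;     /* best case: least disturbed run */
    uint64_t median;  /* typical run, robust to outliers */
    double   mean;    /* average, sensitive to outliers */
} cycle_stats;

/* qsort comparator for uint64_t that avoids overflow from subtraction. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts samples in place and summarises them. Assumes n >= 1. */
cycle_stats summarize(uint64_t *samples, size_t n)
{
    qsort(samples, n, sizeof samples[0], cmp_u64);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)samples[i];

    cycle_stats s;
    s.min = samples[0];
    s.median = (n % 2) ? samples[n / 2]
                       : (samples[n / 2 - 1] + samples[n / 2]) / 2;
    s.mean = sum / (double)n;
    return s;
}
```

A single run interrupted by a task switch pulls the mean upwards but leaves the minimum and median almost unchanged, so a large gap between mean and median is a hint that the samples were disturbed.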
Here are some results from some different filter functions that I have benchmarked (more details in an upcoming post about the different filtering functions).
The TSC results show that run-time parallelization of the code gives better than one instruction per cycle. However, each mode was executed twenty times and the average result was taken, so bear in mind the problems listed above.
MODE   GDB (instructions)   TSC (cycles)
1      2514                 975
2      1171                 754
3      1188                 547
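The instructions-per-cycle figure behind that claim is simply the GDB instruction count divided by the TSC cycle count. A small sketch of the arithmetic, using the three rows of the table and a hypothetical helper name `instructions_per_cycle`:

```c
#include <stdio.h>

/* Instructions retired per clock cycle: GDB count / TSC count. */
double instructions_per_cycle(unsigned long instructions, unsigned long cycles)
{
    return (double)instructions / (double)cycles;
}

/* Prints the ratio for each mode in the results table. */
void print_ipc_table(void)
{
    static const unsigned long gdb[] = { 2514, 1171, 1188 };
    static const unsigned long tsc[] = {  975,  754,  547 };

    for (int mode = 0; mode < 3; mode++)
        printf("mode %d: %.2f instructions per cycle\n",
               mode + 1, instructions_per_cycle(gdb[mode], tsc[mode]));
}
```

For the three modes this works out to roughly 2.58, 1.55 and 2.17 instructions per cycle, all above one.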
In summary, GDB instruction counting gives a good approximation of how efficient an algorithm is, but the Time Stamp Counter gives a better estimate of the actual number of CPU clock cycles required.