Sunday, 8 January 2017

Some Thoughts On Benchmarking Applications On A Pentium PC (Windows/Mac/Linux)

The ability to benchmark DSP algorithms to check that they execute in real time is a key part of the DSP development process. I've written several blog posts on the subject, including:

http://blog.numerix-dsp.com/2013/01/how-to-benchmark-some-c-code-or.html
http://blog.numerix-dsp.com/2015/01/timing-code-running-on-xmos-xcore.html

While benchmarking code to run on a traditional DSP or embedded microcontroller is a relatively simple task, due to the deterministic nature of DSP architectures, doing the same on a Pentium PC is quite tricky. I've recently tried two of the most common techniques for benchmarking DSP algorithms on a Pentium PC:

    Reading the Pentium Time Stamp Counter Register : http://stackoverflow.com/questions/9887839/clock-cycle-count-wth-gcc
    Instruction counting using GDB : http://stackoverflow.com/questions/21628002/counting-machine-instructions-using-gdb

The Time Stamp Counter option has several disadvantages on a multi-tasking OS because task switches can occur while the application is being timed.
For example, if a task switch lands in the initial overhead calculation, the measured overhead of calling the timer functions is inflated, and subtracting it can leave the final benchmark at less than 0 cycles.
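As a rough sketch of the TSC approach, assuming GCC or Clang on an x86 PC (the `__rdtsc()` intrinsic from `x86intrin.h`), the code below measures the timer overhead and subtracts it from a timed call. The function names, the workload, and the choice to take the minimum overhead over several tries and to clamp the result at zero are my own assumptions for illustration, not the only way to handle the negative-result problem described above.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc(): GCC/Clang on x86 only (assumption) */

/* Read the Time Stamp Counter. */
static inline uint64_t read_tsc(void)
{
    return __rdtsc();
}

/* Hypothetical workload, standing in for the DSP function under test. */
static void hypothetical_workload(void)
{
    volatile uint32_t acc = 0;
    for (uint32_t i = 0; i < 1000; i++) {
        acc += i;
    }
}

/* Estimate the cost of two back-to-back TSC reads.
 * Taking the minimum over several tries means that a task switch
 * during one try does not inflate the estimate. */
static uint64_t tsc_overhead(int tries)
{
    assert(tries > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < tries; i++) {
        uint64_t start = read_tsc();
        uint64_t stop = read_tsc();
        uint64_t delta = stop - start;
        if (delta < best) {
            best = delta;
        }
    }
    return best;
}

/* Time one call of fn in TSC ticks, minus the timer overhead.
 * The result is clamped at zero so it can never go negative. */
static uint64_t time_once(void (*fn)(void), uint64_t overhead)
{
    uint64_t start = read_tsc();
    fn();
    uint64_t stop = read_tsc();
    uint64_t elapsed = stop - start;
    return (elapsed > overhead) ? (elapsed - overhead) : 0;
}
```

A typical use would be to call `tsc_overhead(100)` once and then pass the result to `time_once(hypothetical_workload, overhead)` for each run. A task switch in the middle of `fn()` will still inflate an individual reading, so a single call is not a reliable benchmark on its own.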

The GDB solution counts instructions but does not account for pipelining, caching, and run-time parallel instruction execution.

So far the only option I have come up with is to use a statistical analysis of the results to get an approximation of how efficient an algorithm is.
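One way the statistical analysis could look is sketched below: collect the cycle counts from repeated runs, then report the minimum, median and mean. The `cycle_stats` type and `summarise_cycles` function are hypothetical names of my own, and which statistic to trust is a judgment call rather than something the measurements above settle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Summary of a set of cycle-count samples (hypothetical type). */
typedef struct {
    uint64_t min;
    uint64_t median;
    double   mean;
} cycle_stats;

/* qsort comparison for uint64_t, avoiding overflow from subtraction. */
static int compare_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sort the samples in place and return min, median and mean. */
static cycle_stats summarise_cycles(uint64_t *samples, size_t n)
{
    assert(n > 0);
    qsort(samples, n, sizeof samples[0], compare_u64);

    cycle_stats s;
    s.min = samples[0];
    if (n % 2 == 1) {
        s.median = samples[n / 2];
    } else {
        /* Midpoint of the two central samples, without overflow. */
        uint64_t lo = samples[n / 2 - 1];
        uint64_t hi = samples[n / 2];
        s.median = lo + (hi - lo) / 2;
    }

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += (double)samples[i];
    }
    s.mean = sum / (double)n;
    return s;
}
```

The reason for reporting more than the mean is that a single run interrupted by a task switch can be many times longer than the rest. That outlier pulls the mean upwards but leaves the minimum and median largely untouched.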

Here are some results from some different filter functions that I have benchmarked (more details in an upcoming post about the different filtering functions).

The TSC technique shows that run-time parallel instruction execution gives better than 1 instruction per cycle. However, each mode was executed twenty times and the average result was taken, so bear in mind the problems listed above.

MODE    GDB (instructions)    TSC (cycles)
1       2514                  975
2       1171                  754
3       1188                  547

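To make the "better than 1 instruction per cycle" observation concrete, dividing the GDB instruction count by the TSC cycle count gives an approximate instructions-per-cycle figure for each mode. The helper below is a hypothetical illustration of that arithmetic, and the result is only as good as the averaged TSC numbers it is fed.

```c
#include <assert.h>

/* Approximate instructions per cycle from a GDB instruction count
 * and a TSC cycle count (hypothetical helper). */
static double instructions_per_cycle(unsigned long instructions,
                                     unsigned long cycles)
{
    assert(cycles > 0);
    return (double)instructions / (double)cycles;
}
```

With the figures in the table, this works out to roughly 2.6 for mode 1 (2514 / 975), 1.6 for mode 2 (1171 / 754) and 2.2 for mode 3 (1188 / 547).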
In summary, GDB instruction counting gives a good approximation of how efficient an algorithm is, but the Time Stamp Counter solution gives a better estimate of the actual number of CPU clock cycles required.
