Friday, 18 January 2013

How to benchmark some C code (or assembly code, if you put the assembly in a C callable function)


I often have to benchmark DSP algorithms, and while code profiling tools are excellent for large applications, DSP algorithms are often most easily benchmarked using simple timing functions.

Here are two examples that I switch between, depending on the underlying device and/or operating system. Both are very portable, but each has its own benefits and disadvantages, and the two will almost always give different results when executed on the same platform/OS. One likely reason is that the C standard defines clock() as reporting the processor time used by the program, whereas ftime() reports wall-clock time.

This first example uses the clock() function :


#include <stdio.h>
#include <time.h>

#define LOOP_COUNT 1000000000L

int main (void)
{
long i = 0L;
volatile long j = 0L; /* volatile stops the compiler optimising the dummy loop away */

clock_t t_overhead, t_start, t_stop;

/* Calculate the timing overhead */
t_start = clock (); t_stop = clock (); t_overhead = t_stop - t_start;

t_start = clock ();
for (i = 0; i < LOOP_COUNT; i++) { j++; } // Put your code in here
t_stop = clock ();

printf ("Time : %f seconds\n\n", (((double)(t_stop - t_start - t_overhead)) / CLOCKS_PER_SEC));

return 0;
}
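
Before trusting the figure printed by the clock() example above, it can help to check how coarse clock() is on the target. The following is only a minimal sketch: clock_step_seconds() is a hypothetical helper name, not something from the example, and it assumes that clock() keeps advancing while the program busy-waits.

```c
#include <time.h>

/* Hypothetical helper, not part of the example above.
   Returns the size of one observable clock() step in seconds,
   or -1.0 if clock() is not available. */
double clock_step_seconds (void)
{
    clock_t t_edge, t_next;

    t_edge = clock ();
    if (t_edge == (clock_t)-1) { return -1.0; }

    /* Wait for clock() to change so that we start on a tick boundary */
    t_next = t_edge;
    while (t_next == t_edge) { t_next = clock (); }
    t_edge = t_next;

    /* Wait for the next change and report the size of that step */
    while (t_next == t_edge) { t_next = clock (); }

    return ((double)(t_next - t_edge)) / CLOCKS_PER_SEC;
}
```

Any measurement that is not many times longer than the value returned here is likely to be dominated by the granularity of the timer rather than by the code being benchmarked.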


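This second example uses the ftime() function (which recent versions of POSIX no longer specify, although many platforms still provide it) :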


#include <stdio.h>
#include <sys/timeb.h>

#define LOOP_COUNT 1000000000L

int main (void)
{
long i = 0L;
volatile long j = 0L; /* volatile stops the compiler optimising the dummy loop away */

struct timeb t_current;
long long t_overhead, t_start, t_stop; /* milliseconds; long long avoids overflow where long is 32 bits */

/* Calculate the timing overhead */
ftime(&t_current); t_start = ((long long)t_current.time*1000LL)+(long long)t_current.millitm;
ftime(&t_current); t_stop = ((long long)t_current.time*1000LL)+(long long)t_current.millitm;
t_overhead = t_stop - t_start;

ftime(&t_current); t_start = ((long long)t_current.time*1000LL)+(long long)t_current.millitm;
for (i = 0; i < LOOP_COUNT; i++) { j++; } // Put your code in here
ftime(&t_current); t_stop = ((long long)t_current.time*1000LL)+(long long)t_current.millitm;

printf ("Time : %f seconds\n\n", (((double)(t_stop - t_start - t_overhead)) / 1000.));

return 0;
}



Things to be aware of when using functions like this are :
While it is entirely possible to print times to the nearest ms, I suspect that in many applications the values returned by the timing functions are not actually that accurate, because underneath everything are layers of C functions, libraries and an OS.
When calling your processing function on a multi-tasking OS you have little or no control over what else the OS is doing in the background.
The clock() function returns values that depend on the implementation: on some processors it returns 0 the first time it is called and then starts incrementing, on some it starts counting when the process starts, on some it counts in milliseconds, and on some embedded processors (e.g. some DSPs I have programmed) it counts processor cycles.
Because of this, you should always divide the difference between two clock() values by CLOCKS_PER_SEC to turn it into seconds.
The clock() function is more commonly implemented on embedded processors than the ftime() function.
If the function to be benchmarked is very short then it may be necessary to call it a number of times and average the result, to smooth out the effects of the OS and the limited accuracy of the clock functions.
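
The last point can be sketched roughly as follows. This is an illustration rather than a definitive implementation: average_seconds_per_call() and short_function() are hypothetical names, and the workload is only a placeholder for the code you actually want to time.

```c
#include <time.h>

/* Placeholder workload (hypothetical): replace with the function to benchmark.
   The volatile accumulator stops the compiler removing the loop. */
static volatile long sink = 0L;

void short_function (void)
{
    long k;
    for (k = 0; k < 100L; k++) { sink += k; }
}

/* Hypothetical helper: calls fn() `calls` times and returns the average
   time per call in seconds, or -1.0 if `calls` is not positive. */
double average_seconds_per_call (void (*fn)(void), long calls)
{
    clock_t t_overhead, t_start, t_stop;
    long i;

    if (calls <= 0L) { return -1.0; }

    /* Calculate the timing overhead, as in the examples above */
    t_start = clock (); t_stop = clock (); t_overhead = t_stop - t_start;

    t_start = clock ();
    for (i = 0; i < calls; i++) { fn (); }
    t_stop = clock ();

    /* Total time in seconds divided by the number of calls */
    return (((double)(t_stop - t_start - t_overhead)) / CLOCKS_PER_SEC) / (double)calls;
}
```

Calling average_seconds_per_call (short_function, 100000L) would give an average that still includes the loop and function call overhead, so for very small functions the figure is best treated as an upper bound.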


