Saturday 16 May 2020

Timing DSP Code Running On ARM Cortex Architecture

A recent project required porting some DSP algorithms to the NXP LPC55S6x ARM Cortex-M33-based microcontroller.
I needed to benchmark the algorithms, so I wrote the following code, which uses the Cycle Count Register in the ARM Cortex-M Data Watchpoint and Trace (DWT) unit.

The code below includes macros for accessing the DWT and also calculates the overhead of calling the functions to read the timer register, before using the same functions to time some code.
This code has been compiled and tested on the NXP LPCXpresso55S69 Development Board but should run on any ARM Cortex-M device whose DWT unit implements the cycle counter (the Cortex-M0 and Cortex-M0+ do not).

The benchmarking macros and functions are stored in benchmark_code.h:

// benchmark_code.h
// Macros and functions for benchmarking code

#ifndef BENCHMARK_CODE_H
#define BENCHMARK_CODE_H

#include <stdint.h>                                                             // uint32_t

// Timers
// DWT (Data Watchpoint and Trace) registers, only exist on ARM Cortex-M devices with a DWT unit
#define KIN1_DWT_CONTROL            (*((volatile uint32_t*)0xE0001000))         // DWT Control register
#define KIN1_DWT_CYCCNTENA_BIT      (1UL<<0)                                    // CYCCNTENA bit in DWT_CONTROL register
#define KIN1_DWT_CYCCNT             (*((volatile uint32_t*)0xE0001004))         // DWT Cycle Counter register
#define KIN1_DEMCR                  (*((volatile uint32_t*)0xE000EDFC))         // DEMCR: Debug Exception and Monitor Control Register
#define KIN1_TRCENA_BIT             (1UL<<24)                                   // Trace enable bit in DEMCR register

// The macro bodies are parenthesised so that they behave as single expressions
#define KIN1_InitCycleCounter()     (KIN1_DEMCR |= KIN1_TRCENA_BIT)               // TRCENA: Enable trace and debug block in DEMCR
#define KIN1_ResetCycleCounter()    (KIN1_DWT_CYCCNT = 0)                         // Reset cycle counter
#define KIN1_EnableCycleCounter()   (KIN1_DWT_CONTROL |= KIN1_DWT_CYCCNTENA_BIT)  // Enable cycle counter
#define KIN1_DisableCycleCounter()  (KIN1_DWT_CONTROL &= ~KIN1_DWT_CYCCNTENA_BIT) // Disable cycle counter
#define KIN1_GetCycleCounter()      (KIN1_DWT_CYCCNT)                             // Read cycle counter register

// static inline so that the header can be included from more than one source file
static inline void benchmark_init_cycle_counter (void)
{
    KIN1_InitCycleCounter();                                // enable DWT hardware
    KIN1_ResetCycleCounter();                               // reset cycle counter
    KIN1_EnableCycleCounter();                              // start counting
}

#define benchmark_get_cycle_counter()   KIN1_GetCycleCounter()

#endif /* BENCHMARK_CODE_H */
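
The header assumes that the cycle counter exists. As far as I know, the DWT Control register reports a missing counter through its NOCYCCNT flag (bit 25), so a check along the following lines could be run before relying on the results. This is only a sketch: the function name and the bit macro are my own and are not part of the header above, and the register is passed in as a pointer so that the logic can also be exercised off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Assumption: NOCYCCNT is bit 25 of the DWT Control register and reads as 1
// when the cycle counter is not implemented.
#define DWT_CTRL_NOCYCCNT_BIT   (1UL << 25)

// Hypothetical helper, not part of benchmark_code.h.
// Returns true if the DWT Control register pointed to reports a cycle counter.
static bool dwt_has_cycle_counter(const volatile uint32_t *dwt_ctrl)
{
    assert(dwt_ctrl != NULL);
    return (*dwt_ctrl & DWT_CTRL_NOCYCCNT_BIT) == 0u;
}
```

On the target this would be called as dwt_has_cycle_counter((const volatile uint32_t *)0xE0001000), using the same register address as the header.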


This code can be used in the following manner:


#include <stdint.h>
#include "fsl_debug_console.h"
#include "benchmark_code.h"

int main(void)
{
    uint32_t start_time, end_time, overhead_time;   // number of cycles

    // Board and debug console initialization is assumed to have been done before PRINTF is used

    benchmark_init_cycle_counter();                 // Initialize benchmark cycle counter

    // Measure the overhead of two back-to-back reads of the cycle counter
    start_time = benchmark_get_cycle_counter();     // get cycle counter
    __asm volatile ("nop");
    end_time = benchmark_get_cycle_counter();       // get cycle counter
    overhead_time = end_time - start_time;

    // Time the code under test
    start_time = benchmark_get_cycle_counter();     // get cycle counter
    PRINTF("Put your code to be benchmarked here ...\r\n");
    end_time = benchmark_get_cycle_counter();       // get cycle counter

    // The counter is unsigned, so print it with %u rather than %d
    PRINTF("Elapsed time = %u cycles\r\n", (unsigned int)(end_time - start_time - overhead_time));

    return(0);
}
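
The example above reports raw cycle counts. To turn these into a time, the count has to be divided by the core clock frequency. The sketch below shows one way to do that; the function names are my own, and the 150 MHz figure used in the usage note is an assumption about the clock configuration, not something the code can discover for itself. It also makes explicit that the subtraction is done in unsigned 32-bit arithmetic, so the result is still correct if the counter wraps once between the two reads (at 150 MHz that would be roughly every 28.6 seconds).

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: elapsed cycles between two cycle counter readings.
// Unsigned subtraction is modulo 2^32, so a single wrap of the counter
// between the two readings still gives the correct difference.
static uint32_t benchmark_elapsed_cycles(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

// Hypothetical helper: convert a cycle count to nanoseconds (rounded down)
// for a core clock given in Hz. The 64-bit intermediate avoids overflow.
static uint64_t benchmark_cycles_to_ns(uint32_t cycles, uint32_t core_clock_hz)
{
    assert(core_clock_hz != 0u);
    return ((uint64_t)cycles * 1000000000ULL) / core_clock_hz;
}
```

For example, with an assumed 150 MHz core clock, benchmark_cycles_to_ns(150, 150000000) gives 1000, i.e. 150 cycles take 1 microsecond.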

Notes
There appears to be a +/- 1 cycle jitter on the results of any code timing instance. I have not got to the bottom of exactly why but, regardless of the root cause, this is very accurate and suitable for the vast majority of applications.
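
One way to live with the jitter is to time the same code several times and look at the spread of the results rather than a single reading. The following is a minimal sketch of that idea, assuming the individual cycle counts have already been collected into an array; the structure and function names are my own and are not part of benchmark_code.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical summary of a set of timing measurements, in cycles.
typedef struct {
    uint32_t min_cycles;
    uint32_t max_cycles;
    uint32_t mean_cycles;   // rounded down
} benchmark_stats_t;

// Hypothetical helper: compute the minimum, maximum and mean of a set of
// cycle counts, each measured as end_time - start_time - overhead_time.
static benchmark_stats_t benchmark_summarise(const uint32_t *samples, size_t count)
{
    assert(samples != NULL && count > 0u);

    benchmark_stats_t stats = { samples[0], samples[0], 0u };
    uint64_t sum = 0u;      // 64-bit so that many large samples cannot overflow

    for (size_t i = 0; i < count; i++) {
        if (samples[i] < stats.min_cycles) {
            stats.min_cycles = samples[i];
        }
        if (samples[i] > stats.max_cycles) {
            stats.max_cycles = samples[i];
        }
        sum += samples[i];
    }
    stats.mean_cycles = (uint32_t)(sum / count);
    return stats;
}
```

If max_cycles minus min_cycles stays within a couple of cycles across the runs, the measurement is behaving as described in the note above.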

Numerix-DSP Libraries : http://www.numerix-dsp.com/eval/
