Tuesday 22 December 2020

Version 9.00 Of The SigLib DSP Library Released

Version 9.00 is the latest version of the SigLib Digital Signal Processing (DSP) library and is available now from http://www.numerix-dsp.com/siglib.html.

V9.00 now includes functions for training, and running inference on, Artificial Intelligence and Machine Learning Convolutional Neural Networks (CNNs). The SigLib ML functions are designed for embedded applications such as vibration monitoring. They are architected for Edge-AI applications and have been written for the highest levels of MIPS and memory optimization.

To evaluate the Numerix-DSP Libraries and these new AI algorithms, the latest version can be downloaded from here: http://www.numerix-dsp.com/eval/.

Thursday 22 October 2020

eBook: 8 DSP Fundamentals Every Electronics Engineer Should Know

A guide to the core knowledge required for DSP application development and feature engineering – by John Edwards and Dunstan Power of ByteSnap Design.


For assistance with your Digital Signal Processing project, please contact Dunstan at ByteSnap or John at Sigma Numerix.



Wednesday 2 September 2020

What To Consider When Integrating DSP Algorithms

Here is a little article I wrote for Electronics Weekly, with my good friend Dunstan Power from ByteSnap Design.

Click on the image below to view the full article.

 #signalprocessing #dsp #machinelearning


Electronics Weekly DSP Article


Thursday 27 August 2020

Answers To Questions From The Data Science Festival Lunch & Learn "The Frequency Domain And How It Can Be Used To Aid Artificial Intelligence"

Thank you very much for attending the Lunch & Learn.
The presentation can be downloaded from here: https://www.numerix-dsp.com/ai.
The recording is available on the Data Science Festival YouTube channel: https://www.youtube.com/datasciencefestival.
Direct link: https://www.youtube.com/watch?v=6XBM0_G7iwk.

Q. Can you give a definition of 'DSP'?
A. Yes. DSP is the digital processing of real-time signals. These are usually 1D signals such as voice, acoustics, radar, wireless, etc. Real-world analog signals are converted to digital signals using an Analog to Digital Converter (ADC).

Q. Is the order-gram output you presented earlier a type of Fast Fourier Transform computation, i.e. from the spatial domain to the frequency domain?
A. No, the ordergram is a method of ensuring that the fundamental frequency (and all harmonics) are in a fixed "location" regardless of the rotational speed of the machine.
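
The core of order analysis is angular resampling: the vibration signal is re-interpolated from uniform time steps to uniform shaft-angle steps, so that each FFT bin corresponds to an order (a multiple of the shaft speed) rather than a fixed frequency. Here is a minimal NumPy sketch of that idea - illustrative only, not the project code, and it assumes a tachometer-derived shaft_angle input:

import numpy as np

# x:           vibration signal, uniformly sampled in time
# shaft_angle: cumulative shaft rotation (in revolutions) at each sample,
#              derived from a tachometer / once-per-rev pulse
def order_resample(x, shaft_angle, samples_per_rev=64):
    # Build a uniform grid in the angle domain: a fixed number of samples per revolution
    uniform_angle = np.arange(0.0, shaft_angle[-1], 1.0 / samples_per_rev)
    # Interpolate the time-domain samples onto the uniform angle grid
    return np.interp(uniform_angle, shaft_angle, x)

# An FFT of the resampled signal has bins in orders, so the fundamental
# (order 1) and its harmonics stay in fixed bins as the machine speed varies.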

Q. Web search results for 'ordergram' are sparse. Please may you recommend further reading, or is there an alternative term one may search for online?
A. You're right, that doesn't show up many results. The ordergram is the result of Order Analysis and MathWorks have an excellent summary: https://uk.mathworks.com/help/signal/examples/order-analysis-of-a-vibration-signal.html.

Q. On the multi layer backpropagation slide, why did you choose a hidden layer length of 25? Is that significant?
A. I actually started by over-specifying the hyper-parameters (input length and hidden layer length). My original choice was 256 (input) and 128 (hidden) - let's call this 256/128 - but note that the input length is half the FFT length.
I reduced both until the performance degraded. At 64/32 there was no noticeable degradation but at 64/16 there was a clear drop-off.
I wrote a script that iterated over a range of hidden layer lengths, from 10 to 63 (input length - 1), and ran this overnight.
It was a case of diminishing returns - above 25 (roughly half the number of inputs) there was very little benefit in using a longer hidden layer.
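
For reference, the overnight sweep was along these lines - a sketch only, where train_and_score() is a hypothetical stand-in for the real train/validate cycle:

def train_and_score(input_length, hidden_length):
    # Placeholder: train the network with these hyper-parameters and
    # return the validation accuracy
    raise NotImplementedError

input_length = 64
results = {}
for hidden_length in range(10, input_length):   # 10 .. 63 (input length - 1)
    results[hidden_length] = train_and_score(input_length, hidden_length)

for hidden_length, accuracy in sorted(results.items()):
    print("hidden length = %2d  accuracy = %.3f" % (hidden_length, accuracy))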

Q. Can a recording of this talk be circulated please? Very interesting, but I have to step away for another meeting. Good luck to all.
A. Yes, you can relive the whole experience :-) using the link above.

Q. Could we see the reading recommendations list again please?
A. Sure, you can download the complete presentation using the link above.

Q. How did you get your labelled data?
A. It was sampled using a calibrated microphone: https://www.minidsp.com/products/acoustic-measurement/umik-1.
Each recording was stored in a file with a unique identifying name that was used to track the performance and the results.
Although the simulation, training and real-time prediction code was written in C, the test framework was written in Python. The benefit was that the test framework could choose whether to use the simulation or the real-time code for verification and regression test purposes.
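
As a sketch of that dispatch idea - the executable names and output format are hypothetical, not the actual project code:

import subprocess

# The same Python harness drives either the simulation build or the
# real-time build of the C code and compares their outputs
def run_classifier(wav_filename, use_simulation=True):
    executable = "./classifier_sim" if use_simulation else "./classifier_rt"
    result = subprocess.run([executable, wav_filename],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()    # e.g. the predicted category label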

Q. I missed the introduction, can you please quickly explain why we should use the frequency domain instead of raw data?
A. The frequency domain is a method for extracting key features such as resonant frequencies. This is more run-time efficient than relying on the Neural Network to extract these features.
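
As a minimal sketch of that pre-processing step - the Hamming window matches the presentation, the other details are assumptions:

import numpy as np

# One frame of raw samples -> windowed FFT magnitudes used as network inputs
def fft_features(frame):
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    return spectrum[:len(frame) // 2]           # input length = half the FFT length

features = fft_features(np.random.randn(128))   # 128-point frame -> 64 features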

Q. Are there any other functional transforms you like to try and apply when experimenting, other than the Fourier transform?
A. Yes, absolutely. I mentioned the Mel Frequency Cepstrum in the presentation and I think this would be worthy of further evaluation. Comparing the FFT to MFCCs is a trade-off between frequency resolution and MIPS.

Q. Did you do a comparison between using the time-domain signal versus the frequency-domain signal as inputs to your convolutional Neural Net classifier, to see the benefit of transforming the signal into the frequency domain?
A. Nothing beyond having a play with Endolith's code, which I referenced in the presentation. This would be an interesting piece of research.

Q. The learning happens on a different machine and only the model is deployed on the edge right?
A. Absolutely correct. The model generated by the training process is stored in a C array that is linked into the real-time code at compile time.
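
A hypothetical sketch of that export step, from the training side - file and array names are illustrative, not the actual SigLib tooling:

import numpy as np

# Write the trained weights out as a C array that the real-time build
# compiles and links in
def export_weights_to_c(weights, filename="model_weights.h"):
    flat = np.asarray(weights).flatten()
    with open(filename, "w") as f:
        f.write("// Auto-generated model weights - do not edit\n")
        f.write("#define MODEL_WEIGHTS_LENGTH %d\n" % len(flat))
        f.write("static const float model_weights[MODEL_WEIGHTS_LENGTH] = {\n")
        f.write(",\n".join("    %.8ef" % w for w in flat))
        f.write("\n};\n")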

Q. Given the number of pre-processing steps on the signal before it gets to the ML network, can one measure how much modifying these steps would 'break' the network's predictions? e.g. modifying the Hamming window, or sample frequency. I'm curious from a software/production risk perspective.
A. Yes, one could do this and it would be another excellent piece of research. I think the presented code would be a minimum but would benefit from additional algorithms such as zero crossing counting, peak detection etc.

Q. Why only a single hidden layer?
A. I was suspicious at first too but this was found to give excellent performance. From my earlier AI work with images, I believe that the reason only one hidden layer is necessary is that the DSP pre-processing extracts the features that the neural network finds easy to detect.
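
For illustration, a single-hidden-layer network of this shape is just two matrix multiplies with a non-linearity in between - the shapes follow the 64-input / 25-hidden / 4-category example discussed in these answers, and the tanh activation is an assumption:

import numpy as np

def predict(features, w_hidden, w_output):
    hidden = np.tanh(w_hidden @ features)   # (25, 64) @ (64,) -> (25,)
    return w_output @ hidden                # (4, 25) @ (25,)  -> (4,) activation levels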

Q. What would be the benefit from additional hidden layers?
A. I suspect there would be a benefit if the signals were highly correlated but it would be a trade-off. My gut feeling from the existing project development is that there would be more to be gained by using a larger FFT or an even larger FFT followed by MFCC.

Q. What classes (labels) are used in the output layer?
A. The output is the activation level for each category, with each category being referenced to the filename for labelling. Following the activation level output, there is a simple comparator that detects which category has the highest energy.
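
That comparator stage is simply an argmax over the activation levels - the category names and values here are illustrative:

import numpy as np

categories = ["category_0", "category_1", "category_2", "category_3"]
activations = np.array([0.07, 0.05, 0.81, 0.07])    # example network output
print(categories[int(np.argmax(activations))])      # -> "category_2"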

Q. Do you know of good code and data for hands-on work in the frequency domain for AI?
A. This is a great question. The main benefit would be using the Frequency Domain for implementing the convolutional kernels.
Like most things in DSP, it is a trade-off. Larger convolutions gain more from using the frequency domain (less MIPS than the time domain) but there is more latency.
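
A minimal SciPy demonstration of the trade-off mechanism - not the SigLib implementation:

import numpy as np
from scipy.signal import fftconvolve

x = np.random.randn(4096)                   # input signal
h = np.random.randn(256)                    # a long convolution kernel

y_time = np.convolve(x, h)                  # direct convolution, O(N*M)
y_freq = fftconvolve(x, h)                  # FFT-based convolution, O(N log N)
print(np.allclose(y_time, y_freq))          # True (identical to within rounding)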
I will be presenting a paper on Frequency Domain Signal Processing, at the DSP Online Conference on the 24th September: https://www.dsponlineconference.com/session/Frequency_Domain_Signal_Processing.

Q. What would an alternative of classifying the same data using analog circuits look like? Feasible? Less or more expensive in terms of hardware resources vs digital?
A. I have heard about analog classifiers but I have no experience with them, so I can't answer this directly except to share my experience with analog vs. digital signal processing. The key point is that DSP devices benefit from Moore's Law but analog devices don't, so every year DSPs get higher performance and lower cost. From a signal point of view, the biggest challenge for analog is noise.

Q. As a turbine ages, how rapidly does its spectrogram change? How often would one need to re-train the model?
A. The spectrogram does indeed change, so the model needs to be trained on all of the engine's different vibration modes, and ages, allowing the classifier to detect all of the variations with a single model.

Q. How different are the spectrograms from one turbine vs another?  i.e. is the model “portable” / applicable to different devices or device specific?
A. It is very engine-model specific, so it depends on the number of fans, the number of blades in each fan, and the architecture of the turbine (radial, axial, by-pass, etc.).

Q. Did you try the Mel Cepstrum, instead of the Fast Fourier Transform?
A. Yes, I performed a short evaluation of the Mel Frequency Cepstral Coefficients (MFCCs).

For those who are not familiar with the Mel Cepstrum, this uses the FFT to calculate the spectrum then it generates a logarithmically reduced set of frequency coefficients. The benefit is that the AI algorithm then requires less MIPS, memory, power consumption etc. MFCCs are great for speech recognition and similar applications.
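
A simplified sketch of that reduction step - illustrative only; a full MFCC front end also applies a log and a DCT after the filter bank:

import numpy as np

# Triangular filters, equally spaced on the mel scale, collapse an
# fft_bins-point magnitude spectrum into num_filters coefficients
def mel_filterbank(num_filters, fft_bins, sample_rate):
    mel_max = 2595.0 * np.log10(1.0 + (sample_rate / 2.0) / 700.0)
    mel_points = np.linspace(0.0, mel_max, num_filters + 2)
    hz_points = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)
    bin_points = np.floor(2 * fft_bins * hz_points / sample_rate).astype(int)
    fbank = np.zeros((num_filters, fft_bins))
    for i in range(num_filters):
        lo, mid, hi = bin_points[i], bin_points[i + 1], bin_points[i + 2]
        fbank[i, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[i, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    return fbank

# Example: 256 FFT magnitude bins -> 20 mel energies
# mel_energies = mel_filterbank(20, 256, 16000) @ spectrum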

It is a trade-off between frequency resolution and recognition performance, so a Mel cepstrum would require a larger FFT at the input and possibly a similar number of MFCCs on the output as the pure FFT solution.
I think there is potential here so it is definitely something I hope to research further in the future.

Q. Why did you program the app in C and not use a standard API such as TensorFlow Lite?
A. This was driven by the inferencing / prediction function. The primary goal was to use whatever accelerators are available on the target CPU to optimize the convolution operations and minimize the MIPS and power consumption.
For training, I could have used TensorFlow to build the model and just used the C code for deployment. However, for any specific Neural Network system the training function is only a few minor changes away from the prediction function, so once I'd written the predictor I just carried on and wrote the training function.

Q. How many categories can this technique support?
A. The project was tested with four categories; however, this could easily be extended.

From my experience, the number of categories supported depends on three main things:

1. How similar the signals are that need to be detected (their cross-correlation) - see the sketch below
2. How much training data is available - more data would mean a greater ability to differentiate similar signals
3. The frame length - longer frame lengths would help differentiate more categories
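
A quick way of checking point 1 is the normalized cross-correlation (cosine similarity) of the categories' average feature vectors - an illustrative sketch:

import numpy as np

# Values close to 1.0 indicate categories that will be hard to separate
def similarity(features_a, features_b):
    a = features_a / np.linalg.norm(features_a)
    b = features_b / np.linalg.norm(features_b)
    return float(np.dot(a, b))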


For more details about Numerix AI Consultancy services, please see here: https://www.numerix-dsp.com/ai.


Monday 24 August 2020

Data Science Festival Lunch & Learn – The Frequency Domain And How It Can Be Used To Aid Artificial Intelligence

Really pleased to have been invited to present at the Data Science Festival Lunch and Learn, this Thursday 27th.
The topic is "The Frequency Domain And How It Can Be Used To Aid Artificial Intelligence".
I will be presenting a general introduction and a high-level overview of a project that I've been working on over the last few months.
Further details and registration are here: https://www.datasciencefestival.com/event/dsf-lunch-learn-the-frequency-domain-and-how-it-can-be-used-to-aid-artificial-intelligence/.


 

Thursday 18 June 2020

Developing Artificial Intelligence Applications using Python and TensorFlow course at the University of Oxford

I've been broadening my AI knowledge over the last few weeks by attending the "Developing Artificial Intelligence Applications using Python and TensorFlow" course at the University of Oxford, Department for Continuing Education.

https://www.conted.ox.ac.uk/courses/developing-artificial-intelligence-applications-using-python-and-tensorflow.

The Course Director is the very knowledgeable Ajit Jaokar.

If any of you want to learn more about AI, I can't recommend the course highly enough.

I'm sure Ajit will be running more courses because this one was oversubscribed, so sign up ASAP.

#AI #artificialintelligence #machinelearning #universityofoxford #datascience

Sunday 24 May 2020

A Simple Google Assistant On The Raspberry Pi

Here's a fun lockdown project that explains how to integrate a Google Assistant on a Raspberry Pi: https://github.com/Numerix-DSP/GoogleAssistant.


Saturday 16 May 2020

Timing DSP Code Running On ARM Cortex Architecture

A recent project required porting some DSP algorithms to the NXP LPC55S6x, an ARM Cortex-M33 based microcontroller.
It was necessary to benchmark the algorithms so I wrote the following code that utilizes the Cycle Count Register, which is part of the ARM Cortex-M Data Watchpoint and Trace (DWT) unit.

The code below includes macros for accessing the DWT and also calculates the overhead of calling the functions to read the timer register, before using the same functions to time some code.
This code has been compiled and tested on the NXP LPCXpresso55S69 Development Board but should run on any ARM device that includes the DWT module.

The benchmarking macros and functions are stored in benchmark_code.h:

// benchmark_code.h
// Macros and functions for benchmarking code

#ifndef BENCHMARK_CODE_H
#define BENCHMARK_CODE_H

#include <stdint.h>

// Timers
// DWT (Data Watchpoint and Trace) registers, only exist on ARM Cortex devices with a DWT unit
#define KIN1_DWT_CONTROL            (*((volatile uint32_t*)0xE0001000))         // DWT Control register
#define KIN1_DWT_CYCCNTENA_BIT      (1UL<<0)                                    // CYCCNTENA bit in DWT_CONTROL register
#define KIN1_DWT_CYCCNT             (*((volatile uint32_t*)0xE0001004))         // DWT Cycle Counter register
#define KIN1_DEMCR                  (*((volatile uint32_t*)0xE000EDFC))         // DEMCR: Debug Exception and Monitor Control Register
#define KIN1_TRCENA_BIT             (1UL<<24)                                   // Trace enable bit in DEMCR register

#define KIN1_InitCycleCounter()     KIN1_DEMCR |= KIN1_TRCENA_BIT               // TRCENA: Enable the trace and debug block (DEMCR)
#define KIN1_ResetCycleCounter()    KIN1_DWT_CYCCNT = 0                         // Reset cycle counter
#define KIN1_EnableCycleCounter()   KIN1_DWT_CONTROL |= KIN1_DWT_CYCCNTENA_BIT  // Enable cycle counter
#define KIN1_DisableCycleCounter()  KIN1_DWT_CONTROL &= ~KIN1_DWT_CYCCNTENA_BIT // Disable cycle counter
#define KIN1_GetCycleCounter()      KIN1_DWT_CYCCNT                             // Read cycle counter register

static inline void benchmark_init_cycle_counter (void)
{
    KIN1_InitCycleCounter();                                // Enable the DWT hardware
    KIN1_ResetCycleCounter();                               // Reset the cycle counter
    KIN1_EnableCycleCounter();                              // Start counting
}

#define benchmark_get_cycle_counter KIN1_GetCycleCounter

#endif /* BENCHMARK_CODE_H */


This code can be used in the following manner:


#include "fsl_debug_console.h"
#include "benchmark_code.h"

int main(void)
{
uint32_t start_time, end_time, overhead_time; // number of cycles

  benchmark_init_cycle_counter(); // Initialize benchmark cycle counter

start_time = benchmark_get_cycle_counter();  // get cycle counter
    __asm volatile ("nop");
    end_time = benchmark_get_cycle_counter();  // get cycle counter
    overhead_time = end_time - start_time;


  start_time = benchmark_get_cycle_counter(); // get cycle counter
   PRINTF("Put your code to be benchmarked here ...\r\n");
    end_time = benchmark_get_cycle_counter();  // get cycle counter
    printf ("Elapsed time = %d cycles\n", end_time - start_time - overhead_time);

    return(0);
}

Notes
There appears to be a +/- 1 cycle jitter on the results of any code timing instance. I have not got to the bottom of exactly why but, regardless of the root cause, this is still very accurate and definitely suitable for the vast majority of applications.

Numerix-DSP Libraries: http://www.numerix-dsp.com/eval/

Monday 27 April 2020

Python/Numpy : How Not To Generate A Sinusoidal Waveform

I was recently reviewing some Python/Numpy code that included a waveform generator. A simplified version of the code looked like this:

x = np.linspace(0, 2*np.pi, 8)
np.sin(x)

This generates the following:

array([ 0.00000000e+00,  7.81831482e-01,  9.74927912e-01,  4.33883739e-01,
       -4.33883739e-01, -9.74927912e-01, -7.81831482e-01, -2.44929360e-16])

Which looks like a perfect single cycle of a sinusoid. Except it isn't!
On closer inspection, the last element in the array is, to all intents and purposes, 0, which means that this isn't a perfect single cycle of a sinusoid because that final sample is actually the first sample of the next cycle.

To generate a perfect single cycle of a sinusoid using linspace you need to account for where the last sample of the sinusoid should fall, if you were to plot it on a graph.

x = np.linspace(0,2*np.pi-(2*np.pi/8),8)
np.sin(x)

This generates the following array, which is spot on:

array([ 0.00000000e+00,  7.07106781e-01,  1.00000000e+00,  7.07106781e-01,
        1.22464680e-16, -7.07106781e-01, -1.00000000e+00, -7.07106781e-01])

In thinking about this problem, it occurred to me that this is not ideal and very likely to cause confusion because it is easy to forget the required modification. The main reason for the confusion is that standard Python generates and processes data from, for example, 0 to N-1, as shown in this simple Numpy example:

np.arange(8.)

Which yields:

array([0., 1., 2., 3., 4., 5., 6., 7.])

So returning to the original problem, a far safer way of generating the sinusoid is the following code:

x = np.arange(0., 2.*np.pi, 2.*np.pi/8.)
np.sin(x)

Which generates the following array:

array([ 0.00000000e+00,  7.07106781e-01,  1.00000000e+00,  7.07106781e-01,
        1.22464680e-16, -7.07106781e-01, -1.00000000e+00, -7.07106781e-01])

Now we have the first np.arange() instruction to generate the time index and the second stage np.sin() to generate the sinusoid. This is clear, precise and unlikely to cause error.

Side note: of course, it would be entirely possible to combine this into a single line instruction; however, I believe this opens up other possibilities for error.
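
It is also worth noting that np.linspace() accepts an endpoint argument, which expresses the same intent directly:

x = np.linspace(0., 2.*np.pi, 8, endpoint=False)
np.sin(x)

This generates the same perfect single cycle as the np.arange() version above.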



Saturday 25 April 2020

The Difference Between FFT Spectrum and Power Spectral Density

I always teach the difference between FFT Spectrum and Power Spectral Density on my DSP courses and many students find it confusing.

This application note from Audio Precision summarizes the subject very neatly: The Difference Between FFT Spectrum and Power Spectral Density

Functions for calculating both the FFT Spectrum and Power Spectral Density are included in the SigLib DSP Library.
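
As a minimal illustration of the distinction (not taken from the application note): the power spectrum reports the power in each bin, whereas the power spectral density normalizes that power by the bin width.

import numpy as np
from scipy.signal import periodogram

fs = 1000.0
t = np.arange(0.0, 2.0, 1.0 / fs)                       # 2 seconds -> 0.5 Hz bins
x = np.sin(2.0 * np.pi * 100.0 * t)                     # 1 V peak, 100 Hz sine

f, p_spectrum = periodogram(x, fs, scaling='spectrum')  # power spectrum, V^2
f, p_density = periodogram(x, fs, scaling='density')    # PSD, V^2/Hz

print(p_spectrum.max())     # ~0.5 : the power of a 1 V peak sine, independent of FFT length
print(p_density.max())      # ~1.0 : 0.5 V^2 spread over a 0.5 Hz bin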

Saturday 8 February 2020

VMWare Virtual Machines On Windows 10 - Disabling Device/Credential Guard - Solution

Not a DSP related post but something that caused me no end of unnecessary pain.

I use Virtual Machines a lot but they stopped working under Windows a while back with the following message:

"VMware Workstation and Device/Credential Guard are not compatible"

The VMWare URL pointed to a Microsoft webpage that was out of date for the newer version of Windows 10 that I am using [Version 10.0.19041.21].

This helped greatly but, unfortunately, it is still out of date :
https://www.tenforums.com/tutorials/68913-enable-disable-device-guard-windows-10-a.html

Here is what I had to do. Note that VMWare only started working after doing all three steps, so the first two might not be necessary, but it works now so I'm not going to make any changes ;-)

Control Panel | Programs And Features | Turn Windows Features On or Off | Untick the following:
Hyper-V
Virtual Machine Platform
Windows Hypervisor Platform


Search Windows for "Group Policy", open the "Edit Group Policy" app and do:
Computer Configuration\Administrative Templates\System\Device Guard
Disable: Turn On Virtualization Based Security

Download dgreadiness from here: https://www.microsoft.com/en-us/download/details.aspx?id=53337 and run the following in an Administrator PowerShell:
.\DG_Readiness_Tool_v3.6.ps1 -Disable
Reboot.

Unfortunately, this breaks the Windows Subsystem For Linux :-(.
My current solution, to run WSL2, is to run the following in an Administrator PowerShell:
.\DG_Readiness_Tool_v3.6.ps1 -Enable
Reboot.

This is so bloody stupid, that I can't run a VM and WSL side-by-side.

I'll endeavour to keep this page updated when Microsoft change things again.

PS I'm sure this is also necessary for VirtualBox but I haven't got a current Windows 10 hosted VirtualBox to test.