Thursday 27 August 2020

Answers To Questions From The Data Science Festival Lunch & Learn "The Frequency Domain And How It Can Be Used To Aid Artificial Intelligence"

Thank you very much for attending the Lunch & Learn.
The presentation can be downloaded from here: https://www.numerix-dsp.com/ai.
The recording is available on the Data Science Festival YouTube channel : https://www.youtube.com/datasciencefestival.
Direct link: https://www.youtube.com/watch?v=6XBM0_G7iwk.

Q. Can you give a definition of 'DSP'?
A. Yes. DSP (Digital Signal Processing) is the digital processing of real-time signals. These are usually 1D signals such as voice, acoustics, radar and wireless. Real-world analog signals are converted to digital signals using an Analog to Digital Converter (ADC).

Q. Is the order-gram output you presented earlier a type of Fast Fourier Transform computation, i.e. from the Spatial Domain to the Frequency Domain?
A. No, the ordergram is a method of ensuring that the fundamental frequency (and all harmonics) are in a fixed "location" regardless of the rotational speed of the machine.
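
As a rough illustration of why the harmonics stay in a fixed location, the sketch below converts a frequency in Hz to an "order" (a multiple of the shaft rotation rate). The function name and the speeds are illustrative assumptions, not taken from the presentation.

```python
def frequency_to_order(frequency_hz, shaft_rpm):
    """Convert a frequency in Hz to an order (multiple of the shaft rotation rate)."""
    shaft_hz = shaft_rpm / 60.0  # revolutions per second
    return frequency_hz / shaft_hz

# The third harmonic of the shaft rate at two different speeds (illustrative values)
slow = frequency_to_order(150.0, 3000.0)  # 3000 rpm -> 50 Hz shaft rate
fast = frequency_to_order(300.0, 6000.0)  # 6000 rpm -> 100 Hz shaft rate
print(slow, fast)  # both 3.0
```

The harmonic moves from 150 Hz to 300 Hz as the machine speeds up, but on the order axis it stays at order 3.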

Q. Web search results for 'ordergram' are sparse. Please may you recommend further reading, or is there an alternative term one may search for online?
A. You're right, that search doesn't return many results. The ordergram is the result of Order Analysis, and MathWorks has an excellent summary : https://uk.mathworks.com/help/signal/examples/order-analysis-of-a-vibration-signal.html.

Q. On the multi layer backpropagation slide, why did you choose a hidden layer length of 25? Is that significant?
A. I actually started by over-specifying the hyper-parameters (input length and hidden layer length). My original choice was 256 (input) and 128 (hidden). Let's call this 256/128, but note that the input length is half the FFT length.
I reduced both until the performance degraded. At 64/32 there was no noticeable degradation but at 64/16 there was a clear drop-off.
I wrote a script that iterated over a range of hidden layer lengths, from 10 to 63 (input length - 1), and ran this overnight.
It was a case of diminishing returns: above 25 (a little under half the 64 inputs) there was very little benefit in using a longer hidden layer.
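
A sweep of this kind could be sketched as follows. This is not the script used in the project: `sweep_hidden_lengths`, the `tolerance` parameter and `toy_evaluate` are hypothetical, and the toy scoring curve is a placeholder rather than measured data. In practice `evaluate` would train the network with the given hidden layer length and return its test accuracy.

```python
def sweep_hidden_lengths(evaluate, lengths, tolerance=0.01):
    """Score each candidate hidden-layer length and return the smallest one
    whose score is within `tolerance` of the best score found."""
    scores = {n: evaluate(n) for n in lengths}
    best = max(scores.values())
    chosen = min(n for n, s in scores.items() if s >= best - tolerance)
    return chosen, scores

def toy_evaluate(hidden_length):
    """Placeholder scoring curve that saturates; NOT measured data."""
    return 1.0 - 1.0 / hidden_length

# Hidden layer lengths from 10 to 63, as in the sweep described above
chosen, scores = sweep_hidden_lengths(toy_evaluate, range(10, 64))
print(chosen, len(scores))
```

Picking the smallest length within a tolerance of the best score is one simple way to express "diminishing returns" as a rule.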

Q. Can a recording of this talk be circulated please. very interesting, but I have to step away for another meeting. Good luck to all.
A. Yes, you can relive the whole experience :-) using the link above.

Q. Could we see the reading recommendations list again please?
A. Sure, you can download the complete presentation using the link above.

Q. How did you get your labelled data?
A. It was sampled using a calibrated microphone : https://www.minidsp.com/products/acoustic-measurement/umik-1.
Each recording was stored in a file with a unique identifying name, which was used to track the performance and the results.
Although the simulation, training and real-time prediction code was written in C, the test framework was written in Python. The benefit of that was that the test framework could choose whether to use the simulation or real-time code for verification and regression test purposes.

Q. I missed the introduction, can you please quickly explain why we should use the Frequency Domain instead of raw data?
A. Transforming to the frequency domain is a method for extracting key features such as resonant frequencies. This is more run-time efficient than relying on the Neural Network to extract these features.
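
A minimal sketch of the idea, assuming a pure test tone and a naive DFT written with the Python standard library (the project itself used an FFT in C; the sample rate, frame length and tone frequency here are illustrative). It also shows why a real input of length N gives N/2 usable magnitude bins.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive DFT returning the magnitudes of the first N/2 bins of a real input."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append(abs(acc))
    return spectrum

sample_rate = 8000.0  # Hz, illustrative
n = 64                # frame length, illustrative
tone_hz = 1000.0      # falls exactly on bin 8, since each bin is 8000 / 64 = 125 Hz wide
signal = [math.sin(2 * math.pi * tone_hz * i / sample_rate) for i in range(n)]

mags = magnitude_spectrum(signal)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / n
print(len(mags), peak_bin, peak_hz)  # 32 8 1000.0
```

The 64 time-domain samples collapse to a single dominant bin, which is a much easier feature for a small network to use than the raw waveform.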

Q. Are there any other functional transforms you like to try and apply when experimenting, other than the Fourier transform?
A. Yes, absolutely. I mentioned the Mel Frequency Cepstrum in the presentation and I think this would be worthy of further evaluation. Comparing the FFT to MFCCs is a trade-off between frequency resolution and MIPS.

Q. Did you do a comparison between using the time-domain signal versus the frequency-domain signal as inputs to your convolutional Neural Net classifier, to see the benefit of transforming the signal into the frequency domain?
A. Nothing beyond having a play with Endolith's code, which I referenced in the presentation. This would be an interesting piece of research.

Q. The learning happens on a different machine and only the model is deployed on the edge right?
A. Absolutely correct. The model generated by the training process is stored in a C array that is linked into the real-time code at compile time.
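
One way such an array could be produced is sketched below. The actual file layout used in the project is not described here, so the function name `weights_to_c_array`, the array name `hidden_weights` and the float formatting are all assumptions made for illustration.

```python
def weights_to_c_array(name, weights, per_line=4):
    """Format a flat list of floats as C source defining a const float array."""
    lines = [f"const float {name}[{len(weights)}] = {{"]
    for start in range(0, len(weights), per_line):
        chunk = weights[start:start + per_line]
        lines.append("    " + ", ".join(f"{w:.8e}f" for w in chunk) + ",")
    lines.append("};")
    return "\n".join(lines)

# Hypothetical weights; a real model would come from the training run
source = weights_to_c_array("hidden_weights", [0.5, -0.25, 0.125])
print(source)
```

The generated text can be written to a `.c` or `.h` file and compiled with the real-time code, so no file system or parser is needed on the edge device.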

Q. Given the number of pre-processing steps on the signal before it gets to the ML network, can one measure how much modifying these steps would 'break' the network's predictions? e.g. modifying the Hamming window, or sample frequency. I'm curious from a software/production risk perspective.
A. Yes, one could do this and it would be another excellent piece of research. I think the presented code would be a minimum, and it would benefit from additional algorithms such as zero crossing counting, peak detection etc.

Q. Why only a single hidden layer?
A. I was suspicious at first too but this was found to give excellent performance. From my earlier AI work with images, I believe that the reason only one hidden layer is necessary is that the DSP pre-processing extracts the features that the neural network finds easy to detect.

Q. What would be the benefit from additional hidden layers?
A. I suspect there would be a benefit if the signals were highly correlated but it would be a trade-off. My gut feeling from the existing project development is that there would be more to be gained by using a larger FFT or an even larger FFT followed by MFCC.

Q. What classes (labels) are used in the output layer?
A. The output is the activation level for each category, with each category being referenced to the filename for labelling. Following the activation level output, there is a simple comparator that detects which category has the highest activation level.
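
The comparator stage amounts to an argmax over the output activations. A minimal sketch, with hypothetical category names and activation values:

```python
def classify(activations, labels):
    """Return the label whose output activation is largest."""
    best_index = max(range(len(activations)), key=activations.__getitem__)
    return labels[best_index]

labels = ["category_a", "category_b", "category_c", "category_d"]  # hypothetical names
result = classify([0.1, 0.7, 0.15, 0.05], labels)                   # hypothetical activations
print(result)  # category_b
```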

Q. Do you know a good code and data for hands on work in the frequency domain for AI?
A. This is a great question. The main benefit would be using the Frequency Domain for implementing the convolutional kernels.
Like most things in DSP, it is a trade-off. Larger convolutions gain more from using the frequency domain (fewer MIPS than the time domain) but there is more latency.
I will be presenting a paper on Frequency Domain Signal Processing, at the DSP Online Conference on the 24th September: https://www.dsponlineconference.com/session/Frequency_Domain_Signal_Processing.
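
To make the trade-off concrete, the sketch below computes the same linear convolution directly and via the frequency domain, using a small textbook radix-2 FFT written with the standard library. It is an illustration only, not the optimized code a real-time system would use, and the input values are arbitrary.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_convolve(a, b):
    """Linear convolution via zero-padded FFTs and a pointwise multiply."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:
        size *= 2
    fa = fft(list(a) + [0.0] * (size - len(a)))
    fb = fft(list(b) + [0.0] * (size - len(b)))
    product = [p * q for p, q in zip(fa, fb)]
    result = fft(product, inverse=True)
    return [value.real / size for value in result[:out_len]]

def direct_convolve(a, b):
    """Time-domain linear convolution, for comparison."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

signal = [1.0, 2.0, 3.0]
kernel = [0.0, 1.0, 0.5]
fast = fft_convolve(signal, kernel)
slow = direct_convolve(signal, kernel)
print(slow)
print([round(v, 9) for v in fast])
```

The direct method costs roughly len(a) * len(b) multiplies, while the FFT route grows as N log N, so it wins for long kernels. The latency comes from having to collect a whole block of samples before the transform can start.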

Q. What would an alternative of classifying the same data using analog circuits look like? Feasible? Less or more expensive in terms of hardware resources vs. digital?
A. I have heard about analog classifiers but I have no experience with them so I can't answer this directly except to share my experience with analog vs. digital signal processing. The key point is that DSP devices benefit from Moore's Law but analog devices don't. So every year DSPs get higher performance and lower cost. From a signal point of view, the biggest challenge to analog is noise.

Q. As a turbine ages, how rapidly does its spectrogram change? How often would one need to re-train the model?
A. The spectrogram does indeed change so the model needs to be trained on all different vibration modes, and ages, of the engine so that the classifier can detect all of the different variations with a single model.

Q. How different are the spectrograms from one turbine vs. another? i.e. is the model "portable" / applicable to different devices, or device specific?
A. It is very specific to the engine model, so it depends on the number of fans, the number of blades in each fan and the architecture of the turbine (radial, axial, bypass etc.).

Q. Did you try the Mel Cepstrum, instead of the Fast Fourier Transform?
A. Yes, I performed a short evaluation of the Mel Frequency Cepstral Coefficients (MFCCs).

For those who are not familiar with the Mel Cepstrum, this uses the FFT to calculate the spectrum then it generates a logarithmically reduced set of frequency coefficients. The benefit is that the AI algorithm then requires less MIPS, memory, power consumption etc. MFCCs are great for speech recognition and similar applications.

It is a trade-off between frequency resolution and recognition performance so a Mel cepstrum would require a larger FFT at the input and possibly a similar number of MFCCs on the output as the pure FFT solution.
I think there is potential here so it is definitely something I hope to research further in the future.
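
The logarithmic reduction can be illustrated with the mel scale itself. The sketch below uses the commonly quoted formula mel = 2595 * log10(1 + f / 700), which is one of several variants in use, and only shows how band centres are spaced; it does not compute full MFCCs. The function names and the 0 to 4000 Hz range are assumptions made for illustration.

```python
import math

def hz_to_mel(hz):
    """Common mel-scale formula (one of several variants in use)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_centres(low_hz, high_hz, bands):
    """Centre frequencies (Hz) of `bands` bands equally spaced on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (bands + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(bands)]

centres = mel_band_centres(0.0, 4000.0, 10)
gaps = [b - a for a, b in zip(centres, centres[1:])]
print([round(c) for c in centres])
```

The bands are narrow at low frequencies and get wider as frequency rises, which is why a larger FFT is needed at the input to keep useful resolution in the lowest bands.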

Q. Why did you program the app in C and not use a standard API such as TensorFlow Lite?
A. This was driven by the inferencing / prediction function. The primary goal was to use whatever accelerators are available on the target CPU to optimize the convolution operations and minimize the MIPS and power consumption.
For training, I could have used TensorFlow to build the model and just used the C code for deployment. But for any specific Neural Network system the training function is only a few minor changes away from the prediction function, so once I'd written the predictor I just carried on and wrote the training function.

Q. How many categories can this technique support ?
A. The project was tested with four categories; however, this could easily be extended.

From my experience, the number of categories supported depends on three main things:

  1. How similar the signals are that need to be detected (their cross-correlation)
  2. How much training data is available - more data would mean a greater ability to differentiate similar signals
  3. The frame length - longer frame lengths would help differentiate more categories


For more details about Numerix AI Consultancy services, please see here: https://www.numerix-dsp.com/ai.

