I was recently developing a machine learning application to predict the class of audio stored in .wav files.

The frames of a .wav file are easily processed with numpy, including operations such as the Fast Fourier Transform (FFT).
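As an illustration of that kind of preprocessing, here is a minimal sketch that uses Python's standard `wave` module to read 16-bit mono audio and numpy's `np.fft.rfft` to get one magnitude spectrum per frame. The 1 kHz test tone, the 8 kHz sample rate, the 256-sample frame size and the helper names are assumptions made for the example, not values from the original application.

```python
import io
import wave

import numpy as np

# Assumed values for illustration only: a 1 kHz tone, 8 kHz sampling, 256-sample frames.
SAMPLE_RATE = 8000
FREQ_HZ = 1000
FRAME_SIZE = 256


def make_test_wav() -> bytes:
    """Write one second of a 16-bit mono sine tone into an in-memory .wav file."""
    t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
    samples = (0.5 * 32767 * np.sin(2 * np.pi * FREQ_HZ * t)).astype("<i2")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(samples.tobytes())
    return buf.getvalue()


def wav_to_frame_spectra(wav_bytes: bytes, frame_size: int) -> np.ndarray:
    """Split 16-bit mono .wav audio into non-overlapping frames and FFT each one."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        raw = wf.readframes(wf.getnframes())
    signal = np.frombuffer(raw, dtype="<i2").astype(np.float64)
    n_frames = len(signal) // frame_size  # any leftover samples are dropped
    frames = signal[: n_frames * frame_size].reshape(n_frames, frame_size)
    # rfft gives frame_size // 2 + 1 frequency bins per frame; keep the magnitudes.
    return np.abs(np.fft.rfft(frames, axis=1))


spectra = wav_to_frame_spectra(make_test_wav(), FRAME_SIZE)
print(spectra.shape)  # (31, 129)
peak_bin = int(np.argmax(spectra[0]))
print(peak_bin * SAMPLE_RATE / FRAME_SIZE)  # 1000.0
```

Each row of `spectra` is one frame, so the result is already a 2-D numpy array of shape (frames, features), which is exactly the kind of array that then has to be reshaped before a Keras model will accept it.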

When it came to feeding the frames into the neural network, however, I was stumped by how to convert them into a TensorFlow dataset, and despite my best efforts I kept getting the following error:

ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received:
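The message is about array rank: the first layer wanted at least 3 axes (`min_ndim=3`) but received a 2-D array (`ndim=2`). A 1-D convolution, for example, expects `(batch, steps, channels)`. The numpy sketch below shows the mismatch and one way to add the missing channel axis; the threshold of 3 is read from the error message, and the layer behind it is an assumption since the original model is not shown.

```python
import numpy as np

# Five frames of three samples each, the same values used later in this post.
X = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5],
              [4, 5, 6],
              [5, 6, 7]])

# Taken from the error text "expected min_ndim=3"; not tied to a specific layer here.
REQUIRED_MIN_NDIM = 3
print(X.shape, X.ndim)  # (5, 3) 2

# Adding a trailing channel axis of length 1 keeps every value but raises the rank.
X_3d = np.expand_dims(X, axis=-1)
print(X_3d.shape, X_3d.ndim)  # (5, 3, 1) 3
print(X_3d.ndim >= REQUIRED_MIN_NDIM)  # True
```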

After much reading of blog posts, and finally some trial and error, I found the solution, so here it is.

I've created this bare-minimum example so that it is easy to follow. The input frames here are only 3 points long, but it is just as easy to expand them to as many points as you like, because the tricky bit is the reshaping and dimension expansion needed to convert the numpy arrays into the format TensorFlow requires.
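A minimal numpy sketch of that reshaping, under two assumptions: the data are the integers 1 to 8 cut into 3-point sliding windows (which reproduces the `X` and `y` printed in the results), and the model takes 4-D input of shape (samples, 1, frame length, 1), which matches the final `X.shape` in the results. `make_windows` and `WINDOW` are hypothetical names chosen for this sketch.

```python
import numpy as np

WINDOW = 3  # points per input frame; raise this for longer frames


def make_windows(sequence: np.ndarray, window: int):
    """Slide a window over the sequence; each target is the value right after the window."""
    X = np.array([sequence[i:i + window] for i in range(len(sequence) - window)])
    y = sequence[window:]
    return X, y


seq = np.arange(1, 9)  # 1..8, assumed because it reproduces the printed X and y
X, y = make_windows(seq, WINDOW)
print("X.shape:", X.shape)  # X.shape: (5, 3)
print("y.shape:", y.shape)  # y.shape: (5,)

# Reshape each frame to (1, WINDOW), then add a trailing channel axis, giving
# (samples, height=1, width=WINDOW, channels=1).
X = X.reshape(X.shape[0], 1, WINDOW)
X = np.expand_dims(X, axis=-1)
print("X.shape:", X.shape)  # X.shape: (5, 1, 3, 1)
```

Only `WINDOW` depends on the frame length, so the same reshape and `expand_dims` calls carry over unchanged to larger frames.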

Here is the Python code, which includes plenty of print statements so that you can see how the numpy arrays are modified:

Here are the results:

    X.shape:
    (5, 3)
    y.shape:
    (5,)
    X:
    [[1 2 3]
     [2 3 4]
     [3 4 5]
     [4 5 6]
     [5 6 7]]
    y:
    [4 5 6 7 8]
    X.shape:
    (5, 1, 3, 1)
    X:
    [[[[1]
       [2]
       [3]]]


     [[[2]
       [3]
       [4]]]


     [[[3]
       [4]
       [5]]]


     [[[4]
       [5]
       [6]]]


     [[[5]
       [6]
       [7]]]]
    y_pred: [[6.0000005]]

We can see that, even with such a small amount of training data, the predicted result is surprisingly accurate.
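The toy data likely help here: every target is exactly the last value of its frame plus one, a linear relationship that is easy to learn. As a cross-check with a different technique (ordinary least squares via numpy's `np.linalg.lstsq`, not the neural network from this post), a linear fit on the same five samples recovers that relationship exactly. The choice of the frame `[3, 4, 5]` for the prediction is an assumption for illustration.

```python
import numpy as np

# The same five 3-point frames and targets printed in the results.
X = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]], dtype=float)
y = np.array([4, 5, 6, 7, 8], dtype=float)

# Ordinary least squares for y = X @ w + b, using an appended column of ones for b.
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Each target is exactly "last value of the frame + 1", so a linear model fits the
# training data with no error. Predict for the frame [3, 4, 5] as an example.
pred = float(np.array([3.0, 4.0, 5.0, 1.0]) @ coef)
print(round(pred, 6))  # 6.0
```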
