Wednesday 13 October 2021

Processing .wav File Frames Through A TensorFlow Convolutional Neural Network

I was recently developing a machine learning application to classify audio stored in .wav files.

The .wav file frames are easily processed with NumPy, which provides functions such as the Fast Fourier Transform (FFT).
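As a rough illustration of that step, the sketch below reads 16-bit mono samples with Python's standard `wave` module and takes the FFT of each fixed-size frame with `np.fft.rfft`. The synthetic 440 Hz tone, the 8 kHz sample rate, the 256-sample frame size and the helper names are all assumptions made for this example; they are not taken from the application described here.

```python
import io
import wave

import numpy as np

SAMPLE_RATE = 8000   # Hz (assumed for this example)
FRAME_SIZE = 256     # samples per frame (assumed)


def make_test_wav(freq_hz=440.0, seconds=1.0):
    """Build a mono 16-bit .wav file in memory containing a sine tone."""
    t = np.arange(int(SAMPLE_RATE * seconds)) / SAMPLE_RATE
    samples = (0.5 * 32767 * np.sin(2 * np.pi * freq_hz * t)).astype('<i2')
    buffer = io.BytesIO()
    with wave.open(buffer, 'wb') as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(samples.tobytes())
    buffer.seek(0)
    return buffer


def wav_to_spectra(wav_file):
    """Read a mono 16-bit .wav and return one magnitude spectrum per frame."""
    with wave.open(wav_file, 'rb') as wav:
        raw = wav.readframes(wav.getnframes())
    samples = np.frombuffer(raw, dtype='<i2').astype(np.float32)
    # Drop the incomplete tail so the samples split evenly into frames
    n_frames = len(samples) // FRAME_SIZE
    frames = samples[:n_frames * FRAME_SIZE].reshape(n_frames, FRAME_SIZE)
    return np.abs(np.fft.rfft(frames, axis=1))


spectra = wav_to_spectra(make_test_wav())
freqs = np.fft.rfftfreq(FRAME_SIZE, d=1.0 / SAMPLE_RATE)
peak_hz = freqs[spectra.mean(axis=0).argmax()]
print(f'spectra.shape: {spectra.shape}')
print(f'peak frequency: {peak_hz:.1f} Hz')
```

Each row of `spectra` is one frame, so the array already has the (frames, points) layout that the rest of this post starts from.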

When it came to feeding the frames into the neural network, I was stumped by how to turn them into a TensorFlow dataset. Despite my best efforts, I kept getting the following error:

ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received:
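
The message is easier to read once the shapes are written out. Conv1D expects input with three dimensions (batch, steps, features), and tf.data.Dataset.from_tensor_slices slices its input along the first axis, so each dataset element has one dimension fewer than the array it came from. The NumPy-only sketch below mimics that slicing with plain indexing. The array values and variable names are made up for illustration, and it is an assumption that this is exactly how the error arose here.

```python
import numpy as np

# Five hypothetical frames of three points each: (samples, steps)
frames = np.arange(15).reshape(5, 3)

# Add the features axis that Conv1D needs: (samples, steps, features)
frames_3d = frames.reshape(5, 3, 1)

# Slicing along the first axis (as from_tensor_slices does) drops a dimension,
# so one element is only 2-D: (steps, features)
element = frames_3d[0]
print(f'element.shape: {element.shape}, ndim: {element.ndim}')

# Inserting an extra axis first means each element keeps 3 dimensions:
# (batch of 1, steps, features)
frames_4d = np.expand_dims(frames_3d, axis=1)
batched_element = frames_4d[0]
print(f'batched_element.shape: {batched_element.shape}, ndim: {batched_element.ndim}')
```

When model.fit is given a tf.data.Dataset it treats each element as an already-formed batch, which is why a 2-D element is reported as found ndim=2. Calling .batch(1) on the dataset should be another way to get the same element shape.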

After much reading of blog posts, and finally by trial and error, I found the solution, so here it is.

I've kept this example to the bare minimum so that it is easy to follow. It is just as easy to expand the input frame size from the 3 points used here to as many as you like, because the tricky bit is the reshaping and dimension expansion that convert the NumPy arrays into the format TensorFlow requires.
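
To show how the frame size can be scaled up, here is a small sketch that builds the same kind of arrays for any n_steps. The helper make_windows, and the choice of the next sample as the target, are assumptions made for illustration. They reproduce the toy data used in this post but are not part of the main listing.

```python
import numpy as np


def make_windows(series, n_steps):
    """Split a 1-D series into overlapping windows and next-value targets."""
    series = np.asarray(series)
    n_samples = len(series) - n_steps
    if n_samples < 1:
        raise ValueError('series must be longer than n_steps')
    # Row i holds series[i : i + n_steps]; the target is the value that follows
    X = np.stack([series[i:i + n_steps] for i in range(n_samples)])
    y = series[n_steps:]
    return X, y


n_features = 1
for n_steps in (3, 5):
    X, y = make_windows(np.arange(1, 12), n_steps)
    # (samples, steps) -> (samples, steps, features) -> (samples, 1, steps, features)
    X = np.expand_dims(X.reshape((X.shape[0], n_steps, n_features)), axis=1)
    y = np.expand_dims(y, axis=1)
    print(f'n_steps={n_steps}: X.shape={X.shape}, y.shape={y.shape}')
```

Only n_steps changes between the two runs; the reshape and the two expand_dims calls stay the same.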

Here is the Python code, which includes plenty of print statements so that you can see how the NumPy arrays are modified:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dense, Flatten, MaxPooling1D

# Top level parameters
n_steps = 3
n_features = 1

X = np.array(         # Source array containing frames of data
  [[1, 2, 3], [2, 3, 4],
   [3, 4, 5], [4, 5, 6],
   [5, 6, 7]])
y = np.array([4, 5, 6, 7, 8])   # Target value for each frame

print(f'X.shape:\n{X.shape}')
print(f'y.shape:\n{y.shape}')
print(f'X:\n{X}')
print(f'y:\n{y}')

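# Reshape to (samples, n_steps, n_features), then insert an axis at position 1
# so that each dataset element is a batch containing one sample
X = X.reshape((X.shape[0], X.shape[1], n_features))
X = np.expand_dims(X, axis=1)   # (5, 3, 1) -> (5, 1, 3, 1)
y = np.expand_dims(y, axis=1)   # (5,) -> (5, 1)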

print(f'X.shape:\n{X.shape}')
print(f'X:\n{X}')

# Define model
model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(n_steps, n_features)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# Fit model
train_dataset = tf.data.Dataset.from_tensor_slices((X, y))
model.fit(train_dataset, epochs=1000, verbose=0)

# Predict value
x_input = np.array([3, 4, 5])
x_input = x_input.reshape((1, n_steps, n_features))
y_pred = model.predict(x_input, verbose=0)
print(f'y_pred: {y_pred}')

Here are the results:

X.shape:
(5, 3)
y.shape:
(5,)
X:
[[1 2 3]
 [2 3 4]
 [3 4 5]
 [4 5 6]
 [5 6 7]]
y:
[4 5 6 7 8]
X.shape:
(5, 1, 3, 1)
X:
[[[[1]
   [2]
   [3]]]


 [[[2]
   [3]
   [4]]]


 [[[3]
   [4]
   [5]]]


 [[[4]
   [5]
   [6]]]


 [[[5]
   [6]
   [7]]]]
y_pred: [[6.0000005]]

We can see that, even with this small amount of training data, the predicted result is surprisingly accurate: the prediction of 6.0000005 for the input [3 4 5] is very close to the expected value of 6.

