Tuesday, 29 January 2013

How To Run A Command On A Remote Unix Machine From Windows

I recently had to create a script under Windows that could kick off a task under Unix. The user of the Windows machine had no knowledge of Unix so I wanted a script or application that I could put on the desktop of the windows machine, that the user could just double click to execute.

I decided to use Strawberry Perl (http://strawberryperl.com/) to execute a script under Windows. After installing Strawberry Perl I installed the perl module Net::SSH2 using the command :
$cpan Net::SSH2
Then I executed the following script :


#!/usr/local/bin/perl

#use strict;
#use warnings;
#use 5.010;  #if you are using version 5.10 or higher

use Net::SSH2;

my $host = "ipaddress";
my $user_id = "username";
my $password = "password";

my ($len, $buf); # Buffer for reading SSH2 output

my $ssh2 = Net::SSH2->new();
$ssh2->connect($host) or die "Unable to connect $host $@ \n";
$ssh2->auth_password($user_id,$password) or die "Unable to login $@ \n";
my $chan = $ssh2->channel();

#Initialize and execute Unix command remotely
$linuxCommand = "ls > output.tmp";
$chan->exec($linuxCommand) or die "Unable to execute the command $@ \n";

sleep (5);


In order to use this script in your application all you need to do is cut and paste the code above into an empty file with the extension .pl (e.g. remoteCommand.pl). Having done this, modify these lines :
my $host = "ipaddress";
my $user_id = "username";
my $password = "password";
(Do not remove the quotation marks)
Then modify this line to change the commande to execute on the remote machine :
$linuxCommand = "ls > output.tmp";
(Put your command between the quotation marks).

Now all you need to do is double click the Perl file and it will execute the command on the remote machine.

Finally, if you have found this solution useful then please do hit the Google (+1) button so that others may be able to find it as well.

Friday, 18 January 2013

How to benchmark some C code (or assembly code, if you put the assembly in a C callable function)


I often have to benchmark DSP algorithms and while code profiling tools are excellent for large applications, DSP algorithms are often most easily benchmarked using simple timing functions.

Here are two examples that I interchange, depending on the underlying device and/or operating system. These examples are very portable but each solution has its own benefits and disadvantages and the two examples will almost always give different results when executed on the same platform/OS.

This first example uses the clock() function :


#include <stdio.h>
#include <time.h>

#define LOOP_COUNT 1000000000L

void main (void)
{
long i = 0L, j = 0L;

clock_t t_overhead, t_start, t_stop;

/* Calculate the timing overhead */
t_start = clock (); t_stop = clock (); t_overhead = t_stop - t_start;

t_start = clock ();
for (i = 0; i < LOOP_COUNT; i++) { j++; } // Put your code in here
t_stop = clock ();

printf ("Time : %lf seconds\n\n", (((double)(t_stop - t_start - t_overhead)) / CLOCKS_PER_SEC));

}


This example uses the ftime() function :


#include <stdio.h>
#include <sys/timeb.h>

#define LOOP_COUNT 1000000000L

void main (void)
{
long i = 0L, j = 0L;

struct timeb t_current;
long t_overhead, t_start, t_stop;

/* Calculate the timing overhead */
ftime(&t_current); t_start = (long)((t_current.time*1000L)+(long)t_current.millitm);
ftime(&t_current); t_stop = (long)((t_current.time*1000L)+(long)t_current.millitm);
t_overhead = t_stop - t_start;

ftime(&t_current); t_start = (long)((t_current.time*1000L)+(long)t_current.millitm);
for (i = 0; i < LOOP_COUNT; i++) { j++; } // Put your code in here
ftime(&t_current); t_stop = (long)((t_current.time*1000L)+(long)t_current.millitm);

printf ("Time : %lf seconds\n\n", (((double)(t_stop - t_start - t_overhead)) / 1000.));

}



Things to be aware of when using functions like this are :
While it is entirely possible to print times to the nearest ms I suspect that in many applications the values returned by the timing functions are not actually that accurate because underlying everything are a bunch of C functions, libraries and an OS.
When calling your processing function on a multi-tasking OS you have little or no control about what else the OS is doing in the background.
The clock function returns values that depend on the implementation
On some processors it returns 0 the first time it is called and then starts incrementing
On some processors it starts counting when the process starts
On some processors it counts in milliseconds
On some embedded processors (e.g. some DSPs I have programmed) clock () does not return the time in seconds but cycles of the processor.
You should always use CLOCKS_PER_SEC to turn this value into real time
The clock() function is more commonly implemented on embedded processors than the ftime() function.
If the function to be benchmarked is very short then it may be necessary to call it a number of times to average out the effects of the OS and the accuracy of the clock functions.



Friday, 11 January 2013

Raspberry Pi Hints And Tips


Actually, none of this is Pi specific but I have found that some friends have bought Pis but not had any previous experience with Linux so here are some notes I wrote for them.

To update the OS and applications and then clean up the install :
sudo apt-get update; sudo apt-get /y upgrade; sudo apt-get autoremove

To find out how much disk and memory space is used/available :
df -h; free -m

How to send an email from the command line :
echo -e "Subject: Test Mail\r\n\r\nThis is a test mail" |msmtp --debug --from=default -t destination@EmailAddress

How to add a new user :
sudo adduser username
sudo visudo
Add following line under the "root ALL=(ALL) ALL" Line:
username ALL=(ALL:ALL) NOPASSWD: ALL

You should delete the default user (user:raspberry, passwd:pi). How to delete a user :
Ensure you have backed up any important information then enter :
sudo deluser --remove-home username

How to change the hostname (e.g. from raspberrypi to donkey) :
sudo nano /etc/hostname
Change raspberrypi to donkey
sudo hostname donkey
nano /etc/hosts
change :
127.0.1.1       raspberrypi
to
127.0.1.1       donkey

Install Apache web server :
sudo apt-get install apache2 php5 libapache2-mod-php5

Tuesday, 8 January 2013

Introduction To Digital Signal Processing


Here is a quick introduction to DSP that I wrote many years ago, as part of a course : http://www.numerix-dsp.com/tutorials/DSP/DSPIntroduction.pdf.

Here is a quick introduction to frequency domain processing that I wrote around the same time : http://www.numerix-dsp.com/tutorials/DSP/FrequencyDomainProcessing.pdf.

Both of these are fairly high level so don't be disappointed if you are expecting the answer to life, the universe and everything.
That's 42, by the way ;-)

Sunday, 6 January 2013

Let's Do Some Digital Signal Processing (Part 2) - One Pole Filter


One-pole filters are very useful and efficient techniques for filtering signals. They use a feedback value to implement a low-pass filter. They can also be used to design efficient peak-hold filters in the time-domain and are also used in power spectrum analyzers and can help extract features of the signals that are being analyzed.

The structure of the one-pole filter is shown in the following diagram :
And the following equation :
y(n) = x(n) - a * y(n-1),          0 < a < 1.0

The single coefficient defines the gain and time-constant of the filter as follows :

Time constant :
Gain :
Here is the code for the one-pole filter :

#include <stdio.h>

#define COUNT 8 /* Number of samples to process */
#define ALPHA 0.6 /* Feedback value */

float impulseSrc[] = {
 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
 };
float stepSrc[] = {
 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0
 };
float Dst[] = {
 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
 };

void one_pole (const float *, float *, const float, float *, const int count);

void main (void)

{
 float State = 0.0; /* Reset the state value */
 one_pole (impulseSrc, Dst, 0.6, &State, COUNT);
 printf ("One Pole Impulse = %f, %f, %f, %f, %f, %f, %f, %f\n", Dst[0], Dst[1], Dst[2], Dst[3], Dst[4], Dst[5], Dst[6], Dst[7]);
 State = 0.0; /* Reset the state value */
 one_pole (stepSrc, Dst, 0.6, &State, COUNT);
 printf ("One Pole Step = %f, %f, %f, %f, %f, %f, %f, %f\n", Dst[0], Dst[1], Dst[2], Dst[3], Dst[4], Dst[5], Dst[6], Dst[7]);
}


void one_pole (const float *pSrc,
float *pDst,
const float Alpha,
float *pState,
const int VectorLength)
{
    float LocalState = *pState; /* Use a local variable to make the code more efficient */
    int   i;
    for (i = 0; i < VectorLength; i++)
    {
        LocalState = *pSrc++ + (LocalState * Alpha);
        *pDst++ = LocalState;
    }
    *pState = LocalState;
}

And here is the output :
One Pole Impulse = 1.000000, 0.600000, 0.360000, 0.216000, 0.129600, 0.077760, 0.046656, 0.027994
One Pole Step = 1.000000, 1.600000, 1.960000, 2.176000, 2.305600, 2.383360, 2.430016, 2.458010

This implementation shows one problem with feedback filters and that is the gain, as shown above with the step response. In an ideal scenario (and definitely when using fixed point devices) then the magnitude of the output should be the same as the magnitude of the input (i.e. 0 -> 1). In this example it can be seen that the output value (i.e. the gain) tends to 2.5 (i.e. 1/(1-Alpha)). To counter this we can rewrite the one-pole filter equation as follows :
y(n) = x(n) * (1- a) + a * y(n-1) 0 < a < 1.0

And the new flow diagram looks like :
Now when we rewrite the code, it looks like this :

void one_pole (float *pSrc,
float *pDst,
float Alpha,
         float *pState,
int VectorLength)
{
    float OneMinusAlpha;
    float LocalState = *pState;
    int   i;
    OneMinusAlpha = 1.0 - Alpha;

    for (i = 0; i < VectorLength; i++)
    {
        LocalState = ((LocalState * Alpha) + (*pSrc++ * OneMinusAlpha);
        *pDst++ = LocalState;
    }

    *pState = LocalState;
}

And now the gain is normalized.
As mentioned at the start of this blog, scaling is a big issue with feedback filters and the more feedback coefficients the bigger the problem. Solving the problem for one-pole filters is a trivial task, doing the same more biquad IIR filters is a much more complex task - maybe something for a future blog.



Friday, 4 January 2013

What's All This dBm Stuff Anyhow ? - With suitable homage to Bob Pease :-)


dBm is a logarithmic measurement of power that is used in telecommunications as a more convenient form of power measurement than the linear Watt (W) that most people are familiar with.

dB is a measurement value that represents the ratio of two powers :
dBm is the power ratio in decibels (dB) between the measured power and the reference power of one milliwatt (mW).

In a telecoms systems the transmit power is driving the channel impedence, which in audio telephony is typically 600 ohm, while in wireless applications 50 ohms is the most common.
Note that while dB is a relative measurement between the two powers (e.g. input and output), dBm is absolute because it is measured relative (if that makes sense) to a fixed reference power of 1 mW.

Being a logarithmic scale, dBm is really handy for calculating powers through a network so for example in audio telephony, a -13 dBm Tx power would be seen as -55 dBm at the receiver if the channel attenuation were 42 dB.

The following table shows some typical dBm / mW values used in mobile comms.

dBm mW
36 4000 (i.e. 4W) Typical macrocell basestation Tx power
30 1000
26 400 Typical mobile phone or indoor office basestation Tx power
20 100
16 40 Typical residential small basestation Tx power
10 10
6 4
3 2
1 1
-3 0.5
-10 0.1 Typical maximum received signal power
-60 10^-6 Typical minimum receive power to be able to detect a signal (in mobile comms, without a front end AGC)

The key things to learn from the table are the following :
The absolute power doubles for a 3 dB increase and halves for a 3 dB drop.
The absolute power is multiplied by 10 for a 10 dB increase. So a 20 dB increase is equivalent to absolute power being multiplied by 100.
Given that Power = (V^2 / R) then the voltage increases by sqrt(2) for a 3 dB increase. Hence a 6 dB increase is required in order for the voltage (or current for which power = I^2.R) to double.

If you would like to read more on the subject Rohde And Schwartz have written a very good applications note called dB or not dB? Everything you ever wanted to know about decibels but were afraid to ask.

My Favourite Book : Playing the Enemy: Nelson Mandela and the Game That Made a Nation, by John Carlin


I was discussing "favourites" with a friend, at the bar, the other night and the subject of books came up.

Playing The Enemy is a most incredible story and says a lot about the characters of the individuals concerned; Nelson Mandela, Francois Pineaar and many others.

If you have no interest in rugby or even sport then don't be put off because this book describes an amazing period of world history.

Thursday, 3 January 2013

What Is Wrong With This Piece Of Pseudo C Code ?


The idea for this blog sprang out of me doing a spring clean on my old laptop and finding all sorts of old training course and applications notes that I had written. This clean up got me thinking about some of the questions I had answered over the years and as I have been remembering them I have put them up.

Here is another question that came up regularly when I was supporting engineers programming DSPs in C. Frequently the Engineer would phone up and report that the hardware kept crashing. While this was entirely possible, my first place to start would often be to take a look at the project source code and the most common cause of the problem is shown here.

So what is wrong with this bit of code and why will it crash an application ?

.
.
int function (int param; .....)
{
int array [2000];
.
.

The answer is that the array is placed on the stack and while many compilers (Microsoft, Borland, GCC and the like) leave as much space between the stack and the heap as possible, this is not always possible on an embedded processor, with limited memory available, so the end result is that the stack starts to encroach on the heap and finally disaster strikes.

Most embedded linkers include a linker command file that allows the programmer to specify the position and size for the code, the stack, the heap and all the other memory blocks. While it is entirely possible to specify a larger stack space in the linker command file, this always seems to me like an unnecessary hack that is bound to end in tears because at some point in the future someone else is going to have to maintain the code and will not be looking for some hidden parameters in the LCF.

Far better, in my mind, is to allocate all arrays at the top level of the application, using malloc () and that way, if the application runs out of memory you will get an error message that can be easily detected and handled.

Having malloc'd the array at the top level then pointers can be passed to the sub-functions. This not only avoids the stack/heap problem discussed above but also makes the code re-entrant and a lot more portable.

A final word on compilers for Windows/Linux etc. While the latest Microsoft linker (And I assume GCC etc are the same) allocates 1 MByte for the stack so there is way more to play with. I still think allocating arrays onto the stack is bad form so I avoid doing it here as well.

Philosophy


Let's Do Some Digital Signal Processing (Part 1) - Dot Product

Whenever I am asked to program a new DSP processor, or even write DSP algorithms on general purpose processors, (there isn't much difference these days, from a programming perspective) I always start with the basics. I install the development tools and the simulator then start hacking two simple algorithms :
Dot Product
One-Pole Filter
The nice thing about these functions is that they are really simple but are great for exercising the optimization capabilities of the C compiler. For this blog we'll look at the dot product, we'll come to the one-pole filter later.
The dot product performs the following mathematical operation on two data sets :
In otherwise, it successively multiplies each of the values in one array by each of the values in the other array, as follows :
Here is the code we will compile. This code should run on pretty much any processor with a C compiler.


#include <stdio.h>

#define COUNT 48 /* The filter size must be modulo 8 */
#define USE_ARRAY_INDEX 1 /* Set to '1' to use array indexing or '0' to use pointers */

float a[] = {
1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0,
11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0,
21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0,
31.0, 32.0, 33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0,
41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0
};
float b[] = {
41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0, 49.0, 50.0,
51.0, 52.0, 53.0, 54.0, 55.0, 56.0, 57.0, 58.0, 59.0, 60.0,
61.0, 62.0, 63.0, 64.0, 65.0, 66.0, 67.0, 68.0, 69.0, 70.0,
71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0, 80.0,
81.0, 82.0, 83.0, 84.0, 85.0, 86.0, 87.0, 88.0
};

float dotp (const float *, const float *, const int count);

void main (void)

{
float sum;
sum = dotp (a, b, COUNT);
printf ("Dot product = %f\n", sum);
}


float dotp (const float *x, const float *y, const int count)
{
int i;
float sum = 0.0;
for (i = 0; i < count; i++)
{
#if USE_ARRAY_INDEX
sum += x[i] * y[i];
#else
sum += *x++ * *y++;
#endif
}
return sum;
}

If you try to compile this code on an embedded device then the compiler might not support stdio so you will have to remove the #include and the printf lines.

You will notice that within the code there is some conditional compilation (#if USE_ARRAY_INDEX) that selects whether to use arrays and indexes or pointers. When I was taught to program C, as an undergraduate, I was told unambiguously "always use pointers because they are quicker than arrays and indices" this isn't always the case and I might look into why this is the case in a future blog.

For a simple function like the dot-product it is safe to assume that most modern compilers will optimize either form of this C code as efficiently as if we had written it in assembly code but this function is the starting point for creating more complex functions.

So if most compilers will give the same level of optimization with both pointers and arrays/indices then why do I include both ? I have found that it is a good habit to get into because when working with more complex functions it becomes increasingly difficult to predict which method will give the best results. On modern VLIW/SIMD machines I have often been surprised to find that one method may give near 100% optimized code but the other is very slow. So for very little effort on my behalf I always try both methods and see which works best.

If all goes well, when you compile and run this example then you should see the following output :
Dot product = 85064.000000

We will look at how to benchmark algorithms in a later blog.

Wednesday, 2 January 2013

I Have Downloaded xyz FFT (Fourier) Transform Algorithm But How Do I Know I Have Implemented It Correctly And There Are No Bugs ?


Implementing an FFT on any RISC/VLIW architecture is a relatively simple task but getting it optimized for a particular architecture can be really time consuming which is why most of the time the first place I look is the device manufacturers web site to see what their Apps team have come up with.

When ever I use a new FFT function for the first time I always verify the results of processing both a sine and cosine wave against what is expected from the underlying maths.

If you perform a DFT on a cosine wave then the results should be purely real, as shown in the following diagram. The top graph shows 32 samples of a cosine, with the middle and bottom graphs showing the real and imaginary components of the frequency domain output from the FFT. Notice that there is no energy in the imaginary component and the real energy is located in two impulses representing the magnitudes of the two phasors that are the components of the original cosine waveform.


Similarly if you do the same for a cosine wave then the results should be purely imaginary, as follows :

Note the magnitude of the two phasors that form the sine wave.

If this is the first time you have seen graphs like this then can you explain the magnitude (16) of all four impulses in the frequency domain ? If not then you might like to check the following blog entry : http://realgonegeek.blogspot.com/2012/12/i-am-performing-fast-fourier-transform.html.

The above input datasets use multiple cycles of sine and cosine waves that have 8 points in their fundamental. This has been chosen because of the ease of creating the wave forms with just the values 0, 0.7071, 1 and their negatives. Here are the base lookup tables that you can use to create the larger input waveforms by simply duplicating them the required number of times :

int sin8[] = {Q28(0.0), Q28(0.70710678118654752440084436210485), Q28(1.0), Q28(0.70710678118654752440084436210485), Q28(0.0), Q28(-0.70710678118654752440084436210485), Q28(-1.0), Q28(-0.70710678118654752440084436210485)};
int cos8[] = {Q28(1.0), Q28(0.70710678118654752440084436210485), Q28(0.0), Q28(-0.70710678118654752440084436210485), Q28(-1.0), Q28(-0.70710678118654752440084436210485), Q28(0.0), Q28(0.70710678118654752440084436210485)};

Note the Qnn macros are used to convert to a Q Format appropriate for a fixed point device, these are not required on a floating point device. If you are not familiar with the Q Format for fixed point numbers then you might like to checkout this blog entry : http://blog.numerix-dsp.com/2015/01/introduction-to-q-format-number-system.html.

If you see anything different to the above then the following points should help find the problem.

Has the code implemented the bit-reversed addressing functionality required in the FFT ?
If so, is this in the right place ?
The rule of thumb is if it is decimation-in-time then bit-reverse in time. If DIF then bit-reverse in the frequency domain.
Has the code scaled the results to allow for the 1/N scaling in the DFT ?
Have you used the forward or reverse DFT function ?
These can be interchanged depending on whether you use e^-jwt or e^jwt to represent the forward or inverse operation. There is no right or wrong answer to which to use. At it is most simple, it depends on whether you and/or the person who wrote the code is an engineer or a mathmatician. If this sounds weird it is a bit like whether to use i or j for imaginary numbers. I'm an engineer so i always refers to current ;-)
Is this code written for a traditional DSP architecture ? If so, does it require that the input and output arrays (including twiddle factors) are correctly aligned on the correct modulo memory address.
Has the twiddle factor (FFT coefficient) look-up-table been implemented correctly ?
This usually takes the form of a 3/4 sine look-up-table.

These are the most common reasons for FFTs not working.
Once you get your FFT working for the simple sine and cosine waveforms then barring fixed-point overflow issues then in general it will work for any input data sequence.

Tuesday, 1 January 2013

Quick Tip For Downloading Files (e.g. Music) To An Android Device


Now that everything is cloud based I find that I don't need to use a USB cable to backup contact details etc and sharing most files across devices is easily supported with Google Drive, Dropbox etc. Now-a-days I find that the only time I ever used USB was to synch music and large files.

I looked into using one of the cloud based music services to store my music but since I have a NAS on the home network and the laptop and phone are always connected I thought there must be a better solution.

For a short while I did use ES File Explorer to copy the files off the NAS onto the Droid but I found it fiddly especially on my phone.

I use FileZilla (http://filezilla-project.org/) a lot for work so I recently installed FTPServer by Andreas Liebig (https://play.google.com/store/apps/details?id=lutey.FTPServer) onto all of my Droids and I configured the FileZilla client on my laptop for each device.

Now any time I want to copy files to my Driod I just open the FTPServer (one click on the Droid), open FileZilla and connect to the phone (two clicks). Now I can drag and drop to my heart's content.

This combination is excellent because it is dead easy and super fast to copy music across. As soon as the FTP server is closed Android even re-scans the SDCard so the music is immediately available in my Media Player - I use N7 (https://play.google.com/store/apps/details?id=com.n7mobile.nplayer) by the way and love it.

What's The Best Chilli Recipe In The World ?


Something slightly different to start the New Year.

Nothing to do with my geekery interests but a while ago I wanted to surprise my Wife, for dinner, and googled for a chilli recipe. This one came up first so I tried it (http://www.bbcgoodfood.com/recipes/3228/chilli-con-carne). It is dead easy and absolutely stunning.

The comments at the bottom of the page are completely true : adding a small piece of dark chocolate really enhances the flavour and also it is even better the next day, if you leave it in the fridge and warm it up in the microwave.

My favourite topping is cheese but soured cream and Guacamole are also great options.

Gents : if your wife or partner like Chocolate, and who's doesn't ;-), then try this recipe for the 5 minute chocolate mug cake that I first heard about on the Chris Evans radio show (http://allrecipes.co.uk/recipe/7622/5-minute-chocolate-mug-cake.aspx).
A drop of whiskey makes it taste even better ;-)

OK so that is about the only thing I know how to cook, apart from Toad-In-The-Hole. So back to talking tech now.