*[Continued from Understanding the normal distribution (Part 4)]*

**Working the statistics**

For my last trick, I'd like to show you how to use **Equation 37** to compute statistical parameters like mean, variance, and standard deviation. For each of these parameters, we'll be evaluating integrals like **Equation 37**. Each time, we're going to need fundamental definite integrals like these:

How do we know that these are the right results? Well, you can ask Mathcad, as I did, or consult a table of integrals, as I also did. But if you're curious to see how the derivations play out, you can find the secrets here:

The trick is to square the integral, convert to polar coordinates, and integrate over *r* and *θ*. Slick.
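For readers who would rather see the trick on the page, here is a sketch of the standard polar-coordinate argument for the basic Gaussian integral. The symbol *I* and the unscaled exponent are choices made for this illustration; the integrals in **Equation 38** may well be written with a different scaling.

```latex
\begin{align*}
I   &= \int_{-\infty}^{\infty} e^{-x^2}\,dx \\
I^2 &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy
     = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2}\,r\,dr\,d\theta \\
    &= 2\pi\left[-\tfrac{1}{2}e^{-r^2}\right]_{0}^{\infty} = \pi \\
I   &= \sqrt{\pi}
\end{align*}
```

The extra factor of *r* that comes from the area element *dx dy* = *r dr dθ* is what turns an intractable integral into an elementary one.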

By the way, while you're at the MIT site, you might want to browse their other videos, which are myriad.

Armed with the primitive integrals of **Equation 38**, we can compute the mean values when *f(x)* is any power of *x*. For each case, we'll use the change of variables:

Which means:

**The integral of the distribution function: f(x) = 1**

This has to be the simplest function worth looking at. For this case, **Equation 37** becomes:

From **Equation 38**, this is:

This is, of course, the same result that I wrote down in **Equation 25**. We chose the multiplying constant to force the integral of the distribution function to be unity, so it's hardly surprising that we get the result we were demanding. Still, it's sort of comforting to see that mathematics still works.

**The mean: f(x) = x**

We previously calculated the *average*, or *mean*, of a bunch of numbers from the familiar formula:

By analogy, the mean of the function *f(x)* = *x* is given by the formula:

Making the usual substitution, we get:

We can split this one into two integrals:

We've already established the value of the first integral in square brackets; it's the same one we evaluated for *f(x)* = 1. What's more, according to **Equation 38**, the second integral is equal to zero. So the mean of *x* is simply:

$$\text{mean of } x = \mu \qquad (47)$$

**The variance: f(x) = x²**

Finally, recall that the variance of a bunch of numbers was defined to be:

This time, our *f(x)* is equal to *x*², and the defining equation becomes:

Note carefully that, as in the variance of a discrete set of measurements, we are calculating the expectation value of *x*² with *x* measured from *µ*, not from *x* = 0. This makes sense: we're looking for the variation from the central peak, wherever that is. Making the usual substitutions, we get:

Using the identity in **Equation 38**, we get, finally:

Now we have the results for all three incarnations of *f(x)*. We found that:

So what is the *standard deviation*? Why, it's what it's always been: the square root of *V*. The value *σ*, which we introduced just to give us a way of scaling the width of the central peak of the distribution, is, in fact, the standard deviation.
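If you'd like to see these three results come out of a computer rather than a table of integrals, here is a minimal numerical sketch. It assumes the distribution function has the standard form with the ½ in the exponent; the function names, the midpoint-rule integrator, and the values *µ* = 3 and *σ* = 2 are all illustrative choices, not anything taken from the earlier installments.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def expectation(f, mu, sigma, half_width=10.0, steps=20000):
    """Midpoint-rule estimate of the integral of f(x) * p(x).

    The infinite limits are truncated to mu +/- half_width * sigma,
    where the tails are far too small to matter.
    """
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += f(x) * normal_pdf(x, mu, sigma)
    return total * h

mu, sigma = 3.0, 2.0  # arbitrary illustrative values

area = expectation(lambda x: 1.0, mu, sigma)                 # f(x) = 1
mean = expectation(lambda x: x, mu, sigma)                   # f(x) = x
variance = expectation(lambda x: (x - mu) ** 2, mu, sigma)   # measured from mu

print("area     =", area)
print("mean     =", mean)
print("variance =", variance)
print("std dev  =", math.sqrt(variance))
```

Changing `mu` and `sigma` and rerunning is a cheap way to convince yourself that the area stays at 1 while the mean and standard deviation track the two parameters.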

Now, at last, you understand the reason for that seemingly unnecessary step of including a factor of ½ in the exponent of the distribution function. If we hadn't put it there, we wouldn't have ended up with σ as the standard deviation.
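To see what the ½ buys us, consider a hypothetical variant of the distribution that leaves it out. The function *q* and the width parameter *w* below are introduced purely for this illustration; the substitution is *t* = (*x* − *µ*)/*w*.

```latex
\begin{align*}
q(x) &= \frac{1}{w\sqrt{\pi}}\,e^{-(x-\mu)^2/w^2} \\
V    &= \int_{-\infty}^{\infty}(x-\mu)^2\,q(x)\,dx
      = \frac{w^2}{\sqrt{\pi}}\int_{-\infty}^{\infty}t^2e^{-t^2}\,dt
      = \frac{w^2}{\sqrt{\pi}}\cdot\frac{\sqrt{\pi}}{2}
      = \frac{w^2}{2}
\end{align*}
```

In that variant the standard deviation would be *w*/√2 rather than *w* itself, so the width parameter and the standard deviation would no longer be the same number.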

**A view backwards**

Well, it's been a long, slow slog, but we've made great strides in defining and understanding the normal distribution. Let's just briefly review what we've done.

I began this column by pointing out that noise is always going to be present in embedded systems, so we need to understand its nature so we can better deal with it. As a way of dipping our toes into the water, I suggested that we look at the most primitive kinds of random processes, which are physical "randomizers" like coin flips, pointer spins, and dice throws. In their most primitive forms, all three kinds of devices have uniform probability distributions, meaning that any one outcome is as likely as any other.

But when we began to look at the statistics of thrown dice, we found that the distributions were no longer uniform, but trended towards a continuous, bell-shaped curve. The only requirement is the usual rule for multiple dice, that the final result is the sum of all the values showing on the various dice.
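The drift from a flat distribution toward a bell shape is easy to reproduce. The following sketch computes the exact distribution of the total for any number of fair dice by repeated convolution; the function name and the crude text plot are illustrative, not the method used to draw the earlier figures.

```python
def dice_sum_distribution(num_dice, sides=6):
    """Exact probability of each total when num_dice fair dice are summed."""
    dist = {0: 1.0}  # with zero dice, the total is 0 with certainty
    for _ in range(num_dice):
        new_dist = {}
        for total, p in dist.items():
            for face in range(1, sides + 1):
                key = total + face
                new_dist[key] = new_dist.get(key, 0.0) + p / sides
        dist = new_dist
    return dist

for n in (1, 2, 4):
    dist = dice_sum_distribution(n)
    peak = max(dist.values())
    print(f"{n} dice")
    for total in sorted(dist):
        bar = "#" * round(40 * dist[total] / peak)  # scaled so the peak is 40 wide
        print(f"  {total:2d}  {dist[total]:.4f}  {bar}")
```

One die gives a flat row of bars, two dice give a triangle, and by four dice the outline is already recognizably bell-shaped.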

After some judicious scaling, I developed **Figure 9**, which shows pretty convincingly that not only do the curves trend towards a limiting case of a continuous curve, but that the curve is, in fact, the normal distribution.

Without actually proving it, I suggested that the mathematical form for the distribution should be the one suggested by J. Willard Gibbs: "It's the simplest one we can think of." From that simple conjecture, I tweaked the distribution with a multiplying constant and a couple of constant parameters, *µ* and *σ*.

Finally, I introduced the concept of an expectation value of some function *f*(*x*). Then I specialised *f*(*x*) to be the first three powers of *x*: *x*⁰ = 1, *x*¹ = *x*, and *x*². We found that:

So the parameters introduced into the normal distribution for purposes of scaling turn out to be the mean and standard deviation.

Now that we've looked the normal distribution in the eye, are we done with it? Not by a long shot. For starters, you've probably heard terms like one-sigma, three-sigma, and even six-sigma. From the terms, you can probably guess that they relate to deviations of *σ*, 3*σ*, and 6*σ* from the mean. The implications of these results are profound, because they relate to the reliability of any process affected by random noise.
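As a small preview, the probability that a normally distributed value lands within *k* standard deviations of the mean can be written with the error function as erf(*k*/√2). The sketch below uses Python's standard `math.erf` and `math.erfc`; the function names are illustrative. These are the raw two-sided probabilities for an ideal normal distribution, with no further assumptions about any particular process.

```python
import math

def within_k_sigma(k):
    """Probability that a normal variable lies within k sigma of its mean."""
    return math.erf(k / math.sqrt(2.0))

def outside_k_sigma(k):
    """Complementary probability, computed with erfc to keep precision in the far tails."""
    return math.erfc(k / math.sqrt(2.0))

for k in (1, 3, 6):
    print(f"{k}-sigma: inside = {within_k_sigma(k):.10f}, "
          f"outside = {outside_k_sigma(k):.3e}")
```

The one-sigma band holds about 68.3% of the outcomes and the three-sigma band about 99.7%, which is why those multiples of *σ* show up so often as rules of thumb.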

Lastly, we've only looked at scalar functions *f(x)*. But in most real-world problems, the state of a system is described by multiple scalar values, which we can lump together into a *state vector*. For such cases, we need to extend the normal distribution, and in particular the variance, to vector/matrix form. In the vector form of the normal distribution, the variance *V* becomes a matrix, famously known as the *covariance matrix*. This matrix plays an utterly critical role in the Kalman filter, which is the whole point of this study.
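To make the idea of a covariance matrix a little more concrete, here is a minimal sketch that estimates one from a handful of sample state vectors. The data are made up, the two-element (position, velocity) interpretation is hypothetical, and dividing by *n* − 1 is an assumption; the discrete variance defined earlier in this series may divide by *n* instead.

```python
def covariance_matrix(samples):
    """Sample covariance matrix of a list of equal-length state vectors."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[i] for s in samples) / n for i in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        for i in range(dim):
            for j in range(dim):
                # Product of deviations from the mean, as in the scalar variance
                cov[i][j] += (s[i] - means[i]) * (s[j] - means[j])
    # Assumption: unbiased estimator, dividing by n - 1
    return [[c / (n - 1) for c in row] for row in cov]

# Hypothetical two-element state vectors, e.g. (position, velocity)
samples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
P = covariance_matrix(samples)
for row in P:
    print(row)
```

The diagonal entries are the ordinary variances of each element of the state vector, and the off-diagonal entries measure how two elements vary together, which is why the matrix is always symmetric.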