A normal distribution of data means that most of the examples in a set of
data are close to the "average," while relatively few examples tend
to one extreme or the other.
Let's say you are writing a story about nutrition. You need to look at
people's typical daily calorie consumption. Like many kinds of data, the
numbers for people's typical consumption probably will turn out to be roughly
normally distributed. That is, most people's consumption will be close to the
mean, while fewer people eat a lot more or a lot less than the mean.
When you think about it, that's just common sense. Not that many people are
getting by on a single serving of kelp and rice. Or on eight meals of steak
and milkshakes. Most people lie somewhere in between.
If you looked at normally distributed data on a graph, it would look
something like this:
The x-axis (the horizontal one) is the value in question... calories
consumed, dollars earned or crimes committed, for example. And the y-axis
(the vertical one) is the number of data points for each value on the x-axis...
in other words, the number of people who eat x calories, the number of
households that earn x dollars, or the number of cities with x
crimes committed.
Now, not all sets of data will have graphs that look this perfect. Some
will have relatively flat curves, others will be pretty steep. Sometimes the
mean will lean a little bit to one side or the other. But all normally
distributed data will have something like this same "bell curve"
shape.
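To see that shape without a plotting tool, here is a minimal sketch that prints a sideways bar chart of how many people would fall into each calorie range. The mean of 2,000 calories, the standard deviation of 400 and the group of 1,000 people are made-up numbers chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical numbers, for illustration only:
# mean of 2,000 calories, standard deviation of 400, 1,000 people.
calories = NormalDist(mu=2000, sigma=400)
people = 1000
bin_width = 400

# Bins centred on 800, 1200, ..., 3200 so the middle bin sits on the mean.
centres = list(range(800, 3201, bin_width))
counts = []
for centre in centres:
    low, high = centre - bin_width / 2, centre + bin_width / 2
    share = calories.cdf(high) - calories.cdf(low)  # fraction of people in this bin
    counts.append(round(people * share))

# One row per bin: the value on the left, one '#' per 10 people on the right.
for centre, count in zip(centres, counts):
    print(f"{centre:>5} | {'#' * (count // 10)} {count}")
```

The longest bar lands on the mean and the bars shrink evenly on both sides, which is the bell shape turned on its side.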
The standard deviation is a statistic that tells you how tightly all
the various examples are clustered around the mean in a set of data. When the
examples are pretty tightly bunched together and the bell-shaped curve is
steep, the standard deviation is small. When the examples are spread apart and
the bell curve is relatively flat, the standard deviation is relatively large.
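As a rough illustration, the sketch below compares two invented sets of daily calorie counts that share the same mean but differ in how tightly they cluster around it. The numbers are hypothetical, and Python's built-in statistics module does the arithmetic.

```python
from statistics import mean, stdev

# Hypothetical daily calorie counts, invented for illustration.
tightly_bunched = [1900, 1950, 2000, 2050, 2100]
spread_out = [1200, 1600, 2000, 2400, 2800]

for label, data in [("tightly bunched", tightly_bunched), ("spread out", spread_out)]:
    print(f"{label}: mean = {mean(data)}, standard deviation = {stdev(data):.1f}")
```

Both sets average 2,000 calories, but the second one reports a much larger standard deviation because its values sit farther from that average.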
Computing the value of a standard deviation is complicated. But let me show
you graphically what a standard deviation represents...
One standard deviation away from the mean in either direction on the
horizontal axis (the red area on the above graph) accounts for somewhere
around 68 percent of the people in this group. Two standard deviations away
from the mean (the red and green areas) account for roughly 95 percent of the
people. And three standard deviations (the red, green and blue areas) account
for about 99.7 percent of the people.
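Those percentages are rounded. For an ideal normal curve they can be checked with a few lines of code; this sketch uses NormalDist from Python's standard library, and the helper name share_within is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

Real data only approximate a perfect bell curve, so the shares you find in an actual data set will usually be close to these figures rather than identical.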
If this curve were flatter and more spread out, the standard deviation
would have to be larger in order to account for those 68 percent or so of the
people. So that's why the standard deviation can tell you how spread out the
examples in a set are from the mean.
Why is this useful? Here's an example: If you are comparing test scores for
different schools, the standard deviation will tell you how diverse the test
scores are for each school.
Let's say Springfield Elementary has a higher mean test score than
Shelbyville Elementary. Your first reaction might be to say that the kids at
Springfield are smarter.
But a bigger standard deviation for one school tells you that there are
relatively more kids at that school scoring toward one extreme or the other.
By asking a few follow-up questions you might find that, say, Springfield's
mean was skewed up because the school district sends all of the gifted
education kids to Springfield. Or that Shelbyville's scores were dragged down
because students who recently have been "mainstreamed" from special
education classes have all been sent to Shelbyville.
In this way, looking at the standard deviation can help point you in the
right direction when asking why data is the way it is.
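Here is a small sketch of that comparison. The test scores are invented for illustration; they are not real data from any school.

```python
from statistics import mean, stdev

# Hypothetical test scores, invented for illustration only.
springfield = [62, 65, 68, 70, 71, 73, 75, 96, 98, 99]  # includes a few very high scorers
shelbyville = [68, 70, 71, 72, 73, 74, 75, 76, 77, 78]

for school, scores in [("Springfield", springfield), ("Shelbyville", shelbyville)]:
    print(f"{school}: mean = {mean(scores):.1f}, standard deviation = {stdev(scores):.1f}")
```

In this made-up case Springfield's higher mean comes with a much larger standard deviation, which is the cue to ask who the high scorers are before concluding anything about the school as a whole.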
The standard deviation can also help you evaluate the worth of all those
so-called "studies" that seem to be released to the press every day.
A large standard deviation in a study that claims to show a relationship
between eating Twinkies and killing politicians, for example, might tip you
off that the study's claims aren't all that trustworthy.
Here is one formula for computing the standard deviation. A warning: this is
for math geeks only! Writers and others seeking only a basic understanding of
stats don't need to read any further in this chapter. Remember, a decent
calculator or stats program will calculate this for you...
Terms you'll need to know
x = one value in your set of data
x̄ (x-bar) = the mean (average) of all values x in your set of data
n = the number of values x in your set of data
For each value x, subtract x̄ from x, then multiply that result by itself
(otherwise known as determining the square of that value). Sum up all those
squared values. Then multiply that sum by 1/(n-1). Then take the square root
of the resulting value. That's the standard deviation of your set of data.
In symbols: standard deviation = square root of [ sum of (x - x̄)^2 / (n - 1) ].
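The steps above can be sketched in a few lines of Python. The function name standard_deviation and the sample numbers are made up for this example. Dividing by n-1 gives what statisticians call the sample standard deviation, which is also what Python's built-in statistics.stdev computes, so the two results should agree.

```python
from math import sqrt
from statistics import stdev

def standard_deviation(values):
    """Follow the recipe in the text: square the differences from the mean,
    sum them, multiply by 1/(n-1), then take the square root."""
    n = len(values)
    mean = sum(values) / n
    squared_differences = [(x - mean) ** 2 for x in values]
    return sqrt(sum(squared_differences) / (n - 1))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up numbers for illustration

print(standard_deviation(sample))  # the step-by-step version
print(stdev(sample))               # the standard library's version
```

The recipe needs at least two values, since with a single value n-1 is zero and there is nothing to divide by.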