There has been a plethora of op-ed pieces lately, including important
pieces in both the Wall Street Journal and the New York Times, on the
believability — or lack thereof — of recent presidential polls. The
reason for the plethora is that the projections made by different
pollsters, with high degrees of “confidence,” have frequently been
contradictory and, in the case of CNN-Gallup, have fluctuated
wildly from day to day. The op-ed pieces all deplore the situation, but
none of them have suggested what the problem is with the polls. The
problem is that a lot of pollsters purport to be sampling, when in fact
they are only polling.
What’s the difference?
Well, you can’t project anything on the basis of one poll, but you
sometimes can project something on the basis of one sample.
Suppose that the pollster calls, randomly, 1,000 potential voters and
merely asks them one question: “Are you going to vote for Al Gore?”
Suppose 480 of these 1,000 potential voters say they are going to
vote for Al. The pollster now has one “data point” for Gore. He has
conducted one poll and he knows that 480 voters said they will vote for
Gore. But that’s all he knows. If he were to call another 1,000
potential voters, randomly, he has no basis for projecting how many of
that second group will say they will vote for Gore. If he calls a
second group and 440 of them say they are going to vote for Gore, he
shouldn’t be surprised. When a pollster calls 1,000 voters randomly,
480 for Gore is not all that different from 440 for Gore. After all, the
second group might have said 100, or 900.
Needless to say, the pollster had no basis for projecting, after the
first poll, that 48 million Americans (almost a hundred million
Americans voted in the last presidential election) would vote for Gore,
and he would certainly have no basis after the second poll to proclaim
that 44 million Americans would vote for Gore and that 4 million
Americans had changed their minds since the first poll. But that is what
pollsters have been doing, and that is why there has been the plethora
of op-ed pieces like this.
You see, the pollster hasn’t sampled anything when he asks 1,000
voters, randomly selected, if they are going to vote for Gore, “yes” or
“no.” Whether it’s 480 “yes” or zero or 1,000, after asking, the
pollster now has one and only one data point for Gore, and he can’t do
statistical analysis on one data point. Or two. It takes hundreds,
maybe thousands of data points to do that type of statistical analysis.
Suppose he did take 1,000 polls of 1,000 voters each and plotted
the 1,000 data points he got. Suppose that in about two-thirds (or in
667) of the polls he got numbers for Gore voters between 440 and 480,
with the average being 460. That is, he gets a sort of bell-shaped
curve when he plots the results of his 1,000 polls. Now the pollster is
in a position to turn to the formulas in the back of his textbook and
project the percentage of voters who will vote for Gore in the general
election with some calculated “degree of confidence.” In case you
guessed, with these hypothetical numbers, gotten from 1,000 hypothetical
polls, Gore will probably get 46 percent of the vote and is unlikely to
get less than 44 percent or more than 48 percent. (Of course, Clinton
won last time with about 48 percent of the vote, and someone may win
this time with only 48 percent of the popular vote.)
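For readers who want to see the arithmetic, here is a minimal sketch of that 1,000-poll exercise. The poll results are made up: they are drawn from a bell curve centered on 460 with a spread of 20, to match the hypothetical numbers above, and the function name `summarize_polls` and the seed are assumptions for illustration only.

```python
import random
import statistics


def summarize_polls(counts, voters_per_poll=1000):
    """Return (mean, low, high) as percentages, where low and high are
    one standard deviation below and above the mean of the poll counts."""
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)
    scale = 100.0 / voters_per_poll
    return mean * scale, (mean - sd) * scale, (mean + sd) * scale


# Hypothetical data, not real polls: 1,000 poll results, each the number
# of "Gore" answers out of 1,000 voters, drawn from a bell curve with
# center 460 and spread 20 (the numbers assumed in the text).
rng = random.Random(2000)
counts = [round(rng.gauss(460, 20)) for _ in range(1000)]

mean_pct, low_pct, high_pct = summarize_polls(counts)

# Fraction of the polls that landed between 440 and 480.
share_inside = sum(440 <= c <= 480 for c in counts) / len(counts)

print(f"projected share: about {mean_pct:.1f} percent")
print(f"one-deviation range: {low_pct:.1f} to {high_pct:.1f} percent")
print(f"polls between 440 and 480: {share_inside:.0%}")
```

With data like these, the mean comes out near 46 percent, the one-deviation range near 44 to 48 percent, and roughly two-thirds of the polls fall between 440 and 480.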
But pollsters never take those 1,000 polls. They never get the data
they need to do that type of statistical analysis. They only take one
poll. For Gore they get one data point, one number, one percentage.
For Bush they get one data point, one number, one percentage.
Similarly, one data point for Nader and one data point for Buchanan.
The pollsters claim they “sample” the electorate with their one
poll. They don’t.
If they want to use statistical sampling theory on voters, they have
to actually sample something that is characteristic of the voter but
varies, statistically, across the electorate, and in conducting their
polls they don’t do that. To illustrate what they would have to do in
order to sample the electorate, they could, for example, give all 480 of
the Gore voters identified in the first polling group the standard “IQ”
test.
We know that an IQ score — irrespective of whether you believe that
the score has anything to do with intelligence or not — is essentially
an individual characteristic and essentially doesn’t change. For the
general population, a plot of the IQ scores versus the number scoring
those characteristic scores will look somewhat bell-shaped. The center
of the bell-curve will lie at an IQ score of 100 and the IQ scores of
two-thirds of the population will lie between an IQ score of 85 and an
IQ score of 115. (For a bell-curve the mean, the median and the mode
coincide. The average IQ is 100, and the so-called “standard deviation from
the mean” for the IQ bell-curve for the general population is about
15.) About one-sixth of the general population will have an IQ score
greater than 115 and about one-sixth will have an IQ score less than 85.
Now, it is conceivable that the distribution of IQ scores for Gore
voters may not be the same as for the general population. How can we
find out? Well, our pollster has just sampled the Gore-voter IQ curve.
With almost 500 data points for his sample Gore-voter bell-curve, he can
now use the formulas in the back of his book to calculate a mean and a
standard deviation for the sample Gore-voter IQ curve and can then, on
the basis of that one sample, with some degree of confidence, project
what the total electorate Gore-voter IQ curve will look like.
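The "formulas in the back of the book" for this step can be sketched as follows. The 480 scores are invented, drawn from a bell curve with mean 100 and spread 15 purely as a stand-in for real test results, and `describe_sample` is a hypothetical helper name.

```python
import math
import random
import statistics


def describe_sample(scores):
    """Return (mean, standard deviation, standard error of the mean)."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)   # sample standard deviation
    sem = sd / math.sqrt(n)         # how far the sample mean is likely to be off
    return mean, sd, sem


# Hypothetical data: IQ scores for the 480 Gore voters in the first poll.
# Assumed here to follow the general-population curve (mean 100, spread 15).
rng = random.Random(480)
gore_iq = [rng.gauss(100, 15) for _ in range(480)]

mean, sd, sem = describe_sample(gore_iq)
print(f"sample mean {mean:.1f}, standard deviation {sd:.1f}")
print(f"the mean for all Gore voters is probably within {2 * sem:.1f} points of that")
```

The last line uses the common rule of thumb of two standard errors on either side of the sample mean; that choice is an assumption of this sketch, not something the pollster is obliged to use.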
If Bush has about the same number of votes in the poll, the pollster
can — after giving the Bush voters the standard IQ test — also
project, with about the same degree of confidence, what the total
electorate Bush-voter IQ curve will look like.
But if Buchanan only has 20 or so votes in the 1,000-vote poll, the
pollster can’t project, with the same degree of confidence, what the
total electorate Buchanan-voter IQ curve will look like. He hasn’t got a
big enough sample.
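A rough way to see why 20 voters are not enough is to compare how far off the sample average is likely to be for each group. This sketch assumes, only for illustration, that both groups have the general-population spread of 15 IQ points; `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(sd, n):
    """Typical error in a sample mean: spread divided by the square root of n."""
    return sd / math.sqrt(n)


# Assumed spread of 15 points for both groups (the general-population figure).
se_gore = standard_error(15, 480)      # about 480 Gore voters in the poll
se_buchanan = standard_error(15, 20)   # about 20 Buchanan voters in the poll

print(f"Gore-voter average is good to about {se_gore:.2f} points")
print(f"Buchanan-voter average is good to about {se_buchanan:.2f} points")
print(f"the Buchanan figure is {se_buchanan / se_gore:.1f} times fuzzier")
```

Because the error shrinks only with the square root of the sample size, a sample 24 times smaller gives an average that is nearly five times less certain.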
Now, the pollster still has no basis whatever for projecting how many
Gore voters there will be on Election Day, but he has a fair idea — as
if anyone cared — of what their IQ scores are.
As previously noted in these columns, in order to sample something,
there has to be something to sample. Something that is
characteristically constant for the individual, but also statistically
varies across the “population.” You can’t sample Gore voters on who
they’re going to vote for; all of them are going to vote for Gore. You
can’t sample that characteristic. However, there is variability in their
characteristic IQ scores, and in their age, height, weight, income, etc.
You can sample all those things, and on the basis of one sample,
sometimes make projections about all Gore voters.
We didn’t have to sample IQ. We used it because it gives a nice
bell-shaped curve. We could have sampled their height. But the curve
obtained by plotting the number of Gore voters versus their height would
probably not turn out very bell-shaped. In fact, it would likely look
like a two-humped camel because women have a different height
distribution than men. It would be better, then, to make one height
profile for the 240 or so female Gore voters and a different height
profile for the 240 or so male Gore voters. You get the idea. On the
basis of one sample of 240 male (female) Gore voters, the pollster could
project a height profile for all male (female) Gore voters. Those voter
profiles may, or may not, be significantly different for Nader voters.
The pollster could even have sampled the 480 Gore voters’ weight, but
the weight profile of Gore voters would have been even more difficult to
make projections about on the basis of one sample. However, the weight
profile projected on the basis of one sample would have some advantage
in that it could be “reality” checked. All that the pollster has to do
is to see to it that when the voter pulls the lever in the voting booth
for Gore that he or she is simultaneously weighed and his or her weight
is printed on the Gore ballot.
The point of all this is that the pollsters are presently claiming to
be sampling something when they aren’t, really. Now if they asked each
group of voters how tall they thought the emperor of Japan was, they
could use the formulas in the back of their book to project separate
Gore-voter emperor-height and Bush-voter emperor-height opinion
profiles. Then, for a reality check, the pollster would have to get
that question on the ballot, which might be a problem.
In summary, when you already know that all the voters in your
subgroup are going to vote for Buchanan, for example, you can’t sample
their vote. There is no formula in the back of the book that will allow
the pollster to project on the basis of that one poll how many Buchanan
voters there will be out of 100 million voters on Election Day.
If proof were needed, CNN-Gallup has just conclusively demonstrated,
in back-to-back polls, that pollsters can’t project how many votes any
candidate will get on the basis of a single poll. If the pollsters want
to make those kinds of projections, they will just have to take hundreds,
if not thousands, of polls and then use the appropriate statistical
analysis formulas they find in the back of their textbooks.
Or, if they can find someone who will pay them big bucks to take the
necessary sample, and they can get it on the ballot, they can project on
the basis of that one sample how tall the average Gore or Bush voter
will say, on Election Day, the emperor of Japan is. Dubya probably
wouldn’t pay for the sample, but Global Al might.
WND Staff