There has been a plethora of op-ed pieces lately, including important

pieces in both the Wall Street Journal and the New York Times, on the

believability -- or lack thereof -- of recent presidential polls. The

reason for the plethora is that the projections made by different

pollsters, made with high degrees of "confidence," have frequently been

contradictory and, in the case of CNN-Gallup, have fluctuated somewhat

wildly from day to day. The op-ed pieces all deplore the situation, but

none of them has suggested what the problem is with the polls. The

problem is that a lot of pollsters purport to be sampling, when in fact

they are only polling.

What's the difference?

Well, you can't project anything on the basis of one poll, but you

sometimes can project something on the basis of one sample.

Suppose that the pollster calls, randomly, 1,000 potential voters and

merely asks them one question: "Are you going to vote for Al Gore?"

Suppose 480 of these 1,000 potential voters say they are going to

vote for Al. The pollster now has one "data point" for Gore. He has

conducted one poll and he knows that 480 voters said they will vote for

Gore. But that's all he knows. If he were to call another 1,000

potential voters, randomly, he has no basis for projecting how many of

that second group will say they will vote for Gore. If he calls a

second group and 440 of them say they are going to vote for Gore, he

shouldn't be surprised. When a pollster calls 1,000 voters randomly,

480 for Gore is not all that different from 440 for Gore. After all, the

second group might have said 100, or 900.

Needless to say, the pollster had no basis for projecting, after the

first poll, that 48 million Americans (almost a hundred million

Americans voted in the last presidential election) would vote for Gore,

and he would certainly have no basis after the second poll to proclaim

that 44 million Americans would vote for Gore and that 4 million

Americans had changed their minds since the first poll. But that is what

pollsters have been doing, and that is why there has been the plethora

of op-ed pieces like this.

You see, the pollster hasn't sampled anything when he asks 1,000

voters, randomly selected, if they are going to vote for Gore, "yes" or

"no." Whether it's 480 "yes" or zero or 1,000, after asking, the

pollster now has one and only one data point for Gore, and he can't do

statistical analysis on one data point. Or two. It takes hundreds,

maybe thousands of data points to do that type of statistical analysis.

Suppose he did take 1,000 polls, of 1,000 voters each, and plotted

the 1,000 data points he got. Suppose that in about two-thirds (or in

667) of the polls he got numbers for Gore voters between 440 and 480,

with the average being 460. That is, he gets a sort of bell-shaped

curve when he plots the results of his 1,000 polls. Now the pollster is

in a position to turn to the formulas in the back of his textbook and

project the percentage of voters who will vote for Gore in the general

election with some calculated "degree of confidence." In case you

hadn't guessed, with these hypothetical numbers, obtained from 1,000 hypothetical

polls, Gore will probably get 46 percent of the vote and is unlikely to

get less than 44 percent or more than 48 percent. (Of course, Clinton

won last time with about 49 percent of the vote, and someone may win

this time with only 48 percent of the popular vote.)
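
For readers who want to see the arithmetic, here is a minimal Python sketch of that thousand-poll exercise. The poll results are simulated, not real: the function name summarize_polls, the random seed, and the assumption that the counts scatter around 460 with a spread of about 20 are all invented for illustration.

```python
import random
import statistics


def summarize_polls(gore_counts, poll_size=1000):
    """Return (mean %, standard deviation %) over many poll results."""
    percents = [100.0 * count / poll_size for count in gore_counts]
    return statistics.mean(percents), statistics.stdev(percents)


# Assumption: 1,000 hypothetical polls whose Gore counts scatter around 460
# with a spread of about 20, mimicking the numbers in the text.
rng = random.Random(0)
simulated_counts = [round(rng.gauss(460, 20)) for _ in range(1000)]

mean_pct, sd_pct = summarize_polls(simulated_counts)
low, high = mean_pct - sd_pct, mean_pct + sd_pct
# Count how many of the polls land within one standard deviation of the mean.
inside = sum(low <= 100.0 * c / 1000 <= high for c in simulated_counts)

print(f"average Gore share: {mean_pct:.1f}%")
print(f"one standard deviation either side: {low:.1f}% to {high:.1f}%")
print(f"polls inside that range: {inside} of {len(simulated_counts)}")
```

With real data, the list of simulated counts would simply be replaced by the 1,000 actual poll results.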

But pollsters never take those 1,000 polls. They never get the data

they need to do that type of statistical analysis. They only take one

poll. For Gore they get one data point, one number, one percentage.

For Bush they get one data point, one number, one percentage.

Similarly, one data point for Nader and one data point for Buchanan.

The pollsters claim they "sample" the electorate with their one

poll. They don't.

If they want to use statistical sampling theory on voters, they have

to actually sample something that is characteristic of the voter but

varies, statistically, across the electorate, and in conducting their

polls they don't do that. To illustrate what they would have to do in

order to sample the electorate, they could, for example, give all 480 of

the Gore voters identified in the first polling group the standard "IQ"

test.

We know that an IQ score -- irrespective of whether you believe that

the score has anything to do with intelligence or not -- is essentially

an individual characteristic and essentially doesn't change. For the

general population, a plot of the number of people attaining each IQ

score against the score itself will look somewhat bell-shaped. The center

of the bell-curve will lie at an IQ score of 100 and the IQ scores of

two-thirds of the population will lie between an IQ score of 85 and an

IQ score of 115. (For a bell-curve the mean is the median is the

mode. The average IQ is 100, and the so-called "standard deviation from

the mean" for the IQ bell-curve for the general population is about

15.) One-sixth of the general population will have an IQ score greater

than 115 and one-sixth will have an IQ score less than 85.
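
Those fractions can be checked against the textbook bell-curve. The short Python sketch below assumes a perfectly normal curve with a mean of 100 and a standard deviation of 15, which real IQ data only approximate; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumption: an idealized normal curve, mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

within_one_sd = iq.cdf(115) - iq.cdf(85)  # share scoring between 85 and 115
above_115 = 1 - iq.cdf(115)               # share scoring above 115
below_85 = iq.cdf(85)                     # share scoring below 85

print(f"between 85 and 115: {within_one_sd:.1%}")
print(f"above 115: {above_115:.1%}")
print(f"below 85: {below_85:.1%}")
```

The results come out close to the round figures of two-thirds and one-sixth used above.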

Now, it is conceivable that the distribution of IQ scores for Gore

voters may not be the same as for the general population. How can we

find out? Well, our pollster has just sampled the Gore-voter IQ curve.

With almost 500 data points for his sample Gore-voter bell-curve, he can

now use the formulas in the back of his book to calculate a mean and a

standard deviation for the sample Gore-voter IQ curve and can then, on

the basis of that one sample, with some degree of confidence, project

what the total electorate Gore-voter IQ curve will look like.

If Bush has about the same number of votes in the poll, the pollster

can -- after giving the Bush voters the standard IQ test -- also

project, with about the same degree of confidence, what the total

electorate Bush-voter IQ curve will look like.

But if Buchanan only has 20 or so votes in the 1,000-vote poll, the

pollster can't project, with the same degree of confidence, what the

total electorate Buchanan-voter IQ curve will look like. He hasn't got a

big enough sample.
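
One common way to put a number on "big enough" is the standard error of the mean: the standard deviation divided by the square root of the sample size. The sketch below assumes, purely for illustration, that each candidate's voters have the same IQ spread of 15 as the general population; the function name standard_error is hypothetical.

```python
import math


def standard_error(std_dev, sample_size):
    """Standard error of a sample mean: std_dev / sqrt(sample_size)."""
    return std_dev / math.sqrt(sample_size)


ASSUMED_SD = 15.0  # assumption: same IQ spread as the general population

for candidate, n in [("Gore", 480), ("Buchanan", 20)]:
    se = standard_error(ASSUMED_SD, n)
    # 1.96 standard errors either side gives an approximate 95% interval.
    print(f"{candidate}: n={n}, mean IQ pinned down to about +/-{1.96 * se:.1f} points")
```

The smaller the subgroup, the wider the band around its projected average.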

Now, the pollster still has no basis whatever for projecting how many

Gore voters there will be on Election Day, but he has a fair idea -- as

if anyone cared -- of what their IQ scores are.

As previously noted in these columns, in order to sample something,

there has to be something to sample. Something that is

characteristically constant for the individual, but also statistically

varies across the "population." You can't sample Gore voters on who

they're going to vote for; all of them are going to vote for Gore. You

can't sample that characteristic. However, there is variability in their

characteristic IQ scores, and in age, height, weight, income, etc.

You can sample all those things, and on the basis of one sample,

sometimes make projections about all Gore voters.

We didn't have to sample IQ. We used it because it gives a nice

bell-shaped curve. We could have sampled their height. But the curve

obtained by plotting the number of Gore voters versus their height would

probably not turn out very bell-shaped. In fact, it would likely look

like a two-humped camel because women have a different height

distribution than men. It would be better, then, to make one height

profile for the 240 or so female Gore voters and a different height

profile for the 240 or so male Gore voters. You get the idea. On the

basis of one sample of 240 male (female) Gore voters, the pollster could

project a height profile for all male (female) Gore voters. Those voter

profiles may, or may not, be significantly different for Nader voters.

The pollster could even have sampled the 480 Gore voters' weight, but

the weight profile of Gore voters would have been even more difficult to

make projections about on the basis of one sample. However, the weight

profile projected on the basis of one sample would have some advantage

in that it could be "reality" checked. All that the pollster has to do

is to see to it that, when the voter pulls the lever in the voting booth

for Gore, he or she is simultaneously weighed and his or her weight

is printed on the Gore ballot.

The point of all this is that the pollsters are presently claiming to

be sampling something when they aren't, really. Now if they asked each

group of voters how tall they thought the emperor of Japan was, they

could use the formulas in the back of their book to project separate

Gore-voter emperor-height and Bush-voter emperor-height opinion

profiles. Then, for a reality check, the pollster would have to get

that question on the ballot, which might be a problem.

In summary, when you already know that all the voters in your

subgroup are going to vote for Buchanan, for example, you can't sample

their vote. There is no formula in the back of the book that will allow

the pollster to project on the basis of that one poll how many Buchanan

voters there will be out of 100 million voters on Election Day.

If proof were needed, CNN-Gallup has just conclusively demonstrated,

in back-to-back polls, that pollsters can't project how many votes any

candidate will get on the basis of a single poll. If the pollsters want

to make those kinds of projections, they will just have to take hundreds,

if not thousands, of polls and then use the appropriate statistical

analysis formulas they find in the back of their textbooks.

Or, if they can find someone who will pay them big bucks to take the

necessary sample, and they can get it on the ballot, they can project on

the basis of that one sample how tall the average Gore or Bush voter

will say, on Election Day, the emperor of Japan is. Dubya probably

wouldn't pay for the sample, but Global Al might.