May 28, 2012 at 07:02 AM EDT
How big data predicted Eurovision — and offended Malta
When a data scientist crunched enough numbers to predict that Sweden would win this weekend's Eurovision Song Contest, he felt fairly confident. But he didn't expect that the biggest noise would be the inaccurate prediction that Malta would do well -- something he's now apologized for.

On Saturday night millions of people all over Europe (all over the world, in fact) tuned in to the annual Eurovision Song Contest, a cheesy televisual explosion that many worship as a festival of camp, cross-border craziness. In a three-and-a-half-hour live broadcast from Azerbaijan (yes, really), viewers from 42 countries listened to 26 songs and voted for the one they liked the most.

Amid all that, you might think that guessing the winner would be hard. But one man predicted the result… sort of.

Meet Martin O’Leary, a glaciologist and data nerd who works at the University of Michigan. O’Leary, who calls himself a “recovering mathematician,” decided to use statistical analysis on Eurovision to try and understand which country would win:

Sweden’s going to win, unless it’s Malta, or maybe somebody else. If you average together the taste in pop music of all of Europe, you get a Hungarian. Don’t trust the scores on Saturday night, they’re just toying with your emotions.

And guess what? Sweden won!

You can read O’Leary’s entire series of posts to understand how he arrived at that conclusion, but here’s the quick version.

He did it by performing a Bayesian analysis on a wide range of previous Eurovision results, building a few important factors into his model. First, the recognition that while Eurovision is a song competition, the results are not really based on the quality of the song, although it can play a part. Then there’s the fact that there are semi-finals (held to whittle the number of contestants down) that allow some songs to be tested in public.

And then, most importantly, there’s the recognition that Eurovision is heavily influenced by transnational politics: which countries like which other countries plays a big part in voting. Entrants from the Balkans, for example, tend to trade votes with each other. Greece nearly always awards maximum points to Cyprus and vice versa. Big European powers like the U.K., France and Germany perform less well than smaller countries with lots of positive sentiment toward them.
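To make the idea of "which countries like which other countries" concrete, here is a minimal sketch of one way a Bayesian estimate of voting bias could work. It is not O’Leary’s actual model: the function names, the prior and noise variances, and the voting history are all invented for illustration. It uses a simple normal-normal update, which pulls each country pair's estimated bias toward zero when there is little evidence.

```python
from collections import defaultdict


def posterior_mean(observations, prior_mean=0.0, prior_var=4.0, noise_var=9.0):
    """Normal-normal conjugate update for an unknown voting bias.

    prior_var and noise_var are assumed values chosen for illustration.
    """
    n = len(observations)
    if n == 0:
        return prior_mean
    precision = 1.0 / prior_var + n / noise_var
    weighted_sum = prior_mean / prior_var + sum(observations) / noise_var
    return weighted_sum / precision


# Hypothetical history: (voter, performer, points given above the
# performer's average score that year). These numbers are made up.
history = [
    ("Greece", "Cyprus", 6.0),
    ("Greece", "Cyprus", 7.0),
    ("Greece", "Cyprus", 5.0),
    ("Greece", "Sweden", 0.0),
]


def affinities(records):
    """Estimate a bias for every (voter, performer) pair seen in records."""
    grouped = defaultdict(list)
    for voter, performer, excess in records:
        grouped[(voter, performer)].append(excess)
    return {pair: posterior_mean(obs) for pair, obs in grouped.items()}


if __name__ == "__main__":
    for pair, bias in sorted(affinities(history).items()):
        print(pair, round(bias, 2))
```

With only three observations, the Greece-to-Cyprus estimate lands well below the raw average of six extra points, because the prior still carries weight. A fuller model would add terms for song quality and semi-final performance, which this sketch leaves out.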

But while O’Leary’s number-crunching enabled him to predict Sweden’s victory — and claim a victory for data modeling — it wasn’t infallible.

In particular, his prediction that Malta would be in the mix seems to have caused some consternation. His guess was so exciting to the Maltese that it even made the newspapers, but in the end the country’s entry came in a measly 21st out of 26.

This was clearly upsetting to the Maltese, so he issued an apology:

This prediction caused quite a stir in Malta, with a story in the Times of Malta and over 16,000 pageviews from Malta on Saturday alone. Many took this as good evidence that Malta were going to do well in the contest, and some people were rather annoyed with me when they did not.

I’d like to apologise if I misled anyone. I didn’t expect anyone to take the model predictions particularly seriously, and if I had known, I would have included some more caveats and explanations of exactly what the model was predicting. Instead, I was fairly loose and jokey about the model results, and didn’t really talk about what they meant in real terms. Sorry, guys.

But will they forgive him?
