Monthly Archives: February 2013

Social Media aren’t votes. Primary election and predictive models

The 2012 centre-left primary election was the first major Italian event of collective political participation to find a valuable sounding board in SNSs. Thousands of tweets and Facebook posts turned the contest for the coalition’s leadership into a great piece of storytelling, in which politicians and civil society, thanks to SNSs, competed on equal terms to give voice to their demands. Facebook, LinkedIn, Twitter and other social media tools have changed the way many people communicate, making the mass political arena denser, more complex and more participatory. One sign of this is the volume of comments on Twitter during the television debate among the five contenders, aired on SkyTg24: according to Blogmeter, involvement in this primary election reached record highs in Italy, with 127,426 tweets on the subject from 30 minutes before the start of the programme to 30 minutes after its end.

While traditional and social media undoubtedly both play a central role in shaping the political sphere, the use of big data extracted from Facebook, Twitter or Google to build a predictive model for an election is more controversial. During the campaign for the Republican presidential nomination, many scholars argued that predictive analysis has theoretical limits that cannot be overcome simply with more advanced technology. A group of researchers at München University, on the occasion of the German federal election, conducted a content analysis of over 100,000 messages referring to either a political party or a politician and showed that Twitter is widely used for political deliberation. Even so, statistical algorithms applied to big data have yet to become an accurate tool for predicting election outcomes (for a literature review, see also this recent article about the United States presidential primary).

As Michael Wu explains, three things are required to validate any statistical model or algorithm that aims to be predictive, whether of Apple’s stock price or of a politician’s share of the vote: 1. a model or algorithm that computes some predicted outcome (e.g. the weight of the traffic generated by a politician’s followers); 2. an independent measure of the outcome that the model is trying to predict (e.g. opinion poll results); 3. a measure that compares and quantifies how closely the predicted outcome matches the independently measured outcome.

The key requirement is the second: having a measure that is truly independent of the results obtained by social media monitoring tools. This may seem obvious, but the risk is falling into the fallacy of circular reasoning, that is, building a model from likes and retweets and then extending it to the whole sample with no external check. Hence the need for a variable to compare against: in our case, the opinion polls.
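The three requirements above can be sketched in a few lines of Python. This is only an illustration: the numbers are hypothetical, and the choice of mean absolute error as the comparison measure is our assumption, not something the cited framework prescribes.

```python
from statistics import mean


def mean_absolute_error(predicted, measured):
    """Average absolute gap between predicted and independently measured values."""
    if len(predicted) != len(measured):
        raise ValueError("both series must have the same length")
    return mean(abs(p - m) for p, m in zip(predicted, measured))


# Hypothetical numbers, for illustration only (not data from this study).
predicted_share = [40.0, 35.0, 15.0]  # requirement 1: the model's output, in percent
measured_share = [42.0, 34.0, 16.0]   # requirement 2: an independent measure, in percent

# Requirement 3: one number that quantifies how closely the two series match.
error = mean_absolute_error(predicted_share, measured_share)
print(f"mean absolute error: {error:.2f} percentage points")
```

The point of the sketch is that `measured_share` must come from a source other than the social media metrics that feed the model; otherwise the comparison is circular.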

Method

Influenced by the attractive and effective method Nate Silver used to predict the outcomes of the United States presidential primary, we decided to adopt a different perspective for measuring the impact of social media on candidates’ popularity. Data extracted from social media are often used as proxies for the electoral success of a politician or a party. Sometimes the research is based on quantitative analysis; at other times, with the aid of sentiment analysis, it focuses on the subjective opinions, emotions and affects expressed on the social web in the form of news, reviews, blogs, chats and even tweets. Our approach is different: we use the data extracted from social media to compare and correct the results of the opinion polls conducted by pollsters and media agencies to ask voters about their voting intentions. For this purpose, we collected 30 polls carried out from October 1st to November 23rd, 2012. The poll dates refer to when each poll was actually carried out, not to when it was published. When several polls were carried out on the same day, we used their average value. The data are openly available on the site sondaggipoliticoelettorali.it.
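The same-day averaging step can be sketched as follows. The dates and percentages are hypothetical placeholders, and the function name `daily_average` is ours; the real figures are the ones published on sondaggipoliticoelettorali.it.

```python
from collections import defaultdict
from statistics import mean


def daily_average(polls):
    """Collapse polls carried out on the same day into one averaged value per day."""
    by_day = defaultdict(list)
    for day, share in polls:
        by_day[day].append(share)
    # Sort by date string (ISO format sorts chronologically).
    return {day: mean(shares) for day, shares in sorted(by_day.items())}


# Hypothetical (fieldwork date, share in percent) pairs for one candidate.
polls = [("2012-10-01", 40.0), ("2012-10-01", 42.0), ("2012-10-03", 39.0)]
averaged = daily_average(polls)
print(averaged)
```

Keying on the fieldwork date rather than the publication date mirrors the choice described above.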

Thanks to a collaboration with Blogmeter, we also collected, for the main candidates (Pierluigi Bersani, Matteo Renzi and Nichi Vendola) and for the same period, all the main metrics of their official presence on Twitter and Facebook (total number of fans/followers, new fans/followers per day, Facebook People Talking About, Twitter Mentions, Facebook and Twitter engagement).

After a first set of analyses using all the social media indicators, we built a predictive model for each candidate based on correctives computed from Facebook People Talking About and Twitter Mentions. The correctives are estimated from the temporal correlation between the trend in consensus measured by the opinion polls and the trend of the metrics we considered. The model, based on multiple linear regression (MLR), lets us estimate the corrective values needed to obtain, starting from the trend extracted from the polls, a forecast of the electoral result for each candidate.
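To make the idea of a multiple linear regression concrete, here is a minimal sketch in plain Python. It is not our actual model: the exact specification is not reproduced here, the data are synthetic, and we simply assume the polled consensus is regressed on two predictors standing in for People Talking About and Mentions. The helper names (`fit_mlr`, `predict`, `solve_linear_system`) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_mlr(predictors, target):
    """Ordinary least squares: target ~ intercept + sum(coef_i * predictor_i)."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend the intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, predictor_row):
    """Apply the fitted intercept and coefficients to one row of predictors."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], predictor_row))


# Synthetic data: (talking_about, mentions) in thousands, consensus in percent.
# Generated as 30 + 0.5 * talking_about - 0.2 * mentions, so the fit is exact.
metrics = [(10, 5), (12, 9), (15, 6), (11, 12), (18, 10)]
consensus = [34.0, 34.2, 36.3, 33.1, 37.0]

coefficients = fit_mlr(metrics, consensus)
forecast = predict(coefficients, (14, 8))
print(coefficients, forecast)
```

A separate fit per candidate, as in our approach, would simply mean calling `fit_mlr` once for each candidate's own series. Note that nothing forces a coefficient to be positive: in the synthetic data the second predictor lowers the forecast.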

As the graphs show, in the case of Pierluigi Bersani the gap between our forecast and the electoral result is 4.16 percentage points, whereas the gap for the average of the opinion poll trend over the period from October 1st to November 23rd, 2012 is 5.41 percentage points. For Matteo Renzi, the gap between our forecast and the electoral result is 2.85 percentage points, whereas the gap for the poll average over the same period is 3.33 percentage points. Finally, for Nichi Vendola, our gap is 0.09, whereas the gap between the election result and the poll average is 0.12.
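The comparison above is simple arithmetic, spelled out here for convenience. The figures are the gaps reported in the text; the dictionary layout and the function name `improvement` are just illustrative choices.

```python
# Gaps between each estimate and the actual result, as reported in the text.
gaps = {
    "Bersani": {"model": 4.16, "poll_average": 5.41},
    "Renzi": {"model": 2.85, "poll_average": 3.33},
    "Vendola": {"model": 0.09, "poll_average": 0.12},
}


def improvement(gap):
    """How much smaller the model's gap is than the poll-average gap."""
    return round(gap["poll_average"] - gap["model"], 2)


improvements = {name: improvement(gap) for name, gap in gaps.items()}
for name, value in improvements.items():
    print(f"{name}: model is closer by {value}")
```

For all three candidates the difference is positive (1.25, 0.48 and 0.03 respectively), meaning the corrected forecast lands closer to the result than the plain poll average.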

We can infer that our model predicts the result more closely than the historical trend of opinion polls over a period of about two months, most visibly in the case of Nichi Vendola. This predictive ability stems from the fact that our model is calibrated on metrics that combine engagement indicators from both Facebook and Twitter, and that it provides a separate analysis for each candidate. Looking at the data separately for each politician in the primary election, we find, for instance, that the total volume of mentions is not always positive for the candidate: it takes on a different meaning depending on the politician and, probably, on the content. Our data show, for instance, that Vendola’s People Talking About figures led to a negative coefficient, which most likely corresponds to a shift in electoral preferences.

This research shows how difficult it is to build a predictive model that does not take into account politicians’ different communication strategies: every politician has their own discursive style, ways of engaging in dialogue and ability to handle multiple digital platforms. Every candidate also has a preferred target audience, whose socio-demographic or cultural characteristics may differ from those of the others. We should consider not only that the reality of the mass media is not statistically representative of the entire population, but also that each candidate’s specific behaviour on the most popular SNSs, such as Facebook and Twitter, is too hard to generalize. All of this leads us to rethink the relationship between predictive models and social media, especially when dealing with social phenomena as complex as politics and the building of a public sphere, in which offline and online experiences easily overlap.

The predictive reality and the social media

In this context, we should consider that SNSs are not a representative sample of either the entire population or the voters. This should be a commonplace, but it is obscured by the overload of news and studies on every political event (a public debate, for example) that claim to measure political engagement from the number of tweets or comments on the Facebook pages of politicians, political parties or political television shows.

In fact, researchers are beginning to understand that specific processes of self-selection lead to a more politically powerful use of the net by citizens who were already involved or active (Norris 2001), or who have the cultural and technological resources needed to practise active citizenship (Bentivegna 2009). There is no simple convergence between voters’ attitudes and their offline and online political practices, and certainly none that can be relied on for predictive analysis. It is not the number of a politician’s followers, or the level of engagement on a Facebook page, that lets us predict electoral choices. In the 2012 centre-left primary election, Nichi Vendola had the most fans online (about 250,000 on Twitter and more than 500,000 on Facebook) and Matteo Renzi had the highest levels of engagement (see Vincenzo Cosenza’s data analysis), whereas Pierluigi Bersani, the candidate who won the most votes, had lower figures.

This doesn’t mean that there is a clear divide between the online and offline spheres. The growing presence of the net in people’s daily lives broadens political involvement into a continuum that includes real-life experience, media consumption and practices of online engagement (Dahlgren 2009). But this does not concern all voters, because not everyone is actually active online.

Networked public spheres and the interpretation of reality

We need to understand how to think about this dual form of representativeness and what kind of relationship exists between the two. This question could open the way to significant research in predictive analytics on social networks. The value of likes on political candidates’ Facebook pages (Giglietto 2012) is connected to politicians’ ability to use SNSs strategically and to hold a conversation with users. Nevertheless, this ability does not necessarily help predict the election outcome, because we must always take into account the weight of the mass media in shaping public opinion. For this reason, our research intends to rethink the use of social media in a predictive key, not as the only source of data but as a reality to be connected with a more traditional source of data such as opinion polls.

We need to build a relationship between the different public spheres in which opinion is expressed: public opinion, as in the mainstream media, and opinion in public, as in social media. For this reason, we prefer to talk about networked public spheres (Boccia Artieri 2012). In that regard, social media can be considered conversational correctives of opinion polls. This view establishes a connection between the actual choice among candidates and the strategic use of SNSs as political communication tools aimed at increasing visibility and engagement (see the use of hashtags to gather people around an issue). It is a first corrective idea, and it has met with a positive response in our model, which links the opinion polls to each candidate’s levels of engagement: Mentions on Twitter and Talking About on Facebook. To improve our model, we should add sentiment analysis, a method that applies quantitative tools to measure opinions, feelings and emotions in online text. There are many remarkable examples of this, but the question of irony remains unsolved, especially in the political field: how should it treat, for example, the content generated on Twitter and Facebook by Marxisti per Tabacci? Moreover, we should consider that other elements may broaden the public sphere, such as news websites and blogs, where people generate and share opinions and use engagement tools (shares, likes, comments). In that sense, it is notable, for instance, that Pierluigi Bersani is the candidate who obtained the highest number of mentions on news websites and blogs. This research on the centre-left primary election in Italy is a first step towards an integrated model that aims to read a complex reality in which online and offline must be considered as a continuum, in a predictive key as well.

Research team
Giovanni Boccia Artieri – Università di Urbino Carlo Bo
Manolo Farci – Università di Urbino Carlo Bo
Fabio Giglietto – Università di Urbino Carlo Bo
Luca Rossi – Università di Urbino Carlo Bo
Elisabetta Zurovac – Università di Urbino Carlo Bo

Bibliography

Boccia Artieri G. (2012), Stati di connessione. Pubblici, cittadini, consumatori nella (Social) Network Society, FrancoAngeli, Milano.
Bentivegna S. (2009), Disuguaglianze digitali. Le nuove forme di esclusione nella società dell’informazione, Laterza, Roma-Bari.
Dahlgren P. (2009), Media and Political Engagement. Citizens, Communication and Democracy, Cambridge University Press.
Norris P. (2001), Digital Divide: Civic Engagement, Information Poverty and the Internet Worldwide, Cambridge University Press, Cambridge MA.
Giglietto F. (2012), If Likes Were Votes: An Empirical Study on the 2011 Italian Administrative Elections, Sixth International AAAI Conference on Weblogs and Social Media. Retrieved from http://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4577
