Social Media and Election Outcomes
Research Question: How did social media, and Twitter in particular, affect recent election outcomes in the United States?
Data: The main sources of raw data were existing datasets of tweets, collected either at random throughout the day or specifically around U.S. presidential elections. We then used the Twitter API and web scraping to gather information on the millions of user profiles behind these tweets.
Additional variables came from existing datasets, such as county-level election data from the MIT Election Lab and Dave Leip’s Presidential Election Atlas, or data on county characteristics from the U.S. Census.
Methods: We were interested in the effect of social media on election outcomes, measured either across U.S. counties or in individual-level survey data. We estimated ordinary least squares (OLS) and two-stage least squares (2SLS) regressions.
The key empirical challenge was that the degree to which people use Twitter across the United States is not random but the result of differences in local characteristics. To isolate a causal effect of Twitter, we thus exploited quasi-random variation in Twitter’s early rollout. In particular, we made use of the fact that Twitter became popular at the South by Southwest (SXSW) festival in 2007.
More precisely, we used the number of people in a given county who follow SXSW and signed up to Twitter in March 2007 as an instrument for Twitter usage in 2SLS regressions. Because these counties may still differ systematically from others, we controlled for the number of people who follow SXSW and signed up to Twitter in 2006. These earlier users are indistinguishable, in their profile information and in the characteristics of their home counties, from those who signed up in March 2007.
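The logic of this instrumental-variable design can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable names, the coefficients, and the data-generating process are hypothetical and are not taken from the project's data or code. The sketch only shows why OLS is biased when unobserved local characteristics drive both Twitter usage and vote shares, and how 2SLS with the March 2007 SXSW signups as an instrument, controlling for the 2006 signups, can recover the effect.

```python
import numpy as np

def ols_last_coef(y, regressors):
    """OLS of y on a constant plus regressors; returns the coefficient on the last regressor."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[-1]

def two_stage_least_squares(y, x_endog, z_instr, controls):
    """Manual 2SLS point estimate for one endogenous regressor and one instrument."""
    n = len(y)
    exog = np.column_stack([np.ones(n), controls])    # constant + controls
    first_stage_X = np.column_stack([exog, z_instr])  # controls + instrument
    # First stage: predict the endogenous regressor from instrument and controls.
    gamma, *_ = np.linalg.lstsq(first_stage_X, x_endog, rcond=None)
    x_hat = first_stage_X @ gamma
    # Second stage: regress the outcome on controls and the predicted regressor.
    beta, *_ = np.linalg.lstsq(np.column_stack([exog, x_hat]), y, rcond=None)
    return beta[-1]

def simulate_counties(n=5000, true_effect=-0.5, seed=0):
    """Hypothetical county-level data; all coefficients are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    local_traits = rng.normal(size=n)   # unobserved confounder
    sxsw_2006 = rng.normal(size=n)      # SXSW followers who joined in 2006 (control)
    # March 2007 signups: related to the 2006 signups plus a quasi-random shock.
    sxsw_march_2007 = 0.5 * sxsw_2006 + rng.normal(size=n)
    twitter_usage = (0.8 * sxsw_march_2007 + 0.3 * sxsw_2006
                     + local_traits + rng.normal(size=n))
    vote_share = (true_effect * twitter_usage + 0.5 * sxsw_2006
                  + local_traits + rng.normal(size=n))
    return {
        "vote_share": vote_share,
        "twitter_usage": twitter_usage,
        "sxsw_march_2007": sxsw_march_2007,
        "sxsw_2006": sxsw_2006,
    }

data = simulate_counties()
ols_estimate = ols_last_coef(
    data["vote_share"], [data["sxsw_2006"], data["twitter_usage"]]
)
iv_estimate = two_stage_least_squares(
    data["vote_share"], data["twitter_usage"],
    data["sxsw_march_2007"], data["sxsw_2006"],
)
print(f"OLS estimate:  {ols_estimate:.3f}")
print(f"2SLS estimate: {iv_estimate:.3f}")
```

In this simulation the OLS coefficient is pulled toward zero by the unobserved confounder, while the 2SLS estimate lands close to the assumed true effect. The sketch returns point estimates only; valid 2SLS standard errors require a dedicated routine rather than the second-stage OLS output.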
Challenges: Two main challenges arose. First, it took considerable resources to construct a reliable measure of Twitter usage across the United States from the original tweet-level datasets. The difficulty was that people tend to tweet a lot away from their home locations during particular events or when visiting tourist sights. We thus had to collect additional information on the profile of each Twitter user in the tweet-level datasets to arrive at a more reasonable proxy for Twitter usage across counties. We believe this measure will be useful to other researchers as well.
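The difference between the two kinds of measure can be sketched with a toy example. The records, user IDs, and the `profile_home` mapping below are hypothetical, and the sketch assumes that a home county has already been derived from each profile; it is not the project's actual procedure. It only illustrates why counting tweets by where they were sent can overstate usage in counties that host events, and how counting distinct users by home county avoids that.

```python
from collections import Counter

# Hypothetical tweet records: (user_id, county where the tweet was sent).
tweets = [
    ("u1", "Travis, TX"),
    ("u2", "Travis, TX"),
    ("u3", "Travis, TX"),
    ("u3", "Travis, TX"),
    ("u1", "Cook, IL"),
]

# Hypothetical home counties, assumed to be derived from user profiles.
profile_home = {"u1": "Cook, IL", "u2": "Travis, TX", "u3": "Kings, NY"}

def tweets_per_county(tweets):
    """Naive measure: number of tweets sent from each county."""
    return Counter(county for _, county in tweets)

def users_per_home_county(tweets, profile_home):
    """Profile-based proxy: distinct users, each counted once in their home county."""
    users = {user for user, _ in tweets}
    return Counter(profile_home[u] for u in users if u in profile_home)

print(tweets_per_county(tweets))
print(users_per_home_county(tweets, profile_home))
```

Here the event county receives most of the tweets, but only one of the three users actually lives there.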
Second, it was difficult to train machine learning algorithms to reliably predict the political slant of tweets. One aspect of our research was the question of whether political content on Twitter has become more positive about Democrats over time. However, because many tweets are short and political tweets may contain sarcasm, it took us a considerable amount of time to find ways to reliably measure whether tweets were “pro-Democrat,” “pro-Republican,” or neutral.
Findings: We have three main findings: 1. Twitter likely hurt Donald Trump’s vote share in the 2016 and 2020 elections. 2. Twitter had no effect on previous presidential elections or on Congressional elections. 3. Twitter’s pro-Democratic effect is unlikely to be driven solely by increases in turnout. Twitter’s political content also leans disproportionately left, and this slant increased in the run-up to the 2016 elections.