![picture of someone typing on laptop](/sites/g/files/toruqf261/files/styles/8x3_1440w_540h/public/2023-04/laptop_john-schnobrich-yfbyvpeghfq-unsplash.jpg?itok=O0xwYfM_)
Characteristics of Political Discourse in Social Media
Research Questions: Is the structure of dialogue in political subreddits consistent with the structure of political discourse suggested by political theory? If the theoretical structure is not supported, what is the underlying structure of political discourse in political subreddits? Do contextual features of subreddits, such as the available reaction mechanisms (e.g., upvote and downvote versus only upvote), alter the structure and/or dynamics of political discourse?
Data: We used the open Pushshift API to collect 155 million comments from Reddit. Furthermore, we manually labeled over 4,500 user comments to train state-of-the-art pretrained natural language processing models.
Methods: We trained a language model to predict elements of three political discourse theories in order to classify our corpus comprising 155 million documents. Then, we created confirmatory factor analysis models to investigate the influence of different reaction mechanisms on the prevalence of the three political discourse types.
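To illustrate the classification step, here is a minimal sketch of how per-label model scores could be turned into multi-label predictions and then summarized as prevalence. It is not the project's actual code. The label set (simplified to the three discourse types rather than their individual elements), the 0.5 threshold, the function names, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical label names: one per discourse type named in the text.
LABELS = ["deliberative", "civic", "demagogic"]


def assign_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose score meets the threshold.

    Multi-label means a comment can receive zero, one, or several labels.
    The threshold value is an assumption, not a value from the study.
    """
    return [label for label in LABELS if scores.get(label, 0.0) >= threshold]


def prevalence(predictions: List[List[str]]) -> Dict[str, float]:
    """Compute the share of comments that carry each label."""
    total = len(predictions)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {
        label: sum(label in labels for labels in predictions) / total
        for label in LABELS
    }


# Toy scores, invented for illustration only (not study data).
toy_scores = [
    {"deliberative": 0.9, "civic": 0.7, "demagogic": 0.1},
    {"deliberative": 0.2, "civic": 0.1, "demagogic": 0.8},
    {"deliberative": 0.3, "civic": 0.6, "demagogic": 0.4},
    {"deliberative": 0.1, "civic": 0.2, "demagogic": 0.3},
]

predictions = [assign_labels(scores) for scores in toy_scores]
shares = prevalence(predictions)
```

Prevalence figures of this kind, computed per subreddit and time window, are the sort of quantity that could then be compared across reaction mechanisms.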
Challenges: To store and handle the large volume of data on Google Cloud, we had to set up a BigQuery data infrastructure (a data warehouse queried with SQL). Even so, querying the data remained difficult because of restrictions imposed by the cloud infrastructure. To run our models, we split the data into multiple batches, both for the language classification task and for the confirmatory factor analysis. Any update to the data required running scripts that took multiple hours, while a full analysis of a case study took multiple days. Nonetheless, the cloud infrastructure allowed us to run scripts uninterrupted, which would not have been possible on a local machine or on a generic service such as Google Drive.
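The batching approach described above can be sketched as follows. This is a generic illustration under stated assumptions, not the project's pipeline: the helper names `batched` and `process_in_batches`, the batch size, and the `classify` callable are all hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls the next batch_size items without loading the rest.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_in_batches(
    comments: Iterable[T],
    batch_size: int,
    classify: Callable[[List[T]], List[R]],
) -> List[R]:
    """Apply a hypothetical classify function one batch at a time."""
    results: List[R] = []
    for batch in batched(comments, batch_size):
        results.extend(classify(batch))
    return results
```

Because `batched` consumes its input lazily, only one batch needs to be held in memory at a time, which is the main reason to process a corpus of this size in pieces.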
Findings: Our study introduced a framework for evaluating the empirical manifestation of political theories on social media. We performed a comparative analysis of three prominent theories of political discourse (deliberative, civic, and demagogic), producing a set of constitutive rhetorical and linguistic elements that we referred to as a “minimal conceptualization” of these three discourse theories. We then used multi-label classification to explore the extent to which these rhetorical elements “manifested” themselves in 155 million user comments across 55 political message boards on Reddit. We found that basic components of the theories did mirror users' actual rhetorical behavior. However, we also found that some theoretically defined elements of these discourses did not appear in the political discussions on Reddit. Over a time span of eight years (2010-2018), we created a quasi-experimental setting to identify changes in political discourse as a function of introducing or removing upvotes, downvotes, or both. We showed that changes in reaction mechanisms were associated with changes in the nature of political discourse: “only upvoting” was associated with the highest level of civic and deliberative discourse, while the absence of reaction mechanisms was associated with the strongest manifestation of demagoguery. These results have both scientific value, as they connect political-theoretic work to empirical data, and policy implications, as they generate knowledge about how to design social media environments that align with democratic values.