METHODOLOGY

About

The Junk News Aggregator, the Visual Junk News Aggregator and the Top 10 Junk News Aggregator are research projects of the Computational Propaganda group (COMPROP) of the Oxford Internet Institute (OII) at the University of Oxford.

These aggregators are intended as tools to help researchers, journalists, and the public see what English language junk news stories are being shared and engaged with on Facebook, ahead of the 2018 US midterm elections on November 6, 2018.

The aggregators show junk news posts along with how many engagements they received, for all eight types of engagement available on Facebook, namely: Likes, Comments, Shares, and the five emoji reactions: Love, Haha, Wow, Angry, and Sad.

What is Junk News?

Per the definitions in Bolsover & Howard (2018), Gallacher et al. (2017), Howard et al. (2017, 2018), and Woolley and Howard (2018), the term "junk news" refers to various forms of propaganda and ideologically extreme, hyper-partisan, or conspiratorial political news and information, including publications that present verifiably false content as factual news. Such sources frequently use attention-grabbing techniques: large numbers of pictures, moving images, excessive capitalization, personal attacks, emotionally charged words and images, populist generalizations, and logical fallacies. They present commentary as news. The term characterizes a publisher as a whole, based on the content that publisher typically produces, rather than any individual article. Further context on junk news can be found below.

The Methodology in a Nutshell

In brief, the methodology used for the aggregator followed these steps:

  1. Sources of junk news were identified from a dataset of tweets posted by US users about the 2018 midterm elections. Links contained in those tweets were extracted.
  2. A team of five rigorously trained coders labelled these links (source websites) independently, based on a grounded typology that has been tested over several elections around the world in 2016-2018.
  3. A source website was coded as junk news when it failed on at least three of the typology's five criteria.
  4. For each junk news source website, the posts it publishes on its public Facebook page are retrieved every hour and displayed on the Junk News Aggregator. The visual version and the Top 10 version display the most engaged-with posts of the last 24 hours, refreshing daily at 5pm ET.

More details about the data collection, methodology, and how the aggregators work can be found in the sections on Data Collection and Methodology, About the Junk News Aggregator, and About the Top-10 Junk News Aggregator, below.

About the Junk News Aggregator

The Junk News Aggregator shows Facebook posts posted by junk news outlets on their public Facebook Pages, going back up to one month. You can filter posts by how long ago they were posted (e.g. 1 hour ago, 2 hours ago, etc.), by keyword, and by publisher name.

You can also sort posts, not only by when they were posted (newest/oldest), but also by how many engagements (or reactions) they received, for each of the eight engagement types available on Facebook (Likes, Comments, Shares, and the five emoji reactions: Love, Haha, Wow, Angry, Sad), and by the sum of all engagements across all eight metrics ("All").

In addition, you can sort posts by the age-adjusted version of each engagement type: the number of engagements divided by the post's age in seconds, which is the number of engagements the post received per second of its life on Facebook, up to the point it was retrieved by the Aggregator system. Because the system queries Facebook only once an hour (the Facebook API's rate limits do not allow more frequent queries), posts differ in age at the time Facebook is queried; normalizing by a post's age at retrieval time offers a more appropriate measure for comparing and sorting posts by engagement numbers.
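To make the age adjustment concrete, here is a minimal Python sketch of the metric, assuming hypothetical post records with the eight engagement counts and two timestamps; the field names are illustrative, not the Aggregator's actual schema.

```python
from datetime import datetime, timezone

# The eight engagement types available on Facebook.
ENGAGEMENT_TYPES = ["like", "comment", "share", "love", "haha", "wow", "angry", "sad"]

def age_seconds(post):
    """A post's age: retrieval time minus posting time, in seconds."""
    return (post["retrieved_at"] - post["posted_at"]).total_seconds()

def age_adjusted(post, engagement_type):
    """Engagements of one type per second of the post's life on Facebook,
    measured up to the moment the post was retrieved."""
    return post[engagement_type] / age_seconds(post)

def age_adjusted_total(post):
    """The "All" metric: the sum of all eight engagement counts,
    divided by the post's age in seconds."""
    return sum(post[t] for t in ENGAGEMENT_TYPES) / age_seconds(post)

# Example: sort hypothetical posts by age-adjusted Angry reactions, highest first.
posts = [
    {"like": 120, "comment": 40, "share": 15, "love": 3, "haha": 8,
     "wow": 2, "angry": 60, "sad": 1,
     "posted_at": datetime(2018, 10, 1, 14, 0, tzinfo=timezone.utc),
     "retrieved_at": datetime(2018, 10, 1, 15, 0, tzinfo=timezone.utc)},
]
posts.sort(key=lambda p: age_adjusted(p, "angry"), reverse=True)
```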

About the Top-10 Junk News Aggregator

The Top-10 Junk News Aggregator is a smaller and simpler version of the Junk News Aggregator. It uses the same data, but queries it less frequently, once every 24 hours at 5pm ET, and shows only the 10 posts by junk news sources from that 24-hour period that received the most engagement. Specifically, posts are ranked by their overall age-adjusted total engagements: the sum of all engagements the post received (Likes + Comments + Shares + Love + Haha + Wow + Angry + Sad reactions) divided by the post's age in seconds, where a post's age is the time it was retrieved from Facebook minus the time it was posted to Facebook.
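In equation form, a minimal restatement of this definition: for a post $p$ with Like, Comment, and Share counts $L_p$, $C_p$, $S_p$, the five reaction counts, retrieval time $t_{\mathrm{ret}}(p)$, and posting time $t_{\mathrm{post}}(p)$ (both in seconds),

$$
\mathrm{AgeAdjustedTotal}(p) = \frac{L_p + C_p + S_p + \mathrm{Love}_p + \mathrm{Haha}_p + \mathrm{Wow}_p + \mathrm{Angry}_p + \mathrm{Sad}_p}{t_{\mathrm{ret}}(p) - t_{\mathrm{post}}(p)}.
$$

The Top-10 Aggregator ranks the 24-hour window's posts by this quantity and shows the ten highest.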

About the Visual Junk News Aggregator

The interactive image grid on the homepage is an image-based, top-256 Visual Junk News Aggregator. It shows images from the 256 most engaged-with junk news posts of the last 24 hours. It uses the same logic as the Top 10 Aggregator (described above): it updates every 24 hours at 5pm ET and ranks the Facebook posts made by junk news sources in that 24-hour period by age-adjusted total engagements (the "age-adjusted total: All" metric), except that it shows the top 256 posts rather than the top 10. Each image in the grid corresponds to a junk news Facebook post. Hovering over an image reveals a pop-up with more information about the post: the Facebook Page that posted it, the date and time it was posted, the post's text, and its engagement numbers. Clicking on an image takes you to the full Aggregator, where you can explore junk news in greater detail.

Data Collection and Methodology

The Junk News Aggregator and its Top 10 version let users track, in near real time, the junk news content being posted and engaged with on Facebook, and filter this information by time, engagement type, and keyword.

The system queries Facebook every hour on the hour, using the public Facebook Graph API, and collects only public data; no sensitive, personal, or user-identifying data is collected. Specifically, it queries a list of public Facebook Pages of specific junk news publishers, chosen because their web content was found to be particularly frequently discussed in social media conversations relating to the 2018 US midterm elections. The list of junk news publishers was assembled and used in the Aggregator according to the following sequence of steps:

  1. Selecting Twitter hashtags relevant to the 2018 US midterm elections, ensuring that these hashtags relate to this specific election and not to other 2018 elections around the world, and that the list of hashtags is balanced, covering both right- and left-leaning hashtags.
  2. Using the Twitter Streaming API to expand this list: collecting all tweets that mention any hashtag in our list, and extracting from those tweets any other hashtags they mention. This snowball sampling of hashtags ran from September 15 to September 19, 2018, and produced an expanded list of hashtags, which can be found here.
  3. Using this expanded hashtag list and the Twitter Streaming API, retrieving all English-language tweets (including retweets and quote tweets) that mention any of these hashtags. This data collection resulted in 2,541,544 tweets posted in the period September 21 to September 30, 2018.
  4. Extracting all URLs mentioned in this set of tweets, keeping only each base URL (i.e. the homepage URL rather than a specific article's URL), and counting how many times each base URL was mentioned. A minimal sketch of this normalization appears after this list.
  5. Giving this list of URLs to trained US experts (trained human annotators, or "coders") to classify all URLs into categories of news and political content, using a grounded typology (Woolley and Howard, 2018). The junk news category has five criteria; a news source that fails on at least three of the five is classified as junk news. To train our team of US experts to categorize sources of political news and information according to this typology, we established a rigorous training system. For the analysis of the 2018 US midterms we worked with a team of three coders, and each source was triple-coded. Conflicting decisions were thoroughly discussed between coders to reach consensus. Where consensus could not be reached, an executive team of three other highly experienced coders reviewed the source and made the final coding decision.
  6. For every website in the junk news list that was shared on Twitter, identifying its Facebook Page, if it has one. To establish that a given Facebook Page corresponds to a given website, we require that the website explicitly list the Facebook Page as its own, that the Facebook Page list the website in its 'Website' field, or both.
  7. Keeping only the junk news sites that have a Facebook Page, and of those, only the top 50 most shared on Twitter (due to the Facebook Graph API's rate limits, not all of them can be tracked). These 50 junk news sites, along with the typology criteria they violate (on account of which they are classified as junk news), can be found in this CSV file, and the explanation of the code (abbreviation) used for each criterion is in this CSV file.
  8. For the public Facebook Page of each of these 50 junk news sites, retrieving every hour, on the hour, the public posts authored by that Page, along with some post metadata, including aggregate-level engagement numbers for all eight engagement types Facebook makes available (Share, Comment, Like, Love, Haha, Wow, Sad, Angry). No names of people who engage with these posts are collected, only engagement counts. These posts are written to a database. A hedged sketch of this retrieval appears after this list.
  9. For the Junk News Aggregator site, retrieving data from this database of public Facebook posts by junk news publishers, and allowing site visitors to explore, sort, and filter this data by time, engagement numbers, and keywords.
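As an illustration of step 4, here is a minimal Python sketch of reducing article URLs to base URLs and counting mentions. The input list and site names are hypothetical; the real pipeline extracts expanded URLs from the collected tweets.

```python
from collections import Counter
from urllib.parse import urlparse

def base_url(url):
    """Reduce an article URL to its homepage URL (scheme + domain)."""
    parsed = urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}"

# Hypothetical article URLs extracted from tweets.
urls = [
    "https://example-junk-site.com/2018/10/some-article",
    "https://example-junk-site.com/another-article",
    "https://other-site.com/story",
]

mention_counts = Counter(base_url(u) for u in urls)
# Counter({'https://example-junk-site.com': 2, 'https://other-site.com': 1})
```

And for step 8, a hedged sketch of an hourly retrieval call against the Facebook Graph API's /{page-id}/posts edge, using field aliasing to request per-type reaction totals. The page ID, access token, and API version are placeholders, and the exact fields available depend on the API version and app permissions; this illustrates the kind of query made, not the Aggregator's actual code.

```python
import requests

GRAPH = "https://graph.facebook.com/v3.1"   # placeholder API version
PAGE_ID = "EXAMPLE_PAGE_ID"                 # placeholder Page ID
ACCESS_TOKEN = "EXAMPLE_TOKEN"              # placeholder token

# Request post text, timestamps, and aggregate engagement counts only;
# no user-identifying data is requested.
fields = ",".join([
    "message", "created_time", "permalink_url",
    "shares",
    "comments.limit(0).summary(true)",
    "reactions.type(LIKE).limit(0).summary(true).as(like)",
    "reactions.type(LOVE).limit(0).summary(true).as(love)",
    "reactions.type(HAHA).limit(0).summary(true).as(haha)",
    "reactions.type(WOW).limit(0).summary(true).as(wow)",
    "reactions.type(ANGRY).limit(0).summary(true).as(angry)",
    "reactions.type(SAD).limit(0).summary(true).as(sad)",
])

resp = requests.get(
    f"{GRAPH}/{PAGE_ID}/posts",
    params={"fields": fields, "access_token": ACCESS_TOKEN},
)
posts = resp.json().get("data", [])  # one dict per post, to be written to the database
```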

We note that, for each Facebook post collected, the displayed engagement numbers were last updated at most an hour after the post was posted; due to the Facebook API's data limits, we cannot update them again later. However, a link to each Facebook post is provided, so you can click through to see the post's current engagement levels.

Further Context on Junk News

Social media is an important source of news and information about politics in the United States. But during the 2016 US Presidential Election, social media platforms emerged as a fertile breeding ground for foreign influence campaigns, conspiracy theories, and radical alternative media outlets. Anecdotally, the nature of this political news and information seems to have evolved over time, but political communication researchers have yet to develop a comprehensive, grounded, internally consistent typology of the types of sources shared by social media users. Rather than chasing a definition of what is popularly known as “fake news”, we produce a grounded typology of what users actually shared, applying rigorous coding and content analysis techniques to define the new phenomenon. To understand what social media users share in their political communication, we analyzed large volumes of political conversation over Twitter during the 2016 Presidential campaign period and the 2018 State of the Union Address in the United States. Based on this analysis, researchers developed the concept of “junk news”: sources that deliberately publish or aggregate misleading, deceptive or incorrect information packaged as real news about politics, economics or culture.

Following the highly contentious 2016 US Presidential Election, there has been a growing body of empirical work demonstrating how large volumes of misinformation can circulate over social media during critical moments of public life (Allcott and Gentzkow 2017; Vicario et al. 2016; Vosoughi, Roy, and Aral 2018). Scholars have argued that the spread of “computational propaganda” sustained by social media algorithms can negatively impact democratic discourse and disrupt digital public spheres (Bradshaw and Howard 2018; Howard and Woolley 2016; Persily 2017; Tucker et al. 2017; Wardle and Derakhshan 2017). Indeed, both social network infrastructure and user behaviors provide capacities and constraints for the spread of computational propaganda (Bradshaw and Howard 2018; Flaxman, Goel, and Rao 2016; Marwick and Lewis 2017; Pariser 2011; Wu 2017). Yet, the body of work that is devoted to conceptualizing misinformation phenomena faces a number of epistemological and methodological challenges, has remained fragmentary, is ambiguous at best and lacks a common vocabulary (boyd, 2017; Fletcher, Cornia, Graves, & Nielsen, 2018; Wardle & Derakhshan, 2017). Terminology on misinformation has become highly contentious, constantly being weaponised by politically motivated actors to discredit media reporting (Neudert 2017; Woolley and Howard 2017).

Drawing on perspectives from political communication, this typology helps reveal the nature of content being shared over social media. The typology has evolved over several years and multiple scholarly publications, including scientific working papers, peer-reviewed journal articles, and book manuscripts.

There have been a few attempts to systematically operationalize fake news as a concept. The primary challenge is that it is impossible to evaluate the amount of fact-checking that goes into a particular piece of writing at a scale sufficient for saying something general about trends on a social media platform. Most researchers, and indeed most citizens, have manual heuristics for evaluating sources that deliberately publish or aggregate misleading, deceptive or incorrect information purporting to be real news about politics, economics or culture. Different social media users evaluate the qualities of news and political information on social media in different ways.

Some outlets appear highly unprofessional: they do not employ the standards and best practices of professional journalism, refrain from providing clear information about real authors, editors, publishers and owners, lack transparency and accountability, and do not publish corrections of debunked information. Other outlets have stylistic trademarks that make them suspicious to most readers: emotionally driven language and emotive expressions, hyperbole, ad hominem attacks, misleading headlines, excessive capitalization, unsafe generalizations and logical fallacies, moving images, and large numbers of pictures and mobilizing memes. For other sources of political news and information, false information and conspiracy theories seem to drive the strategy of word choice, story placement, and argument structure; they report without consulting multiple sources and do not fact-check, so the sources are often untrustworthy and their standards of production unreliable. While some sources may have a fatal “credibility issue”, others are highly biased, ideologically skewed or hyper-partisan, and their news reporting frequently includes strongly opinionated commentary and inflammatory viewpoints. Finally, some sources simply mimic established news reporting styles and formats: counterfeit sources. These are most commonly cited as examples of fake news, but they are essentially sources of commentary masked as news, stylistically disguised or falsely branded, with references to news agencies and credible sources, and headlines written in a news tone with date, time and location stamps.

Focusing specifically on political news and information shared during the 2016 US Presidential election and the State of the Union Address in January 2018, we have already analyzed, in previously published studies, 21.8 million tweets that voters shared over Twitter, one of the most popular platforms for conversations about politics in the United States.

Content typologies are of central importance to the study of political communication, and for many years the broad categories and subcategories of political news and information have remained widely accepted by researchers, though of course there is debate over how transportable such traditional categories are to new media political communication (Earl, Martin, McCarthy, & Soule, 2004; Karlsson & Sjøvaag, 2016). However, the recent attention and debate over the effects of junk news in the media ecosystem have forced researchers, especially those working in political communication and social media, to re-evaluate the production models, normative values, and ideational impact of political news and information on social media, with new categories and definitions that are actually grounded in the content being shared over social media (Tandoc, Lim, and Ling 2018). Currently, the debate lacks a grounded and comparative framework of the types of information that circulate on social media, and remains detached from evidence of social media sharing behavior. We advance this debate with a rigorously composed typology based on a focused, cross-case comparison of key political events in the US.

We proceed from a recognition that what users consume and share over social media in their political conversations is not simply news, but can include a wide variety of sources of political news and information, including user-generated content, conspiratorial alternative media outlets, and entertainment outlets. Indeed, there is significant research showing that humorous content is a staple of information sharing in contemporary political communication (Becker 2012; Becker, Xenos, and Waisanen 2010; Moy, Xenos, and Hess 2006). Given the lack of guidance in the existing literature on what information users are sharing, a grounded and iterative method of cataloguing and evaluating content is especially relevant. Based on our analysis of the 2016 US Presidential Election, we develop and analyze such a grounded typology of sources of news and information shared over Twitter. While Twitter provides access to a wealth of data on public news sharing in the United States, its user base is not fully representative (Blank 2017). Nevertheless, Twitter remains a central source of news and is especially popular among journalists, politicians and opinion leaders, who further disseminate information in non-public social media spaces such as Facebook and WhatsApp (Jungherr 2016).

Typology building is one of the most foundational tasks in political research, and is especially important when it comes to investigating and explicating new phenomena, unexpected problems, or sudden changes in social systems (Aronovitch, 2012; Howard & Hussain, 2013; Swedberg, 2018). Understanding the diversity of variables and cases among real-world outcomes involves carefully constructing categories that both accurately describe the features of such new political phenomena and serve as transportable concepts across several cases. Certainly political propaganda, misinformation, dirty tricks, and negative campaigning are not new features of public life. But three components distinguish this contemporary mode of political communication: social media applications as new platforms for spreading political news and information, the significantly greater speed at which misinformation spreads, and the use of an individual's data in targeting that misinformation.

Typologies in political communication research have been useful for frame analysis in the study of news and for organizing event-based datasets in which media accounts provide the primary features of important incidents (Althaus, Edy, and Phalen 2001; Erickson and Howard 2007). Even before social media, scholars used such methods to show that sensational news organizations favored human interest frames while serious news organizations favored responsibility and conflict frames (Semetko and Valkenburg 2000). To understand what users were actually sharing over social media, we developed a typology of political news and information.

References

  • Allcott, Hunt, and Matthew Gentzkow. 2017. “Social Media and Fake News in the 2016 Election.” National Bureau of Economic Research. http://www.nber.org/papers/w23089.
  • Althaus, Scott L., Jill A. Edy, and Patricia F. Phalen. 2001. “Using Substitutes for Full-Text News Stories in Content Analysis: Which Text Is Best?” American Journal of Political Science 45 (3): 707–23. https://doi.org/10.2307/2669247.
  • Aronovitch, Hilliard. 2012. “Interpreting Weber’s Ideal-Types.” Philosophy of the Social Sciences 42 (3): 356–69. https://doi.org/10.1177/0048393111408779.
  • Becker, Amy B. 2012. “Comedy Types and Political Campaigns: The Differential Influence of Other-Directed Hostile Humor and Self-Ridicule on Candidate Evaluations.” Mass Communication and Society 15 (6): 791–812. https://doi.org/10.1080/15205436.2011.628431.
  • Becker, Amy B., Michael A. Xenos, and Don J. Waisanen. 2010. “Sizing Up The Daily Show: Audience Perceptions of Political Comedy Programming.” Atlantic Journal of Communication 18 (3): 144–57. https://doi.org/10.1080/15456871003742112.
  • Blank, Grant. 2017. “The Digital Divide Among Twitter Users and Its Implications for Social Research.” Social Science Computer Review 35 (6): 679–97. https://doi.org/10.1177/0894439316671698.
  • Bolsover, Gillian, and Philip Howard. 2018. “Chinese Computational Propaganda: Automation, Algorithms and the Manipulation of Information about Chinese Politics on Twitter and Weibo.” Information, Communication & Society. https://doi.org/10.1080/1369118X.2018.1476576.
  • boyd, danah. 2017. “Google and Facebook Can’t Just Make Fake News Disappear.” WIRED. 2017. https://www.wired.com/2017/03/google-and-facebook-cant-just-make-fake-news-disappear/.
  • Bradshaw, Samantha, and Philip N. Howard. 2018. “Why Does Junk News Spread So Quickly Across Social Media? Algorithms, Advertising and Exposure in Public Life.” Knight Foundation Working Paper, January. https://kf-site-production.s3.amazonaws.com/media_elements/files/000/000/142/original/Topos_KF_White-Paper_Howard_V1_ado.pdf.
  • Earl, Jennifer, Andrew Martin, John D. McCarthy, and Sarah A. Soule. 2004. “The Use for Newspaper Data in the Study of Collective Action.” Annual Review of Sociology 30 (1): 65–80. https://doi.org/10.1146/annurev.soc.30.012703.110603.
  • Erickson, Kris, and Philip N. Howard. 2007. “A Case of Mistaken Identity? News Accounts of Hacker, Consumer, and Organizational Responsibility for Compromised Digital Records.” Journal of Computer-Mediated Communication 12 (4): 1229–47. https://doi.org/10.1111/j.1083-6101.2007.00371.x.
  • Flaxman, Seth, Sharad Goel, and Justin M. Rao. 2016. “Filter Bubbles, Echo Chambers, and Online News Consumption.” Public Opinion Quarterly 80 (S1): 298–320. https://doi.org/10.1093/poq/nfw006.
  • Fletcher, Richard, Alessio Cornia, Lucas Graves, and Rasmus Kleis Nielsen. 2018. “Measuring the Reach of ‘Fake News’ and Online Disinformation in Europe.” Oxford, UK: Reuters Institute for the Study of Journalism.
  • Gallacher, John D., Vlad Barash, Philip N. Howard, and John Kelly. 2017. “Junk News on Military Affairs and National Security: Social Media Disinformation Campaigns Against US Military Personnel and Veterans.” (Data Memo 2017.9). Oxford, UK: Project on Computational Propaganda.
  • Howard, Philip, Gillian Bolsover, Bence Kollanyi, Samantha Bradshaw, and Lisa-Maria Neudert. 2017. “Junk News and Bots during the U.S. Election: What Were Michigan Voters Sharing over Twitter?” (Data Memo 2017.1). Oxford, UK: Project on Computational Propaganda.
  • Howard, Philip N., and Muzammil M. Hussain. 2013. Democracy’s Fourth Wave? Digital Media and the Arab Spring. New York, NY: Oxford University Press.
  • Howard, Philip, and Samuel Woolley. 2016. “Political Communication, Computational Propaganda, and Autonomous Agents.” Edited by Philip N. Howard. International Journal of Communication 10 (Special Issue): 20.
  • Howard, Philip N., Samuel Woolley, and Ryan Calo. 2018. “Algorithms, Bots, and Political Communication in the US 2016 Election: The Challenge of Automated Political Communication for Election Law and Administration.” Journal of Information Technology & Politics 15 (2): 81–93. https://doi.org/10.1080/19331681.2018.1448735.
  • Tandoc, Edson C., Jr., Zheng Wei Lim, and Richard Ling. 2018. “Defining ‘Fake News.’” Digital Journalism 6 (2): 137–53. https://doi.org/10.1080/21670811.2017.1360143.
  • Jungherr, Andreas. 2016. “Twitter Use in Election Campaigns: A Systematic Literature Review.” Journal of Information Technology & Politics 13 (1): 72–91. https://doi.org/10.1080/19331681.2015.1132401.
  • Karlsson, Michael, and Helle Sjøvaag. 2016. “Content Analysis and Online News.” Digital Journalism 4 (1): 177–92. https://doi.org/10.1080/21670811.2015.1096619.
  • Machado, Caio, Beatriz Kira, Gustavo Hirsch, Nahema Marchal, Bence Kollanyi, Philip N. Howard, Thomas Lederer, and Vlad Barash. 2018. “News and Political Information Consumption in Brazil: Mapping the First Round of the 2018 Brazilian Presidential Election on Twitter.” (Data Memo 2018.4). Oxford, UK: Project on Computational Propaganda.
  • Marwick, Alice, and Rebecca Lewis. 2017. “Media Manipulation and Disinformation Online.” Data & Society Research Institute. https://datasociety.net/pubs/oh/DataAndSociety_MediaManipulationAndDisinformationOnline.pdf.
  • Moy, Patricia, Michael A. Xenos, and Verena K. Hess. 2006. “Priming Effects of Late-Night Comedy.” International Journal of Public Opinion Research 18 (2): 198–210. https://doi.org/10.1093/ijpor/edh092.
  • Narayanan, Vidya, Vlad Barash, John Kelly, Bence Kollanyi, Lisa-Maria Neudert, and Philip N. Howard. 2018. “Polarization, Partisanship and Junk News Consumption over Social Media in the US.” (Data Memo 2018.1). Oxford, UK: Project on Computational Propaganda.
  • Neudert, Lisa Maria. 2017. “Computational Propaganda in Germany: A Cautionary Tale.” 2017.7. Computational Propaganda Working Paper Series. Oxford, United Kingdom: Oxford Internet Institute, University of Oxford.
  • Pariser, Eli. 2011. The Filter Bubble: How the New Personalized Web Is Changing What We Read and How We Think. London, UK: Penguin Books.
  • Persily, Nate. 2017. “The 2016 U.S. Election: Can Democracy Survive the Internet?” Journal of Democracy 28 (2): 63–76.
  • Semetko, Holli A., and Patti M. Valkenburg. 2000. “Framing European Politics: A Content Analysis of Press and Television News.” Journal of Communication 50 (2): 93–109. https://doi.org/10.1111/j.1460-2466.2000.tb02843.x.
  • Swedberg, Richard. 2018. “How to Use Max Weber’s Ideal Type in Sociological Analysis.” Journal of Classical Sociology 18 (3): 181–96. https://doi.org/10.1177/1468795X17743643.
  • Tucker, Joshua A., Yannis Theocharis, Margaret E. Roberts, and Pablo Barberá. 2017. “From Liberation to Turmoil: Social Media and Democracy.” Journal of Democracy 28 (4): 46–59.
  • Vicario, Michela Del, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. 2016. “The Spreading of Misinformation Online.” Proceedings of the National Academy of Sciences 113 (3): 554–59. https://doi.org/10.1073/pnas.1517441113.
  • Vosoughi, Soroush, Deb Roy, and Sinan Aral. 2018. “The Spread of True and False News Online.” Science 359 (6380): 1146–51. https://doi.org/10.1126/science.aap9559.
  • Wardle, Claire, and Hossein Derakhshan. 2017. “Information Disorder: Toward an Interdisciplinary Framework for Research and Policy Making.” Council of Europe. https://rm.coe.int/information-disorder-report-november-2017/1680764666.
  • Woolley, Samuel, and Philip Howard. 2017. “Computational Propaganda: Executive Summary.” Working Paper 2017.11. Oxford, United Kingdom: Project on Computational Propaganda, Oxford Internet Institute, Oxford University.
  • Woolley, Samuel, and Philip Howard, eds. 2018. Computational Propaganda: Political Parties, Politicians, and Political Manipulation on Social Media. Oxford University Press.
  • Wu, Tim. 2017. The Attention Merchants: The Epic Struggle to Get Inside Our Heads. Atlantic Books.