Research data . Dataset . 2021

Dataset for: "It is just a flu: Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations"

Papadamou, Kostantinos; Zannettou, Savvas; Blackburn, Jeremy; De Cristofaro, Emiliano; Stringhini, Gianluca; Sirivianos, Michael

Dataset for the paper: "It is just a flu: Assessing the Effect of Watch History on YouTube’s Pseudoscientific Video Recommendations"

Abstract: The role played by YouTube’s recommendation algorithm in unwittingly promoting misinformation and conspiracy theories is not entirely understood. Yet, this can have dire real-world consequences, especially when pseudoscientific content is promoted to users at critical times, such as the COVID-19 pandemic. In this paper, we set out to characterize and detect pseudoscientific misinformation on YouTube. We collect 6.6K videos related to COVID-19, the Flat Earth theory, as well as the anti-vaccination and anti-mask movements. Using crowdsourcing, we annotate them as pseudoscience, legitimate science, or irrelevant and train a deep learning classifier to detect pseudoscientific videos with an accuracy of 0.79. We quantify user exposure to this content on various parts of the platform and how this exposure changes based on the user’s watch history. We find that YouTube suggests more pseudoscientific content regarding traditional pseudoscientific topics (e.g., flat earth, anti-vaccination) than for emerging ones (like COVID-19). At the same time, these recommendations are more common on the search results page than on a user’s homepage or in the recommendation section when actively watching videos. Finally, we shed light on how a user’s watch history substantially affects the type of recommended videos.

Dataset Files

The dataset consists of three files: the metadata, comments, and captions of the ground-truth dataset videos collected and manually reviewed in this paper.

1. Video Metadata "groundtruth_videos.json": Contains the metadata of our manually reviewed ground-truth dataset videos. The ground-truth dataset includes 1,197 science, 1,325 pseudoscience, and 3,212 irrelevant videos.
More specifically, it includes the metadata of videos related to the following pseudoscientific topics:
  • COVID-19: 607 science, 368 pseudoscience, and 721 irrelevant videos
  • Anti-vaccination: 363 science, 394 pseudoscience, and 1,060 irrelevant videos
  • Anti-mask: 65 science, 188 pseudoscience, and 724 irrelevant videos
  • Flat Earth: 162 science, 375 pseudoscience, and 707 irrelevant videos

Note that 600 of the videos in this dataset include the "annotation.manual_review_label" attribute, which is the label assigned by the first author of this paper to evaluate the performance of the crowdsourced annotation process.

Video Metadata Description:
  • "search_term": The search term used to search YouTube and retrieve the video during our data collection. It is one of the following: 'covid-19', 'coronavirus', 'anti-vaccination', 'anti-vaxx', 'anti-mask', or 'flat earth'.
  • "annotation.annotations": The list of the three annotations assigned to each video by our crowdsourced annotators.
  • "annotation.label": The annotation label assigned to the video based on the majority agreement of the crowdsourced annotators.
  • "annotation.manual_review_label": The label assigned by the first author of this paper to evaluate the performance of the crowdsourced annotation process.
  • "isSeed": 0 if the video is a seed video of our data collection, 1 if it is a recommended video of a seed video.
  • "relatedVideos": The recommended videos of the given video as returned by the YouTube Data API.

2. Video Comments "groundtruth_videos_comments_ids.json": Includes the identifiers of the comments of our ground-truth videos.

3. Video Transcripts "groundtruth_videos_transcripts.json": Includes the captions of our ground-truth videos.

If you use this dataset in any publication, of any form and kind, please cite the paper using the following BibTeX entry:
@article{papadamou2020just,
  title={'It is just a flu': Assessing the Effect of Watch History on YouTube's Pseudoscientific Video Recommendations},
  author={Papadamou, Kostantinos and Zannettou, Savvas and Blackburn, Jeremy and De Cristofaro, Emiliano and Stringhini, Gianluca and Sirivianos, Michael},
  journal={arXiv preprint arXiv:2010.11638},
  year={2020}
}
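As an illustration of how the metadata attributes described above might be consumed, the following is a minimal Python sketch, not part of the released dataset or the authors' code. It rests on several assumptions that the description does not confirm: that "groundtruth_videos.json" holds a single JSON array of video records, that dotted attribute names such as "annotation.label" are stored either as flat keys or as nested objects, and that labels are the strings "science", "pseudoscience", and "irrelevant". All function names and the sample records are hypothetical. The treatment of a three-way disagreement among annotators (returning no label) is also an assumption, since the description only states that the label follows majority agreement.

```python
import json
from collections import Counter


def load_records(path):
    """Load video records. Assumption: the file is one JSON array of objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def get_field(record, dotted_key, default=None):
    """Read an attribute such as "annotation.label" from a record.

    Assumption: the attribute may be a flat key containing a dot or a
    nested object, so both layouts are tried.
    """
    if dotted_key in record:
        return record[dotted_key]
    node = record
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def majority_label(annotations):
    """Return the label given by more than half of the annotators, else None."""
    if not annotations:
        return None
    label, votes = Counter(annotations).most_common(1)[0]
    return label if votes > len(annotations) / 2 else None


def label_counts_by_search_term(records):
    """Count "annotation.label" values separately for each "search_term"."""
    counts = {}
    for record in records:
        term = get_field(record, "search_term")
        label = get_field(record, "annotation.label")
        counts.setdefault(term, Counter())[label] += 1
    return counts


def crowd_vs_manual_agreement(records):
    """Share of manually reviewed videos whose crowd label matches the manual label."""
    reviewed = [
        r for r in records
        if get_field(r, "annotation.manual_review_label") is not None
    ]
    if not reviewed:
        return None
    matches = sum(
        get_field(r, "annotation.label")
        == get_field(r, "annotation.manual_review_label")
        for r in reviewed
    )
    return matches / len(reviewed)


# Hypothetical records for illustration only; they are not taken from the dataset.
sample_records = [
    {"search_term": "flat earth", "isSeed": 0,
     "annotation": {"annotations": ["pseudoscience", "pseudoscience", "irrelevant"],
                    "label": "pseudoscience",
                    "manual_review_label": "pseudoscience"}},
    {"search_term": "covid-19", "isSeed": 1,
     "annotation": {"annotations": ["science", "science", "science"],
                    "label": "science",
                    "manual_review_label": "irrelevant"}},
    {"search_term": "covid-19", "isSeed": 1,
     "annotation.annotations": ["irrelevant", "irrelevant", "science"],
     "annotation.label": "irrelevant"},
]

print(label_counts_by_search_term(sample_records))
print(crowd_vs_manual_agreement(sample_records))
```

To run the same tallies on the real file, replace `sample_records` with `load_records("groundtruth_videos.json")`, after checking that the file layout matches the assumptions stated above.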

Acknowledgments: This project has received funding from the European Union's Horizon 2020 Research and Innovation program under the CONCORDIA project (Grant Agreement No. 830927), and from the Innovation and Networks Executive Agency (INEA) under the CYberSafety II project (Grant Agreement No. 1614254). This work reflects only the authors' views; the funding agencies are not responsible for any use that may be made of the information it contains.


YouTube, YouTube Videos, YouTube's Recommendation Algorithm, Science, Pseudoscience, Pseudoscientific Misinformation, Watch History, COVID-19, Anti-vaccination, Anti-mask, Flat Earth

Funded by
Cyber security cOmpeteNCe fOr Research anD InnovAtion
  • Funder: European Commission (EC)
  • Project Code: 830927
  • Funding stream: H2020 | RIA
EnhaNcing seCurity And privacy in the Social wEb: a user centered approach for the protection of minors
  • Funder: European Commission (EC)
  • Project Code: 691025
  • Funding stream: H2020 | MSCA-RISE
Providers: Datacite