TidyTuesdayAltText

An R package with the goal of providing insight into the alternative (alt) text accompanying the data visualizations shared on Twitter as part of the TidyTuesday social project.

About the data

Hex logo for the package. White with a thick black border. Inside, the TidyTuesday logo on the top half, which is the word TidyTuesday in white against a broad brush stroke of black paint. On the bottom half, the words alt = "text" in black against a white background and within angle brackets to simulate HTML code.

The original data were collected and made available by Tom Mock (@thomas_mock) using {rtweet}. These data are available in the TidyTuesday repository.

These tweets were processed and scraped for alternative text by Silvia Canelón (@spcanelon):

  1. Data were filtered to remove tweets without attached media (e.g. images)
  2. Data were supplemented with reply tweets collected using {rtweet}. This was done to identify whether the original tweet or a reply tweet contained an external link (e.g. data source, repository with source code)
  3. Alternative (alt) text was scraped from tweet images using {RSelenium}. The first image attached to each tweet was considered the primary image and only the primary image from each tweet was scraped for alternative text. The following attributes were used to build the scraper:
  • CSS selector: .css-1dbjc4n.r-1p0dtai.r-1mlwlqe.r-1d2f490.r-11wrixw
  • Element attribute: aria-label
Web inspection tool being used to identify the CSS selector corresponding to the primary image of one of Hao Ye's (@Hao_and_Y) tweets with alt text

Figure 1: Example of web inspection being used to identify the CSS selector utilized for alt-text web scraping
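
The scraping step above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the package: the function name `scrape_alt_text`, the `wait_seconds` argument, and the tweet URL pattern are assumptions introduced here, and `remDr` is assumed to be an already-open {RSelenium} remote driver. Only the CSS selector and the `aria-label` attribute come from the description above.

```r
# Sketch only: `remDr` is assumed to be an open RSelenium remote driver.
# The selector and attribute are the ones listed in the steps above.
alt_text_selector <- ".css-1dbjc4n.r-1p0dtai.r-1mlwlqe.r-1d2f490.r-11wrixw"

scrape_alt_text <- function(remDr, tweet_id, wait_seconds = 2) {
  # Assumed URL pattern for opening a tweet by its status id
  tweet_url <- paste0("https://twitter.com/i/web/status/", tweet_id)
  remDr$navigate(tweet_url)
  Sys.sleep(wait_seconds)  # crude pause so the page can finish rendering

  tryCatch({
    # findElement() returns the first match, treated here as the primary image
    img <- remDr$findElement(using = "css selector", value = alt_text_selector)
    label <- img$getElementAttribute("aria-label")
    if (length(label) == 0) NA_character_ else as.character(label[[1]])
  }, error = function(e) NA_character_)  # no matching element: return NA
}
```

With a driver started by, for example, `RSelenium::rsDriver()`, the function would be called as `scrape_alt_text(driver$client, tweet_id)`. Twitter's generated class names change over time, so the selector may no longer match the current site.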

This data package does not include data that could directly identify the tweet author, in order to respect any author’s decision to delete a tweet or make their account private after the data were originally collected.1

To obtain the tweet text, author screen name, and many other tweet attributes, you can “rehydrate” the TweetIds (or “status” ids)2 using the {rtweet} package.3
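
A minimal sketch of rehydration, assuming {rtweet} is installed and authenticated with Twitter API credentials. The wrapper `rehydrate_tweets` and its `lookup` argument are hypothetical names introduced here for illustration; `lookup_tweets()` is the {rtweet} function referenced in the footnote.

```r
# Sketch only: `lookup` defaults to rtweet::lookup_tweets(), which needs
# Twitter API credentials. The default is only evaluated when it is used.
rehydrate_tweets <- function(tweet_ids, lookup = rtweet::lookup_tweets) {
  # Status ids are too large for R doubles to hold exactly, so keep them
  # as character strings rather than numbers
  stopifnot(is.character(tweet_ids))
  lookup(unique(tweet_ids))
}
```

Tweets that have since been deleted or made private would not be expected in the result, which is consistent with the privacy choice described above.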


TidyTuesday databases on Notion

I use the data available in the TidyTuesday repository to populate some searchable TidyTuesday databases at tiny.cc/notion-dataviz with data visualizations tagged by the dataset of the week, hashtags, mentions, etc.

The Notion 2021 TidyTuesday database showing a gallery of the most recent data visualizations in the collection, organized in a grid

Figure 2: Screenshot of the 2021 TidyTuesday database on Notion, taken on June 1, 2021

Thanks to historical twitter data collected by @thomas_mock, the #TidyTuesday database now has tweets dating back to 2018! 6100+ searchable tweets w/ #dataviz creations from 1400+ participants 🤩Check it out! http://tiny.cc/notion-dataviz

Figure 3: Screenshot of the tweet sharing the TidyTuesday database on Notion


  1. Developer Policy – Twitter Developers | Twitter Developer

  2. Tweet object | Twitter Developer

  3. Get tweets data for given statuses (status IDs). — lookup_tweets • rOpenSci: rtweet
