3 min read

Are You Ready for Data is Plural?

Today I’d like to draw attention to a super cool dataset highlighted by the super cool data-focused newsletter, Data is Plural.

If this post increases awareness of Jeremy Singer-Vine’s fantastic Data is Plural newsletter by even one person, I consider that a victory. #Rstats enthusiasts might recognize the name from #tidytuesday. Last’s week’s Dairy data challenge was found via Data is Plural.

The highest honor I feel I can pay to Data is Plural is a little plot based on their data set. Let’s dive into the Foreign Gifts and Contracts Report from the Office of Federal Student Aid.

Here is the link to the data: Foreign Gifts and Contracts Report

As usual, we have to import libraries and set options.

library(tidyverse)
library(readxl)
library(janitor)
extrafont::loadfonts(device = "win")

theme_set(theme_minimal(base_family = "Gill Sans MT") +
            theme(plot.subtitle = element_text(size = 9),
                  axis.text = element_text(color = "black"),
                  axis.title = element_text(color = "black"),
                  panel.grid.minor = element_blank()))

For those unaware, until readxl can process links directly, here is a handy way to read an .xls or .xlsx file into R without saving it to a permanent directory.

# create temp file

temp <- tempfile()

# direct saving to tempfile

download.file("https://studentaid.ed.gov/sa/sites/default/files/fsawg/datacenter/library/ForeignGifts.xls", 
              mode = "wb",
              destfile = temp)

# read in and clean names

fg_import <- read_xls(temp, 
              skip = 1) %>% 
  clean_names()

unlink(temp)

One could ask many questions of this data. My first, and simplest question, was to understand which universities received the most $ from non-United States citizens. This data includes both gifts and contracts so it makes sense to split out each category.

# find well heeled institutions (from foreign sources)

by_uni <- fg_import %>% 
  group_by(institution_name) %>% 
  summarize(total_received = sum(foreign_gift_amount),
            contract_amt = sum(foreign_gift_amount[gift_type == "Contract"]),
            gift_amt = sum(foreign_gift_amount[gift_type != "Contract"]),
            n = n()) %>% 
  gather(gift_type, amt, contract_amt:gift_amt) %>% 
  mutate(gift_type = ifelse(gift_type == "gift_amt", "Gift", "Contract")) %>% 
  arrange(-total_received)

It isn’t the cleanest data frame but it will get the job done. Let’s plot the top 20.

ggplot(by_uni[1:40,], aes(fct_reorder(institution_name, total_received), 
                          amt,
                          fill = gift_type)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar) +
  scale_fill_manual(values = c("lightsalmon", "slategrey")) +
  labs(x = "",
       y = "Foreign Gifts Since 2012",
       title = "When it comes to foreign contracts & giving, the usual suspects are at the top",
       subtitle = "Total gifts/contracts from foreign sources, Title IV-eligible domestic institutions since 2012",
       fill = "Gift Type",
       caption = "Source: Office of Federal Student Aid") +
   geom_curve(aes(x = 16, y = 8e8, xend = 18, yend = 5.5e8),
             color = "black",
             size = 0.70,
             ncp = 10,
             arrow = arrow(type = "closed",
                           length = unit(0.25, "cm"))) +
  annotate("text", 
           x = 15.3, y = 6e8,
           label = str_wrap("MIT hits the top of the tables with gifts alone!", 25),
           color = "black",
           hjust = 0,
           size = 3,
           family = "Gill Sans MT") +
   theme(plot.title = element_text(size = 9.5),
         plot.subtitle = element_text(size = 8))

Not too much surprising here. The most interesting thing is the institutions which seem to be heavily contract dependent. My suspicion is that the majority of this comes from non-U.S. campuses. Doha’s Education City plays (or is it pays) home to Carnegie Melon, Texas A&M, Georgetown, Northwestern, and others. For that reason we see those universities on this list. NYU maintains a campus in Abu Dhabi and so earns its share of “contract” revenue.

Notably, though, MIT and Stanford make the top 20 with gifts alone!

I this post has motivated readers to investigate the Data is Plural newsletter. It is a terrific resource and should be maintained and harvested regularly.