Zoë Schlanger | Quartz | March 9, 2017

These Are the 158 Key Federal Science Data Sets Rogue Programmers Have Duplicated So Far


Since the weeks leading up to Donald Trump’s inauguration day, impromptu gatherings of programmers, scientists and archivists have popped up at universities across the country. They gather on weekends, laptops and thumb drives in hand, order pizza and then download and archive as much federal science data as they can get their hands on.

These “data rescue” events have archived tens of thousands of government web pages that participants fear may be edited or removed under an administration that has expressed hostility toward climate and environmental science. Copies of those web pages now live within the Internet Archive, best known for its Wayback Machine platform.

But the Internet Archive can’t scrape more elaborate databases—so in addition to simple web pages, the groups pull down intricate and often large data sets from science agencies like NASA, the Environmental Protection Agency and the National Oceanic and Atmospheric Administration, all three of which have been singled out by the Trump administration for budget and staffing cuts to their Earth and climate science programs.

Since January, 158 complete data sets have been downloaded, labeled and re-uploaded to DataRefuge.org, a growing repository of scraped government science.

And now, there’s a data visualization tool that lets you see exactly which data sets, from which agencies, the data rescue groups have duplicated so far. The list includes data from NOAA’s Earth-observing satellites and NASA’s polar-orbiting missions, and pollution discharge monitoring reports from EPA, among many others.

Sarah Kolbe, a data scientist for California State University, built the data visualization after volunteering with a group of programmers in Madison, Wisconsin, that held its first “data rescue” event March 5.

“We’re planning to do another soon,” she says, so more data sets will likely be added.

DataRefuge Dashboard

A coalition of researchers called the Environmental Data and Governance Initiative, or EDGI, is monitoring the government web pages for any changes under the new administration, by comparing them to the scraped copies. (They also plan to track any data sets that are removed.)

And they’ve already found several notable changes: Climate change reports have disappeared from State Department websites, and as The New York Times points out, the science and technology office of EPA has changed its mission description from creating “scientific and technological foundations to achieve clean water” to creating “economically and technologically achievable performance standards.” A description of a federal fracking rule, and another about a methane emissions rule, have also gone missing from Interior Department web pages.

