
Making the Forms 990 Open Access: Unleashing Nonprofit Tax Data to Invigorate Social Science Research
Fall 2022
Nonprofit tax returns, filed on Form 990, are an important resource for scholars, journalists, and practitioners. These public documents provide a wealth of detailed information, including executive compensation, board membership, mission statements, programmatic activities, revenue sources, expense types, and lobbying efforts, to name just a few. Though each individual tax filing is public, a comprehensive dataset compiling the full scope of this information exists only for 1998 to 2003. The aim of this project is therefore to develop a comprehensive dataset covering more recent filings.
This project has two components. First, the Internal Revenue Service makes e-filed Form 990s available on its website in .xml format. These .xml files cover all e-filed tax returns from roughly 2012 onward. The first component is an R program that lets researchers either collect these data themselves or simply download the compiled data. Second, the project uses optical character recognition (OCR) to extract data from all paper-filed Form 990s over this same period, and then extends coverage back to 2008 (the year the current Form 990 was implemented). Together, these components yield the complete population of Form 990s: over 2.8 million tax filings from over 300,000 nonprofits spanning 2008-2022. This dataset, and the corresponding data collection tools, will help unleash innovative insights on the U.S. nonprofit sector.
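
To illustrate the first component, the sketch below shows one way a single e-filed return in .xml format could be flattened into a tabular record. It is not the project's R program. It is written in Python for illustration only, and the element paths, the `FIELDS` mapping, the `parse_return` function, and the sample return are all assumptions modeled loosely on the IRS e-file layout. Real filings use element names that vary across schema versions, so any path here would need to be checked against the actual files.

```python
import xml.etree.ElementTree as ET

# Namespace used by IRS e-file returns (assumed here; verify against real files).
NS = {"efile": "http://www.irs.gov/efile"}

# Hypothetical mapping from output column name to element path.
# These paths are illustrative and differ between schema versions.
FIELDS = {
    "ein": "efile:ReturnHeader/efile:Filer/efile:EIN",
    "tax_year": "efile:ReturnHeader/efile:TaxYr",
    "name": "efile:ReturnHeader/efile:Filer/efile:BusinessName/efile:BusinessNameLine1Txt",
    "total_revenue": "efile:ReturnData/efile:IRS990/efile:CYTotalRevenueAmt",
}

# A made-up miniature return, not real filing data.
SAMPLE_XML = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <TaxYr>2020</TaxYr>
    <Filer>
      <EIN>000000000</EIN>
      <BusinessName>
        <BusinessNameLine1Txt>Example Nonprofit Inc</BusinessNameLine1Txt>
      </BusinessName>
    </Filer>
  </ReturnHeader>
  <ReturnData>
    <IRS990>
      <CYTotalRevenueAmt>125000</CYTotalRevenueAmt>
    </IRS990>
  </ReturnData>
</Return>"""


def parse_return(xml_text):
    """Flatten one return into a dict; fields that are absent become None."""
    root = ET.fromstring(xml_text)
    record = {}
    for column, path in FIELDS.items():
        record[column] = root.findtext(path, default=None, namespaces=NS)
    # Convert the revenue amount to an integer when it is present.
    if record["total_revenue"] is not None:
        record["total_revenue"] = int(record["total_revenue"])
    return record


if __name__ == "__main__":
    print(parse_return(SAMPLE_XML))
```

Mapping each output column to a path keeps the extraction logic in one place, so supporting another schema version would mostly mean adding alternative paths rather than rewriting the parser. Returning `None` for missing fields reflects the fact that not every filer completes every part of the form.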