ChatGPT-Empowered Automated Cross-Country Data Collection on Tax Policies

Simone Paci (Politics), Fall 2023

Does ChatGPT make for a good research assistant? Many social science projects rely on the systematic collection of large volumes of data from public sources. Traditionally, this task is delegated to research assistants, who comb through websites, articles, and books and collect the relevant information through extensive manual effort. However, recent advances in large language models such as ChatGPT promise to disrupt this standard approach, as they perform increasingly well at retrieving information from large unstructured text inputs. I test this proposition in the context of a project on tax enforcement policies. Employing a team of research assistants and an automated web-scraping and event-extraction algorithm in parallel, I will compare the performance of the two approaches in constructing a database of specific tax laws passed across countries and years. In doing so, this project will pioneer and validate a novel automated data collection strategy, with the potential to drastically reduce the cost and duration of large-scale online data gathering across the social sciences.