
Sample Complexity Bounds: A New Approach to Sample Size Calculation in the Social Sciences
Spring 2024
What constitutes "good enough" data for quantitative social science? Statistical learning is increasingly applied to substantive questions in social science, enabling researchers to work with complex, high-dimensional data structures ranging from open-ended survey responses to video data. However, persistent uncertainty about data requirements makes it difficult to design and pre-register these studies while making effective use of scarce resources. To address this issue, we introduce a novel method that allows applied researchers to calculate bounds on the sample size necessary to achieve a minimum level of accuracy and confidence for any discrete classification method, implemented in the companion scR package for R. Although the method's scope is universal, we will apply it to a case in which the data-generating process is largely unknown -- an open-ended survey aimed at detecting the political impact of competing narratives of historical conflict in Nigeria -- as a hard test of its ability to generate a general lower bound.
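
To make the idea of a sample size bound at a given accuracy and confidence concrete, the sketch below computes one classical textbook bound from PAC learning theory (Blumer et al., 1989). It is an illustration only: it is not taken from the scR package and may differ from the method introduced here. The function name and its parameters (`vc_dim` for the classifier's Vapnik-Chervonenkis dimension, `epsilon` for the tolerated error rate, `delta` for the tolerated failure probability) are assumptions chosen for this example, and the bound assumes a classifier class that can fit the data perfectly.

```python
import math


def blumer_sample_bound(vc_dim: int, epsilon: float, delta: float) -> int:
    """Sufficient sample size under the classical PAC bound (illustrative).

    Assumption: this is the textbook bound of Blumer et al. (1989) for a
    hypothesis class of VC dimension `vc_dim`, not the scR implementation.
    With at least this many i.i.d. labelled examples, a consistent learner
    has error at most `epsilon` with probability at least 1 - `delta`.
    """
    if vc_dim < 1:
        raise ValueError("vc_dim must be a positive integer")
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")

    # Term driven by the required confidence (1 - delta).
    confidence_term = (4.0 / epsilon) * math.log2(2.0 / delta)
    # Term driven by the capacity (VC dimension) of the classifier.
    capacity_term = (8.0 * vc_dim / epsilon) * math.log2(13.0 / epsilon)

    return math.ceil(max(confidence_term, capacity_term))


if __name__ == "__main__":
    # Hypothetical inputs: VC dimension 10, 5% error, 95% confidence.
    print(blumer_sample_bound(vc_dim=10, epsilon=0.05, delta=0.05))
```

The required sample size grows as the tolerated error shrinks or as the classifier becomes more flexible, which is why an explicit bound is useful when budgeting data collection at the design or pre-registration stage.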