Together with the Center for Digital Humanities, the initiative for Data-Driven Social Science invites members of the Princeton community to a presentation by Benjamin Lee (University of Washington), "Novel Machine Learning Methods for Computing Cultural Heritage: An Interdisciplinary Approach."
Widespread efforts over the past two decades have drastically improved digital access to cultural heritage collections, transforming research for historians, sociologists, political scientists, and other scholars across the humanities and social sciences. Yet scholars and the public alike face a persistent challenge: how to navigate and analyze these collections, which frequently contain millions of items and often suffer from imperfect metadata. In this talk, I will discuss my interdisciplinary research, which approaches this challenge from three primary directions:
- Using machine learning to develop and deploy large-scale search and discovery systems with new modes of interaction for cultural heritage collections. In particular, these systems enable end-users to dynamically search for content and concepts of interest by interacting with machine learners that can train and predict across millions of images in under a second. I call this functionality open faceted search.
- Leveraging these systems, through collaborations with social scientists and humanists, to advance research in the digital humanities and cultural heritage and to further public humanities initiatives.
- Studying the sociotechnical implications of applying machine learning to cultural heritage from the perspectives of bias and marginalization.
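The abstract does not specify how open faceted search is implemented. As a rough illustration of the general idea it describes (a learner trained on a few user-labeled examples and then applied across an entire collection), here is a minimal NumPy sketch. It assumes image embeddings have already been computed for every item; the function names `train_facet` and `rank_collection`, the choice of logistic regression, and all parameters are hypothetical, not Newspaper Navigator's actual method.

```python
import numpy as np

# Hypothetical sketch: assumes every image already has a precomputed embedding.
def train_facet(embeddings, pos_idx, neg_idx, lr=0.1, steps=200, l2=1e-3):
    """Fit a logistic-regression 'facet' on a handful of user-labeled items.

    embeddings: (N, d) array of precomputed image features (assumed).
    pos_idx / neg_idx: indices the user marked as relevant / not relevant.
    """
    X = embeddings[np.concatenate([pos_idx, neg_idx])]
    y = np.concatenate([np.ones(len(pos_idx)), np.zeros(len(neg_idx))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        z = np.clip(X @ w + b, -30, 30)        # clip logits to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))           # predicted probability of "relevant"
        grad = p - y                           # gradient of log-loss w.r.t. the logits
        w -= lr * (X.T @ grad / len(y) + l2 * w)
        b -= lr * grad.mean()
    return w, b

def rank_collection(embeddings, w, b, k=10):
    """Score every item with one matrix-vector product and return the top-k indices."""
    scores = embeddings @ w + b
    top = np.argpartition(-scores, k)[:k]      # unordered top-k without a full sort
    return top[np.argsort(-scores[top])]       # sort only the k winners, best first

if __name__ == "__main__":
    # Synthetic stand-in data, purely for demonstration.
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(20_000, 32))
    concept = rng.normal(size=32)              # hidden "concept" direction
    order = np.argsort(emb @ concept)
    w, b = train_facet(emb, order[-5:], order[:5])
    print(rank_collection(emb, w, b, k=5))
```

Once trained, scoring is a single matrix-vector product over the stored embeddings, which is one plausible way a system like this could stay interactive at the scale of millions of images.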
I will describe all three of these directions through the lens of my project, Newspaper Navigator, which I began as an Innovator in Residence at the Library of Congress and a Ph.D. student in Computer Science & Engineering at the University of Washington. In particular, I will detail how Newspaper Navigator reimagines the ways in which humanists, social scientists, and the public can navigate and analyze the visual content in over 16 million digitized historic newspaper pages. I will then introduce my ongoing work on developing open faceted search systems for petabyte-scale web archives. I will conclude by discussing how my research extends to a wide range of digitized and born-digital collections that social scientists and humanists use every day.