Frontiers in Data Science

Symposium Series

The DDSS Frontiers in Data Science Symposium series is organized around specific topics and will feature an interdisciplinary group of speakers with expertise in different areas, including computer science, statistics, social sciences, industry, and government. The topics will center on research questions and societal challenges where data-driven scientific solutions require the integration of the social sciences with statistical methods, computer science, and research design.

Fall 2024: Advances in Record Linkage

Overview

October 17-18, 2024

This symposium will feature state-of-the-art approaches to record linkage, bringing together social scientists, statisticians, academic researchers, and industry professionals. Presenters will showcase technical expertise in the development and use of innovative methods and software.

Topics will include:

  • Census and historical record linkage
  • Scalable probabilistic record linkage
  • Evaluation
  • Streaming and simultaneous analysis
  • Industry collaboration

This event is cosponsored by the Industrial Relations Section. Registration is required.

Schedule


Program Overview (Speaker | Topic)

DAY ONE

Session 1: Census Part I
  • Hannah Postel, Duke University | Subgroup Disparities in Automated Census Record Linkage
  • Allison Green, Princeton University | Linking Historical Datasets: An Application to World War II Navy Muster Rolls

Session 2: Census Part II
  • Joe Price, Brigham Young University | Breakthroughs in Historical Record Linking Using Genealogy Data: The Census Tree Project
  • Adrian Haws, Cornell University | Software Demo: Census Tree - XGBoost
  • Jonas Helgertz, Lund University [presented by Joe Price] | Examining the Role of Training Data for Supervised Methods of Automated Record Linkage: Lessons for Best Practice in Economic History
  • Software Demo

Session 3: Methods Part I
  • Brenda Betancourt, NORC at the University of Chicago | Bayesian Clustering for Record Linkage Tasks
  • Ted Enamorado, Washington University in St. Louis | A Locally Sensitive Hashing Approach to Scaling Up Probabilistic Record Linkage
  • Software Demo: fastLink

Session 4: Methods Part II
  • Jerry Reiter, Duke University | Simultaneous Record Linkage and Statistical Modeling
  • Andee Kaplan, Colorado State University | Fast Bayesian Record Linkage for Streaming Data Contexts
  • Software Demo: bstrl (R)

Session 5: Splink Project Discussion
  • Robin Linacre | Splink Discussion and Software Demo

DAY TWO

Session 6: Evaluation and Analysis
  • Martin Slawski, University of Virginia | Some Recent Advances and Open Problems in Post-Linkage Data Analysis
  • Software Demo: Pldamixture
  • Olivier Binette, Duke University | How to Evaluate Entity Resolution Systems: An Entity-Centric Framework with Application to Inventor Name Disambiguation
  • Software Demo: ER-EVALUATION

Session 7: Further Applications
  • Cory McCartan, Penn State University | A Missing Data Approach to Record Linkage and Measurement Error
  • Connor Jerzak, University of Texas at Austin | New Directions in Large-scale Record Linkage Using Half a Billion Open Collaborated Records from LinkedIn
  • Software Demo: LinkOrgs


Spring 2024: The Spread of Misinformation in a World Optimized for Engagement

Overview

May 10, 2024

The spread of misinformation is one of the most pressing challenges facing society today. The vast communication networks and algorithms of social media platforms can amplify false or inaccurate information at an unprecedented scale, posing potential risks to public health, public security, democratic accountability, and many other domains. This DDSS symposium will focus on the role of algorithmic amplification in the spread of misinformation, the role of statistics and machine learning in processing information and misinformation, and strategies being developed to counter misinformation.


Schedule

Time | Speaker | Topic
  • 8:55am - 9:00am | Rocío Titiunik, Princeton University | Opening Remarks
  • 9:00am - 9:45am | Arvind Narayanan, Princeton University | Understanding Social Media Recommendation Algorithms
  • 9:45am - 10:30am | Andy Guess, Princeton University | Social Media, Ranking Algorithms, and Misinformation
  • 11:00am - 11:45am | Adam Berinsky, MIT | Thinking About Misinformation Interventions
  • 11:45am - 12:30pm | Yao Xie, Georgia Institute of Technology | Discovery and Mitigation of Disparities by Data
  • 1:30pm - 2:25pm | Tamar Mitts, Columbia University; Arvind Narayanan, Princeton University; Jacob Shapiro, Princeton University | Roundtable Discussion
  • 2:25pm - 2:30pm | Rocío Titiunik, Princeton University | Closing Remarks


Contributions to and/or sponsorship of any event do not constitute departmental or institutional endorsement of the specific program, speakers, or views presented.