Research Software Engineering at DDSS

Overview

Software development is an increasingly critical component of data-driven and computationally intensive social science. Advances in machine learning and computational statistics rely on detailed knowledge of programming languages such as Python and their package ecosystems, while the analysis and maintenance of large, messy datasets require expertise in data science and data engineering. Furthermore, to maximize their impact, novel methods must be transformed into stable and sustainable code. Research software engineers (RSEs) enable advances in social science research by partnering with researchers to meet these requirements.

While relatively new to the social sciences, RSEs are an established presence in the physical, mathematical and engineering sciences. RSEs contribute to research through a combination of development skill, domain expertise, and education. Research engineers aim to develop high-quality, dependable code that "just works", removing unnecessary distractions from researchers' workflows. Because RSEs' primary research output is software, they are able to concentrate on aspects of software development that researchers sometimes neglect: design, documentation, testing and automation (e.g., continuous integration). These considerations lead to software that is easier for researchers to use, modify and maintain.

Domain Expertise

In the physical sciences, RSEs typically possess domain expertise specific to a research lab (e.g., systems biology). To facilitate research across the social sciences, research engineers at DDSS instead develop expertise in areas that span the social sciences: machine learning and data science/engineering. Machine learning is an important tool for transforming unstructured data (images, videos, graphs) into structured research inputs, while data science and engineering are critical to research projects dealing with large, complex datasets. RSEs use their knowledge and experience in these domains to craft, scale and automate research workflows.

Skills

Our group focuses on three core areas of engineering: machine learning, data science/engineering and open source software development. We program primarily in high-level languages such as Python, R and Julia, but also use TypeScript and the React library for application and user interface development. We strive to apply best practices to our software development process without allowing perfection to impede research progress. All of our projects are hosted on GitHub, and we use Docker to distribute applications.

Education

RSEs also serve an important educational role on campus. Like all researchers, we are constantly learning and are eager to share our knowledge with others. Research engineers do this through consultations with faculty, collaboration with graduate students and postdoctoral fellows, mentoring of research assistants and training workshops. We also attend and present our work at research and development conferences to stay up to date on the latest developments in our areas of interest.

Projects

Secure Surveys

Sangyoon Park, Research Software Engineer

Survey participants often feel reluctant to share their true experiences because they worry about potential retaliation if their responses are identified. This is especially the case for sensitive survey questions, such as those asking about sexual harassment in the workplace…

TotalViewITCH.jl

Colin Swaney, Senior Research Software Engineer

Stock market participants submit billions of requests to buy and sell assets each day. Exchanges provide market makers and high-frequency traders with low-latency data feeds detailing market activity, which inform their algorithmic trading strategies. Nasdaq makes such data available…

NetworkHawkesProcesses.jl

Colin Swaney, Senior Research Software Engineer

Network Hawkes processes (Linderman, 2016) are a class of probabilistic models that combine multivariate Hawkes processes with network models. In a multivariate Hawkes process, the rate of future events depends on the history of past events, which gives rise to highly…
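To make the self-exciting mechanism concrete, here is a minimal Python sketch (not code from NetworkHawkesProcesses.jl, which is a Julia package) of the conditional intensity of a network Hawkes process. Each node has a background rate, and every past event on node i raises the rate on node j by an amount gated by a binary adjacency matrix A and scaled by a weight matrix W. The function name `network_hawkes_intensity` is hypothetical, and the normalized exponential kernel with a single `decay` parameter is a simplifying assumption; Linderman's models use more flexible impulse responses.

```python
import math
from typing import List, Tuple


def network_hawkes_intensity(
    t: float,
    events: List[Tuple[float, int]],  # (time, node) pairs
    baseline: List[float],            # background rate of each node
    adjacency: List[List[int]],       # A[i][j] = 1 if node i can excite node j
    weights: List[List[float]],       # W[i][j] = expected child events on j per event on i
    decay: float = 1.0,               # rate of the (assumed) exponential kernel
) -> List[float]:
    """Conditional intensity of every node at time t, given events strictly before t."""
    rates = list(baseline)  # copy so the caller's baseline is not mutated
    for s, i in events:
        if s >= t:
            continue  # only past events influence the current rate
        # Normalized exponential kernel: integrates to 1 over [0, inf).
        kernel = decay * math.exp(-decay * (t - s))
        for j in range(len(rates)):
            # The network (A) decides whether i can excite j; W sets how strongly.
            rates[j] += adjacency[i][j] * weights[i][j] * kernel
    return rates


# Two nodes: node 0 excites node 1, but not itself.
rates = network_hawkes_intensity(
    t=1.0,
    events=[(0.0, 0)],
    baseline=[0.1, 0.2],
    adjacency=[[0, 1], [0, 0]],
    weights=[[0.0, 0.5], [0.0, 0.0]],
)
```

A single event on node 0 leaves node 0's rate at its baseline but raises node 1's rate above its own baseline, with the boost decaying as time passes. Setting an entry of the adjacency matrix to zero removes that excitation channel entirely, which is how the network structure enters the model.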

New Jersey Families Study

Colin Swaney, Senior Research Software Engineer

The New Jersey Families Study (NJFS) is a video ethnographic examination of how families support their children's early learning. It aims to further our understanding of early childhood development by providing researchers with access…

Secure Data Platform

Colin Swaney, Senior Research Software Engineer

Computational social science often relies on large datasets containing sensitive and/or proprietary data. Unfortunately, universities are often poorly equipped to support such datasets, and the solutions researchers devise are frequently inefficient, insecure, and/or arrived at…

ML-as-a-Service

Colin Swaney, Senior Research Software Engineer

Machine learning is the key to turning unstructured data, such as video and text, into research inputs. However, machine learning is not typically a part of a social scientist’s training, nor can social scientists be expected to stay up-to-date with the latest methods in the field…