Research Software Engineering at DDSS

Overview

Why RSEs?

Software development is an increasingly critical component of data-driven and computationally intensive social science. Advances in machine learning and computational statistics rely on detailed knowledge of programming languages such as Python and their package ecosystems, while the analysis and maintenance of large, messy datasets require data science and data engineering expertise. Furthermore, to maximize their impact, novel methods must be transformed into stable and sustainable code. Research software engineers (RSEs) enable advances in social science research by partnering with researchers to support these requirements.

While relatively new to the social sciences, RSEs are an established presence in the physical, mathematical and engineering sciences. RSEs contribute to research through a combination of development skill, domain expertise, and education. Research engineers aim to develop high-quality, dependable code that “just works”, removing unnecessary distractions from researchers’ workflows. Because RSEs’ primary research output is software, they are able to concentrate on aspects of software development that researchers sometimes neglect: design, documentation, testing and automation (e.g., continuous integration). These considerations lead to software that is easier for researchers to use, modify and maintain.

To learn more about what it's like to work with an RSE, please read Research Computing's RSE Partnership Guide.

Domain Expertise

In the physical sciences, RSEs typically possess domain expertise specific to a research lab (e.g., systems biology). To facilitate research across the social sciences, research engineers at DDSS instead focus on developing expertise in areas that span the social sciences: machine learning and data science/engineering. Machine learning is an important tool for transforming unstructured data (images, videos, graphs) into structured research inputs, while data science and engineering are critical to research projects dealing with large, complex datasets. RSEs use their knowledge and experience in these domains to craft, scale and automate research workflows.

Skills

Our group focuses on three core areas of engineering: machine learning, data science/engineering and open-source software development. We program primarily in high-level languages such as Python, R and Julia, but also use React and TypeScript for application and user interface development. We strive to apply best practices to our software development process without allowing perfection to impede research progress. All of our projects are hosted on GitHub, and we use Docker to distribute applications.

Some of the technologies our RSE group uses.

Education

RSEs also serve an important educational role on campus. Like all researchers, we are constantly learning and are eager to share our knowledge with others. Research engineers do this through a combination of consultations with faculty, collaboration with graduate students and postdoctoral fellows, mentoring of research assistants, and training workshops. We also attend and present our work at research and development conferences to stay up to date on the latest in our areas of interest.

Team

Projects

Secure Surveys

Sangyoon Park, Research Software Engineer

Survey participants are often reluctant to share their true experiences because they worry about potential retaliation if their responses are identified. This is especially the case for sensitive survey questions such as those asking…

TotalViewITCH.jl

Colin Swaney, Senior Research Software Engineer

Stock market participants submit billions of requests to buy and sell assets each day. Exchanges provide access to low-latency data feeds detailing market activity to market makers and high-frequency traders to inform algorithmic trading…

NetworkHawkesProcesses.jl

Colin Swaney, Senior Research Software Engineer

Network Hawkes processes (Linderman, 2016) are a class of probabilistic models that combine multivariate Hawkes processes with network models. In a multivariate Hawkes process, the likelihood of future events depends on the prior history…

New Jersey Families Study

Colin Swaney, Senior Research Software Engineer

The New Jersey Families Study (NJFS) is a video ethnographic examination of how families support their children's early learning. It aims to further our understanding of early childhood…

Secure Data Platform

Colin Swaney, Senior Research Software Engineer & Eric Manning, Research Data Engineer

Computational social science often relies on large datasets containing sensitive and/or proprietary data. Unfortunately, universities are often…

ML-as-a-Service

Colin Swaney, Senior Research Software Engineer & Alice Fang, Research Software Engineer

Machine learning is the key to turning unstructured data, such as video and text, into research inputs. However, machine learning is…