The role of the social sciences in data analytics
By: Bear Braumoeller

To the considerable surprise of those of us who cut our teeth on Minitab and had to derive the OLS estimator by hand, data analytics is a sexy topic these days. Led by the likes of Nate Silver at FiveThirtyEight, Ezra Klein at Vox, and Nate Cohn at the New York Times’ Upshot, data nerds have made unprecedented forays into both the public sphere and the business world. At a time in which “truthiness” seems to have run amok, an increasingly widespread acceptance of the careful use of data in argumentation and analysis can only be welcome.

That’s the good news.

The bad news is that drawing meaningful inferences from data requires quite a bit more than simply looking at the data. It requires a rigorous understanding of the role of chance in producing outcomes, and it requires an understanding of how to bridge the gap between correlation and causation when working with observational data.

These are areas in which the social sciences can contribute greatly to data analytics. Social scientists are avid consumers and, increasingly, producers of statistical methodologies for deriving coherent conclusions from noisy data. And because of reasonable prohibitions on experimentation on human subjects, social scientists have long had to make do with observational data. In short, while many disciplines have encountered these issues, the social sciences have been plagued by them, and their practitioners have developed a comparative advantage in dealing with them. For that reason, the social sciences are uniquely well positioned to contribute to the further evolution of data analytics.

Three examples help to illustrate this point.

Vote fraud is a common problem in democratizing countries, but the efficacy of election monitors is difficult to gauge: because monitors tend to be sent to elections in which fraud is a major concern, the raw data could well show that monitored elections are more corruption-prone than unmonitored ones, even if the monitors are succeeding in reducing corruption. To deal with this problem, political scientist Susan Hyde took advantage of the fact that, in Armenia’s 2003 presidential elections, monitors from the Organization for Security and Co-operation in Europe were assigned to polling stations effectively at random. By examining the differences between results from polling stations with monitors and those without, Hyde was able to demonstrate that candidates who engage in fraud receive a significantly lower share of the vote in monitored polling stations than in unmonitored ones[1].
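
To make the logic concrete, here is a minimal sketch in Python. The data are simulated, and the variable names and effect sizes are invented rather than drawn from Hyde's study; the point is simply that once monitors are assigned at random, a plain difference in means between monitored and unmonitored stations is an unbiased estimate of the monitoring effect, and a permutation test tells us whether that difference could plausibly be due to chance.

```python
# Minimal sketch of the monitored-vs-unmonitored comparison.
# All data are simulated; the effect sizes are hypothetical, not Hyde's.
import numpy as np

rng = np.random.default_rng(0)

n_stations = 500
monitored = rng.random(n_stations) < 0.5  # monitors assigned at random

# Simulated vote share for a fraud-committing candidate: fraud inflates
# the share at unmonitored stations; monitoring suppresses that inflation.
base_share = rng.normal(0.45, 0.05, n_stations)
fraud_boost = np.where(monitored, 0.0, 0.06)
vote_share = np.clip(base_share + fraud_boost, 0.0, 1.0)

# Random assignment makes the simple difference in means an unbiased
# estimate of the effect of monitoring.
effect = vote_share[monitored].mean() - vote_share[~monitored].mean()
print(f"Estimated effect of monitoring on vote share: {effect:+.3f}")

# Permutation test: shuffle the monitor labels to build the null
# distribution of the difference in means.
null = []
for _ in range(5000):
    shuffled = rng.permutation(monitored)
    null.append(vote_share[shuffled].mean() - vote_share[~shuffled].mean())
p_value = (np.abs(np.array(null)) >= abs(effect)).mean()
print(f"Permutation p-value: {p_value:.4f}")
```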

Other examples have to do with political attitudes. Data analysts, especially in the media, often attribute fluctuations in public opinion to current events or recent statements by politicians. In contrast, a small but growing literature argues that political attitudes are remarkably stable over time—even across decades or centuries. Political scientists Avidit Acharya, Matthew Blackwell, and Maya Sen demonstrate that the prevalence of slavery in a county 150 years ago still has an impact on contemporary political attitudes there[2], while economists Irena Grosfeld and Ekaterina Zhuravskaya demonstrate that the late-18th-century partitions of Poland produced changes in political attitudes that persist to this day. Grosfeld and Zhuravskaya reached this conclusion by examining the spatial distribution of public opinion and discovering abrupt, significant discontinuities along the old lines of partition[3].
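
The border comparison in the Poland study is, in essence, a regression discontinuity design: attitudes should vary smoothly with geography except where the old partition line introduces a jump. The sketch below illustrates the idea on simulated data; the running variable, bandwidth, and size of the jump are all hypothetical stand-ins, not Grosfeld and Zhuravskaya's actual measurements.

```python
# Regression-discontinuity sketch in the spirit of the Poland example.
# All data are simulated; distances, bandwidth, and the size of the
# jump are hypothetical.
import numpy as np

rng = np.random.default_rng(1)

n = 2000
distance = rng.uniform(-50, 50, n)  # signed km from the old partition line
east = distance > 0                 # which side of the line

# Attitudes drift smoothly with geography, plus a jump at the border.
attitude = 0.4 + 0.002 * distance + 0.12 * east + rng.normal(0, 0.1, n)

# Fit a local linear trend on each side within a bandwidth of the border,
# then compare the two fits' predictions right at the border itself.
bandwidth = 20
left = (~east) & (distance > -bandwidth)
right = east & (distance < bandwidth)

fit_left = np.polyfit(distance[left], attitude[left], 1)
fit_right = np.polyfit(distance[right], attitude[right], 1)

jump = np.polyval(fit_right, 0.0) - np.polyval(fit_left, 0.0)
print(f"Estimated discontinuity at the old border: {jump:+.3f}")
```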

Even when data analysts do use sophisticated methods, they tend to see those methods as a collection of useful tools rather than as parts of a coherent body of knowledge. For that reason, they fail to realize that applying the tools without the necessary background can do more harm than good. A recent example from my daily commute was an episode of the Data Skeptic podcast on Bayesian A/B testing (or split-sample hypothesis testing, for those of us not in the business world)[4]. The podcast’s host expressed excitement at the possible applications of the test and asked whether there were general principles guiding its use, to which the guest replied, “Test as much as possible”—apparently unaware that doing so is a recipe for false-positive results, as the science-savvy web cartoon xkcd once pointed out. A business that followed this advice would end up building its strategy around statistical anomalies and flukes rather than meaningful results.
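
A few lines of simulation show why that advice backfires. Suppose a site tests twenty variants that are all, in truth, identical to the control; the numbers below are invented for illustration, but the arithmetic is not: at the conventional 5% threshold, the chance of at least one spurious "win" is roughly 1 - 0.95^20, or about 64%.

```python
# Why "test as much as possible" backfires: run many A/B tests on data
# with NO true differences and count how often something looks significant.
# All numbers here are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

n_variants = 20      # twenty tweaks, each tested against the control
n_users = 1000
n_simulations = 1000

runs_with_false_positive = 0
for _ in range(n_simulations):
    control = rng.normal(0.0, 1.0, n_users)
    # Every variant is drawn from the SAME distribution as the control,
    # so any "significant" result is a false positive by construction.
    p_values = [
        stats.ttest_ind(control, rng.normal(0.0, 1.0, n_users)).pvalue
        for _ in range(n_variants)
    ]
    runs_with_false_positive += min(p_values) < 0.05

print("Chance of at least one spurious 'win': "
      f"{runs_with_false_positive / n_simulations:.2f}")  # roughly 0.64
```

The standard remedies, such as pre-registering a single hypothesis or correcting the significance threshold for the number of tests (Bonferroni being the simplest), are exactly the kind of background knowledge the guest's answer skipped over.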

I certainly don’t mean to overstate the ability of social scientists to figure out what makes the world go around using only observational data: there will always be caveats, and no methodology or research design is totally ironclad. But the more I observe the incredible proliferation of data analytics in both the business world and the public sphere, the more convinced I become that its main shortcomings are exactly those areas in which the social sciences excel.

[1] Hyde, Susan (2011) “The Pseudo-Democrat’s Dilemma: Why Election Observation Became an International Norm.” Ithaca: Cornell University Press.

[2] Acharya, Avidit, Matthew Blackwell, and Maya Sen (2014) “The Political Legacy of American Slavery.” Harvard Kennedy School Faculty Research Working Paper Series RWP14-057.

[3] Grosfeld, Irena, and Ekaterina Zhuravskaya (2013) “Persistent effects of empires: Evidence from the partitions of Poland.” CEPR Discussion Paper 9371.

[4] Data Skeptic podcast, episode on Bayesian A/B testing.


About The Author

Dr. Braumoeller’s research focuses on international security, especially systemic theories of international relations and the politics of the Great Powers, as well as on political methodology, with an emphasis on complexity. He is currently involved in projects on evaluating the end-of-war thesis and on addressing the problem of endogeneity when estimating the impact of political institutions.
