To be translational, data analytics must consider data context and meaning
By: Mark Moritz

Data analytics can be computationally powerful and sophisticated yet still fail to be translational, having little leverage in the real world. This is a cautionary note about the dangers of big data.

Let me start with a story.

Late at night, a police officer finds a drunk man crawling around on his hands and knees under a streetlight. The drunk man tells the officer he’s looking for his wallet. When the officer asks if he’s sure this is where he dropped the wallet, the man replies that he thinks he more likely dropped it across the street. “Then why are you looking over here?” the befuddled officer asks. “Because the light’s better here,” explains the drunk man (1).

Versions of this story have been told numerous times to illustrate what is called the “streetlight effect,” the tendency of researchers to study what is easy to study. I use this story in my course on Research Design and Ethnographic Methods (ANTHRO 8891) to explain why so much research on disparities in educational outcomes is done in classrooms and not in students’ homes. Children are much easier to study at school than in their homes, even though ethnographic studies show that knowing what happens in homes is critical for understanding why inequality persists in the U.S. (2). Nevertheless, schools will continue to be the focus of most research because they generate big data and homes don’t.

The streetlight effect is one factor that prevents big data studies from being translational, especially studies that use easily available user-generated data from the Internet. Researchers assume that such data offers a window into the world. However, they fool themselves if they think they can study the world with tweets alone.

Based on the number of tweets following Hurricane Sandy, for example, it might seem as if the storm hit Manhattan the hardest, not the New Jersey shore (3). Another example is the since-retired Google Flu Trends, which used flu-related search activity to estimate doctor visits for influenza-like illness. In early 2013, its estimates were roughly twice as high as the figures reported by the Centers for Disease Control and Prevention (4).

Big data may offer a window into the world, but the question is: what world?

The problem is similar to the “WEIRD” issue. Joseph Henrich and colleagues have shown that findings from research conducted with undergraduates at American universities, whom they describe as “some of the most psychologically unusual people on Earth” (5), may hold only for that population and cannot safely be generalized to other human populations, including other Americans. Unlike the typical American research subject, they argue, most people in the world are not from Western, Educated, Industrialized, Rich and Democratic (WEIRD) societies. Twitter users are similarly atypical of humanity as a whole, giving rise to what our postdoctoral researcher Sarah Laborde has dubbed the “WEIRDO” problem of data analytics (6): most people are not Western, Educated, Industrialized, Rich, Democratic and Online.
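
In statistical terms, the WEIRDO problem is a sampling bias: the volume of online data reflects who is online as much as what actually happened. The short Python sketch below makes that concrete under invented assumptions. The two regions, the number of people affected in each, and the share of affected people who post online are all hypothetical figures chosen for illustration, not estimates for Hurricane Sandy or any real event, and the function names are likewise illustrative. The sketch also suggests why knowing how data was collected matters: if the posting rate in each region is known, raw counts can be reweighted back toward the underlying population.

```python
"""Toy illustration of sampling bias in user-generated data.

All numbers are hypothetical and chosen only for illustration.
"""

# Hypothetical regions: (people affected, percent of affected people who post online)
REGIONS = {
    "Region A (dense, highly online)": (200_000, 30),
    "Region B (sparser, less online)": (500_000, 5),
}


def observed_posts(affected: int, online_pct: int) -> int:
    """Posts we would see if each online affected person posts exactly once."""
    return affected * online_pct // 100


def reweighted_estimate(posts: int, online_pct: int) -> int:
    """Invert the (assumed known) posting rate to estimate people affected."""
    return posts * 100 // online_pct


def rank(scores: dict[str, int]) -> list[str]:
    """Region names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


posts = {name: observed_posts(a, p) for name, (a, p) in REGIONS.items()}
impact = {name: a for name, (a, _) in REGIONS.items()}
corrected = {
    name: reweighted_estimate(posts[name], REGIONS[name][1]) for name in REGIONS
}

if __name__ == "__main__":
    print("Ranked by raw post counts:", rank(posts))
    print("Ranked by people affected:", rank(impact))
    print("Ranked after reweighting: ", rank(corrected))
```

Ranked by raw post counts, Region A comes first; ranked by people affected, Region B does. Reweighting recovers the second ranking only because the posting rates are assumed to be known, which is exactly the kind of knowledge about context and collection that analyses of found data often lack.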

Fixating on the quantity of data is as problematic as fixating on an arbitrary p-value. The American Statistical Association recently released a Statement on Statistical Significance and P-Values that critiqued the focus on p-values as the sole indicator of good science (7). The authors write that, “the validity of scientific conclusions, including their reproducibility, depends on more than the statistical methods themselves”; equally important, they write, are research design, data collection, and the meaning and context of the data.

Considering the context and meaning of data is a key feature of ethnographic research, argues Michael Agar, who has written extensively about how ethnographers come to understand the world (8).

What makes research ethnographic? It is not just the methods. It starts with fundamental assumptions about the world. The first and most important is that people see and experience the world in different ways; that is, they have different points of view (POV). The second is that these differences result from growing up and living in different social and cultural contexts. This is why WEIRD people are so unlike most other people on Earth.

The task of the ethnographer, then, is to translate the point of view of the people they study (POV1) into the point of view of their audience (POV2). Discovering other points of view requires an Iterative, Recursive and Abductive (IRA) approach, in which ethnographers go through multiple rounds (iterative) of data collection and analysis (recursive) and incorporate concepts from the people they study into their theoretical models (abductive). The results are models with a high degree of external validity, something the analysis of big data frequently struggles to achieve.

Don’t get me wrong: I think big data is great. In our interdisciplinary research projects studying the ecology of infectious diseases (http://decml.osu.edu) and regime shifts in coupled human and natural systems (http://mlab.osu.edu/morsl), we are building our own big data sets. Of course, they are not as big as those generated by Twitter or Google users, but they are big enough that the analytical tools of complexity theory are useful for making sense of them. Moreover, we know what the data represents, how it was collected, and what its limitations are. Understanding the context and meaning of the data allows us to check our findings against our knowledge of the world and validate our models.

For data analytics to be translational, it needs to be theory- or problem-driven, not simply data-driven. It should be more like ethnographic research, with data analysts getting out of their labs and engaging with the world that they aim to understand.

References cited

  1. D. H. Freedman, Discover Magazine (2010).
  2. A. Lareau, Unequal childhoods: class, race, and family life. (University of California Press, Berkeley (CA), 2003).
  3. A. Zoldan, Wired (2013).
  4. D. Lazer, R. Kennedy, G. King, A. Vespignani, The Parable of Google Flu: Traps in Big Data Analysis. Science 343, 1203-1205 (2014).
  5. J. Henrich, S. J. Heine, A. Norenzayan, Most people are not WEIRD. Nature 466, 29 (2010).
  6. S. Laborde, Twitter is not a social system. (The Ohio State University, 2016).
  7. R. L. Wasserstein, N. A. Lazar, The ASA’s statement on p-values: context, process, and purpose. The American Statistician 70, 129-133 (2016).
  8. M. Agar, An Ethnography by Any Other Name … . Forum Qualitative Sozialforschung / Forum: Qualitative Social Research 7 (2006).

 

About The Author

Dr. Moritz’s research focuses on the transformation of African pastoral systems. He examines how pastoralists adapt to changing ecological, political and institutional conditions that affect their lives and livelihoods. He has been conducting research with pastoralists in the Far North Region of Cameroon since 1993. The long-term research has resulted in strong collaborations with local researchers, which has allowed him to develop innovative, interdisciplinary research projects with colleagues at Ohio State and the University of Maroua in Cameroon.
