User-generated data is a social science goldmine
By: Robert Bond

As increasing amounts of digital data are produced and stored online, it is important to remember that humans produce much of that data. In an era in which people express themselves on Facebook, follow companies on Twitter, and allow their phones’ GPS to track their movements and their online retailers to track their buying decisions, social scientists have a tremendous opportunity to help shape the analysis of large-scale data sources to understand human attitudes and behavior.

A particular area of strength among social scientists concerns measurement. In contrast to many other disciplines, social scientists are frequently concerned with a latent variable that is not easily observed, and must estimate it using some other, more easily observed quantity. For instance, we may be interested in studying how an individual’s political ideology affects their political behavior, such as which candidates they support. It is quite difficult to observe something as nebulous as ideology, but social scientists have developed computational methods that generate an estimate from data, drawing on simple survey measures or on behavioral measures such as the voting records of members of Congress. My own work has shown that digital traces in social media, such as “likes” for a politician on Facebook, can also be used to measure ideology, and at a very large scale.
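To make the idea of estimating a latent variable from observed behavior concrete, here is a minimal sketch in Python. It is not the method used in any particular study. It assumes a tiny, made-up matrix of users and the political pages they "like", and it uses a one-dimensional scaling (the leading singular vector of the column-centered matrix, found by power iteration) as a stand-in technique. The function name `first_dimension` and the data are hypothetical.

```python
import math


def first_dimension(likes, iterations=200):
    """Return one latent score per user from a binary user-by-page matrix.

    Power iteration finds the leading left singular vector of the
    column-centered matrix, i.e. a one-dimensional scaling of users.
    The sign of the scale is arbitrary; only relative positions matter.
    """
    n_users, n_pages = len(likes), len(likes[0])
    # Center each page (column) so that overall page popularity is removed.
    col_means = [sum(row[j] for row in likes) / n_users for j in range(n_pages)]
    centered = [[row[j] - col_means[j] for j in range(n_pages)] for row in likes]

    # Deterministic, non-constant starting vector (an arbitrary choice).
    scores = [float(i + 1) for i in range(n_users)]
    for _ in range(iterations):
        # Page loadings: how strongly each page aligns with current user scores.
        loadings = [
            sum(centered[i][j] * scores[i] for i in range(n_users))
            for j in range(n_pages)
        ]
        # Updated user scores: weighted sum of the pages each user likes.
        scores = [
            sum(centered[i][j] * loadings[j] for j in range(n_pages))
            for i in range(n_users)
        ]
        norm = math.sqrt(sum(s * s for s in scores))
        scores = [s / norm for s in scores]
    return scores


# Made-up data: rows are users, columns are pages; 1 means the user likes the page.
# Users 0-2 like only pages 0-1, users 3-5 like only pages 2-3.
likes = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

scores = first_dimension(likes)
for user, score in enumerate(scores):
    print(f"user {user}: {score:+.3f}")
```

In this toy example the two groups of users land on opposite sides of zero, and the users who like fewer pages sit closer to the middle. Which side counts as "left" or "right" is not determined by the data; in practice the scale would need to be anchored and validated against an external measure.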

A second area of strength among social scientists concerns the development of causal theories and tests of causal inference. While causal inference is not unique to the social sciences, the problems inherent to developing causal models when considering people’s behavior are often different from those in other domains. For instance, humans select their environments, which makes causal inference difficult. To deal with this, social scientists develop theories that rely on assumptions about the world, along with a wide range of methodological tools, to make causal inference more tractable. In the case of big data, social scientists should play the role of advocating for well-defined theories of human behavior, and for making the assumptions underlying causal tests clear. If we fail to do so, we are likely to understand what the world looks like without having a clear understanding of why it came to be so.

Large-scale data sources also create opportunities for social scientists to conduct research at a scale that was not previously feasible. The digital traces humans leave behind through their interaction with computers, phones, smart watches, and other digital tools create enormous quantities of data that previously would have been cost-prohibitive or impossible to collect. Further, with more people conducting more of their daily lives online, it is possible for social scientific studies to include millions of individuals at once. Through the use of large-scale sources, social scientists are able to study more subtle causal effects through increased statistical power and also to characterize the behavior of ever-larger proportions of the population, thereby using big data both to “zoom in” on small changes and to “zoom out” to examine the effects these small changes have at a societal level.
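The link between sample size and the ability to detect subtle effects can be illustrated with a short calculation. The sketch below uses the standard normal approximation for comparing two equally sized groups on a binary outcome, and computes the smallest difference in rates that a study could reliably detect. The function name, the 50 percent baseline rate, and the group sizes are illustrative assumptions, not figures from any study.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(n_per_group, baseline_rate, alpha=0.05, power=0.8):
    """Approximate smallest detectable difference in rates between two groups.

    Normal approximation for a two-sided test with equal group sizes,
    treating both groups as having roughly the baseline rate.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    standard_error = math.sqrt(
        2 * baseline_rate * (1 - baseline_rate) / n_per_group
    )
    return (z_alpha + z_power) * standard_error


# Hypothetical group sizes, from a typical survey to a large online study.
for n in (1_000, 100_000, 10_000_000):
    mde = minimum_detectable_effect(n, baseline_rate=0.5)
    print(f"n per group = {n:>10,}: detectable difference of about {mde:.4%}")
```

Because the detectable difference shrinks with the square root of the group size, moving from a thousand to ten million people per group reduces it by a factor of one hundred. That is the sense in which very large samples allow researchers to "zoom in" on small changes.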

My particular area of expertise—the study of social networks—has benefited greatly from big data. Social network analysis requires the use of data that traditionally would have been difficult to collect and analyze due to its complexity. Big data and computational tools, however, have largely changed both of these processes. While we have always lived in a network, the ties between individuals have now become more explicit and are more easily tracked and quantified through online interaction, particularly social media. Each friend request we accept, comment we make, Twitter account we follow, or Snapchat we send potentially provides researchers with important information about the social environment we are in. Further, computational tools have advanced such that describing and analyzing a network of millions of individuals is a tractable problem. Not many years ago, either of these would have been impossible.

As our world becomes more computational, and as that change ushers in vast troves of new data about humans, it is critical that social scientists influence how these data are analyzed and the conclusions that are subsequently drawn. Such data offer abundant opportunities to study phenomena of interest at new scale and with increased precision. However, doing so will require careful thought about the processes that have created these data—not only the mechanical processes that translate data from a server to a monitor screen but also the processes through which humans create such data in the first place. If the methods and models we use to understand the data created by humans fail to account for how and why such data were created, we are unlikely to fully appreciate what this kind of data can tell us about human nature.

 

About The Author

Dr. Bond’s program of research covers political communication and behavior, particularly social influence processes. Frequently his work involves using large-scale data sources from social media to study political engagement, ideology, and turnout. In addition to these substantive areas, Dr. Bond works on methodological tools that help social science researchers analyze large-scale data sources.
