Candidate talk: “Modeling Node Popularity”
Time: 3 p.m.
Date: Thursday, January 14, 2016
Location: 209 W. 18th Ave., Room 170

The College of Arts and Sciences’ Department of Statistics is hosting a presentation entitled “Modeling Node Popularity in Networks and Bootstrap Methods for Massive Data” by Srijan Sengupta, a PhD candidate at the University of Illinois at Urbana-Champaign. Mr. Sengupta is a candidate for a faculty position in the Department of Statistics.

Abstract: In this talk I will present recent work in two emerging areas of statistics, namely networks and big data.

Network data analysis is a rapidly growing research field in statistics, with a strong emphasis on the study of community structure using blockmodels. A network feature that is closely associated with community structure is the popularity of nodes in different communities. Neither the classical stochastic blockmodel nor its degree-corrected extension can satisfactorily capture the dynamics of node popularity. I will propose a popularity-adjusted blockmodel for flexible modeling of node popularity. I will establish consistency of likelihood modularity for community detection under the proposed model, and illustrate the improved empirical insights that can be gained through this methodology by analyzing the political blogs network and the British MP network.

The bootstrap is a popular and powerful method for assessing the precision of estimators and inferential methods. However, for massive datasets, which are increasingly prevalent, the bootstrap becomes prohibitively expensive to compute, even on modern computing platforms. Building on the Bag of Little Bootstraps, or BLB (Kleiner et al., 2014), and the idea of the fast double bootstrap, I will propose a fast resampling method, the subsampled double bootstrap (SDB), for both independent data and time series data. The SDB is consistent under mild conditions in both the independent and dependent cases. Methodologically, SDB is superior to BLB in terms of speed, sample coverage and automatic implementation for a given time budget. Its advantage relative to BLB and the bootstrap is also demonstrated in simulations and a data illustration.
