Dr. Vijay Gadepally will present a guest lecture entitled “Data management tools to enable complex applications.”
About the speaker: Dr. Vijay Gadepally is a senior member of the technical staff at MIT Lincoln Laboratory and CSAIL. Vijay’s research spans high-performance computing, big data/IoT systems, security, analytics, and advanced database technologies. He holds a PhD in Electrical and Computer Engineering from The Ohio State University and a BTech degree in Electrical Engineering from the Indian Institute of Technology (IIT), Kanpur. In 2017, Vijay received the Early Career Technical Achievement Award at MIT Lincoln Laboratory and was named to AFCEA’s inaugural 40 under 40 list. In 2011, Vijay received an Outstanding Graduate Student Award at The Ohio State University. Vijay has also worked at Raytheon Company and Rensselaer Polytechnic Institute.
Abstract: Applying the latest and greatest machine learning algorithm to your data science problem is fun! Unfortunately, life rarely gives us datasets that are curated and well defined enough for us to directly apply many of the great tools being developed by the wider community. One often spends the first few weeks (sometimes months) trying to figure out how to store the data, deal with inconsistencies in the dataset, and determine which algorithm (or set of algorithms) is best suited to the application. Larger, faster and messier datasets, such as those from IoT sensors, medical devices or autonomous vehicles, only compound these issues. These challenges, often referred to as the 3 V’s of Big Data (volume, velocity and variety), require new tools for data management and data cleaning/pre-processing.
In this talk, I will describe a few tools developed at MIT’s Lincoln Laboratory and CSAIL to address these challenges. The first is the BigDAWG polystore system, which allows users to mix and match database technologies in order to support diverse data management operations. For example, a single application may keep pieces of a dataset in a relational database, a key-value store and an array database. This flexibility allows users to develop complex analytics that leverage highly efficient underlying data stores. The second tool I will discuss is Graphulo, a toolbox that allows people to perform graph operations directly in key-value store databases such as Apache Accumulo. For both tools, I will describe performance and future research avenues.