Panda lab’s open-source libraries give big data users high-performance speeds

In the world of big data, computer technology holds the keys to the castle: software built to store, sort, and interpret the information collected in data analytics projects. The better the computer and software, the more efficiently they access and assess data.

DK Panda

The Network-Based Computing Laboratory (NBCL) at Ohio State, led by Dhabaleswar Panda, PhD, Professor and Distinguished Scholar of Computer Science and Engineering, has created innovative technology that dramatically improves efficiency in data analysis. Called HiBD, or High Performance Big Data, it is a set of libraries designed for applications that run on big data middleware such as Hadoop and Spark. When used with open-standard InfiniBand networking technology, HiBD lets modern clusters of computers running such applications work together efficiently to process data like a supercomputer.

Users from 165 organizations worldwide have downloaded NBCL’s HiBD software packages more than 16,350 times. They include well-known entities such as Flickr, HP, IBM, Intel, LinkedIn, and Oracle, as well as international institutions such as the Chinese Academy of Sciences and the Swiss National Supercomputing Center.

The origins of HiBD stem from an earlier project NBCL developed for high-performance computing with clusters. When InfiniBand networking technology was launched in 2000, NBCL created the first open source software stack to put the technology to work in supercomputing with commodity clusters of computers. Called MVAPICH (now MVAPICH2), that software stack, which caters to more traditional applications written with Message Passing Interface middleware, has had more than 373,000 downloads since its inception, and has been used by institutions such as NASA and the United States Air Force and Army.

The focus on newer big data applications, says Panda, was a natural step as the field of data analytics grew in academic popularity. It started with a question, he says: “We asked, how can we bring our knowledge about high-performance computing into the field of big data?”

The answers to that question—HiBD and MVAPICH—will continue to evolve as different technology and programming models become available, Panda says.

Unlike many other innovations in supercomputing, both HiBD and MVAPICH are free to any person or institution that wants to download them. As a result, the software stacks developed by NBCL often get integrated into other packages from developers that might be viewed as competitors. But, as Panda says, “The community knows where the software libraries came from.”
