In 1996, health-care analysts at the University of Virginia announced research results that contradicted a long-held assumption about a particular emergency room procedure. The procedure, called right-heart catheterization, was commonly administered to critically ill patients during their first 24 hours of hospitalization, and was thought to lead to better outcomes. Using conventional statistical techniques, the 1996 study demonstrated that the procedure does the opposite, actually increasing death rates.

That study, like most complex assessments, used parametric modeling methods. It targeted a particular question or questions, built certain assumptions into the analysis, and thus incorporated potential limitations. Under normal circumstances, it is pragmatically necessary to focus a study this way. To do otherwise, engaging in an open-ended consideration of a topic area without regard for a specific query, would require analyzing mountains of data (only a fraction of it ultimately relevant) and factoring in a mind-boggling web of possible implications.

Thanks to computers, however, the impossible is possible. In 2001, a group of analysts that included Maxwell economist Jeffrey Racine used computationally intensive nonparametric statistical methods to reassess the efficacy of right-heart catheterization, using data obtained from the University of Virginia. The approach they developed involved feeding large amounts of data on the procedure into a supercomputer, with a minimum of accompanying assumptions. To everyone’s surprise, this analysis yielded a completely different result from the 1996 study. “Once we eliminated the rigid assumptions that were built into the original analysis, a completely different conclusion emerged,” says Racine. “We found that the procedure, if anything, lowered the death rate for critically ill patients.”

Nonparametric modeling, which teases out patterns inherent in data (absent human assumptions), sometimes has that sort of remediating effect. But, until recently, the enormous cost of supercomputing placed nonparametric analysis beyond the reach of many institutions.
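To make the contrast concrete, here is a minimal sketch in Python. It is not the method Racine's group used, and the data are synthetic and hypothetical rather than drawn from the catheterization study. It assumes a curved relationship between two variables, then compares a parametric model (a straight line, which builds in the assumption of linearity) with a simple nonparametric one (a Nadaraya-Watson kernel estimator with a Gaussian kernel and a hand-picked bandwidth of 0.1). The function names are illustrative only.

```python
import math
import random


def fit_line(xs, ys):
    """Parametric model: least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def kernel_regression(x0, xs, ys, bandwidth):
    """Nonparametric model: Nadaraya-Watson estimate at x0 (Gaussian kernel).

    The prediction is a weighted average of observed outcomes, with nearby
    observations weighted most heavily. No functional form is assumed.
    """
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def mean_squared_error(predictions, truth):
    return sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth)


# Synthetic data (an assumption for illustration): a curve plus random noise.
rng = random.Random(0)
xs = [i / 100 for i in range(200)]
truth = [math.sin(math.pi * x) for x in xs]
ys = [t + rng.gauss(0, 0.1) for t in truth]

intercept, slope = fit_line(xs, ys)
line_pred = [intercept + slope * x for x in xs]
# Every prediction consults every observation, so the work grows with the
# square of the sample size.
kernel_pred = [kernel_regression(x, xs, ys, bandwidth=0.1) for x in xs]

line_mse = mean_squared_error(line_pred, truth)
kernel_mse = mean_squared_error(kernel_pred, truth)
print(f"straight-line error: {line_mse:.4f}")
print(f"kernel error:        {kernel_mse:.4f}")
```

On data like these, the kernel estimate tracks the underlying curve far more closely than the straight line does, because it never assumed a line in the first place. The price is computation: each prediction is a pass over the full dataset, and choosing the bandwidth from the data rather than by hand typically means repeating the whole exercise many times over.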

This is about to change at the Maxwell School. With a grant of $162,810 from the National Science Foundation, Syracuse University has purchased a computer cluster (housed in Maxwell’s Center for Policy Research) that does the same sophisticated data analysis that formerly only supercomputers could do. It’s become practical to bypass a traditional supercomputer in favor of what’s called a “Beowulf cluster.”

A Beowulf cluster uses open-source software and libraries (developed by a government/private sector consortium in the 1990s) to link and coordinate off-the-shelf processors to achieve power comparable to that of traditional supercomputers. (The third-fastest computer in the world today is, in fact, a cluster.) And a cluster does this at a fraction of the cost of a supercomputer.

Racine says that at least nine faculty members and 20 to 30 graduate students will benefit initially, with the number growing as other Maxwell researchers develop projects that take advantage of the Beowulf cluster’s capabilities.

“CPR has a number of faculty members and students who work with numerically intensive econometric methods or who routinely struggle with computational aspects of modeling large datasets,” he says. “With this cluster computer, we will be able to teach students not only the theory of these methods but how to apply them to large, real-world databases like those with which they will work after graduation.”

For his part, Racine has planned a number of projects for the cluster. One involves verifying, via computer simulation, certain theoretical conjectures regarding new nonparametric estimators; another, reassessing union and gender wage gaps using these methods.

—Jill Leonhardt

This article appeared in the Fall 2003 print edition of Maxwell Perspective; © 2003 Maxwell School of Syracuse University. To request a copy, e-mail dlcooke@maxwell.syr.edu.



