Partnership Eases Data
Transfers to, from Campus
A new Purdue-specific Globus portal is
now available for Purdue faculty, staff and
students who need to transfer large amounts
of research data to and from campus, around
the country or around the globe.
Users can set up a Globus account through
the Purdue Globus portal, located at
https://transfer.rcac.purdue.edu, and then sign in
with their Purdue user name and password.
Globus, which bills itself as something like
Dropbox for scientists, offers an easy, fast and
secure way to move large data sets through
an intuitive Web-based interface. The system
takes advantage of Purdue’s upgraded, faster
campus research network and its high-speed
connection to fast national and international
research networks like Internet2.
Purdue’s Globus service also is integrated
with the Research Data Depot, a new state-of-the-art research data storage system from
ITaP designed, configured and operated for
the needs of Purdue researchers.
For more information on the Globus portal
or Research Data Depot, email rcac-help@purdue.edu
or contact Preston Smith,
manager of research support for ITaP Research
Computing (RCAC), email@example.com.
Hadoop Cluster Now Available for Purdue
Researchers Analyzing Big Data
Professor William Cleveland and colleagues analyze terabytes of cybersecurity
data looking for new ways to identify and combat spammers, data thieves and
other Internet bad guys.
In his investigations, Cleveland employs the popular and versatile R statistical
programming language along with Hadoop, which stores and processes huge data
sets on cluster supercomputers.
Hadoop helps the researchers break up big problems, solve many pieces at the
same time on supercomputers and merge the results into a unified answer. “That
enables a major speedup, enough to make what we do practical,” says Cleveland,
the Shanti S. Gupta Professor of Statistics.
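For readers curious what "break up, solve and merge" looks like in practice, here is a minimal sketch in plain Python. It is not Hadoop, R or the code Cleveland's group uses; the record layout and the function names (split, count_senders, merge, divide_and_recombine) are assumptions made up for illustration, and a small thread pool stands in for a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(records, n_chunks):
    """Break the big problem up: deal the records into n_chunks pieces."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def count_senders(chunk):
    """Solve one piece: count messages per sender within this chunk."""
    return Counter(sender for sender, _message in chunk)


def merge(partials):
    """Merge the per-chunk counts into a single unified answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def divide_and_recombine(records, n_chunks=4):
    """Run split, per-chunk work, and merge; workers handle chunks concurrently."""
    chunks = split(records, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(count_senders, chunks))
    return merge(partials)


if __name__ == "__main__":
    # Hypothetical (sender, message) records, invented for this example.
    records = [("a", "m1"), ("b", "m2"), ("a", "m3"), ("c", "m4"), ("a", "m5")]
    print(divide_and_recombine(records, n_chunks=2))
```

On a real Hadoop cluster the pieces would be spread across many machines rather than threads in one process, but the three steps stay the same.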
Now, ITaP Research Computing (RCAC) is making a stand-alone cluster specially
set up for Hadoop jobs available to any Purdue researcher involved in big data
analysis. The new cluster is named Hathi, Hindi for elephant and the name of an
elephant character in “The Jungle Book.” (Hadoop’s mascot is an elephant.)
Both Cleveland and Preston Smith, ITaP Research Computing (RCAC) manager
of research support, expect the resource to be popular. Big data analysis, after all,
is the name of the game today.
“This is happening across all of science and technology and business,” says Cleveland, whose lab plays a leading role in the development of Tessera, an open source
environment combining R and Hadoop to enable deep analysis of large complex data.
Adds Smith, “Hadoop is widely used and useful to probably anybody that is
working with large amounts of data, especially unstructured data.”
For more information, visit www.rcac.purdue.edu, email firstname.lastname@example.org
or contact Smith, email@example.com or 49-49729. Faculty interested in using
the new Hadoop cluster can email firstname.lastname@example.org.