Data center opens on campus

Feb. 23, 2010, 1:06 a.m.

The Institute for Research in the Social Sciences (IRiSS) opened a secure data center on campus last Friday, providing a select group of Stanford researchers local access to detailed census and health statistics.

The new facility is a branch of the University of California’s Census Research Data Center (CCRDC), located at UC Berkeley.

Researchers access the census information from a stark room in the data center. This highly valued information covers demographics including age, education, employment, disability, citizenship, race and income.

Given the data’s sensitive nature, many precautions have been taken to maintain the center’s airtight security.

“All there are is terminals and screens — there are no output devices,” said Chris Thomsen, executive director of IRiSS. “There’s no printer, there’s no place to plug in a disk or memory chip or anything like that.”

And no data is actually on site — it’s all stored at a Maryland server farm.

If researchers want to take data home, the information has to be nonspecific and cleared by a Census Bureau administrator before it is printed outside the terminal room. This ensures that no confidential information gets out the door.

To get access to the center, researchers submit proposals to a Census Bureau official, who splits his time between Berkeley and Stanford. The official reviews and then forwards the proposals to Washington.

Proposals that are ultimately approved are for studies that have scientific merit, help improve Census Bureau data collection and utility, and cannot be done without the confidential data.

It took economics professor Nick Bloom about 18 months to get cleared. As part of the process, he went through a two-hour interview and a background check.

Despite the hassle, “I think it’s the right thing to do,” Bloom said. “I mean, the data’s incredibly high quality and if it wasn’t kept secure then future people wouldn’t want to reply to the Census Office.”

“I say it’s easier to get a gold bar out of Fort Knox than [to] get data out of the Census,” Bloom added.

Before the Stanford center opened, researchers had to drive 40 miles across the Bay to the Berkeley center, which was established in 1998.

“That’s not easy,” Thomsen said. “And so obviously it’s nice to have it a lot closer to them on campus.”

Before any of the nine electronic research centers opened, trawling for confidential census data required a much longer trip than the drive to Berkeley.

“What it meant for most people was that you had to travel back to Washington, D.C., and actually work within the building where the data were located,” said C. Matthew Snipp, faculty director of the Secure Data Center.

The Stanford center has been available for about a year to researchers who needed to store confidential data from other sources. Snipp expects that survey firms will be more willing to license similar sensitive data to Stanford researchers for study in the future.

Researchers can bring whatever information they want to the center and “store it in a secure facility and know that it’s safe and can vouch to the providers that it’s in a protected area,” said Snipp.

“It’s incredibly valuable,” Bloom said, alluding to the center’s wealth of statistical information.

According to Bloom, the “census tracks individuals and individual firms in a way you just couldn’t do previously.”

Thomsen is “aware of three faculty and a couple grad students” using the census data, and there are several other social sciences faculty and graduate students working with other secure data.

Among them is Bloom, who is investigating the credit crunch and the causes behind the recession. One of the best ways to study this, he said, is to examine businesses’ sales in census data from previous recessions. This information is confidential for businesses that are not traded on a stock exchange.

“In the census you have something like 50,000 firms a year . . . [for which] we have detailed output sales data, and you just don’t have that [publicly available],” Bloom said.

“You’ve just got so much more information,” he emphasized.

Bloom said research in the social sciences is moving from just theorizing to theorizing and testing data at the same time.

Today, the Secure Data Center is a small-scale, low-maintenance operation.

“Right now we’re not actually costing much, because this is within IRiSS, and so far we’re just using existing staff time to manage it, with the exception for the census center,” Thomsen said.

The census center is supported by access fees charged to researchers.

The only IRiSS employee working at the center is a research technician who counts secure data management among many other responsibilities.

“IRiSS is involved in providing research infrastructure for the social sciences,” said Snipp. “And so it’s just another element within the larger institute in providing a facility or place for people who need data.”

In the past few years, there have been fewer CCRDC projects at the Berkeley center than desired.

“There have been less than half a dozen over the past couple of years,” said Jon Stiles, CCRDC Director of Research, as opposed to the 10 to 15 active projects at other data centers around the country.

“We’re certainly hoping that will increase,” he added.

Although only a few researchers are working at the Stanford data center today, Thomsen is optimistic that the CCRDC will become a popular resource in the future.

“We think it’s going to become more and more valuable, because the amount of data is expanding,” Thomsen said.

He further observed that this increase is matched by growing interest in secure data and microdata.

“But at the same time, there are increasing concerns about protecting privacy and protecting [the] security of the data,” Thomsen said.

Ultimately, however, he believes that the census system “is a quite good one, and an effective way to allow for the research, and yet protects things from getting out.”


