Big Data: Controlling the beast by its horns
I look for hot spots in the data, an outbreak of activity that I need to understand. It’s something you can only do with Big Data.” – Jon Kleinberg, a professor at Cornell
Researchers have found a spike in Google search requests for terms like “flu symptoms” and “flu treatments” a couple of weeks before there is an increase in flu patients coming to hospital emergency rooms in a region (and emergency room reports usually lag behind visits by two weeks or so).Global Pulse, a new initiative by the United Nations, wants to leverage Big Data for global development. The group will conduct so-called sentiment analysis of messages in social networks and text messages – using natural-language deciphering software – to help predict job losses, spending reductions or disease outbreaks in a given region. The goal is to use digital early-warning signals to guide assistance programs in advance to, for example, prevent a region from slipping back into poverty.
In economic forecasting, research has shown that trends in increasing or decreasing volumes of housing-related search queries in Google are a more accurate predictor of house sales in the next quarter than the forecasts of real estate economists.
Big Data has its perils, to be sure. With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data, is that “many bits of straw look like needles.”
Data is tamed and understood using computer and mathematical models. These models, like metaphors in literature, are explanatory simplifications. They are useful for understanding, but they have their limits. A model might spot a correlation and draw a statistical inference that is unfair or discriminatory, based on online searches, affecting the products, bank loans and health insurance a person is offered, privacy advocates warn.
Despite the caveats, there seems to be no turning back. Data is in the driver’s seat. It’s there, it’s useful and it’s valuable, even hip. It’s a revolution. We’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.