Ronnie05's Blog

Is Big Data in reality only hyperbole?

Posted in Big Data, Semantic Media and Web, Semantic Web by Manas Ganguly on September 29, 2013

The World Economic Forum (WEF) calls it “the new oil” and “a new asset class”. The vast loads of data have been likened to transformative innovations like the steam locomotive, electricity grids, steel, air-conditioning and the radio.

Big Data

There were 30 billion gigabytes of video, e-mails, Web transactions and business-to-business analytics in 2005. The total is expected to reach more than 20 times that figure in 2013, according to Cisco. Cisco estimates that in 2012, some 2 trillion minutes of video alone traversed the internet every month.

What is sometimes referred to as the internet’s first wave — from the 1990s until around 2005 — brought completely new services like e-mail, the Web, online search and eventually broadband. The next wave connected the world into social grids, giving people identity and voice. For its next act, the industry has pinned its hopes, and its colossal public relations machine, on the power of Big Data itself to supercharge the economy. Some call it Web 3.0, some call it Big Data.

Is Big Data pure hyperbole…

There is just one tiny problem: the economy is, at best, in the doldrums and has stayed there during the latest surge in Web traffic. The rate of productivity growth, whose steady rise from the 1970s well into the 2000s has been credited to earlier phases in the computer and internet revolutions, has actually fallen. The overall economic trends are complex, but an argument could be made that the slowdown began around 2005 — just when Big Data began to make its appearance.

All that promise of Big Data, or even the social web, hasn’t exactly fired the economic engines of the world as expected. If the promise is real, why would such a disturbing trend take hold? One theory holds that the Big Data industry is thriving more by cannibalising existing businesses than by creating fundamentally new opportunities. Online companies often eat up traditional advertising, media, music and retailing businesses, said Joel Waldfogel, an economist at the University of Minnesota. “One falls, one rises — it’s pretty clear the digital kind is a substitute to the physical kind,” he said. “So it would be crazy to count the whole rise in digital as a net addition to the economy.”

… or are these early days?

Other economists believe that Big Data’s economic punch is just a few years away, as engineers trained in data manipulation make their way through college and as data-driven start-ups begin hiring. And, of course, the recession could be masking the impact of the data revolution in ways economists don’t yet grasp. Still, some suspect that in the end our current framework for understanding Big Data and “the cloud” could be a mirage.

There is no disputing that a wide spectrum of businesses is now using huge amounts of data as part of everyday operations.

Josh Marks, CEO of masFlight, helps airlines use enormous data sets to reduce fuel consumption and improve overall performance. Although his first mission is to help clients compete with other airlines for customers, Marks believes that efficiencies like those his company is chasing should eventually expand the global economy. For now, though, he acknowledges that most of the raw data flowing across the Web has limited economic value: far more useful is specialised data in the hands of analysts with a deep understanding of specific industries.

Some economists argue that it is often difficult to estimate the true value of new technologies, and that Big Data may already be delivering benefits that are uncounted in official economic statistics.

Also, infrastructure investments often take years to pay off in a big way, said Shane Greenstein, an economist at Northwestern University. He cited high-speed internet connections laid down in the late 1990s that have driven profits only recently. But he noted that in contrast to the internet’s first wave, which created services like the Web and e-mail, the impact of the second wave — the Big Data revolution — is harder to discern above the noise of broader economic activity.

… reproduced from “Will big data prove to be an economic big dud?”


How dynamic distributed computing and resource allocation are pushing the boundaries of modern computing

Posted in Big Data by Manas Ganguly on March 15, 2013

Globally, petabytes and zettabytes are the new everyday normal for data and data operators. In an era of consumer-driven, massive media generation, web giants such as Google and Twitter are honing their skills at dynamic and distributed computing to serve global demand for high-speed data delivery. Here’s how.

A Google data centre

The raw computing power for responding to and processing billions of online requests comes from data centres – clusters and arrays of servers handling queries and searches. Google, for instance, works on petabytes of data generated daily. The management and economics of data centres are key technology and business supports for running the internet. Hence, processes and techniques that enable data centre efficiencies are key to a great internet experience and data delivery.

Towards this end, Google and Twitter have independently been working on dynamic distributed computing resource allocation systems. The term refers to the efficient parcelling of work and applications across a fleet of data centres and armies of computing servers. Google calls its system Borg and Twitter calls its Mesos. Google also has a next-generation system in the works, called Omega.

These systems provide a central brain for controlling tasks across the company’s data centres. Rather than building a separate cluster of servers for each software system — one for Google Search, one for Gmail, one for Google Maps, etc. — Google can erect a cluster that does several different types of work at the same time. All this work is divided into tiny tasks, and the system dynamically assigns these tasks wherever it can find free computing resources, such as processing power, memory or storage space.
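To make the idea concrete, here is a minimal Python sketch of that kind of central scheduler. It is purely illustrative (the Machine, Task and schedule names are invented for this example, not taken from Google’s actual Borg code), but it shows tiny tasks being placed wherever free CPU and memory can be found:

```python
# Illustrative sketch of a central cluster scheduler, NOT Google's real Borg:
# tasks declare the resources they need, and the scheduler places each one
# on any machine that still has enough free capacity.
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    free_cpu: float   # cores still available
    free_mem: float   # GB still available

@dataclass
class Task:
    name: str
    cpu: float
    mem: float

def schedule(tasks, machines):
    """Place each task on the first machine with enough free capacity."""
    placements = {}
    for task in tasks:
        for m in machines:
            if m.free_cpu >= task.cpu and m.free_mem >= task.mem:
                m.free_cpu -= task.cpu   # reserve the resources on that machine
                m.free_mem -= task.mem
                placements[task.name] = m.name
                break
        else:
            placements[task.name] = None  # no machine has room right now
    return placements

machines = [Machine("rack1-server1", 32, 128), Machine("rack1-server2", 16, 64)]
tasks = [Task("search-shard", 8, 32), Task("mail-indexer", 4, 16),
         Task("maps-tiles", 24, 64)]
print(schedule(tasks, machines))
# Tasks from different services share the same machines instead of each
# service getting its own dedicated cluster.
```

A production scheduler adds priorities, preemption and failure handling on top of this basic placement loop, but the core idea of packing small tasks into free capacity is the same.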

Underneath the concept of dynamic distribution of application and computing load across sets of servers lies the core of it all: the microprocessor. Traditionally, the computer processor — the brain at the centre of a machine — ran one task at a time. But a multi-core processor lets the programmer run many tasks in parallel. Essentially, it is a single chip that includes many processors, or processor cores. The count can run as high as 64 or 128 cores on the same processor, multiplying the processing capability.
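For a feel of what multi-core parallelism buys, here is a small illustrative Python example (the crunch function is just a stand-in workload invented for this sketch) that splits one job into independent tasks and spreads them across every available core:

```python
# Minimal sketch of multi-core parallelism: the same work, divided into
# small independent tasks and run across all available cores at once.
from multiprocessing import Pool, cpu_count

def crunch(n):
    """A stand-in for one small unit of work."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    chunks = [200_000] * 64                 # 64 independent tasks
    with Pool(processes=cpu_count()) as pool:
        results = pool.map(crunch, chunks)  # tasks run in parallel, one per free core
    print(f"{cpu_count()} cores, {len(results)} tasks completed")
```

On a single-core machine the 64 tasks would run one after another; on a 64-core processor they can, in principle, all run at the same time.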

Thus Omega and Mesos let you run multiple distributed systems atop the same cluster of servers. Instead of dedicating one server to Application 1 and a second server to Application 2, the same server can now run both applications at the same time. Complex computational processes and data-hungry activities are automatically allotted computing resources and (server) core capacity, allowing the data centre to handle many times the activity and computation.
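The Mesos approach is often described as two-level scheduling: the cluster offers its free resources to each framework, and the framework decides what to launch with them. The toy Python sketch below is a deliberate simplification (Cluster, Framework and offer are invented names, not the real Mesos API), but it shows two applications sharing one pool of CPUs:

```python
# Toy sketch of resource "offers" in the spirit of Mesos (names invented,
# not the real API): the cluster offers spare CPUs to each framework, and
# each framework launches as many of its own tasks as the offer allows.
class Cluster:
    def __init__(self, cpus):
        self.free_cpus = cpus

    def offer(self, framework):
        used = framework.accept(self.free_cpus)  # framework picks what it needs
        self.free_cpus -= used

class Framework:
    def __init__(self, name, cpus_per_task, tasks):
        self.name, self.cpus_per_task, self.tasks = name, cpus_per_task, tasks

    def accept(self, offered_cpus):
        runnable = min(self.tasks, offered_cpus // self.cpus_per_task)
        self.tasks -= runnable
        print(f"{self.name}: launched {runnable} tasks")
        return runnable * self.cpus_per_task  # CPUs actually consumed

cluster = Cluster(cpus=48)
for fw in (Framework("app-1", 4, 6), Framework("app-2", 2, 10)):
    cluster.offer(fw)   # both applications end up on the same servers
```

Both applications draw from the same 48 CPUs rather than each owning a dedicated set of machines, which is the sharing gain the post describes.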

Yes, there are other ways of efficiently spreading workloads across a cluster of servers. One could use virtualization, where virtual servers run atop physical machines and are then loaded with the relevant software and applications. But with Borg and Mesos, the human element in juggling all those virtual machines is eliminated, making the process fully automated.

Facebook, Big Data and Project Prism

Posted in Big Data by Manas Ganguly on August 24, 2012

Facebook processes 2.5 billion pieces of content and 500+ terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data every half hour. The speed of data ingestion keeps increasing, and the world is getting hungrier and hungrier for data. Facebook’s latest effort is about putting all this data in perspective: mining it for insights across different storage clusters, with efficient use of resources and cost, to enable real-time performance management of data outputs. To achieve seamless integration of data across its huge data centres, Facebook has put in place initiatives such as Project Prism and Corona.

‘Project Prism’ will allow Facebook to maintain data in multiple data centres around the globe while allowing company engineers to keep a holistic view of it, thanks to tools such as automatic replication. Corona makes Facebook’s Apache Hadoop clusters less crash-prone while increasing the number of tasks that can be run on the infrastructure.
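As a rough illustration of the automatic-replication idea attributed to Prism, here is a hypothetical Python sketch (ReplicatedStore and DataCenter are invented for this example and are not Facebook’s APIs) in which every write is copied to all data centres, so any region sees the same holistic view of the data:

```python
# Hypothetical sketch of automatic cross-datacentre replication, in the
# spirit of Project Prism (invented names, not Facebook's real systems):
# every write fans out to all sites, and reads are served locally.
class DataCenter:
    def __init__(self, region):
        self.region, self.store = region, {}

class ReplicatedStore:
    def __init__(self, datacenters):
        self.datacenters = datacenters

    def put(self, key, value):
        for dc in self.datacenters:       # replicate the write to every site
            dc.store[key] = value

    def get(self, key, region):
        for dc in self.datacenters:       # read from the local copy
            if dc.region == region:
                return dc.store.get(key)

store = ReplicatedStore([DataCenter("us-east"), DataCenter("eu-west")])
store.put("photo:42:likes", 1_024)
print(store.get("photo:42:likes", "eu-west"))  # same view from any region
```

A real system must also handle replication lag, conflicts and failures, but the sketch captures why engineers anywhere can work against one logical dataset.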

So while Google is indexing information around the world, Facebook is indexing user behaviour and reactions to a wide range of stimuli around the world. The one thing Facebook would now ideally want to figure out is how to sell this data and get a good price for it.
