Facebook, Big Data and Project Prism
Facebook processes 2.5 billion pieces of content and more than 500 terabytes of data each day. It pulls in 2.7 billion Like actions and 300 million photos per day, and it scans roughly 105 terabytes of data every half hour. The rate of ingestion keeps rising, and the appetite for data keeps growing with it. Facebook’s latest effort is about putting all this data into perspective: mining it for insights across different storage clusters while using resources and budget efficiently, so that performance on data outputs can be managed in real time. To achieve seamless integration of data across its huge data centres, Facebook has put in place initiatives such as Project Prism and Corona.
Project Prism will allow Facebook to maintain data in multiple data centres around the globe while letting company engineers keep a holistic view of it, thanks to tools such as automatic replication. Corona makes Facebook’s Apache Hadoop clusters less crash-prone while increasing the number of tasks that can be run on the infrastructure.
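The idea behind "automatic replication" with a holistic view can be sketched roughly as follows: writes to a single logical namespace are mirrored to every data centre, so engineers can query any cluster and see the same dataset. This is only an illustrative sketch of the concept; the class and method names (`DataCenter`, `PrismNamespace`, etc.) are hypothetical and do not reflect Facebook's actual internal APIs.

```python
# Hypothetical sketch of cross-data-centre automatic replication.
# All names here are illustrative, not Facebook's real interfaces.

class DataCenter:
    """One physical storage cluster."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> data

    def store(self, block_id, data):
        self.blocks[block_id] = data


class PrismNamespace:
    """A single logical view spanning many physical clusters."""
    def __init__(self, centers):
        self.centers = centers

    def write(self, block_id, data):
        # Mirror each block to every data centre, so any cluster
        # can serve a complete, holistic view of the data.
        for dc in self.centers:
            dc.store(block_id, data)

    def read(self, block_id):
        # Any replica can serve the read.
        for dc in self.centers:
            if block_id in dc.blocks:
                return dc.blocks[block_id]
        raise KeyError(block_id)


ns = PrismNamespace([DataCenter("dc-east"), DataCenter("dc-west")])
ns.write("photo-0001", b"jpeg-bytes")
print(all("photo-0001" in dc.blocks for dc in ns.centers))  # True: present everywhere
```

In practice the replication would be asynchronous and fault-tolerant rather than a simple loop, but the payoff is the same: engineers interact with one namespace instead of tracking which cluster holds which data.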
So while Google is indexing information around the world, Facebook is indexing user behavior and reactions to a wide range of stimuli around the world. The one thing Facebook would still ideally want to work out is how to sell this data and command a good price for it.