Big data is a relatively recent term for very large collections of information. Large data sets have existed for quite some time, but only recently, with the advent of extremely powerful, low-cost computers, has working with them filtered down to general use. Compared with the data normally stored in databases, big data is vastly larger, reaching terabytes or even exabytes, and is often collected at frequent intervals. Big data may not be structured and can be a random collection of information, including machine data, transactions, social data, web visit data and so on. Consider that 90% of existing data has been created in the last two years, and that data production will increase by 50% in 2020 compared to 2010, growing by 2.5 billion GB every day. Today an average person processes more data in a day than the average person in the 1500s did in an entire lifetime. Information and analytics based on big data will power business in the future, helping raise efficiency and generate revenue. Big data is growing in importance and value, with related technologies and markets estimated to exceed USD 16 billion by 2015.
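The daily growth figure quoted above can be put on the same scale as the terabyte and exabyte sizes mentioned earlier. Assuming decimal (SI) prefixes, where 1 GB = 10^9 bytes, 1 TB = 10^12 bytes and 1 EB = 10^18 bytes:

```latex
2.5 \times 10^{9}\,\tfrac{\text{GB}}{\text{day}} \times 10^{9}\,\tfrac{\text{B}}{\text{GB}}
  = 2.5 \times 10^{18}\,\tfrac{\text{B}}{\text{day}}
  = 2.5\,\tfrac{\text{EB}}{\text{day}}
  = 2.5 \times 10^{6}\,\tfrac{\text{TB}}{\text{day}}
```

In other words, the quoted figure amounts to roughly 2.5 exabytes, or 2.5 million terabytes, of new data every day.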
Big data sources
The largest contributors to big data are social media such as Facebook and Twitter, where hits run into the millions; CRM data; web visit analytics; data collected from machinery; web logs; data generated during scientific, meteorological, environmental and biological experiments and research; e-commerce tracking data; and transactional data generated by business trade portals.
Big data covers huge amounts of complex information that can prove useful after extensive analysis. It is largely unstructured, very large, and does not fit well into standard relational database systems. Data can be so large, as when it is derived from a large number of sensors, that querying it with normal tools becomes difficult. On the other hand, a voluminous data set may consist of many small, simple records, as is often the case with sensor-derived data. In some cases, such as mobile telephony or web streaming, the data is big but each record can be described by just a few parameters.
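The point about records that reduce to a few parameters can be illustrated with a minimal Python sketch. It assumes a hypothetical call detail record with only three fields (`CallRecord`, `caller`, `callee` and `duration_s` are illustrative names, not any operator's real schema) and uses synthetic data. Because each record is small and simple, total talk time per caller can be computed in a single streaming pass without loading the whole data set into memory.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical call detail record: each event carries only a few fields."""
    caller: str
    callee: str
    duration_s: int


def generate_records(n: int, seed: int = 0) -> Iterator[CallRecord]:
    """Lazily yield n synthetic records so the full data set is never held in memory."""
    rng = random.Random(seed)
    subscribers = [f"sub{i}" for i in range(100)]
    for _ in range(n):
        yield CallRecord(
            caller=rng.choice(subscribers),
            callee=rng.choice(subscribers),
            duration_s=rng.randint(1, 600),  # 1 s to 10 min, inclusive
        )


def seconds_per_caller(records: Iterable[CallRecord]) -> Counter[str]:
    """Single streaming pass: memory grows with the number of callers, not records."""
    totals: Counter[str] = Counter()
    for record in records:
        totals[record.caller] += record.duration_s
    return totals


if __name__ == "__main__":
    totals = seconds_per_caller(generate_records(100_000))
    print(totals.most_common(3))
```

The same single-pass pattern applies to sensor readings; only the record fields change.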
Big data processing
A typical big data set running into terabytes or exabytes cannot be processed in the usual way with traditional relational database management systems. Big data needs hundreds of servers running parallel software simultaneously to sift through the volume of information and produce meaningful analysis and statistics. With big data, the volume, the speed of processing and the types of data involved all matter. According to Gartner, a leading research firm, big data is high-volume, high-velocity and/or high-variety information that demands new and advanced forms of processing to enable discovery and decision making. Big data is also different from business intelligence: whereas business intelligence uses descriptive statistics to measure things and detect trends, big data relies on inductive statistics and concepts from nonlinear systems to draw inferences, reveal dependencies and predict behavior.
Handling such unstructured data calls for new technologies such as the open-source Hadoop framework, which distributes data processing across multiple machines. Hadoop is based on MapReduce, a parallel programming model introduced by Google for processing big data; the framework automatically distributes data to connected nodes and processes it in parallel. With such infrastructure, data sets far too large for a single machine can be processed and analyzed. SAP HANA is another tool for handling big data, as is the analytics platform Arcplan. NoSQL databases are yet another departure from the conventional relational model, focusing on storing, accessing and reading huge data sets. In short, the game plan changes when companies want to handle and analyze big data: it calls for radically new, more sophisticated hardware and software, or for existing systems to be upgraded until they are capable.
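The MapReduce model described above can be sketched in a few lines of Python. This is a single-machine illustration under stated assumptions, not Hadoop's or Google's API: threads stand in for cluster nodes, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `split`, `word_count`) are hypothetical. It counts words in log lines by splitting the input across nodes, mapping each chunk to (word, 1) pairs concurrently, grouping the pairs by key, and reducing each group to a sum.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable


def map_phase(chunk: list[str]) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in this node's chunk of lines."""
    pairs = []
    for line in chunk:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_phase(mapped: Iterable[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key across every node's output."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def split(lines: list[str], n_nodes: int) -> list[list[str]]:
    """Distribute input lines round-robin across n_nodes simulated nodes."""
    return [lines[i::n_nodes] for i in range(n_nodes)]


def word_count(lines: list[str], n_nodes: int = 4) -> dict[str, int]:
    chunks = split(lines, n_nodes)
    # Threads stand in for cluster nodes running the map tasks concurrently.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        mapped = list(pool.map(map_phase, chunks))
    groups = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    logs = ["big data needs parallel processing", "parallel processing scales", "big data"]
    print(word_count(logs, n_nodes=2))
```

In a real cluster the map and reduce tasks run on separate machines and the shuffle moves data over the network; keeping the three phases as separate functions mirrors that division of work.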
Future and career prospects
With a steeply accelerating growth curve, big data is set to shape the future, requiring specially trained staff all along the data chain, including managers and analysts. Those planning a career in IT could consider specializing in big data, since there is going to be huge demand and a shortage of skilled people, not only in the US but all over the world. This segment is worth over $100 billion, with giants such as Oracle, IBM, SAP, EMC, HP and Microsoft deeply involved in its growth. The internet, mobile telephony, web transactions and science will drive big data growth. EMC's Bill Schmarzo goes as far as to say that big data is evolutionary and a game changer, and that companies that do not incorporate it into their business strategy are likely to drop into oblivion. According to him, big data is not an end in itself but an enabler for business.
As the internet grows, as automated sensors are deployed ever more widely, as people use social media and online shopping even more, and as research relies on ever more sophisticated technologies, the amount of big data will grow exponentially, requiring greater investment in resources, manpower and technologies. Just as cloud technologies brought about a paradigm shift in manpower and technologies, big data will dictate future modes of operation and is a high-growth area for all stakeholders.
Image courtesy of nuttakit at FreeDigitalPhotos.net