Big data is both the present and the future. Wherever you turn, there is a virtual deluge of data, and it has driven the evolution of new technologies such as Hadoop, HDFS, MapReduce and others. Business analytics derived from big data is a big deal for large and medium-sized global enterprises. For every 100 jobs in big data, only two or three candidates qualify, which makes this a promising time to get into this rewarding area of IT. Books are your greatest help, and you could start off with a few such as those described below.
If you want to learn about big data from scratch, this is the book for you. It is aimed especially at executives, who can learn how to take advantage of big data and why it is growing in importance to enterprises. Executives can also learn how to select the right big data solutions, analytics tools and more.
This book is for IT personnel and programmers, giving them a head start in learning how to build and maintain big data systems based on Apache Hadoop. It includes case studies on Hadoop and covers MapReduce and YARN, the cluster resource-management layer on which MapReduce runs in Hadoop 2. You learn how to store large data sets in the Hadoop Distributed File System (HDFS), write MapReduce programs, work with Hadoop's data and I/O building blocks, avoid common programming pitfalls, design and set up Hadoop clusters, run them in the cloud, and run Pig-based queries. If you are serious about big data, this is one you must have.
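To make the MapReduce model more concrete, here is a minimal, single-process Python sketch of the classic word-count job. It only imitates the map, shuffle and reduce phases that Hadoop runs in parallel across a cluster; the function names are hypothetical and this is not Hadoop's API (real jobs are typically written in Java against Hadoop's Mapper and Reducer classes, or in other languages via Hadoop Streaming).

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Hypothetical driver chaining map -> shuffle -> reduce in one process."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data is big", "data flows through Hadoop"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'flows': 1, 'hadoop': 1, 'is': 1, 'through': 1}
```

In a real cluster, the shuffle step is where Hadoop moves intermediate pairs across the network so that all values for one key reach the same reducer.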
The next logical step is to learn how to migrate relational database applications to Hadoop. This book covers Apache Hive: how to use its data warehouse infrastructure, Hive’s SQL dialect (HiveQL), and how to configure Hive and use it within the Hadoop environment. You will learn to create and modify databases, customize data formats and storage, extract data, write user-defined functions, apply common Hive patterns, use storage handlers and integrate Hive with other big data services.
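Besides Java user-defined functions, Hive can stream rows through an external script with its TRANSFORM clause, passing each row to the script's standard input as tab-separated text and reading tab-separated rows back. The Python sketch below assumes a hypothetical users table with user_id and email columns and extracts the email domain; the script name, table and columns are illustrative only.

```python
"""extract_domain.py: hypothetical Hive TRANSFORM script (a sketch).

Hypothetical HiveQL usage:
  ADD FILE extract_domain.py;
  SELECT TRANSFORM (user_id, email)
    USING 'python3 extract_domain.py'
    AS (user_id, domain)
  FROM users;
"""
import sys
from typing import Iterable, Iterator

NULL = "\\N"  # Hive's default text format represents NULL as \N


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Turn tab-separated (user_id, email) rows into (user_id, domain) rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # this sketch simply skips malformed rows
        user_id, email = fields
        if "@" in email:
            domain = email.split("@", 1)[1].lower()
        else:
            domain = NULL
        yield f"{user_id}\t{domain}"


if __name__ == "__main__":
    # Hive feeds rows on stdin and reads our printed rows from stdout.
    for row in transform(sys.stdin):
        print(row)
```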
Newbies as well as experienced Pig programmers gain a lot from this book, which comprehensively covers the Pig Latin language, including user-defined functions in Pig 0.7 and later. Apache Pig is an open-source Apache project: an engine for executing parallel data flows on Hadoop. Another must-have for those intending to gain expertise in big data.
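As an illustration of the kind of data flow Pig expresses, the comments below show a short Pig Latin script that totals bytes per user for large requests, followed by a plain-Python equivalent of the same FILTER, GROUP and FOREACH ... GENERATE steps. The file name, field names and the 1000-byte threshold are hypothetical; on a cluster, Pig would compile such a script into parallel Hadoop jobs instead of running it in one process.

```python
# Hypothetical Pig Latin script this sketch mirrors:
#   logs    = LOAD 'access_log' AS (user:chararray, bytes:int);
#   big     = FILTER logs BY bytes > 1000;
#   by_user = GROUP big BY user;
#   totals  = FOREACH by_user GENERATE group, SUM(big.bytes);
from collections import defaultdict
from typing import Iterable


def total_bytes_per_user(
    logs: Iterable[tuple[str, int]], min_bytes: int = 1000
) -> dict[str, int]:
    # FILTER logs BY bytes > min_bytes
    big = [(user, nbytes) for user, nbytes in logs if nbytes > min_bytes]

    # GROUP big BY user
    by_user: dict[str, list[int]] = defaultdict(list)
    for user, nbytes in big:
        by_user[user].append(nbytes)

    # FOREACH by_user GENERATE group, SUM(big.bytes)
    return {user: sum(values) for user, values in by_user.items()}


if __name__ == "__main__":
    sample = [("ann", 500), ("ann", 2000), ("bob", 1500), ("bob", 3000), ("cy", 10)]
    print(total_bytes_per_user(sample))  # {'ann': 2000, 'bob': 4500}
```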
In this book you learn about Apache HBase and how to use it to accommodate massive volumes of data. HBase is modeled on Google’s Bigtable architecture and is integrated with Hadoop. You learn how to distribute data sets across servers, and about HBase’s architecture, storage format and internal processes, its integration with MapReduce, and table management. Another book that should be on your list if you wish to become an expert in big data.
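HBase spreads a table across servers by splitting its sorted row-key space into regions, each covering a contiguous key range with an inclusive start key and an exclusive end key. The Python sketch below shows only the lookup idea, finding which region a row key falls into given a list of split keys; the split keys and the round-robin server assignment are assumptions for illustration, since in a real cluster the HBase Master assigns and balances regions.

```python
from bisect import bisect_right


def region_index(row_key: bytes, split_keys: list[bytes]) -> int:
    """Return the index of the region holding row_key.

    With sorted split keys [s1, s2, ...], region 0 covers keys < s1,
    region 1 covers s1 <= key < s2, and so on. Row keys compare as raw
    bytes, as they do in HBase.
    """
    return bisect_right(split_keys, row_key)


def assign_servers(num_regions: int, servers: list[str]) -> dict[int, str]:
    """Hypothetical round-robin placement of regions onto region servers."""
    return {i: servers[i % len(servers)] for i in range(num_regions)}


if __name__ == "__main__":
    splits = [b"g", b"p"]  # three regions: [start, g), [g, p), [p, end)
    placement = assign_servers(len(splits) + 1, ["rs1", "rs2"])
    for key in [b"apple", b"grape", b"kiwi", b"plum", b"zebra"]:
        idx = region_index(key, splits)
        print(key.decode(), "-> region", idx, "on", placement[idx])
```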
Books you should keep on your list
- Real-Time Big Data Analytics: Emerging Architecture by Mike Barlow
- Big Data by Nathan Marz (Manning)
- Hadoop in Action by Chuck Lam (Manning)
- Hadoop MapReduce Cookbook by Srinath Perera
- Scaling Up Machine Learning: Parallel and Distributed Approaches by Ron Bekkerman
- Data Jujitsu: The Art of Turning Data into Product by D.J. Patil
- Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions by Michael Milton
In a competitive world, big data will play a crucial role in decision making, and learning its intricacies puts you in the right place for a rewarding career in this promising field.