Language: English
Computers Data Mining Data Processing Data Warehousing Database Management General Internet Java Open Source Operating Systems Programming Programming Languages Quality Assurance & Testing Software Development & Engineering Tools Xml
Publisher: Manning
Published: Oct 10, 2012
Description:
Summary
Hadoop in Practice collects 85 Hadoop examples and presents them in a problem/solution format. Each technique addresses a specific task you'll face, like querying big data using Pig or writing a log file loader. You'll explore each problem step by step, learning both how to build and deploy that specific solution along with the thinking that went into its design. As you work through the tasks, you'll find yourself growing more comfortable with Hadoop and at home in the world of big data.
About the Technology
Hadoop is an open source MapReduce platform designed to query and analyze data distributed across large clusters. Especially effective for big data systems, Hadoop powers mission-critical software at Apple, eBay, LinkedIn, Yahoo, and Facebook. It offers developers handy ways to store, manage, and analyze data.
About the Book
Hadoop in Practice collects 85 battle-tested examples and presents them in a problem/solution format. It balances conceptual foundations with practical recipes for key problem areas like data ingress and egress, serialization, and LZO compression. You'll explore each technique step by step, learning how to build a specific solution along with the thinking that went into it. As a bonus, the book's examples create a well-structured and understandable codebase you can tweak to meet your own needs.
This book assumes the reader knows the basics of Hadoop.
Purchase of the print book comes with an offer of a free PDF, ePub, and Kindle eBook from Manning. Also available is all code from the book.
What's Inside
About the Author Table of Contents
PART 1 BACKGROUND AND FUNDAMENTALS
PART 2 DATA LOGISTICS
PART 3 BIG DATA PATTERNS
Streamlining HDFS for big data
Diagnosing and tuning performance problems
PART 4 DATA SCIENCE
PART 5 TAMING THE ELEPHANT
Programming pipelines with Pig
Crunch and other technologies
About the Author
Alex Holmes is a senior software engineer with over 15 years of experience developing large scale distributed Java systems. For the last four years he has gained expertise in Hadoop solving Big Data problems across a number of projects. He has presented at JavaOne and Jazoon and is currently a technical lead at VeriSign.