This book is a practical guide to building solutions with Apache Hadoop. Unlike most books on the subject, which are "a mile wide and an inch deep," this book provides a deeper, code-level dive that shows how to use Hadoop technologies in concert to deliver real-world solutions. The authors provide in-depth code examples in Java and XML from applications that they have successfully built and deployed.
Topics include: storing data with HDFS and HBase; processing data with MapReduce and other technologies; automating data processing with Oozie; delivering real-time solutions with Hadoop; Hadoop security; running Hadoop with Amazon Web Services; and more.
The authors explain not just "how it works" but also the when and why behind using these tools effectively. For example, they describe best practices for storing data and performing calculations; customizing how data is read and executed; automating Hadoop processes in real time; and building secure enterprise solutions that protect the company's investment without sacrificing availability. The book also covers recent additions to the Hadoop ecosystem, including multiple namespaces and MapReduce2. It not only covers the use of the APIs that various Hadoop systems expose but also explains their inner workings, allowing architects and developers to better leverage and customize them.
Introduction
Chapter 1: Big Data and the Hadoop Ecosystem
Big Data Meets Hadoop; Hadoop: Meeting the Big Data Challenge; Data Science in the Business World; The Hadoop Ecosystem; Hadoop Core Components; Hadoop Distributions; Developing Enterprise Applications with Hadoop; Summary
Chapter 2: Storing Data in Hadoop
HDFS; HDFS Architecture; Using HDFS Files; Hadoop-Specific File Types; HDFS Federation and High Availability; HBase; HBase Architecture; HBase Schema Design; Programming for HBase; New HBase Features; Combining HDFS and HBase for Effective Data Storage; Using Apache Avro; Managing Metadata with HCatalog; Choosing an Appropriate Hadoop Data Organization for Your Applications; Summary
Chapter 3: Processing Your Data with MapReduce
Getting to Know MapReduce; MapReduce Execution Pipeline; Runtime Coordination and Task Management in MapReduce; Your First MapReduce Application; Building and Executing MapReduce Programs; Designing MapReduce Implementations; Using MapReduce as a Framework for Parallel Processing; Simple Data Processing with MapReduce; Building Joins with MapReduce; Building Iterative MapReduce Applications; To MapReduce or Not to MapReduce?; Common MapReduce Design Gotchas; Summary
Chapter 4: Customizing MapReduce Execution
Controlling MapReduce Execution with InputFormat; Implementing InputFormat for Compute-Intensive Applications; Implementing InputFormat to Control the Number of Maps; Implementing InputFormat for Multiple HBase Tables; Reading Data Your Way with Custom RecordReaders; Implementing a Queue-Based RecordReader; Implementing RecordReader for XML Data; Organizing Output Data with Custom Output Formats; Implementing OutputFormat for Splitting MapReduce Job's Output into Multiple Directories; Writing Data Your Way with Custom RecordWriters; Implementing a RecordWriter to Produce Output tar Files; Optimizing Your MapReduce Execution with a Combiner; Controlling Reducer Execution with Partitioners; Implementing a Custom Partitioner for One-to-Many Joins; Using Non-Java Code with Hadoop; Pipes; Hadoop Streaming; Using JNI; Summary
Chapter 5: Building Reliable MapReduce Apps
Unit Testing MapReduce Applications; Testing Mappers; Testing Reducers; Integration Testing; Local Application Testing with Eclipse; Using Logging for Hadoop Testing; Processing Applications Logs; Reporting Metrics with Job Counters; Defensive Programming in MapReduce; Summary
Chapter 6: Automating Data Processing with Oozie
Getting to Know Oozie; Oozie Workflow; Executing Asynchronous Activities in Oozie Workflow; Oozie Recovery Capabilities; Oozie Workflow Job Life Cycle; Oozie Coordinator; Oozie Bundle; Oozie Parameterization with Expression Language; Workflow Functions; Coordinator Functions; Bundle Functions; Other EL Functions; Oozie Job Execution Model; Accessing Oozie; Oozie SLA; Summary
Chapter 7: Using Oozie
Validating Information about Places Using Probes; Designing Place Validation Based on Probes; Designing Oozie Workflows; Implementing Oozie Workflow Applications; Implementing the Data Preparation Workflow; Implementing Attendance Index and Cluster Strands Workflows; Implementing Workflow Activities; Populating the Execution Context from a java Action; Using MapReduce Jobs in Oozie Workflows; Implementing Oozie Coordinator Applications; Implementing Oozie Bundle Applications; Deploying, Testing, and Executing Oozie Applications; Deploying Oozie Applications; Using the Oozie CLI for Execution of an Oozie Application; Passing Arguments to Oozie Jobs; Using the Oozie Console to Get Information about Oozie Applications; Getting to Know the Oozie Console Screens; Getting Information about a Coordinator Job; Summary
Chapter 8: Advanced Oozie Features
Building Custom Oozie Workflow Actions; Implementing a Custom Oozie Workflow Action; Deploying Oozie Custom Workflow Actions; Adding Dynamic Execution to Oozie Workflows; Overall Implementation Approach; A Machine Learning Model, Parameters, and Algorithm; Defining a Workflow for an Iterative Process; Dynamic Workflow Generation; Using the Oozie Java API; Using Uber Jars with Oozie Applications; Data Ingestion Conveyer; Summary
Chapter 9: Real-Time Hadoop
Real-Time Applications in the Real World; Using HBase for Implementing Real-Time Applications
ISBN - 9788126551071
Pages : 504