Video description
"Transcends individual tools or platforms. Required reading for anyone working with big data systems."
Jonathan Esterhazy, Groupon
Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this Video Editions book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.
Inside:
- Introduction to big data systems
- Real-time processing of web-scale data
- Tools like Hadoop, Cassandra, and Storm
- Extensions to traditional database skills
This Video Editions book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.
Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.
"A comprehensive, example-driven tour of the Lambda Architecture with its originator as your guide."
Mark Fisher, Pivotal
"Contains wisdom that can only be gathered after tackling many big data projects. A must-read."
Pere Ferrera Bertran, Datasalt
"The de facto guide to streamlining your data pipeline in batch and near-real time."
Alex Holmes, author of "Hadoop in Practice"
NARRATED BY MARK THOMAS AND CHRIS PENICK
Table of Contents
A NEW PARADIGM FOR BIG DATA
Chapter 1. A new paradigm for Big Data
Chapter 1. Scaling with a traditional database
Chapter 1. NoSQL is not a panacea
Chapter 1. The problems with fully incremental architectures
Chapter 1. Lambda Architecture
Chapter 1. Batch and serving layers satisfy almost all properties
Chapter 1. Recent trends in technology
PART 1 BATCH LAYER
Chapter 2. Data model for Big Data
Chapter 2. Data is raw
Chapter 2. Data is immutable
Chapter 2. The fact-based model for representing data
Chapter 2. Graph schemas
Chapter 3. Data model for Big Data: Illustration
Chapter 3. Tying everything together into data objects
Chapter 4. Data storage on the batch layer
Chapter 4. Storing a master dataset with a distributed filesystem
Chapter 5. Data storage on the batch layer: Illustration
Chapter 5. Data storage in the batch layer with Pail
Chapter 5. Storing the master dataset for SuperWebAnalytics.com
Chapter 6. Batch layer
Chapter 6. Recomputation algorithms vs. incremental algorithms
Chapter 6. Scalability in the batch layer
Chapter 6. Low-level nature of MapReduce
Chapter 6. Pipe diagrams: a higher-level way of thinking about batch computation
Chapter 7. Batch layer: Illustration
Chapter 7. An introduction to JCascalog
Chapter 7. Grouping and aggregators
Chapter 7. Composition
Chapter 8. An example batch layer: Architecture and algorithms
Chapter 8. Workflow overview
Chapter 8. Deduplicate pageviews
Chapter 9. An example batch layer: Implementation
Chapter 9. URL normalization
PART 2 SERVING LAYER
Chapter 10. Serving layer
Chapter 10. The serving layer solution to the normalization/denormalization problem
Chapter 10. Designing a serving layer for SuperWebAnalytics.com
Chapter 10. Contrasting with a fully incremental solution
Chapter 10. Comparing to the Lambda Architecture solution
Chapter 11. Serving layer: Illustration
Chapter 11. Building the serving layer for SuperWebAnalytics.com
PART 3 SPEED LAYER
Chapter 12. Realtime views
Chapter 12. Storing realtime views
Chapter 12. Challenges of incremental computation
Chapter 12. Asynchronous versus synchronous updates
Chapter 13. Realtime views: Illustration
Chapter 14. Queuing and stream processing
Chapter 14. Stream processing
Chapter 14. Higher-level, one-at-a-time stream processing
Chapter 14. Guaranteeing message processing
Chapter 14. SuperWebAnalytics.com speed layer
Chapter 14. Topology structure
Chapter 15. Queuing and stream processing: Illustration
Chapter 15. Implementing the SuperWebAnalytics.com uniques-over-time speed layer
Chapter 16. Micro-batch stream processing
Chapter 16. Micro-batch processing topologies
Chapter 16. Core concepts of micro-batch stream processing
Chapter 16. Extending pipe diagrams for micro-batch processing
Chapter 16. Bounce-rate analysis
Chapter 16. Another look at the bounce-rate-analysis example
Chapter 17. Micro-batch stream processing: Illustration
Chapter 17. Finishing the SuperWebAnalytics.com speed layer
Chapter 17. Fully fault-tolerant, in-memory, micro-batch processing
Chapter 18. Lambda Architecture in depth
Chapter 18. Batch and serving layers
Chapter 18. Incremental batch processing - part 1
Chapter 18. Incremental batch processing - part 2
Chapter 18. Measuring and optimizing batch layer resource usage
Chapter 18. Speed layer