Video description
In Video Editions, the narrator reads the book while the content, figures, code listings, diagrams, and text appear on the screen. It is like an audiobook that you can also watch as a video.
A great guide to building data platforms from the ground up!
Mike Jensen, Arcadia
Centralized data warehouses, the long-time de facto standard for housing data for analytics, are rapidly giving way to multi-faceted cloud data platforms. Companies that embrace modern cloud data platforms benefit from an integrated view of their business using all of their data and can take advantage of advanced analytic practices to drive predictions and as-yet-unimagined data services. Designing Cloud Data Platforms is a hands-on guide to envisioning and designing a modern, scalable data platform that takes full advantage of the flexibility of the cloud. As you read, you'll learn the core components of a cloud data platform design, along with the role of key technologies like Spark and Kafka Streams. You'll also explore setting up processes to manage cloud-based data, keeping it secure, and using advanced analytic and BI tools to analyze it.
about the technology
Well-designed pipelines, storage systems, and APIs eliminate the complicated scaling and maintenance required with on-prem data centers. Once you learn the patterns for designing cloud data platforms, you’ll maximize performance no matter which cloud vendor you use.
about the book
In Designing Cloud Data Platforms, Danil Zburivsky and Lynda Partner reveal a six-layer approach that increases flexibility and reduces costs. Discover patterns for ingesting data from a variety of sources, then learn to harness pre-built services provided by cloud vendors.
what's inside
- Best practices for structured and unstructured data sets
- Cloud-ready machine learning tools
- Metadata and real-time analytics
- Defensive architecture, access, and security
about the audience
For data professionals familiar with the basics of cloud computing and with Hadoop or Spark.
about the authors
Danil Zburivsky has over 10 years of experience designing and supporting large-scale data infrastructure for enterprises across the globe. Lynda Partner is the VP of Analytics-as-a-Service at Pythian, and has been on the business side of data for over 20 years.
A comprehensive overview of cloud data platforms and a valuable resource.
Ubaldo Pescatore, Generali Business Solutions
A clear, concise, and useful guide…provides a great introduction to architectures and tools across the entire spectrum of applications and platforms.
Ken Fricklas, Google
A practical and realistic view of the architecture, challenges, and patterns of a cloud data platform.
Hugo Cruz, People Driven Technology
NARRATED BY CHRISTOPHER KENDRICK
Table of Contents
Chapter 1. Introducing the data platform
Chapter 1. Data warehouses struggle with data variety, volume, and velocity
Chapter 1. Data lakes to the rescue?
Chapter 1. Cloud, data lakes, and data warehouses: The emergence of cloud data platforms
Chapter 1. Processing layer
Chapter 1. How the cloud data platform deals with the three V’s
Chapter 1. Two more V’s
Chapter 2. Why a data platform and not just a data warehouse
Chapter 2. An example cloud data warehouse–only architecture
Chapter 2. Ingesting data
Chapter 2. Processing data
Chapter 2. Accessing data
Chapter 3. Getting bigger and leveraging the Big 3: Amazon, Microsoft Azure, and Google
Chapter 3. Data ingestion layer
Chapter 3. Fast and slow storage
Chapter 3. Technical metadata layer
Chapter 3. Orchestration and ETL overlay layers, Part 1
Chapter 3. Orchestration and ETL overlay layers, Part 2
Chapter 3. The importance of layers in a data platform architecture
Chapter 3. AWS, Part 1
Chapter 3. AWS, Part 2
Chapter 3. Google Cloud, Part 1
Chapter 3. Google Cloud, Part 2
Chapter 3. Azure
Chapter 3. Open source and commercial alternatives
Chapter 3. Summary
Chapter 4. Getting data into the platform
Chapter 4. Files
Chapter 4. Ingesting data from relational databases
Chapter 4. Full-table ingestion
Chapter 4. Incremental table ingestion
Chapter 4. Change data capture (CDC)
Chapter 4. CDC vendors overview
Chapter 4. Data type conversion
Chapter 4. Ingesting data from NoSQL databases
Chapter 4. Capturing important metadata for RDBMS or NoSQL ingestion pipelines
Chapter 4. Ingesting data from files
Chapter 4. Tracking ingested files
Chapter 4. Ingesting data from streams
Chapter 4. Differences between batch and streaming ingestion
Chapter 4. Ingesting data from SaaS applications
Chapter 4. Connecting other networks to your cloud data platform
Chapter 5. Organizing and processing data
Chapter 5. Organizing your cloud storage
Chapter 5. Cloud storage containers and folders
Chapter 5. Common data processing steps
Chapter 5. File format conversion
Chapter 5. Data deduplication
Chapter 5. Data quality checks
Chapter 5. Configurable pipelines
Chapter 6. Real-time data processing and analytics
Chapter 6. Use cases for real-time data processing
Chapter 6. When should you use real-time ingestion and/or real-time processing?
Chapter 6. Organizing data for real-time use
Chapter 6. How does fast storage scale?
Chapter 6. Organizing data in the real-time storage
Chapter 6. Common data transformations in real time
Chapter 6. Deduplicating data in real-time systems
Chapter 6. Converting message formats in real-time pipelines
Chapter 6. Cloud services for real-time data processing
Chapter 6. Google Cloud real-time processing services
Chapter 6. Azure real-time processing services
Chapter 7. Metadata layer architecture
Chapter 7. Taking advantage of pipeline metadata
Chapter 7. Metadata model
Chapter 7. Metadata domains, Part 1
Chapter 7. Metadata domains, Part 2
Chapter 7. Metadata layer implementation options
Chapter 7. Metadata database
Chapter 7. Overview of existing solutions
Chapter 7. Open source metadata layer implementations, Part 1
Chapter 7. Open source metadata layer implementations, Part 2
Chapter 8. Schema management
Chapter 8. Schema-management approaches
Chapter 8. Schema management in the data platform, Part 1
Chapter 8. Schema management in the data platform, Part 2
Chapter 8. Monitoring schema changes
Chapter 8. Existing Schema Registry implementations
Chapter 8. Schema evolution scenarios
Chapter 8. Schema evolution and data transformation pipelines
Chapter 8. Schema evolution and data warehouses
Chapter 8. Schema-management features of cloud data warehouses
Chapter 9. Data access and security
Chapter 9. AWS Redshift
Chapter 9. Azure Synapse
Chapter 9. Google BigQuery
Chapter 9. Application data access
Chapter 9. Cloud key/value data stores
Chapter 9. Machine learning on the data platform
Chapter 9. ML cloud collaboration tools
Chapter 9. Data security
Chapter 10. Fueling business value with data platforms
Chapter 10. The analytics maturity journey
Chapter 10. The data platform: The engine that powers analytics maturity
Chapter 10. User adoption
Chapter 10. The dollar dance