A Beginner’s Guide to Architecting Big Data Applications
Video description
Whether you’re a data engineer who needs to plan and implement a big data pipeline or a manager interested in learning how tools in the Hadoop technology stack address business goals, these videos will walk you through how to plan your big data solution. You’ll receive an introduction to the concepts of Apache Hadoop, and training on key components including Apache HBase, YARN, Cassandra, Kafka, and Spark.
HDFS Architecture - The Name Node And The Data Nodes
Parallel Performance
What Is YARN? - Scalable Compute
YARN: Plug-In Processing Engines
Overview Of MapReduce
Using Different Languages
Options For Data Input
Importing Data
The Hadoop Client
Overview Of Sqoop
Overview Of Flume
Other Import Tools
Hadoop Tools
What Is Pig?
What Is Hive?
Comparing Hive To SQL
Hive Architecture
What Is HCatalog?
Hive Interfaces
Apache Storm
Apache Spark
Hadoop Security
Overview Of Oozie
Mahout
HBase And Other Data Stores: HBase, Accumulo, Etc.
Apache Kafka
Cluster Management
Conclusion
Distributions And Where To Go From Here
Conclusion
Introduction
Course Agenda And Instructor
Core Hadoop Components
Basic Overview Of Hadoop Core Components: HDFS
Hadoop Core Components Overview
What Is Map/Reduce?
YARN: Components And Architecture
Pre-YARN Architecture
YARN Architecture And Daemons
Scheduling, Running And Monitoring Applications In YARN
Running Jobs In YARN
YARN Parameters
YARN Cluster Resource Allocation
Failure Handling
YARN Logs
Hands On With YARN
Conclusion
Summary
Introduction
What Is HBase
What To Expect
About The Author
Administration Basics
HBase Deployment Architecture
HBase Fault Tolerance
Hardware Recommendations
Software Recommendations
HBase Deployment At Scale
Installation With Cloudera Manager
Basic Static Configuration
Rolling Restarts And Upgrades
Interacting With HBase
Troubleshooting
Troubleshooting Methodology
Troubleshooting Distributed Clusters
Administration From The Command Line
Using The HBase UI
Using The Metrics
Using The Logs
Tuning
Basic HBase Tuning
Generating Load And Load Test Tool
Generating With YCSB
Region Tuning
Table Storage Tuning
Memory Tuning
Tuning With Failures
Tuning For Modern Hardware
Operations Continuity
Operational Continuity
Corruption: hbck
Corruption: Other Tools
Security
Security Demo
Backups: Snapshots
Backups: Import / Export / Copy Table
Cluster Replication
Ecosystem
HBase Proxy Servers, Thrift And Rest
Hue
HBase With Apache Phoenix
Conclusion
Wrapup And Thank You
Introduction To Cassandra
Introducing The Course
Understanding What Cassandra Is
Learning What Cassandra Is Being Used For
Understanding The System Requirements
Opening The Main Virtual Machine
Pop Quiz - Intro to Cassandra
Getting Started With The Architecture
Understanding That Cassandra Is A Distributed Database
Learning What Snitch Is For
Learning What Gossip Is For
Learning How Data Gets Distributed
Learning About Replication
Learning About Virtual Nodes
Pop Quiz - Getting Started with Architecture
Installing Cassandra
Downloading Cassandra
Ensuring Oracle Java 7 Is Installed
Installing Cassandra
Viewing The Main Configuration File
Providing Cassandra With Permission To Directories
Starting Cassandra
Checking Status
Accessing The Cassandra system.log File
Pop Quiz - Installing Cassandra
Communicating With Cassandra
Understanding Ways To Communicate With Cassandra
Using CQLSH
Pop Quiz - Communicating with Cassandra
Creating A Database
Understanding A Cassandra Database
Defining A Keyspace
Deleting A Keyspace
Pop Quiz - Creating a Database
Lab: Create A Second Database
Creating A Table
Creating A Table
Defining Columns And Data Types
Defining A Primary Key
Recognizing A Partition Key
Specifying A Descending Clustering Order
Pop Quiz - Creating a Table
Lab: Create A Second Table
Inserting Data
Understanding Ways To Write Data
Using The INSERT INTO Command
Using The COPY Command
How Data Is Stored In Cassandra
How Data Is Stored On Disk
Pop Quiz - Inserting Data
Lab: Insert Data
Modeling Data
Understanding Data Modeling In Cassandra
Using A WHERE Clause
Understanding Secondary Indexes
Creating A Secondary Index
Defining A Composite Partition Key
Pop Quiz - Modeling Data
Creating An Application
Understanding Cassandra Drivers
Exploring The DataStax Java Driver
Setting Up A Development Environment
Creating An Application Page
Acquiring The DataStax Java Driver Files
Getting The DataStax Java Driver Files Through Maven
Providing The DataStax Java Driver Files Manually
Connecting To A Cassandra Cluster
Executing A Query
Displaying Query Results - Part 1
Displaying Query Results - Part 2
Using An MVC Pattern
Pop Quiz - Creating an Application
Lab: Create A Second Application - Part 1
Lab: Create A Second Application - Part 2
Lab: Create A Second Application - Part 3
Updating And Deleting Data
Updating Data
Understanding How Updating Works
Deleting Data
Understanding Tombstones
Using TTLs
Updating A TTL
Pop Quiz - Updating and Deleting Data
Lab: Update And Delete Data
Selecting Hardware
Understanding Hardware Choices
Understanding RAM And CPU Recommendations
Selecting Storage
Deploying In The Cloud
Pop Quiz - Selecting Hardware
Adding Nodes To A Cluster
Understanding Cassandra Nodes
Having A Network Connection - Part 1
Having A Network Connection - Part 2
Having A Network Connection - Part 3
Specifying The IP Address Of A Node In Cassandra
Specifying Seed Nodes
Bootstrapping A Node
Cleaning Up A Node
Using cassandra-stress
Pop Quiz - Adding Nodes to a Cluster
Lab: Add A Third Node
Monitoring A Cluster
Understanding Cassandra Monitoring Tools
Using Nodetool
Using JConsole
Learning About OpsCenter
Pop Quiz - Monitoring a Cluster
Repairing Nodes
Understanding Repair
Repairing Nodes
Understanding Consistency - Part 1
Understanding Consistency - Part 2
Understanding Hinted Handoff
Understanding Read Repair
Pop Quiz - Repairing Nodes
Lab: Repair Nodes For A Keyspace
Removing A Node
Understanding Removing A Node
Decommissioning A Node
Putting A Node Back Into Service
Removing A Dead Node
Pop Quiz - Removing a Node
Lab: Put A Node Back Into Service
Redefining A Cluster For Multiple Data Centers
Redefining For Multiple Data Centers - Part 1
Redefining For Multiple Data Centers - Part 2
Changing Snitch Type
Modifying cassandra-rackdc.properties
Changing Replication Strategy - Part 1
Changing Replication Strategy - Part 2
Pop Quiz - Redefining a Cluster
Resources For Further Learning
Accessing Documentation
Reading Blogs And Books
Watching Video Recordings
Posting Questions
Attending Events
Wrap Up
The Case for Kafka
The Basics
Setting up a Kafka Cluster
Writing a Kafka Producer
Writing a Kafka Consumer
Using Kafka from Python
Troubleshooting Kafka
Integrating Kafka and Hadoop with Flafka
Kafka Availability and Consistency
Kafka Ecosystem
Future of Kafka
Pre-Flight Check
Spark Deconstructed
A Brief History
Simple Spark Apps
Spark Essentials
Spark Examples
Unifying the Pieces - Spark SQL
Unifying the Pieces - Spark Streaming
Unifying the Pieces - MLlib and GraphX
Unified Workflows Demo
The Full SDLC
Developer Certification
Resources
Introduction - Why DataFrames?
ETL to Prepare the Data from Capital Bikeshare
Create a DataFrame, Explore using SQL
Data Preparation for Machine Learning Models
Build a Classifier Using Naive Bayes
Build a Classifier Using Decision Trees
Build a Classifier Using Random Forests
Use a DataFrame to Compare Models
Parquet as a Best Practice with DataFrames
How to Store a DataFrame with Parquet
How to Read a DataFrame Back in From Parquet
Use SQL to Estimate Route Durations
Data Preparation for GraphX - Model Route Costs
Use PageRank to Rank Popular Stations
Optimize Routes to Columbus Circle
Compare Results with Google Maps
Analyze a Popular Tourist Route
Examples of How to Use DataFrames in Python
Summary - The New DataFrames Features in Spark
Introduction
About Alluxio And The Course
About The Author
Using Alluxio Locally
Downloading Alluxio
Starting The System Locally
Interacting Via The Shell
Browsing The Web UI
Examples With Alluxio
Setting Up Alluxio With Spark And S3
Running Spark on Alluxio with S3
Using Alluxio With Unified Namespace
Deploying Alluxio On A Cluster
Deploying Alluxio In AWS
Conclusion
Contributing To The Project And Conclusion