50 Hours of Big Data, PySpark, AWS, Scala, and Scraping
Video description
Learn, build, and execute big data strategies with Scala and Spark, PySpark and AWS, data scraping and data mining with Python, and master MongoDB
About This Video
Data scraping and data mining with Python, from beginner to pro
Clear unfolding of concepts with examples in Python, Scrapy, Scala, PySpark, and MongoDB
Master Big Data with PySpark and AWS
In Detail
Part 1 is designed to reflect the most in-demand Scala skills. It provides an in-depth understanding of core Scala concepts. We will wrap up with a discussion of MapReduce and of ETL pipelines that use Spark to move data from AWS S3 to AWS RDS. This part includes six mini-projects and one Scala Spark project.
Part 2 covers data analysis with PySpark. You will explore Spark RDDs and DataFrames, some Spark SQL queries, the transformations and actions that can be performed on data using RDDs and DataFrames, and the Spark and Hadoop ecosystems and their underlying architecture. You will also learn how to leverage AWS storage, database, and compute services, and how Spark can communicate with different AWS services.
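As a rough illustration of the word-count pattern that the RDD lessons build toward, the sketch below mimics it in plain Python rather than PySpark, so it runs without a Spark installation. The helper names `flat_map` and `reduce_by_key` are hypothetical stand-ins for the RDD methods `flatMap` and `reduceByKey`; they are not part of any Spark API, and the sample lines are made up.

```python
from itertools import chain


def flat_map(func, records):
    """Apply func to each record and flatten the results (stand-in for RDD.flatMap)."""
    return chain.from_iterable(func(r) for r in records)


def reduce_by_key(func, pairs):
    """Combine values that share a key (stand-in for RDD.reduceByKey)."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


# Made-up input; in Spark this would typically come from a text file.
lines = ["big data with spark", "spark and aws", "big data"]

# Lazy steps: nothing is computed yet, the pipeline is only described.
words = flat_map(lambda line: line.split(), lines)
pairs = ((word, 1) for word in words)

# This helper is eager, so it both combines the pairs and triggers the work.
# In Spark, reduceByKey is itself lazy and an action such as collect() runs it.
word_counts = reduce_by_key(lambda a, b: a + b, pairs)
print(word_counts)
```

The generator-based steps are only meant to echo Spark's lazy transformations; real RDDs are also partitioned and distributed, which this sketch does not attempt to model.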
Part 3 is all about data scraping and data mining. You will cover important concepts such as how an internet browser executes pages and communicates with the server, synchronous and asynchronous communication, parsing the data in the server's response, tools for data scraping, the Python requests module, and more.
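To make the "parsing the response" step a little more concrete, here is a minimal sketch that uses only the Python standard library instead of Scrapy or the requests module. The HTML snippet, the base URL, and the `LinkCollector` class are all made up for illustration; a real scraper would first fetch the page over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # urljoin turns a relative link into an absolute one.
                self.links.append(urljoin(self.base_url, href))


# Hypothetical response body; in practice this would come from an HTTP request.
html = (
    '<ul><li><a href="/page/2/">Next</a></li>'
    '<li><a href="about.html">About</a></li></ul>'
)

collector = LinkCollector("https://example.com/quotes/")
collector.feed(html)
print(collector.links)
```

Resolving relative links against the page URL is the same idea the outline refers to with `urljoin` and next-page handling, although Scrapy wraps it in its own response methods.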
In Part 4, you will use MongoDB to develop an understanding of NoSQL databases. You will explore the basic operations and the MongoDB query, projection, and update operators. We will wind up this section with two projects: developing a CRUD-based application using Django and MongoDB, and implementing an ETL pipeline using PySpark to load data into MongoDB.
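MongoDB query operators are written as filter documents. The sketch below evaluates a few of them (`$eq`, `$gt`, `$lt`, `$in`) against in-memory dictionaries in plain Python, purely to show what they mean. `matches` is a hypothetical helper, not a MongoDB or PyMongo function, the sample records are invented, and much of the real behavior (nested fields, arrays, missing fields, type ordering) is deliberately left out.

```python
import operator

# A small subset of query operators mapped to plain-Python comparisons.
OPERATORS = {
    "$eq": operator.eq,
    "$gt": operator.gt,
    "$lt": operator.lt,
    "$in": lambda value, options: value in options,
}


def matches(document, query):
    """Return True if document satisfies every field condition in query."""
    for field, condition in query.items():
        if field not in document:
            return False
        # A bare value such as {"age": 30} is shorthand for {"age": {"$eq": 30}}.
        if not isinstance(condition, dict):
            condition = {"$eq": condition}
        for op_name, operand in condition.items():
            if not OPERATORS[op_name](document[field], operand):
                return False
    return True


# Invented sample documents.
students = [
    {"name": "Ana", "age": 21, "course": "Scala"},
    {"name": "Ben", "age": 30, "course": "PySpark"},
    {"name": "Cy", "age": 25, "course": "MongoDB"},
]

# Age strictly between 20 and 30, and course is one of the listed values.
query = {"age": {"$gt": 20, "$lt": 30}, "course": {"$in": ["Scala", "MongoDB"]}}
selected = [s["name"] for s in students if matches(s, query)]
print(selected)
```

With a real database, a filter document of the same shape would be passed to a query method instead of being interpreted by hand.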
By the end of this course, you will be able to relate the concepts and practical aspects of learned technologies with real-world problems.
Audience
This course is designed for absolute beginners who want to create intelligent solutions, study with actual data, and enjoy learning theory and then putting it into practice. Data scientists, machine learning experts, and drop shippers will all benefit from this training.
A basic understanding of programming, HTML tags, Python, SQL, and Node.js is required. However, no prior knowledge of data scraping or Scala is needed.
Understanding the Response (flags, certificate, ip_address, copy)
Understanding the Response (replace, urljoin, follow, follow_all)
Response CSS and Scrapy Shell
Extracting quotes with Scrapy
Understanding Nested Selectors
Extracting the Author and Quotes
Checking for Next Page
Checking for Next Page in Spider
Checking for Next Page URL
Scraping Quotes from Next Pages
Exporting Extracted Data
Quiz (Get the Tags)
Solution (Get the Tags)
Next Website
CSS Selectors for Movie Names and URLs
Combined CSS Selectors for Movie Names and URLs
Sending Request to the Film Info Page
Merge Data from Two Callbacks
Extracting Movie Duration and Genres
Exporting the Extracted Data
Quiz (Extracting the Year)
Solution (Extracting the Year)
Getting Director Name and URL
Getting Top Four Movies of Directors
Extracting Data Anomaly (dont_filter Flag)
Chapter 6 : Scrapy Project
Hugo Boss Website for Scraping
Understanding Site Structure
Writing CSS Selectors for Listings
Listings in Scrapy Shell
Sending Request to Listings URLs
Extracting Products URL from the Listings
Sending Requests to Products of the Listings
Writing CSS to Get the Product Info
Getting the Bigger Images of the Product
Checking Next Page URL
Adding Pagination to Spider and Running It
Output of the Spider
Chapter 7 : Selenium
Introduction to Selenium
Getting Started with Selenium
Configuring the Webdriver
Extracting Quotes with Selenium
Extracting Quotes and Author Names
Quiz (Extracting Quotes)
Solution (Extracting Quotes)
Clicking on Button
Pagination and Extracting Data
Exception Handling for Unavailable Element
Navigating the Website for Login
Quiz (Login and Extract Quote)
Solution (Login and Extract Quote)
Chapter 8 : Project Selenium
Overview of Project
Closing the Cookie Button
Setting the Language for Translation
Sending the Text for Translation
Downloading the Translation
Reading Data from File for Translation
Chapter 9 : Part 2 - Scala and Spark - Master Big Data with Scala and Spark
Why Scala
Scala Applications
About the Instructor
Introduction to Scala and Spark Section
Projects Overview for Scala and Spark
Chapter 10 : Scala Overview
What is Scala
Scala Setup (Local Machine)
Scala Setup (Online)
Variables in Scala
Arithmetic Operations on Variables-1
Arithmetic Operations on Variables-2
Quiz (Arithmetic Operations)
Solution (Arithmetic Operations)
Quiz (Strings)
Solution (Strings)
Type Casting
Taking Input from User
Quiz (User Input and Type Casting)
Solution (User Input and Type Casting)
Chapter 11 : Flow Control
Overview of Control Statements
If Else Statements
Conditions in If
Quiz (If Statement)
Solution (If Statement)
Nested If Else
Quiz (Nested If Else)
Solution (Nested If Else)
Logical Operators
Quiz (Logical Operators)
Solution (Logical Operators)
If Else If
Quiz (If Else If)
Solution (If Else If)
Overview of Loops
Overview of While Loop
While Loop
Quiz (While Loop)
Solution 1 (While Loop)
Solution 2 (While Loop)
Do While Loop
For Loop
Quiz 1 (For Loop)
Solution 1 (For Loop)
Quiz 2 (For Loop)
Solution 2 (For Loop)
Break
Break Fix
Project Overview for Flow Control
Project Solution Design
Project Solution Code 1
Project Solution Code 2
Project Solution Code 3
Project Solution Code 4
Chapter 12 : Functions
Overview of Functions
Writing Addition Function
Quiz (Basic Function)
Solution (Basic Function)
Functions Common Issues
Named Arguments
Quiz (String Concatenation Function)
Solution (String Concatenation Function)
Quiz (Dividing Code in Functions)
Solution (Dividing Code in Functions)
Default Arguments
Quiz (Default Arguments)
Solution (Default Arguments)
Anonymous Functions
Quiz (Anonymous Functions)
Solution (Anonymous Functions)
Scopes
Project Overview for Functions
Checking Credentials
Prompting the Menu
Basic Functions
Breaking Code in More Functions
Final Run (Functions)
Chapter 13 : Classes
Introduction to Classes
Creating Class
Class Constructor
Functions and Classes
Project Overview for Classes
Basic Structure
Final Run
Chapter 14 : Data Structures
Introduction of Data Structures
Lists Introduction
Lists Create and Delete Elements
Lists Take
ListBuffer Introduction
Add Data in ListBuffer
Remove Data from ListBuffer
Take Data from ListBuffer
Project Overview for Data Structures
Project Architecture Discussion
Project Architecture Implementation
User Input for Objects
Implementing the Control Flow
Creating Required Functions Inside Class
Overview of Maps
Creating Maps
Check Key in Map
Update Value in Map
Add and Remove Items from Maps
Iterating on Maps
Project Overview for Data Structures
Project Architecture for Data Structures
Project Structure Code
Using Maps for Word Count
Final Run
Sets Overview
Add and Remove Item from the Set
Set Operations
Overview of Stack
Push and Pop in Stack
Stack Attributes
Project Overview
Project Architecture
Extra Closing Bracket Use Case
Extra Starting Bracket Use Case
Chapter 15 : Project for Scala and Spark
Project Introduction
Why Spark
Hadoop Ecosystem
Spark Architecture
Spark Ecosystem
Databricks Account
Setting up Databricks Cluster
Spark Local Setup
Spark Hadoop Setup
Spark RDDs
Spark RDDs (textFile, collect)
Spark Local Run
Understanding Map
Understanding Flat Map
Understanding Reduce by Key
Word Count Example
Spark DFs
Spark DF Read Data
Spark Print Schema, Select
Spark GroupBy
Spark DF Write
Creating S3 Bucket
Creating Database in RDS
Performing ETL
Chapter 16 : Part 3 - PySpark and AWS - Master Big Data with PySpark and AWS
Why Big Data
Applications of PySpark
Introduction to Instructor
Introduction to Course
Projects Overview
Chapter 17 : Introduction to Hadoop, Spark Ecosystems and Architectures
Why Spark
Hadoop Ecosystem
Spark Architecture and Ecosystem
Databricks Signup
Create Databricks Notebook
Download Spark and Dependencies
Java Setup on Windows
Python Setup on Windows
Spark Setup on Windows
Hadoop Setup on Windows
Running Spark on Windows
Java Download on Mac
Installing JDK on Mac
Setting Java Home on Mac
Java Check on Mac
Installing Python on Mac
Set Up Spark on Mac
Chapter 18 : Spark RDDs
Spark RDDs Introduction
Creating Spark RDD
Running Spark Code Locally
RDD Map (Lambda)
RDD Map (Simple Function)
Quiz (Map)
Solution 1 (Map)
Solution 2 (Map)
RDD FlatMap
RDD Filter
Quiz (Filter)
Solution (Filter)
RDD Distinct
RDD GroupByKey
RDD ReduceByKey
Quiz (Word Count) with Spark RDDs
Solution (Word Count) with Spark RDDs
RDD (Count and CountByValue)
RDD (saveAsTextFile)
RDD (Partition)
Finding Average-1
Finding Average-2
Quiz (Average)
Solution (Average)
Finding Min and Max
Quiz (Min and Max)
Solution (Min and Max)
Project Overview for Spark RDDs
Total Students
Total Marks by Male and Female Student
Total Passed and Failed Students
Total Enrolments Per Course
Total Marks Per Course
Average Marks Per Course
Finding Minimum and Maximum Marks
Average Age of Male and Female Students
Chapter 19 : Spark DFs
Introduction to Spark DFs
Creating Spark DFs
Spark Infer Schema
Spark Provide Schema
Create DF from RDD
Rectifying the Error
Select DF Columns
Spark DF withColumn
Spark DF withColumnRenamed and Alias
Spark DF Filter Rows
Quiz (select, withColumn, filter)
Solution (select, withColumn, filter)
Spark DF (Count, Distinct, Duplicate)
Quiz (Distinct, Duplicate)
Solution (Distinct, Duplicate)
Spark DF (sort, orderBy)
Quiz (sort, orderBy)
Solution (sort, orderBy)
Spark DF (Group By)
Spark DF (Group By - Multiple Columns and Aggregations)
Spark DF (Group By - Visualization)
Spark DF (Group By - Filtering)
Quiz (Group By)
Solution (Group By)
Quiz (Word Count) with Spark DFs
Solution (Word Count) with Spark DFs
Spark DF (UDFs)
Quiz (UDFs)
Solution (UDFs)
Solution (Cache and Persist)
Spark DF (DF to RDD)
Spark DF (Spark SQL)
Spark DF (Write DF)
Project Overview
Project (Count and Select)
Project (Group By)
Project (Group By, Aggregations, and Order By)
Project (Filtering)
Project (UDF and WithColumn)
Project (Write)
Chapter 20 : Collaborative Filtering
Introduction to Collaborative Filtering
Utility Matrix
Explicit and Implicit Ratings
Expected Results
Dataset
Joining Dataframes
Train and Test Data
ALS Model
Hyperparameter Tuning and Cross Validation
Best Model and Evaluate Predictions
Recommendations
Chapter 21 : Spark Streaming
Introduction to Spark Streaming
Spark Streaming with RDD
Spark Streaming Context
Spark Streaming Reading Data
Spark Streaming Cluster Restart
Spark Streaming RDD Transformations
Spark Streaming DF
Spark Streaming Display
Spark Streaming DF Aggregations
Chapter 22 : ETL Pipeline
Introduction to ETL
ETL Pipeline Flow
Dataset with ETL Pipeline
Extracting Data
Transforming Data
Loading Data (Creating RDS-I)
Load Data (Creating RDS-II)
RDS Networking
Downloading Postgres
Installing Postgres
Connect to RDS Through PgAdmin
Loading Data
Chapter 23 : Project - Change Data Capture / Replication Ongoing
Introduction to Project
Project Architecture
Creating RDS MySQL Instance
Creating S3 Bucket
Creating DMS Source Endpoint
Creating DMS Destination Endpoint
Creating DMS Instance
MySQL WorkBench
Connecting with RDS and Dumping Data
Querying RDS
DMS Full Load
DMS Replication Ongoing
Stopping Instances
Glue Job (Full Load)
Glue Job (Change Capture)
Glue Job (CDC)
Creating Lambda Function and Adding Trigger
Checking Trigger
Getting S3 File Name in Lambda
Creating Glue Job
Adding Invoke for Glue Job
Testing Invoke
Writing Glue Shell Job
Full Load Pipeline
Change Data Capture Pipeline
Chapter 24 : Part 4 - MongoDB-Mastering MongoDB for Beginners (Theory and Projects)
Why MongoDB
Applications of MongoDB
Instructor Introduction
What’s Inside
Methodology
Project
Chapter 25 : Overview
SQL Schema
NoSQL Schema
Installing MongoDB
Setting Environment Variable
Analogies
Chapter 26 : Basic Mongo Operations
Basic Database Commands Part 1
Basic Database Commands Part 2
Basic Collection Commands
Introduction to Module
Create Document (Single)
Create Documents (Many)
Quiz (Create Documents)
Solution (Create Documents)
Quiz (Create Document)
Solution (Create Document)
Outro
Chapter 27 : Basic Update Operation
Introduction
Update Documents (Single Filter)
Update Documents
Quiz 1 (Update Operation)
Solution 1 (Update Operation)
Quiz 2 (Update Operation)
Solution 2.1 (Update Operation)
Solution 2.2 (Update Operation)
Outro
Chapter 28 : Basic Read Operation
Introduction
Read Documents
Quiz 1 (Read Documents)
Solution 1 (Read Documents)
Quiz 2 (Read Documents)
Solution 2 (Read Documents)
Outro
Chapter 29 : Basic Delete Operation
Introduction
Delete Document
Quiz 1 (Delete Operation)
Solution 1 (Delete Operation)
Quiz 2 (Delete Operation)
Solution 2 (Delete Operation)
Outro
Chapter 30 : Query and Projection Operators
Module Introduction
$eq Operator
$gt Operator
$lt Operator
$in Operator
$ne Operator
$nin Operator
$and Operator
$or Operator
$not Operator
$exists Operator
$type Operator
$expr Operator
$mod Operator
$text Operator
$all Operator
$elemMatch Operator
$size Operator
$ Operator
$slice Operator
Quiz ($eq)
Solution ($eq)
Quiz ($gt)
Solution ($gt)
Quiz ($gte)
Solution ($gte)
Quiz ($in)
Solution ($in)
Quiz ($lt)
Solution ($lt)
Quiz ($lte)
Solution ($lte) Part 1
Solution ($lte)
Quiz ($ne)
Solution ($ne)
Quiz ($nin)
Solution ($nin) Part 1
Solution ($nin) Part 2
Solution ($nin) Part 3
Quiz ($and)
Solution ($and)
Quiz ($or)
Solution ($or) Part 1
Solution ($or) Part 2
Quiz ($not)
Solution ($not) Part 1
Solution ($not) Part 2
Solution ($not) Part 3
Quiz ($exists)
Solution ($exists)
Quiz ($expr)
Solution ($expr)
Quiz ($mod)
Solution ($mod)
Quiz ($text)
Solution ($text)
Quiz ($all)
Solution ($all) Part 1
Solution ($all) Part 2
Quiz ($elemMatch)
Solution ($elemMatch) Part 1
Solution ($elemMatch) Part 2
Quiz ($size)
Solution ($size)
Chapter 31 : Update Operators
$currentDate Operator
$inc Operator Part 1
$inc Operator Part 2
$min Operator
$max Operator
$mul Operator
$rename Operator
$set Operator Part 1
$set Operator Part 2
$unset Operator
$addToSet Operator
$pop Operator
$pull Operator
$push Operator
$each Operator
$position Operator
$sort Operator
Quiz 1 (Update Operators)
Solution 1 (Update Operators) Part 1
Solution 1 (Update Operators) Part 2
Solution 1 (Update Operators) Part 3
Solution 1 (Update Operators) Part 4
Quiz 2 (Update Operators)
Solution 2 (Update Operators) Part 1
Solution 2 (Update Operators) Part 2
Solution 2 (Update Operators) Part 3
Chapter 32 : Mongo with Node
Installing Node on Local Machine
Installing VS Code
Mongo Atlas
Create Cluster on Mongo Atlas
Creating User in Atlas
Network Access
Database and Collections
Connect Node with Mongo
Get Databases
Insert in Mongo Using Node
Read from Mongo Using Node
Update in Mongo Using Node
Delete from Mongo Using Node
Chapter 33 : Mongo with Python
PyCharm
Creating Connection
Insert in Mongo Using Python
Read from Mongo Using Python
Update in Mongo Using Python
Delete in Mongo Using Python
Chapter 34 : Django with Mongo
Django Installation
Creating App
Setting Up Django with Mongo
Django Migrations
Django URLs and Views
Django with Postman
Django Get Data from Postman
Insert in Mongo Using Django
Read from Mongo Using Django
Update in Mongo Using Django
Delete in Mongo Using Django
Chapter 35 : Spark with Mongo
Databricks for Spark
Installing Libraries
Data Overview
ETL