Video description
Data Just Right LiveLessons provides a practical introduction to solving common data challenges, such as managing massive datasets, visualizing data, building data pipelines and dashboards, and choosing tools for statistical analysis. You will learn how to use many of today's leading data analysis tools, including Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery.
Table of Contents
Introduction to Data Just Right LiveLessons
00:09:15
Learning objectives
00:00:33
1.1 Why is big data such a hot concept now?
00:02:52
1.2 Four strategies for tackling big data problems
00:04:13
1.3 Anatomy of a data pipeline
00:03:58
1.4 What the ideal database would look like
00:02:23
Learning objectives
00:00:35
2.1 Challenges of hosting and sharing large amounts of data
00:02:48
2.2 Choosing the right data format
00:07:33
2.3 Best practices for physically storing and sharing large amounts of data
00:04:43
2.4 Understanding data serialization formats
00:03:56
Learning objectives
00:00:50
3.1 History and use of relational databases
00:03:59
3.2 Databases and the Internet: Understanding the CAP theorem
00:05:07
3.3 Non-relational databases: Document and key-value stores
00:04:20
3.4 Introduction to Redis
00:06:31
3.5 Sharding Redis across a cluster of machines
00:07:16
3.6 Future trends in database technology
00:04:03
Learning objectives
00:00:35
4.1 History and meaning of business intelligence
00:06:27
4.2 Data warehousing and Hadoop
00:03:00
4.3 Data silos can be good
00:03:59
4.4 Convergence and the future of the business intelligence concept
00:02:25
Learning objectives
00:00:42
5.1 Introduction to Apache Hive
00:02:53
5.2 Loading data into Hive
00:07:07
5.3 Querying data with Hive
00:06:03
5.4 Introduction to AMPLab’s Shark
00:02:18
5.5 Data warehousing in the cloud
00:02:01
Learning objectives
00:00:36
6.1 Introduction to analytical databases
00:02:47
6.2 Google’s Dremel and BigQuery
00:02:15
6.3 Running a BigQuery query and retrieving the result
00:06:35
6.4 Visualizing BigQuery query results
00:06:12
6.5 The future of analytical query engines
00:02:27
Learning objectives
00:00:47
7.1 History and goals of data visualization
00:02:59
7.2 Strategies for dealing with visualization of very large datasets
00:02:19
7.3 Building interactive visualizations with R and ggplot()
00:08:09
7.4 Building 2D plots with Python and matplotlib
00:05:49
7.5 Building interactive visualizations for the Web with D3.js
00:07:26
Learning objectives
00:00:40
8.1 Writing a simple data pipeline script
00:04:53
8.2 Introduction to the Hadoop MapReduce framework
00:03:36
8.3 Writing a Hadoop streaming MapReduce job in Python
00:06:05
8.4 Writing a multistep MapReduce job using the mrjob Python library
00:06:30
8.5 Running mrjob scripts on Amazon Elastic MapReduce
00:04:26
Learning objectives
00:00:41
9.1 Challenges of building complex data workflows
00:02:00
9.2 Writing a MapReduce workflow script with Apache Pig
00:05:52
9.3 Creating a MapReduce workflow application with Cascading
00:04:23
9.4 When to use Pig versus Cascading
00:02:19
Learning objectives
00:00:40
10.1 Use cases and limitations of machine learning
00:02:49
10.2 Bayesian classification, clustering, and recommendation engines
00:05:17
10.3 Using Apache Mahout for Bayesian classification
00:07:35
10.4 Introduction to MLbase
00:02:35
Learning objectives
00:00:53
11.1 Understanding memory usage with R
00:05:49
11.2 Working with large matrices using bigmemory and biganalytics
00:05:17
11.3 Manipulating large data frames with ff
00:04:47
11.4 Running a linear regression over large datasets using biglm
00:05:23
11.5 Interfacing with Hadoop using R and RHadoop
00:03:25
Learning objectives
00:00:42
12.1 Choosing a programming language for analytics
00:02:35
12.2 Working with NumPy and SciPy
00:06:01
12.3 Using the Pandas library for analyzing time series data
00:09:08
12.4 Using the IPython notebook
00:06:08
Learning objectives
00:00:48
13.1 Understanding your data problem
00:03:40
13.2 A playbook for the build versus buy problem
00:03:14
13.3 Investing in a data center: Public versus private
00:03:30
13.4 Understanding the costs of open-source software
00:04:05
13.5 Using analytics-as-a-service technologies
00:03:55
Learning objectives
00:00:40
14.1 Trends driving innovation in data analytics technology
00:03:04
14.2 Hadoop: The disruptor and the disrupted
00:03:11
14.3 Analytics move toward the cloud
00:03:15
14.4 The evolving definition of “data scientist”
00:04:18
14.5 Converging technologies
00:03:01
Summary of Data Just Right LiveLessons
00:00:55