Overview
Alternative Backends for R LiveLessons teaches R programmers
techniques for dealing with large data, both in memory and in
databases.
Description
In this video training, Jared starts with some common data
manipulation operations using base R functions and packages
like plyr, comparing the speed of in-memory calculations. He then
demonstrates more advanced techniques for accomplishing the same
tasks, such as data.table, dplyr, Rcpp and parallel computation, for
increased speed. Finally, for when data size is an even bigger
factor than speed, he introduces external memory and database
techniques using bigmemory, ff, SciDB, dplyr and Hadoop.
About the Instructor
Jared P. Lander is the Founder and CEO of Lander
Analytics, the Organizer of the New York Open Statistical
Programming Meetup and an Adjunct Professor of Statistics at
Columbia University. With a master's from Columbia University in
statistics and a bachelor's from Muhlenberg College in mathematics,
he has experience in both academic research and industry. Jared
oversees the long-term direction of the company and acts as Lead
Data Scientist, researching the best strategy, models and
algorithms for modern data needs. This is in addition to his
client-facing consulting and training. He specializes in data
management, multilevel models, machine learning, generalized linear
models, visualization and statistical computing.
He is the author of R for Everyone, a book about R programming
geared toward data scientists and non-statisticians alike. The book
is available
from Amazon, Barnes & Noble, and InformIT. The material is
drawn from the classes he teaches at Columbia and is incorporated
into his corporate training. Very active in the data community,
Jared is a frequent speaker at conferences, universities and
meetups around the world. He is a member of the 2014 Strata New
York selection committee.
Skill Level
What You Will Learn
Basic Aggregation
plyr
dplyr
data.table
Rcpp
Parallel Processing
Code Benchmarking
Who Should Take This Course
Course Requirements
Table of Contents
Lesson 1: Reading XML Data
1.1. Read HTML Table
1.2. Use xpath for complex searches in HTML
1.3. xmlToList for easier parsing
Lesson 2: Faster Group Operations
2.1. Aggregate normally
2.2. tapply
2.3. ddply
2.4. data.table
2.5. dplyr
2.6. ddply parallel
2.7. foreach
2.8. dplyr with a database
Lesson 3: Rcpp for faster code
3.1. Basics of C++ with R
3.2. Writing a C++ function for R
3.3. Using C++ code in an R package
Lesson 4: Advanced Machine Learning
4.1. Recommendation Engine with RecommenderLab
4.2. Text Mining with RTextTools
Lesson 5: Network Analysis
5.1. igraph
5.2. Reading edgelists
5.3. Base plots
5.4. tkplots
5.5. rglplots
5.6. Network metrics like diameter, shortest path
5.7. Node metrics like centrality and betweenness
Lesson 6: Advanced Graphics
6.1. ggvis
6.2. rCharts
About LiveLessons Video Training
The LiveLessons Video Training series publishes hundreds of
hands-on, expert-led video tutorials covering a wide selection of
technology topics designed to teach you the skills you need to
succeed. This professional and personal technology video series
features world-leading author instructors published by your trusted
technology brands: Addison-Wesley, Cisco Press, IBM Press, Pearson
IT Certification, Prentice Hall, Sams, and Que. Topics include: IT
Certification, Programming, Web Development, Mobile Development,
Home and Office Technologies, Business and Management, and more.
View all LiveLessons on InformIT at:
http://www.informit.com/livelessons