This course introduces the principles and practice of data science using Python libraries. You’ll learn how to use popular Python data science libraries such as NumPy and Pandas, and how to create distributed big data solutions using PySpark. The essential features of data science are explored through practical examples that demystify the subject.
Duration
5 days
Prerequisites
- Approximately 6 months of Python experience
What you’ll learn
- Object-oriented Python programming
- Functional Python programming
- REST services and web sockets
- Defining and using decorators
- Asynchronous programming
- Python data science techniques
- Python Big Data
- Getting Started with PySpark
- PySpark data structures
Course details
Recap Essential Python Features – optional
- Language Fundamentals
- Functions
- Data Structures
- Defining and Using Packages
- Additional Techniques
Object-Oriented Programming
- Essential Concepts
- Defining and Using a Class
- Class-Wide Members
Additional Object-Oriented Techniques
- A Closer Look at Attributes
- Implementing Special Methods
- Inheritance
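As a rough illustration of class-wide members and special methods, here is a minimal sketch. The `Money` class and its `currency` attribute are hypothetical names chosen for this example and are not taken from the course materials.

```python
from functools import total_ordering


@total_ordering
class Money:
    """Hypothetical value class, used only for illustration."""

    currency = "GBP"  # class-wide attribute shared by every instance

    def __init__(self, pence):
        self.pence = pence  # per-instance attribute

    def __repr__(self):
        # Unambiguous text form, shown in the REPL and in debuggers
        return f"Money({self.pence})"

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return self.pence == other.pence

    def __lt__(self, other):
        # total_ordering derives <=, > and >= from __eq__ and __lt__
        if not isinstance(other, Money):
            return NotImplemented
        return self.pence < other.pence

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.pence + other.pence)


total = Money(150) + Money(50)
```

Returning `NotImplemented` rather than raising lets Python try the other operand's method before reporting an error.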
Functional Programming
- Functional Programming in Python
- Higher Order Functions
- Additional Techniques
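The idea of a higher-order function can be sketched as follows. The helper names (`compose`, `double`, `increment`) are assumptions made for this example only.

```python
from functools import reduce


def compose(*funcs):
    """Return a function that applies funcs from right to left."""
    def composed(value):
        # reduce threads the value through each function in turn
        return reduce(lambda acc, f: f(acc), reversed(funcs), value)
    return composed


def double(x):
    return x * 2


def increment(x):
    return x + 1


# compose(increment, double)(x) is equivalent to increment(double(x))
double_then_increment = compose(increment, double)

# map and filter take functions as arguments
squares_of_evens = list(
    map(lambda n: n * n, filter(lambda n: n % 2 == 0, range(6)))
)
```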
Decorators
- Getting Started with Decorators
- Additional Decorator Techniques
- Parameterized Decorators
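A parameterized decorator is a function that returns a decorator. The sketch below shows one possible shape; `repeat` and `greet` are hypothetical names used only for illustration.

```python
import functools


def repeat(times):
    """Decorator factory: call the wrapped function `times` times."""
    def decorator(func):
        @functools.wraps(func)  # preserve func's name and docstring
        def wrapper(*args, **kwargs):
            # Collect the result of each call in a list
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


@repeat(times=3)
def greet(name):
    """Return a greeting."""
    return f"Hello, {name}"


greetings = greet("Ada")
```

The three levels of nesting correspond to the decorator's argument, the function being decorated, and the arguments of each call.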
Asynchronous Processing in Python
- Getting Started with Asynchrony in Python
- Creating Tasks to Run in Different Threads
- Additional Task Techniques
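One way to run blocking work in separate threads from asynchronous code is `asyncio.to_thread`, available from Python 3.9. This is a minimal sketch and not necessarily the approach taught on the course; `blocking_square` is a hypothetical stand-in for any slow, blocking call.

```python
import asyncio
import time


def blocking_square(n):
    """Stand-in for a blocking call such as file or network I/O."""
    time.sleep(0.1)
    return n * n


async def main():
    # Each call runs in a worker thread; gather waits for all of them
    # and returns the results in the order the calls were submitted.
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_square, n) for n in range(4))
    )
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(main()))
```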
Getting Started with Python Data Science and NumPy
- Introduction to Python Data Science
- NumPy Arrays
- Manipulating Array Elements
- Manipulating Array Shape
NumPy Techniques
- NumPy Universal Functions
- Aggregations
- Broadcasting
- Manipulating Arrays using Boolean Logic
- Additional Techniques
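The NumPy techniques listed above can be sketched briefly. The temperature values are made-up sample data for illustration.

```python
import numpy as np

# Two days (rows) of readings from three sensors (columns), in Celsius
temps_c = np.array([[12.0, 15.0, 11.0],
                    [18.0, 21.0, 17.0]])

# Universal functions: arithmetic is applied element by element
temps_f = temps_c * 9 / 5 + 32

# Aggregation: mean of each column (axis=0 collapses the rows)
column_means = temps_c.mean(axis=0)

# Broadcasting: the shape (3,) array is stretched across both rows
centred = temps_c - column_means

# Boolean logic: a mask selects only the elements that match
warm = temps_c[temps_c > 15.0]
```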
Getting Started with Pandas
- Introduction to Pandas
- Creating a Series
- Using a Series
- Creating a DataFrame
- Using a DataFrame
Pandas Techniques
- Universal Functions
- Merging and Joining Datasets
- A Closer Look at Joins
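The difference between join types can be seen with `DataFrame.merge`. The `customers` and `orders` tables below are invented sample data.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 3, 4],
    "amount": [20.0, 35.0, 12.5, 99.0],
})

# Inner join: only customer_ids present in both tables
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: every customer is kept; Bob has no orders, so amount is NaN
left = customers.merge(orders, on="customer_id", how="left")

# Outer join: all keys from both sides; indicator=True adds a _merge
# column recording where each row came from
outer = customers.merge(orders, on="customer_id", how="outer", indicator=True)
```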
Working with Time Series Data
- Introduction to Time Series Data
- Indexing and Plotting Time Series Data
- Testing Data for Stationarity
- Making Data Stationary
- Forecasting Time Series Data
- Scaling Back the ARIMA Results
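One common way to make a trending series closer to stationary is to take logs and then first differences; forecasts made on that scale are mapped back by reversing both steps. The sketch below shows only the transform and its inverse on invented data. It is an assumption that the course follows this particular approach, and no stationarity test or ARIMA model is fitted here.

```python
import numpy as np
import pandas as pd

# Invented daily series with an upward trend
index = pd.date_range("2023-01-01", periods=6, freq="D")
series = pd.Series([10.0, 12.0, 15.0, 19.0, 24.0, 30.0], index=index)

# Log transform dampens growth; differencing removes the trend
log_series = np.log(series)
diffed = log_series.diff().dropna()

# Scaling back: cumulative sum undoes the differencing (given the
# first log value), and exp undoes the log transform
restored_log = diffed.cumsum() + log_series.iloc[0]
restored = np.exp(restored_log)
```

`restored` covers the second observation onwards, because differencing discards the first value.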
Case Study
- Worked example of a real-world data science problem
Introduction to Big Data
- Setting the scene
- Introduction to Hadoop
- Hadoop components
Getting Started with PySpark
- Introduction to Spark
- Spark architecture
- Application execution
- Using the Python Spark Shell
Using the PySpark API
- Essential concepts
- Creating an RDD
- Working with RDDs
RDD Operations – Part 1
- RDD transformations
- RDD transformations on key-value pairs
RDD Operations – Part 2
- RDD actions
- Caching
- Spark jobs – the big picture
Getting Started with Spark SQL
- Overview of Spark SQL
- Getting started with the Spark SQL API
- Creating DataFrames from a data source
- Creating DataFrames from an RDD
Spark SQL DataFrame Operations
- Basic operations
- Language-integrated query operations
- RDD operations
- Output operations
Appendix – Additional Big Data Technologies
- Data serialization
- Columnar storage
- Messaging systems
- NoSQL