Apache Spark
Course Presentation
You can download the course presentation by right-clicking the link and choosing "Save link as": Download Presentation
Ch.04-01: Introduction
In this lesson, we will explain the following topics:
- Understand the course structure and objectives.
 - Familiarize yourself with the course references and resources.
 - Learn about the prerequisites needed for the course.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-02: Python Vs. Scala
In this lesson, we will explain the following topics:
- Compare the differences between Python and Scala in the context of Spark.
 - Understand the performance implications of using Python vs. Scala.
 - Learn about the advantages and disadvantages of each language for Spark development.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-03: Introduction
In this lesson, we will explain the following topics:
- Learn about the origin and development of Apache Spark.
 - Understand the key milestones and contributions to the Spark project.
 - Explore the unified engine design of Spark for large-scale distributed data processing.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-04: About Databricks
In this lesson, we will explain the following topics:
- Understand the role of Databricks in the Spark ecosystem.
 - Learn about Databricks’ contributions to Spark development and the community.
 - Explore the capabilities of the Databricks analytics platform.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-05: Spark In The Data Platforms
In this lesson, we will explain the following topics:
- Understand the role of Spark in data platforms.
 - Learn about the technical components of a data lake.
 - Explore how Spark integrates with other big data technologies.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-06: Running Spark
In this lesson, we will explain the following topics:
- Learn the different methods for running Spark, including Databricks, local installations, and Docker.
 - Understand the steps to set up and run Spark in various environments.
 - Explore the benefits of using the Databricks Community Edition for learning and small projects.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-07: Demo: Running Spark on Linux Ubuntu
In this lesson, we will explain the following topics:
- Demonstrate the process of installing and running Spark on Linux Ubuntu.
 - Understand the configuration steps required for Spark installation on Ubuntu.
 - Explore the execution of Spark applications on a Linux environment.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-08: Demo: Running Spark on macOS
In this lesson, we will explain the following topics:
- Demonstrate the process of installing and running Spark on macOS.
 - Understand the configuration steps required for Spark installation on macOS.
 - Explore the execution of Spark applications on a macOS environment.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-09: Demo: Running Spark on Windows
In this lesson, we will explain the following topics:
- Demonstrate the process of installing and running Spark on Windows.
 - Understand the configuration steps required for Spark installation on Windows.
 - Explore the execution of Spark applications on a Windows environment.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-10: Demo: Running Spark On Databricks
In this lesson, we will explain the following topics:
- Demonstrate the process of running Spark on Databricks.
 - Understand the benefits of using Databricks for Spark workloads.
 - Explore practical examples of Spark applications running on Databricks.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-11: From MapReduce To Spark
In this lesson, we will explain the following topics:
- Understand the basic idea and stages of MapReduce.
 - Learn about the limitations of MapReduce and the motivation for Spark.
 - Explore the improvements offered by Spark over MapReduce, including in-memory processing and optimized execution.
 
Go to lesson | Watch on YouTube | Download the video
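As a rough illustration of the MapReduce stages covered in Ch.04-11, the sketch below runs a word count through separate map, shuffle, and reduce steps. It is plain Python on a single machine, not Hadoop or Spark code, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: bring together all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three stages one after another."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["spark is fast", "mapreduce is batch", "spark is unified"])
print(counts)
```

In a real MapReduce job each stage writes its output to disk before the next one starts, which is one of the costs that Spark's in-memory processing aims to avoid.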
Ch.04-12: Spark Characteristics
In this lesson, we will explain the following topics:
- Learn about the key characteristics of Spark, including speed, ease of use, modularity, and extensibility.
 - Understand how Spark achieves its high performance through hardware utilization, DAG scheduling, and the Tungsten execution engine.
 - Explore the benefits of Spark’s modular and extensible architecture.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-13: Spark Applications
In this lesson, we will explain the following topics:
- Understand the components of a Spark application, including the driver and executors.
 - Learn about the execution process of Spark applications in a distributed environment.
 - Explore the different languages supported by Spark for application development.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-14: Spark Driver
In this lesson, we will explain the following topics:
- Learn about the role and key functions of the Spark driver.
 - Understand how the driver schedules and distributes tasks to executors.
 - Explore the communication and resource management responsibilities of the driver.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-15: Spark Session
In this lesson, we will explain the following topics:
- Understand the concept and purpose of a SparkSession.
 - Learn how to create and use a SparkSession in a Spark application.
 - Explore the benefits of SparkSession for simplifying Spark interactions and configurations.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-16: Spark Cluster Manager
In this lesson, we will explain the following topics:
- Understand the role of the cluster manager in Spark applications.
 - Learn about the different cluster managers supported by Spark, including Standalone, Hadoop YARN, Apache Mesos, and Kubernetes.
 - Explore the resource allocation and management responsibilities of the cluster manager.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-17: Spark Execution Mode
In this lesson, we will explain the following topics:
- Learn about the different execution modes in Spark, including cluster mode, client mode, and local mode.
 - Understand the differences and use cases for each execution mode.
 - Explore how to configure and execute Spark applications in various modes.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-18: Spark Executors
In this lesson, we will explain the following topics:
- Understand the role and functions of Spark executors.
 - Learn how executors execute tasks and communicate results.
 - Explore the resource management and lifecycle of executors in a Spark application.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-19: Spark Data Partitioning
In this lesson, we will explain the following topics:
- Learn about data distribution and partitioning in Spark.
 - Understand the benefits of partitioning for efficient parallelism and task allocation.
 - Explore practical examples of data partitioning and its impact on Spark performance.
 
Go to lesson | Watch on YouTube | Download the video
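To make the partitioning idea from Ch.04-19 concrete, here is a minimal sketch that splits keyed records into partitions by hash and processes each partition as an independent task. It is a toy simulation in plain Python: `zlib.crc32` stands in for the key hash, a thread pool stands in for executors, and the helper names are assumptions made for this example rather than Spark APIs.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def hash_partition(records, num_partitions):
    """Assign each (key, value) record to a partition by a stable hash of its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append((key, value))
    return partitions


def partition_total(partition):
    """One 'task': sum the values held by a single partition."""
    return sum(value for _, value in partition)


records = [("user%d" % i, i) for i in range(100)]
partitions = hash_partition(records, 4)

# Each partition is processed independently, so the tasks can run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_totals = list(pool.map(partition_total, partitions))

total = sum(partial_totals)
print(len(partitions), total)
```

Because the partition index depends only on the key, records with the same key always land in the same partition, which is what lets key-based operations work on one partition at a time.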
Ch.04-20: Spark Operations
In this lesson, we will explain the following topics:
- Understand the two types of Spark operations: transformations and actions.
 - Learn about the immutability of data in Spark and its implications.
 - Explore examples of transformations and actions, including lazy evaluation and its benefits.
 
Go to lesson | Watch on YouTube | Download the video
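The difference between transformations and actions in Ch.04-20 can be sketched with a small stand-in class. `LazyDataset` below is a hypothetical toy, not a Spark class: its transformations only record what should happen and return a new object, and nothing is computed until an action such as `collect` is called.

```python
class LazyDataset:
    """Toy dataset: transformations are recorded, actions execute them."""

    def __init__(self, source, steps=()):
        self._source = source        # the original data is never modified
        self._steps = tuple(steps)   # recorded transformations, in order

    # Transformations: return a NEW dataset and compute nothing.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    # Actions: run the recorded plan and return a concrete result.
    def collect(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def count(self):
        return len(self.collect())


calls = []


def double(x):
    calls.append(x)  # record every real invocation
    return x * 2


numbers = LazyDataset([1, 2, 3, 4])
doubled = numbers.map(double)                # transformation: nothing runs yet
large = doubled.filter(lambda x: x > 4)      # transformation: nothing runs yet
calls_before_action = len(calls)

result = large.collect()                     # action: the whole plan runs now
print(calls_before_action, result, numbers.collect())
```

Each transformation leaves `numbers` untouched and produces a new object, which mirrors the immutability discussed in the lesson.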
Ch.04-21: Transformations Narrow Vs Wide
In this lesson, we will explain the following topics:
- Learn about the two types of Spark transformations: narrow and wide.
 - Understand the characteristics and benefits of narrow transformations.
 - Explore the implications and performance considerations of wide transformations.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-22: Demo: Immutability In Spark
In this lesson, we will explain the following topics:
- Demonstrate the concept of immutability in Spark.
 - Understand how Spark ensures immutability and its impact on data processing.
 - Explore practical examples of immutable operations in Spark.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-23: Demo: RDD Text Manipulation
In this lesson, we will explain the following topics:
- Demonstrate text manipulation using RDDs in Spark.
 - Learn how to apply transformations and actions on text data.
 - Explore practical examples of RDD operations for text processing.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-24: Demo: GroupByKey Vs. ReduceByKey
In this lesson, we will explain the following topics:
- Compare the differences between groupByKey and reduceByKey in Spark.
 - Understand the performance implications of each operation.
 - Explore practical examples to illustrate the use cases and benefits of both operations.
 
Go to lesson | Watch on YouTube | Download the video
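The performance gap compared in Ch.04-24 comes mainly from how many records cross the shuffle. The sketch below is a simplified, single-process simulation in plain Python (the functions `group_by_key` and `reduce_by_key` are hypothetical stand-ins, not the Spark implementations): it counts how many records each approach would send across the network for the same input.

```python
from collections import defaultdict


def group_by_key(partitions):
    """Shuffle every record, then group the values per key."""
    shuffled = [pair for part in partitions for pair in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    return dict(groups), len(shuffled)


def reduce_by_key(partitions, fn):
    """Combine inside each partition first, then shuffle only the partial results."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = fn(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result = {}
    for key, value in shuffled:
        result[key] = fn(result[key], value) if key in result else value
    return result, len(shuffled)


partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

grouped, grouped_shuffled = group_by_key(partitions)
reduced, reduced_shuffled = reduce_by_key(partitions, lambda x, y: x + y)
sums_from_groups = {key: sum(values) for key, values in grouped.items()}

print(grouped_shuffled, reduced_shuffled, reduced)
```

Both approaches reach the same totals, but the pre-combining variant shuffles one record per key per partition instead of every record, and the saving grows with the number of repeated keys.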
Ch.04-25: Demo: Joining RDDs
In this lesson, we will explain the following topics:
- Demonstrate the process of joining RDDs in Spark.
 - Learn about the different types of joins supported by Spark.
 - Explore practical examples of RDD joins and their applications in data processing.
 
Go to lesson | Watch on YouTube | Download the video
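For the joins in Ch.04-25, it can help to see the shape of the output on a tiny example. The sketch below mimics the result shape of an inner join and a left outer join on key-value pairs using plain Python lists; the helper names are hypothetical, and unlike Spark, this version returns rows in a deterministic order.

```python
from collections import defaultdict


def _index(pairs):
    """Build a key -> list of values lookup for one side of the join."""
    table = defaultdict(list)
    for key, value in pairs:
        table[key].append(value)
    return table


def inner_join(left, right):
    """Keep only keys present on both sides: (key, (left_value, right_value))."""
    right_index = _index(right)
    return [(k, (lv, rv)) for k, lv in left for rv in right_index.get(k, [])]


def left_outer_join(left, right):
    """Keep every left record; unmatched keys get None on the right side."""
    right_index = _index(right)
    return [(k, (lv, rv)) for k, lv in left for rv in right_index.get(k, [None])]


users = [(1, "Ada"), (2, "Grace"), (3, "Linus")]
orders = [(1, "book"), (1, "pen"), (2, "laptop")]

inner = inner_join(users, orders)
left_outer = left_outer_join(users, orders)
print(inner)
print(left_outer)
```

A key that appears several times on one side produces one output row per matching pair, which is why user 1 shows up twice in both results.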
Ch.04-26: Demo: RDD Operations Part 1
In this lesson, we will explain the following topics:
- Demonstrate the use of Spark RDD APIs, including map, flatMap, filter, reduce, groupBy, groupByKey, and reduceByKey for data transformation, extraction, organization, and reduction.
 - Learn to apply various operations for efficient data processing and aggregation.
 - Showcase how to navigate and utilize the Spark documentation effectively.
 
Go to lesson | Watch on YouTube | Download the video
Ch.04-27: Demo: Repartition Vs. Coalesce
In this lesson, we will explain the following topics:
- Explain in detail the difference between repartition and coalesce in Spark RDD APIs.
 - Analyze the Spark source code implementation for repartition and coalesce to understand their differences.
 - Demonstrate practical examples of how to use repartition and coalesce functions in Spark.
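
The contrast explained in Ch.04-27 can be sketched with two toy functions. This is a simplified simulation in plain Python, not Spark's source code: here `coalesce` merges whole partitions by index (Spark groups them with locality in mind), and `repartition` redistributes every record round-robin to stand in for a full shuffle.

```python
def coalesce(partitions, n):
    """Merge whole existing partitions into at most n groups (no record-level shuffle)."""
    n = min(n, len(partitions))  # without a shuffle the count cannot grow
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)
    return merged


def repartition(partitions, n):
    """Redistribute every record across n partitions (a full shuffle)."""
    out = [[] for _ in range(n)]
    records = [record for part in partitions for record in part]
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


# A skewed input: one large partition and three tiny ones.
partitions = [[1, 2, 3, 4, 5, 6], [7], [8], [9]]

coalesced = coalesce(partitions, 2)
repartitioned = repartition(partitions, 2)

print([len(p) for p in coalesced])
print([len(p) for p in repartitioned])
```

Merging partitions is cheaper because records stay grouped as they were, but it inherits any skew in the input; the full redistribution costs a shuffle and in exchange evens out partition sizes and can also increase the partition count.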