Apache Spark
Course Presentation
You can download the course presentation by right-clicking the link and choosing "Save link as": Download Presentation
Ch.04-01: Introduction
In this lesson, we will explain the following topics:
- Understand the course structure and objectives.
- Familiarize yourself with the course references and resources.
- Learn about the prerequisites needed for the course.
Go to lesson | Watch on YouTube | Download the video
Ch.04-02: Python Vs. Scala
In this lesson, we will explain the following topics:
- Compare the differences between Python and Scala in the context of Spark.
- Understand the performance implications of using Python vs. Scala.
- Learn about the advantages and disadvantages of each language for Spark development.
Go to lesson | Watch on YouTube | Download the video
Ch.04-03: Introduction
In this lesson, we will explain the following topics:
- Learn about the origin and development of Apache Spark.
- Understand the key milestones and contributions to the Spark project.
- Explore the unified engine design of Spark for large-scale distributed data processing.
Go to lesson | Watch on YouTube | Download the video
Ch.04-04: About Databricks
In this lesson, we will explain the following topics:
- Understand the role of Databricks in the Spark ecosystem.
- Learn about Databricks’ contributions to Spark development and the community.
- Explore the capabilities of the Databricks analytics platform.
Go to lesson | Watch on YouTube | Download the video
Ch.04-05: Spark In The Data Platforms
In this lesson, we will explain the following topics:
- Understand the role of Spark in data platforms.
- Learn about the technical components of a data lake.
- Explore how Spark integrates with other big data technologies.
Go to lesson | Watch on YouTube | Download the video
Ch.04-06: Running Spark
In this lesson, we will explain the following topics:
- Learn the different methods for running Spark, including Databricks, local installations, and Docker.
- Understand the steps to set up and run Spark in various environments.
- Explore the benefits of using the Databricks Community Edition for learning and small projects.
Go to lesson | Watch on YouTube | Download the video
Ch.04-07: Demo: Running Spark on Linux Ubuntu
In this lesson, we will explain the following topics:
- Demonstrate the process of installing and running Spark on Linux Ubuntu.
- Understand the configuration steps required for Spark installation on Ubuntu.
- Explore the execution of Spark applications on a Linux environment.
Go to lesson | Watch on YouTube | Download the video
Ch.04-08: Demo: Running Spark on macOS
In this lesson, we will explain the following topics:
- Demonstrate the process of installing and running Spark on macOS.
- Understand the configuration steps required for Spark installation on macOS.
- Explore the execution of Spark applications on a macOS environment.
Go to lesson | Watch on YouTube | Download the video
Ch.04-09: Demo: Running Spark on Windows
In this lesson, we will explain the following topics:
- Demonstrate the process of installing and running Spark on Windows.
- Understand the configuration steps required for Spark installation on Windows.
- Explore the execution of Spark applications on a Windows environment.
Go to lesson | Watch on YouTube | Download the video
Ch.04-10: Demo: Running Spark on Databricks
In this lesson, we will explain the following topics:
- Demonstrate the process of running Spark on Databricks.
- Understand the benefits of using Databricks for Spark workloads.
- Explore practical examples of Spark applications running on Databricks.
Go to lesson | Watch on YouTube | Download the video
Ch.04-11: From MapReduce to Spark
In this lesson, we will explain the following topics:
- Understand the basic idea and stages of MapReduce.
- Learn about the limitations of MapReduce and the motivation for Spark.
- Explore the improvements offered by Spark over MapReduce, including in-memory processing and optimized execution.
Go to lesson | Watch on YouTube | Download the video
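As a rough illustration of the MapReduce stages named in this lesson, the sketch below runs a word count through map, shuffle, and reduce steps in plain Python. It is not Hadoop or Spark code; the function names and sample lines are made up for the example.

```python
from collections import defaultdict


def map_phase(lines):
    # Map: emit one (word, 1) pair for every word in every line.
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: bring together all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: collapse each key's list of values into a single result.
    return {key: sum(values) for key, values in groups.items()}


lines = ["spark is fast", "hadoop is reliable", "spark is popular"]
word_counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(word_counts)
```

In a real MapReduce job, each phase typically writes its output to disk before the next one starts, which is one of the costs that Spark's in-memory processing aims to reduce.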
Ch.04-12: Spark Characteristics
In this lesson, we will explain the following topics:
- Learn about the key characteristics of Spark, including speed, ease of use, modularity, and extensibility.
- Understand how Spark achieves its high performance through hardware utilization, DAG scheduling, and the Tungsten execution engine.
- Explore the benefits of Spark’s modular and extensible architecture.
Go to lesson | Watch on YouTube | Download the video
Ch.04-13: Spark Applications
In this lesson, we will explain the following topics:
- Understand the components of a Spark application, including the driver and executors.
- Learn about the execution process of Spark applications in a distributed environment.
- Explore the different languages supported by Spark for application development.
Go to lesson | Watch on YouTube | Download the video
Ch.04-14: Spark Driver
In this lesson, we will explain the following topics:
- Learn about the role and key functions of the Spark driver.
- Understand how the driver schedules and distributes tasks to executors.
- Explore the communication and resource management responsibilities of the driver.
Go to lesson | Watch on YouTube | Download the video
Ch.04-15: Spark Session
In this lesson, we will explain the following topics:
- Understand the concept and purpose of a SparkSession.
- Learn how to create and use a SparkSession in a Spark application.
- Explore the benefits of SparkSession for simplifying Spark interactions and configurations.
Go to lesson | Watch on YouTube | Download the video
Ch.04-16: Spark Cluster Manager
In this lesson, we will explain the following topics:
- Understand the role of the cluster manager in Spark applications.
- Learn about the different cluster managers supported by Spark, including Standalone, Hadoop YARN, Apache Mesos, and Kubernetes.
- Explore the resource allocation and management responsibilities of the cluster manager.
Go to lesson | Watch on YouTube | Download the video
Ch.04-17: Spark Execution Mode
In this lesson, we will explain the following topics:
- Learn about the different execution modes in Spark, including cluster mode, client mode, and local mode.
- Understand the differences and use cases for each execution mode.
- Explore how to configure and execute Spark applications in various modes.
Go to lesson | Watch on YouTube | Download the video
Ch.04-18: Spark Executors
In this lesson, we will explain the following topics:
- Understand the role and functions of Spark executors.
- Learn how executors execute tasks and communicate results.
- Explore the resource management and lifecycle of executors in a Spark application.
Go to lesson | Watch on YouTube | Download the video
Ch.04-19: Spark Data Partitioning
In this lesson, we will explain the following topics:
- Learn about data distribution and partitioning in Spark.
- Understand the benefits of partitioning for efficient parallelism and task allocation.
- Explore practical examples of data partitioning and its impact on Spark performance.
Go to lesson | Watch on YouTube | Download the video
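The idea behind key-based partitioning can be sketched in a few lines of plain Python. This is only a model: the helper names are hypothetical, and `zlib.crc32` is used here just to get a stable hash, not because Spark uses it.

```python
import zlib


def partition_index(key, num_partitions):
    # A stable hash means the same key always lands in the same partition.
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def hash_partition(records, num_partitions):
    # Distribute (key, value) records across a fixed number of partitions.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_index(key, num_partitions)].append((key, value))
    return partitions


# 20 sample records spread over 5 distinct keys.
records = [("user%d" % (i % 5), i) for i in range(20)]
partitions = hash_partition(records, 4)

for index, part in enumerate(partitions):
    print("partition", index, "holds", len(part), "records")
```

Because every record for a given key sits in one partition, each partition can be processed by a separate task without coordinating with the others.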
Ch.04-20: Spark Operations
In this lesson, we will explain the following topics:
- Understand the two types of Spark operations: transformations and actions.
- Learn about the immutability of Spark data (transformations return new datasets rather than modifying existing ones) and its implications.
- Explore examples of transformations and actions, including lazy evaluation and its benefits.
Go to lesson | Watch on YouTube | Download the video
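The toy class below models how transformations and actions relate, assuming a much simpler design than Spark's. `LazyDataset` is a hypothetical name, not a Spark class: transformations only record a step and return a new object, while the action `collect` actually runs the recorded steps.

```python
class LazyDataset:
    """Toy model: transformations build a plan, actions execute it."""

    def __init__(self, source, plan=()):
        self._source = source
        self._plan = plan  # tuple of (kind, function) steps

    def map(self, fn):
        # Transformation: returns a NEW dataset; nothing is computed yet.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, predicate):
        # Transformation: also lazy, also leaves the original untouched.
        return LazyDataset(self._source, self._plan + (("filter", predicate),))

    def collect(self):
        # Action: only now is the recorded plan applied to the source data.
        rows = iter(self._source)
        for kind, fn in self._plan:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


calls = []  # records every time the mapping function really runs


def double(x):
    calls.append(x)
    return x * 2


numbers = LazyDataset(range(1, 6))
large = numbers.map(double).filter(lambda x: x > 4)

calls_before_action = len(calls)  # still 0: no work has been done
result = large.collect()          # the action triggers execution
print(calls_before_action, result, numbers.collect())
```

The original `numbers` dataset still yields its initial values after the derived dataset is collected, which is the immutability property in miniature.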
Ch.04-21: Transformations: Narrow Vs. Wide
In this lesson, we will explain the following topics:
- Learn about the two types of Spark transformations: narrow and wide.
- Understand the characteristics and benefits of narrow transformations.
- Explore the implications and performance considerations of wide transformations.
Go to lesson | Watch on YouTube | Download the video
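One way to picture the difference is to treat a dataset as a list of partitions, as in the plain-Python sketch below. The function names are illustrative only, and the modulo on integer keys stands in for whatever partitioner a real job would use.

```python
def narrow_map(partitions, fn):
    # Narrow: each output partition is built from exactly one input partition,
    # so no data has to move between partitions.
    return [[fn(record) for record in part] for part in partitions]


def wide_group_by_key(partitions, num_partitions):
    # Wide: an output partition may need records from every input partition,
    # so records must be moved (shuffled) according to their key.
    shuffled = [{} for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            shuffled[key % num_partitions].setdefault(key, []).append(value)
    return shuffled


partitions = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")], [(2, "e"), (3, "f")]]

upper = narrow_map(partitions, lambda kv: (kv[0], kv[1].upper()))
grouped = wide_group_by_key(upper, 2)
print(upper)
print(grouped)
```

The narrow step keeps the three input partitions as they are, while the wide step has to read all three to assemble each of its two output partitions.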
Ch.04-22: Demo: Immutability In Spark
In this lesson, we will explain the following topics:
- Demonstrate the concept of immutability in Spark.
- Understand how Spark ensures immutability and its impact on data processing.
- Explore practical examples of immutable operations in Spark.
Go to lesson | Watch on YouTube | Download the video
Ch.04-23: Demo: RDD Text Manipulation
In this lesson, we will explain the following topics:
- Demonstrate text manipulation using RDDs in Spark.
- Learn how to apply transformations and actions on text data.
- Explore practical examples of RDD operations for text processing.
Go to lesson | Watch on YouTube | Download the video
Ch.04-24: Demo: GroupByKey Vs. ReduceByKey
In this lesson, we will explain the following topics:
- Compare the differences between groupByKey and reduceByKey in Spark.
- Understand the performance implications of each operation.
- Explore practical examples to illustrate the use cases and benefits of both operations.
Go to lesson | Watch on YouTube | Download the video
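The performance gap between the two operations comes mostly from how much data crosses the shuffle. The sketch below models that in plain Python, under the assumption that the goal is a per-key sum; the helper names are hypothetical and this is not the Spark implementation.

```python
from collections import defaultdict


def group_by_key_sum(partitions):
    # groupByKey style: every (key, value) pair is shuffled as-is,
    # and the sum is computed only after the shuffle.
    shuffled = [pair for part in partitions for pair in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    totals = {key: sum(values) for key, values in groups.items()}
    return totals, len(shuffled)


def reduce_by_key_sum(partitions):
    # reduceByKey style: values are combined inside each partition first,
    # so at most one pair per key per partition is shuffled.
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


partitions = [
    [("a", 1)] * 4 + [("b", 1)] * 2,
    [("a", 1)] * 3 + [("b", 1)] * 3,
]

grouped_totals, grouped_shuffled = group_by_key_sum(partitions)
reduced_totals, reduced_shuffled = reduce_by_key_sum(partitions)
print(grouped_totals, grouped_shuffled)
print(reduced_totals, reduced_shuffled)
```

Both approaches produce the same totals, but the second moves far fewer records, and the gap widens as the number of values per key grows.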
Ch.04-25: Demo: Joining RDDs
In this lesson, we will explain the following topics:
- Demonstrate the process of joining RDDs in Spark.
- Learn about the different types of joins supported by Spark.
- Explore practical examples of RDD joins and their applications in data processing.
Go to lesson | Watch on YouTube | Download the video
Ch.04-26: Demo: RDD Operations Part 1
In this lesson, we will explain the following topics:
- Demonstrate the use of Spark RDD APIs, including map, flatMap, filter, reduce, groupBy, groupByKey, and reduceByKey for data transformation, extraction, organization, and reduction.
- Learn to apply various operations for efficient data processing and aggregation.
- Showcase how to navigate and utilize the Spark documentation effectively.
Go to lesson | Watch on YouTube | Download the video
Ch.04-27: Demo: Repartition Vs. Coalesce
In this lesson, we will explain the following topics:
- Explain in detail the difference between repartition and coalesce in Spark RDD APIs.
- Analyze the Spark source code implementation for repartition and coalesce to understand their differences.
- Demonstrate practical examples of how to use repartition and coalesce functions in Spark.
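The behavioral difference can be modeled on plain Python lists, as sketched below. This is a simplification with made-up function bodies: merging by index modulo and redistributing round-robin are assumptions for illustration, whereas Spark's own logic also considers factors such as data locality. In the Spark RDD source, `repartition(n)` is defined as `coalesce(n, shuffle = true)`.

```python
def coalesce(partitions, num_partitions):
    # No shuffle: whole input partitions are merged together, so the
    # partition count can only stay the same or go down.
    target = min(num_partitions, len(partitions))
    merged = [[] for _ in range(target)]
    for index, part in enumerate(partitions):
        merged[index % target].extend(part)
    return merged


def repartition(partitions, num_partitions):
    # Full shuffle: every record is redistributed, so the count can go
    # up or down and the result is evenly balanced.
    records = [record for part in partitions for record in part]
    result = [[] for _ in range(num_partitions)]
    for index, record in enumerate(records):
        result[index % num_partitions].append(record)
    return result


# Four unevenly sized partitions.
partitions = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]

coalesced = coalesce(partitions, 2)
rebalanced = repartition(partitions, 2)
print([len(p) for p in coalesced])
print([len(p) for p in rebalanced])
```

In this model, coalescing is cheaper because records are never split out of their original partition, but it can leave the result skewed; repartitioning pays for a full shuffle to get even partition sizes.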