Spark Session

Lesson objectives

In this lesson, you will:

  • Understand the concept and purpose of a SparkSession.
  • Learn how to create and use a SparkSession in a Spark application.
  • Explore the benefits of SparkSession for simplifying Spark interactions and configurations.

Spark Distributed Execution: SparkSession

What is a Session?

  • A session refers to a period of interaction between two or more entities.
  • In computing, sessions are especially common between networked computers on the internet.

Types of Sessions in Computing

  • TCP session: A basic form of interaction in network communication.
  • Login session: The period when a user is logged into a system.
  • HTTP session: A series of interactions between a web server and a client.
  • User session: The time a user interacts with a software application.

Introducing SparkSession

  • Similar to the sessions mentioned, Spark has its own SparkSession.
  • SparkSession provides a unified entry point to Spark’s functionalities.

Functionality of SparkSession

  • SparkSession: An object that provides a point of entry to interact with underlying Spark functionality.
  • It allows programming Spark with its APIs.
  • In an interactive Spark shell, the Spark driver instantiates a SparkSession for you.
  • In a Spark application, you create a SparkSession object yourself.
  • You can program Spark using DataFrame and Dataset APIs through SparkSession.
  • In Scala and Python, the session is available as the variable spark when you start the console, as the short example below shows.
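
For instance, in the PySpark shell the following works with no setup at all, because the driver has already created the spark object for you (a minimal sketch):

# In the PySpark shell, `spark` already exists -- no setup required
print(spark.version)       # prints the running Spark version
df = spark.range(5)        # DataFrame with a single `id` column: 0..4
df.show()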

SparkSession

  • The SparkSession instance is the way Spark executes user-defined manipulations across the cluster.
  • There is a one-to-one correspondence between a SparkSession and a Spark Application.
  • It connects the Spark driver program with the cluster manager.
  • Through its configuration (the master setting), SparkSession determines which cluster manager (YARN, Mesos, or Standalone) to communicate with.
  • It allows configuration of Spark parameters, as the sketch after this list shows.
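
As a concrete illustration, here is a minimal PySpark sketch of setting parameters when building the session and reading or changing them at runtime (the application name and parameter values are arbitrary examples):

# Configure Spark parameters while building the session
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[2]") \
    .appName("ConfigExample") \
    .config("spark.sql.shuffle.partitions", "8") \
    .getOrCreate()

# Read a parameter back, then change it at runtime
print(spark.conf.get("spark.sql.shuffle.partitions"))   # "8"
spark.conf.set("spark.sql.shuffle.partitions", "16")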

Interacting with Spark in Earlier Versions

  • In earlier versions of Spark, setting up a Spark application required creating a SparkConf and SparkContext.
# Create SparkContext in old Spark versions
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

sparkConf = SparkConf().setAppName("SparkSessionExample").setMaster("local")
sc = SparkContext(conf=sparkConf)
sqlContext = SQLContext(sc)
// Create SparkContext in old Spark versions
// Set up the Spark configuration and create the contexts
val sparkConf = new SparkConf().setAppName("SparkSessionZipsExample").setMaster("local")
sparkConf.set("spark.some.config.option", "some-value")   // options are set on SparkConf, not SparkContext
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
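
To make the division of labor concrete, here is a minimal sketch (continuing from the Python setup above) of how each object served a different API:

# Low-level RDD operations went through the SparkContext
rdd = sc.parallelize([1, 2, 3])
print(rdd.sum())                                   # 6

# DataFrame and SQL operations went through the SQLContext
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.show()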

Simplification in Spark 2.0 with SparkSession

  • Spark 2.0 introduced SparkSession, simplifying the way you interact with Spark.
  • SparkSession encapsulates SparkConf, SparkContext, and SQLContext.
# Pyspark: Create SparkSession
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("SparkSessionExample") \
    .config("spark.some.config.option", "value") \
    .getOrCreate()
// Spark: Create SparkSession
// Create a SparkSession. No need to create SparkContext.
val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
val spark = SparkSession
    .builder()
    .appName("SparkSessionZipsExample")
    .config("spark.sql.warehouse.dir", warehouseLocation)
    .enableHiveSupport()
    .getOrCreate()
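
Once created, the session exposes the older entry points and the DataFrame/SQL APIs from a single object. A minimal sketch, continuing from the PySpark example above:

# The SparkContext is still available, now owned by the session
print(spark.sparkContext.appName)                  # "SparkSessionExample"

# DataFrame and SQL work go through the same object
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.createOrReplaceTempView("letters")
spark.sql("SELECT id FROM letters WHERE letter = 'a'").show()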

Using SparkSession

  • Spark 2.0 introduced SparkSession as the single entry point for all Spark functionality.
  • This unified entry point removes the need to initialize multiple contexts.
  • It encapsulates the functionalities of SQLContext, HiveContext, and more, as the sketch below illustrates.
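
A brief sketch of what the single object now covers, reusing the spark session created earlier:

# Formerly SQLContext.sql(...)
spark.sql("SELECT 1 AS x").show()

# Catalog and metastore access, formerly part of HiveContext
print(spark.catalog.listTables())

# Unified data source API
df = spark.read.csv("data.csv", header=True)       # "data.csv" is a hypothetical path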

Reference

Watch on Youtube

Watch on our Servers

You can download the video from the link by choosing Save Link As: Download Video