PySpark

$100.00 – $300.00

Course Curriculum

Section- 1

  • Introduction to Python
  • Installation
  • Indentation
  • Lists, For Loops, Tuples, and Dictionaries
  • Strings and Print
  • Functions, Lambda
  • Exception Handling

Section- 2

  • DataFrames
  • Importing data
  • Parsing data
  • Renaming Columns of a DataFrame
  • Filtering a DataFrame
  • Basic operations

Section- 3

  • Introduction to NumPy, Pandas

Section- 4

  • Introduction to Spark
  • Detailed examples for each of the following:
  • RDD, Architecture, Components
  • Pair RDD, SparkSQL
  • Running Spark in Local & Cluster mode
  • Processing datasets – csv, json, parquet
  • Complex joins – left outer join, self-join
  • Configuring Spark jobs – executors, driver memory

Section- 5

  • Hadoop and its ecosystem
  • HDFS Commands
  • Unix shell scripting
  • Hive – external and internal tables, UDFs
  • Sqoop

Section- 6

  • Monitoring Jobs

Section- 7

  • Running Spark on the cloud (demonstration)
  • AWS
  • Input from S3
  • Athena
  • AWS Glue
  • EMR, EC2
