Course Curriculum
Section 1
- Introduction to Python
- Installation
- Indentation
- Lists, For Loops, Tuples, and Dictionaries
- Strings and Print
- Functions and Lambda Expressions
- Exception Handling
Section 2
- DataFrames
- Importing Data
- Parsing Data
- Renaming DataFrame Columns
- Filtering a DataFrame
- Basic Operations
Section 3
- Introduction to NumPy and pandas
Section 4
- Introduction to Spark
- Detailed examples for each of the following:
  - RDDs, Spark architecture and components
  - Pair RDDs, Spark SQL
  - Running Spark in local and cluster mode
  - Processing datasets: CSV, JSON, Parquet
  - Complex joins: left outer joins, self-joins
  - Configuring Spark jobs: executors, driver memory
Section 5
- Hadoop and its ecosystem
- HDFS Commands
- Unix shell scripting
- Hive: internal and external tables, UDFs
- Sqoop
Section 6
- Monitoring Jobs
Section 7
- Running Spark in the cloud (demonstration)
  - AWS
    - Reading input from S3
    - Athena
    - AWS Glue
    - EMR, EC2