Description
Data Engineering is all about building Data Pipelines that move data from multiple sources into Data Lakes or Data Warehouses, and then from those Data Lakes or Data Warehouses to downstream systems. As part of this course, I will walk you through how to build Data Engineering Pipelines using the Azure Data Analytics Stack. It includes services such as Azure Storage (both Blob and ADLS), ADF Data Flow, ADF Pipeline, Azure SQL, Azure Synapse, Azure Databricks, and many more.
- As part of this course, you will first set up a learning environment using VS Code on Windows or Mac.
- Once the environment is ready, you need to sign up for an Azure Portal account. We will provide all the instructions to sign up, including how to review billing and how to get the USD 200 credit, which is valid for up to a month.
- We typically use Azure Storage as Data Lake. As part of this course, you will learn how to use Azure Storage as Data Lake along with how to manage the files in Azure Storage using tools such as Azure Storage Explorer.
- ADF is used for both ETL and Orchestration. First, you will understand how to perform ETL using ADF Data Flow. The source and target will be files in an Azure Storage Account. As part of this process, you will also learn how to set up Linked Services and Datasets in ADF.
- Once the ADF Data Flow is ready, you will build a Pipeline for Orchestration using ADF Pipeline. You will also learn how to parameterize the Pipeline and how to take care of the baseline load.
- You will also understand key performance tuning techniques using ADF Pipeline such as controlling the number of partitions, custom integration runtimes (IR), etc.
- Azure provides RDBMS as different services for Postgres, SQL Server, etc. You will learn how to set up Azure SQL. Once Azure SQL is set up, you will also understand how to create the required tables and run queries against them.
- ADF provides ADF Data Copy to copy data from different sources to different targets. Once the database tables are ready, you will use ADF Data Copy to copy data into the tables.
- Azure provides Synapse Analytics for Data Warehousing. You will get an overview of both serverless and dedicated pools. You will then set up a Dedicated Pool for ETL using ADF.
- Once Azure SQL and Azure Synapse are ready, you will build ETL Pipeline using ADF Data Flow and Orchestrate using ADF Pipeline.
- Azure Databricks is the service for Big Data Processing using the Spark Engine. You will learn how to set up Azure Databricks, integrate it with ADLS, and manage secrets.
- You will also get an overview of Spark SQL and PySpark Data Frame APIs using Azure Databricks.
- You will also build an ELT Pipeline using Databricks Jobs and Workflows, where tasks are defined using PySpark as well as Spark SQL.
- You will also understand how to build ADF Pipelines to orchestrate Databricks Notebooks.
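As a small illustration of the ADLS integration mentioned above: files in ADLS Gen2 are commonly addressed from Spark with `abfss://` URIs of the form `abfss://<container>@<account>.dfs.core.windows.net/<path>`. The sketch below only builds such a URI in plain Python. The helper name `adls_uri` and the account, container, and folder names are hypothetical and are not taken from the course material.

```python
def adls_uri(account: str, container: str, path: str = "") -> str:
    """Build an abfss:// URI for a file or folder in ADLS Gen2.

    `account` is the storage account name and `container` is the
    container (file system) name. Both are illustrative inputs.
    """
    # Drop any leading slash so the URI has exactly one separator
    # between the host part and the path.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


# Hypothetical names, for illustration only.
raw_orders = adls_uri("mystorageacct", "raw", "/retail/orders/")
print(raw_orders)
```

In a Databricks notebook, a string like this would typically be passed to a reader such as `spark.read.csv(...)` once access to the storage account has been configured. That step is left out here because it depends on how secrets and credentials are set up.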
Who this course is for:
- Beginner or Intermediate Data Engineers who want to learn key Azure Analytics Services for Data Engineering such as Azure Storage, ADF, Synapse, Databricks, etc.
- Intermediate Application Engineers who want to explore Data Engineering using Azure Analytics Services such as Azure Storage, ADF, Synapse, Databricks, etc.
- Data and Analytics Engineers who want to learn Data Engineering using Azure Analytics Services such as Azure Storage, ADF, Synapse, Databricks, etc.
- Testers who want to learn key skills to test Data Engineering applications built using Azure Analytics Services such as Azure Storage, ADF, Synapse, Databricks, etc.
Requirements
- A Computer with at least 8 GB RAM
- Programming Experience using Python is highly desired as some of the topics are demonstrated using Python
- SQL Experience is highly desired as some of the topics are demonstrated using SQL
- Nice to have: Data Engineering Experience using Pandas or PySpark
- This course is ideal for experienced data engineers to add Azure Analytics Services as key skills to their profile
Last Updated 1/2023