DP-201: Designing an Azure Data Solution - Exam Prep
A collection of resources, study notes, and learning material that helped me, and can hopefully help others, prepare for and pass exam DP-201: Designing an Azure Data Solution. Note: Passing DP-201 is one of two steps required to become a Microsoft Certified: Azure Data Engineer Associate; you must pass both DP-200 and DP-201.
Suggested Approach
Briefly skim through and familiarise yourself with the list of Resources and Skills Measured below (Tip: Regarding the Skills Measured, always refer to the latest skills outline available directly from the exam home page, as the content changes from time to time. The copy in this article is as of 9 February 2020; key phrases have been pre-highlighted to aid learning).
Complete the DP-200 Microsoft Learn Collection. Note: As the name of the learning path suggests, this playlist was originally intended for DP-200. That said, the material is just as relevant for DP-201, although the exam questions approach the topics from a different perspective (i.e. design vs. implementation).
Revisit the Skills Measured section below upon completing the learning path and dive into the links for any areas that were not covered by the Microsoft Learn collection and/or require a deeper understanding.
Lastly, read over the Reference Tables below, as well as those covered in the DP-200 guide, for succinct summaries of key topics and comparisons.
Resources
Resource | Link |
---|---|
Exam | Exam DP-201: Designing an Azure Data Solution |
Related Certification | Microsoft Certified: Azure Data Engineer Associate |
Microsoft Learn | Azure Data Engineer Learning Paths |
Hands-On Labs | Microsoft Learning - GitHub: DP-201 Hands-On Labs |
MS Learn Collection | Custom Microsoft Learn Collection for DP-200 |
Skills Measured
1. Design Azure data storage solutions (40-45%)
Recommend an Azure data storage solution based on requirements
• choose the correct data storage[1] solution to meet the technical and business requirements
• choose the partition distribution type[1]
Design non-relational cloud data stores
• design data distribution[1][2] and partitions[1][2] (see the partition key sketch after this list)
• design for scale, including multi-region[1], latency[1], and throughput[1]
• design a solution that uses Cosmos DB, Data Lake Storage Gen2, or Blob storage[1]
• select the appropriate Cosmos DB API[1]
• design a disaster recovery[1] strategy
• design for high availability[1]
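To make the partitioning bullets concrete, here is a minimal sketch, assuming the azure-cosmos v4 Python SDK and a SQL API account, of creating a container with an explicit partition key. The account URL, key, database, container, and the /customerId path are illustrative placeholders, not values from this guide.

```python
# Minimal sketch: container with an explicit partition key (azure-cosmos v4).
# Account URL, key, and the /customerId partition key path are assumptions.
from azure.cosmos import CosmosClient, PartitionKey

ACCOUNT_URL = "https://<your-account>.documents.azure.com:443/"  # hypothetical
ACCOUNT_KEY = "<your-account-key>"                               # hypothetical

client = CosmosClient(ACCOUNT_URL, credential=ACCOUNT_KEY)

# Create the database and container idempotently.
database = client.create_database_if_not_exists(id="salesdb")
container = database.create_container_if_not_exists(
    id="orders",
    # A high-cardinality property with evenly spread requests is the usual
    # choice, so data and throughput distribute across physical partitions.
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=400,  # provisioned RU/s for the container
)
```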
Design relational cloud data stores
• design data distribution[1] and partitions[1] (see the distributed-table sketch after this list)
• design for scale, including multi-region[1], latency, and throughput[1]
• design a solution that uses SQL Database[1] and SQL Data Warehouse[1]
• design a disaster recovery strategy[1][2][3]
• design for high availability[1]
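On the relational side, distribution choices for SQL Data Warehouse (dedicated SQL pools) are declared in the table DDL. Below is a minimal sketch, assuming pyodbc and an illustrative connection string and schema, of a hash-distributed fact table with a clustered columnstore index.

```python
# Minimal sketch: hash-distributed table in SQL Data Warehouse / dedicated SQL pool.
# Connection string, table, and column names are assumptions.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;"   # hypothetical
    "DATABASE=<your-sql-pool>;UID=<user>;PWD=<password>"
)

DDL = """
CREATE TABLE dbo.FactSales
(
    SaleId     BIGINT        NOT NULL,
    CustomerId INT           NOT NULL,
    SaleAmount DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),  -- co-locate rows that join/aggregate on CustomerId
    CLUSTERED COLUMNSTORE INDEX       -- typical storage for large fact tables
);
"""

with pyodbc.connect(CONN_STR, autocommit=True) as conn:
    conn.execute(DDL)
```

Round-robin and replicated distributions are the other two options; the right choice depends on table size and join patterns.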
2. Design data processing solutions (25-30%)
Design batch processing solutions
• design batch processing solutions by using Data Factory and Azure Databricks[1] (see the batch sketch after this list)
• identify the optimal data ingestion method for a batch processing solution[1]
• identify where processing should take place, such as at the source, at the destination, or in transit[1]
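As a concrete reference point for the batch bullets above, here is a minimal PySpark sketch of a Databricks batch job that reads raw files from Data Lake Storage Gen2, applies a transformation, and writes curated Parquet output. Storage account, paths, and column names are assumptions, and a `spark` session is presumed to exist (as it does in a Databricks notebook).

```python
# Minimal PySpark sketch of a batch transform in Azure Databricks.
# Storage account, container paths, and column names are assumptions.
from pyspark.sql import functions as F

RAW_PATH = "abfss://raw@<storageaccount>.dfs.core.windows.net/sales/2020/"     # hypothetical
CURATED_PATH = "abfss://curated@<storageaccount>.dfs.core.windows.net/sales/"  # hypothetical

# Ingest the raw CSV drop as a DataFrame.
raw_df = spark.read.csv(RAW_PATH, header=True, inferSchema=True)

# Example transformation: keep completed orders and stamp the load date.
curated_df = (
    raw_df
    .filter(F.col("status") == "completed")
    .withColumn("load_date", F.current_date())
)

# Persist the curated output back to the lake in a columnar format.
curated_df.write.mode("overwrite").parquet(CURATED_PATH)
```

In a typical design, Data Factory orchestrates and schedules this notebook or job rather than performing the transformation itself.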
Design real-time processing solutions
• design for real-time[1] processing by using Stream Analytics[1] and Azure Databricks[1] (see the streaming sketch after this list)
• design and provision compute resources[1]
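For the real-time bullets, Structured Streaming is the usual processing engine on Azure Databricks. The sketch below is a minimal illustration that uses Spark's built-in rate source as a stand-in for an ingestion service such as Event Hubs or IoT Hub (the real connector configuration is omitted) and computes a one-minute tumbling-window count; the window size and the in-memory sink are assumptions for demonstration only.

```python
# Minimal Structured Streaming sketch (PySpark). The rate source stands in for
# a real ingestion service such as Event Hubs; window and sink are assumptions.
from pyspark.sql import functions as F

# Simulated event stream: the rate source emits rows with a timestamp column.
events = (
    spark.readStream
    .format("rate")
    .option("rowsPerSecond", 10)
    .load()
)

# Tumbling one-minute window aggregation, a common real-time design pattern.
counts = events.groupBy(F.window(F.col("timestamp"), "1 minute")).count()

# Write to an in-memory sink for inspection; a production design would target
# a store such as Cosmos DB, SQL Database, or the data lake instead.
query = (
    counts.writeStream
    .outputMode("complete")
    .format("memory")
    .queryName("windowed_counts")
    .start()
)
```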
3. Design for data security and compliance (25-30%)
Design security for source data access
• plan for secure endpoints (private/public)[1][2][3]
• choose the appropriate authentication mechanism, such as access keys[1], shared access signatures (SAS)[1], and Azure Active Directory (Azure AD)[1]
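As an example of the second bullet, here is a minimal sketch, assuming the azure-storage-blob v12 SDK, that issues a short-lived, read-only shared access signature for a single blob; the account name, key, container, and blob path are placeholders.

```python
# Minimal sketch: read-only, time-limited SAS for one blob (azure-storage-blob v12).
# Account name, key, container, and blob path are assumptions.
from datetime import datetime, timedelta
from azure.storage.blob import BlobSasPermissions, generate_blob_sas

ACCOUNT_NAME = "<storage-account>"  # hypothetical
ACCOUNT_KEY = "<account-key>"       # hypothetical

sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name="raw",
    blob_name="sales/2020/01.csv",
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),       # least privilege: read only
    expiry=datetime.utcnow() + timedelta(hours=1),  # short-lived token
)

blob_url = f"https://{ACCOUNT_NAME}.blob.core.windows.net/raw/sales/2020/01.csv?{sas_token}"
```

Access keys grant full account access, SAS narrows access to specific resources, permissions, and time windows, and Azure AD removes the need to distribute secrets at all.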
Design security for data policies and standards
• design data encryption[1] for data at rest and in transit[1]
• design for data auditing[1] and data masking[1]
• design for data privacy and data classification[1][2]
• design a data retention[1] policy
• plan an archiving[1] strategy
• plan to purge data[1] based on business requirements
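Retention, archiving, and purge requirements often map directly onto a Blob storage lifecycle management policy. The sketch below expresses such a policy as a Python dict mirroring the policy's JSON shape; the rule name, prefix, and day thresholds are illustrative assumptions.

```python
# Minimal sketch of a Blob storage lifecycle management policy, written as a
# Python dict mirroring the policy JSON. Rule name, prefix, and day counts are
# assumptions chosen to illustrate cool/archive/purge stages.
lifecycle_policy = {
    "rules": [
        {
            "name": "retire-telemetry",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["telemetry/"],
                },
                "actions": {
                    "baseBlob": {
                        # Demote once the data is accessed infrequently.
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        # Archive once the data is rarely accessed.
                        "tierToArchive": {"daysAfterModificationGreaterThan": 180},
                        # Purge when the retention period expires.
                        "delete": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}
```

The policy itself is applied at the storage account level (portal, ARM template, CLI, or management SDK).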
Reference Tables
Azure Databricks - Workload Types
Feature | Data Engineering Light | Data Engineering | Data Analytics |
---|---|---|---|
Managed Apache Spark | Y | Y | Y |
Job scheduling with libraries | Y | Y | Y |
Job scheduling with Notebooks | | Y | Y |
Autopilot clusters | | Y | Y |
Databricks Runtime for ML | | Y | Y |
Managed MLflow | | Y | Y |
Managed Delta Lake | | Y | Y |
Interactive clusters | | | Y |
Notebooks and collaboration | | | Y |
Ecosystem integrations | | | Y |
Cosmos DB - SLAs
Distribution | Read SLA (%) | Write SLA (%) |
---|---|---|
Single Region | 99.99 | 99.99 |
Multi-Region (Single Region Writes) | 99.999 | 99.99 |
Multi-Region (Multi-Region Writes) | 99.999 | 99.999 |
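Benefiting from the multi-region read SLA requires a client that routes reads across the replicated regions. Below is a minimal sketch, assuming the azure-cosmos v4 Python SDK and its preferred_locations keyword; the account URL, key, and region names are placeholders.

```python
# Minimal sketch: listing preferred read regions so reads are served from (and
# can fail over across) geo-replicated regions. URL, key, and regions are assumptions.
from azure.cosmos import CosmosClient

client = CosmosClient(
    url="https://<your-account>.documents.azure.com:443/",  # hypothetical
    credential="<your-account-key>",                        # hypothetical
    preferred_locations=["West Europe", "North Europe"],    # read regions in priority order
)
```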
Azure Blob Storage - Tiers
Tier | Access frequency | Minimum storage period | Latency |
---|---|---|---|
Hot | Frequently | N/A | Lowest |
Cool | Infrequently | 30+ days | Medium |
Archive | Rarely | 180+ days | Highest |
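Tiers can also be changed per blob after upload, which is one way a design moves data along the Hot → Cool → Archive path outside of a lifecycle policy. A minimal sketch follows, assuming azure-storage-blob v12; the connection string, container, and blob names are placeholders.

```python
# Minimal sketch: demoting one blob to the Archive tier (azure-storage-blob v12).
# Connection string, container, and blob names are assumptions.
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<storage-account-connection-string>"  # hypothetical

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
blob = service.get_blob_client(container="raw", blob="sales/2018/history.csv")

# Archive is the cheapest to store but must be rehydrated (hours) before reads.
blob.set_standard_blob_tier("Archive")
```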
Azure SQL - Network Access Controls
Control | Description |
---|---|
Allow Azure Services | When set to ON, traffic from resources running inside the Azure boundary (any subscription, not only your own) can reach the SQL resource. |
IP firewall rules | Use this feature to explicitly allow connections from specific public IP addresses or ranges. |
Virtual Network firewall rules | Use this feature to allow traffic from a specific Virtual Network within the Azure boundary. |
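All three controls can be scripted. As an illustration of the first two rows, here is a minimal sketch using the azure-mgmt-sql management SDK to create a server-level IP firewall rule; subscription, resource group, server, and IP values are placeholders, and the exact operation and model names are assumed from recent SDK versions.

```python
# Minimal sketch (azure-mgmt-sql): creating a server-level IP firewall rule.
# Subscription, resource group, server name, and IP range are assumptions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import FirewallRule

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")  # hypothetical

client.firewall_rules.create_or_update(
    resource_group_name="rg-data",   # hypothetical
    server_name="sql-dp201",         # hypothetical
    firewall_rule_name="office-ip",
    parameters=FirewallRule(
        start_ip_address="203.0.113.10",
        end_ip_address="203.0.113.10",
    ),
)

# Note: the portal's "Allow Azure services" toggle is represented as a rule
# spanning 0.0.0.0 to 0.0.0.0 (named AllowAllWindowsAzureIps).
```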