Data Science | AI | DataOps | Engineering
backgroundGrey.png

Blog

Data Science & Data Engineering blogs

Tips for the Databricks Certified Data Analyst Associate Certification

Following on from my previous blog post about the Data Engineer Associate certification, this blog will take you through all of the information, updates and study tips you’ll need to pass the Databricks Certified Data Analyst Associate exam. As before, we’ll look at any updates to the exam and its training course, the high-level exam information provided on the Databricks website, a few example questions and finally the best study materials available to pass the exam!

 

COURSE AND CERTIFICATION UPDATES

The Data Analyst Associate course and exam do not have any content updates as of 27/07/2023, but Databricks notes on the exam page that the Fundamentals of the Databricks Lakehouse Platform (V2) Learning Plan is a prerequisite for the Data Analyst Associate content (it is a prerequisite for all Associate-level content and above). I would assume if you’re interested in an Associate exam that you’ve already completed this learning path, but if you haven’t yet you can find in the Partner Academy or Customer Academy sites.

 

EXAM INFORMATION

As with the Data Engineer Associate certification, the webpage for the Data Analyst Associate certification is usually updated by Databricks before any training content, so it’s the best place to keep track of exam changes or practice tests (the website currently says that a Certification Overview course is ‘coming soon’, but it’s already on the Academy!).

Just before the Exam Details section, Databricks reminds us that the Lakehouse Fundamentals accreditation should be completed first, so you have a base understanding of the different components of the Databricks platform.

As well as listing the knowledge for a minimally-qualified candidate, Databricks provide another breakdown of the major topics for this exam. It has 45 multiple-choice questions with a 70% pass mark (32/45), split into these areas:

  • Databricks SQL – 22% (10/45)

  • Data Management – 20% (9/45)

  • SQL – 29% (13/45)

  • Data Visualization and Dashboard – 18% 8/45)

  • Analytics Applications – 11% (5/45)

There is currently no practice exam available for this certification, however several Udemy courses provide handwritten practice exams of their own!

As with the Data Engineer Associate exam, this exam is proctored by Kryterion via their  Webassessor platform, which you’ll need to make an account for. If you have any queries about how the exam is proctored, you can visit their FAQ page (or contact them directly). It’s important to note here that you have an unlimited number of reschedules available, but be aware that there are late fees if you reschedule within 24 hours of an online exam or 72 hours of an in-person exam!

 

EXAMPLE CODE QUESTIONS

Below we’ll look at some example code questions I’ve written to show you the typical question layout in the exam. We’ll first look at a question about granting access to an object in Databricks SQL, and then a question about the clearing stale data files from Delta tables.

Question 1:

A Junior Data Analyst needs access to a new table called na_orders in the orders schema in Databricks SQL, so they can create a dashboard showing key insights from the data. The Junior Data Analyst has already created several dashboards from other tables in the orders schema.

They ask a Senior Data Analyst to grant them access programmatically via SQL, so that the Junior Data Analyst can learn how to grant permissions with SQL statements rather than within the Databricks SQL UI. What SQL query does the Senior Data Analyst need to run in order to give the appropriate access to this table?

  1. GRANT FULL ACCESS ON na_orders TO `juniorDA@email.com`

  2. GRANT SELECT ON na_orders TO `juniorDA@email.com`

  3. ALLOW ALL PRIVILEGES ON na_orders TO `juniorDA@email.com`

  4. GRANT USE SCHEMA ON orders TO `juniorDA@email.com`

  5. ALLOW SELECT ON na_orders TO `juniorDA@email.com`

The correct answer is below:

  • Answer 1 uses ‘FULL ACCESS’ which is incorrect syntax

  • Answer 2 correctly assigns the ‘SELECT’ permission to the Junior Data Analyst!

  • Answer 3 uses ‘ALLOW’ which is incorrect syntax and grants all permissions to the table unnecessarily

  • Answer 4 grants a privilege that the Junior Data Analyst already has on the schema, as they have queried tables in the orders schema before and must have the ‘USE SCHEMA’ permission on orders to do so

  • Answer 5 uses ‘ALLOW’ which is incorrect syntax

Question 2:

A Data Analyst needs to remove data files in Delta table ny_store_sales with a large transaction log in order to free up space and save on storage costs. The Data Analyst knows that ny_store_sales is being streamed into the Gold layer of the Lakehouse for reporting workloads. The latency on this workload means that it sometimes uses snapshots of data that are several days out of date with the most recent transaction log.

A Senior Data Analyst has explained that ny_store_sales has been effectively partitioned and does not suffer from a small files problem, so the operation to clear out data files will be performant. Which SQL command should the Data Analyst run to clear out old data files, ensuring that they don’t remove any files that may be used by the streaming workload?

  1. ALTER STREAMING TABLE ny_store_sales RETENTION 7 DAYS

  2. OPTIMIZE ny_store_sales REMOVE 7 DAYS

  3. VACUUM ny_store_sales HOLD 168 HOURS

  4. VACUUM ny_store_sales RETAIN 168 HOURS

  5. ALTER STREAMING TABLE ny_store_sales RETAIN 168 HOURS

The correct answer is below:

  • Answer 1 uses ‘ALTER STREAMING TABLE’ which is incorrect syntax and has no ‘RETENTION <number> DAYS’ parameter

  • Answer 2 uses ‘OPTIMIZE’ which is incorrect syntax (the Delta table has already been optimized as it doesn’t suffer a small files problem) and has no ‘REMOVE <number> DAYS’ parameter

  • Answer 3 correctly uses ‘VACUUM’ but uses ‘HOLD <number> HOURS’ which is not an available parameter of ‘VACUUM’

  • Answer 4 correctly uses ‘VACUUM’ with the correct ‘RETAIN <number> HOURS’ parameter!

  • Answer 5 uses the right parameter, but on the incorrect ‘ALTER STREAMING TABLE’ method

 

STUDY MATERIALS

The best resource for the Data Analyst Associate exam is Data Analysis with Databricks SQL – it’s designed specifically to prepare you for the exam (much like the Data Engineer Associate course).

The only prerequisite for this course is basic experience with SQL (the ability to write a SELECT <columns> FROM <table> WHERE <condition> statement). Whilst you’ll be writing ANSI-standard SQL code in this course, there will also be Databricks-specific syntax to learn which is typically well-documented on Microsoft Learn.

The associated course for a Databricks exam is more than enough to help you pass, but there are other resources out there like the aforementioned Udemy practice tests if you want more preparation. I’d generally recommend these if you feel like you need experience with a varied set of questions that can often be as difficult or even harder than those on the exam. As the Data Analyst exam doesn’t yet have a practice test, one of these courses could be a great option to solidify your knowledge before attempting the exam!

As of 27/07/2023, there’s just one fairly new Udemy course covering the Data Analyst exam, which you can find here. It contains three tests of 45 questions each, so once you’re comfortable with the course content in the Databricks Academy, this would be the best level of preparation you could have to pass the exam. The course has a rating of 4.4/5, although I did find a review that mentioned some key topics such as higher-order functions, window functions and others are missing from the available questions – make sure you cover all the content listed on the webpage thoroughly, especially if there are no questions on it!

As mentioned on my Data Engineer Associate blog, following Databricks staff on LinkedIn is a great way to keep an eye out for exam vouchers, which can be applied to any live Databricks exam! I’d recommend Youssef Mrini and Samantha Menot, who both post regularly about new Databricks features, training and exam vouchers!

Once again, best of luck for your exam! 😊

Dylan JonesComment