I’ve been put off taking AWS Beta exams ever since the 2016 Security Specialty debacle, so when it came to the AWS Certified Machine Learning Specialty Exam (MLS-C01), I decided to wait it out, and I took the ‘real’ exam the first day it was released. In this post, I will go through my thoughts on the exam, and how to pass it.

(* Actually it was the second day, but that sounds less dramatic!)

TL;DR

  • Exam should be called: The AWS Certified Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) Specialty Exam
  • Around 50% of the exam is focused on pure Machine Learning or Deep Learning
  • While you don’t need a deep understanding of the statistical mathematics and probability behind the subject you need a good working knowledge of the key aspects and be able to do some simple maths
  • AWS’s flagship ML/DL service - AWS SageMaker - is the single most mentioned service in the exam
  • Understand how to architect AWS’s AI API services
  • At times there is a decidedly Big Data feel
  • If you’re comfortable with ML/DL and AWS, this exam is do-able

General Thoughts

Let’s get this out of the way right up front; The AWS Certified Machine Learning Specialty Exam should actually be called The AWS Certified Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) Specialty Exam. More verbose (to say the least) but it would be accurate.

Naming aside, this is still unlike any of the other AWS exams…

At the time of writing, each of the other AWS exams focuses on - as you would expect - the AWS services within the subject area of the exam. This includes the Specialty exams which focus on a deep level of knowledge in their respective areas, Networking, Big Data and, Security but primarily through AWS services and AWS ethos.

The AWS Certified Machine Learning Specialty Exam is different, as around 50% of the exam is focused on pure Machine Learning or Deep Learning. This means you could gain a reasonable percentage in this exam with no knowledge of AWS at all. (But you wouldn’t pass!)

Subject Breakdown

(As usual) forget the AWS Exam Guide, here is the breakdown:

Subject Percentage
ML/DL 50%
AWS SageMaker 25%
Other AWS Services 25%

Let’s look into each of these subject areas in more detail:

Machine Learning / Deep Learning

By far the biggest portion of the exam is dedicated to proving your knowledge of ML and DL. While you don’t need a deep understanding of the statistical mathematics and probability behind the subject you need a good working knowledge of the key aspects and be able to do some simple maths.

You need to have a solid grasp on these concepts:

  • Understand the difference between supervised, unsupervised and reinforcement learning.
  • Know the purpose behind training, validation and testing data and how its managed.
  • Have a general understanding of hyperparameters and some of the common ones used by various algorithms.
  • Understand regularisation, what it does and some ways to achieve it.
  • Know about regression, and gradient decent.

You should know about ‘input data’:

  • Data Visualisation using Jupyter notebooks, but also other tools like AWS QuickSight.
  • Understand Feature engineering and when to use what technique.
  • Know what to do when data in unbalanced or missing.

You need to know when to use which model types (algorithms) - and what they are, and some high level configuration:

  • Logistical Regression
  • Linear Regression
  • Support Vector Machines
  • Decision Trees / Random Forests
  • K-means Clustering
  • K-Nearest Neighbours

And you need to have a good understanding of deep learning models and their uses:

  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)

Many questions focus on how to measure, understand and improve the performance of ML/DL models. You might get questions with graphs and charts, and be expected to make calculations. You should understand:

  • Accuracy
  • Gini Impurity
  • Confusion Matrix
  • F1 Score
  • Sensitivity / Specificity
  • Precision
  • Recall

All of the above is from the point of view of ML/DL computer science. There is no AWS knowledge in the above. That all follows on from here…

AWS SageMaker

AWS’s flagship ML/DL service - AWS SageMaker - is the single most mentioned service in the exam. That should come as no surprise.

AWS has dropped its ‘support’ for AWS ML which is no longer available to anyone who’s not already using it - however, AWS ML still popped up in a couple of answers in this exam.

For the exam you should know:

  • What SageMaker actually is, and generally speaking how it’s architected
  • How Jupyter notebooks fit into SageMaker and how they are used
  • How to access secure data
  • How to secure a Jupyter Notebook
  • How you train models
  • How to deploy models
  • Know about Hyperparameter Optimisation

It goes without saying that you should have a general knowledge of all the AWS optimized algorithms, including:

  • BlazingText
  • Image Classification Algorithm
  • K-Means
  • K-Nearest Neighbours
  • Latent Dirichlet Allocation
  • Linear Learner
  • Object2Vec
  • Object Detection
  • Principal Component Analysis (PCA)
  • Random Cut Forest (RCF)
  • Sequence-to-Sequence (seq2seq)
  • XGBoost

…as well as knowing how to import your own (or someone else’s) custom algorithm from outside SageMaker.

Other AWS AI Services

After all that heavy ML/DL is can seem like a little light relief to get a question or two about some of AWS’s AI services. But don’t get complacent, for each of these you should have a good understanding of its use case, its limitations, and how they should be architected. Especially make sure you understand how to combine these services together to make complete solutions. (These questions are a lot like ‘AWS Architecture’ questions.)

  • Rekognition (Images and Video)
  • Polly
  • Transcribe
  • Lex
  • Translate
  • Comprehend

Other AWS Services

A good general understanding of AWS, to at least associate level, will see you through many of the AWS aspects of the questions you will face. However there are some services that feature more than others, and at times there is a decidedly Big Data feel.

Make sure you are comfortable and preferably have experience with:

  • S3 including how to secure your data
  • Athena including performance
  • Kinesis Firehose and Analytics for moving data around and transforming
  • Elastic Map Reduce (EMR) including with Spark
  • AWS Glue
  • QuickSight including how it sources data from various other services

Final Thoughts

So did this exam add anything to the ‘which is harder ‘pro’ or ‘specialty’ debate? Well from a 10,000-foot view the exam looked like a ‘pro’ exam:

  • The questions were wordy
  • The answers could be wordy
  • The time limit is 3 hours
  • There are 65 questions

But at the end of the day, whether this is harder than a pro exam or not, yet again falls to your background:

  • If you’re new to ML/DL, this exam is hard.
  • If you’re new to AWS, this exam is hard.
  • If you’re comfortable with ML/DL and AWS, this exam is do-able.

Preparing for this exam depends a lot on your background. There really is no substitute for getting hands-on, especially with AWS SageMaker. AWS has a number of awesome pre-made example Jupyter notebooks should be seen as ‘copy-n-paste’ reference guides even for some production developments.