Creating AWS CloudWatch Dashboards and Alarms with CDK
Monitoring is a crucial part of software operations, but can also be one of the most tedious to implement. If you’re using a traditional config-based IAC tool such as CloudFormation, SAM, Serverless, or Terraform, you’ll find yourself copy-pasting dozens of lines for each new metric that you want to graph and alert on.
Modern IAC tools such as CDK enable you to define your infrastructure using actual code (often TypeScript) rather than config (JSON, YAML). I like to refer to these two approaches to IAC as IACode and IAConfig respectively.
In this article we’ll cover how to build a CloudWatch Dashboard and Alarms to monitor a Full Stack Serverless AWS application consisting of API Gateway, Lambda, Cognito, DynamoDB, and a web app hosted on Amplify Hosting. Here’s what the end result looks like (though you’ll need to use your imagination to fill in the empty charts with pretty lines).
Monitoring Construct
Enough preamble. Let’s get into the code! We’ll start by creating a Monitoring construct that allows us to encapsulate all of the relevant infrastructure in a single place.
The Monitoring construct class accepts details on other resources within our app such as our Amplify Hosting App, API Gateway, Cognito User Pool and Client, and a list of Lambda Functions and DynamoDB Tables that we want to monitor.
The constructor creates an SNS Topic that’s set up to send an email whenever it receives a message (i.e., whenever an alarm is triggered). Make sure you replace ALARM_NOTIFICATION_EMAIL with the email address you want to receive notifications.
We’re also creating the CloudWatch Dashboard resource and calling methods to add widgets and alarms (we’ll get to the implementation of these next).
Finally, we have some helper methods for adding headings to our Dashboard and subscribing alarms to our SNS Topic.
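Here’s a minimal sketch of what such a construct could look like. It is an illustration rather than the exact code behind the dashboard above: the prop names, the notifyOnAlarm helper name, and the placeholder email and Amplify app ID are my assumptions, and it assumes a recent aws-cdk-lib v2 release where the HTTP API constructs live in aws-cdk-lib/aws-apigatewayv2.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { HttpApi, IHttpApi } from 'aws-cdk-lib/aws-apigatewayv2';
import { AlarmBase, Dashboard, TextWidget } from 'aws-cdk-lib/aws-cloudwatch';
import { SnsAction } from 'aws-cdk-lib/aws-cloudwatch-actions';
import { IUserPool, IUserPoolClient, UserPool } from 'aws-cdk-lib/aws-cognito';
import { ITable } from 'aws-cdk-lib/aws-dynamodb';
import { IFunction } from 'aws-cdk-lib/aws-lambda';
import { Topic } from 'aws-cdk-lib/aws-sns';
import { EmailSubscription } from 'aws-cdk-lib/aws-sns-subscriptions';
import { Construct } from 'constructs';

// Placeholder: replace with the address that should receive alarm emails.
const ALARM_NOTIFICATION_EMAIL = 'alerts@example.com';

// Hypothetical prop names; adjust to match your own stack.
interface MonitoringProps {
  amplifyAppId: string;
  api: IHttpApi;
  userPool: IUserPool;
  userPoolClient: IUserPoolClient;
  functions: IFunction[];
  tables: ITable[];
}

class Monitoring extends Construct {
  readonly dashboard: Dashboard;
  readonly alarmTopic: Topic;

  constructor(scope: Construct, id: string, readonly props: MonitoringProps) {
    super(scope, id);

    // Every message published to this topic is forwarded by email.
    this.alarmTopic = new Topic(this, 'AlarmTopic');
    this.alarmTopic.addSubscription(new EmailSubscription(ALARM_NOTIFICATION_EMAIL));

    this.dashboard = new Dashboard(this, 'Dashboard');

    // Each section would add its own graphs and alarms under its heading.
    this.addHeading('API Gateway');
    this.addHeading('Lambda');
    this.addHeading('DynamoDB');
    this.addHeading('Cognito');
    this.addHeading('Amplify Hosting');
  }

  // Adds a full-width markdown heading row to the dashboard.
  addHeading(title: string): void {
    this.dashboard.addWidgets(new TextWidget({ markdown: `# ${title}`, width: 24, height: 1 }));
  }

  // Publishes to the SNS topic whenever the given alarm goes into ALARM.
  notifyOnAlarm(alarm: AlarmBase): void {
    alarm.addAlarmAction(new SnsAction(this.alarmTopic));
  }
}

// Usage: wire the construct into a stack.
const app = new App();
const stack = new Stack(app, 'DemoStack');
const userPool = new UserPool(stack, 'UserPool');
const monitoring = new Monitoring(stack, 'Monitoring', {
  amplifyAppId: 'd1example2345', // hypothetical Amplify app ID
  api: new HttpApi(stack, 'Api'),
  userPool,
  userPoolClient: userPool.addClient('WebClient'),
  functions: [],
  tables: [],
});
```

Typing the helper against AlarmBase means it should accept both regular and composite alarms.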
Next we’ll get into creating metrics, alarms, and dashboard widgets for our API Gateway resource.
API Gateway Metrics, Alarms and Dashboard Widgets
Many CDK Constructs come with helper metric* methods that return metrics for the most popular standard metrics of each AWS service. In API Gateway’s case, all of the standard metrics we’re interested in have these helper methods. Thank you CDK Team! 🙏
We create alarms for both Client Errors and Server Errors. The thresholds are low and you should tweak them to meet your needs (maybe you don’t want to alarm on Client Errors at all since they’re often not actionable). We also add these error metrics to our dashboard and add “red lines” (horizontal annotations) so that we can see how close we get to breaching our alarms. Once again, the CDK Team shows great attention to detail by including a toAnnotation() method on their alarms to make this super easy! 🙌
We also add metrics for latency (consider adding alarms for this), number of requests, and amount of data processed.
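As a rough sketch, the API Gateway section could look something like the following. I’m assuming an HTTP API (IHttpApi from aws-cdk-lib/aws-apigatewayv2), since that is where the data-processed helper lives; the function name, alarm IDs, and thresholds are illustrative only.

```typescript
import { App, Duration, Stack } from 'aws-cdk-lib';
import { HttpApi, IHttpApi } from 'aws-cdk-lib/aws-apigatewayv2';
import {
  Alarm,
  ComparisonOperator,
  Dashboard,
  GraphWidget,
  TreatMissingData,
} from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';

// Hypothetical helper: adds API graphs to the dashboard and returns the alarms it created.
function addApiGatewayMonitoring(scope: Construct, dashboard: Dashboard, api: IHttpApi): Alarm[] {
  const period = Duration.minutes(5);
  const clientErrors = api.metricClientError({ period, statistic: 'Sum' });
  const serverErrors = api.metricServerError({ period, statistic: 'Sum' });

  // Deliberately low thresholds; tune them for your own traffic.
  const clientErrorAlarm = clientErrors.createAlarm(scope, 'ApiClientErrorAlarm', {
    threshold: 5,
    evaluationPeriods: 1,
    comparisonOperator: ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
    treatMissingData: TreatMissingData.NOT_BREACHING,
  });
  const serverErrorAlarm = serverErrors.createAlarm(scope, 'ApiServerErrorAlarm', {
    threshold: 1,
    evaluationPeriods: 1,
    comparisonOperator: ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
    treatMissingData: TreatMissingData.NOT_BREACHING,
  });

  // Error graphs, each with a horizontal "red line" at the alarm threshold.
  dashboard.addWidgets(
    new GraphWidget({
      title: 'API client errors (4xx)',
      left: [clientErrors],
      leftAnnotations: [clientErrorAlarm.toAnnotation()],
      width: 12,
    }),
    new GraphWidget({
      title: 'API server errors (5xx)',
      left: [serverErrors],
      leftAnnotations: [serverErrorAlarm.toAnnotation()],
      width: 12,
    }),
  );

  // Latency, request count, and data processed share a second row.
  dashboard.addWidgets(
    new GraphWidget({ title: 'API latency', left: [api.metricLatency({ period })], width: 8 }),
    new GraphWidget({ title: 'API requests', left: [api.metricCount({ period })], width: 8 }),
    new GraphWidget({ title: 'API data processed', left: [api.metricDataProcessed({ period })], width: 8 }),
  );

  return [clientErrorAlarm, serverErrorAlarm];
}

// Usage in a throwaway stack.
const app = new App();
const stack = new Stack(app, 'ApiMonitoringDemo');
const apiAlarms = addApiGatewayMonitoring(stack, new Dashboard(stack, 'Dashboard'), new HttpApi(stack, 'Api'));
```

Returning the alarms lets the caller decide which ones should actually notify.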
Let’s move on to Lambda.
Lambda Metrics, Alarms and Dashboard Widgets
The main difference with the Lambda metrics is that we’re dealing with multiple Lambda Functions (as opposed to a single API Gateway Resource). Each graph includes the metrics for all of the Lambda Functions passed into the Monitoring construct.
The metrics we’re graphing are duration (both average and P95), number of invocations, number of errors, and number of times a function was throttled. We’re only alarming on the error and throttle metrics here, though you should also consider alarming on duration.
We’re creating 2 alarms per Lambda Function for throttle and error. These alarms aren’t set up to notify. Instead, we create a composite alarm so that we don’t get inundated with notifications when a single issue causes multiple functions to fail.
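The pattern could be sketched roughly as follows. The function name, alarm IDs, and thresholds are my own placeholders, and the two inline demo functions exist only so the example can be synthesized on its own.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import {
  Alarm,
  AlarmRule,
  CompositeAlarm,
  Dashboard,
  GraphWidget,
  TreatMissingData,
} from 'aws-cdk-lib/aws-cloudwatch';
import { Code, Function as LambdaFunction, IFunction, Runtime } from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

// Hypothetical helper: graphs every function on shared widgets and
// rolls the per-function alarms up into a single composite alarm.
function addLambdaMonitoring(scope: Construct, dashboard: Dashboard, functions: IFunction[]) {
  dashboard.addWidgets(
    new GraphWidget({
      title: 'Lambda duration (average)',
      left: functions.map((fn) => fn.metricDuration({ statistic: 'Average' })),
      width: 12,
    }),
    new GraphWidget({
      title: 'Lambda duration (p95)',
      left: functions.map((fn) => fn.metricDuration({ statistic: 'p95' })),
      width: 12,
    }),
  );
  dashboard.addWidgets(
    new GraphWidget({ title: 'Lambda invocations', left: functions.map((fn) => fn.metricInvocations()), width: 8 }),
    new GraphWidget({ title: 'Lambda errors', left: functions.map((fn) => fn.metricErrors()), width: 8 }),
    new GraphWidget({ title: 'Lambda throttles', left: functions.map((fn) => fn.metricThrottles()), width: 8 }),
  );

  // Two alarms per function. None of them gets an alarm action of its own.
  const alarms: Alarm[] = [];
  functions.forEach((fn, index) => {
    alarms.push(
      fn.metricErrors().createAlarm(scope, `LambdaErrorAlarm${index}`, {
        threshold: 1,
        evaluationPeriods: 1,
        treatMissingData: TreatMissingData.NOT_BREACHING,
      }),
      fn.metricThrottles().createAlarm(scope, `LambdaThrottleAlarm${index}`, {
        threshold: 1,
        evaluationPeriods: 1,
        treatMissingData: TreatMissingData.NOT_BREACHING,
      }),
    );
  });

  // The composite alarm fires when any of the underlying alarms is in ALARM.
  const compositeAlarm = new CompositeAlarm(scope, 'LambdaCompositeAlarm', {
    alarmRule: AlarmRule.anyOf(...alarms),
  });

  return { alarms, compositeAlarm };
}

// Usage with two inline demo functions.
const app = new App();
const stack = new Stack(app, 'LambdaMonitoringDemo');
const demoFunctions = ['Create', 'List'].map(
  (name) =>
    new LambdaFunction(stack, `${name}Function`, {
      runtime: Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: Code.fromInline('exports.handler = async () => ({ statusCode: 200 });'),
    }),
);
const lambdaMonitoring = addLambdaMonitoring(stack, new Dashboard(stack, 'Dashboard'), demoFunctions);
```

Only the composite alarm would then be subscribed to the SNS Topic, so one underlying issue produces one notification.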
See the Lambda Metrics Documentation for other available standard metrics.
DynamoDB Metrics and Dashboard Widgets
There’s nothing new to learn here that isn’t covered by the Lambda section. For each DynamoDB Table provided we’re graphing the number of consumed Read and Write Capacity Units, and the number of User Errors.
Consider adding an alarm for the UserErrors metric since it’s likely due to a bug in your code. If you’re not using DynamoDB’s On Demand billing, you should also consider adding alarms for RCUs and WCUs.
DynamoDB offers significantly more standard metrics than most AWS Services, so you should definitely familiarize yourself with them and decide what’s important for you to graph and alarm on. But don’t go overboard! It’s important to find the right balance between alarming on too many things and not enough.
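A minimal sketch of the DynamoDB graphs, assuming the tables are passed in as a list of ITable; the function name and the demo table are placeholders. As far as I know, UserErrors is reported per account and region rather than per table, so the sketch graphs it only once.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { Dashboard, GraphWidget } from 'aws-cdk-lib/aws-cloudwatch';
import { AttributeType, ITable, Table } from 'aws-cdk-lib/aws-dynamodb';

// Hypothetical helper: one row of three graphs covering every table passed in.
function addDynamoDbMonitoring(dashboard: Dashboard, tables: ITable[]): GraphWidget[] {
  const widgets = [
    new GraphWidget({
      title: 'DynamoDB consumed read capacity units',
      left: tables.map((table) => table.metricConsumedReadCapacityUnits()),
      width: 8,
    }),
    new GraphWidget({
      title: 'DynamoDB consumed write capacity units',
      left: tables.map((table) => table.metricConsumedWriteCapacityUnits()),
      width: 8,
    }),
    new GraphWidget({
      title: 'DynamoDB user errors',
      // Account-level metric, so one table's helper is enough.
      left: tables.slice(0, 1).map((table) => table.metricUserErrors()),
      width: 8,
    }),
  ];
  dashboard.addWidgets(...widgets);
  return widgets;
}

// Usage with a single demo table.
const app = new App();
const stack = new Stack(app, 'DynamoDbMonitoringDemo');
const demoTable = new Table(stack, 'DemoTable', {
  partitionKey: { name: 'id', type: AttributeType.STRING },
});
const dynamoWidgets = addDynamoDbMonitoring(new Dashboard(stack, 'Dashboard'), [demoTable]);
```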
See the DynamoDB Metrics Documentation for other available standard metrics.
Cognito Metrics and Dashboard Widgets
Adding Cognito User Pool graphs gives great insight into the number of new and returning users. You may even be able to use the TokenRefreshSuccesses metric to get a rough number of currently active users.
Unfortunately, the Cognito User Pool Construct doesn’t have those handy metric* methods, so we need to manually create them using new Metric.
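Creating these by hand might look something like this. The helper name is hypothetical; the AWS/Cognito namespace and the UserPool and UserPoolClient dimensions are how I understand Cognito to publish its metrics, so double check them against the Cognito documentation.

```typescript
import { App, Duration, Stack } from 'aws-cdk-lib';
import { Dashboard, GraphWidget, Metric } from 'aws-cdk-lib/aws-cloudwatch';
import { IUserPool, IUserPoolClient, UserPool } from 'aws-cdk-lib/aws-cognito';

// Hypothetical helper: builds a Cognito metric scoped to one user pool client.
function cognitoMetric(metricName: string, userPool: IUserPool, client: IUserPoolClient): Metric {
  return new Metric({
    namespace: 'AWS/Cognito',
    metricName,
    dimensionsMap: {
      UserPool: userPool.userPoolId,
      UserPoolClient: client.userPoolClientId,
    },
    statistic: 'Sum',
    period: Duration.hours(1),
  });
}

// Usage: graph sign-ups, sign-ins, and token refreshes side by side.
const app = new App();
const stack = new Stack(app, 'CognitoMonitoringDemo');
const userPool = new UserPool(stack, 'UserPool');
const userPoolClient = userPool.addClient('WebClient');

const cognitoMetrics = ['SignUpSuccesses', 'SignInSuccesses', 'TokenRefreshSuccesses'].map((name) =>
  cognitoMetric(name, userPool, userPoolClient),
);

const dashboard = new Dashboard(stack, 'Dashboard');
dashboard.addWidgets(
  ...cognitoMetrics.map(
    (metric) => new GraphWidget({ title: `Cognito ${metric.metricName}`, left: [metric], width: 8 }),
  ),
);
```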
See the Cognito Metrics Documentation for other available standard metrics.
Amplify Hosting Metrics and Dashboard Widgets
I’ve encountered way too many people who don’t know about Amplify Hosting. If you need to host a web app on AWS, I highly recommend Amplify Hosting over the more traditional approach of S3 + CloudFront.
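Amplify Hosting metrics can be graphed with new Metric in the same way as Cognito. The sketch below is an assumption-heavy illustration: the helper name and app ID are made up, and the AWS/AmplifyHosting namespace, the App dimension, and the metric names are what I believe Amplify publishes, so verify them in the CloudWatch console before relying on them.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { Dashboard, GraphWidget, Metric } from 'aws-cdk-lib/aws-cloudwatch';

// Hypothetical helper: builds a metric for one Amplify Hosting app.
function amplifyMetric(appId: string, metricName: string, statistic: string): Metric {
  return new Metric({
    namespace: 'AWS/AmplifyHosting', // assumed namespace
    metricName,
    dimensionsMap: { App: appId }, // assumed dimension name
    statistic,
  });
}

const app = new App();
const stack = new Stack(app, 'AmplifyMonitoringDemo');
const amplifyAppId = 'd1example2345'; // hypothetical Amplify app ID

const amplifyMetrics = {
  requests: amplifyMetric(amplifyAppId, 'Requests', 'Sum'),
  clientErrors: amplifyMetric(amplifyAppId, '4xxErrors', 'Sum'),
  serverErrors: amplifyMetric(amplifyAppId, '5xxErrors', 'Sum'),
  latency: amplifyMetric(amplifyAppId, 'Latency', 'Average'),
};

const dashboard = new Dashboard(stack, 'Dashboard');
dashboard.addWidgets(
  new GraphWidget({ title: 'Amplify requests', left: [amplifyMetrics.requests], width: 8 }),
  new GraphWidget({
    title: 'Amplify errors',
    left: [amplifyMetrics.clientErrors, amplifyMetrics.serverErrors],
    width: 8,
  }),
  new GraphWidget({ title: 'Amplify latency', left: [amplifyMetrics.latency], width: 8 }),
);
```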
Alarm Status Dashboard Widget
Finally we’ll display a list of our alarms and their current status.
You should also check out the other CDK CloudWatch Dashboard Widgets. The LogQueryWidget is especially useful for adding a quick view of logs filtered with a CloudWatch Logs Insights query (e.g., you can display the latest error logs).
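Both widgets can be sketched in a few lines. The demo alarm, the log group name, and the query are placeholders I’ve made up for illustration.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { AlarmStatusWidget, Dashboard, LogQueryWidget, Metric } from 'aws-cdk-lib/aws-cloudwatch';

const app = new App();
const stack = new Stack(app, 'StatusWidgetsDemo');
const dashboard = new Dashboard(stack, 'Dashboard');

// A stand-in alarm; in practice you would pass every alarm created elsewhere.
const demoAlarm = new Metric({ namespace: 'AWS/Lambda', metricName: 'Errors', statistic: 'Sum' }).createAlarm(
  stack,
  'DemoErrorAlarm',
  { threshold: 1, evaluationPeriods: 1 },
);

// Lists each alarm along with its current state.
const alarmStatusWidget = new AlarmStatusWidget({
  title: 'Alarms',
  alarms: [demoAlarm],
  width: 24,
  height: 4,
});

// Shows the 20 most recent log lines containing "ERROR".
const errorLogsWidget = new LogQueryWidget({
  title: 'Latest error logs',
  logGroupNames: ['/aws/lambda/my-demo-function'], // hypothetical log group name
  queryLines: ['fields @timestamp, @message', 'filter @message like /ERROR/', 'sort @timestamp desc', 'limit 20'],
  width: 24,
  height: 6,
});

// Separate calls so each widget gets its own full-width row.
dashboard.addWidgets(alarmStatusWidget);
dashboard.addWidgets(errorLogsWidget);
```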
Using the Construct
Now that we’ve created the Monitoring construct, we can add it to our CDK stack by creating an instance and passing in details about our full stack application.
A note on laying out widgets
CloudWatch Dashboards are rendered as a 24 column layout. Keep this in mind when deciding on the width of each widget.
According to the docs, every widget included in a single addWidgets is rendered next to each other, and every new call to addWidgets creates a new row. In practice, I found that when my widgets within a single addWidgets didn’t add up to a multiple of 24, the layout started to go a little wonky, and widgets that I expected to be rendered in a new row (because they were in a separate addWidgets) were not.
There are also Row and Column widgets that can give you more control over layout.
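One cheap way to avoid the wonky layouts is to sanity check your widths before handing them to addWidgets. The following plain TypeScript sketch (the function name and sample layout are hypothetical) flags any row whose widths don’t add up to a multiple of 24.

```typescript
// CloudWatch Dashboards use a 24-column grid.
const GRID_COLUMNS = 24;

// Returns the indexes of rows whose widget widths don't sum to a multiple of 24.
function findUnevenRows(rows: number[][]): number[] {
  const uneven: number[] = [];
  rows.forEach((widths, index) => {
    const total = widths.reduce((sum, width) => sum + width, 0);
    if (total % GRID_COLUMNS !== 0) {
      uneven.push(index);
    }
  });
  return uneven;
}

// Each inner array holds the widget widths passed to one addWidgets call.
const plannedLayout = [
  [12, 12], // two half-width graphs
  [8, 8, 8], // three graphs, each a third of the row
  [6, 6, 6], // only 18 columns: this row is flagged
];

const unevenRows = findUnevenRows(plannedLayout);
console.log(unevenRows); // [ 2 ]
```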
Custom Metrics
Graphing and alarming on the standard metrics that AWS provides us with out-of-the-box is only half the story. To improve your “operational excellence” you’ll want to add Custom Metrics relevant to your business requirements.
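One way to publish a custom metric from a Lambda Function is the CloudWatch Embedded Metric Format (EMF), where you write a specially structured JSON log line and CloudWatch extracts the metric from it. This isn’t necessarily the approach you’ll want, and the namespace, metric, and dimension names below are invented for illustration. It’s a minimal sketch with no dependencies.

```typescript
type MetricUnit = 'Count' | 'Milliseconds' | 'Bytes';

// Builds a single EMF log line. Printing it to stdout from a Lambda Function
// should be enough for CloudWatch to pick up the metric.
function buildEmfLogLine(
  namespace: string,
  metricName: string,
  value: number,
  unit: MetricUnit,
  dimensions: Record<string, string>,
): string {
  return JSON.stringify({
    _aws: {
      Timestamp: Date.now(),
      CloudWatchMetrics: [
        {
          Namespace: namespace,
          // One dimension set containing every dimension key we were given.
          Dimensions: [Object.keys(dimensions)],
          Metrics: [{ Name: metricName, Unit: unit }],
        },
      ],
    },
    // Dimension values and the metric value live at the top level of the document.
    ...dimensions,
    [metricName]: value,
  });
}

// Usage: record that one order was created by a hypothetical orders service.
const emfLine = buildEmfLogLine('MyApp', 'OrdersCreated', 1, 'Count', { Service: 'orders-api' });
console.log(emfLine);
```

Once the metric exists, it can be graphed and alarmed on with new Metric using the custom namespace, just like the Cognito metrics.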
Try it yourself with Code Genie
Building a full stack Serverless app on AWS takes a lot of time, effort, and expertise. Code Genie lets you get up and running fast with a solid software foundation based on your data model. In minutes you can have a full stack application deployed to your own AWS account, and the source code downloaded for you to start hacking. Metrics, Monitoring, and a Dashboard like the one described in this article are included out-of-the-box. Check out the Getting Started Guide for more details.