Uptime Checks: What Should I Monitor?
Uptime monitoring is a must for web applications. Monitors will alert you to errors and issues so you can fix them before customers are affected. It’s critical that you setup monitors so you can effectively track what’s working and what’s not.
Depending on the complexity of your app, there may be different things you want to monitor. A static file server is monitored different than a dynamic web application.
Think of the adage ‘If it’s not covered by tests, then it’s broken’. You should think of your monitors as unit tests for your production environment. Monitors are constantly checking that everything is working correctly (and telling you when they aren’t). So let’s get into specifics.
Monitoring Critical Application Functions
Let’s talk about your web application. Just because your HTTP process is replying that doesn’t mean things are working correctly. Anyone who’s seen a fail whale on Twitter can tell you that. It’s critical that your application is working correctly. Without it, the rest of your system won’t be useful.
There’s a bunch of fundamental things you should check here:
- Can my server process access the filesystem it correctly?
- Does my server process talk to the database(s) correctly?
- Can I talk to the upstream micro-service(s)?
- If I rely on an external service (like S3 or SendGrid), is my server process configured correctly to talk to them? Are those external services up?
Monitoring the Network
Your network is the connective tissue that allows customers to communicate with your app. If your network is having trouble, your customers won’t be able to use your service (We’ve all played the jumping dinosaur game on Chrome).
With a network monitor we can monitor a few things at once. I can check if the network path to the server up. I can check if my server on/responsive. And I can check if my web server is working.
You should monitor each server port that is listening for HTTP traffic. If you have a static service, that might only be 1 NGINX process. If you have a load balancer and/or micro-services, setup monitors for each process that you can.
If you use auto-scaling it’s more difficult to monitor each server. In that case, hit the load balancer more frequently. The increased frequency of the checks will cycle through the processes behind the load balancer. You can also use load balancer checks (HAProxy How To).
Monitor DNS Changes
DNS is the record system that ensures customers can reach your service when they type in yourservice.com. If someone makes a mistake in configuring your DNS, the entire service will be broken for your customers. You need to monitor your DNS so you can be alerted when this happens.
Uptime monitors like Status List can check this for you if you set it up. It’s pretty easy actually.
Create a monitor each A/AAA/CNAME record on your system. e.g. sample.com, app.sample.com, us-east.sample.com, upstream-video-transcoding.sample.com. Then, when the DNS points to the wrong place, you’ll know about it.
At the very least you should have a monitor that checks your primary domain. If your primary domain is down, it’s going to be real bad if no one notices right? Monitor it.
You wouldn’t believe how many HTTPS certs aren’t renewed on time! This affects everyone, even Microsoft misses this sometimes. But it’s so easy to throw an uptime monitor on that.
On uptime monitoring services like Status List, each request will check that the certificate is valid. Create a monitor for each HTTPS certificate and HTTPS termination point you have. One termination point, one domain – create one monitor for that domain. One termination point, multiple domains on a wildcard certificate – create one monitor for one domain on the certificate. Multiple termination points, wildcard certificate – create a monitor that will hit each termination point.
If your system isn’t performing well, customers are going to be unhappy and start to look for alternatives. We need to be on top of performance issues so we can fix them before our customers notice.
Performance problems can come from many sources. A database table may have grown and introduced slowness on certain queries. Customer load might spike on a CPU intensive task like image processing. Those problems aren’t always easy to diagnose, so we want to know about them early!
We can monitor our performance using a real data check. This will track how our system is performing on real customer endpoints. When it runs slow, we’ll be alerted so we can investigate.
Your site needs to be running at the same high level regardless of where it’s being viewed. An outage only affecting Australia could go undetected by your US-based team for days. If your reach is global, your clientele are global, your service needs to be up globally.
How do I set it up? Many uptime services will have a series of checkboxes in their monitor configuration. Select all the countries you’d like to monitor. You’re off to the races!
What regions should I target?
This really depends on where your target market is. If you’re selling American tax software, you may only want to check the US. Running an email service? Your clients may live in the US. But they’re going to travel to places like Europe, Australia etc and want access to their email.
How to set it up?
It’s really easy to set up multi-region monitoring. Open up your uptime service’s dashboard. Under your monitor’s settings, there will be options for which regions would like to have checks from. Choose regions that represent your user-base and you’re set. It’s that easy!
Putting it all together
It’s really important to monitor ALL components of your system. You need web servers, network, DNS and multi-region support for your application. If any of those fail, you’re app will stop functioning. Monitor every angle and you’ll receive alerts at the first sign of trouble.
If you enjoyed this article, you may also enjoy some of our technology specific guides:
You can find a full list of guides at Guides.