What should I monitor?
What to monitor depends on the complexity of your app. A static file server is monitored differently from a dynamic web application. Think of the adage ‘If it’s not covered by tests, then it’s broken’. You should think of your monitors as unit tests for your production environment. Monitors constantly check that everything is working correctly (and tell you when something isn’t). So let’s get into specifics.
Monitor Critical Application Functions
Let’s move up to the application level. The HTTP process is replying, but that doesn’t mean things are working correctly. Anyone who’s seen a fail whale on Twitter can tell you that. There are a bunch of fundamental things to check here. Pick what applies to you. Can my server process access the filesystem correctly? Does my server process talk to the database(s) correctly? Can it talk to the upstream micro-service(s)? If I rely on an external service (like S3 or SendGrid), is my server process configured correctly to talk to it? Are those external services up?
How do you monitor those things? There are a few ways: the health check endpoint, the real data check, and the smoke check. Each has its pros and cons. We’ll get into that in a minute.
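As a rough illustration of the health check endpoint idea, here is a minimal Python sketch of the logic such an endpoint might run. The function names (`run_health_checks`, `check_filesystem`, `check_database`) are hypothetical, and the database check only simulates a failure. A real endpoint would wire this into whatever web framework the app already uses.

```python
import tempfile
from typing import Callable, Dict, Tuple


def run_health_checks(checks: Dict[str, Callable[[], None]]) -> Tuple[int, Dict[str, str]]:
    """Run each dependency check and return (HTTP status, per-check report)."""
    report: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            check()  # a check signals failure by raising an exception
            report[name] = "ok"
        except Exception as exc:
            report[name] = f"failed: {exc}"
    healthy = all(status == "ok" for status in report.values())
    # 503 tells the uptime monitor that the process is up but not working.
    return (200 if healthy else 503), report


def check_filesystem() -> None:
    """Hypothetical check: can the process write a temporary file?"""
    with tempfile.TemporaryFile() as handle:
        handle.write(b"ping")


def check_database() -> None:
    """Hypothetical check: stands in for a real query; here it always fails."""
    raise ConnectionError("database unreachable")


status, report = run_health_checks(
    {"filesystem": check_filesystem, "database": check_database}
)
print(status, report)
```

Returning a non-200 status when any check fails means an ordinary HTTP uptime monitor can watch this endpoint without knowing what sits behind it.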
Monitor the Network
Let’s start at the base level. Can I reach the server and have a process accept the HTTP connection? This is the basis of any HTTP service. Whether you’re running a static site or the most complex application, you need a connection between your process and the outside world. Without it, you have nothing. (Think of all those times you spent playing the jumping dinosaur game on Chrome.) With a network monitor we’re watching a few things. Is the network path to the server up? Is my server on and responsive? Is my HTTP process running?

Where should I be using a network monitor? Monitor each process listening on the HTTP ports in your application. If you have a static service, that might only be one NGINX process. If you have a load balancer and/or micro-services, you’re going to set up monitors for each process that you can. With auto-scaling you can’t really re-configure your monitoring each time you scale. In that case, hit the load balancer more frequently. The increased frequency of the checks will cycle through the processes behind the load balancer.
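The questions a network monitor asks can be sketched with the Python standard library. This is an illustration under assumptions, not how any particular monitoring service works: `can_connect` and `http_status` are hypothetical helper names, and the timeouts are arbitrary.

```python
import http.client
import socket
import urllib.error
import urllib.request
from typing import Optional


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if some process accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, DNS failure, ...
        return False


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the process answered, just not with a success code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None  # network path, server, or HTTP process is down
```

For example, `can_connect("app.sample.com", 443)` answers "is anything listening?", while `http_status("https://app.sample.com/")` answers "is the HTTP process replying?". Run on a schedule against a load balancer, repeated calls would spread across the processes behind it.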
Monitor DNS Changes
DNS can be confusing, and mistakes happen. When they do, people won’t be able to access your site. Monitor your DNS.
Uptime monitors like Status List can check this for you if you set it up. It’s pretty easy actually. Create a monitor for each A/AAAA/CNAME record on your system, e.g. sample.com, app.sample.com, us-east.sample.com, upstream-video-transcoding.sample.com. Then, when the DNS points to the wrong place, you’ll know about it.
At the very least you should have a monitor that checks your primary domain. If your primary domain is down, it’s going to be really bad if no one notices, right? Monitor it.
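To make the idea concrete, here is a small Python sketch of a DNS check that compares what a hostname resolves to against the addresses you expect. It is an assumption-laden illustration: `resolve_addresses` and `dns_matches` are hypothetical names, the IP addresses are documentation placeholders, and the system resolver follows CNAMEs for you, so this checks the final addresses rather than individual record types.

```python
import socket
from typing import Callable, Iterable, Set


def resolve_addresses(hostname: str) -> Set[str]:
    """Return the IPv4/IPv6 addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def dns_matches(
    hostname: str,
    expected: Iterable[str],
    resolver: Callable[[str], Set[str]] = resolve_addresses,
) -> bool:
    """Return True if hostname resolves, and only to addresses we expect."""
    try:
        actual = resolver(hostname)
    except socket.gaierror:
        return False  # the name does not resolve at all
    # Any address outside the expected set means DNS points somewhere wrong.
    return bool(actual) and actual <= set(expected)


# Hypothetical usage: one check per record you care about.
expected_by_host = {
    "sample.com": {"203.0.113.10"},
    "app.sample.com": {"203.0.113.10", "203.0.113.11"},
}
```

The `resolver` parameter exists only so the comparison logic can be exercised without touching the network.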
Monitor HTTPS
You wouldn’t believe how many HTTPS certs aren’t renewed on time! This affects everyone; even Microsoft misses this sometimes. But it’s easy to put an uptime monitor on it.
On uptime monitoring services like Status List, each request will check that the certificate is valid. Create a monitor for each HTTPS certificate and HTTPS termination point you have. One termination point, one domain – create one monitor for that domain. One termination point, multiple domains on a wildcard certificate – create one monitor for one domain on the certificate. Multiple termination points, wildcard certificate – create a monitor that will hit each termination point.
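If you want to see roughly what a certificate check involves, the Python standard library can do it. This is a sketch, not a description of how Status List or any other service does it: `fetch_not_after` and `days_until_expiry` are hypothetical helper names, and the 14-day warning window in the usage note is an arbitrary assumption.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the certificate's notAfter string from a TLS termination point.

    The default context verifies the certificate, so an already expired or
    otherwise invalid certificate raises ssl.SSLCertVerificationError here.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between now and a notAfter string like 'Jan  1 00:00:00 2030 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch
    return (expires_at - now.timestamp()) / 86400


# Example with a fixed date, so no network access is needed.
now = datetime(2029, 12, 22, tzinfo=timezone.utc)
print(days_until_expiry("Jan  1 00:00:00 2030 GMT", now))  # 10.0
```

Against a live host you would call `days_until_expiry(fetch_not_after("app.sample.com"), datetime.now(timezone.utc))` and alert when the result drops below a window such as 14 days. With multiple termination points, run it once per termination point.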
Monitoring Performance
So our system is up and functioning. The next thing to look out for is performance bottlenecks. Performance problems can come from many sources. A database table may have grown and introduced slowness on certain queries. Customer load might spike on a CPU-intensive task like image processing. Those problems aren’t always easy to diagnose, so we want to know about them early!
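One simple way to catch slowness early is to time each check and alert when a high percentile of recent response times crosses a threshold. The Python sketch below assumes a nearest-rank percentile and a 500 ms threshold; both are illustrative choices, and the helper names are hypothetical.

```python
import math
import time
from typing import Callable, Sequence


def time_call(fn: Callable[[], object]) -> float:
    """Run fn once and return how long it took, in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def is_too_slow(samples_ms: Sequence[float], threshold_ms: float, pct: float = 95.0) -> bool:
    """Return True if the pct-th percentile response time exceeds the threshold."""
    return percentile(samples_ms, pct) > threshold_ms


# One slow request out of five is enough to trip a 95th percentile check.
recent = [100.0, 120.0, 110.0, 900.0, 105.0]
print(is_too_slow(recent, threshold_ms=500.0))  # True
```

Watching a percentile rather than the average keeps a handful of slow queries from hiding behind many fast ones.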
Multi-Region Monitoring
Your site needs to be running at the same high level regardless of where it’s being viewed. An outage only affecting Australia could go undetected by your US-based team for days. If your reach is global, your clientele are global, your service needs to be up globally.
How do I choose which regions to target? This really depends on where your target market is. If you’re selling American tax software, you may only want to check the US. Running an email service? Your clients may live in the US, but they’re going to travel to places like Europe and Australia and want access to their email.
Some services will offer uptime checks in every little country on the globe. You don’t need to monitor them all if you don’t have a presence there. Just choose regions that represent your user base. If you get the odd person in Indonesia, maybe you just monitor from one nearby location such as Papua New Guinea, not from every island in Indonesia.
How do you set it up? Many uptime services will have a series of checkboxes in their monitor configuration. Select all the countries you’d like to monitor. You’re off to the races!
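Once checks run from several regions, the useful signal is whether a failure is regional or global. Here is a minimal Python sketch of that classification. The region names and the `classify_outage` helper are hypothetical, and a hosted service would normally do this for you.

```python
from typing import Dict


def classify_outage(results: Dict[str, bool]) -> str:
    """Summarize per-region results, where True means the check passed."""
    failing = sorted(region for region, ok in results.items() if not ok)
    if not failing:
        return "up"
    if len(failing) == len(results):
        return "down everywhere"
    # Only some regions fail: the kind of outage a single-region check misses.
    return "regional outage: " + ", ".join(failing)


latest = {"us-east": True, "eu-west": True, "ap-southeast": False}
print(classify_outage(latest))  # regional outage: ap-southeast
```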