How to Prevent Incidents
There’s a lot to do when there’s an incident, and there are a lot of unknowns. Communication and investigation both need to happen, and all the while your team is in the hot seat. Your customers grow impatient and start looking at your competitors. You don’t want to be in this situation, and if you find yourself there, you want it resolved ASAP. Before you find yourself in an incident, there are a few things you can do to stay ahead of the problem.
Use Your Provider’s Status Page
Your service providers have status pages, but you already knew that. A status page can tell you what’s happening with a provider. The problem is that status pages are usually out of date: an issue that started 6 hours ago may still not be on the page. And there’s another problem. If an issue only affects your specific account, it usually won’t show up on the status page at all. Status pages are great when providers do a good job of updating them, but most of the time they don’t.
Use Built-In Monitoring
You can bake some monitoring into your application so it tells you when there are problems with your service provider. This is very helpful because you get notified right away instead of hearing about it from a customer. Track things like response time and HTTP status codes, and set up customized alerts on them. Then your dev team can start investigating well before the first customer notices. During this time you can get your other teams up to speed, updating support, sales and marketing before any customer reaches out. That can give you something like a 20-minute head start on resolving the issue. Who doesn’t want a 20-minute head start?
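To make the idea of tracking response time and HTTP statuses concrete, here is a minimal sketch in Python using only the standard library. The names `check_endpoint`, `find_problems` and `CheckResult`, the 2-second threshold, and the rule that any status of 400 or above counts as a problem are all assumptions for illustration, not part of any particular provider's API. Adjust them to whatever "healthy" means for your provider.

```python
import time
import urllib.error
import urllib.request
from typing import List, NamedTuple, Optional

# Hypothetical threshold; tune it for your own provider.
MAX_RESPONSE_SECONDS = 2.0


class CheckResult(NamedTuple):
    status: Optional[int]        # HTTP status code, or None if no response arrived
    elapsed: float               # seconds spent waiting for the provider
    error: Optional[str] = None  # description of a network-level failure


def check_endpoint(url: str, timeout: float = 5.0) -> CheckResult:
    """Call the provider once and record the status code and response time."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
        return CheckResult(status, time.monotonic() - start)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        return CheckResult(exc.code, time.monotonic() - start)
    except OSError as exc:
        # Covers URLError, DNS failures, refused connections and timeouts.
        return CheckResult(None, time.monotonic() - start, error=str(exc))


def find_problems(result: CheckResult,
                  max_seconds: float = MAX_RESPONSE_SECONDS) -> List[str]:
    """Return the reasons to alert; an empty list means the check looks healthy."""
    problems = []
    if result.status is None:
        problems.append(f"no response: {result.error}")
    elif result.status >= 400:
        problems.append(f"unexpected status: HTTP {result.status}")
    if result.elapsed > max_seconds:
        problems.append(f"slow response: {result.elapsed:.2f}s")
    return problems
```

You would call `find_problems(check_endpoint(url))` on a schedule and send any non-empty result to whatever alerting channel your team already watches. Keeping the network call separate from the alert rules makes the rules easy to test without touching the provider.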
Add Built-In Debug Info
You can also bake some debug information into your application. Set up your application to log interactions that end in unexpected ways. Then when you have an incident with your service provider, you’re ready. You can pull the debug logs and send your findings to their support team. Detailed debug information will help your service provider resolve the issue faster. Fast issue resolution is in everyone’s best interest, so any help we can give our service provider is a win for us.
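Logging interactions that end in unexpected ways could look something like the sketch below. It is only an illustration: the logger name, the record fields, the list of "expected" statuses, and the choice to redact credential headers before logging are all assumptions, so adapt them to what your provider's support team actually asks for.

```python
import json
import logging
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

logger = logging.getLogger("provider_debug")  # hypothetical logger name

# Assumed list of headers to keep out of logs you may share with support.
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}


def build_debug_record(method: str, url: str, request_headers: Dict[str, str],
                       status: Optional[int], response_body: str,
                       elapsed: float) -> dict:
    """Collect the details of one provider call into a JSON-friendly dict."""
    safe_headers = {
        name: ("<redacted>" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in request_headers.items()
    }
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "method": method,
        "url": url,
        "request_headers": safe_headers,
        "status": status,
        "response_body": response_body[:2000],  # cap size to keep logs manageable
        "elapsed_seconds": round(elapsed, 3),
    }


def log_if_unexpected(record: dict,
                      expected_statuses: Tuple[int, ...] = (200, 201, 204)) -> bool:
    """Log the record only when the call ended unexpectedly; return True if logged."""
    if record["status"] in expected_statuses:
        return False
    logger.warning("unexpected provider response: %s", json.dumps(record))
    return True
```

Writing each record as a single JSON line means you can later filter the log by timestamp or status and paste the matching entries straight into a support ticket.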
Putting It All Together
Building monitoring, alerting and log storage takes engineering effort. It takes away resources you would otherwise spend on things like customer feature requests. That’s why I created https://statuslist.app. Status List uses your account-specific credentials to poll your service providers’ endpoints for you. You get alerts right away when incidents happen, so you can have that head start. Status List monitors HTTP status codes, uptime and response time. When there’s an alert, we give you all the detail you need: the alert has full HTTP transcripts of the interaction, including event timestamps. You can use that to help your service provider get back up and running. Best of all, you can set up your monitors in 5 minutes. Instead of spending weeks of your precious engineering resources, you can start today.
Providing reliable SaaS when you rely on external services can be tough. But if we use some of these strategies, we can stay ahead of the problems. And who knows, you may even win the trust of your customers by the way you handle the incidents that come your way.