Ready to know about downtime before your customers?
Status List delivers uptime monitoring and professional hosted status pages for sites of all shapes and sizes.
Trusted by 1000+ companies
Unfortunately, AWS doesn’t offer a pre-packaged monitoring solution for ZooKeeper. With a little effort, though, we can still use CloudWatch, AWS’s built-in monitoring service, and all its benefits.
Writing custom code isn’t for everyone. If you prefer an existing solution, check out our self-hosted guide or our third-party comparison list.
You can also interface with ZooKeeper through Soabase Exhibitor. While that approach may seem simpler, it’s actually more complex: you need to set up Exhibitor, an HTTP ingestor, and a CloudWatch pusher. The approach below uses a single, bundled component to do all three. Exhibitor is also an archived project and is no longer maintained.
First, we need a watcher application that connects to ZooKeeper and pulls the metrics we need. You can put this watcher in your main application or run it as a standalone app. Open your project and create a new package for the watcher.
We’ll start with a stub method that we can call every few seconds.
public static void publishCloudWatchData(CloudWatchClient cw, String zkHost, int zkPort) {
    // get ZooKeeper stats and push them to CloudWatch
}
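The stub is meant to run every few seconds. One way to schedule it, sketched here with only the JDK, is a ScheduledExecutorService. WatcherScheduler and its demo method are hypothetical names for this example; in the watcher you would pass something like `() -> publishCloudWatchData(cw, zkHost, zkPort)` with a period of a few seconds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class WatcherScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Runs the task at a fixed rate until stop() is called.
    void start(Runnable task, long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // an uncaught exception would cancel every future run
                System.err.println("watcher run failed: " + e.getMessage());
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }

    // Demo: counts runs; optionally throws on the first run to show
    // that the schedule keeps going after a failure.
    static int demo(int runs, long periodMillis, boolean failFirstRun)
            throws InterruptedException {
        AtomicInteger count = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(runs);
        WatcherScheduler ws = new WatcherScheduler();
        ws.start(() -> {
            int n = count.incrementAndGet();
            latch.countDown();
            if (failFirstRun && n == 1) {
                throw new IllegalStateException("simulated failure");
            }
        }, periodMillis);
        latch.await(10, TimeUnit.SECONDS);
        ws.stop();
        return count.get();
    }
}
```

The try/catch inside the scheduled task is the important design choice: scheduleAtFixedRate suppresses all subsequent runs once a run throws, and publishCloudWatchData only catches IOException and CloudWatchException, so any other runtime exception (an SDK client-side error, for instance) would otherwise stop the watcher silently.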
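Next, let’s fill in the method body. Start by pulling ZooKeeper’s status information with the mntr four-letter-word command, which returns one metric per line as a tab-separated name and value. See the official ZooKeeper documentation for more details on this command. On ZooKeeper 3.5.3 and later, four-letter-word commands must be enabled through the 4lw.commands.whitelist setting (for example, 4lw.commands.whitelist=mntr).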
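try {
    String commandOutput;
    // connect to the ZooKeeper host; try-with-resources closes the socket when done
    try (Socket zk = new Socket(zkHost, zkPort)) {
        // send the mntr command (exactly four bytes)
        OutputStream zkInput = zk.getOutputStream();
        zkInput.write("mntr".getBytes(StandardCharsets.US_ASCII));
        zkInput.flush();
        // half-close: calling zkInput.close() would close the whole socket
        zk.shutdownOutput();
        // read the response until ZooKeeper closes the connection
        // (IOUtils is from Apache Commons IO)
        StringWriter writer = new StringWriter();
        IOUtils.copy(zk.getInputStream(), writer, StandardCharsets.UTF_8);
        commandOutput = writer.toString();
    }
    // ...conversion and publishing continue here, inside the outer try...
} catch (CloudWatchException | IOException e) {
    System.err.println(e.getMessage());
}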
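To try the socket exchange without a running ensemble, the sketch below pairs a client that sends a four-letter word and reads until end-of-stream with a throwaway local server that answers mntr with canned text and then closes the connection, which is how ZooKeeper ends a four-letter-word reply. The class name, the helper names, and the sample output are all made up for illustration; it needs Java 11+ for readNBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class MntrExchangeDemo {
    // Invented sample output in mntr's "name<TAB>value" line format.
    static final String FAKE_MNTR =
            "zk_version\t3.x-sample\n"
            + "zk_avg_latency\t2\n"
            + "zk_znode_count\t42\n"
            + "zk_server_state\tstandalone\n";

    // Client side: send the command, half-close, read until EOF.
    static String sendCommand(String host, int port, String cmd) throws IOException {
        try (Socket zk = new Socket(host, port)) {
            OutputStream out = zk.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            zk.shutdownOutput(); // closing the stream itself would close the socket
            InputStream in = zk.getInputStream();
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Stand-in server: accepts one connection, reads the 4-byte command,
    // replies with canned text, then closes the connection.
    static String runDemo() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 = any free port
            Thread serverThread = new Thread(() -> {
                try (Socket client = server.accept()) {
                    byte[] cmd = client.getInputStream().readNBytes(4);
                    String reply = "mntr".equals(new String(cmd, StandardCharsets.US_ASCII))
                            ? FAKE_MNTR : "";
                    client.getOutputStream().write(reply.getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    System.err.println("fake server error: " + e.getMessage());
                }
            });
            serverThread.start();
            String response = sendCommand("127.0.0.1", server.getLocalPort(), "mntr");
            serverThread.join();
            return response;
        }
    }
}
```

Sending exactly four bytes and half-closing lets the client signal "done writing" while still reading the full reply.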
Once we have our mntr output, we can convert that into CloudWatch data.
// timestamp every datum in this batch with the current time
Instant instant = Instant.now();
// convert command output to CloudWatch format
String[] lines = commandOutput.split("\n");
List<MetricDatum> metricDataList = new ArrayList<>();
for (String line : lines) {
    String[] parts = line.split("\t");
    if (parts.length < 2) {
        continue; // skip blank or malformed lines
    }
    // parse metric value; only include numeric values
    double value;
    try {
        value = Double.parseDouble(parts[1].trim());
    } catch (NumberFormatException e) {
        continue;
    }
    if (!Double.isFinite(value)) {
        continue; // CloudWatch rejects NaN and Infinity
    }
    String metricName = parts[0];
    // set up metric units: mntr reports latencies in milliseconds
    StandardUnit unit = StandardUnit.NONE;
    if (metricName.endsWith("latency")) {
        unit = StandardUnit.MILLISECONDS;
    }
    // CloudWatch format
    MetricDatum datum = MetricDatum.builder()
            .metricName(metricName)
            .unit(unit)
            .value(value)
            .timestamp(instant)
            .build();
    // add to CloudWatch list
    metricDataList.add(datum);
}
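Pulling the parsing into a plain function that does not touch the AWS SDK makes it easy to unit-test. This is a sketch under our own naming (MntrParser, parse, and unitFor are hypothetical). It returns unit names as strings that match CloudWatch's unit names, and the Bytes mapping for zk_approximate_data_size is an extension beyond the latency-only rule above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MntrParser {
    // Parses mntr output ("name<TAB>value" per line) into numeric metrics,
    // skipping non-numeric entries such as zk_version and zk_server_state.
    static Map<String, Double> parse(String commandOutput) {
        Map<String, Double> metrics = new LinkedHashMap<>();
        for (String line : commandOutput.split("\n")) {
            String[] parts = line.split("\t");
            if (parts.length < 2) {
                continue; // blank or malformed line
            }
            try {
                double value = Double.parseDouble(parts[1].trim());
                if (Double.isFinite(value)) { // CloudWatch rejects NaN and Infinity
                    metrics.put(parts[0].trim(), value);
                }
            } catch (NumberFormatException e) {
                // non-numeric value: not published as a metric
            }
        }
        return metrics;
    }

    // Hypothetical unit mapping using CloudWatch unit names
    // (the SDK's StandardUnit enum has matching constants).
    static String unitFor(String metricName) {
        if (metricName.endsWith("latency")) {
            return "Milliseconds"; // mntr reports latencies in ms
        }
        if (metricName.endsWith("approximate_data_size")) {
            return "Bytes";
        }
        return "None";
    }
}
```

In publishCloudWatchData, the loop body would then reduce to building one MetricDatum per map entry.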
Finally, let’s put it all together: connect to ZooKeeper, read the mntr output, convert it to CloudWatch metric data, and submit it to CloudWatch.
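Under the hood, this uses CloudWatch’s PutMetricData API, the same operation the AWS CLI exposes. For comparison, here is how the CLI publishes a single value with dimensions, and a pre-aggregated statistic set:
aws cloudwatch put-metric-data --metric-name Buffers --namespace MyNameSpace --unit Bytes --value 231434333 --dimensions InstanceId=1-23456789,InstanceType=m1.small
aws cloudwatch put-metric-data --metric-name PageViewCount --namespace MyService --statistic-values Sum=11,Minimum=2,Maximum=5,SampleCount=3 --timestamp 2016-10-14T12:00:00.000Z
Here is the complete Java method: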
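import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.io.IOUtils;
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.CloudWatchException;
import software.amazon.awssdk.services.cloudwatch.model.MetricDatum;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricDataRequest;
import software.amazon.awssdk.services.cloudwatch.model.StandardUnit;

// hypothetical class name; use whatever fits your watcher package
public class ZooKeeperWatcher {

    public static void publishCloudWatchData(CloudWatchClient cw, String zkHost, int zkPort) {
        try {
            String commandOutput;
            // connect to the ZooKeeper server; try-with-resources closes the socket
            try (Socket zk = new Socket(zkHost, zkPort)) {
                // send the mntr command, then half-close so the socket stays open for reading
                OutputStream zkInput = zk.getOutputStream();
                zkInput.write("mntr".getBytes(StandardCharsets.US_ASCII));
                zkInput.flush();
                zk.shutdownOutput();
                // read the ZooKeeper response until the server closes the connection
                StringWriter writer = new StringWriter();
                IOUtils.copy(zk.getInputStream(), writer, StandardCharsets.UTF_8);
                commandOutput = writer.toString();
            }
            // timestamp every datum in this batch with the current time
            Instant instant = Instant.now();
            // convert command output to CloudWatch format
            String[] lines = commandOutput.split("\n");
            List<MetricDatum> metricDataList = new ArrayList<>();
            for (String line : lines) {
                String[] parts = line.split("\t");
                if (parts.length < 2) {
                    continue; // skip blank or malformed lines
                }
                // parse metric value; only include numeric values
                double value;
                try {
                    value = Double.parseDouble(parts[1].trim());
                } catch (NumberFormatException e) {
                    continue;
                }
                if (!Double.isFinite(value)) {
                    continue; // CloudWatch rejects NaN and Infinity
                }
                String metricName = parts[0];
                // set up metric units: mntr reports latencies in milliseconds
                StandardUnit unit = StandardUnit.NONE;
                if (metricName.endsWith("latency")) {
                    unit = StandardUnit.MILLISECONDS;
                }
                // CloudWatch format
                MetricDatum datum = MetricDatum.builder()
                        .metricName(metricName)
                        .unit(unit)
                        .value(value)
                        .timestamp(instant)
                        .build();
                // add to CloudWatch list
                metricDataList.add(datum);
            }
            // send info to CloudWatch
            PutMetricDataRequest request = PutMetricDataRequest.builder()
                    .namespace("zooKeeper/" + zkHost)
                    .metricData(metricDataList)
                    .build();
            cw.putMetricData(request);
            System.out.println("Published to CloudWatch");
        } catch (CloudWatchException | IOException e) {
            System.err.println(e.getMessage());
        }
    }
}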
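PutMetricData limits how many metrics a single request can carry, and recent ZooKeeper versions can report a long mntr list. Check the CloudWatch service quotas for the current per-request limit. If you are close to it, one option is to split metricDataList into batches and send one request per batch. Batches and split are hypothetical names for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

class Batches {
    // Splits a list into consecutive sublists of at most batchSize elements.
    // In the watcher, each batch would become one PutMetricDataRequest.
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // copy so each batch is independent of the source list
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

In publishCloudWatchData you would then loop over `Batches.split(metricDataList, 20)` and call `cw.putMetricData(...)` once per batch. A size of 20 is a conservative choice, not a documented requirement.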
Let’s log in to the AWS Management Console. Open CloudWatch and create a new alarm. Under metrics, browse for the zooKeeper/ namespaces (the watcher creates one per host). If you just started submitting metrics, it can take up to 2 minutes for your data to appear.
Choose your ZooKeeper metric and select Next. Set your threshold and select Next. Configure the notification to email your operations team (through an Amazon SNS topic) when the alarm is triggered.
Here are a few alarms we recommend you set up. The watcher publishes each metric under the name mntr reports, so the CloudWatch metric names carry a zk_ prefix (for example, zk_avg_latency). Treat the thresholds as starting points and tune them for your cluster.
- znode total occupied memory is too big
  - condition: zk_approximate_data_size > 1073741824 bytes (1 GiB)
  - period: 1 minute
  - alarm name: <host> znode total occupied memory is too big
- average latency is too high
  - condition: zk_avg_latency > 100 (milliseconds)
  - period: 1 minute
  - alarm name: <host> average latency is too high
- too many znodes created
  - condition: zk_znode_count > 1000000
  - period: 1 minute
  - alarm name: <host> too many znodes created
- too many open files
  - condition: zk_open_file_descriptor_count > 300
  - period: 1 minute
  - alarm name: <host> too many open files
- too many connections
  - condition: zk_num_alive_connections > 50 (assumes the default maxClientCnxns of 60)
  - period: 1 minute
  - alarm name: <host> too many connections
- too many watches set
  - condition: zk_watch_count > 10000
  - period: 1 minute
  - alarm name: <host> too many watches set
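If you would rather keep these thresholds in code, for example to create the alarms programmatically or to sanity-check an mntr sample before wiring up CloudWatch, they can be written down as data. This is a hypothetical sketch: ZkAlarmRules and breached are our names, the metric names assume mntr's zk_ prefix, and the check is a simple "greater than" on a single sample rather than CloudWatch's period-based evaluation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ZkAlarmRules {
    // Recommended thresholds, keyed by mntr metric name (insertion order kept).
    static final Map<String, Double> THRESHOLDS = new LinkedHashMap<>();
    static {
        THRESHOLDS.put("zk_approximate_data_size", 1024.0 * 1024 * 1024); // 1 GiB in bytes
        THRESHOLDS.put("zk_avg_latency", 100.0);                          // milliseconds
        THRESHOLDS.put("zk_znode_count", 1_000_000.0);
        THRESHOLDS.put("zk_open_file_descriptor_count", 300.0);
        THRESHOLDS.put("zk_num_alive_connections", 50.0);
        THRESHOLDS.put("zk_watch_count", 10_000.0);
    }

    // Returns the metrics whose value is strictly greater than its threshold.
    // Metrics missing from the sample are ignored.
    static List<String> breached(Map<String, Double> metrics) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> rule : THRESHOLDS.entrySet()) {
            Double value = metrics.get(rule.getKey());
            if (value != null && value > rule.getValue()) {
                result.add(rule.getKey());
            }
        }
        return result;
    }
}
```

Note how the original "approximate_data_size /1024 /1024 > 1 * 1024" rule becomes a plain byte threshold: 1024 MB is 1024 × 1024 × 1024 = 1,073,741,824 bytes.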
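You now have ZooKeeper metrics flowing into CloudWatch, and your team will be alerted when ZooKeeper degrades. A ZooKeeper server that is down stops reporting metrics, so if you also want to be alerted when it goes down, set each alarm’s missing data treatment to treat missing data as breaching.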
© Status List 2024