ZooKeeper Monitoring on AWS

Unfortunately, AWS doesn’t have a pre-packaged monitoring solution for ZooKeeper. But with a little effort, we can still use AWS’s built-in CloudWatch system and all its benefits.

ZooKeeper and CloudWatch

Building A Watcher

Writing custom code isn’t for everyone. If you prefer to use existing solutions, check out our self-hosted guide or our third-party comparison list.

You can also interface with ZooKeeper using Soabase Exhibitor. While this approach may seem simpler, it’s actually more complex. You will need to set up Exhibitor, an HTTP ingestor AND a CloudWatch pusher. The approach below uses a single, bundled component to do all three. Exhibitor is also an archived project and is no longer maintained.

First, we need a watcher application to connect to ZooKeeper and pull the metrics we need. You can put this watcher in your main application or run it as a standalone app. Open up your project and create a new package for your watcher.

We’ll start with a stub method that we can call every few seconds.

				
					public static void publishCloudWatchData(CloudWatchClient cw, String zkHost, String zkPort) {
    // get zookeeper stats and push to cloud watch
}
				
			
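The "every few seconds" call can be driven by a scheduled executor from the JDK. The sketch below is one possible way to do it; the class name `WatcherScheduler`, the `start` helper and the interval are assumptions for illustration, and the `Runnable` stands in for a call to `publishCloudWatchData`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the original watcher code.
public class WatcherScheduler {

    // Runs the task once immediately, then again periodMillis after each run ends.
    // scheduleWithFixedDelay waits for a run to finish before timing the next one,
    // so a slow ZooKeeper host cannot cause overlapping runs.
    public static ScheduledExecutorService start(Runnable task, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                // an uncaught exception would silently cancel all future runs
                System.err.println("watcher run failed: " + e.getMessage());
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

A call such as `WatcherScheduler.start(() -> publishCloudWatchData(cw, zkHost, zkPort), 10_000)` would then publish roughly every ten seconds until `shutdown()` is called on the returned scheduler.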

Next, let’s fill out the method body. Start by pulling the ZooKeeper status information. We’ll use the mntr command. You can see the official documentation for more details on this command.

				
try {
    String commandOutput;

    // connect to zookeeper host; try-with-resources closes the socket for us
    try (Socket zk = new Socket(zkHost, Integer.parseInt(zkPort))) {

        // send mntr command; flush rather than close, because closing
        // the output stream would also close the socket
        OutputStream zkInput = zk.getOutputStream();
        zkInput.write("mntr\n".getBytes(StandardCharsets.UTF_8));
        zkInput.flush();

        // get response from zookeeper (IOUtils is from Apache Commons IO)
        StringWriter writer = new StringWriter();
        IOUtils.copy(zk.getInputStream(), writer, StandardCharsets.UTF_8);
        commandOutput = writer.toString();
    }
} catch (CloudWatchException | IOException e) {
    System.err.println(e.getMessage());
}
				
			
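If you would rather not depend on Commons IO, the same request can be sketched with the JDK alone (Java 9 or later for `readAllBytes`). The class name `MntrClient`, the `fetch` method and the timeout parameter are assumptions added for illustration; the timeout is there so that a hung host cannot block the watcher forever. Note that on ZooKeeper 3.5 and later, `mntr` may first need to be enabled through the `4lw.commands.whitelist` setting.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original watcher code.
public class MntrClient {

    // Sends "mntr" to host:port and returns the raw reply as text.
    // timeoutMillis bounds the connect and every blocking read.
    public static String fetch(String host, int port, int timeoutMillis) throws IOException {
        try (Socket zk = new Socket()) {
            zk.connect(new InetSocketAddress(host, port), timeoutMillis);
            zk.setSoTimeout(timeoutMillis);

            OutputStream out = zk.getOutputStream();
            out.write("mntr\n".getBytes(StandardCharsets.UTF_8));
            out.flush();

            // the server closes the connection after replying,
            // so reading to end-of-stream returns the whole response
            byte[] reply = zk.getInputStream().readAllBytes();
            return new String(reply, StandardCharsets.UTF_8);
        }
    }
}
```

Calling `MntrClient.fetch(zkHost, 2181, 5000)` would return the same text that the Commons IO version stores in `commandOutput`, assuming the server listens on the usual client port.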

Once we have our mntr output, we can convert that into CloudWatch data.

				
// get current time (Instant is always UTC)
Instant instant = Instant.now();

// convert command output to CloudWatch format
String[] lines = commandOutput.split("\n");
List<MetricDatum> metricDataList = new ArrayList<>();

for (String line : lines) {
    String[] parts = line.split("\t");
    if (parts.length < 2) {
        continue; // skip blank or malformed lines
    }

    // parse metric value
    Scanner sc = new Scanner(parts[1]);
    if (!sc.hasNextDouble()) {
        continue; // only include numeric values
    }

    double value = sc.nextDouble();

    String metricName = parts[0];

    // setup metric units
    StandardUnit unit = StandardUnit.NONE;
    if (metricName.endsWith("latency")) {
        unit = StandardUnit.MILLISECONDS;
    }

    // CloudWatch format
    MetricDatum datum = MetricDatum.builder()
        .metricName(metricName)
        .unit(unit)
        .value(value)
        .timestamp(instant)
        .build();

    // add to CloudWatch list
    metricDataList.add(datum);
}
				
			
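To check the parsing step without a CloudWatch client, the same tab-splitting logic can be sketched as a small pure function that returns a map. The class name `MntrParser` and the `parse` method are assumptions for illustration. This variant uses `Double.parseDouble` instead of `Scanner`, which is a swapped-in choice: it does not depend on the default locale.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original watcher code.
public class MntrParser {

    // Parses raw mntr output into metric name -> numeric value.
    // Lines without a tab, or with a non-numeric value, are skipped.
    public static Map<String, Double> parse(String commandOutput) {
        Map<String, Double> metrics = new LinkedHashMap<>();
        for (String line : commandOutput.split("\n")) {
            String[] parts = line.trim().split("\t");
            if (parts.length < 2) {
                continue; // blank or malformed line
            }
            try {
                metrics.put(parts[0], Double.parseDouble(parts[1].trim()));
            } catch (NumberFormatException e) {
                // text values such as zk_version or zk_server_state are ignored
            }
        }
        return metrics;
    }
}
```

Given an illustrative input like `"zk_avg_latency\t2\nzk_server_state\tleader\n"`, the returned map would hold only `zk_avg_latency`, because the server state is not numeric.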

Finally, let’s put it all together. We connect to ZooKeeper, get the mntr output, convert it to CloudWatch metric data and submit it to CloudWatch.

				
					
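// imports needed at the top of the file
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringWriter;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

import org.apache.commons.io.IOUtils;
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.CloudWatchException;
import software.amazon.awssdk.services.cloudwatch.model.MetricDatum;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricDataRequest;
import software.amazon.awssdk.services.cloudwatch.model.StandardUnit;

public static void publishCloudWatchData(CloudWatchClient cw, String zkHost, String zkPort) {
    try {
        String commandOutput;

        // connect to zookeeper server; try-with-resources closes the socket for us
        try (Socket zk = new Socket(zkHost, Integer.parseInt(zkPort))) {

            // send mntr command; flush rather than close, because closing
            // the output stream would also close the socket
            OutputStream zkInput = zk.getOutputStream();
            zkInput.write("mntr\n".getBytes(StandardCharsets.UTF_8));
            zkInput.flush();

            // read zookeeper response
            StringWriter writer = new StringWriter();
            IOUtils.copy(zk.getInputStream(), writer, StandardCharsets.UTF_8);
            commandOutput = writer.toString();
        }

        // get current time (Instant is always UTC)
        Instant instant = Instant.now();

        // convert command output to CloudWatch format
        String[] lines = commandOutput.split("\n");
        List<MetricDatum> metricDataList = new ArrayList<>();

        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }

            // parse metric value
            Scanner sc = new Scanner(parts[1]);
            if (!sc.hasNextDouble()) {
                continue; // only include numeric values
            }

            double value = sc.nextDouble();

            String metricName = parts[0];

            // setup metric units
            StandardUnit unit = StandardUnit.NONE;
            if (metricName.endsWith("latency")) {
                unit = StandardUnit.MILLISECONDS;
            }

            // CloudWatch format
            MetricDatum datum = MetricDatum.builder()
                .metricName(metricName)
                .unit(unit)
                .value(value)
                .timestamp(instant)
                .build();

            // add to CloudWatch list
            metricDataList.add(datum);
        }

        // send info to CloudWatch, one namespace per ZooKeeper host
        PutMetricDataRequest request = PutMetricDataRequest.builder()
            .namespace("zooKeeper/" + zkHost)
            .metricData(metricDataList)
            .build();

        cw.putMetricData(request);
        System.out.println("Published to CloudWatch");

    } catch (CloudWatchException | IOException e) {
        System.err.println(e.getMessage());
    }
}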
				
			

Configuring CloudWatch

Let’s log in to the AWS Management Console. Open up CloudWatch and create a new alarm. Under metrics, browse for zooKeeper/*; the watcher publishes one namespace per ZooKeeper host. If you just started submitting metrics, it can take up to 2 minutes for your data to appear.

Choose your ZooKeeper metric and select next. Set your threshold and press next. Configure your trigger to send your operations team an email when the alarm is triggered.

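Here are a few alarms we recommend you set up. The metric names that mntr reports carry a zk_ prefix (for example, zk_avg_latency), so look for the prefixed names in CloudWatch. The {{ $labels.instance }} placeholder in the titles stands for the ZooKeeper host name.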

				
					- znode total occupied memory is too big
  - query: approximate_data_size /1024 /1024 > 1 * 1024 # more than 1024 MB(1 GB)
  - timebucket: 1m
  - alert title: Instance {{ $labels.instance }} znode total occupied memory is too big

- avg latency is too high
  - query: avg_latency > 100
  - timebucket: 1m
  - alert title: Instance {{ $labels.instance }} avg latency is too high

- create too many znodes
  - query: znode_count > 1000000
  - timebucket: 1m
  - alert title: Instance {{ $labels.instance }} create too many znodes

- open too many files
  - query: open_file_descriptor_count > 300
  - timebucket: 1m
  - alert title: Instance {{ $labels.instance }} open too many files

- create too many connections
  - query: num_alive_connections > 50 # suppose we use the default maxClientCnxns: 60
  - timebucket: 1m
  - alert title: Instance {{ $labels.instance }} create too many connections

- set too many watch
  - query: watch_count > 10000
  - timebucket: 1m
  - alert title: Instance {{ $labels.instance }} set too many watch
				
			
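Before creating the alarms, it can help to test the thresholds against a parsed mntr sample. The sketch below encodes the limits from the list above as a map and reports which ones are exceeded. The class name `ThresholdCheck` and both method names are assumptions for illustration, and the metric names assume the zk_ prefix that mntr uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original watcher code.
public class ThresholdCheck {

    // Upper limits taken from the recommended alarms above.
    public static Map<String, Double> defaultLimits() {
        Map<String, Double> limits = new LinkedHashMap<>();
        limits.put("zk_approximate_data_size", 1024.0 * 1024 * 1024); // 1 GB, in bytes
        limits.put("zk_avg_latency", 100.0);
        limits.put("zk_znode_count", 1_000_000.0);
        limits.put("zk_open_file_descriptor_count", 300.0);
        limits.put("zk_num_alive_connections", 50.0);
        limits.put("zk_watch_count", 10_000.0);
        return limits;
    }

    // Returns the names of metrics whose current value is strictly above its limit.
    // Metrics missing from the current sample are ignored.
    public static List<String> breached(Map<String, Double> current, Map<String, Double> limits) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> limit : limits.entrySet()) {
            Double value = current.get(limit.getKey());
            if (value != null && value > limit.getValue()) {
                result.add(limit.getKey());
            }
        }
        return result;
    }
}
```

This only mirrors the alarm logic locally; the alerting itself still happens in CloudWatch once the alarms are configured.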

Summary

You now have ZooKeeper metrics flowing into your CloudWatch panel. Your team will be alerted when ZooKeeper goes down or degrades.
