How to Set Up ZooKeeper Monitoring: InfluxDB

Let’s set up ZooKeeper monitoring with InfluxDB (a popular time series database and visualizer). This guide walks you through monitoring your existing ZooKeeper cluster.

1. Let’s get InfluxDB installed. Head over to the official download page and install the database, the Telegraf data collector, and the CLI packages.

2. Start the influxdb and telegraf services:

				
    service influxdb start
    service telegraf start
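
Before moving on, it can help to confirm that InfluxDB actually came up. The sketch below assumes InfluxDB 2.x on its default port 8086 and polls its /health endpoint; `INFLUX_URL` and the function name `influx_healthy` are illustrative names, not part of the original guide.

```shell
#!/bin/sh
# Assumption: InfluxDB 2.x is listening on its default port, 8086.
# Override INFLUX_URL if your instance lives elsewhere.
INFLUX_URL="${INFLUX_URL:-http://localhost:8086}"

# Succeeds (exit status 0) when the /health endpoint reports "status":"pass".
influx_healthy() {
    curl -fsS "$INFLUX_URL/health" | grep -q '"status" *: *"pass"'
}

# Usage:
#   influx_healthy && echo "InfluxDB is up" || echo "InfluxDB is not ready yet"
```

If the check fails right after starting the service, give it a few seconds and try again before digging into the logs.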
				
			

3. Open the InfluxDB dashboard by going to http://localhost:8086 (the default port for InfluxDB 2.x, which the influx apply command in the next step requires).

4. Install the ZooKeeper dashboard template

				
    # ZooKeeper address as host:port (ZooKeeper does not speak HTTPS, so no URL scheme)
    export ZOOKEEPER_HOST=localhost:2181
    influx apply -u https://raw.githubusercontent.com/influxdata/community-templates/master/zookeeper/zookeeper.yml

				
			

5. Reload your browser and view the new dashboard. You should see metrics start to appear within a few seconds.
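
If no metrics show up, a quick first check is whether ZooKeeper answers the `mntr` four-letter command, which is what Telegraf's ZooKeeper input reads. The sketch below assumes ZooKeeper on localhost:2181 and that `nc` is installed; `ZK_HOST`, `ZK_PORT`, and both function names are illustrative. On recent ZooKeeper releases (3.5.3 and later, as far as we know), `mntr` must also be allowed through `4lw.commands.whitelist` in zoo.cfg.

```shell
#!/bin/sh
# Assumptions: ZooKeeper on localhost:2181, `nc` available, and `mntr`
# permitted by 4lw.commands.whitelist in zoo.cfg.
ZK_HOST="${ZK_HOST:-localhost}"
ZK_PORT="${ZK_PORT:-2181}"

# Print the raw `mntr` output (one key/value pair per line).
zk_mntr() {
    echo mntr | nc "$ZK_HOST" "$ZK_PORT"
}

# Read mntr output on stdin and print the value of a single key.
# $1: the mntr key, e.g. zk_avg_latency
mntr_value() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Usage:
#   zk_mntr | mntr_value zk_avg_latency
```

Note that the raw keys carry a `zk_` prefix, while the field names used in the alerts later in this guide do not.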

Ready to get the whole story on your uptime?
 
Status List delivers uptime checks with technical diagnostics in one dashboard. A pass/fail isn’t the whole story.
 
Join over 2,000 companies and try it for free today.
Alerts in InfluxDB

6. Let’s create a place for your notifications to go. Go to Alerts > Alerts > Notification Endpoints. Click + Create.
      a. Give it a name and description.
      b. Choose an HTTP, Slack, or PagerDuty endpoint and fill out the appropriate details.
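
It is worth confirming that the endpoint receives messages before any alert depends on it. The sketch below posts a fixed test message to a Slack-style incoming webhook, which accepts a JSON body with a `text` field; the function name is illustrative and the webhook URL is whatever you entered for the endpoint. A PagerDuty endpoint expects a different payload, so this only applies to the Slack case.

```shell
#!/bin/sh
# Post a fixed test message to a Slack-style incoming webhook.
# $1: webhook URL (the same one configured on the notification endpoint)
send_test_alert() {
    curl -fsS -X POST \
        -H 'Content-Type: application/json' \
        --data '{"text": "Test alert from InfluxDB ZooKeeper monitoring"}' \
        "$1"
}

# Usage (replace the placeholder with your own webhook URL):
#   send_test_alert "$SLACK_WEBHOOK_URL"
```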

7. Configure your alerts. Go to Alerts > Alerts and click + Create. Add threshold and deadman alerts for each of the following metrics, adjusting the thresholds to your needs.

				
    - znode total occupied memory is too big
      - query: approximate_data_size /1024 /1024 > 1 * 1024 # more than 1024 MB (1 GB)
      - timebucket: 1m
      - alert title: Instance {{ $labels.instance }} znode total occupied memory is too big

    - avg latency is too high
      - query: avg_latency > 100
      - timebucket: 1m
      - alert title: Instance {{ $labels.instance }} avg latency is too high

    - too many znodes created
      - query: znode_count > 1000000
      - timebucket: 1m
      - alert title: Instance {{ $labels.instance }} too many znodes created

    - too many open files
      - query: open_file_descriptor_count > 300
      - timebucket: 1m
      - alert title: Instance {{ $labels.instance }} too many open files

    - too many connections
      - query: num_alive_connections > 50 # assuming the default maxClientCnxns of 60
      - timebucket: 1m
      - alert title: Instance {{ $labels.instance }} too many connections

    - too many watches set
      - query: watch_count > 10000
      - timebucket: 1m
      - alert title: Instance {{ $labels.instance }} too many watches set
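
To sanity-check these thresholds against a live node before creating the alerts, the same limits can be applied to raw `mntr` output. This is a minimal sketch, not part of InfluxDB: the function name is illustrative, and it assumes the raw `mntr` keys are the field names above with a `zk_` prefix.

```shell
#!/bin/sh
# Read `mntr` output on stdin and print one line per breached threshold.
# The limits mirror the alert list above; raw mntr keys are assumed to be
# the same field names with a zk_ prefix.
check_thresholds() {
    awk '
        BEGIN {
            limit["zk_approximate_data_size"]      = 1024 * 1024 * 1024  # 1 GB in bytes
            limit["zk_avg_latency"]                = 100
            limit["zk_znode_count"]                = 1000000
            limit["zk_open_file_descriptor_count"] = 300
            limit["zk_num_alive_connections"]      = 50
            limit["zk_watch_count"]                = 10000
        }
        # Compare numerically; keys without a configured limit are ignored.
        ($1 in limit) && ($2 + 0 > limit[$1]) {
            printf "ALERT %s=%s exceeds %d\n", $1, $2, limit[$1]
        }
    '
}

# Usage (assumes ZooKeeper on localhost:2181 and `nc` installed):
#   echo mntr | nc localhost 2181 | check_thresholds
```

No output means every metric is currently under its limit, which is a reasonable starting point for tuning the values to your cluster.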
				
			

For more details on how to configure alerts, please see the InfluxDB alert documentation.

Congratulations, that’s it! You’re ready to go.
