- Coordination services are notoriously hard to get right.
- Race conditions and deadlocks are common pitfalls.
The motivation behind ZooKeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch.
- Distributed coordination service
- Eases the development of distributed systems
- Used by many cluster systems: HBase, Hadoop, Kafka, etc.
- shared hierarchical namespace
- Like file system
- The namespace consists of data registers called znodes (in ZooKeeper parlance)
Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in memory, which lets it achieve high throughput and low latency.
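Znodes nest to form a tree. A minimal sketch using the zkCli shell (the paths /app1 and /app1/config are just placeholder names):
create /app1 app1-data
create /app1/config config-data
ls /app1
ls lists the children of a znode, much like listing a directory.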
ZooKeeper focuses on:
- high performance
- highly available
- strictly ordered access
Strictly ordered access means sophisticated synchronization primitives, such as distributed locks, can be implemented at the client.
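As a sketch of one such primitive, the standard ZooKeeper lock recipe builds on ordered, sequential creates (the /locks parent and lock- prefix are placeholder names):
create /locks lock-root
create -s -e /locks/lock- clientA-data
create -s -e /locks/lock- clientB-data
ls /locks
Each client creates an ephemeral sequential child under /locks; the client whose child has the lowest sequence number holds the lock, and the others watch the child just ahead of theirs. Because the children are ephemeral, a crashed client's lock is released automatically.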
ZooKeeper is itself replicated: like the distributed processes it coordinates, ZooKeeper is intended to run replicated over a set of hosts called an ensemble.
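A minimal sketch of an ensemble configuration (zoo.cfg); the hostnames are placeholders and the timing values are common defaults, not requirements:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
Each server.N line names one ensemble member with its peer-communication (2888) and leader-election (3888) ports. Odd-sized ensembles (3, 5, ...) are typical so that a majority quorum survives the loss of a minority of hosts.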
Common coordination tasks ZooKeeper is used for:
- Leader election
- Configuration management
- Consensus (Quorum)
- Node coordination
- Server lease management
- Naming
- Distributed synchronization
- Providing group services
Nodes within a ZooKeeper cluster store their data in a shared hierarchical namespace, which is similar to a standard file system or a tree data structure.
ZooKeeper provides the following guarantees:
- Sequential consistency
- Atomicity
- Single system image
- Reliability
- Timeliness
- Hierarchical namespace
- Like distributed file system
- Stores information such as status, coordination, and location data for the different nodes (an example layout is sketched below)
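For example, an application might lay out its znodes along these lines (all paths here are hypothetical):
/myapp/config - shared configuration read by every node
/myapp/workers/w-0001 - one ephemeral znode per live worker (status/liveness)
/myapp/leader - address of the current leader (location info)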
Interact with ZooKeeper using the CLI:
bin/zkCli.sh -server 127.0.0.1:2181
create /MyFirstNode FirstNodeVal
We just created a znode 'MyFirstNode' at the root of the namespace and wrote 'FirstNodeVal' as its value.
** It will be a persistent znode, as we didn't pass the -e (ephemeral) or -s (sequential) flag **
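For comparison, a sketch of the other znode types (the node names are placeholders):
create -e /MyEphemeralNode EphemeralVal
create -s /MySeqNode- SeqVal
-e creates an ephemeral znode that is deleted automatically when the creating session ends; -s appends a monotonically increasing sequence number to the name (e.g. /MySeqNode-0000000001).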
get /MyFirstNode
Returns the data as well as the metadata (the Stat structure) associated with the znode.
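A few more commands that are useful while exploring, assuming the /MyFirstNode znode created above:
stat /MyFirstNode
set /MyFirstNode NewNodeVal
delete /MyFirstNode
stat shows only the metadata (zxids, timestamps, data version, ephemeral owner, child count); set overwrites the data and bumps the version; delete removes the znode, provided it has no children.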