Summary of "Chord - A Distributed Hash Table"

Concise summary — Chord (distributed hash table)

What Chord is

Chord is a peer-to-peer lookup protocol that implements a distributed hash table (DHT) using consistent hashing. Its purpose is to map a key to a physical (or virtual) node; it does not prescribe how the actual data for a key→value pair is stored — that is left to the client or node storing the data.

Design goals:

Decentralized
Scalable
Fault-tolerant
Simple to implement
Predictable lookup cost

Core technical ideas

Consistent hashing Keys and nodes are hashed into a circular identifier space (a ring). A key is stored at the first node whose identifier follows the key (the key’s successor). This approach minimizes the movement of keys when nodes join or leave.
Node and key identifiers The paper uses SHA-1 as the example hash. Nodes typically hash their IP (or other identifier) to get node IDs; keys are hashed from their original names.
Finger table Each node maintains a finger table with O(log N) entries. The i-th entry at node n points to the first node ≥ n + 2^{i-1}. Finger tables allow lookups to jump in exponentially increasing steps, achieving O(log N) hops per lookup on average.
Successor list and predecessor pointer Nodes keep their immediate successor (and a list of successors) and a predecessor pointer. These structures support joins, leaves, and failure tolerance.
Stabilization protocol A periodic background protocol updates finger tables, successor/predecessor pointers, and notifies neighbors when nodes join so the ring remains consistent.
Join / leave / failure handling
- When a node joins, it takes responsibility for a subset of keys from its successor.
- When a node leaves gracefully, it transfers keys to neighbors.
- Successor lists and the stabilization protocol handle crashes, keeping disruption localized rather than global.

Performance / evaluation highlights

Lookups: O(log N) hops on average; experiments show path length grows logarithmically with node count.
Robustness: lookup hop-count degrades slowly under node failures (example in the paper: from ~5 hops to ~8 hops when ~50% of nodes fail in that setup).
Load balancing: consistent hashing combined with a good hash function spreads keys evenly across nodes.

Comparisons to related systems

DNS: hierarchical namespace with special root servers. Chord is flat and decentralized; it can emulate DNS lookups but with different trade-offs.
Freenet: tree-like lookup with aggressive caching and no strong lower bound on retrieval cost.
Tapestry: similar ring-based DHT but with a more complex protocol for joins and failures.
Napster: centralized index (single point of failure).
Gnutella: gossip-style flooding with high communication costs.

Chord’s strengths relative to these systems:

Predictable lookup bounds
Low per-node state (O(log N) fingers)
No central bottleneck
Localized effect of churn

Notable derived systems / applications mentioned

Cassandra: adopts DHT / ring concepts (also inspired by Dynamo and BigTable).
Coral: a P2P content-distribution system using a concentric ring variant of DHTs.
WebRTC signaling: using a DHT for peer discovery/signaling to reduce reliance on a central signaling server and improve scalability/reliability (peers often still use a server for initial bootstrap).

Topics covered in the presentation / tutorial

Overview of Chord and consistent hashing
Related work and contrasts (DNS, Freenet, Tapestry, Napster, Gnutella)
Chord architecture: ring, identifiers, finger tables, successor/predecessor, successor lists
Protocols: lookup, join, leave, stabilization, failure recovery
Experimental results: lookup scaling and resilience to failures
Systems built on or inspired by Chord (Cassandra, Coral, WebRTC use cases)

Main speakers / sources

Video presenter (unnamed in subtitles) — walkthrough of the Chord paper and its concepts.
Primary source: the Chord paper — “Chord: A Scalable Peer-to-peer Lookup Service” (Stoica et al.).
Other referenced systems/sources: DNS, Freenet, Tapestry, Napster, Gnutella, Amazon Dynamo, Google Bigtable, Cassandra, Coral, WebRTC.