First, it should be clear that lookup is not inherently an easy task. Or at least if we assume that we want resiliency to failures, i.e. no central database, or even a collection of root databases under some hierarchical scheme (like DNS). Given that, a straw man approach would be something like broadcasting the message out ... but that clearly sucks. We would prefer a solution that bothers the least number of nodes possible. The authors of [CHORD] present an O(log N) solution! They do have an assumption, of course, which is that the individual nodes are willing to store information about content that they might not themselves actually have (i.e. they store a pointer(s) to who does actually have the content, but they are the ones responsible for resolving any lookups for that data).
Some key parts of Chord:
- Essentially a distributed hash table.
- Uses consistent hashing, which has the nice property that each node in the system sees approximately an equal number of keys, and when a node joins or leaves the system only O(1/N) keys need to be moved to a new node.
- Avoids the requirement that every node knows about every other node. Each node only needs to store information about O(log N) nodes; this is especially helpful in routing and keeps lookups at O(log N).
- Was implemented for correctness, i.e. handles potentially conflicting concurrent operations (joining or leaving, failure, etc) well.
The authors show a few simulations of Chord, as well as one real Internet prototype, in which it scales well as the number of nodes increases. I'm curious, however, how well this performs in smaller settings where latencies are actually much lower (like within a single data center). Lots of applications are being written in distributed styles where they have lookup needs that they would like to perform efficiently. I'm curious if the design decisions that went into Chord in any way limit it from performing as optimally as it could in a smaller data center like environment ... or if there are instead much better mechanisms for doing lookups there.
3 comments:
I would expect it to work better in constrained environments. The bad case seems to be when even O(log n) is huge, due to whatever latency constant exists.
I'm with Kurtis on this. The issue for me is not the O(log n) traversal of the Chord, but the constants associated with how you hop around the circle, which could be large if network distances are involved.
Huh ... good point! I'm glad you guys brought that up!
Post a Comment