Distributed Tile Caching Model
This design outlines a model for a distributed tile cache. The primary design goals are minimizing response latency for tile requests, and maintaining redundant storage of tiles across the cache.
Definitions
A key is a 20 byte SHA-1 sum.
Each peer has its own persistent peer key, generated randomly.
The directory of peers consists of a list of (key, IP, port, weight) tuples, where weight is the bandwidth that the peer is willing to serve in KB/s, expressed as an integer.
The directory server will serve the directory via HTTP from a well-known URL in gzip compressed whitespace-separated text.
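As a rough sketch, a peer might parse that listing as below (Python). The hex encoding of the 20-byte key and the field order of four tokens per peer are assumptions based on the tuple definition above; the text only specifies whitespace-separated text.

 import gzip

 def parse_directory(gzipped_bytes):
     """Parse a gzip-compressed, whitespace-separated directory listing.

     Assumes four tokens per peer in (key, IP, port, weight) order,
     with the 20-byte key written as 40 hex characters.
     """
     tokens = gzip.decompress(gzipped_bytes).decode("ascii").split()
     peers = []
     for i in range(0, len(tokens), 4):
         key_hex, ip, port, weight = tokens[i:i + 4]
         peers.append((bytes.fromhex(key_hex), ip, int(port), int(weight)))
     return peers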
Peers
Discovering other peers
- Request the directory listing from the server, passing the peer's directory tuple.
- Normalize the weights of each peer.
- Create an empty balanced binary tree.
- For every other peer listed:
- Set the peer's timeout value to v.
- Add the peer to the tree using its key.
- Calculate r = normalized weight x 64.
- For i in range(1, r):
- Calculate a subsidiary key by concatenating the peer key with the binary value of i, and take the SHA-1 sum of the result.
- Insert the peer into the tree using the subsidiary key.
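A sketch of the construction above, using a sorted list of (lookup key, peer key) pairs in place of a balanced binary tree; lookups behave the same. Encoding i as a 4-byte big-endian value before hashing is an assumption, since the text only says "the binary value of i".

 import hashlib

 def build_ring(peers, v):
     """Build the weighted consistent-hashing structure from a directory listing.

     peers: list of (key, ip, port, weight) tuples; v: the initial timeout value.
     Returns a sorted list of (lookup_key, peer_key) entries plus a timeout table.
     """
     total = float(sum(weight for _, _, _, weight in peers))
     ring = []
     timeouts = {}
     for key, ip, port, weight in peers:
         timeouts[key] = v                        # set the peer's timeout value to v
         ring.append((key, key))                  # add the peer under its own key
         r = int(weight / total * 64)             # r = normalized weight x 64
         for i in range(1, r):
             # subsidiary key: SHA-1 of the peer key concatenated with i
             sub = hashlib.sha1(key + i.to_bytes(4, "big")).digest()
             ring.append((sub, key))
     ring.sort()
     return ring, timeouts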
Maintaining the local directory
- After d minutes have passed, request a new directory listing with an If-Modified-Since header.
- If the server responds with a 304, wait another d minutes and check again.
- Otherwise:
- For every peer in the binary tree not in the new listing, remove it.
- For every peer in the directory listing not in the binary tree, add it.
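A minimal sketch of the refresh step. Echoing the server's Last-Modified header back in If-Modified-Since is an assumption; the text only specifies the If-Modified-Since request and the 304 case.

 import urllib.error
 import urllib.request

 def refresh_directory(url, last_modified=None):
     """Fetch a fresh listing, honouring If-Modified-Since.

     Returns (listing_bytes, new_last_modified), or (None, last_modified)
     when the server answers 304 and the old listing is still current.
     """
     request = urllib.request.Request(url)
     if last_modified:
         request.add_header("If-Modified-Since", last_modified)
     try:
         with urllib.request.urlopen(request) as response:
             return response.read(), response.headers.get("Last-Modified")
     except urllib.error.HTTPError as error:
         if error.code == 304:        # unchanged: wait another d minutes and retry
             return None, last_modified
         raise

After a successful fetch, the caller reconciles the tree as described above: peers missing from the new listing are removed, and newly listed peers are added.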
Selecting peers for a given tile
- Concatenate layer + level + row + column and take the SHA-1 sum. This is the tile key.
- If there are fewer than k peers in the directory, select all other known peers.
- Otherwise, until k distinct peers are selected:
- Select the first peer from the binary tree with key greater than or equal to the tile key.
- If there are no matching peers in the tree, select the first peer in the tree.
- Set the tile key to the key of the peer just selected.
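A sketch of the tile-key computation and the selection walk, against the sorted ring built above. Plain string concatenation for the tile key is an assumption (the text does not fix an encoding), and "set the tile key to the key of the peer just selected" is read here as stepping forward entry by entry so the walk always makes progress.

 import bisect
 import hashlib

 def tile_key(layer, level, row, col):
     """Tile key: SHA-1 of layer + level + row + column (string concatenation assumed)."""
     return hashlib.sha1(f"{layer}{level}{row}{col}".encode("ascii")).digest()

 def select_peers(ring, k, key):
     """Collect k distinct peers, starting at the first entry with key >= the tile key
     and wrapping to the start of the ring when the end is reached."""
     keys = [entry_key for entry_key, _peer in ring]
     selected = []
     i = bisect.bisect_left(keys, key)
     for _ in range(len(ring)):                   # at most one full pass over the ring
         if len(selected) == k:
             break
         entry_key, peer = ring[i % len(ring)]
         if peer not in selected:
             selected.append(peer)
         i += 1
     return selected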
Issuing requests
Seeding a tile
- Fetch the tile from the data source (or render the tile or whatever).
- Select k peers for the tile.
- If the seeding peer itself is selected, discard it.
- Send a PUT message to each peer asynchronously.
Fetching a tile
- Select k peers for the tile.
- Send a GET message for the given tile to each of the selected peers asynchronously.
- Decrement the timeout value for each selected peer.
- If a peer's timeout value drops to zero, remove that peer from the binary tree.
- If no peer responds within t seconds, seed the tile in the network.
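The request/response half of this can be sketched with a plain UDP socket. The per-peer timeout bookkeeping from the list above is left out, and the GET datagram is assumed to be already encoded (see the protocol format section below).

 import select
 import time

 def fetch_tile(sock, peer_addresses, get_datagram, t):
     """Send a GET to each selected peer and wait up to t seconds for a reply.

     sock: a bound UDP socket; peer_addresses: (ip, port) pairs from the directory;
     get_datagram: an already-encoded GET message. Returns the first datagram
     received, or None if nobody answered (the caller then seeds the tile).
     """
     for address in peer_addresses:
         sock.sendto(get_datagram, address)       # issue the requests asynchronously
     deadline = time.time() + t
     while True:
         remaining = deadline - time.time()
         if remaining <= 0:
             return None
         readable, _, _ = select.select([sock], [], [], remaining)
         if readable:
             data, _sender = sock.recvfrom(65535)
             return data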
Expiring a tile
- A DELETE request can cover a rectangular range of tiles at a given level for a given layer.
- Start by selecting k peers for the lower left tile in the expiration request.
- Send a DELETE message for the given tile to each of the selected peers asynchronously.
Pinging other peers
If the peer is behind a NAT gateway, it can use PING messages to keep the UDP port open on the firewall:
- Every 60 seconds, select a random peer from the binary tree and send it a PING message.
- If the peer responds with a PONG message within t seconds, reset its timeout value to v.
- Otherwise, remove the peer from the binary tree.
Receiving requests
Responding to a GET request
- If the tile is present in the local cache, send a PUT message in response.
- Select k peers for the tile.
- If the peer's own key is greater than or equal to the original tile key and less than the first peer selected, seed the tile in the network.
- Otherwise, attempt to fetch the tile from the network. If and when the tile is received, send a PUT message to the originating peer.
Receiving a PUT request
- Store the tile in the local cache.
- Reset the sending peer's timeout value to v.
Receiving a DELETE request
A peer should keep track of the last 2k DELETE messages received.
- If this DELETE message is a duplicate, discard it.
- Otherwise:
- Remove the tile(s) from the local cache.
- Select k peers for the lower left tile of the DELETE request.
- If the tile key is greater than the peer's key but less than the first peer selected, stop.
- Otherwise, propagate the DELETE request to the k peers asynchronously.
Receiving PINGs
Every PING message should be responded to with a matching PONG message.
Directory service
- When a peer requests the directory listing, store its directory tuple in the database, along with the time it made the request.
- Every d x 2 minutes, remove any peers from the database that have not made a directory request since the last check.
- Peers can and perhaps should be whitelisted to ensure data integrity over the network. Peers not on the whitelist should not be added to the directory.
- The directory listing should be about 60 bytes per peer. With 10,000 peers, and assuming a 6:1 gzip compression ratio, a fresh listing should be at most 100 KB compressed.
- The directory service can seek high availability through a shared database backend and round-robin DNS.
Plausible parameter values
- d = 5
- v = 5
- t = 1
- k = 3
Protocol format
Protocol messages will be served via UDP.
Each message is a tuple consisting of (Peer Key, Type, Sequence, Checksum, Payload), for a total of 29 + n bytes.
Message type may be one of:
- PING
- PONG
- GET
- PUT
- DELETE
Message sequence must be a monotonically increasing 32-bit number.
Message checksum is a CRC-32 checksum of the data payload.
The message payload takes up the remainder of the message.
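A sketch of the on-the-wire framing, assuming a one-byte type code (20 + 1 + 4 + 4 header bytes account for the 29 + n total above); the numeric values assigned to the message types are an assumption.

 import struct
 import zlib

 # One-byte type codes; the specific values are an assumption.
 PING, PONG, GET, PUT, DELETE = range(5)

 HEADER = "!20sBII"       # peer key, type, sequence, checksum = 29 bytes

 def pack_message(peer_key, message_type, sequence, payload=b""):
     """Build a (Peer Key, Type, Sequence, Checksum, Payload) datagram."""
     checksum = zlib.crc32(payload) if payload else 0
     return struct.pack(HEADER, peer_key, message_type, sequence, checksum) + payload

 def unpack_message(datagram):
     """Split a datagram back into its header fields and its payload."""
     peer_key, message_type, sequence, checksum = struct.unpack_from(HEADER, datagram)
     return peer_key, message_type, sequence, checksum, datagram[29:]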
Message integrity
- Compare the peer key embedded in the message with the sending IP's recorded peer key. If they do not match, discard the message.
- Compare the checksum embedded in the message with the checksum of the payload. If they do not match, discard the message.
- Check the message sequence against the sending peer's most recent sequence ID.
- If the message sequence is less than or equal to the previously set sequence ID for that peer, discard the message.
- Otherwise, update the peer's sequence ID.
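Those checks, as a sketch building on the unpack_message helper above; known_keys (IP to recorded peer key) and last_sequence (peer key to most recent sequence ID) are assumed local tables.

 import zlib

 def accept_message(datagram, sender_ip, known_keys, last_sequence):
     """Apply the integrity checks; return the parsed message or None if it fails."""
     peer_key, message_type, sequence, checksum, payload = unpack_message(datagram)
     if known_keys.get(sender_ip) != peer_key:
         return None                              # key does not match the sending IP
     if payload and zlib.crc32(payload) != checksum:
         return None                              # payload does not match its checksum
     if sequence <= last_sequence.get(peer_key, -1):
         return None                              # stale or replayed sequence ID
     last_sequence[peer_key] = sequence           # update the peer's sequence ID
     return peer_key, message_type, sequence, payload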
PING messages
PING messages have a checksum of 0 and no payload.
PONG messages
A PONG message payload consists of the sequence number from the corresponding PING packet.
GET messages
A GET message payload consists of the tuple (Layer, Level, Row, Column). The layer value is a zero-terminated string. The row and column values are 32-bit integers.
PUT messages
A PUT message payload consists of the tuple (Layer, Level, Row, Column, Data).
DELETE messages
A DELETE message payload consists of the tuple (Layer, Level, MinRow, MinCol, MaxRow, MaxCol).
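A sketch of the GET and DELETE payload encodings. A single byte for the level is an assumption, since the text gives widths only for the layer, row, and column fields.

 import struct

 def pack_get_payload(layer, level, row, col):
     """GET payload: zero-terminated layer string, then level, row, column."""
     return layer.encode("ascii") + b"\x00" + struct.pack("!BII", level, row, col)

 def pack_delete_payload(layer, level, min_row, min_col, max_row, max_col):
     """DELETE payload: a rectangular range of tiles at one level of one layer."""
     return (layer.encode("ascii") + b"\x00"
             + struct.pack("!BIIII", level, min_row, min_col, max_row, max_col))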
References
- This model is based largely on the Kademlia algorithm discussed at length in Distributed Tile Caching, with the addition of a directory server to keep latency down.
- Web Caching with Consistent Hashing (http://www8.org/w8-papers/2a-webserver/caching/paper2.html) is a seminal work in distributed caching.
- A pure-Perl implementation (http://code.sixapart.com/svn/memcached/trunk/api/perl/dev/cons-hash.pl) of the algorithm described in the above paper.