Distributed Tile Caching Model
This design outlines a model for a distributed tile cache. The primary design goals are minimizing response latency for tile requests, and maintaining redundant storage of tiles across the cache.
Definitions
A key is a 20-byte SHA-1 sum.
Each peer has its own persistent peer key, generated randomly.
The directory of peers consists of a list of (key, IP, port, weight) tuples, where weight is the bandwidth that the peer is willing to serve in KB/s, expressed as an integer.
The directory server will serve the directory via HTTP from a well-known URL as gzip-compressed, whitespace-separated text.
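As an illustration, each peer's line in the listing might look like the following (the hex encoding of keys and the exact field order are assumptions based on the tuple above; the values are made up):

 da39a3ee5e6b4b0d3255bfef95601890afd80709 203.0.113.10 8001 512
 356a192b7913b04c54574d18c28d46e6395428ab 203.0.113.11 8001 256

At roughly 60 bytes per line, this matches the per-peer size estimate given under "Directory service" below.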
Peers
Discovering other peers
- Request the directory listing from the server, passing the peer's directory tuple.
- Normalize each peer's weight to a float in the range [0, 1).
- Create an empty balanced binary tree.
- For every other peer listed:
- Set the peer's timeout value to v.
- Set the peer's message sequence ID to 0.
- Add a reference to the peer to the tree using its key.
- Calculate r = normalized weight x 64.
- For i in range(1, r):
- Calculate a subsidiary key by concatenating the peer key with the binary value of i, and take the SHA-1 sum of the result.
- Insert a reference to the peer into the tree using the subsidiary key.
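A minimal Python sketch of this bootstrap step, standing in for the balanced binary tree with a sorted key list plus a key-to-peer map (any ordered structure with a successor lookup works); the normalization divisor and the 4-byte encoding of i are assumptions:

 import bisect
 import hashlib
 
 V_TIMEOUT = 5                       # v, from "Plausible parameter values" below
 
 class Peer:
     def __init__(self, key, ip, port, weight):
         # key is the peer's 20-byte SHA-1 key as a bytes object
         self.key, self.ip, self.port, self.weight = key, ip, port, weight
         self.timeout = V_TIMEOUT    # timeout value starts at v
         self.sequence = 0           # last message sequence ID seen from this peer
 
 def build_tree(peers):
     keys, by_key = [], {}           # sorted keys and key -> Peer map
     max_weight = max(p.weight for p in peers)
     for peer in peers:
         normalized = peer.weight / (max_weight + 1)     # float in [0, 1)
         entries = [peer.key]
         r = int(normalized * 64)
         for i in range(1, r):
             # subsidiary key: SHA-1 of the peer key plus the binary value of i
             entries.append(hashlib.sha1(peer.key + i.to_bytes(4, 'big')).digest())
         for k in entries:
             bisect.insort(keys, k)
             by_key[k] = peer
     return keys, by_key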
Maintaining the local directory
- After d minutes have passed, request a new directory listing with an If-Modified-Since header.
- If the server responds with a 304, wait another d minutes and check again.
- Otherwise:
- For every peer in the binary tree not in the new listing, remove it.
- For every peer in the directory listing not in the binary tree, add it, inserting its subsidiary keys as described above.
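A sketch of one refresh pass, assuming the keys/by_key structures from the bootstrap sketch above and hypothetical add_peer/remove_peer helpers that insert or drop a peer together with its subsidiary keys:

 import gzip
 import urllib.request
 from urllib.error import HTTPError
 
 def refresh_directory(url, last_modified, by_key, add_peer, remove_peer):
     # a real request would also pass this peer's own directory tuple
     request = urllib.request.Request(url)
     if last_modified:
         request.add_header('If-Modified-Since', last_modified)
     try:
         response = urllib.request.urlopen(request)
     except HTTPError as err:
         if err.code == 304:
             return last_modified            # nothing changed; check again in d minutes
         raise
     text = gzip.decompress(response.read()).decode()
     listing = {}
     for line in text.split('\n'):
         if not line.strip():
             continue
         key_hex, ip, port, weight = line.split()
         listing[bytes.fromhex(key_hex)] = (ip, int(port), int(weight))
     known = {peer.key: peer for peer in by_key.values()}
     for key, peer in known.items():
         if key not in listing:
             remove_peer(peer)               # in the tree but no longer listed
     for key, (ip, port, weight) in listing.items():
         if key not in known:
             add_peer(key, ip, port, weight) # listed but not yet in the tree
     return response.headers.get('Last-Modified')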
Selecting peers for a given tile
- Concatenate layer + level + row + column and take the SHA-1 sum. This is the tile key.
- Until k distinct peers are selected:
- If there are fewer than k peers in the directory, select all remaining known peers and break.
- Select the first peer from the binary tree with key greater than or equal to the tile key.
- If there are no matching peers in the tree, select the first peer in the tree.
- Decrement the timeout value for each selected peer.
- If a peer's timeout value drops to zero, remove that peer from the binary tree.
- Set the tile key to the key of the peer just selected.
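A sketch of the selection walk against the keys/by_key structures above. Stepping through the sorted list by index is equivalent to repeatedly re-querying with the last selected key; the byte encoding of level, row and column in the tile key is an assumption, and removing timed-out peers from the tree is left as a comment:

 import bisect
 import hashlib
 
 K_PEERS = 3                                     # k, from "Plausible parameter values" below
 
 def tile_key(layer, level, row, column):
     # layer + level + row + column, concatenated and hashed
     blob = (layer.encode() + bytes([level])
             + row.to_bytes(4, 'big') + column.to_bytes(4, 'big'))
     return hashlib.sha1(blob).digest()
 
 def select_peers(key, keys, by_key, k=K_PEERS):
     distinct = set(by_key.values())
     if len(distinct) < k:
         return list(distinct)                   # fewer than k peers known: take them all
     selected = []
     i = bisect.bisect_left(keys, key)           # first key >= tile key
     while len(selected) < k:
         if i >= len(keys):
             i = 0                               # wrapped past the end: start over
         peer = by_key[keys[i]]
         if peer not in selected:
             selected.append(peer)
             peer.timeout -= 1                   # decrement on every selection
             # a full implementation removes the peer's keys when its timeout hits zero
         i += 1                                  # step past this (possibly subsidiary) key
     return selected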
Issuing requests
Seeding a tile
- Fetch the tile from the data source (or render the tile or whatever).
- Select k peers for the tile.
- Send a PUT message to each peer asynchronously.
Fetching a tile
- Select k peers for the tile.
- Send a GET message for the given tile to each of the selected peers asynchronously.
- If no peer responds with a PUT within t seconds, seed the tile in the network.
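A control-flow sketch of the fetch path, reusing tile_key and select_peers from the selection sketch above; send_get, wait_for_put, send_put and fetch_from_source are hypothetical helpers standing in for the UDP messaging described under "Protocol format" below and for the local data source:

 T_SECONDS = 1       # t, from "Plausible parameter values" below
 
 def fetch_tile(layer, level, row, column, keys, by_key):
     key = tile_key(layer, level, row, column)
     for peer in select_peers(key, keys, by_key):
         send_get(peer, layer, level, row, column)       # asynchronous GET to each peer
     data = wait_for_put(layer, level, row, column, timeout=T_SECONDS)
     if data is not None:
         return data
     # no PUT arrived within t seconds: seed the tile into the network
     data = fetch_from_source(layer, level, row, column)
     for peer in select_peers(key, keys, by_key):
         send_put(peer, layer, level, row, column, data)
     return data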
Expiring a tile
- A DELETE request can cover a rectangular range of tiles at a given level for a given layer.
- Start by selecting k peers for the lower left tile in the expiration request.
- Send a DELETE message for the given tiles to each of the selected peers asynchronously.
Pinging other peers
If the peer is behind a NAT gateway, it can use PING messages to keep the UDP port open on the firewall:
- Every 60 seconds, select a random peer from the binary tree.
- Decrement that peer's timeout value.
- If its timeout value drops to zero, remove it from the binary tree, and start over.
- Otherwise, send that peer a PING message.
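A sketch of the keepalive loop, reusing the structures above; send_ping and remove_peer are hypothetical helpers:

 import random
 import time
 
 def keepalive_loop(keys, by_key, send_ping, remove_peer):
     while True:
         time.sleep(60)
         while keys:
             peer = by_key[random.choice(keys)]
             peer.timeout -= 1
             if peer.timeout <= 0:
                 remove_peer(peer)       # drop it and start over with another random peer
                 continue
             send_ping(peer)             # keeps the NAT mapping for the UDP port alive
             break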
Receiving requests
Responding to a GET request
- If the tile is present in the local cache, send a PUT message in response.
- Otherwise, send a PONG message in response.
Receiving a PUT request
- Store the tile in the local cache.
- Reset the sending peer's timeout value to v.
Receiving a DELETE request
A peer should keep track of the last 2k DELETE messages received.
- If this DELETE message is a duplicate, discard it.
- Otherwise:
- Remove the tile(s) from the local cache.
- Select k peers for the lower left tile of the DELETE request.
- If the tile key is greater than the peer's own key but less than the key of the first peer selected, stop.
- Otherwise, propagate the DELETE request to the k peers asynchronously.
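A sketch of this rule, reusing tile_key, select_peers and K_PEERS from above, and assuming "2k" means 2 x k messages, that msg is an already-parsed DELETE with the fields listed under "DELETE messages" below, and that remove_tiles and send_delete are hypothetical helpers:

 from collections import deque
 
 recent_deletes = deque(maxlen=2 * K_PEERS)      # the last 2k DELETE messages seen
 
 def handle_delete(own_key, msg, keys, by_key):
     token = (msg.layer, msg.level, msg.min_row, msg.min_col, msg.max_row, msg.max_col)
     if token in recent_deletes:
         return                                  # duplicate: discard
     recent_deletes.append(token)
     remove_tiles(msg)                           # drop matching tiles from the local cache
     key = tile_key(msg.layer, msg.level, msg.min_row, msg.min_col)   # lower left tile
     peers = select_peers(key, keys, by_key)
     if not peers:
         return
     if own_key < key < peers[0].key:
         return                                  # stop, per the propagation rule above
     for peer in peers:
         send_delete(peer, msg)                  # propagate asynchronously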
Receiving PINGs
Every PING message should be responded to with a matching PONG message.
Receiving PONGs
When a PONG is received from a peer, reset its timeout counter to v.
Directory service
- When a peer requests the directory listing, store its directory tuple in the database, along with the time it made the request.
- Every d x 2 minutes, remove any peers from the database that have not made a directory request since the last check.
- Peers can and perhaps should be whitelisted to ensure data integrity over the network. Peers not on the whitelist should not be added to the directory.
- The directory listing should be about 60 bytes per peer. With 10,000 peers, and assuming a 6:1 gzip compression ratio, a fresh listing should be at most 100k compressed.
- If/when the directory listing needs to contain more than ~10,000 peers, a Kademlia-like mechanism for thinning keys farther away from a given peer can be used to keep the list that any one peer sees down to a reasonable number.
- The directory service can seek high availability through a shared database backend and round-robin DNS.
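A minimal in-memory sketch of the server-side bookkeeping (a real deployment would use the shared database backend mentioned above); the function names are hypothetical:

 import time
 
 D_MINUTES = 5                   # d, from "Plausible parameter values" below
 directory = {}                  # peer key -> (ip, port, weight, last_seen)
 
 def record_request(key, ip, port, weight, whitelist=None):
     # store or refresh the requesting peer's directory tuple
     if whitelist is not None and key not in whitelist:
         return                  # peers not on the whitelist are not added
     directory[key] = (ip, port, weight, time.time())
 
 def expire_stale_peers():
     # run every d x 2 minutes: drop peers that have not checked in since the last sweep
     cutoff = time.time() - 2 * D_MINUTES * 60
     for key, (ip, port, weight, last_seen) in list(directory.items()):
         if last_seen < cutoff:
             del directory[key]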
Plausible parameter values
- d = 5 (minutes between directory refreshes)
- v = 5 (initial peer timeout value)
- t = 1 (seconds to wait for a PUT before seeding)
- k = 3 (peers selected per tile)
Protocol format
Protocol messages will be served via UDP.
Each message is a tuple consisting of (Peer Key, Type, Sequence, Checksum, Payload), for a total of 29 + n bytes, where n is the length of the payload.
Message type may be one of:
- PING
- PONG
- GET
- PUT
- DELETE
Message sequence must be a monotonically increasing 32-bit number.
Message checksum is a CRC-32 checksum of the data payload.
The message payload takes up the remainder of the message.
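A sketch of packing and unpacking the fixed 29-byte header with Python's struct module, assuming network byte order and a single type byte (which is what makes the header 20 + 1 + 4 + 4 = 29 bytes); the numeric type codes are an assumption:

 import struct
 import zlib
 
 HEADER = struct.Struct('!20sBII')               # peer key, type, sequence, CRC-32
 PING, PONG, GET, PUT, DELETE = range(5)         # numeric type codes are an assumption
 
 def pack_message(peer_key, msg_type, sequence, payload=b''):
     checksum = zlib.crc32(payload) if payload else 0
     return HEADER.pack(peer_key, msg_type, sequence, checksum) + payload
 
 def unpack_message(datagram):
     peer_key, msg_type, sequence, checksum = HEADER.unpack_from(datagram)
     return peer_key, msg_type, sequence, checksum, datagram[HEADER.size:]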
Message sequence
Before sending a message, a peer should increment its own internal message sequence ID.
Message integrity
To ensure message integrity, a peer should make the following checks when a message is received:
- Compare the peer key embedded in the message with the sending IP's recorded peer key. If they do not match, discard the message.
- Compare the checksum embedded in the message with the checksum of the payload. If they do not match, discard the message.
- Check the message sequence against the sending peer's most recent sequence ID.
- If the message sequence is less than or equal to the previously set sequence ID for that peer, discard the message.
- Otherwise, update the peer's sequence ID.
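A sketch of these checks, reusing unpack_message from the sketch above and assuming a peers_by_ip map from sender IP to the Peer record that holds the recorded key and last sequence ID:

 import zlib
 
 def validate_message(sender_ip, datagram, peers_by_ip):
     peer_key, msg_type, sequence, checksum, payload = unpack_message(datagram)
     peer = peers_by_ip.get(sender_ip)
     if peer is None or peer.key != peer_key:
         return None                     # embedded key does not match the sender's IP
     if payload and zlib.crc32(payload) != checksum:
         return None                     # payload checksum mismatch
     if sequence <= peer.sequence:
         return None                     # stale or replayed sequence ID
     peer.sequence = sequence            # accept and record the new sequence ID
     return peer, msg_type, payload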
PING messages
PING messages have a checksum of 0 and no payload.
PONG messages
A PONG message payload consists of the sequence number from the corresponding PING packet.
GET messages
A GET message payload consists of the tuple (Layer, Level, Row, Column). The layer value is a zero-terminated string. The row and column values are 32-bit integers.
PUT messages
A PUT message payload consists of the tuple (Layer, Level, Row, Column, Data).
DELETE messages
A DELETE message payload consists of the tuple (Layer, Level, MinRow, MinCol, MaxRow, MaxCol).
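A sketch of the payload encodings, assuming the otherwise unspecified Level field is a single byte and that the integers use network byte order like the header sketch above:

 import struct
 
 def pack_get_payload(layer, level, row, column):
     # zero-terminated layer string, then level, row, column
     return layer.encode() + b'\x00' + struct.pack('!BII', level, row, column)
 
 def pack_put_payload(layer, level, row, column, data):
     return pack_get_payload(layer, level, row, column) + data
 
 def pack_delete_payload(layer, level, min_row, min_col, max_row, max_col):
     return layer.encode() + b'\x00' + struct.pack('!BIIII', level, min_row, min_col, max_row, max_col)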
References
- This model is based largely on the Kademlia algorithm discussed at length in Distributed Tile Caching, with the addition of a directory server to keep latency down.
- Web Caching with Consistent Hashing is a seminal work in distributed caching.
- A pure-Perl implementation of the algorithm described in the above paper.