Distributed Tile Caching Model
Revision as of 00:50, 12 November 2006
This design outlines a model for a distributed tile cache. The primary design goals are minimizing response latency for tile requests, and maintaining redundant storage of tiles across the cache.
Definitions
A key is a 20 byte SHA-1 sum.
Each peer has its own persistent peer key, generated randomly.
The directory of peers consists of a list of (key, IP, port, weight) tuples, where weight is the bandwidth that the peer is willing to serve in KB/s, expressed as an integer.
The directory server will serve the directory via HTTP from a well-known URL in gzip compressed whitespace-separated text.
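Under these definitions, parsing a downloaded listing might look like the following sketch. The per-line field order is assumed from the (key, IP, port, weight) tuple above; the exact wire layout is not otherwise specified:

```python
import gzip

def parse_directory(compressed: bytes):
    """Parse a gzip-compressed, whitespace-separated directory listing.

    Assumed line format: "<40-hex-char key> <ip> <port> <weight>".
    """
    peers = []
    for line in gzip.decompress(compressed).decode().splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip malformed lines
        key_hex, ip, port, weight = fields
        peers.append((bytes.fromhex(key_hex), ip, int(port), int(weight)))
    return peers
```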
Peers
Discovering other peers
- Request the directory listing from the server, passing the peer's directory tuple.
- Normalize the weights of each peer.
- Create an empty balanced binary tree.
- For every other peer listed:
- Set the peer's timeout value to v.
- Add the peer to the tree using its key.
- Calculate r = normalized weight x 64.
- For i in range(1, r):
- Calculate a subsidiary key by concatenating the peer key with the binary value of i, and take the SHA-1 sum of the result.
- Insert the peer into the tree using the subsidiary key.
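The tree-building steps above can be sketched as follows. A sorted list stands in for the balanced binary tree, and the "binary value of i" is assumed here to be a 4-byte big-endian integer, since the text does not pin down the encoding:

```python
import bisect
import hashlib

def build_ring(directory, total_weight):
    """Build the sorted key ring from (key, ip, port, weight) tuples.

    Each peer is inserted once under its own key, then r - 1 more
    times under subsidiary keys, where r = normalized weight x 64.
    Timeout bookkeeping is omitted from this sketch.
    """
    ring = []  # sorted list of (key, peer) pairs
    for key, ip, port, weight in directory:
        peer = (ip, port)
        bisect.insort(ring, (key, peer))
        r = int(weight / total_weight * 64)
        for i in range(1, r):
            # subsidiary key: SHA-1 of peer key + binary value of i
            sub = hashlib.sha1(key + i.to_bytes(4, "big")).digest()
            bisect.insort(ring, (sub, peer))
    return ring
```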
Maintaining the local directory
- After d minutes have passed, request a new directory listing with an If-Modified-Since header.
- If the server responds with a 304, wait another d minutes and check again.
- Otherwise:
- For every peer in the binary tree not in the new listing, remove it.
- For every peer in the directory listing not in the binary tree, add it.
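The reconciliation step can be sketched with plain dicts standing in for the binary tree and the parsed listing (both keyed by peer key):

```python
def refresh(tree: dict, new_listing: dict):
    """Reconcile the local peer set with a fresh directory listing."""
    for key in list(tree):
        if key not in new_listing:
            del tree[key]          # peer has left the network
    for key, record in new_listing.items():
        if key not in tree:
            tree[key] = record     # newly joined peer
```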
Selecting peers for a given tile
- Concatenate layer + level + row + column and take the SHA-1 sum. This is the tile key.
- If there are fewer than k peers in the directory, select all other known peers.
- Otherwise, until k distinct peers are selected:
- Select the first peer from the binary tree with key greater than or equal to the tile key.
- If there are no matching peers in the tree, select the first peer in the tree.
- Set the tile key to the key of the peer just selected.
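A sketch of the selection walk over the sorted ring built earlier. Two points here are interpretations rather than givens: the concatenation format for the tile key is assumed, and "set the tile key to the key of the peer just selected" is read as stepping to the next ring entry, since reselecting with a greater-or-equal comparison would match the same entry again:

```python
import bisect
import hashlib

def tile_key(layer: str, level: int, row: int, col: int) -> bytes:
    # Concatenation format is an assumption; the text only says
    # "layer + level + row + column".
    return hashlib.sha1(f"{layer}{level}{row}{col}".encode()).digest()

def select_peers(ring, key: bytes, k: int):
    """Walk the ring of (key, peer) pairs until k distinct peers
    are found, wrapping to the start when the end is reached."""
    if not ring:
        return []
    keys = [entry[0] for entry in ring]
    i = bisect.bisect_left(keys, key)  # first ring key >= tile key
    selected, steps = [], 0
    while len(selected) < k and steps < len(ring):
        if i >= len(ring):
            i = 0  # wrap around to the start of the ring
        peer = ring[i][1]
        if peer not in selected:
            selected.append(peer)
        i += 1
        steps += 1
    return selected
```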
Issuing requests
Seeding a tile
- Fetch the tile from the data source (or render the tile or whatever).
- Select k peers for the tile.
- If the storing peer is selected, discard it.
- Send a PUT message to each peer asynchronously.
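The seeding steps reduce to a small sketch; `selected_peers` is the output of the peer-selection step, and `send_put` stands in for the asynchronous UDP transport (an assumption, not a specified API):

```python
def seed_tile(selected_peers, self_peer, tile, data, send_put):
    """Push a freshly fetched tile to its responsible peers."""
    for peer in selected_peers:
        if peer == self_peer:
            continue  # the storing peer never PUTs to itself
        send_put(peer, tile, data)
```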
Fetching a tile
- Select k peers for the tile.
- Send a GET message for the given tile to each of the selected peers asynchronously.
- Decrement the timeout value for each selected peer.
- If a peer's timeout value drops to zero, remove that peer from the binary tree.
- If no peer responds within t seconds, seed the tile in the network.
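The timeout bookkeeping in the middle steps can be sketched like this; `timeouts` maps each peer to its remaining timeout value (initialized to v), removal from the binary tree is modeled as a dict deletion, and waiting t seconds for a response before reseeding is left out:

```python
def fetch_tile(selected_peers, timeouts, tile, send_get):
    """Issue GETs for a tile and decrement each peer's timeout value."""
    for peer in selected_peers:
        send_get(peer, tile)
        timeouts[peer] -= 1
        if timeouts[peer] <= 0:
            del timeouts[peer]  # unresponsive too long: drop from the tree
```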
Expiring a tile
- A DELETE request can cover a rectangular range of tiles at a given level for a given layer.
- Start by selecting k peers for the lower left tile in the expiration request.
- Send a DELETE message for the given tile to each of the selected peers asynchronously.
Pinging other peers
If the peer is behind a NAT gateway, it can use PING messages to keep the UDP port open on the firewall:
- Every 60 seconds, select a random peer from the binary tree and send it a PING message.
- If the peer responds with a PONG message within t seconds, reset its timeout value to v.
- Otherwise, remove the peer from the binary tree.
Receiving requests
Responding to a GET request
- If the tile is present in the local cache, send a PUT message in response.
- Otherwise, select k peers for the tile.
- If the peer's own key is between the original tile key and the first peer selected, seed the tile in the network.
- Otherwise, attempt to fetch the tile from the network.
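The GET dispatch above can be sketched as a three-way branch; key wrap-around on the ring is ignored here, and the "between" test is read as tile key < own key <= first selected peer's key:

```python
def handle_get(cache, own_key, tile_key, first_selected_key,
               respond_put, seed, fetch):
    """Dispatch an incoming GET per the steps above."""
    if tile_key in cache:
        respond_put(cache[tile_key])        # cache hit: answer with a PUT
    elif tile_key < own_key <= first_selected_key:
        seed(tile_key)                      # we are a responsible peer
    else:
        fetch(tile_key)                     # look the tile up in the network
```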
Receiving a PUT request
- Store the tile in the local cache.
- Reset the sending peer's timeout value to v.
Receiving a DELETE request
A peer should keep track of the last 2k DELETE messages received.
- If this DELETE message is a duplicate, discard it.
- Otherwise:
- Remove the tile(s) from the local cache.
- Select k peers for the lower left tile of the DELETE request.
- If the tile key is greater than the peer's key but less than the first peer selected, stop.
- Otherwise, propagate the DELETE request to the k peers asynchronously.
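The duplicate-suppression window can be sketched with a bounded deque holding the last 2k DELETE messages:

```python
from collections import deque

class DeleteLog:
    """Remember the last 2k DELETE messages so a request propagating
    around the ring is processed at most once per peer."""

    def __init__(self, k: int):
        self.seen = deque(maxlen=2 * k)

    def is_duplicate(self, msg) -> bool:
        if msg in self.seen:
            return True
        self.seen.append(msg)  # first sighting: remember it
        return False
```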
Receiving PINGs
Every PING message should be responded to with a matching PONG message.
Directory service
- When a peer requests the directory listing, store its directory tuple in the database, along with the time it made the request.
- Every d x 2 minutes, remove any peers from the database that have not made a directory request since the last check.
- Peers can and perhaps should be whitelisted to ensure data integrity over the network. Peers not on the whitelist should not be added to the directory.
- The directory listing should be about 60 bytes per peer. With 10,000 peers, and assuming a 6:1 gzip compression ratio, a fresh listing should be at most 100 KB compressed.
- The directory service can seek high availability through a shared database backend and round-robin DNS.
Plausible parameter values
- d = 5
- v = 5
- t = 1
- k = 3
Protocol format
Protocol messages will be served via UDP.
Each message is a tuple consisting of (Peer Key, Type, Sequence, Checksum, Payload), for a total of 29 + n bytes.
Message type may be one of:
- PING
- PONG
- GET
- PUT
- DELETE
Message sequence must be a monotonically increasing 32-bit number.
Message checksum is a CRC-32 checksum of the data payload.
The message payload takes up the remainder of the message.
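The 29-byte header (20-byte key + 1-byte type + 4-byte sequence + 4-byte checksum) can be packed with Python's struct module as a sketch; the numeric codes for the five message types are an assumption, since the text does not assign them:

```python
import struct
import zlib

# Type codes are assumed; the text only names the five message types.
TYPES = {"PING": 0, "PONG": 1, "GET": 2, "PUT": 3, "DELETE": 4}

# Network byte order: 20-byte key, 1-byte type, 4-byte seq, 4-byte CRC-32.
HEADER = struct.Struct("!20sBII")  # 29 bytes total

def pack_message(peer_key: bytes, mtype: str, seq: int, payload: bytes) -> bytes:
    checksum = zlib.crc32(payload) if payload else 0  # PING carries checksum 0
    return HEADER.pack(peer_key, TYPES[mtype], seq, checksum) + payload

def unpack_message(data: bytes):
    peer_key, code, seq, checksum = HEADER.unpack_from(data)
    return peer_key, code, seq, checksum, data[HEADER.size:]
```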
Message integrity
- Compare the peer key embedded in the message with the sending IP's recorded peer key. If they do not match, discard the message.
- Compare the checksum embedded in the message with the checksum of the payload. If they do not match, discard the message.
- Check the message sequence against the sending peer's most recent sequence ID.
- If the message sequence is less than or equal to the previously set sequence ID for that peer, discard the message.
- Otherwise, update the peer's sequence ID.
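The three checks can be applied in order as a small sketch; returning the new sequence ID (or None for a discard) keeps the bookkeeping in one place:

```python
import zlib

def verify(msg_key, expected_key, checksum, payload, seq, last_seq):
    """Apply the integrity checks above in order.

    Returns the sequence ID to record for the sending peer, or None
    if the message should be discarded.
    """
    if msg_key != expected_key:
        return None               # key does not match the sending IP's record
    if zlib.crc32(payload) != checksum:
        return None               # corrupted or tampered payload
    if seq <= last_seq:
        return None               # replayed or reordered message
    return seq
```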
PING messages
PING messages have a checksum of 0 and no payload.
PONG messages
A PONG message payload consists of the sequence number from the corresponding PING packet.
GET messages
A GET message payload consists of the tuple (Layer, Level, Row, Column). The layer value is a zero-terminated string. The row and column values are 32-bit integers.
PUT messages
A PUT message payload consists of the tuple (Layer, Level, Row, Column, Data).
DELETE messages
A DELETE message payload consists of the tuple (Layer, Level, MinRow, MinCol, MaxRow, MaxCol).
References
- This model is based largely on the Kademlia algorithm discussed at length in Distributed Tile Caching, with the addition of a directory server to keep latency down.
- Web Caching with Consistent Hashing is a seminal work in distributed caching.
- A pure-Perl implementation of the algorithm described in the above paper.