Caching, Revalidation, Concurrency Control
Scalability
- Need for scalability
- Huge number of requests on the Web every day
- Huge amount of data downloaded
- Some examples
- Google, Facebook: 5 billion API calls/day
- Twitter: 3 billion API calls/day (75% of all traffic)
→ 50 million tweets a day
- eBay: 8 billion API calls/month
- Bing: 3 billion API calls/month
- Amazon WS: over 100 billion objects stored in S3
- Scalability in REST
- Caching and revalidation
- Concurrency control
Caching
- Your service should cache:
- whenever a resource is static
- even when a resource is dynamic
- if it is likely to change often, you can force clients to always revalidate
- three steps:
- client GETs the resource representation
- server controls how the client should cache it through the Cache-Control header
- client revalidates the content via conditional GET
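The three steps can be sketched as a small simulation. This is a minimal sketch, not a real HTTP client: the `server` function, the `cached` entry layout and the fixed `"v1"` tag are assumptions made for illustration, and time is passed in explicitly as milliseconds so the run is deterministic.

```javascript
// Hypothetical origin server: one resource with a fixed entity tag.
function server(requestHeaders) {
  const etag = '"v1"';
  const headers = { "cache-control": "private, max-age=200", etag: etag };
  if (requestHeaders["if-none-match"] === etag) {
    return { status: 304, headers: headers }; // nothing changed, no body sent
  }
  return { status: 200, headers: headers, body: "...data..." };
}

let cached = null; // { body, etag, maxAgeMs, storedAt }

function get(nowMs) {
  // Cached copy is still fresh: answer locally, no request is sent.
  if (cached && nowMs - cached.storedAt < cached.maxAgeMs) {
    return { source: "cache", body: cached.body };
  }
  // Step 3: revalidate with a conditional GET if we hold a stale copy.
  const requestHeaders = cached ? { "if-none-match": cached.etag } : {};
  const response = server(requestHeaders);
  // Step 2: the server dictates the lifetime through Cache-Control.
  const maxAgeMs = Number(/max-age=(\d+)/.exec(response.headers["cache-control"])[1]) * 1000;
  if (response.status === 304) {
    cached.storedAt = nowMs;
    cached.maxAgeMs = maxAgeMs;
    return { source: "revalidated", body: cached.body };
  }
  // Step 1: a plain GET returned a full representation; store it.
  cached = { body: response.body, etag: response.headers.etag, maxAgeMs: maxAgeMs, storedAt: nowMs };
  return { source: "origin", body: response.body };
}

const sources = [get(0).source, get(100000).source, get(250000).source];
console.log(sources.join(" ")); // origin cache revalidated
```

The second call falls inside the 200 second lifetime and never reaches the server; the third call is past it and costs only a 304 round trip.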
Cache Headers
Cache-Control response header
- controls local and proxy caches
private – no proxy should cache, only clients can
public – any intermediary can cache (proxies and clients)
no-cache – the response may be stored, but it must be revalidated with the server before every reuse
no-store – the response must not be stored at all (this turns off caching)
no-transform – intermediaries must not transform the cached data, e.g. by compressing it
max-age, s-maxage – time in seconds for which the cached copy is fresh; s-maxage applies to proxies (shared caches)
Last-Modified and ETag response headers
- The date the content was last modified and an entity tag for the content
If-Modified-Since and If-None-Match request headers
- Content revalidation (conditional GET)
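As an illustration of how the directives combine into one header value, here is a minimal sketch of a helper that assembles a Cache-Control string. The function name `buildCacheControl` and the shape of its `options` object are assumptions of this sketch, not part of any library.

```javascript
// Hypothetical helper: builds a Cache-Control value from a small options object.
function buildCacheControl(options) {
  const directives = [];
  if (options.visibility) directives.push(options.visibility); // "private" or "public"
  if (options.noCache) directives.push("no-cache");
  if (options.noStore) directives.push("no-store");
  if (options.noTransform) directives.push("no-transform");
  if (typeof options.maxAge === "number") directives.push("max-age=" + options.maxAge);
  if (typeof options.sMaxAge === "number") directives.push("s-maxage=" + options.sMaxAge);
  return directives.join(", ");
}

console.log(buildCacheControl({ visibility: "private", maxAge: 200 }));
// private, max-age=200
```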
Example Date Revalidation
> GET /orders HTTP/1.1
> ...
< HTTP/1.1 200 OK
< Content-Type: application/xml
< Cache-Control: private, max-age=200
< Last-Modified: Mon, 07 Nov 2011 08:40:00 GMT
<
< ...data...
Only the client may cache the response; the cached copy is fresh for 200 seconds.
Revalidation (conditional GET) example:
- The client revalidates its cached copy once the 200 seconds have passed.
> GET /orders HTTP/1.1
> If-Modified-Since: Mon, 07 Nov 2011 08:40:00 GMT
< HTTP/1.1 304 Not Modified
< Cache-Control: private, max-age=200
< Last-Modified: Mon, 07 Nov 2011 08:40:00 GMT
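On the server side, date revalidation boils down to one comparison. The sketch below shows one way to write it; the helper name `notModifiedSince` and the choice to ignore an unparsable date are assumptions of this sketch. HTTP dates carry whole seconds only, so both sides are truncated before comparing.

```javascript
// Hypothetical helper: true when a conditional GET can be answered with 304.
function notModifiedSince(ifModifiedSince, lastModifiedMs) {
  const since = Date.parse(ifModifiedSince);
  if (Number.isNaN(since)) return false; // unparsable date: ignore the condition
  // Compare in whole seconds, the resolution of an HTTP date.
  return Math.floor(lastModifiedMs / 1000) <= Math.floor(since / 1000);
}

const lastModifiedMs = Date.UTC(2011, 10, 7, 8, 40, 0); // month index 10 = November
const headerValue = new Date(lastModifiedMs).toUTCString();

console.log(notModifiedSince(headerValue, lastModifiedMs));        // true  -> 304
console.log(notModifiedSince(headerValue, lastModifiedMs + 5000)); // false -> 200
```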
Entity Tags
- Signature of the response body
- A hash such as MD5
- A sequence number that changes with any modification of the content
- Types of tag
- Strong ETag: reflects the content bit by bit
- Weak ETag: reflects the content "semantically"
- The app defines the meaning of its weak tags
- Example content revalidation with ETag
< HTTP/1.1 200 OK
< Cache-Control: private, max-age=200
< Last-Modified: Mon, 07 Nov 2011 08:40:00 GMT
< ETag: "4354a5f6423b43a54d"
> GET /orders HTTP/1.1
> If-None-Match: "4354a5f6423b43a54d"
< HTTP/1.1 304 Not Modified
< Cache-Control: private, max-age=200
< Last-Modified: Mon, 07 Nov 2011 08:40:00 GMT
< ETag: "4354a5f6423b43a54d"
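The exchange above can be sketched as a server-side handler. This is a simplified sketch under stated assumptions: the names `computeStrongETag` and `conditionalGet` are hypothetical, the strong tag is an MD5 hash of the exact body, and If-None-Match is treated as a single tag (a real header may also carry a list of tags or `*`).

```javascript
const crypto = require("crypto");

// Strong ETag sketch: hash the exact bytes of the representation.
function computeStrongETag(body) {
  return '"' + crypto.createHash("md5").update(body).digest("hex") + '"';
}

// Hypothetical handler: answers a GET that may carry If-None-Match.
function conditionalGet(requestHeaders, body) {
  const etag = computeStrongETag(body);
  const headers = { "Cache-Control": "private, max-age=200", "ETag": etag };
  if (requestHeaders["if-none-match"] === etag) {
    return { status: 304, headers: headers, body: null }; // client copy is current
  }
  return { status: 200, headers: headers, body: body };
}

const first = conditionalGet({}, "<orders/>");
const second = conditionalGet({ "if-none-match": first.headers.ETag }, "<orders/>");
console.log(first.status, second.status); // 200 304
```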
Design Suggestions
- Composed resources use weak ETags
- For example /orders
- a composed resource that contains summary information
- changes to an order's items do not change the semantics of /orders
- It is usually not possible to update these resources directly
- Non-composed resources use strong ETags
- For example /orders/{order-id}
- They can be updated
- Further notes
- The server should send both the Last-Modified and ETag headers
- If the client sends both If-Modified-Since and If-None-Match, ETag validation takes precedence
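The precedence rule can be sketched as follows. The function name `clientCopyIsCurrent` and the `resource` object layout are assumptions of this sketch; it also assumes `lastModifiedMs` is already truncated to whole seconds and that If-None-Match holds a single tag.

```javascript
// Hypothetical check: true means the server may answer 304 Not Modified.
function clientCopyIsCurrent(requestHeaders, resource) {
  const ifNoneMatch = requestHeaders["if-none-match"];
  if (ifNoneMatch !== undefined) {
    // ETag validation wins; the date is not consulted at all.
    return ifNoneMatch === resource.etag;
  }
  const ifModifiedSince = requestHeaders["if-modified-since"];
  if (ifModifiedSince !== undefined) {
    const since = Date.parse(ifModifiedSince);
    return !Number.isNaN(since) && resource.lastModifiedMs <= since;
  }
  return false; // unconditional request: send the full response
}

const resource = { etag: '"v2"', lastModifiedMs: Date.UTC(2011, 10, 7, 8, 40, 0) };
const sameDate = new Date(resource.lastModifiedMs).toUTCString();

// The date alone would allow a 304, but the stale tag overrides it.
console.log(clientCopyIsCurrent({ "if-none-match": '"v1"', "if-modified-since": sameDate }, resource)); // false
console.log(clientCopyIsCurrent({ "if-modified-since": sameDate }, resource)); // true
```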
Weak ETag Example
- App-specific; example of the /orders resource
{
"orders" :
[
{ "id" : 2245,
"customer" : "Tomas",
"descr" : "Stuff to build a house.",
"items" : [...] },
{ "id" : 5546,
"customer" : "Peter",
"descr" : "Things to build a pipeline.",
"items" : [...] }
]
}
Weak ETag compute function example
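- Any modification to an order's items is not significant for /orders:
var crypto = require("crypto");
// Hash only the summary fields of each order; items are left out on purpose.
function computeWeakETag(orders) {
  var content = "";
  for (var i = 0; i < orders.length; i++)
    content += orders[i].id + "|" + orders[i].customer + "|" + orders[i].descr + "\n";
  // A weak validator is sent with the W/ prefix and a quoted value.
  return 'W/"' + crypto.createHash('md5').update(content).digest("hex") + '"';
}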
Weak ETag Revalidation
- Updating the /orders resource
- POST /orders/{order-id} inserts a new item into an order
- Any change to an order's items does not change the weak ETag
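A short run makes the point concrete. This sketch uses a version of the compute function that separates fields with a delimiter and emits the `W/` prefix; the item values `"bricks"` and `"cement"` and the changed description are made up for illustration.

```javascript
const crypto = require("crypto");

// Weak ETag over the summary fields only (id, customer, descr).
function computeWeakETag(orders) {
  let content = "";
  for (const order of orders) {
    content += order.id + "|" + order.customer + "|" + order.descr + "\n";
  }
  return 'W/"' + crypto.createHash("md5").update(content).digest("hex") + '"';
}

const orders = [
  { id: 2245, customer: "Tomas", descr: "Stuff to build a house.", items: ["bricks"] },
  { id: 5546, customer: "Peter", descr: "Things to build a pipeline.", items: [] }
];

const before = computeWeakETag(orders);
orders[0].items.push("cement");               // effect of a POST that adds an item
const afterItem = computeWeakETag(orders);
orders[0].descr = "Stuff to build a garage."; // a summary field changes
const afterDescr = computeWeakETag(orders);

console.log(before === afterItem);  // true: cached /orders is still valid
console.log(before === afterDescr); // false: clients must fetch /orders again
```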
Concurrency
- Two clients may update the same resource
- 1) a client GETs a resource: GET /orders/5545
- 2) the client modifies the resource
- 3) the client updates the resource via PUT /orders/5545 HTTP/1.1
- What happens if another client updates the resource between 1) and 3)?
- Concurrency control
- Conditional PUT
- Update the resource only if it has not changed since a specified date or a specified ETag matches the resource content
- If-Unmodified-Since and If-Match headers
- Response to a conditional PUT:
200 OK if the PUT was successful
412 Precondition Failed if the resource was updated in the meantime.
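The lost-update scenario and its fix can be sketched with an in-memory store. Everything here is illustrative: the `store` map, the JSON bodies and the function names are assumptions, and this sketch treats a missing If-Match value as a failed precondition, which is a design choice rather than a requirement.

```javascript
const crypto = require("crypto");

function strongETag(body) {
  return '"' + crypto.createHash("md5").update(body).digest("hex") + '"';
}

// Hypothetical in-memory store keyed by resource path.
const store = new Map([["/orders/5545", '{"id":5545,"descr":"old"}']]);

// Conditional PUT: write only if the caller's tag matches the current content.
function conditionalPut(path, ifMatch, newBody) {
  const current = store.get(path);
  if (current === undefined) return { status: 404 };
  if (ifMatch !== strongETag(current)) return { status: 412 }; // changed in the meantime
  store.set(path, newBody);
  return { status: 200, etag: strongETag(newBody) };
}

// Both clients GET the same version and remember its tag.
const seenEtag = strongETag(store.get("/orders/5545"));
const resultA = conditionalPut("/orders/5545", seenEtag, '{"id":5545,"descr":"from A"}');
const resultB = conditionalPut("/orders/5545", seenEtag, '{"id":5545,"descr":"from B"}');
console.log(resultA.status, resultB.status); // 200 412
```

Client B has to GET the resource again, reapply its change on top of A's version, and retry with the new tag.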
Concurrency Control Protocol
- Conditional PUT and ETags
- Conditional PUT must always use strong entity tags or date validation
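The reason is the comparison function each header uses: If-Match compares tags strongly, so a weak tag never satisfies it. A minimal sketch of the two comparisons as defined by the HTTP specification follows; the helper names are assumptions of this sketch.

```javascript
function isWeak(tag) {
  return tag.startsWith("W/");
}

// The quoted part of the tag, without the W/ prefix.
function opaque(tag) {
  return isWeak(tag) ? tag.slice(2) : tag;
}

// Strong comparison (If-Match): both tags strong and identical.
function strongMatch(a, b) {
  return !isWeak(a) && !isWeak(b) && a === b;
}

// Weak comparison (If-None-Match): quoted parts identical, W/ ignored.
function weakMatch(a, b) {
  return opaque(a) === opaque(b);
}

console.log(strongMatch('W/"1"', 'W/"1"')); // false: unusable for a conditional PUT
console.log(weakMatch('W/"1"', '"1"'));     // true: good enough for cache revalidation
console.log(strongMatch('"1"', '"1"'));     // true
```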