Wednesday, February 23, 2011

My take on “Optimizing HTTP Caching” for your web application.

One quick way to get a snappy, fast-loading web page is to add appropriate HTTP headers to your responses so that resources are cached by a browser or proxy. On subsequent visits to the page, the browser or proxy can use its locally cached copy instead of downloading the resource again.

Let’s quickly dive into the headers you’ll need to set. The optimal values will differ from app to app, depending on how long you want browsers or proxies to cache the content.

Expires or max-age? Which do you use?

  • Expires
    • The Expires HTTP header was the basic means of controlling caches back in the good old days of HTTP 1.0. It tells all caches how long the data in their local storage stays fresh. After that time, caches will check back with the origin server to see whether the document has changed. Expires headers are supported by practically every cache.
    • The Expires header was very useful, but it had some limitations. One is that, since there’s a date involved, the clocks on the web server and the cache must be synchronized.
    • Example : "Expires: Sat, 01 Jan 2000 00:00:00 GMT"
  • max-age=[seconds]
    • Introduced in the HTTP 1.1 spec as a Cache-Control directive, this is a replacement for Expires. The advantage is that the lifetime is given in seconds relative to the time of the request, rather than as an absolute date.
    • Example : "Cache-Control: max-age=3600"
When Expires or Cache-Control max-age is set, the browser sends no request at all while the cached copy is still fresh; the file is picked up from the local cache (provided other Cache-Control directives such as no-cache, no-store, must-revalidate or proxy-revalidate are not set).
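
To make the difference between the two concrete, here is a minimal Python sketch that builds both headers from a single lifetime in seconds. The function name freshness_headers and the injectable now parameter are my own assumptions for illustration; they are not part of any framework.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def freshness_headers(max_age_seconds, now=None):
    """Build Expires and Cache-Control headers for one lifetime (hypothetical helper)."""
    # `now` can be passed in to make the result reproducible;
    # by default the current UTC time is used.
    now = now or datetime.now(timezone.utc)
    expires_at = now + timedelta(seconds=max_age_seconds)
    return {
        # Absolute date: only reliable if server and cache clocks agree.
        "Expires": format_datetime(expires_at, usegmt=True),
        # Relative lifetime in seconds: no clock synchronization needed.
        "Cache-Control": f"max-age={max_age_seconds}",
    }


if __name__ == "__main__":
    # Cache for one day, counted from the moment the response is built.
    print(freshness_headers(86400))
```

Sending both is harmless: caches that understand HTTP 1.1 give max-age priority over Expires.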

What if we cannot predict the lifetime of the page content and want the browser to check with the server on a regular basis whether the cached copy is still fresh?

Don’t worry, we can do this too, with headers like Last-Modified and ETag.
  • Last-Modified
    • This indicates the date and time at which the origin server believes the file was last modified.
    • This is a "weak" caching header: the browser applies a heuristic to decide whether to fetch the item from cache or not. (The heuristics differ among browsers.)
  • ETag
    • This can be any value that uniquely identifies a resource (file versions or content hashes are typical)
So why do we need to set these?
  • These headers allow the browser to efficiently update its cached resources by issuing conditional GET requests, for example when the user explicitly reloads the page.
  • If the resource has not changed, the server will NOT reply with the entire file content but just with a “304 Not Modified” status, telling the browser that it can serve the file from its cache since it’s still fresh.
  • If you rely on ETags, you may face a problem if you have multiple servers behind a load balancer (configured without sticky sessions), since each server may generate a different ETag for the same resource.
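
The conditional GET flow can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation: make_etag and respond are hypothetical names, and the comparison ignores details such as weak validators or a list of tags in If-None-Match. Deriving the ETag from a hash of the content is one way around the load balancer problem, since every server holding the same bytes computes the same tag.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the content itself (hypothetical helper)."""
    # A content hash is identical on every server that holds the same bytes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict):
    """Return (status, headers, payload) for a GET that may be conditional."""
    etag = make_etag(body)
    # The browser echoes the ETag it has cached in If-None-Match.
    if request_headers.get("If-None-Match") == etag:
        # Unchanged: send only the status and headers, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


if __name__ == "__main__":
    page = b"<html>hello</html>"
    status, headers, _ = respond(page, {})
    print(status, headers)
    # Second visit: the browser sends back the tag it received.
    print(respond(page, {"If-None-Match": headers["ETag"]})[0])
```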

What are some other Cache-Control directives that I can set?
  • public or private
    • Setting the Cache-Control header to private disables proxy caching altogether for these resources. 
    • If your application relies less on proxy caches for user locality, this might be an appropriate setting.
  • must-revalidate
    • This directive specifies that once the cached copy has become stale, the cache MUST revalidate it with the server before serving the file from local cache.
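
Since these directives all travel in a single Cache-Control header, separated by commas, a small helper can assemble the value. The sketch below is only an illustration; cache_control and its parameters are names I made up, and it covers just the directives discussed in this post.

```python
def cache_control(max_age=None, private=False, must_revalidate=False):
    """Assemble a Cache-Control header value (hypothetical helper)."""
    # public allows proxy caches to store the response; private forbids it.
    directives = ["private" if private else "public"]
    if max_age is not None:
        directives.append(f"max-age={max_age}")
    if must_revalidate:
        directives.append("must-revalidate")
    return ", ".join(directives)


if __name__ == "__main__":
    # A static asset that any cache may keep for a day.
    print(cache_control(max_age=86400))
    # A per-user page: browser cache only, revalidate once stale.
    print(cache_control(max_age=0, private=True, must_revalidate=True))
```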

How can I speed up my web application?
  • Specify one of the two headers, Expires or Cache-Control max-age, for all cacheable resources.
  • Specify one of the two headers, Last-Modified or ETag, for all cacheable resources.
  • Set values appropriate to your web application.
  • Use a tool like YSlow or Google’s Page Speed to check the end result.
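
If you want a quick sanity check of your own before reaching for those tools, the first two rules of the checklist can be tested against a response's headers. This is a rough sketch with a made-up function name, audit_cache_headers; it is not how YSlow or Page Speed work internally, and it does not judge whether the values themselves are sensible.

```python
def audit_cache_headers(headers: dict) -> list:
    """List which checklist rules a response's headers fail (hypothetical helper)."""
    # Header names are case-insensitive, so normalize them first.
    names = {name.lower(): value for name, value in headers.items()}
    problems = []
    has_max_age = "max-age=" in names.get("cache-control", "").lower()
    if "expires" not in names and not has_max_age:
        problems.append("no Expires or Cache-Control max-age")
    if "last-modified" not in names and "etag" not in names:
        problems.append("no Last-Modified or ETag")
    return problems


if __name__ == "__main__":
    print(audit_cache_headers({"Cache-Control": "max-age=3600", "ETag": '"abc"'}))
    print(audit_cache_headers({"Content-Type": "text/css"}))
```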

Till next time …
