so-you-wanna-see-an-image

We’ve been asked how we manage serving files from Amazon’s very cool S3 service at WordPress.com… This is how. (This covers a request for an image already stored on S3, not the upload -> S3 process.)

A request comes into pound for a file. Pound hashes the hostname (via a custom patch which we have not released, but may) to determine which of several backend servers the request should hit, and forwards the request to that server. This, of course, means that a given blog always serves from the same backend server. The only exception to that rule is when that server is, for some reason, unavailable, in which case pound temporarily picks another server to serve that hostname from.
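To make the hostname hashing concrete, here is a minimal sketch in Python. It is not the pound patch itself (which is written against pound's own code and is unreleased); the function name `pick_backend`, the CRC32 hash, the "walk to the next server" failover rule, and the `is_up` health check are all assumptions made for illustration.

```python
import zlib


def pick_backend(hostname, backends, is_up=lambda backend: True):
    """Map a hostname to one backend, falling back if that backend is down.

    Assumptions (not from the post): CRC32 as the hash, and "try the next
    server in the list" as the temporary failover rule.
    """
    if not backends:
        raise ValueError("no backends configured")

    # Same hostname -> same hash -> same starting backend every time.
    start = zlib.crc32(hostname.lower().encode("utf-8")) % len(backends)

    # Walk the list from the preferred backend until one is available.
    for offset in range(len(backends)):
        candidate = backends[(start + offset) % len(backends)]
        if is_up(candidate):
            return candidate

    raise RuntimeError("no backend available")
```

Because only the hostname is hashed, every image on a given blog lands on the same varnishd cache, and the blog moves back to its usual server as soon as that server is reachable again.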

The request then comes into varnishd on the backend server. The varnishd daemon checks its 300GB file cache and (for the sake of this example) finds nothing (hey, new images are uploaded all the time!). Varnishd then passes the request to the web server (running on the same machine, just bound to a different IP/port), where it is handled by a custom script.

So, an HTTP daemon on the same backend server runs the file request. The custom script checks the DB to gather information on the file (specifically which DCs it is in, its size, its mod time, and whether it's deleted or not); all of this info is saved in memcached for 5 minutes. The script then increments and checks the “hawtness” (term courtesy of Barry) of the file in memcached. If the file has been accessed more than a certain number of times it is deemed “hawt”, and a special header is sent with the response telling varnishd to put the file into its cache. Once that happens, later requests are served directly by varnishd in the previous paragraph and never hit the httpd or this script again (or at least not until the cache entry expires). At this point, assuming the file should exist (deleted = 0 in the files DB), we fetch the file from a backend source.
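The metadata caching and the “hawtness” counter can be sketched roughly like this. This is Python for illustration only, not the real script: `TinyCache` is an in-memory stand-in for memcached, and the threshold value, the `X-Hawt` header name, the key prefixes, and the `load_meta_from_db` callback are all hypothetical. Only the 5 minute metadata lifetime comes from the description above.

```python
import time

HAWT_THRESHOLD = 3         # assumed; the post only says "a certain # of times"
META_TTL_SECONDS = 5 * 60  # from the post: file info is cached for 5 minutes
HAWT_HEADER = "X-Hawt"     # hypothetical header name


class TinyCache:
    """In-memory stand-in for memcached with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def incr(self, key):
        value = (self.get(key) or 0) + 1
        self.set(key, value)
        return value


def handle_request(path, cache, load_meta_from_db):
    """Return (file metadata, extra response headers) for one request."""
    meta = cache.get("meta:" + path)
    if meta is None:
        # Cache miss: ask the DB once, then remember the answer for 5 minutes.
        meta = load_meta_from_db(path)
        cache.set("meta:" + path, meta, ttl=META_TTL_SECONDS)

    headers = {}
    # Count this access; past the threshold, tell varnishd to cache the file.
    if cache.incr("hits:" + path) > HAWT_THRESHOLD:
        headers[HAWT_HEADER] = "1"
    return meta, headers
```

With the assumed threshold of 3, the first three requests for a file go through the script without the header, and the fourth response carries it, after which varnishd would answer on its own.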

Which backend source depends on where the file is available. The order of preference is as follows: always fetch from Amazon S3 if the file lives there, no matter what (the remaining preferences only ever apply if, for some reason, s3 = 0 in the files DB). If that fails, fetch from the one file server we still have (which has larger, slower disks and is used for archiving and fault tolerance only).
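As a rough sketch of that order of preference (Python, illustration only): the `meta` dictionary keys mirror the `s3` and `deleted` columns mentioned in the post, while `fetch_file`, the two fetcher callbacks, and the choice to treat an `OSError` as "the S3 fetch failed" are assumptions.

```python
def fetch_file(meta, fetch_from_s3, fetch_from_file_server):
    """Fetch file bytes using the S3-first order of preference.

    `meta` is the cached DB row for the file. Returns None for deleted files.
    """
    if meta.get("deleted"):
        return None

    if meta.get("s3"):
        try:
            # S3 always wins when the file lives there.
            return fetch_from_s3(meta["path"])
        except OSError:
            # Assumed failure mode: fall through to the archive server.
            pass

    # Either s3 = 0 in the files DB or the S3 fetch failed.
    return fetch_from_file_server(meta["path"])
```

Passing the fetchers in as callbacks keeps the sketch independent of any particular S3 client or file server protocol, neither of which the post names.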

After fetching the file from the backend, the custom script hands the data and programmatically generated headers to the HTTP daemon, which hands the data to varnishd; varnishd hands the data to pound, pound hands the data to the requesting client, and the image appears in the web browser.

And there was much rejoicing (yay.)

For the visual people among us who like visuals and stuff… (I like visuals…) here goes…

Comments (4)

  1. deepfryed wrote::

    how does using varnishd on the same box as the webserver runs on work better than using mod_cache ? does it implement a better caching mechanism ?

    Sunday, October 21, 2007 at 6:55 AM #
  2. Anime Girl wrote::

    Wow that is amazing. You guys have a fantastic service at WordPress.com. Have you ever considered charging for it?

    Sunday, October 21, 2007 at 8:42 AM #
  3. john allspaw wrote::

    Very cool stuff. Sounds slightly familiar to what we do at flickr.

    So this patch for pound you mention: it sounds like layer 7 URL hashing, except it only hashes on just the hostname, and not the rest of the URL, correct ? I can see why that would be great for you guys, wondering if it could be extended to handle the whole URL.

    Saturday, October 27, 2007 at 3:17 AM #
  4. apokalyptik wrote::

    John: I'm sure it could

    Saturday, October 27, 2007 at 3:28 AM #

Trackbacks/Pingbacks (6)

  1. WordPress.com using S3 « Barry on WordPress on Wednesday, October 10, 2007 at 12:26 PM

    [...] Virgin America review WordPress.com using S3 October 10th, 2007 Demitrious has a great post explaining how we are using S3, Varnish, and Pound to serve 60 million image requests per day on [...]

  2. Wordpress.com Powered by Amazon S3, Varnish, and Pound on Thursday, October 11, 2007 at 10:15 AM

    [...] is a more detailed review of how they manage serving files from Amazons S3 service at WordPress.com. They explain how a [...]

  3. [...] so-you-wanna-see-an-image at CodeWord: Apokalyptik – We’ve been asked how we manage serving files from Amazons very cool S3 service at WordPress.com… This is how. (covering a requested image already stored on S3, not the upload -> s3 process) [...]

  4. Static hostname hashing in Pound « Barry on WordPress on Wednesday, October 31, 2007 at 10:37 PM

    [...] however, and it has to do with the way we serve images. As Demitrious explained in his detailed post, when a request for an image is made, pound sends the request to a cache server running Varnish. [...]

  5. Amazon AWS Outage « Barry on WordPress on Friday, February 15, 2008 at 6:44 AM

    [...] degraded this morning. This is the first significant outage since we started using S3 to serve images for WordPress.com. Currently we serve about 1500 image requests per second across WordPress.com. [...]

  6. Amazon S3 Outage | Robert Accettura’s Fun With Wordage on Sunday, July 20, 2008 at 6:14 PM

    [...] uses S3, but proxies that with Varnish. There’s a brief description here, and a more detailed breakdown here. According to Barry Abrahamson, WordPress.com does 1500 image requests per second across and 80-100 [...]