Amazon S3 bandwidth protection, improved

Background

Amazon S3 charges by bandwidth used. Alas, when you use S3 to host web content[1], other people's websites can include your images or scripts. If they copy the files onto their own web host, you won't be charged. But if their sites include your files directly, Amazon makes you pay for the bandwidth. That's nicknamed "hotlinking". In HTML, it looks like <img src="//you.s3.amazonaws.com/something.png">[2] or <link rel="stylesheet" href="//you.s3.amazonaws.com/style.css">.

HTTP has a sketchy header named "Referer" (sic[3]) that can help prevent this. Its main goal in life is to violate a Web user's privacy by telling each site she visits which page on which Web site linked her there. (Or less cynically, to help webmasters improve their sites using this information.)

Sending "Referer" is not mandatory. Web users can disable it in part or in whole.[4] In my time with Referer turned off, the only pages I've seen suffer are some payment and website-administration sites that use this header for an additional modicum of security. Don't be the one whose site breaks with Referer turned off.

Method

Most people browse with Referer on, happily for hotlink-prevention.

The server that hosts your static website files can forbid requests whose Referer is a site other than yours. It can still allow requests that contain no Referer header, because Web authors can't disable Referer for embedded content; only Web users can. (There is a new link attribute rel="noreferrer" which, for this reason, only applies to external links like <a>.[5])
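As a rough illustration of that rule, here is a minimal Python sketch of the decision a server could make for each request. The function name should_serve and the ALLOWED_PREFIXES list are made up for this example; they are not part of S3 or of any web server.

```python
from typing import Optional

# Hypothetical allow-list for this sketch; substitute your own site's origins.
ALLOWED_PREFIXES = (
    "http://idupree.com/",
    "https://idupree.com/",
    "http://www.idupree.com/",
    "https://www.idupree.com/",
)


def should_serve(referer: Optional[str]) -> bool:
    """Return True if a request carrying this Referer value should be served."""
    if referer is None:
        # No Referer header at all: the user disabled it, typed the URL,
        # or followed a bookmark. Hotlinking sites can't force this case.
        return True
    # A Referer is present: serve only if it starts with one of our origins.
    return referer.startswith(ALLOWED_PREFIXES)


print(should_serve(None))
print(should_serve("https://idupree.com/blog/"))
print(should_serve("http://example.net/page.html"))
```

Matching on a full prefix that ends in "/" is deliberate: it avoids accepting look-alike hosts such as idupree.com.example.net.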

For Apache, plenty of sites tell you how. For Amazon S3 there are a few, but the policies they offer failed my test of staying accessible with Referer disabled. So I tweaked their S3 bucket policy (which is JSON) until it did what I wanted.
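A test like that can be scripted. The sketch below is only one way to do it, using Python's standard urllib; the helper names build_request and fetch_status are invented for this example, and the URL in the usage note is a placeholder for one of your own files.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_request(url: str, referer: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding a Referer header only if one is given."""
    request = urllib.request.Request(url)
    if referer is not None:
        request.add_header("Referer", referer)
    return request


def fetch_status(url: str, referer: Optional[str] = None) -> int:
    """Send the request and return the HTTP status code, even for errors."""
    try:
        with urllib.request.urlopen(build_request(url, referer), timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Refused requests raise HTTPError; the code is still what we want.
        return error.code
```

For example, calling fetch_status("http://idupree.s3.amazonaws.com/something.png") with no referer, then with referer="http://idupree.com/", then with referer="http://example.net/" should show whether the first two are served and the third is refused (a denied request typically comes back as 403).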

My S3 bucket policy

{
  "Version": "2008-10-17",
  "Id": "refererGuard",

  "Statement": [
  {
    "Sid": "1",
    "Effect": "Allow",
    "Principal": { "AWS": "*" },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::idupree/*",
    "Condition": {
      "StringNotLike": {
        "aws:Referer": [
          "*"
        ]
      }
    }
  },
  {
    "Sid": "2",
    "Effect": "Allow",
    "Principal": { "AWS": "*" },
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::idupree/*",
    "Condition": {
      "StringLike": {
        "aws:Referer": [
          "http://idupree.com/*",
          "https://idupree.com/*",
          "http://www.idupree.com/*",
          "https://www.idupree.com/*",
          "http://idupree.s3.amazonaws.com/*",
          "https://idupree.s3.amazonaws.com/*",
          "http://localhost/*",
          "https://localhost/*"
        ]
      }
    }
  }
  ]
}

Change every instance of 'idupree' to the name of your S3 bucket and 'idupree.com' to your web domain.
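If you would rather not edit the names by hand, a small script can fill them in. This is just a sketch that reproduces the policy above for a given bucket and domain; the function name make_policy is my own invention.

```python
import json


def make_policy(bucket: str, domain: str) -> dict:
    """Build the two-statement referer policy for one bucket and one domain."""
    resource = f"arn:aws:s3:::{bucket}/*"

    # Same hosts, in the same order, as the hand-written policy.
    hosts = (domain, f"www.{domain}", f"{bucket}.s3.amazonaws.com", "localhost")
    approved = [f"{scheme}://{host}/*" for host in hosts for scheme in ("http", "https")]

    def statement(sid: str, operator: str, patterns: list) -> dict:
        return {
            "Sid": sid,
            "Effect": "Allow",
            "Principal": {"AWS": "*"},
            "Action": "s3:GetObject",
            "Resource": resource,
            "Condition": {operator: {"aws:Referer": patterns}},
        }

    return {
        "Version": "2008-10-17",
        "Id": "refererGuard",
        "Statement": [
            statement("1", "StringNotLike", ["*"]),  # requests with no Referer
            statement("2", "StringLike", approved),  # requests from approved pages
        ],
    }


# Print the policy as JSON, ready to paste into the bucket policy editor.
print(json.dumps(make_policy("idupree", "idupree.com"), indent=2))
```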

The first Statement allows no-referrer requests, and the second Statement allows requests that have any referrer that you approve therein. (Adding "" in the list of "aws:Referer"s did not suffice to permit no-referrer requests.)
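To see how the two Statements combine, here is a small Python model of them. It is an approximation under two assumptions of mine: that Python's fnmatchcase is close enough to S3's "*" matching for these patterns, and that a missing Referer makes StringNotLike true and StringLike false (which matches the behaviour I observed). The function names are invented for this sketch.

```python
from fnmatch import fnmatchcase
from typing import List, Optional

APPROVED = [
    "http://idupree.com/*",
    "https://idupree.com/*",
    "http://www.idupree.com/*",
    "https://www.idupree.com/*",
    "http://idupree.s3.amazonaws.com/*",
    "https://idupree.s3.amazonaws.com/*",
    "http://localhost/*",
    "https://localhost/*",
]


def string_like(value: Optional[str], patterns: List[str]) -> bool:
    """Assumed model of StringLike: false when the header is absent."""
    if value is None:
        return False
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def string_not_like(value: Optional[str], patterns: List[str]) -> bool:
    """Assumed model of StringNotLike: true when the header is absent."""
    if value is None:
        return True
    return not any(fnmatchcase(value, pattern) for pattern in patterns)


def is_allowed(referer: Optional[str]) -> bool:
    """A request is served if either Allow statement's condition holds."""
    statement_1 = string_not_like(referer, ["*"])  # no Referer at all
    statement_2 = string_like(referer, APPROVED)   # an approved Referer
    return statement_1 or statement_2


print(is_allowed(None))
print(is_allowed("https://www.idupree.com/index.html"))
print(is_allowed("http://example.net/"))
```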

Cautions

Don't change "Version": "2008-10-17" to some other date: it identifies a version of the policy language, not the date you wrote your policy. When I put in a different YYYY-MM-DD date (in January 2012), Amazon gave me an obscure error message: "Policies must be valid JSON and the first byte must be '{' - undefined".

If you're tempted to put "*" to cover all your subdomains, consider first that it's a textual match. If you say to allow http://*.idupree.com/* you are also allowing http://MyFavoriteClown.net/url.that.mentions.idupree.com/index.html. That may rarely happen by accident. But do you really want to make it easier for your sketchiest competitors, archenemies and script kiddies who have a grudge against you to spend your money?[6] Practice your CAUTION TAPE skills here, before you take another shortcut and find yourself risking actual users' data, dollars, or dreadful delights[7].
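The textual-match problem is easy to reproduce. This sketch uses Python's fnmatchcase as a stand-in for S3's wildcard matching (an assumption on my part, but "*" means "any run of characters" in both); the function name matches is invented for the example.

```python
from fnmatch import fnmatchcase

# The tempting subdomain wildcard from the paragraph above.
PATTERN = "http://*.idupree.com/*"


def matches(referer: str) -> bool:
    """Textual wildcard match: '*' also swallows dots and slashes."""
    return fnmatchcase(referer, PATTERN)


for referer in (
    "http://blog.idupree.com/post.html",
    "http://MyFavoriteClown.net/url.that.mentions.idupree.com/index.html",
    "http://idupree.com/",
):
    print(matches(referer), referer)
```

In this model the clown's URL matches, because the "*" happily spans "MyFavoriteClown.net/url.that.mentions". Note also that the bare http://idupree.com/ does not match this pattern, since nothing precedes ".idupree.com", so the wildcard wouldn't even replace the explicit entries.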

Don't apply this policy to your HTML files, if you want the Web to work as it does normally. It would make it difficult for people (perhaps including yourself!) to link to your site in the usual ways: <a> tags on other sites, links in e-mails that the recipient reads in webmail, etc.[8]

If you want to host HTML on S3 and have normal links work, an additional statement Allow-ing "arn:aws:s3:::idupree/*.html" might work. (Other possibilities might include

Don't skip the HTTPS versions of the Referer patterns, because one day you'll try to enable HTTPS on your site and you'll have a heck of a time debugging why your files stopped loading.


  1. Amazon CloudFront is probably faster and equally cheap for this purpose, but seems to have no way to prevent hotlinking.

    NearlyFreeSpeech, a web hosting company, is prepaid and thus lets you limit the amount of money you're willing to spend. If Amazon let you set a spending limit for any of its Web services, that would also be sufficient to make me happy.

    It would also sort of work to frequently change the URL of your files there. But that would be silly, and degrade client-side caching, and mean that user bookmarks (if a user bookmarked an image file directly) would break even when you didn't change the image file's contents.

  2. Links that start with // (absolute links sans http: or https:) use the same protocol as the current page. This keeps encrypted pages fully encrypted (as is critical for security) and unencrypted pages not (which is less critical).

    If the destination server supports HTTPS under the same URLs as it supports HTTP, this is a good way to include content by default if you don't have a specific reason not to.

  3. The English word is spelt "referrer". In a rare fit of mistaken spelling, the engineers got it wrong, and it wasn't noticed until the time when fixing it would cause compatibility problems.

    RFC 2616 (HTTP 1.1) specifies Referer in section 14.36.

  4. RFC 2616 (HTTP 1.1) describes its optionality in section 15.1.3.

    Firefox lets you disable it by going to about:config and setting network.http.sendRefererHeader to 0, or installing any of a few addons. A quick search didn't tell me an easy way on the other browsers.

  5. Note that rel="noreferrer" only applies to <a> and <area> tags, not <link>, <img>, etc. I believe this was intentional but can't find my citation.

    Here is rel="noreferrer" in the WHATWG and W3C HTML5 specs.

  6. They can spend your money without help by going to your web pages repeatedly in a script, but S3's bandwidth prices are so low that they'd probably spend as much on download bandwidth as you're spending to serve it to them. But if they put up a popular web page that includes any of your files, no more luck. Of course, if they requisition a botnet, Referer is no protection at all, so it's up to you.

  7. The dreadful delights are for alliteration.
    You might be surprised how normal some of your users are.
    You might meet the most amazingly respectful person in the world
    and be surprised to find out that they like death metal

    and that they're still
    just as amazingly respectful
    as you sensed before.

  8. For unrelated security reasons, it's good to serve your HTML pages (both static and dynamic pages) with the HTTP header X-Frame-Options: SAMEORIGIN, unless you specifically intend a page to be included via <iframe> on websites on other domains. Amazon ignored me when I tried to use this header via s3cmd (1.0.0, in January 2012). I think I was doing it right, because adding Expires and/or Cache-Control headers the same way did work. This is yet another reason I don't plan to serve any HTML (text/html) or XHTML (application/xhtml+xml) content on Amazon's servers. If I use SVG (image/svg+xml) I'll be careful too, because SVG is sometimes permitted to have dynamic behaviour like HTML's JavaScript, forms, and links.