S3 Static Website: Return HTTP 410

Question

Background

I have a static website on S3 with tens of thousands of HTML pages indexed on Google. I'm moving to a new version and I want to remove old pages (which may no longer exist) from Google's index. I've read online that the most efficient way to do that is to return HTTP 410 (Gone).

Problem

According to http://docs.aws.amazon.com/AmazonS3/latest/dev/CustomErrorDocSupport.html, you cannot return an HTTP 410 when using S3 static website hosting.

API Gateway

I created an API Gateway mock integration that returns HTTP 410, then configured my S3 bucket to automatically redirect a specific prefix to its URL. However, the status code seen is HTTP 301 (for the first redirect). If I GET the API endpoint directly, I receive the 410 successfully; if I reach the API through an S3 GET, the status code is 301.
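
To make the 301 observation concrete, here is a minimal sketch of how a client sees the two hops. It uses simulated responses rather than real requests; the URLs, `trace_redirects`, and `fake_fetch` are all hypothetical names chosen for illustration. The point it shows is only that the 301 belongs to the first response (the S3 redirect) and the 410 to the second (the API endpoint), so any client that records the first status will report 301.

```python
from urllib.parse import urljoin

REDIRECT_CODES = (301, 302, 303, 307, 308)


def trace_redirects(start_url, fetch, max_hops=5):
    """Return [(url, status), ...] for every hop, following Location headers.

    `fetch` is any callable that takes a URL and returns (status, location).
    """
    hops = []
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        # Stop as soon as the response is not a redirect.
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)
    return hops


# Simulated responses (assumed, not captured from a real bucket or API).
FAKE_RESPONSES = {
    "http://bucket.example/old/page.html": (301, "https://api.example/gone"),
    "https://api.example/gone": (410, None),
}


def fake_fetch(url):
    return FAKE_RESPONSES[url]


for hop_url, hop_status in trace_redirects(
    "http://bucket.example/old/page.html", fake_fetch
):
    print(hop_status, hop_url)
```

Whether a crawler treats the final 410 as applying to the original URL is a separate question that this sketch does not answer.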

What's next

If anyone has an idea on how to return HTTP 410 on a static website hosted on S3, let me know.

Additionally, if you can think of a better alternative for de-indexing old pages from Google (the manual tool isn't a solution, as I have a large number of pages), let me know :)


seo | amazon-web-services | amazon-s3 | redirect | 2016-12-30 17:12 | 1 answer

Answers to S3 Static Website: Return HTTP 410 (1)

  1. 2016-12-30 19:12

    I really feel that a better answer would be to put a server in front of the S3 content, backed by a very simple database table. Your real issue is distinguishing a 410 from a 404. That is, you know a page is gone, but how do you tell that apart from a typo or other error?

    What I would envision is a table indexed by path name (e.g., /path/to/my/file.html) with a status of some sort. The server takes in a request for the full path, looks it up in the database, and either serves the page (if it is marked "active" or "available") or returns a 410 if the page is known to be inactive. If the path can't be found in the database at all, it returns a 404.
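
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not a full server: an in-memory dict stands in for the database table, and `PageState`, `PAGES`, and `status_for` are hypothetical names. A real deployment would call something like `status_for` from its request handler and then fetch the object from S3 when the result is 200.

```python
from enum import Enum
from http import HTTPStatus


class PageState(Enum):
    ACTIVE = "active"
    GONE = "gone"


# Stand-in for the database table, keyed by full path (assumed example data).
PAGES = {
    "/index.html": PageState.ACTIVE,
    "/path/to/my/file.html": PageState.GONE,
}


def status_for(path, pages=PAGES):
    """Map a request path to the HTTP status the server should return."""
    state = pages.get(path)
    if state is None:
        # Unknown path: a typo or a page that never existed.
        return HTTPStatus.NOT_FOUND
    if state is PageState.GONE:
        # Known path that has been deliberately removed.
        return HTTPStatus.GONE
    # Known, active path: the server would serve the page from S3.
    return HTTPStatus.OK


for request_path in ("/index.html", "/path/to/my/file.html", "/typo.html"):
    print(request_path, int(status_for(request_path)))
```

Keeping the decision in a pure function like this makes it easy to test separately from whatever web framework or proxy ends up serving the content.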

    The two issues I see with this approach are:

    1. The initial population of the database. If you've already removed the pages from S3, how will you know which paths to insert with a "not available" flag? I'm not sure how many pages we're talking about, but the first load could be quite large.
    2. Maintenance - you will likely need an administrative interface of some sort down the road for the next time you need to deactivate some number of pages.

    There are content management systems that will do some of this for you, or it wouldn't be too hard to write a simple server for it, subject to the issues I've outlined.
