ETag and HTTP Caching

A common use of the HTTP ETag header is client-side caching of GET responses. The caching workflow also involves conditional request headers such as If-Match and If-None-Match, and the way these headers interact can be confusing.

Whenever I deal with this, I end up rereading the relevant MDN pages[1][2][3] to refresh my memory. I have done it enough times that it is worth writing down.

Caching Responses for GET Endpoints

The basic workflow is as follows:

  • The client sends a GET request to the server.
  • The server responds with 200 OK, including the requested content and an ETag header.
  • The client caches this response along with its ETag value.
  • For subsequent requests for the same resource, the client will include the If-None-Match header with the cached ETag value.
  • The server regenerates the ETag and checks if it matches the value provided by the client.
    • If it matches, the server replies with a 304 Not Modified status, indicating that the client’s cache is still valid, and the client continues to use the cached resource.
    • If it does not match, the server replies with a 200 OK status, along with new content and a new ETag header, prompting the client to update its cache.
Client                                 Server
  |                                       |
  |----- GET Request -------------------->|
  |                                       |
  |<---- Response 200 OK + ETag ----------|
  |     (Cache response locally)          |
  |                                       |
  |----- GET Request + If-None-Match ---->|  (If-None-Match == previous ETag)
  |                                       |
  |       Does ETag match?                |
  |<---- Yes: 304 Not Modified -----------|  (No body sent; use local cache)
  |<---- No: 200 OK + New ETag -----------|  (Update cached response)
  |                                       |
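To make the client side of the diagram concrete, here is a minimal Go sketch. The cachingClient type, its in-memory map, and the demo server with a fixed ETag are hypothetical illustrations, not part of any library. The client stores each 200 response together with its ETag and sends that ETag back in If-None-Match. When the server answers 304, it reuses the stored body.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// cachedResponse is what the client keeps locally for one URL.
type cachedResponse struct {
	etag string
	body []byte
}

// cachingClient is a hypothetical in-memory client cache keyed by URL.
// It is not safe for concurrent use.
type cachingClient struct {
	client *http.Client
	cache  map[string]cachedResponse
}

// Get returns the body for url and reports whether it came from the cache.
func (c *cachingClient) Get(url string) ([]byte, bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, false, err
	}
	cached, ok := c.cache[url]
	if ok {
		// Revalidate: ask the server to skip the body if the ETag still matches.
		req.Header.Set("If-None-Match", cached.etag)
	}
	resp, err := c.client.Do(req)
	if err != nil {
		return nil, false, err
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotModified && ok {
		return cached.body, true, nil // no body was sent; reuse the cached copy
	}
	if resp.StatusCode != http.StatusOK {
		return nil, false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, false, err
	}
	if etag := resp.Header.Get("ETag"); etag != "" {
		c.cache[url] = cachedResponse{etag: etag, body: body} // store body + validator
	}
	return body, false, nil
}

// demo makes two requests to a throwaway server whose ETag never changes
// and reports whether each response was served from the cache.
func demo() (first, second bool) {
	const etag = `"v1"` // hypothetical fixed validator
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		fmt.Fprint(w, `{"login":"demo"}`)
	}))
	defer srv.Close()

	c := &cachingClient{client: srv.Client(), cache: map[string]cachedResponse{}}
	_, first, _ = c.Get(srv.URL)
	_, second, _ = c.Get(srv.URL)
	return first, second
}

func main() {
	first, second := demo()
	fmt.Println("first from cache:", first, "second from cache:", second)
}
```

The first call goes to the network and fills the cache. The second call is answered with 304, so the body comes from the local map.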

We can try this workflow against GitHub's REST API using the GitHub CLI[4]. After installing and authenticating, make a request like this:

gh api -i /users/rednafi

This will request data related to the user rednafi. The response is as follows:

HTTP/2.0 200 OK
Etag: W/"b8fdfabd59aed6e0e602dd140c0a0ff48a665cac791dede458c5109bf4bf9463"

{
  "login":"rednafi",
  "id":30027932,
  ...
}

I have simplified the response body and ignored irrelevant headers. You can see that the HTTP status code is 200 OK, and the server includes an ETag header.

The W/ prefix marks a weak validator[5]. Unlike a strong validator, it does not promise byte-for-byte identity. It only claims that two representations are semantically equivalent. This lets a server keep the same weak ETag when, for example, only the formatting of a JSON response changes, since two differently formatted JSON documents with the same content are semantically equivalent.
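GitHub does not document how it computes its weak ETags, so the following is only a sketch of one way a server could make a weak validator ignore JSON formatting. It canonicalizes the body with encoding/json's Compact, which removes insignificant whitespace, and then hashes the result. The weakJSONETag name is hypothetical. Key order is not normalized, so reordered keys would still change the tag.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// weakJSONETag (hypothetical helper) removes insignificant whitespace with
// json.Compact before hashing, so bodies that differ only in formatting get
// the same weak ETag. It does NOT reorder object keys.
func weakJSONETag(body []byte) (string, error) {
	var buf bytes.Buffer
	if err := json.Compact(&buf, body); err != nil {
		return "", err // not valid JSON
	}
	sum := sha256.Sum256(buf.Bytes())
	return `W/"` + hex.EncodeToString(sum[:]) + `"`, nil
}

func main() {
	pretty := []byte("{\n  \"login\": \"rednafi\",\n  \"id\": 30027932\n}")
	compact := []byte(`{"login":"rednafi","id":30027932}`)

	a, _ := weakJSONETag(pretty)
	b, _ := weakJSONETag(compact)
	fmt.Println(a == b) // only whitespace differs, so the tags are equal
}
```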

What happens if we resend the request and include the ETag value in the If-None-Match header?

gh api -i -H \
    'If-None-Match: W/"b8fdfabd59aed6e0e602dd140c0a0ff48a665cac791dede458c5109bf4bf9463"' \
    /users/rednafi

The return result:

HTTP/2.0 304 Not Modified
Etag: "b8fdfabd59aed6e0e602dd140c0a0ff48a665cac791dede458c5109bf4bf9463"

gh: HTTP 304

This means the client's cached response is still valid. The client can use the cached copy instead of downloading the data again.

Several key points to note:

  • When sending the If-None-Match header, enclose the ETag value in double quotes. The entity-tag syntax requires them[6].

  • If-None-Match is a negative condition. It holds when none of the client's ETags match the server's current ETag, and the server then processes the request normally and returns 200 OK with the full body. When a value does match, the condition fails, and for a GET the server returns 304 Not Modified without a response body.

  • A compliant server must use the weak comparison function when evaluating If-None-Match[7]. Weak comparison ignores the W/ prefix and compares only the quoted values. As a result, a client can still revalidate its cache when the server issues weak ETags that stay the same across minor changes in representation.

  • If the client is a browser, it will automatically manage the cache and initiate conditional requests without requiring additional user action.
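The comparison rules above can be sketched in Go. parseETag, strongMatch, weakMatch, and noneMatch are hypothetical helpers written for this post. The main function prints the four cases from the comparison table in RFC 7232 section 2.3.2. noneMatch splits the header on commas, which assumes that no ETag contains a comma (the grammar technically allows one).

```go
package main

import (
	"fmt"
	"strings"
)

// entityTag is a parsed ETag: the quoted opaque value plus the weak flag.
type entityTag struct {
	weak   bool
	opaque string // includes the surrounding double quotes
}

// parseETag parses a single entity-tag such as `"abc"` or `W/"abc"`.
// Values that are not enclosed in double quotes are rejected.
func parseETag(s string) (entityTag, bool) {
	s = strings.TrimSpace(s)
	weak := strings.HasPrefix(s, "W/")
	s = strings.TrimPrefix(s, "W/")
	if len(s) < 2 || s[0] != '"' || s[len(s)-1] != '"' {
		return entityTag{}, false
	}
	return entityTag{weak: weak, opaque: s}, true
}

// strongMatch: both tags must be strong and have identical opaque values.
func strongMatch(a, b entityTag) bool {
	return !a.weak && !b.weak && a.opaque == b.opaque
}

// weakMatch: opaque values must be identical; the W/ flag is ignored.
func weakMatch(a, b entityTag) bool {
	return a.opaque == b.opaque
}

// noneMatch evaluates If-None-Match for a resource that currently exists
// with ETag current. It returns false ("send 304" for GET/HEAD) when the
// header is "*" or when any listed tag weakly matches current.
func noneMatch(header, current string) bool {
	if strings.TrimSpace(header) == "*" {
		return false
	}
	cur, ok := parseETag(current)
	if !ok {
		return true
	}
	for _, part := range strings.Split(header, ",") {
		if tag, ok := parseETag(part); ok && weakMatch(tag, cur) {
			return false
		}
	}
	return true
}

func main() {
	pairs := [][2]string{
		{`W/"1"`, `W/"1"`},
		{`W/"1"`, `W/"2"`},
		{`W/"1"`, `"1"`},
		{`"1"`, `"1"`},
	}
	for _, p := range pairs {
		a, _ := parseETag(p[0])
		b, _ := parseETag(p[1])
		fmt.Printf("%-6s %-6s strong=%v weak=%v\n", p[0], p[1], strongMatch(a, b), weakMatch(a, b))
	}
}
```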

Writing a Server that Supports Client Caching

When serving static content, a load balancer can be configured to handle this caching workflow. Dynamic GET endpoints, however, need the application server itself to generate ETags and evaluate the conditional headers.

Here is a simple Go server that implements this workflow for a dynamic GET endpoint:

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "log"
    "net/http"
    "strings"
)

// calculateETag generates a weak ETag by hashing the content with SHA-256
// and adding the W/ prefix to mark it for weak comparison.
func calculateETag(content string) string {
    hasher := sha256.New()
    hasher.Write([]byte(content))
    hash := hex.EncodeToString(hasher.Sum(nil))
    return fmt.Sprintf(`W/"%s"`, hash)
}

// opaqueTag strips the optional W/ prefix first and then the surrounding
// double quotes, leaving only the value used for weak comparison.
func opaqueTag(tag string) string {
    tag = strings.TrimPrefix(strings.TrimSpace(tag), "W/")
    return strings.Trim(tag, `"`)
}

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Define the content in the handler
        content := `{"message": "Hello, world!"}`
        eTag := calculateETag(content)

        // Weak comparison: ignore W/ and quotes on both sides. The header may
        // list several comma-separated ETags; any match means the cache is valid.
        for _, tag := range strings.Split(r.Header.Get("If-None-Match"), ",") {
            if opaqueTag(tag) == opaqueTag(eTag) {
                w.WriteHeader(http.StatusNotModified)
                return
            }
        }

        // If no ETag matches, return the content and its ETag
        w.Header().Set("ETag", eTag) // Send weak ETag
        w.Header().Set("Content-Type", "application/json")
        w.WriteHeader(http.StatusOK)
        fmt.Fprint(w, content)
    })

    fmt.Println("Server is running on http://localhost:8080")
    log.Fatal(http.ListenAndServe(":8080", nil))
}

  • The server generates a weak ETag for the content by hashing it with SHA-256 and adding the W/ prefix, which marks it for weak comparison.

  • The server includes this weak ETag when sending content, allowing the client to cache the content along with its ETag.

  • For subsequent requests, the server checks if the client has sent the If-None-Match header with the ETag and performs a weak comparison with the current content’s ETag.

  • If the ETags match, the content has not changed in any meaningful way. The server replies with 304 Not Modified, and the client keeps using its cached copy. Otherwise, the server sends the content with a 200 OK status and the new ETag, and the client updates its cache.
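Go's standard library can also perform this comparison. If a handler sets the ETag header before calling http.ServeContent, ServeContent evaluates If-Match, If-None-Match, and If-Range against it, using weak comparison for If-None-Match, and answers 304 when appropriate. The sketch below assumes the whole body fits in memory. The handler, etagFor, and statusFor names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// content stands in for whatever the endpoint would normally produce.
const content = `{"message": "Hello, world!"}`

// etagFor returns a weak ETag derived from a SHA-256 hash of s.
func etagFor(s string) string {
	sum := sha256.Sum256([]byte(s))
	return `W/"` + hex.EncodeToString(sum[:]) + `"`
}

// handler sets the ETag first, then lets http.ServeContent evaluate the
// conditional request headers against it.
func handler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("ETag", etagFor(content))
	w.Header().Set("Content-Type", "application/json")
	// A zero modtime means ServeContent sends no Last-Modified header.
	http.ServeContent(w, r, "", time.Time{}, strings.NewReader(content))
}

// statusFor issues an in-memory GET with an optional If-None-Match value.
func statusFor(ifNoneMatch string) int {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	handler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(""))                 // 200
	fmt.Println(statusFor(etagFor(content)))   // 304
	fmt.Println(statusFor(`"something-else"`)) // 200
}
```

The trade-off is that ServeContent needs an io.ReadSeeker for the body. For large or streamed responses, the hand-written comparison above may be a better fit.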

Start the server by running go run main.go and test it with the following command:

curl -i http://localhost:8080/foo

This will return a JSON response containing the ETag:

HTTP/1.1 200 OK
Content-Type: application/json
Etag: W/"1d3b4242cc9039faa663d7ca51a25798e91fbf7675c9007c2b0470b72c2ed2f3"
Date: Wed, 10 Apr 2024 15:54:33 GMT
Content-Length: 28

{"message": "Hello, world!"}

Make another request and use the ETag value in the If-None-Match header:

curl -i -H \
    'If-None-Match: "1d3b4242cc9039faa663d7ca51a25798e91fbf7675c9007c2b0470b72c2ed2f3"' \
    http://localhost:8080/foo

This will return a 304 Not Modified response with no content:

HTTP/1.1 304 Not Modified
Date: Wed, 10 Apr 2024 15:57:25 GMT

In a real application, you might move this logic into middleware so that every GET endpoint supports client caching without repeating the code in each handler.
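Here is a minimal sketch of such middleware, assuming responses are small enough to buffer in memory. The withETag and bufferedWriter names are hypothetical. For a 200 GET, the middleware hashes whatever the handler wrote and sets a weak ETag. It returns 304 when If-None-Match matches under weak comparison. It does not handle If-None-Match: * or streaming responses.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bufferedWriter captures a handler's headers, status, and body so the
// middleware can hash the body before anything reaches the client.
type bufferedWriter struct {
	header http.Header
	status int
	body   bytes.Buffer
}

func (b *bufferedWriter) Header() http.Header         { return b.header }
func (b *bufferedWriter) Write(p []byte) (int, error) { return b.body.Write(p) }
func (b *bufferedWriter) WriteHeader(code int)        { b.status = code }

// withETag (hypothetical) adds weak-ETag revalidation to GET handlers.
func withETag(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			next.ServeHTTP(w, r)
			return
		}
		buf := &bufferedWriter{header: http.Header{}, status: http.StatusOK}
		next.ServeHTTP(buf, r)

		for k, v := range buf.header { // copy the handler's headers
			w.Header()[k] = v
		}
		if buf.status != http.StatusOK { // only cache successful responses
			w.WriteHeader(buf.status)
			w.Write(buf.body.Bytes())
			return
		}

		sum := sha256.Sum256(buf.body.Bytes())
		etag := `W/"` + hex.EncodeToString(sum[:]) + `"`
		w.Header().Set("ETag", etag)

		// Weak comparison: drop W/ on both sides and compare the quoted values.
		for _, tag := range strings.Split(r.Header.Get("If-None-Match"), ",") {
			if strings.TrimPrefix(strings.TrimSpace(tag), "W/") == strings.TrimPrefix(etag, "W/") {
				w.Header().Del("Content-Type")
				w.WriteHeader(http.StatusNotModified) // no body on 304
				return
			}
		}
		w.WriteHeader(http.StatusOK)
		w.Write(buf.body.Bytes())
	})
}

// demo sends one GET through the middleware and returns status and ETag.
func demo(ifNoneMatch string) (int, string) {
	h := withETag(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"message": "Hello, world!"}`)
	}))
	req := httptest.NewRequest(http.MethodGet, "/foo", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("ETag")
}

func main() {
	code, etag := demo("")
	fmt.Println(code, etag)
	code, _ = demo(etag)
	fmt.Println(code)
}
```

Buffering keeps the handlers unaware of caching. The cost is holding each response in memory and computing the full body even when the client ends up receiving a 304.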

Considerations

When building a server that supports caching, ensure that the system configuration allows the server to consistently return the same ETag for the same content, even if multiple servers are operating behind the scenes. If different servers generate different ETags for the same content, it can lead to client cache confusion.

Clients rely on ETags to determine whether content has changed. If the ETag is unchanged, they assume the content is unchanged and skip the download, which saves bandwidth and speeds up responses. If ETags differ between servers, however, clients re-download content they already have, wasting bandwidth and slowing things down.

The servers also end up handling requests that the cache could have served. Keeping ETags consistent across servers avoids this.
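A small simulation shows the difference. Two in-memory handlers stand in for replicas behind a load balancer: the first request goes to one replica and the revalidation goes to the other. The replica, contentETag, and revalidateAcross names are hypothetical. The perInstance flag mimics validators derived from node-local state, such as file metadata that differs between machines. The exact string comparison is a simplification.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

const body = `{"message": "Hello, world!"}`

// contentETag depends only on the bytes served, so every replica agrees.
func contentETag(s string) string {
	sum := sha256.Sum256([]byte(s))
	return `W/"` + hex.EncodeToString(sum[:]) + `"`
}

// replica builds the handler for one server instance. With perInstance set,
// the ETag also mixes in the (hypothetical) replica name.
func replica(name string, perInstance bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		etag := contentETag(body)
		if perInstance {
			etag = contentETag(name + body)
		}
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		fmt.Fprint(w, body)
	})
}

// revalidateAcross fetches from replica a, then revalidates on replica b,
// as when a load balancer routes the two requests to different servers.
func revalidateAcross(a, b http.Handler) int {
	rec := httptest.NewRecorder()
	a.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	etag := rec.Header().Get("ETag")

	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("If-None-Match", etag)
	rec = httptest.NewRecorder()
	b.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("content-derived:", revalidateAcross(replica("a", false), replica("b", false)))
	fmt.Println("per-instance:   ", revalidateAcross(replica("a", true), replica("b", true)))
}
```

With content-derived tags, the second replica answers 304. With per-instance tags, it sends the full body again even though nothing changed.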

References

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag
[2] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Match
[3] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match
[4] https://cli.github.com/
[5] https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests#weak_validation
[6] https://www.rfc-editor.org/rfc/rfc7232#section-3.2
[7] https://www.rfc-editor.org/rfc/rfc7232#section-2.3.2
