Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Response headers vary: Accept-Encoding is stopping undici from caching the http request #3959

Open
SukkaW opened this issue Dec 17, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@SukkaW
Copy link
Contributor

SukkaW commented Dec 17, 2024

Bug Description

I was trying to try out undici cache with the following code snippets:

import undici, {
  interceptors,
  Agent,
  setGlobalDispatcher
} from 'undici';

setGlobalDispatcher(agent.compose(
  interceptors.cache({
    store: new undici.cacheStores.MemoryCacheStore()
  })
));

(async () => {
  console.time('fetch');
  await undici.request('https://easylist.to/easylist/easylist.txt');
  console.timeEnd('fetch');

  console.time('fetch 2');
  await undici.request('https://easylist.to/easylist/easylist.txt');
  console.timeEnd('fetch 2');
})();

However, I noticed that the request is never cached.

Reproducible By

See code snippet above.

Expected Behavior

The request should be cached and the fetch 2 should happen "instantly".

Environment

Node.js v22.11.0 on Darwin

Additional context

I noticed that https://easylist.to/easylist/easylist.txt is returning the Vary: Accept-Encoding response header regardless of whether Accept-Encoding is included in the request header. This causes undici to panic because undici doesn't expect Accept-Encoding to be included in the cacheKey:

varyDirectives = parseVaryHeader(headers.vary, this.#cacheKey.headers)
if (!varyDirectives) {
// Parse error
return downstreamOnHeaders()
}

cc @flakey5

@SukkaW SukkaW added the bug Something isn't working label Dec 17, 2024
@SukkaW
Copy link
Contributor Author

SukkaW commented Dec 17, 2024

Many CDNs will always return the Vary: Accept-Encoding header whether the HTTP client is sending an Accept-Encoding request header or not. However undici.request doesn't include Accept-Encoding and gzip & br decompression.

Same for GitHub RAW. GitHub RAW always returns Vary: Authorization,Accept-Encoding,Origin header. However, the Authorization and Origin headers might not always exist. The cache should still work w/o those exists in the request headers

@mcollina
Copy link
Member

Thanks for reporting! Would you like to send a Pull Request to address this issue? Remember to add unit tests.

@Uzlopak Uzlopak changed the title Rresponse headers vary: Accept-Encoding is stopping undici from caching the http request Response headers vary: Accept-Encoding is stopping undici from caching the http request Dec 18, 2024
@SukkaW
Copy link
Contributor Author

SukkaW commented Dec 18, 2024

Thanks for reporting! Would you like to send a Pull Request to address this issue? Remember to add unit tests.

I am not sure of the proper way to fix this:

We could simply ignore any missing request headers (those undici are not sending) when parsing the Vary response header, but I am afraid that later requests with those request headers presented will result in cache poisoning.

Or we could use a placeholder for those missing request headers, but the placeholder couldn't be a symbol because it can't be properly serialized for storing.

@mcollina
Copy link
Member

What does the standard say? cc @flakey5

@flakey5
Copy link
Member

flakey5 commented Dec 18, 2024

If (after any normalization that might take place) a header field [specified in the vary header] is absent from a request, it can only match another request if it is also absent there.

section 4.1-4

So I think we can ignore it if it's present in the vary header but isn't actually present in the request. The fix would be removing this else block:

undici/lib/util/cache.js

Lines 273 to 275 in 3eeeeb7

} else {
return undefined
}

I don't think cache poisoning would be a problem, but it is noteworthy that this change would mean that these two requests considered the same:

GET / HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
Date: asd
Vary: a

asd123
GET / HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
Date: asd

asd123

I am afraid that later requests with those request headers presented will result in cache poisoning.

It shouldn't, cache stores get the result of the parsed vary header which maps the header's name to the header's value. The store should be using it to find the request to use. If we ignore a header, it won't be in this mapping meaning it shouldn't be taken into consideration when deciding which response to use.

Regardless, that is definitely something that needs be covered in a unit test if we go down this road.

@flakey5
Copy link
Member

flakey5 commented Dec 18, 2024

but the placeholder couldn't be a symbol because it can't be properly serialized for storing.

Wdyt of using null?

@SukkaW
Copy link
Contributor Author

SukkaW commented Dec 19, 2024

So I think we can ignore it if it's present in the vary header but isn't actually present in the request. The fix would be removing this else block:

I don't think cache poisoning would be a problem, but it is noteworthy that this change would mean that these two requests considered the same:

It shouldn't, cache stores get the result of the parsed vary header which maps the header's name to the header's value. The store should be using it to find the request to use. If we ignore a header, it won't be in this mapping meaning it shouldn't be taken into consideration when deciding which response to use.

I am still not sure if this is the path we should take. I am not sure how to properly implement this now.

but the placeholder couldn't be a symbol because it can't be properly serialized for storing.

Wdyt of using null?

This should work as long as someone doesn't ignore the typescript definitions and passes a Record<string, string | null>.

@SukkaW
Copy link
Contributor Author

SukkaW commented Jan 2, 2025

I am still not sure if this is the path we should take

@flakey5 What's your suggestions on this?

@flakey5
Copy link
Member

flakey5 commented Jan 5, 2025

Sorry for lateish response - looking a bit deeper into this and I think was wrong here:

So I think we can ignore it if it's present in the vary header but isn't actually present in the request.

We can still continue parsing the vary header if we get to one that's in vary but isn't in the request ofc, however, cache stores still need to use it when deciding if a request is cached or not.

My original suggested change wouldn't allow for this and would mean that these two request & responses are seen as the same:

GET / HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
Date: asd
Vary: a

asd123
GET / HTTP/1.1
Host: localhost
a: something

HTTP/1.1 200 OK
Date: asd

asd123

This goes against the spec. So we would need to do something similar to what @SukkaW mentioned here

Or we could use a placeholder for those missing request headers

I'm not really sure on the best way to do this however. The best thing I can think of right now is what I mentioned here #3959 (comment) but I don't particularly like it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants