fix: recover latin-1 encoded Location headers on redirects#12325
fix: recover latin-1 encoded Location headers on redirects#12325MAXDVVV wants to merge 3 commits intoaio-libs:masterfrom
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #12325 +/- ##
==========================================
- Coverage 99.11% 99.11% -0.01%
==========================================
Files 130 130
Lines 45558 45623 +65
Branches 2404 2406 +2
==========================================
+ Hits 45156 45218 +62
- Misses 272 275 +3
Partials 130 130
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Merging this PR will not alter performance
Comparing Footnotes
|
aiohttp/client.py
Outdated
| except (UnicodeEncodeError, UnicodeDecodeError): | ||
| try: | ||
| raw = r_url.encode("utf-8", "surrogateescape") | ||
| r_url = raw.decode("latin-1") |
There was a problem hiding this comment.
What if it's not latin-1? This seems unreasonable for us to just start guessing charsets randomly.
If fallback_charset_resolver is set, we could use that instead maybe?
Problem
When a server sends a
Locationheader containing raw latin-1 encoded bytes (e.g.\xf8forø), the redirect URL gets corrupted.Redirect chain example (from #10047):
Root cause
The HTTP parser decodes header values with
utf-8/surrogateescape(http_parser.py L208). When a server sends raw latin-1 bytes in theLocationheader (which some servers do, despite RFC violations), bytes like\xf8are not valid UTF-8 and get decoded as surrogates like\udcf8. These surrogates then causeURL()to produce a broken URL.Fix
In the redirect handling code (
client.py), after reading theLocationheader value, detect if it contains surrogates (can't encode to UTF-8). If so, round-trip throughsurrogateescapeback to bytes and decode as latin-1, recovering the original characters:This is a targeted fix that only affects redirect URL processing, not general header decoding.
Verification
Fixes #10047