1 change: 1 addition & 0 deletions CHANGES/10047.bugfix.rst
@@ -0,0 +1 @@
Fixed redirect following when the server sends a ``Location`` header containing raw latin-1 encoded bytes (e.g. ``\xf8`` for ``ø``). Previously, these were decoded via UTF-8 surrogateescape, producing lone surrogates that broke URL parsing and caused 404 errors. The redirect URL is now recovered by round-tripping through latin-1 -- by :user:`lichuang9890-star`.
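The round trip described in this changelog entry can be sketched in isolation. This is a minimal illustration, not the code from the patch: `recover_latin1` and the sample path are hypothetical names chosen for the example, and the sketch assumes the header text was produced by decoding the wire bytes as UTF-8 with the `surrogateescape` error handler.

```python
def recover_latin1(text: str) -> str:
    """Hypothetical helper: undo a surrogateescape decode and reread as latin-1."""
    try:
        # Strict UTF-8 encoding fails only if the text holds lone surrogates.
        text.encode("utf-8")
        return text
    except UnicodeEncodeError:
        # surrogateescape turns each escaped surrogate back into its original byte.
        raw = text.encode("utf-8", "surrogateescape")
        # latin-1 maps every byte to a code point, so this decode cannot fail.
        return raw.decode("latin-1")


# Assumed wire form of a Location value containing a raw latin-1 byte.
wire = b"/caf\xf8"
decoded = wire.decode("utf-8", "surrogateescape")  # contains the lone surrogate \udcf8
recovered = recover_latin1(decoded)
```

Text without lone surrogates is returned unchanged, so the helper only touches values that strict UTF-8 encoding rejects.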
15 changes: 15 additions & 0 deletions aiohttp/client.py
@@ -847,6 +847,21 @@
# response is forbidden
resp.release()

# Some servers send Location headers with raw
# latin-1 bytes (e.g. \xf8 for ø). The HTTP
# parser decodes them via utf-8/surrogateescape,
# producing lone surrogates (\udcf8) that break
# URL parsing. Recover by round-tripping back
# to bytes and decoding as latin-1. (See #10047)
try:
    r_url.encode("utf-8")
except (UnicodeEncodeError, UnicodeDecodeError):
    try:
        raw = r_url.encode("utf-8", "surrogateescape")
        r_url = raw.decode("latin-1")
Review comment (Member):

What if it's not latin-1? This seems unreasonable for us to just start guessing charsets randomly.

If fallback_charset_resolver is set, we could use that instead maybe?
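The concern about guessing can be made concrete with a small sketch. It is an illustration only, not part of the patch: the sample byte and the choice of `cp1251` as the competing charset are assumptions for the example. Because latin-1 assigns a character to every byte value, the decode never raises, so a wrong guess goes unnoticed.

```python
# The same raw byte read under two single-byte charsets.
raw = b"\xe9"
as_latin1 = raw.decode("latin-1")  # "é"
as_cp1251 = raw.decode("cp1251")   # Cyrillic "й"

# latin-1 decodes all 256 byte values, so it cannot signal a wrong guess.
every_byte = bytes(range(256)).decode("latin-1")
```

Both decodes succeed, which is why the charset has to come from somewhere other than the bytes themselves.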

    except (UnicodeDecodeError, UnicodeEncodeError):

Check notice (Code scanning / CodeQL): Empty except
'except' clause does nothing but pass and there is no explanatory comment.

        pass

try:
    parsed_redirect_url = URL(
        r_url, encoded=not self._requote_redirect_url