This repository was archived by the owner on Dec 10, 2018. It is now read-only.

Utf-8-encoded unicode in thrift definition comments causes failure of thriftpy.load in Python 3 #309

@aawilson

To reproduce, save the following as a .thrift file using an editor that preserves the curly quotation marks as they are, rather than converting them to ASCII equivalents. For reference, my test file was saved as UTF-8:

service PingPong {
    /* Ping to the pong with “funky quotes” y'all */
    string ping(),
}

In a Python 3 environment, run the following:

from thriftpy import load
load(path_to_thrift)

Observe something like this:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 66: character maps to <undefined>

(This was on Windows; other platforms might list a different codec, or might not experience this problem at all.)
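The byte in the error message can be traced back to the closing curly quote. The sketch below is only an illustration and assumes the Windows default codec involved was cp1252, which the traceback does not confirm; it shows that U+201D ends in byte 0x9d when encoded as UTF-8, and that cp1252 has no mapping for that byte:

```python
# The closing curly quote (U+201D) occupies three bytes in UTF-8.
closing_quote_bytes = u"\u201d".encode("utf-8")

# Assumption: cp1252 stands in for whatever default codec Windows picked.
decode_error = None
try:
    closing_quote_bytes.decode("cp1252")
except UnicodeDecodeError as exc:
    decode_error = exc

print(closing_quote_bytes)
print(decode_error)
```

The opening quote (U+201C) ends in 0x9c, which cp1252 does map, so decoding only fails once the closing quote is reached.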

I was able to fix this locally by adding an "encoding" argument to the open call in parser.py. However, the built-in open does not accept that argument in Python 2.7 and lower, so this is not a version-agnostic fix. It could also be wrong if the thrift file were saved in some other encoding, since I doubt the spec actually specifies one. My guess is that the file handling will have to be rewritten to open files as binary and decode them explicitly, rather than just passing them to the lexer. Simply passing mode='rb' wasn't sufficient, so there is more to do.
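As a rough sketch of what a version-agnostic read could look like, the standard library's io.open accepts an encoding argument on both Python 2.6+ and Python 3, and reading in binary mode followed by an explicit decode yields the same text. The helper names below are hypothetical and are not part of thriftpy; UTF-8 is an assumed default, and how the resulting text would be handed to the lexer in parser.py is not shown here:

```python
import io
import os
import tempfile


def read_thrift_text(path, encoding="utf-8"):
    """Hypothetical helper: read a .thrift file as text on Python 2.6+ and 3.

    io.open takes `encoding` on both major versions, unlike Python 2's
    built-in open.
    """
    with io.open(path, "r", encoding=encoding) as fh:
        return fh.read()


def read_thrift_bytes_then_decode(path, encoding="utf-8"):
    """Hypothetical alternative: open as binary, then decode explicitly."""
    with open(path, "rb") as fh:
        return fh.read().decode(encoding)


# Demo: write the example definition as UTF-8 and read it back both ways.
source = (
    u"service PingPong {\n"
    u"    /* Ping to the pong with \u201cfunky quotes\u201d y'all */\n"
    u"    string ping(),\n"
    u"}\n"
)
path = os.path.join(tempfile.mkdtemp(), "pingpong.thrift")
with open(path, "wb") as fh:
    fh.write(source.encode("utf-8"))

text = read_thrift_text(path)
same = read_thrift_bytes_then_decode(path)
```

Either way the caller ends up with decoded text rather than raw bytes. That may be why mode='rb' alone was not enough, if the lexer expects a text string, though that is a guess.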
