Possible Encoding Issues #4

@nathanhammond

It's possible that this library is selecting the wrong code point for some characters. Comparing its output to the content of https://github.com/lshk-org/jyutping-table, I noticed the following discrepancies.

I believe that these issues should be resolved in this library, and that the other output is correct.

The results below are also included in a related issue, lshk-org/jyutping-table#5.

- From the original export, present in `list-20040907.tsv`
+ From a new export, https://github.com/nathanhammond/parse-jyutping-table-full/blob/master/totsv.js

- 十	U+5341	sap6
+ 〸	U+3038	sap6
- 卄	U+5344	jaa6
- 卄	U+5344	je6
- 卄	U+5344	lim6
- 卄	U+5344	nim6
+ 〹	U+3039	jaa6
+ 〹	U+3039	je6
+ 〹	U+3039	lim6
+ 〹	U+3039	nim6
- 卅	U+5345	saa1 aa6
+ 〺	U+303A	saa1 aa6
- 兀	U+5140	at6
- 兀	U+5140	ngat6
+ 兀	U+FA0C	at6
+ 兀	U+FA0C	ngat6
- 嗀	U+55C0	hok3
+ 嗀	U+FA0D	hok3
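
Each pair above appears to be related through Unicode normalization, which may help narrow down where the two exports diverge. The following is a minimal sketch, assuming a runtime with `String.prototype.normalize` (for example Node.js). The names `toUPlus`, `pairs`, and `report` are made up for illustration and are not part of either project.

```javascript
// Hypothetical helper: format a single-character string as "U+XXXX".
function toUPlus(ch) {
  return 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0');
}

// Code points from the "+" lines above, paired with their "-" counterparts.
const pairs = [
  [0x3038, 0x5341],
  [0x3039, 0x5344],
  [0x303A, 0x5345],
  [0xFA0C, 0x5140],
  [0xFA0D, 0x55C0],
];

// For each pair, record what canonical (NFC) and compatibility (NFKC)
// normalization do to the "+" code point.
const report = pairs.map(([from, to]) => {
  const ch = String.fromCodePoint(from);
  return {
    from: toUPlus(ch),
    counterpart: toUPlus(String.fromCodePoint(to)),
    nfc: toUPlus(ch.normalize('NFC')),
    nfkc: toUPlus(ch.normalize('NFKC')),
  };
});

console.table(report);
```

As far as I can tell, U+FA0C and U+FA0D are CJK compatibility ideographs with canonical mappings, so NFC alone folds them to U+5140 and U+55C0. The Hangzhou numerals U+3038 to U+303A are left alone by NFC and only fold to U+5341, U+5344, and U+5345 under NFKC. If either export normalizes its strings at some point, that alone could account for these differences.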

Further, there is a weird one:

+ 浧	U+6D67	wun3
- 𤧬	U+249EC	wun3

In JPTableFull.pdf that entry is defined as `{ ucs2: "E6C5", jyutping: "wun3" }`.

I do believe that U+249EC is the correct value here.
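
To make the three values in this case easier to compare, here is a small sketch that inspects each code point. It assumes Node.js, and `describeCodePoint` is a hypothetical helper written for this issue. It does not decide which mapping is right; it only shows what kind of value each one is.

```javascript
// Hypothetical helper: summarize basic properties of a code point.
function describeCodePoint(cp) {
  const s = String.fromCodePoint(cp);
  return {
    codePoint: 'U+' + cp.toString(16).toUpperCase().padStart(4, '0'),
    // The BMP Private Use Area spans U+E000..U+F8FF.
    privateUse: cp >= 0xE000 && cp <= 0xF8FF,
    // Code points above U+FFFF occupy two UTF-16 code units (a surrogate pair).
    utf16Units: Array.from({ length: s.length }, (_, i) =>
      s.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0')),
  };
}

const fromPdf = describeCodePoint(parseInt('E6C5', 16)); // value listed in the PDF
const candidateA = describeCodePoint(0x6D67);            // value on the "+" line
const candidateB = describeCodePoint(0x249EC);           // value on the "-" line

console.log(fromPdf, candidateA, candidateB);
```

E6C5 falls in the Private Use Area, so the PDF value by itself does not name a standard character and has to be mapped by some table. U+249EC lies outside the Basic Multilingual Plane, so a 16-bit `ucs2` field could not hold it directly, which is consistent with the PDF using a private-use value for it.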
