Script streaming: more UTF-8 handling fixes (again).
1) Since we fill the output buffer both from the chunks and the conversion buffer, it's possible that we run out of space and call CopyCharsHelper with 0 length. The underlying functions don't handle that gracefully, so check for it there.

2) There was a bug where we tried to copy too many characters from the beginning of the data chunk into the conversion buffer. Continuation bytes in UTF-8 are of the form 0b10XXXXXX. If a byte is bigger than that (0b11000000 or above), it's the first byte of a new UTF-8 character and we should not copy it. These two together (or maybe in combination with surrogates) are a probable reason for crbug.com/420932.

3) The test data was off; \uc481 is \xec\x92\x81.

BUG=420932
LOG=N
R=yangguo@chromium.org

Review URL: https://codereview.chromium.org/662003003

git-svn-id: https://v8.googlecode.com/svn/branches/bleeding_edge@24725 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
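
Illustration of fixes 1) and 2), as a minimal sketch rather than the code from this change: the helper names below (IsUtf8ContinuationByte, LeadingContinuationBytes, CopyBytesGuarded) are hypothetical, and the cap of 3 bytes is an assumption based on UTF-8 characters having at most three continuation bytes. The idea is that only bytes of the form 0b10XXXXXX at the start of a chunk can complete a character begun in the previous chunk, and that a copy of length 0 should return early instead of reaching the underlying copy routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true only for bytes of the form 0b10XXXXXX.
inline bool IsUtf8ContinuationByte(uint8_t b) { return (b & 0xC0) == 0x80; }

// Counts the bytes at the start of a chunk that may belong to a character
// begun in the previous chunk. Stops at the first byte that is not a
// continuation byte, since that byte starts a new character.
// Assumption: at most 3 continuation bytes can follow a lead byte.
inline size_t LeadingContinuationBytes(const uint8_t* chunk, size_t length) {
  size_t n = 0;
  while (n < length && n < 3 && IsUtf8ContinuationByte(chunk[n])) ++n;
  return n;
}

// Copies at most `capacity` bytes and returns the number copied.
// Returns early when there is nothing to copy, so the underlying copy
// routine is never called with length 0.
inline size_t CopyBytesGuarded(uint8_t* dest, size_t capacity,
                               const uint8_t* src, size_t length) {
  size_t n = length < capacity ? length : capacity;
  if (n == 0) return 0;
  assert(dest != nullptr && src != nullptr);
  std::memcpy(dest, src, n);
  return n;
}
```

For example, if \xec\x92\x81 is split after its first byte, the next chunk starts with \x92\x81, and only those two bytes would be moved into the conversion buffer; a chunk starting with \xec would contribute none.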