Changes in the validation of UTF-8
All UTF-8 encoding functionality (including the escape sequence '\u') accepts all values from the original UTF-8 specification (with sequences of up to six bytes). By default, the decoding functions in the UTF-8 library do not accept invalid Unicode code points, such as surrogates. A new parameter 'nonstrict' makes them accept all code points up to (2^31)-1, as in the original UTF-8 specification.
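The strict and non-strict acceptance rules described above can be sketched as a single predicate. This is only an illustration: the helper name `codepoint_accepted` and its `nonstrict` flag are hypothetical and are not taken from the Lua sources. It assumes that "invalid Unicode code points" means surrogates (U+D800 to U+DFFF) and values above U+10FFFF.

```c
/* Hypothetical helper, not part of Lua: returns 1 if 'cp' would be
   accepted under the rules in the commit message, 0 otherwise. */
int codepoint_accepted(unsigned long cp, int nonstrict) {
  if (nonstrict)
    return cp <= 0x7FFFFFFFul;        /* original UTF-8 range: up to (2^31)-1 */
  if (cp > 0x10FFFFul)
    return 0;                         /* beyond the Unicode range */
  if (cp >= 0xD800ul && cp <= 0xDFFFul)
    return 0;                         /* surrogates are not valid code points */
  return 1;
}
```

With this sketch, `codepoint_accepted(0xD800, 0)` rejects a surrogate, while `codepoint_accepted(0xD800, 1)` accepts it.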
2
llex.c
@@ -335,7 +335,7 @@ static unsigned long readutf8esc (LexState *ls) {
   while ((save_and_next(ls), lisxdigit(ls->current))) {
     i++;
     r = (r << 4) + luaO_hexavalue(ls->current);
-    esccheck(ls, r <= 0x10FFFF, "UTF-8 value too large");
+    esccheck(ls, r <= 0x7FFFFFFFu, "UTF-8 value too large");
   }
   esccheck(ls, ls->current == '}', "missing '}'");
   next(ls); /* skip '}' */
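The limit 0x7FFFFFFF is the largest value a six-byte sequence can carry under the original UTF-8 specification. As a minimal sketch of that encoding (the function `utf8_encode_ext` is hypothetical and is not the routine Lua itself uses), the sequence length is chosen from the value's magnitude and the payload is split into six-bit groups:

```c
#include <stddef.h>

/* Hypothetical helper, not part of Lua: encodes 'cp' into 'buf'
   (which must hold at least 6 bytes) using the original UTF-8 scheme.
   Returns the number of bytes written, or 0 if cp > 0x7FFFFFFF. */
size_t utf8_encode_ext(unsigned long cp, unsigned char *buf) {
  size_t n, i;
  if (cp < 0x80ul) {                  /* ASCII: one byte, no marker bits */
    buf[0] = (unsigned char)cp;
    return 1;
  }
  if (cp > 0x7FFFFFFFul) return 0;    /* does not fit in six bytes */
  if (cp < 0x800ul) n = 2;
  else if (cp < 0x10000ul) n = 3;
  else if (cp < 0x200000ul) n = 4;
  else if (cp < 0x4000000ul) n = 5;
  else n = 6;
  /* continuation bytes (10xxxxxx), filled from last to first */
  for (i = n - 1; i > 0; i--) {
    buf[i] = (unsigned char)(0x80 | (cp & 0x3F));
    cp >>= 6;
  }
  /* lead byte: n high bits set, one zero bit, then the remaining payload */
  buf[0] = (unsigned char)((0xFF << (8 - n)) | cp);
  return n;
}
```

For example, U+20AC encodes to the three bytes E2 82 AC, and 0x7FFFFFFF encodes to the six bytes FD BF BF BF BF BF.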