According to the MSN documentation 'isalnum(c)' is only valid when c equals
EOF, or is in the range 0 to 255 inclusive. It states the behavior is
undefined outside this range, and in Debug mode triggers an assert dialog.
in #352, but I'm worried that there's some over-reach here.
Currently only implemented as a warning, with no switch to turn it off, which
maintains current behavior other than the warning.
In general, we're treating any string as a complete URL, rather than breaking
URL's into component parts. Thus the `IsURLCodePoint()` check includes a few
other generic characters that strictly speaking aren't valid codepoints, but
are valid as escape characters and delimiters.
When addressing #338, I ran into a similar situation in not having a built-in
method to separate path components (although a simple generalized solution was
good enough in that case).
Thus without introducing a new structure and functions to deconstruct a URL
into scheme, authority, path, parameters, etc., some variation of this patch
will have to be used to address #352.
This is a MUCH SANER approach to what I was trying to do (now that I screwed up enough internals to understand some of them!
At this point there are zero exit state reversions, and zero markup reversions! There are still 21 errout reversions; I'll
annotate and adjust as necessary.
my changes, I'm starting over with this. Comments in the PR thread.
This commit reduces the size of attrdict.c while causing only a single errout
regression that is justified.
An immense thanks to Ger Hobbelt who had already done this
in his github.com/GerHobbelt/htmltidy fork.
The two sources have diverges so was not a simple cut
an paste. But again thanks Ger for this.
the anchor name as a parameter, so it can look in the correct bin.
In the case of FreeAttrs, we have the name already (since we found a name or
id attribute). In the case of FixAnchors, the anchor name could come from
either the name or id attribute, so we call the function separately for each
case, passing the appropriate attribute value.
Introduced TY_(IsHTMLSpace)(uint c), which checks to see if c is one of the
chars that the HTML spec (and browsers) treat as a space in attribute
values: 0x020 (space), 0x009 (tab), 0x00a (LF), 0x00c (FF), or 0x00d (CF).
Can't use ANSI C isspace(int c) here because like standard functions for
many other langs, it also treats 0x00b as a space.