This is a case where the lexer, in GetTokenfromStream, does NOT eat any
trailing newline after a start tag (the LEX_STARTTAG: case).
So far, pre, script, and style have been identified as NEEDING this
newline character for later pprint output. Are there any others?
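A minimal C sketch of the rule, assuming lowercase tag names and assuming the lexer would otherwise skip one newline directly after a start tag's '>'. keeps_trailing_newline and content_start are hypothetical helpers, not tidy functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list: elements whose leading newline must survive
   for the pretty printer. Tag names are assumed to be lowercase. */
static const char *const keep_newline_tags[] = { "pre", "script", "style" };

/* Returns 1 if a newline right after the start tag `tag` must be kept. */
int keeps_trailing_newline(const char *tag)
{
    size_t i;
    assert(tag != NULL);
    for (i = 0; i < sizeof keep_newline_tags / sizeof keep_newline_tags[0]; ++i)
        if (strcmp(tag, keep_newline_tags[i]) == 0)
            return 1;
    return 0;
}

/* Given input positioned just after the start tag's '>', returns where
   the element content begins: one leading newline is eaten unless the
   tag needs it kept. */
const char *content_start(const char *tag, const char *after_gt)
{
    if (!keeps_trailing_newline(tag) && after_gt[0] == '\n')
        return after_gt + 1;
    return after_gt;
}
```

Answering "Any others?" would then only mean extending the keep_newline_tags list.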
HTML5 allows a naked (unescaped) ampersand, so tidy no longer issues a
warning for one. This only deals with cases such as "a & b" and "P&<li>O</li>".
More may need to be done for other cases.
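A sketch of the check in C, modelled on the HTML5 tokenizer's character reference state: '&' only begins a reference when followed by an ASCII letter or digit (named) or '#' (numeric); anything else leaves it as text with no parse error. is_naked_ampersand is a hypothetical helper, not tidy's code.

```c
#include <assert.h>
#include <stddef.h>

/* Locale-independent ASCII letter/digit test. */
static int is_ascii_alnum(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z');
}

/* `amp` points at an '&'. Returns 1 if it cannot start a character
   reference, e.g. the space in "a & b" or the '<' in "P&<li>". */
int is_naked_ampersand(const char *amp)
{
    unsigned char next;
    assert(amp != NULL && amp[0] == '&');
    next = (unsigned char)amp[1];
    if (next == '#')
        return 0;                 /* numeric reference, e.g. &#38; */
    if (is_ascii_alnum(next))
        return 0;                 /* candidate named reference, e.g. &amp; */
    return 1;                     /* space, '<', end of input, ... */
}
```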
Revert TidyTag_A to HTML5 mode, but allow the tag table to be modified,
through a service TY_(AdjustTags), if the given DOCTYPE is found NOT to be HTML5.
Care is taken to clear any tags previously cached in the hash table.
At present this only affects the anchor tag, but it could be applied to
other tags that need to change their parsing because of an identified DOCTYPE.
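A simplified C sketch of why the cache must be cleared, assuming the cache holds copies of table entries (so stale copies would otherwise survive the change). TagDef, lookup_tag and adjust_tags are hypothetical stand-ins, not tidy's Dict or TY_(AdjustTags).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { CM_HTML5_ANCHOR, CM_LEGACY_ANCHOR } ContentModel;

typedef struct {
    const char  *name;
    ContentModel model;
} TagDef;

static TagDef tag_table[] = {
    { "a", CM_HTML5_ANCHOR }      /* default: HTML5 parsing of <a> */
};

#define CACHE_SIZE 31

typedef struct {
    int    used;
    TagDef def;                   /* copy taken at lookup time */
} CacheSlot;

static CacheSlot cache[CACHE_SIZE];

static unsigned hash_name(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % CACHE_SIZE;
}

const TagDef *lookup_tag(const char *name)
{
    CacheSlot *slot;
    size_t i;
    assert(name != NULL);
    slot = &cache[hash_name(name)];
    if (slot->used && strcmp(slot->def.name, name) == 0)
        return &slot->def;        /* stale if the table changed */
    for (i = 0; i < sizeof tag_table / sizeof tag_table[0]; ++i) {
        if (strcmp(tag_table[i].name, name) == 0) {
            slot->used = 1;
            slot->def = tag_table[i];
            return &slot->def;
        }
    }
    return NULL;
}

/* Keep HTML5 defaults unless the DOCTYPE is not HTML5, then drop all
   cached copies so later lookups see the adjusted definitions. */
void adjust_tags(int doctype_is_html5)
{
    size_t i;
    for (i = 0; i < sizeof tag_table / sizeof tag_table[0]; ++i)
        if (strcmp(tag_table[i].name, "a") == 0)
            tag_table[i].model = doctype_is_html5 ? CM_HTML5_ANCHOR
                                                  : CM_LEGACY_ANCHOR;
    memset(cache, 0, sizeof cache);
}
```

Extending this to other tags would mean adding more entries to the adjustment loop.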
This is a set of kludgy fixes for MathML attribute and entity support.
It is intended that a full HTML5 entity table be added at some point, but
at present ALL entities are accepted as written when within the math
element.
Likewise, all attributes are accepted on MathML elements without any check
of their name or value, even if they match attributes outside MathML.
In the pprinter, such entities are written exactly as the lexer read them,
via a newly added PPrintMathML service that uses the new mode OtherNameSpace.
It is hoped that none of these fixes will affect tidy outside the math element.
ALL fixes in the set are clearly marked '#130 - MathML attr and entity fix!'
for easy searching, and for improvement where possible.
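A C sketch of the "inside math" test these fixes hinge on, assuming a node type with a parent pointer. Node, in_mathml and entity_policy are hypothetical simplifications, not tidy's Node or PPrintMathML.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Node {
    const char  *element;         /* element name, NULL for text */
    struct Node *parent;
} Node;

/* Returns 1 if `node` is a math element or lies anywhere inside one. */
int in_mathml(const Node *node)
{
    for (; node != NULL; node = node->parent)
        if (node->element && strcmp(node->element, "math") == 0)
            return 1;
    return 0;
}

typedef enum { ENTITY_VALIDATE, ENTITY_AS_WRITTEN } EntityPolicy;

/* Inside math, entities (and, likewise, attributes) skip checking and
   are emitted as written; elsewhere normal handling applies. */
EntityPolicy entity_policy(const Node *context)
{
    assert(context != NULL);
    return in_mathml(context) ? ENTITY_AS_WRITTEN : ENTITY_VALIDATE;
}
```

Walking up the parents keeps the relaxed handling confined to the math subtree, which is what lets these fixes leave the rest of the document alone.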
the anchor name as a parameter, so it can look in the correct bin.
In the case of FreeAttrs, we have the name already (since we found a name or
id attribute). In the case of FixAnchors, the anchor name could come from
either the name or id attribute, so we call the function separately for each
case, passing the appropriate attribute value.
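A C sketch of why the name is needed, assuming anchors live in a chained hash table keyed by name: the name picks the bin, and the node identifies the entry inside it. AnchorTable, add_anchor and remove_anchor_by_node are hypothetical, not tidy's structures; names are stored by pointer and must outlive the table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ANCHOR_BINS 64

typedef struct Anchor {
    const char    *name;          /* name or id value (not copied) */
    const void    *node;          /* element that defined it */
    struct Anchor *next;
} Anchor;

typedef struct {
    Anchor *bins[ANCHOR_BINS];
} AnchorTable;

static unsigned anchor_bin(const char *name)
{
    unsigned h = 0;
    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h % ANCHOR_BINS;
}

int add_anchor(AnchorTable *t, const char *name, const void *node)
{
    Anchor *a = malloc(sizeof *a);
    unsigned b;
    if (a == NULL)
        return 0;
    b = anchor_bin(name);
    a->name = name;
    a->node = node;
    a->next = t->bins[b];
    t->bins[b] = a;
    return 1;
}

/* Without `name`, every bin would have to be scanned for `node`. */
int remove_anchor_by_node(AnchorTable *t, const char *name, const void *node)
{
    Anchor **link;
    assert(name != NULL);
    for (link = &t->bins[anchor_bin(name)]; *link; link = &(*link)->next) {
        if ((*link)->node == node && strcmp((*link)->name, name) == 0) {
            Anchor *dead = *link;
            *link = dead->next;
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

An element with both name="x" and id="y" would be removed with two calls, one per attribute value, matching the FixAnchors case.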
Introduced TY_(IsHTMLSpace)(uint c), which checks whether c is one of the
characters that the HTML spec (and browsers) treat as a space in attribute
values: 0x020 (space), 0x009 (tab), 0x00a (LF), 0x00c (FF), or 0x00d (CR).
ANSI C isspace(int c) can't be used here because, like the standard functions
of many other languages, it also treats 0x00b (VT) as a space.
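A standalone C sketch of such a check, plus a hypothetical trimming helper showing that VT is left alone. These are illustrations, not tidy's source.

```c
#include <assert.h>
#include <stddef.h>

/* The five characters HTML treats as whitespace. */
int is_html_space(unsigned int c)
{
    switch (c) {
    case 0x20:   /* space */
    case 0x09:   /* tab */
    case 0x0a:   /* LF */
    case 0x0c:   /* FF */
    case 0x0d:   /* CR */
        return 1;
    default:
        return 0;  /* includes 0x0b (VT), which isspace() accepts */
    }
}

/* Length of `s` after dropping trailing HTML whitespace, as when
   normalising an attribute value. */
size_t html_trimmed_len(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    while (len > 0 && is_html_space((unsigned char)s[len - 1]))
        --len;
    return len;
}
```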