This is only if nonested is on, then a <script> tag has not incremented
the nested, so likewise no need to treat an escaped close tag <\/script>
as an end tage to decrement nested.
This is in the GetCDATA function. If the container is script or style and
this option is on, avoid bumping nested.
This addresses issues #65 (1642186) and #280.
All attempts at parsing script data are now abandoned as a bad direction.
That is reordering windows includes per #234
In general the order of includes should be system <headers>,
then local "headers", except perhaps for the ocassional local
"version" or "config" header...
Resolved conflicts in src/pprint.c by reverting to current master, and in
version.txt by increasing the version.
Moved the <windows.h> include above the "streamio.h" include to fix compilation with the latest Windows SDK.
<winnt.h> now has the following struct. In particular the `CR` member of this struct conflicts with a define in streamio.h.
typedef struct _IMAGE_ARM64_RUNTIME_FUNCTION_ENTRY {
DWORD BeginAddress;
union {
DWORD UnwindData;
struct {
DWORD Flag : 2;
DWORD FunctionLength : 11;
DWORD RegF : 3;
DWORD RegI : 4;
DWORD H : 1;
DWORD CR : 2; // This line causes a compile error because CR is redefined in streamio.h
DWORD FrameSize : 9;
} DUMMYSTRUCTNAME;
} DUMMYUNIONNAME;
} IMAGE_ARM64_RUNTIME_FUNCTION_ENTRY, * PIMAGE_ARM64_RUNTIME_FUNCTION_ENTRY;
This was evidenced by an 'assert' failure, that the type was not an 'int'!
And also in the -xml-help output, thus effecting the tidy.1 manual page
for this new feature --vertical-space auto, which produces almost single
line html output.
This 'fix' began in the issue-228 branch - see Issue #231
This is a case where the lexer, in GetTokenfromStream, does NOT eat any
trailing newline after a LEX_STARTTAG: case...
So far have identified pre, script, style as NEEDING this user newline
character for later pprint output. Any others?
html5 allows a naked ampersand unquoted, and now tidy will not issue a
warning. This only deals with a & b, and P&<li>O</li>
More may need to be done for other cases.
This is when setting a String config value through say tidyOptSetValue
using say tidyOptSetValue(tdoc,id,"").
If the length of the new string is zero then do not allocate a 1 byte
buffer, set it to 0, for the option. Any previous buffer has already been
released.
This means API functions like tidyOptSaveSink will not return erroneous
null String values!
This new warning will only be seen if the document remains in HTML5 mode,
where the summary attribute is obsolete. The W3C validator flags this as
an error, and suggests 'Consider describing the structure of the table in
a caption element or in a figure element containing the table; or simplify
the structure of the table so that no description is needed'.
At the same time this patch also restored the old warning if the document
is HTML4--, if the table element lacks a summary attribute. This has been
a tidy warning since the beginning of time, although the W3C validator
does not presently flag this.
In certain circumstances a leading space has to be preverved to allow it
to be used to create a text space node to insert before this element to
preserve the view in a browser.
And added a note asking why is ParseTag called with a hardcoded
IgnoreWhitespace when some effort above has set the mode variable to
MixedContent in certain cases, but need to think about this 2nd change.
Also added some MSVC Debug output when this leading text is used to insert
such a created text node before the element just to be reminded of this
special event.
Such debug is OFF by default, and only added by defining DEBUG_MEMORY. And
is only available for the Debug configuration compiled with MSVC, but this
could be easily extended...
This is particularly for the anchor tag which in html5 mode is parsed in
ParseBlock. That is retain a leading space, in case it needs to be
moved to in front of the block to keep space rendering.
Revert TidyTag_A to HTML5 mode, but allow the table to be modified if the
DOCTYPE given is found to NOT be HTML5, through a service TY_(AdjustTags).
Care is taken to clear any previous hash cached tags.
At present this only effects the anchor tag, but could be applied to
others that need to change their parsing due to an identified DOCTYPE.
The fix for #111 added an end tag for all StartEnd tags, when outputting
HTML5, but there should be some exceptions to this.
Added a new service, isVoidElement(node) for the void elements. Perhaps
this service could be further optimised.
With this fix introduced two new services, FindNodeById and
FindNodeWithId. The former does a total tree search for a TidyTagId.
Maybe there is a way to optimise this search...
Also change the uint badForm from an on/off to a bit field, so could be
extended to other document format errors.
This is a set of kludgy fixes for MathML attribute and entities support.
It is intended that a full HTML5 entity table be added at some time, but
at present ALL entities are accepted as written when within the math
element.
Likewise all attributes are accepted on MathML elements without any check
of their name or value, even if they match attributes outside MathML.
And in the pprinter such entities are written as is from the lexer, using
a new PPrintMathML service added, using the new mode OtherNameSpace.
It is hoped all these fixes will NOT effect tidy outside the math element.
ALL fixes in the set a clearly marked '#130 - MathML attr and entity fix!'
for easy searching, and improving if possible.
As predicted the previous fix had adverse consequences on say script text,
which then lost the indent, and was reverted.
This introduces a new service, nodeIsTextLike, which naturally returns yes
if it is text, but also is an AspTag.
Maybe other text like nodes need to be added.
An immense thanks to Ger Hobbelt who had already done this
in his github.com/GerHobbelt/htmltidy fork.
The two sources have diverges so was not a simple cut
an paste. But again thanks Ger for this.
This is a simple but profound change in pprint.c.
Since leading space is preserved on script code, after tidy indents
the code once, a second run on that tidied file would add more indent
to already indented code. This fix should be carefully checked,
and removed if there are other bad consequences.
Bump the version point to 4 for this change.
the anchor name as a parameter, so it can look in the correct bin.
In the case of FreeAttrs, we have the name already (since we found a name or
id attribute). In the case of FixAnchors, the anchor name could come from
either the name or id attribute, so we call the function separately for each
case, passing the appropriate attribute value.
Introduced TY_(IsHTMLSpace)(uint c), which checks to see if c is one of the
chars that the HTML spec (and browsers) treat as a space in attribute
values: 0x020 (space), 0x009 (tab), 0x00a (LF), 0x00c (FF), or 0x00d (CF).
Can't use ANSI C isspace(int c) here because like standard functions for
many other langs, it also treats 0x00b as a space.