Commit graph

112 commits

Author SHA1 Message Date
Geoff McLane 1830fdb97c Issue #384 - insert comments 2016-03-30 14:18:04 +02:00
Geoff McLane 4b135d9b47 Merge pull request #384 from seaburg/master
Fix skipping parsing character
2016-03-30 14:08:40 +02:00
Geoff McLane 000c6925bd Issue #348 - Add option 'escape-script', def = yes 2016-03-20 01:01:46 +01:00
Evgeniy Yurtaev 7d28b21e60 Fix skipping parsing character 2016-03-17 23:30:11 +03:00
Geoff McLane d091027089 Issue #377 add debug only output of constrained versions 2016-03-03 20:21:35 +01:00
Jim Derry 97abad0c05 Bump to 5.1.39 for merging.
Merge branch 'master' into attrdict_phase2
2016-02-16 11:11:36 +08:00
Jim Derry 3431dd05a4 Merge branch 'master' into attrdict_phase1
Bump version to 5.1.38
2016-02-16 11:07:32 +08:00
Jim Derry 1e4f7dd0f1 Merge pull request #368 from htacg/issue-341
Issue #341
2016-02-16 10:18:26 +08:00
Geoff McLane a4f425546f Improve MSVC DEBUG output.
Previous only output the first 8 characters, followed by an elipse if more
than 8. Now return first up to 19 chars. If nore than 19, return first 8,
followed by an elipse, followed by the last 8 characters.

This is in the get_text_string service, which is only used if MSVC and not
NDEBUG.
2016-02-14 18:17:46 +01:00
Jim Derry 896b00238b Forgot one file... 2016-02-13 11:53:40 +08:00
Jim Derry 2ade3357a9 Phase 2
This is a MUCH SANER approach to what I was trying to do (now that I screwed up enough internals to understand some of them!
At this point there are zero exit state reversions, and zero markup reversions! There are still 21 errout reversions; I'll
annotate and adjust as necessary.
2016-02-13 11:31:16 +08:00
Jim Derry e947d296e4 Handle some issues with misusing VERS_HTML5 in the doctype. 2016-02-12 20:49:14 +08:00
Geoff McLane 03a643f781 Issue #341 - No token can be inserted if istacksize == 0! 2016-02-08 15:12:23 +01:00
Geoff McLane c1f94c066c Tidy up some debug only code.
After @sria91 added #360 merge, added a little more improvement...
2016-01-30 20:51:27 +01:00
Srikanth Anantharam 9a0af48a4e fixed a NULL node bug in debug build 2016-01-30 22:03:52 +05:30
Jim Derry 9ae15f45a7 Consistent tabs
Fixed tabs in template file, and regen'd all related files.
2016-01-30 15:51:54 +08:00
Jim Derry 26e7d9d4b0 Fixes Mac OS X encoding issues and harmonizes output across platforms.
Previously Tidy produced different output based on the compilation target, NOT based on
the file encoding and specified options. Every platform was equal except Mac OS. Now unless
the encoding is specifically set to a Mac file type, all encoding assumptions are the same
across platforms.
2015-12-31 13:57:34 +08:00
Geoff McLane 2388fb0175 Issue #307, #167, #169 - regression of nestd anchors 2015-11-22 18:46:00 +01:00
Geoff McLane 800b91e576 Issue #65 - effect name change to skip-nested, and default to on 2015-11-05 15:19:39 +01:00
Geoff McLane c8751f60e7 Issue #286 - use AddByte for internal transfer 2015-10-20 15:04:18 +02:00
Geoff McLane d75c82275d Issue #285 - Add a ResetTags func to erset html5 mode before each document 2015-10-14 16:55:35 +02:00
Geoff McLane adbad0379e Issue #65 - if nonested then no endtag needed to decrement.
This is only if nonested is on, then a <script> tag has not incremented
the nested, so likewise no need to treat an escaped close tag <\/script>
as an end tage to decrement nested.
2015-10-08 17:06:03 +02:00
Geoff McLane 7e69ceb3d1 Issue #281 - only warn BAD_CDATA_CONTENT if inserting an escape. 2015-10-07 16:17:42 +02:00
Geoff McLane b63c1090c2 option to avoid incrementing nested comtainers.
This is in the GetCDATA function. If the container is script or style and
this option is on, avoid bumping nested.

This addresses issues #65 (1642186) and #280.

All attempts at parsing script data are now abandoned as a bad direction.
2015-10-07 15:11:25 +02:00
Geoff McLane b4efe7464a small enhancement of debug only code 2015-10-05 15:08:20 +02:00
Geoff McLane 6c1a2acea2 #273 - avoid xhtml doctype flip/flop 2015-09-27 17:36:57 +02:00
Christopher Brannon 94b0647c08 Issue #65, fix for ignoring cdata. 2015-09-24 18:13:57 -07:00
Geoff McLane 04ca419080 Issue #64 - Try hard to skip '<![CDATA[ ... ]]>' 2015-09-24 14:21:55 +02:00
Geoff McLane 96589c6f57 #65 Skip esc'd esc, and only for script containers 2015-09-21 12:33:53 +02:00
Geoff McLane eda37c5adb Issue #65 - avoid new quotes if in quotes 2015-09-19 14:58:42 +02:00
Geoff McLane d541405a2a Eventually complete a 2007 fix 2015-09-16 13:17:50 +02:00
Geoff McLane 9960f7c6dd Protext agains a NULL node in the Debug only code 2015-09-12 13:06:14 +02:00
Geoff McLane 66e288a8e2 Issue #239 - no warn for apos enitity in html5++ mode 2015-08-22 14:03:02 +02:00
Geoff McLane e79137de7f Issue #238 - only except the pre element 2015-08-22 14:00:18 +02:00
Geoff McLane 4246c2c462 Issue #230: Need to KEEP this newline char sometimes.
This is a case where the lexer, in GetTokenfromStream, does NOT eat any
trailing newline after a LEX_STARTTAG: case...

So far have identified pre, script, style as NEEDING this user newline
character for later pprint output. Any others?
2015-07-15 19:41:02 +02:00
Geoff McLane 3a524f1710 Issue #207 - deal with 2 cases of an unambiguous ampersand.
html5 allows a naked ampersand unquoted, and now tidy will not issue a
warning. This only deals with a & b, and P&<li>O</li>

More may need to be done for other cases.
2015-06-24 13:10:27 +02:00
Geoff McLane c18f27a587 Issue #217 - avoid len going negative, ever... 2015-06-03 20:26:03 +02:00
Geoff McLane c1a3100cb9 add conveninet break point based on row and column 2015-05-12 17:13:23 +02:00
Geoff McLane f5eb2cf26a Issue #196 - expand comment and bump version.
Thanks to @willydee for this PR.
2015-04-11 15:25:07 +02:00
Geoff McLane fd7b4f8589 just some more DEBUG on text nodes 2015-03-06 19:28:52 +01:00
Geoff McLane c0cad3aeba Issue #167 - further fixes for HTML5 mode 2015-03-06 19:13:06 +01:00
Geoff McLane 0dc68d6cb1 Issue #167 & #169 - default to HTML5 mode.
Revert TidyTag_A to HTML5 mode, but allow the table to be modified if the
DOCTYPE given is found to NOT be HTML5, through a service TY_(AdjustTags).
Care is taken to clear any previous hash cached tags.

At present this only effects the anchor tag, but could be applied to
others that need to change their parsing due to an identified DOCTYPE.
2015-03-06 12:55:24 +01:00
Geoff McLane 0aa81eb256 Issue #130 - MathML attr and entity fix!
This is a set of kludgy fixes for MathML attribute and entities support.

It is intended that a full HTML5 entity table be added at some time, but
at present ALL entities are accepted as written when within the math
element.

Likewise all attributes are accepted on MathML elements without any check
of their name or value, even if they match attributes outside MathML.

And in the pprinter such entities are written as is from the lexer, using
a new PPrintMathML service added, using the new mode OtherNameSpace.

It is hoped all these fixes will NOT effect tidy outside the math element.

ALL fixes in the set a clearly marked '#130 - MathML attr and entity fix!'
for easy searching, and improving if possible.
2015-02-22 18:58:55 +01:00
Geoff McLane 2172a498f6 Issue #153 - fix for endif section no conforming to what tidy expects 2015-02-05 19:01:34 +01:00
Geoff McLane 66951a562a add row/col to DEBUG output 2015-02-05 18:24:02 +01:00
Geoff McLane 1be5ccbb63 Issue #130 - initial MathML support 2015-02-05 12:21:08 +01:00
Geoff McLane 59974e0bb0 When adding meta element for Tidy prefer library version over date 2015-01-28 12:15:27 +01:00
Geoff McLane ec4d4cd1f1 Issue #92 - OLD problem of ins and del
These are marked as CM_INLINE, but also CM_BLOCK,
so should not be stacked for insertion
2015-01-28 11:50:06 +01:00
Jim Derry edb185a308 Use a hash table for anchors #64 2014-11-22 19:39:06 +08:00
Jim Derry 6aaf826476 Restart with geoffmcl's fork 2014-11-22 15:42:28 +08:00
Peter Kelly 7fc3255542 Applied hash table optimisation to RemoveAnchorByNode. This function now takes
the anchor name as a parameter, so it can look in the correct bin.

In the case of FreeAttrs, we have the name already (since we found a name or
id attribute). In the case of FixAnchors, the anchor name could come from
either the name or id attribute, so we call the function separately for each
case, passing the appropriate attribute value.
2012-08-20 10:06:30 +07:00
Craig Barnes ce27a729dc Remove CVS info blocks 2012-08-08 17:27:29 +01:00
Ryan Kistner 8025154e30 Check for NULL pointer before calling strcasecmp in GetVersFromFPI 2012-04-23 11:51:58 +09:00
Michael[tm] Smith 3a9a794d8b Minor formatting edit. 2012-03-15 14:12:41 +09:00
Michael[tm] Smith 0c8b587067 Added --doctype=html5 option value. Fixes #17. 2012-03-15 14:11:01 +09:00
Michael[tm] Smith 879e6cf909 Put "experimental" in the meta@generator output too. 2012-03-14 20:05:12 +09:00
Michael[tm] Smith c331917c31 Make the doctype handling work the way it should. 2012-03-14 19:38:18 +09:00
Michael[tm] Smith 1c4d43ad2a Deal with version reporting better. 2012-03-01 17:22:03 +09:00
Michael[tm] Smith 73834b8412 Correct meta@name=generator output. 2012-02-10 15:40:33 +09:00
Michael[tm] Smith b26db41c86 Do not mess with <!doctype html>. Fixes #2. 2012-02-10 15:33:21 +09:00
Michael[tm] Smith 264c9bc043 HTML IDs can contain anything except whitespace.
Introduced TY_(IsHTMLSpace)(uint c), which checks to see if c is one of the
chars that the HTML spec (and browsers) treat as a space in attribute
values: 0x020 (space), 0x009 (tab), 0x00a (LF), 0x00c (FF), or 0x00d (CF).
Can't use ANSI C isspace(int c) here because like standard functions for
many other langs, it also treats 0x00b as a space.
2012-01-02 16:12:51 +09:00
Michael[tm] Smith b92d7aab88 new 2011-11-17 11:44:16 +09:00