Commit graph

558 commits

Author SHA1 Message Date
Jim Derry 11178d775b Massive Revamp of the Messaging System
This is a rather large refactoring of Tidy's messaging system. This was done
mostly to allow non-C libraries that cannot adequately take advantage of
arg_lists a chance to query report filter information for information related
to arguments used in constructing an error message.

Three main goals were in mind for this project:

- Don't change the contents of Tidy's existing output sinks. This will ensure
  that changes do not affect console Tidy users, or LibTidy users who use the
  output sinks directly. This was accomplished 100% other than some improved
  cosmetics in the output. See tidy-html5-tests repository, the `refactor` and
  `more_messages_changes` branches for these minor diffs.
- Provide an API that is simple and also extensible without having to write new
  error filters all the time. This was accomplished by adding the new message
  callback `TidyMessageCallback` that provides callback functions an opaque
  object representing the message, and an API to query the message for wanted
  details. With this, we should never have to add a new callback routine again,
  as additional API can simply be written against the opaque object.
- The API should work the same as the rest of LibTidy's API in that it's
  consistent and only uses simple types with wide interoperability with other
  languages. Thanks to @gagern who suggested the model for the API in #409.
  Although the API uses the "Tidy" way of accessing data via an iterator
  rather than an index, this can be easily abstracted in the target language.

There are two *major* API breaking changes:

- Removed TidyReportFilter2
  - This was only used by one application in the entire world, and was a hacky
    kludge that served its purpose. TidyReportCallback (né TidyReportFilter3)
    is much better. If, for some reason, this affects you, I recommend using
    TidyReportCallback instead. It's a minor change for your application.
- Renamed TidyReportFilter3 to TidyReportCallback
  - This name is much more semantic, and much more sensible in light of
    the improved callback system. As the name implies, it remains capable of
    *only* receiving callbacks for Tidy "reports."

Introducing TidyMessageCallback, and a new message interrogation API.

- As its name implies, it is able to capture (and optionally suppress) *all*
  of Tidy's output, including the dialogue messages that never make it to
  the existing report filters.
- Provides an opaque `TidyMessage` and an API that can be used to query against
  it to find the juicy goodness inside.
  - For example, `tidyGetMessageOutput( tmessage )` will return the complete,
    localized message.
  - Another example, `tidyGetMessageLine( tmessage )` will return the line the
    message applies to.
- You can also get information about the individual arguments that make up a
  message. By using the `tidyGetMessageArguments( tmessage )` iterator and
  `tidyGetNextMessageArgument` you will obtain an opaque `TidyMessageArgument`
  which has its own interrogation API. For example:
    - tidyGetArgType( tmessage, &iterator );
    - tidyGetArgFormat( tmessage, &iterator );
    - tidyGetArgValueString( tmessage, &iterator );
    - …and so on.

Other major changes include refactoring `messages.c` to use the new message
"object" directly when emitting messages to the console or output sinks. This
allowed replacement of a lot of specialized functions with generalized ones.

Some of this generalizing involved modifications to the `language_xx.h` header
files, and these are all positive improvements even without the above changes.
2017-03-13 13:28:57 -04:00
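The opaque-message-plus-iterator pattern described in commit 11178d775b can be sketched as below. This is only an illustration of the pattern, not LibTidy's implementation: every `Demo*`/`demo*` name is hypothetical and merely mirrors the shape of the calls named in the commit message, and the iterator encoding (0 means finished) is an assumption.

```c
#include <stddef.h>

/* All Demo* names are hypothetical stand-ins, NOT LibTidy declarations. */
typedef enum { DemoArgString, DemoArgInt } DemoArgType;

typedef struct {
    DemoArgType type;
    const char *format;  /* printf-style specifier for this argument */
    const char *s;       /* meaningful when type == DemoArgString */
    int         i;       /* meaningful when type == DemoArgInt */
} DemoArg;

typedef struct {
    int            line;    /* line the message applies to */
    const char    *output;  /* complete, already formatted message */
    const DemoArg *args;    /* arguments used to build the message */
    size_t         argc;
} DemoMessageImpl;

/* Callers only ever handle these; in a real library the structs
   behind them would not be visible outside the library. */
typedef const DemoMessageImpl *DemoMessage;
typedef const DemoArg         *DemoMessageArgument;
typedef size_t                 DemoIterator;  /* 0 = finished, else index + 1 */

int demoGetMessageLine(DemoMessage m)           { return m->line; }
const char *demoGetMessageOutput(DemoMessage m) { return m->output; }

/* Start iterating over a message's arguments. */
DemoIterator demoGetMessageArguments(DemoMessage m)
{
    return m->argc > 0 ? 1 : 0;
}

/* Return the current argument and advance; sets *it to 0 after the last. */
DemoMessageArgument demoGetNextMessageArgument(DemoMessage m, DemoIterator *it)
{
    DemoMessageArgument arg;
    if (*it == 0 || *it > m->argc) {
        *it = 0;
        return NULL;
    }
    arg = &m->args[*it - 1];
    *it = (*it < m->argc) ? *it + 1 : 0;
    return arg;
}

/* Per-argument interrogation functions. */
DemoArgType demoGetArgType(DemoMessageArgument a)          { return a->type; }
const char *demoGetArgFormat(DemoMessageArgument a)        { return a->format; }
const char *demoGetArgValueString(DemoMessageArgument a)   { return a->s; }

/* A callback receives only the opaque handle; returning 0 suppresses output. */
typedef int (*DemoMessageCallback)(DemoMessage m);
```

Because a callback only receives the handle, new query functions can be added later without changing the callback signature, which is the extensibility argument the commit message makes.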
Jim Derry 4dc8a2cf9a Bump version to 5.5.5 for this fiasco, and fix poor planning and unfortunate
merge.
  - Sort all of the existing options and re-indent per Tidy standards. This is
    simply for cosmetic effect.
  - Allow the iterator to return all options again, even "internal" options.
    Things are too embedded with N_TIDY_OPTIONS, etc., to try to hide them.
  - Instead, simply add documentation to LibTidy users that they shouldn't use
    internal options.
  - Also added `TidyInternalCategory` to `TidyConfigCategory` without adding a
    new field to the struct. API users should check for this category before
    use.
  - Defined a two character macro for `TidyInternalCategory` for use in
    `option_defs[]`.
  - Changed struct `option_defs[]` to reflect the new category for affected
    options.
  - Removed string indicating * refers to internal options, since it no longer
    applies.
  - Regen'd all strings for previous point.
  - `tidy.c` now checks for `TidyInternalCategory` everywhere in order to
    suppress output.
2017-03-10 09:13:21 -05:00
Jim Derry ac242e9ea4 hotfix 2017-03-09 19:56:16 -05:00
Jim Derry e27cc262fe Bring the local vars into the context, which is allowed in C89. 2017-03-09 12:44:48 -05:00
Jim Derry 005127c733 Address issue #472. 2017-03-08 15:37:01 -05:00
Jim Derry 978756a482 Restore the previous status of gnu-emacs-file
- Updated strings files to match.
- Inhibit internal options from being output via the iterator. Internals should
  never have the chance to be exposed if they shouldn't be used.
- Added tidySetEmacsFile() and tidyGetEmacsFile() to the public API, and use it
  instead of secret API to set the filename in the console application.

The end result is that `gnu-emacs-file` (and also `doctype-mode`) officially no
longer exist to CLI users nor to API users, and tidy console behaves properly
by using a published API to set the filename for emacs.
2017-03-07 20:11:31 -05:00
Jim Derry 03f0192f51 How did this get back in there??? 2017-03-04 15:31:25 -05:00
Jim Derry 74a4fa4049 Merge branch 'next' into clean_deprecations 2017-03-02 11:40:14 -05:00
Jim Derry 3be515b1f9 Merge branch 'next' into messages_squashed 2017-03-02 09:34:58 -05:00
Jim Derry 92621d6f99 MSVC Compatibility
- Changed location of pointer operator in declarations.
  - Updated `CODESTYLE.md` to reflect this.
  - Updated `API_AND_NAMESPACE.md` to reflect this.
2017-03-02 09:32:02 -05:00
Geoff McLane a49890ee55 Issue #498 - parser.c - if a <table> in a <table> just close.
The previous action was to discard the second, while it is the second
table that browsers will render.

This conforms to the principle that the html output by tidy should render
in a browser like the original html.
2017-02-24 16:20:10 +01:00
Geoff McLane c4b5904e1c Issue #497 - lexer.c - Add comment for this PR @seaburg 2017-02-24 14:38:20 +01:00
Geoff McLane e44f4d1469 Merge pull request #497 from seaburg/fix_value_trimming
Fix leading white spaces trimming
2017-02-24 14:30:39 +01:00
Geoff McLane 27fe0548b9 Issue #468 - config.c - use RAW encoding for all cases 2017-02-23 16:28:19 +01:00
Geoff McLane 569ae4b435 Issue #329 - lexer.c - do not discard this newline here 2017-02-23 15:27:03 +01:00
Evgeniy Yurtaev bb1d62d3bd Fix leading white spaces trimming 2017-02-22 14:34:40 +03:00
Jim Derry c54c10f857 - Removed deprecated options:
  - TidySlideStyle
  - TidyBurstSlides

- Added documentation for TidyEmacsFile, since it's a valid option.

- Because TidyEmacsFile is a valid option, tweaked tidy.c so that it can
  be specified in a configuration file without being overwritten by the console
  app. Why a user might do this is dumb, but who are we to stop them.
2017-02-18 18:30:41 -05:00
Jim Derry edc548095c Removed language as tidy config option; it is only CLI option. 2017-02-18 17:16:35 -05:00
Jim Derry cbb8354f74 Combined leftover attribute API stuff into single, new file. 2017-02-18 16:57:11 -05:00
Jim Derry f6ce4d130e Removed deprecated tidyAttrGetSOMETHING from API. 2017-02-18 16:46:20 -05:00
Jim Derry 13c6387f47 Removed deprecated AttributeIsSOMETHING from API. 2017-02-18 16:43:47 -05:00
Jim Derry a16f36ce53 Removed deprecated NodeIsElementName from API. 2017-02-18 16:33:21 -05:00
Jim Derry 165acc4f3e Several foundational changes preparing for release of 5.4 and future 5.5:
  - Consolidated all output string definitions enums into `tidyenum.h`, which
    is where they belong, and where they have proper visibility.
  - Re-arranged `messages.c/h` with several comments useful to developers.
  - Properly added the key lookup functions and the language localization
    functions into tidy.h/tidylib.c with proper name-spacing.
  - Previous point restored a *lot* of sanity to the #include pollution that's
    been introduced in light of these.
  - Note that opaque types have been (properly) introduced. Look at the updated
    headers for `language.h`. In particular only an opaque structure is passed
    outside of LibTidy, and so use TidyLangWindowsName and TidyLangPosixName
    to poll these objects.
  - Console application updated as a result of this.
  - Removed dead code:
    - void TY_(UnknownOption)( TidyDocImpl* doc, char c );
    - void TY_(UnknownFile)( TidyDocImpl* doc, ctmbstr program, ctmbstr file );
  - Redundant strings were removed with the removal of this dead code.
  - Several enums were given fixed starting values. YOUR PROGRAMS SHOULD NEVER
    depend on enum values. `TidyReportLevel` is an example of such.
  - Some enums were removed as a result of this. `TidyReportLevel` now has
    matching strings, so the redundant `TidyReportLevelStrings` was removed.
  - All of the PO's and language header files were regenerated as a result of
    the string cleanup and header cleanup.
  - Made the interface to the library version and release date consistent.
  - CMakeLists.txt now supports SUPPORT_CONSOLE_APP. The intention is to
    be able to remove console-only code from LibTidy (for LibTidy users).
  - Updated README/MESSAGES.md, which is *vastly* more simple now.
2017-02-17 15:29:26 -05:00
Jim Derry e1f066fe14 Merge branch 'empretty_script' 2017-02-13 08:49:13 -05:00
Jim Derry b7c84b1b57 Merge branch 'surrogates' 2017-02-13 08:49:06 -05:00
Geoff McLane ea49ca0b1d Fix license for SPRTF modules.
Also correct the coding style to conform to HTML Tidy standard.
2017-02-12 17:38:44 +01:00
Geoff McLane 7f73d4f429 Issue #483 - Add ReportSurrogateError() service and connect. 2017-02-11 18:33:45 +01:00
Geoff McLane 75bc1f06c7 More updates for Issue #483 - Start warning msgs - WIP 2017-02-09 20:55:23 +01:00
Jim Derry 1ac50fccb3 Pretty up output of empty script tags.
  - No longer break script tags up on two lines if there is no content. However
    output is still subject to the `--wrap` behavior.
  - Previous behavior intact if there is content.

Todo.

  - Associate this with a new Tidy option.
2017-02-08 13:53:37 -05:00
Geoff McLane 9dc76c1e77 Issue #483 - Some fixes for error condition 2017-02-02 16:43:10 +01:00
Geoff McLane 259d330780 Issue #483 - First cut dealing with 'surrogate pairs'.
Only deals with a successful case.

TODO: Maybe add a warning/error if the trailing surrogate not found, and
maybe consider substituting to avoid invalid utf-8 output.
2017-02-01 13:50:33 +01:00
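The "successful case" mentioned in commit 259d330780 is the standard UTF-16 rule for combining a leading and a trailing surrogate into one code point. A minimal sketch follows; the `demo_*` function names are hypothetical and are not taken from Tidy's source, and the failure path simply reports an error rather than substituting a character.

```c
#include <stdint.h>

/* Hypothetical helper names; only the arithmetic is standard UTF-16. */
int demo_is_high_surrogate(uint32_t c) { return c >= 0xD800 && c <= 0xDBFF; }
int demo_is_low_surrogate(uint32_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

/* Combine a surrogate pair into a single code point.
   Returns 1 and stores the result in *out on success,
   or 0 (leaving *out untouched) if the pair is not valid. */
int demo_combine_surrogates(uint32_t high, uint32_t low, uint32_t *out)
{
    if (!demo_is_high_surrogate(high) || !demo_is_low_surrogate(low))
        return 0;
    *out = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    return 1;
}
```

A caller that gets 0 back is in the situation the TODO describes: the trailing surrogate was not found, so a warning or a substitution would be needed to avoid emitting invalid UTF-8.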
Geoff McLane deebc93f97 Merge pull request #480 from onnimonni/feature-fix-xmlns-xlink
Add optional xmlns:xlink attributes as valid to support inline svg
2017-01-29 19:17:43 +01:00
Onni Hakala da27b5e339 Add optional xmlns:xlink attributes as valid to support inline svg
fixes #478
2017-01-09 01:38:16 +02:00
Marcos Caceres 91da8c6f74 style: ansi conforming comments 2016-12-20 16:51:09 +11:00
Geoff McLane fd0ccb2bbf Bad, repeated node iteration! closes #459 2016-10-30 23:37:31 +01:00
Marcos Caceres aff76bec38 fix(lexer.c): fixes from initial review 2016-10-17 17:00:58 +11:00
Marcos Caceres 523d58b004 refactor: ask for charset and http_equiv attrs 2016-10-06 19:30:23 +11:00
Marcos Caceres 932cc104a6 feat(attrask.c): learn about charset attr 2016-10-06 19:29:56 +11:00
Marcos Caceres 53ee94ddba fix: incorrect check for first element in head 2016-10-06 19:07:44 +11:00
Marcos Caceres b1629c4a4f fix(lexer): bad attribute reporting 2016-10-05 20:22:19 +11:00
Marcos Caceres 2d7ddfef94 Part 2.1 - Bug fixes and warning 2016-10-05 20:14:18 +11:00
Marcos Caceres cfc22ac46e Add garvankeeley's suggestions using calloc 2016-10-05 18:54:25 +11:00
Marcos Caceres 040c22c6dc Part 2 - Implement lexer logic 2016-10-04 21:23:57 +11:00
Marcos Caceres 169bd38adf Part 1 - Add basic infra for 'add-meta-charset' option 2016-10-04 17:56:29 +11:00
Geoff McLane d81a9ad901 Merge branch 'issue-428'
Conflicts:
	version.txt

This closes #428
2016-09-11 16:57:07 +02:00
Marcos Caceres e4ae9c064d Add support for link 'as' attribute (closes #449) 2016-08-23 18:46:04 +10:00
Geoff McLane 80e57b23bf Merge branch 'master' into issue-428
Conflicts:
	version.txt
2016-08-09 00:46:40 +02:00
Geoff McLane 7631f25ed2 rebase issue-428 2016-08-02 18:10:19 +02:00
Adam Majer 50557a4f63 Fix static buffer overrun (issue #443)
result[6] is a fixed array of size 6, but in the process
of copying data into it, we clobber the last allocated byte.

Simplify some of the code by not calling redundant functions.
2016-08-02 11:10:45 +02:00
Benjamin Esham 54179386be Add support for the "integrity" attribute
This attribute may be used on "link" and "script" elements. See
http://www.w3.org/TR/2016/REC-SRI-20160623/#element-interface-extensions
2016-07-24 10:24:30 -04:00
Michal Čihař 10281040ca Avoid crash in tidyCleanAndRepair if document was not loaded
These services can only be used when there is a document loaded, ie a
lexer created.  But really should not be calling a Clean and Repair
service with no doc!
2016-07-07 16:38:05 +02:00
Geoff McLane 685f7a6c5b Issue #428 - Avoid adding form to input if html5 2016-07-02 20:13:01 +02:00
Geoff McLane 7bec2c2082 Merge pull request #422 from sesom42/master
prevent buffer overflow in debug output
2016-06-30 18:32:55 +02:00
Geoff McLane 97700044ce Merge pull request #410 from gagern/varargs
Pair va_copy calls with va_end
2016-06-18 18:53:53 +02:00
Jens Tautenhahn 84fc451a78 prevent buffer overflow in debug output 2016-06-14 15:42:18 +02:00
Benjamin Esham 941b763a8d Add support for "crossorigin" on audio too 2016-06-08 19:40:15 -04:00
Benjamin Esham d9d8e92e52 Allow "crossorigin" on img, script, and video tags too 2016-06-07 22:29:57 -04:00
Benjamin Esham 9377f65f89 Add support for the HTML5 "crossorigin" attribute
This attribute can only be used on "link" elements.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link#Attributes
2016-06-07 22:20:10 -04:00
Martin von Gagern 04bc8d3195 Pair va_copy calls with va_end
According to the specs, each va_copy call should be matched by a va_end call
to ensure proper cleanup.  Furthermore, since message filters might iterate
over the list of arguments, we should hand a new copy to each filter.
2016-05-17 22:37:32 +02:00
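The rule from commit 04bc8d3195 (every `va_copy` gets a matching `va_end`, and each filter gets a fresh copy) can be sketched as follows. The `demo_*` names, the filter signature, and the fixed 64-byte buffers are assumptions made for this illustration and do not reflect Tidy's actual filter interface.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_BUF 64

/* Hypothetical filter type: formats the message into its own buffer. */
typedef void (*demo_filter)(char *out, size_t size,
                            const char *fmt, va_list args);

/* A filter that walks the argument list, leaving it in a consumed state. */
void demo_format_filter(char *out, size_t size, const char *fmt, va_list args)
{
    vsnprintf(out, size, fmt, args);
}

/* Hand every filter its own copy of the argument list. */
void demo_dispatch(demo_filter *filters, char outs[][DEMO_BUF], size_t count,
                   const char *fmt, ...)
{
    va_list args;
    size_t i;

    va_start(args, fmt);
    for (i = 0; i < count; ++i) {
        va_list copy;
        va_copy(copy, args);                   /* fresh copy per filter */
        filters[i](outs[i], DEMO_BUF, fmt, copy);
        va_end(copy);                          /* paired with va_copy   */
    }
    va_end(args);                              /* paired with va_start  */
}
```

Without the per-filter copy, the second filter would receive a list the first one had already consumed, which is undefined behavior in C.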
Raphael Ackermann b704a4d0d4 allow zero LI in UL when html5. fix for #396 2016-04-08 23:08:56 +02:00
Geoff McLane 61a0a331fc Issue #390 - fix indent with --hide-endtags yes.
The problem was, with --hide-endtags yes, a conditional pprint buffer
flush had nothing to flush, thus the indent was not adjusted.

To track down this bug, added a lot of MSVC Debug code, but it only
exists if some additional items are defined, so it has no effect on the
release code.

This bug, for which this feels like a good fix, was first reported about 12
years ago by @OlafvdSpek in SF Bugs 563. Hopefully finally closed.
2016-04-04 18:13:08 +02:00
Geoff McLane 7598fdfff2 avoid DEBUG duplicate newline 2016-04-03 17:54:46 +02:00
Geoff McLane 7777a71913 Issue #369 - Remove Debug asserts 2016-03-31 14:50:03 +02:00
Geoff rpi McLane 086e4c948c remove gcc comment warning 2016-03-30 15:02:19 +00:00
Geoff McLane 59d6fc7022 Issue #377 - If version XHTML5 available, return that. 2016-03-30 16:28:08 +02:00
Geoff McLane 1830fdb97c Issue #384 - insert comments 2016-03-30 14:18:04 +02:00
Geoff McLane 4b135d9b47 Merge pull request #384 from seaburg/master
Fix skipping parsing character
2016-03-30 14:08:40 +02:00
Geoff McLane e87f26c247 Merge pull request #388 from htacg/fr.po
Merge fr.po to master
2016-03-27 19:54:54 +02:00
Jim Derry 7d2ddee775 Add new rebase command to CLI.
This is intended to make it very, very easy to update the POT and all of the POs when
changes are made to `language_en.h`. Used without an sha-1 hash, untranslated strings
(i.e., the "source" strings) are updated in the POT/PO's.

However if you specify an --sha=HASH (or -c HASH) option, then the script will use git
to examine the `language_en.h` file from that specified commit, determine the strings
that have changed, and mark all of these strings as `fuzzy` in the POs. This will serve
as a flag to translators that the original has changed. In addition, this `fuzzy` flag
will appear in the headers as "(fuzzy) " in the item comments.

If a translator edits the header directly, he or she should remove the "(fuzzy) " in the
comment. Then when the PO is rebuilt, the fuzzy flag will be removed automatically.
The reverse is also true; if a translator is working with the PO, he or she should
clear the fuzzy flag and the comment will be adjusted accordingly in the generated
header.
2016-03-25 09:21:21 +08:00
Geoff McLane 8671544beb Issue #383 - Add a WIP language_fr.h to facilitate testing 2016-03-24 14:15:43 +01:00
Geoff McLane 5feca8cfd6 Issue #383 - correct another byte-by-byte output to message file.
As in the previous case these messages are already valid utf-8 text, and
thus, if output on a byte-by-byte basis, must not use WriteChar, except
for the EOL char.

Of course this output can go to either a user output file, if configured,
or otherwise stderr.
2016-03-24 14:15:43 +01:00
Jim Derry ad7bdee3b9 Added translator comments to new TidyEscapeScripts option, and updated POT and POs to reflect this. 2016-03-24 11:00:47 +08:00
Jim Derry 71d6ca1392 Oops. Didn't commit es changes. This fixes that. 2016-03-23 15:10:07 +08:00
Jim Derry d54785c933 language help enhancements:
- Show the language Tidy is using.
- Update the POT and POs with the modified string.
- Regen language_es.h, which uses the string.

Note that the new header uses the new commentless behavior that's still
pending in another branch. In addition the proper c style hints have
been added to all PO's, as their previous absence was a bug.
2016-03-23 14:56:36 +08:00
Jim Derry 2cf03f7fa9 Fix two character lang codes not working. 2016-03-23 14:38:17 +08:00
Geoff McLane 000c6925bd Issue #348 - Add option 'escape-script', def = yes 2016-03-20 01:01:46 +01:00
Geoff McLane e6f1533d89 Issue #383 - Output message file text byte-by-byte 2016-03-18 18:47:00 +01:00
Evgeniy Yurtaev 7d28b21e60 Fix skipping parsing character 2016-03-17 23:30:11 +03:00
Geoff McLane 8dda04f1df Issue #379 - Care about 'ix' going negative.
How this lasted so long in the code is a mystery! But of course it will
only be a read out-of-bounds if testing the first character in the lexer,
and it is a spacey char.

A big thanks to @gaa-cifasis for running ASAN tests on Tidy.
2016-03-06 17:36:51 +01:00
Geoff McLane 8eee85cb9e Issue #380 - Experimental patch in issue-380 branch 2016-03-05 17:39:14 +01:00
Geoff McLane 0e6ed639d6 Issue #380 - Add more MSVC debug 2016-03-04 19:28:49 +01:00
Geoff McLane d091027089 Issue #377 add debug only output of constrained versions 2016-03-03 20:21:35 +01:00
Geoff McLane 7bdc31af76 Issue #377 - Table summary attribute also applies to XHTML5 2016-02-29 19:58:55 +01:00
Geoff McLane 24c62cf0df Issue #314 - Avoid head warning if show-body-only 2016-02-29 18:49:15 +01:00
Geoff McLane 23e689d145 Issue #373 - Merge branch 'issue-373' of github.com:htacg/tidy-html5 into issue-373
Conflicts: version.txt - set version 5.1.41issue-373
2016-02-18 15:18:39 +01:00
Geoff McLane 8c13d270ed Merge branch 'master' of github.com:htacg/tidy-html5 2016-02-18 13:58:23 +01:00
Geoff McLane b91d52592b Fix to K&R C to compile with MSVC 2016-02-18 13:57:47 +01:00
Jim Derry 63c0327de1 Fixed typo in output strings. 2016-02-18 15:40:10 +08:00
Jim Derry e00f419f5d Discovered some missing strings from tidyErrorFilterKeysStruct. 2016-02-18 10:19:57 +08:00
Jim Derry da8205b2dc Regen'd POT, POs, and headers in order to capture documentation changes in all of them. 2016-02-17 20:07:00 +08:00
Jim Derry 7fbe76be0b Finished semantic html. 2016-02-17 20:02:38 +08:00
Jim Derry a78daccd3c Through TidyIndentSpaces. 2016-02-17 17:43:09 +08:00
Jim Derry a16e89c4f8 Updated translator comments. 2016-02-17 17:27:57 +08:00
Jim Derry d30c2d7747 XSL for man handles <var>. Updated comment and sample string. 2016-02-17 17:20:02 +08:00
Jim Derry cc59efb23d Add a xml-error-strings service to console app providing symbols developers can use with TidyReportFilter3. 2016-02-17 12:35:20 +08:00
Jim Derry bc1e54d5b5 Externalize the TidyReportFilter3 error codes, and provide iterators to loop through them. 2016-02-17 12:27:11 +08:00
Jim Derry 720d5c25d2 Squelch compiler warning default type. 2016-02-17 10:56:21 +08:00
Jim Derry 97abad0c05 Bump to 5.1.39 for merging.
Merge branch 'master' into attrdict_phase2
2016-02-16 11:11:36 +08:00
Jim Derry 3431dd05a4 Merge branch 'master' into attrdict_phase1
Bump version to 5.1.38
2016-02-16 11:07:32 +08:00
Jim Derry 1e4f7dd0f1 Merge pull request #368 from htacg/issue-341
Issue #341
2016-02-16 10:18:26 +08:00
Geoff McLane 9cf97d536b Issue #373 - Avoid a null added to output.
This bug was first opened in 2009 by Christophe Chenon, as bug sf905 but
the patch provided then never made it into the source.

Now appears fixed, 7 years later!
2016-02-15 13:02:10 +01:00
Geoff McLane a4f425546f Improve MSVC DEBUG output.
Previously only the first 8 characters were output, followed by an ellipsis if
there were more. Now return up to the first 19 chars. If more than 19, return
the first 8, followed by an ellipsis, followed by the last 8 characters.

This is in the get_text_string service, which is only used if MSVC and not
NDEBUG.
2016-02-14 18:17:46 +01:00
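The truncation rule from commit a4f425546f can be sketched as below. The function name `demo_abbreviate` is hypothetical (it is not the `get_text_string` service itself), and the sketch assumes the ellipsis is three periods, which makes the shortened form exactly 8 + 3 + 8 = 19 characters.

```c
#include <string.h>

#define DEMO_MAX_SHOWN 19  /* longest text returned unchanged */
#define DEMO_EDGE       8  /* characters kept at each end when shortening */

/* Copy text into out, shortening it to "first 8" + "..." + "last 8"
   when it is longer than 19 characters.
   out must have room for at least DEMO_MAX_SHOWN + 1 bytes. */
void demo_abbreviate(const char *text, char *out)
{
    size_t len = strlen(text);

    if (len <= DEMO_MAX_SHOWN) {
        memcpy(out, text, len + 1);  /* includes the terminating NUL */
        return;
    }
    memcpy(out, text, DEMO_EDGE);
    memcpy(out + DEMO_EDGE, "...", 3);
    memcpy(out + DEMO_EDGE + 3, text + len - DEMO_EDGE, DEMO_EDGE);
    out[DEMO_EDGE + 3 + DEMO_EDGE] = '\0';
}
```

Keeping both ends visible makes long text nodes easier to tell apart in debug output than a prefix alone would.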
Jim Derry c62127b9bd Default to NO at this point. 2016-02-13 12:33:02 +08:00
Jim Derry 8b5771cf24 Word2000
Added messages that would otherwise be missed in post-processing, after cleanup.
2016-02-13 12:26:19 +08:00
Jim Derry 2cdedb4a63 Forgot one file... 2016-02-13 11:53:53 +08:00
Jim Derry 896b00238b Forgot one file... 2016-02-13 11:53:40 +08:00
Jim Derry 2ade3357a9 Phase 2
This is a MUCH SANER approach to what I was trying to do (now that I screwed up enough internals to understand some of them!)
At this point there are zero exit state reversions, and zero markup reversions! There are still 21 errout reversions; I'll
annotate and adjust as necessary.
2016-02-13 11:31:16 +08:00
Jim Derry e947d296e4 Handle some issues with misusing VERS_HTML5 in the doctype. 2016-02-12 20:49:14 +08:00
Jim Derry c81a151da5 Add VERS_STRICT to identify future strict document types. 2016-02-12 20:46:49 +08:00
Jim Derry 74604fd52b Hard-coded checks are redundant with updates to attrdict.c. 2016-02-12 20:44:03 +08:00
Jim Derry 429703dce4 Because the previous effort #350 grew too fast and there were a LOT of side effects to
my changes, I'm starting over with this. Comments in the PR thread.

This commit reduces the size of attrdict.c while causing only a single errout
regression that is justified.
2016-02-12 19:34:19 +08:00
Geoff McLane 03a643f781 Issue #341 - No token can be inserted if istacksize == 0! 2016-02-08 15:12:23 +01:00
Geoff McLane 7d0d8a853a Issue #345 - discard leading spaces in href 2016-02-01 20:07:55 +01:00
Geoff McLane 7f0d5c31e6 If no doctype, allow user doctype to reset table - Issue #342 2016-02-01 19:44:30 +01:00
Geoff McLane c1f94c066c Tidy up some debug only code.
After @sria91 added #360 merge, added a little more improvement...
2016-01-30 20:51:27 +01:00
Srikanth Anantharam 9a0af48a4e fixed a NULL node bug in debug build 2016-01-30 22:03:52 +05:30
Jim Derry 9ae15f45a7 Consistent tabs
Fixed tabs in template file, and regen'd all related files.
2016-01-30 15:51:54 +08:00
Jim Derry 53f2a2da2a msgunfmt works properly with escaped hex. 2016-01-30 15:51:53 +08:00
Martin von Gagern 17e50f2642 Encode UTF-8 strings to hex escapes in header files 2016-01-30 15:51:53 +08:00
Jim Derry bf70824cc2 - Add TidyReportFilter3, which removes translation strings completely from the equation. It would be a good idea to deprecate TidyReportFilter2, which is vulnerable to changing strings in Tidy source.
- Documentation reminders for future enum changes.
- Documentation updates.
2016-01-30 15:51:53 +08:00
Jim Derry d505869910 Localization Support added to HTML Tidy
- Languages can now be added to Tidy using standard toolchains.
- Tidy's help output is improved with new options and some reorganization.
2016-01-30 15:51:53 +08:00
Jim Derry 26e7d9d4b0 Fixes Mac OS X encoding issues and harmonizes output across platforms.
Previously Tidy produced different output based on the compilation target, NOT based on
the file encoding and specified options. Every platform was equal except Mac OS. Now unless
the encoding is specifically set to a Mac file type, all encoding assumptions are the same
across platforms.
2015-12-31 13:57:34 +08:00
Geoff McLane 78f2d52cdd Issue #308 - remove bad warn, bad assert, and free discarded 2015-12-05 15:03:41 +01:00
Geoff McLane 9caecb80cf Revert "Fix for head closing tag not reported (#327)"
This reverts commit 61cfcb1555.

This added an inconsistent warning about a missing optional close tag. In
general tidy does not report such optional close tags. See issue #327 for
some discussion on this.
2015-12-05 12:59:43 +01:00
Geoff McLane 3b13cd8076 Merge branch 'mingw-build' 2015-12-03 19:18:07 +01:00
Jim Derry 61cfcb1555 Fix for head closing tag not reported (#327) 2015-11-29 13:21:49 +08:00
Jim Derry 873794162a Callback added to XML printer, too; fixed off-by-one error. 2015-11-29 07:39:33 +08:00
Geoff McLane dc969f30d5 Issue #311 - small changes for MinGW32 build 2015-11-28 15:14:53 +01:00
Jim Derry 4adc07fd65 Removed the one callback per line filter. Library user can filter this himself. 2015-11-28 15:43:34 +08:00
Jim Derry dcd8f16f73 Tidying progress callback implemented. 2015-11-28 15:34:23 +08:00
Jim Derry 34d456aa80 Make pretty printer keep track of line numbers as it prints. 2015-11-28 14:16:17 +08:00
Jim Derry 9834cc17ad Style cleanup for previous commit. 2015-11-27 09:45:26 +08:00
Jim Derry 1c963acb58 Merge branch 'master' into fix_img_alt 2015-11-27 09:36:32 +08:00
Jim Derry 933fc3d236 - Addresses #320
- Different error output depending on whether or not the `alt-text` option was given a value.
2015-11-26 13:23:43 +08:00
Jim Derry 63234735d8 Allows null value css-prefix to be used in a config file without issuing a warning. 2015-11-26 11:21:48 +08:00
Ben Bullock 71d9638448 Don't push back non-A tokens. 2015-11-25 18:00:45 +09:00
Christopher Brannon 1ef5ba7968 Fix a tiny buffer overflow. 2015-11-23 12:28:00 -08:00
Geoff McLane b58aa1c26a Issue #307 - add a ref link in comments 2015-11-22 20:43:12 +01:00
Geoff McLane 2388fb0175 Issue #307, #167, #169 - regression of nested anchors 2015-11-22 18:46:00 +01:00
Geoff McLane bbc72a9297 Issue #306 - fix an old typo hidden by a cast!
Thanks to @benkasminbullock for spotting this fix.
2015-11-18 20:01:21 +01:00
Geoff McLane e2feed485c gcc warning - if 0 an unused static table 2015-11-18 17:06:13 +01:00
Geoff R. McLane b98061ff62 fix gcc warning parentheses in pprint.c 2015-11-18 16:47:58 +01:00
Geoff McLane 768ad46968 Issue #304 - remove duplicated TidyAttr_ARIA_ORIENTATION 2015-11-17 15:06:23 +01:00
Shane McCarron c0b769c5c7 Initial cut at RDFa support (again)
New branch that implements support for RDFa attributes.  Should be
cleaner than my first attempt in PR #299 - also references issue #209
2015-11-16 11:29:23 -06:00
Paul Howarth baad0b0064 Don't mangle the output filename
Attached patch works for me, and shouldn't affect any other option
processing.
2015-11-11 11:28:47 +01:00
Geoff McLane c68ad42482 Revert 22a1922c35 2015-11-07 14:50:10 +01:00
Shane McCarron c572e3e3c8 Initial cut at supporting RDFa attributes. 2015-11-06 12:19:05 -06:00
Geoff McLane 800b91e576 Issue #65 - effect name change to skip-nested, and default to on 2015-11-05 15:19:39 +01:00
Jim Derry 32ce272f75 Fix indent-with-tabs for library use. 2015-11-04 12:44:15 +08:00
Jim Derry dec6356a6f Deleted multiple equal id attributes. 2015-11-02 15:31:47 +08:00
Jim Derry d0ac990636 More description beautification. 2015-11-02 12:06:37 +08:00
Jim Derry 807fed4ff6 Documentation improvements. 2015-11-01 19:05:03 +08:00
Jim Derry 2613f02dc5 More documentation beautification. 2015-10-31 22:03:33 +08:00
Jim Derry 565d2ec232 Documentation beautification underway. 2015-10-31 18:30:02 +08:00
Jim Derry cf3c0293c0 Additional tests with our troublesome option. 2015-10-31 14:45:51 +08:00
Jim Derry 8c5fae8c09 - documentation/quickref.xsl
  - Includes <p> support
  - Matches the description class name in quickref.include.xsl
  - Styles <br /> to enforce vertical spacing (in the reference table only).
- documentation/style.css
  - Styles <br /> to enforce vertical spacing (in the reference table only).
- documentation/tidy1.xsl.in
  - Includes <p> support.
  - Better manages line breaks with .sp1 instead of .br.
- src/localize.c
  - Legibility to the troublesome `drop-font-tags` description.
2015-10-30 23:58:43 +08:00
Jim Derry 709ac8cb4c Support HTML in descriptions. 2015-10-30 18:17:40 +08:00
Jim Derry 09b0698c56 Typo. 2015-10-30 12:58:11 +08:00
Jim Derry a3138cb142 URL cleanup. 2015-10-30 12:23:20 +08:00
Jim Derry 2d0f971747 Update documentation to address #288. 2015-10-30 10:19:47 +08:00
Geoff McLane c8751f60e7 Issue #286 - use AddByte for internal transfer 2015-10-20 15:04:18 +02:00
Geoff McLane d75c82275d Issue #285 - Add a ResetTags func to reset html5 mode before each document 2015-10-14 16:55:35 +02:00
Geoff McLane adbad0379e Issue #65 - if nonested then no endtag needed to decrement.
This is only if nonested is on, then a <script> tag has not incremented
the nested, so likewise no need to treat an escaped close tag <\/script>
as an end tag to decrement nested.
2015-10-08 17:06:03 +02:00
Geoff McLane 7e69ceb3d1 Issue #281 - only warn BAD_CDATA_CONTENT if inserting an escape. 2015-10-07 16:17:42 +02:00
Geoff McLane b63c1090c2 option to avoid incrementing nested containers.
This is in the GetCDATA function. If the container is script or style and
this option is on, avoid bumping nested.

This addresses issues #65 (1642186) and #280.

All attempts at parsing script data are now abandoned as a bad direction.
2015-10-07 15:11:25 +02:00
Geoff McLane b4efe7464a small enhancement of debug only code 2015-10-05 15:08:20 +02:00
Geoff McLane 6c1a2acea2 #273 - avoid xhtml doctype flip/flop 2015-09-27 17:36:57 +02:00
Christopher Brannon 94b0647c08 Issue #65, fix for ignoring cdata. 2015-09-24 18:13:57 -07:00
Geoff McLane 04ca419080 Issue #64 - Try hard to skip '<![CDATA[ ... ]]>' 2015-09-24 14:21:55 +02:00
Geoff McLane 96589c6f57 #65 Skip esc'd esc, and only for script containers 2015-09-21 12:33:53 +02:00
Geoff McLane eda37c5adb Issue #65 - avoid new quotes if in quotes 2015-09-19 14:58:42 +02:00
Geoff McLane d541405a2a Eventually complete a 2007 fix 2015-09-16 13:17:50 +02:00
Geoff McLane 9960f7c6dd Protect against a NULL node in the Debug only code 2015-09-12 13:06:14 +02:00
Srikanth Anantharam be9f1d4203 using _fileno(fout) instead of fout->_file makes it more portable across different MSVC versions 2015-09-11 00:27:17 +05:30
Geoff McLane c48680cc01 Issue #180 - fix indenting when -omit used 2015-09-10 15:01:48 +02:00
Geoff McLane 66e288a8e2 Issue #239 - no warn for apos entity in html5++ mode 2015-08-22 14:03:02 +02:00
Geoff McLane e79137de7f Issue #238 - only except the pre element 2015-08-22 14:00:18 +02:00
Geoff McLane 1d67dc940a Merge branch 'Andrew-Dunn-patch-1' into issue-228.
That is reordering windows includes per #234

In general the order of includes should be system <headers>,
then local "headers", except perhaps for the ocassional local
"version" or "config" header...

Resolved conflicts in src/pprint.c by reverting to current master, and in
version.txt by increasing the version.
2015-08-10 18:49:13 +02:00
Andrew Dunn dfdffd0cb3 Reordered Windows Includes
Moved the <windows.h> include above the "streamio.h" include to fix compilation with the latest Windows SDK.

<winnt.h> now has the following struct. In particular the `CR` member of this struct conflicts with a define in streamio.h.

    typedef struct _IMAGE_ARM64_RUNTIME_FUNCTION_ENTRY {
        DWORD BeginAddress;
        union {
            DWORD UnwindData;
            struct {
                DWORD Flag : 2;
                DWORD FunctionLength : 11;
                DWORD RegF : 3;
                DWORD RegI : 4;
                DWORD H : 1;
                DWORD CR : 2; // This line causes a compile error because CR is redefined in streamio.h
                DWORD FrameSize : 9;
            } DUMMYSTRUCTNAME;
        } DUMMYUNIONNAME;
    } IMAGE_ARM64_RUNTIME_FUNCTION_ENTRY, * PIMAGE_ARM64_RUNTIME_FUNCTION_ENTRY;
2015-08-07 17:06:33 +10:00
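The name clash described in the commit above can be reproduced in miniature. The sketch below is an illustration only: `struct runtime_entry` is a hypothetical stand-in for the SDK struct, and the value `0xD` for the `CR` macro is an assumption about what streamio.h defines. It shows why the struct has to be parsed before the macro exists.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the system header: a struct with a member named CR.
 * (Hypothetical, much smaller than the real SDK struct.) */
struct runtime_entry {
    unsigned int H  : 1;
    unsigned int CR : 2;
};

/* Code that touches the member must also be seen before the macro. */
unsigned int entry_cr(const struct runtime_entry *e)
{
    assert(e != NULL);
    return e->CR;
}

/* Stand-in for the local header. Assumed value; defined only after
 * the struct above, so the member name is not rewritten. */
#define CR 0xD

/* From here on, CR means the carriage-return character code. */
int is_carriage_return(int c)
{
    return c == CR;
}
```

Moving the `#define` above the struct would turn the member declaration into `unsigned int 0xD : 2;`, which does not compile. That is the failure the reordering avoids.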
Geoff McLane cbae924a40 Oops, missed setting 'type' for TidyVertSpace.
This was evidenced by an 'assert' failure, that the type was not an 'int'!

And also in the -xml-help output, thus affecting the tidy.1 manual page
for this new feature --vertical-space auto, which produces almost single
line html output.

This 'fix' began in the issue-228 branch - see Issue #231
2015-07-31 13:39:06 +02:00
Geoff McLane 38ef5bfe85 Issue #232 remove CM_HEAD from 'object' tag 2015-07-30 14:50:15 +02:00
Geoff McLane ae620a63a2 merge @camoy fix #158 to this branch 2015-07-17 19:00:16 +02:00
Geoff McLane d26cd72084 Add macros to get TidyVertSpace config, and implement 2015-07-15 20:58:00 +02:00
Geoff McLane 154a61543b Expand xml TidyVertSpace text to include tri-state 2015-07-15 20:56:22 +02:00
Geoff McLane 16580e0926 Revert TidyVertSpace to 'no', and make AutoBool option 2015-07-15 20:54:50 +02:00
Geoff McLane 4246c2c462 Issue #230: Need to KEEP this newline char sometimes.
This is a case where the lexer, in GetTokenfromStream, does NOT eat any
trailing newline after a LEX_STARTTAG: case...

So far have identified pre, script, style as NEEDING this user newline
character for later pprint output. Any others?
2015-07-15 19:41:02 +02:00
Cameron Moy d50391a984 Fix #158 - remove inserted newlines in pre 2015-07-13 16:31:52 -04:00
Geoff McLane cb2543efac Merge branch 'master' of https://github.com/stencila/tidy-html5 into issue-228 2015-07-13 19:11:30 +02:00
Nokome Bentley 991630e523 Changes default for vertical-space to yes
Makes this more similar (but not the same) as the previous default
behaviour.
2015-07-13 15:56:15 +12:00
Nokome Bentley b6bcf0408c Applies "smart" new lines to start of script like tags 2015-07-13 15:49:07 +12:00
Nokome Bentley f6979787d1 Adds "smart" line flushing functions.
See in-code comments for more details
2015-07-13 15:40:59 +12:00
Folkert van Heusden 784c7d7f79 Added methods for deleting nodes and/or attributes.
This is useful when e.g. writing an HTML cleaner.
2015-07-12 18:34:35 +00:00
Geoff McLane 1e70fc6f15 Rename two headers. Issues #224 #223 #221
But this seemed a good time to release 5.0.0.RC1...
2015-06-30 20:06:02 +02:00
Geoff McLane 3a524f1710 Issue #207 - deal with 2 cases of an unambiguous ampersand.
html5 allows a naked ampersand unquoted, and now tidy will not issue a
warning. This only deals with a & b, and P&<li>O</li>

More may need to be done for other cases.
2015-06-24 13:10:27 +02:00
Geoff McLane 3aa50740da Issue #215 - only issue warning if NOT HTML5 mode 2015-06-21 19:49:44 +02:00
Geoff McLane e71bda718f Add TIDY_CALL to tidyLibraryVersion func. 2015-06-09 20:04:49 +02:00
Geoff McLane 18880eab55 Issue #218 - Do NOT allocate a 1 byte null String buffer.
This happens when setting a String config value to an empty string, say
through tidyOptSetValue(tdoc,id,"").

If the length of the new string is zero then do not allocate a 1 byte
buffer, set it to 0, for the option. Any previous buffer has already been
released.

This means API functions like tidyOptSaveSink will not return erroneous
null String values!
2015-06-08 13:52:00 +02:00
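The behaviour described in the Issue #218 commit above can be sketched roughly as follows. This is a minimal illustration, not Tidy's actual code: `set_string_option` and its `slot` parameter are hypothetical names, and plain `malloc`/`free` stand in for Tidy's own allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of LibTidy: store a copy of `value`
 * in `*slot`. An empty or NULL value leaves the slot at NULL rather
 * than allocating a 1-byte buffer holding only the terminator.
 * Returns 1 on success, 0 if allocation fails. */
int set_string_option(char **slot, const char *value)
{
    assert(slot != NULL);

    /* Release any previous buffer first. */
    free(*slot);
    *slot = NULL;

    if (value == NULL || value[0] == '\0')
        return 1;               /* nothing to allocate */

    size_t len = strlen(value);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return 0;               /* allocation failed; slot stays NULL */

    memcpy(copy, value, len + 1);   /* includes the terminator */
    *slot = copy;
    return 1;
}
```

With this shape, a caller that later saves the configuration can test the slot against NULL and skip the option, instead of writing out an empty string.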
Geoff McLane 3f72b6e335 Issue #210 - Add new warning for summary attr in table if HTML5.
This new warning will only be seen if the document remains in HTML5 mode,
where the summary attribute is obsolete. The W3C validator flags this as
an error, and suggests 'Consider describing the structure of the table in
a caption element or in a figure element containing the table; or simplify
the structure of the table so that no description is needed'.

At the same time this patch also restored the old warning if the document
is HTML4--, if the table element lacks a summary attribute. This has been
a tidy warning since the beginning of time, although the W3C validator
does not presently flag this.
2015-06-06 11:20:35 +02:00
Geoff McLane 326f2414fd Issue #212 - Further fix to set MixedContent in some cases.
In certain circumstances a leading space has to be preserved to allow it
to be used to create a text space node to insert before this element to
preserve the view in a browser.

And added a note asking why is ParseTag called with a hardcoded
IgnoreWhitespace when some effort above has set the mode variable to
MixedContent in certain cases, but need to think about this 2nd change.

Also added some MSVC Debug output when this leading text is used to insert
such a created text node before the element just to be reminded of this
special event.
2015-06-04 13:12:05 +02:00
Geoff McLane a278b04a19 Add debug display of text modes.
Note this ONLY affects an MSVC Debug build!
2015-06-04 12:59:02 +02:00
Geoff McLane c18f27a587 Issue #217 - avoid len going negative, ever... 2015-06-03 20:26:03 +02:00
Geoff McLane 0fb7ccdfc6 Add some mem alloc and free debug to chase Issue #217
Such debug is OFF by default, and only added by defining DEBUG_MEMORY. And
is only available for the Debug configuration compiled with MSVC, but this
could be easily extended...
2015-06-03 20:24:41 +02:00
Geoff McLane 944b412fe6 Need extra include if UNICODE is defined 2015-06-02 20:44:00 +02:00
Geoff McLane b8bc88522c small fix for indent-with-tabs to have a default xml value 2015-05-25 16:48:39 +02:00
Denis Denisov 5a28d5f010 5.0.0
htacg/tidy-html5#190
2015-05-24 23:49:00 +03:00
Geoff McLane d923dd7b2d Issue #108 - first cut new option --indent-with-tabs yes. 2015-05-22 16:06:12 +02:00
Geoff McLane 5d5e689f1a For issue #212, retain mixed mode block parsing.
This is particularly for the anchor tag which in html5 mode is parsed in
ParseBlock. That is retain a leading space, in case it needs to be
moved to in front of the block to keep space rendering.
2015-05-13 12:35:06 +02:00
Geoff McLane 963caf0741 add counter in ParseBlock 2015-05-12 17:14:09 +02:00
Geoff McLane c1a3100cb9 add convenient break point based on row and column 2015-05-12 17:13:23 +02:00
Geoff McLane b2b9f1d6f2 spelling error noted in exploration of #207 in localize.c 2015-04-26 19:19:55 +02:00
Dmitry Ivanov 9a3f85d44c Support build with MinGW 4.9.1 2015-04-26 13:18:46 +03:00
Geoff McLane 2f6b3d49b6 Merge pull request #202 from aerilon/master
Please pull fix for #198 and #199
2015-04-22 21:24:12 +02:00
Geoff McLane f5eb2cf26a Issue #196 - expand comment and bump version.
Thanks to @willydee for this PR.
2015-04-11 15:25:07 +02:00
willydee 253a7e54c3 Fix for #196: HTML5 allows block elements in <CAPTION> 2015-04-11 15:06:35 +02:00
Arnaud Lacombe c05661df11 Issue #199 - Add support for html5's template tag 2015-04-10 15:50:07 -07:00
Geoff McLane e78c0105d3 Indicated by #191, why show doctype warning if omitted in output 2015-04-08 18:45:31 +02:00
Geoff McLane 5cbd3ee95b From issue #191, saw need to revert to 'master' branch 2015-04-08 17:55:12 +02:00
Geoff McLane 3585d4c31a Issue #186 - Move FreeLexer() to near last 2015-03-19 19:14:27 +01:00
Geoff McLane 79ac8b2554 Issue #185 - Treat elements ids as case-sensitive if in HTML5 mode 2015-03-13 19:47:28 +01:00
Geoff McLane 66a597f5b7 related to issue #180 - remove additional line unless 'classic' 2015-03-10 12:27:29 +01:00
Geoff McLane 9caab688f1 debug - avoid duplicate output if to stdout 2015-03-09 16:12:59 +01:00
Geoff McLane fd7b4f8589 just some more DEBUG on text nodes 2015-03-06 19:28:52 +01:00
Geoff McLane c0cad3aeba Issue #167 - further fixes for HTML5 mode 2015-03-06 19:13:06 +01:00
Geoff McLane 389ce17814 add attr to dbg_show_node 2015-03-06 18:36:01 +01:00
Geoff McLane 0dc68d6cb1 Issue #167 & #169 - default to HTML5 mode.
Revert TidyTag_A to HTML5 mode, but allow the table to be modified if the
DOCTYPE given is found to NOT be HTML5, through a service TY_(AdjustTags).
Care is taken to clear any previous hash cached tags.

At present this only affects the anchor tag, but could be applied to
others that need to change their parsing due to an identified DOCTYPE.
2015-03-06 12:55:24 +01:00
Geoff McLane 606ffebd47 Issue #168 - Fix for access test 5.2.1.2 2015-03-04 19:38:59 +01:00
Geoff McLane 86f626cd67 Issue #167 - revert anchor tag to inline only 2015-02-28 20:30:56 +01:00
Geoff McLane 4b2943edb3 Issue #162 - fix for this while hopefully maintaining #111 fix.
The fix for #111 added an end tag for all StartEnd tags, when outputting
HTML5, but there should be some exceptions to this.

Added a new service, isVoidElement(node) for the void elements. Perhaps
this service could be further optimised.
2015-02-24 17:51:59 +01:00
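A check in the spirit of the `isVoidElement(node)` service mentioned above might look like the sketch below. It is an assumption-laden illustration: the real service takes a Tidy node, whereas the hypothetical `is_void_element_name` here works on a plain tag name, and the name list is taken from the current HTML standard rather than from Tidy's source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: void elements as listed in the current HTML
 * standard. Tidy's own table may differ. */
static const char *const void_names[] = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr"
};

/* Hypothetical name-based check; returns 1 for a void element.
 * The comparison is case-sensitive, so callers are assumed to pass
 * lower-case names. */
int is_void_element_name(const char *name)
{
    assert(name != NULL);

    size_t count = sizeof void_names / sizeof void_names[0];
    for (size_t i = 0; i < count; ++i) {
        if (strcmp(name, void_names[i]) == 0)
            return 1;
    }
    return 0;
}
```

A linear scan over so few names is cheap. Comparing tag ids in a `switch` instead of strings would be one way to make it faster if needed.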
Geoff McLane cfffe7765f Issue #166 - repeated main element.
With this fix introduced two new services, FindNodeById and
FindNodeWithId. The former does a total tree search for a TidyTagId.

Maybe there is a way to optimise this search...

Also change the uint badForm from an on/off to a bit field, so could be
extended to other document format errors.
2015-02-24 15:04:19 +01:00
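Turning an on/off `uint` into a bit field, as the commit above does for `badForm`, usually follows the pattern sketched here. The flag names and their meanings are hypothetical and chosen only for illustration; Tidy's real constants may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag names: each error kind gets its own bit. */
enum {
    FORMAT_ERR_NONE          = 0,
    FORMAT_ERR_BAD_FORM      = 1u << 0,  /* the original on/off meaning */
    FORMAT_ERR_REPEATED_MAIN = 1u << 1   /* e.g. a second main element */
};

/* Record one kind of error without disturbing the other bits. */
void note_format_error(unsigned int *bad_form, unsigned int flag)
{
    assert(bad_form != NULL);
    *bad_form |= flag;
}

/* Returns 1 if the given kind of error has been recorded. */
int has_format_error(unsigned int bad_form, unsigned int flag)
{
    return (bad_form & flag) != 0;
}
```

Existing code that only tests the variable for non-zero keeps working, and each new error kind needs just one more constant.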
Geoff McLane a5629443e6 Just improve some debug output 2015-02-24 13:20:26 +01:00
Geoff McLane 70d7e58d8d Add macro nodeIsMAIN(n) 2015-02-22 20:53:14 +01:00
Geoff McLane 0aa81eb256 Issue #130 - MathML attr and entity fix!
This is a set of kludgy fixes for MathML attribute and entities support.

It is intended that a full HTML5 entity table be added at some time, but
at present ALL entities are accepted as written when within the math
element.

Likewise all attributes are accepted on MathML elements without any check
of their name or value, even if they match attributes outside MathML.

And in the pprinter such entities are written as is from the lexer, using
a new PPrintMathML service added, using the new mode OtherNameSpace.

It is hoped all these fixes will NOT affect tidy outside the math element.

ALL fixes in the set are clearly marked '#130 - MathML attr and entity fix!'
for easy searching, and improving if possible.
2015-02-22 18:58:55 +01:00
Frédéric Wang fe51244d4a Use HTML5 mapping for entities &lang; and &rang; (http://www.w3.org/TR/xml-entity-names/#diff-xhtml1). #130 2015-02-21 19:33:24 +01:00
Geoff McLane b144b834cd Add a show_all_nodes debug service 2015-02-19 19:14:40 +01:00
Geoff McLane 6e3b293985 Issue #130 - Add TidyAttr_DISPLAY for math tag 2015-02-13 18:37:07 +01:00
Jim Derry 04f68a032f Changed text to point to html-tidy.org 2015-02-13 19:17:25 +08:00
Geoff McLane cff3fdd308 Issue #133 - hopefully a better fix.
As predicted the previous fix had adverse consequences on say script text,
which then lost the indent, and was reverted.

This introduces a new service, nodeIsTextLike, which naturally returns yes
if the node is text, but also if it is an AspTag.

Maybe other text like nodes need to be added.
2015-02-12 15:24:38 +01:00
Geoff McLane 5d2cbd10dc Revert "Issue #133 - ever increasing indent!"
This reverts commit 0f80c08355.

This commit had other BAD consequences
2015-02-12 14:56:51 +01:00
Geoff McLane ea50bd30e7 add comment only for potential fix of Issue #8 2015-02-10 15:32:05 +01:00
Geoff McLane 279c29bf8d Issue #20 - revert to 880221e fix this issue 2015-02-10 15:28:56 +01:00
Geoff McLane cbd9eb7903 Issue #155 - issue warnings unless --show-body-only yes 2015-02-07 13:56:13 +01:00
Geoff McLane b26291ec6b Issue #151 - Initial implementation of picture element.
TODO: check, verify the picture attribute list.
2015-02-07 13:42:22 +01:00
Geoff McLane d72e681d32 Issue #152 - add srcset and sizes to img tag 2015-02-06 19:24:04 +01:00
Geoff McLane 2172a498f6 Issue #153 - fix for endif section not conforming to what tidy expects 2015-02-05 19:01:34 +01:00
Geoff McLane 66951a562a add row/col to DEBUG output 2015-02-05 18:24:02 +01:00
Geoff McLane 1be5ccbb63 Issue #130 - initial MathML support 2015-02-05 12:21:08 +01:00
Geoff McLane 698396eaa0 Issue #149 - avoid crash on null attr value 2015-02-03 13:38:20 +01:00
Geoff McLane 885c7caab7 Issue #70 - Initial implementation of SVG support.
An immense thanks to Ger Hobbelt who had already done this
in his github.com/GerHobbelt/htmltidy fork.

The two sources have diverged, so it was not a simple cut
and paste. But again thanks Ger for this.
2015-02-02 17:36:27 +01:00
Geoff McLane 63c6671f59 Issue #126 - partial fix for indenting style 2015-02-01 18:35:28 +01:00
Geoff McLane 7a3afd41ca Issue #132 - avoid warning if configured to omit-optional-tags.
Difficult decision to avoid head warning.
2015-02-01 16:05:39 +01:00