629 commits

Author SHA1 Message Date
Jim Derry 2a4dc1af52 Merge branch 'dialogue_cleanup' into next
Version bump for internal API change.

Conflicts:
	version.txt
2017-09-20 17:47:27 -04:00
Geoff McLane 79aa8b7460 Merge pull request #599 from htacg/memory-test
Issue #597 - Memory tests/diagnostics
2017-09-20 19:11:34 +02:00
Geoff McLane cd9bb76caf Merge pull request #595 from ablackton/fix/XmlElementNameParsing
Issue #594 - Parse XML element names beginning with Valid NameChar
2017-09-20 17:02:14 +02:00
Jim Derry f26d70c394 Added Doxygen documentation to the header. Some of these could be expanded a
bit, but they look nice in Doxygen.
2017-09-19 15:07:52 -04:00
Rafael Fontenelle c1a4f018df Add Brazilian Portuguese translation 2017-09-19 15:38:49 -03:00
Jim Derry 55ceb55fad Updated PO's and languages with minor changes from English. 2017-09-19 14:03:45 -04:00
Jim Derry 51e2e0f3bd Following the example of the recent changes in the "reports" aspect of Tidy's
output, classify and organize all of the dialogue type of messages. This paves
the way towards formalizing (and expanding!) the footnotes system with much
greater explanatory text, as well as providing much better fine-grained control
over which types of output that Tidy will produce.

Moved STRING_DOCTYPE_GIVEN, STRING_CONTENT_LOOKS, and STRING_NO_SYSID to the
Report paradigm from the Dialogue paradigm, as these are items that are
traditionally TidyInfo and included in the Report table, rather than any type
of dialogue.

At this point, we are exactly passing all tests.
2017-09-19 13:52:27 -04:00
Geoff McLane 55d287bc9d Issue #597 - Free the 'node' not stacked, and add 'message' 2017-09-18 19:47:52 +02:00
Geoff McLane eb81a53165 Issue #597 - Free the 'message' structure, in messageobj.c 2017-09-18 19:46:46 +02:00
Geoff McLane d5ba3d8939 Issue #597 - Switch to 'stderr' in sprtf.c 2017-09-17 16:30:37 +02:00
Geoff McLane a14cffc598 Issue #597 - Avoid reporting root node in lexer.c 2017-09-17 16:29:47 +02:00
Geoff McLane 5d017fe532 Issue #597 - Minor enhancement of memory debug in alloc.c 2017-09-17 16:28:39 +02:00
Andrew Blackton 5a50afe42c Parse XML element names beginning with Valid NameChar 2017-09-11 14:00:11 -05:00
Jim Derry 4509695445 Updated documentation in file.
Simplified the update counting.
2017-09-06 21:25:19 -04:00
Jim Derry 6bce1b377f Updated the POs and POT to reflect the re-sorted strings.
Updated language_fr.h to reflect the re-sorted strings.
2017-09-06 20:55:36 -04:00
Jim Derry 80cb74fece Removed comments from and sorted error messages, as they are documented elsewhere in code now, here, too. 2017-09-04 17:43:06 -04:00
Jim Derry d8220c061f Updated the remaining items, including all of the accessibility module items.
Note that there are several regressions in the accessibility test suite that
are not related to output messages. These are a result of previous work, and
these results should be updated in the test suite when this item is merged.
2017-09-04 17:35:57 -04:00
Jim Derry 832b4772ad A bit of organizational cleanup. 2017-09-04 16:49:49 -04:00
Jim Derry bc4388e317 Migrated surrogate errors; removed break after return. 2017-09-04 16:38:07 -04:00
Jim Derry 5b6edb5813 EncodingWarning and MissingAttr migrated. 2017-09-04 16:12:01 -04:00
Jim Derry f49c419908 Implement formatter for encoding reports. 2017-09-04 15:50:45 -04:00
Jim Derry 8cb4198724 Entity errors migrated. 2017-09-04 15:28:08 -04:00
Jim Derry 18754c701d Transitioned formatCustomTagDetected to the general formatter. 2017-09-04 11:44:54 -04:00
Jim Derry e3893eb8b3 Also merged reportBadArgument into standard formatter as above. 2017-09-04 11:40:34 -04:00
Jim Derry be22ad3d03 Move file errors into the standard formatter. Local context is preserved with
braces to not pollute stack for other cases.
2017-09-04 11:35:49 -04:00
Jim Derry 283f8974c3 Migrated reports using formatFileError and formatStandard to flexible messaging system. Migrated old reportNotice() to report(). 2017-09-04 11:24:48 -04:00
Jim Derry 1d2c019162 Added a new string to disambiguate between config files and other file types. 2017-09-04 11:23:37 -04:00
Jim Derry 66e4d1f8e6 Migrated reports using formatter formatCustomTagDetected. 2017-09-02 18:04:51 -04:00
Jim Derry 0c8f684a4b Migrated messages using formatter formatBadArgument to new message system. All tests passing. 2017-09-02 18:00:46 -04:00
Jim Derry 46aa9605ee All reports that can use formatAttributeReport are now using it. Moved the
badAccess flag to the point of detection.
2017-09-02 17:29:56 -04:00
Jim Derry 00178113c8 A *complete* inventory of every message has been completed, and the dispatchTable
reflects such. Some fleshed in report formatters are included with cases for
several of Tidy's reports, but nothing is yet enabled. All reporting is status
quo, and this is just a bunch of dead code at this point.
2017-09-02 16:47:14 -04:00
Jim Derry 83263466f2 Cleanup ReportNotice() a bit by introducing an HTMLVersion() function. 2017-09-02 12:54:02 -04:00
Jim Derry 951ed381a3 Restore message logic. No bump. 2017-08-31 13:45:01 -04:00
Jim Derry e5a05ae5a8 Address merge conflicts. 2017-08-31 13:15:28 -04:00
Jim Derry 2c82cfa23b Inventoried current error strings, and removed/commented out several:
  - BAD_COMMENT_CHARS
  - BAD_XML_COMMENT
  - DTYPE_NOT_UPPER_CASE
  - ENCODING_IO_CONFLICT
  - INCONSISTENT_NAMESPACE
  - INCONSISTENT_VERSION
  - INDICATE_CHANGES_IN_LANGUAGE
  - UNESCAPED_ELEMENT
  - XML_ATTRIBUTE_VALUE
Re-sorted new tidy options.
All tests passing.
Bump version to reflect strings that are externally accessible to API.
2017-08-31 12:57:58 -04:00
Jim Derry 38814f9e3b Sort message labels for simpler inventorying. 2017-08-31 10:57:54 -04:00
Jim Derry e1cbafd647 Handle message outlook properly in messageOut(). 2017-08-31 10:44:16 -04:00
Jim Derry e5eb09198d Begin migration towards "one output function to rule them all." Consolidated
the basic reporting functions that share the same signature. This also resulted
in eliminating a string, and adding a new string to disambiguate between
errors and warnings.
2017-08-30 20:01:44 -04:00
Jim Derry 1562c42c2e Merge branch 'next' into issue-456
Manually fixed merge commits.
2017-08-28 15:17:10 -04:00
Jim Derry 7badd93417 Generated en_gb language from the PR'd PO. Version bump for recent PR's. 2017-08-28 14:29:02 -04:00
Jim Derry 1f3cf24e82 Merge pull request #590 from mthorpe7/fix_upstream_crash
Fix NULL pointer issue with Word2000 empty attributes.
2017-08-28 14:25:35 -04:00
Jim Derry 6533181edf Merge pull request #583 from htacg/issue-582
Issue #582 - Remove extra new line in 'classic' mode
2017-08-28 14:24:26 -04:00
Michael Thorpe 52465c6142
Fix NULL pointer issue with Word2000 empty attributes.
This appears to be an issue with Word2000 handling of empty attributes.

A reproduction case can be seen here:

```
$ cat test.html
<html xmlns:o="urn:schemas-microsoft-com:office:office">
    <body>
        <table>
            <img class="" />
        </table>
    </body>
</html>

$ ./tidy --tidy-mark no --word-2000 yes test.html
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 3 column 1 - Warning: <img> isn't allowed in <table> elements
line 2 column 1 - Info: <table> previously mentioned
line 1 column 57 - Warning: inserting missing 'title' element
line 3 column 1 - Warning: <img> lacks "alt" attribute
line 3 column 1 - Warning: <img> lacks "src" attribute
line 2 column 1 - Warning: trimming empty <table>
line 1 column 1 - Warning: <html> proprietary attribute "xmlns:o"
[2]    52405 segmentation fault  ./tidy --tidy-mark no --word-2000 yes test.html
```

This was called from 6f2fb6e0e7/src/clean.c (L1710).

(It is technically undefined behaviour to call strncmp with `NULL` pointers however).
2017-08-28 15:30:28 +01:00
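
The undefined behaviour noted in the commit above can be avoided by guarding the comparison. The following is a minimal sketch only: `safe_strncmp` is a hypothetical helper, not a function from Tidy, and the actual fix in `clean.c` may simply check the attribute value for `NULL` before comparing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Tidy): compares at most n bytes of a and b
   like strncmp, but never passes a NULL pointer to strncmp. A NULL pointer
   is treated as sorting before any non-NULL string. */
static int safe_strncmp(const char *a, const char *b, size_t n)
{
    if (a == NULL || b == NULL)
        return (a != NULL) - (b != NULL);  /* 0 if both NULL, otherwise -1 or 1 */
    return strncmp(a, b, n);
}
```

With a guard like this, an attribute whose value pointer is `NULL` compares as "not equal" instead of crashing.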
Jim Derry 561d43c7e5 Merge pull request #579 from htacg/issue-567-2
Issue 567 2 - style tag to head
2017-08-28 10:02:25 -04:00
Jim Derry d4a11b553e Merge pull request #577 from htacg/issue-572
Issue 572
2017-08-28 10:01:48 -04:00
Jim Derry f4c64966f0 Added TidyConfigCallback and deprecated TidyOptCallback for consistency with
the remainder of the callbacks. TidyConfigCallback is now given a reference
to the instance of the TidyDoc that caused the callback to occur.

+    TidyConfigCallback
2017-08-26 12:47:18 -04:00
Geoff McLane f7658b2c89 Issue #582 - Remove extra new line in 'classic' mode 2017-08-04 14:23:14 +02:00
Geoff McLane 09f1806834 Issue #572 - discard an all space text node.
An earlier patch now passes back an all space text node. Previously this
would have been skipped. So add code in ParseList to detect, and discard
such a node.

Change committed:
	modified:   src/parser.c
2017-07-08 19:45:42 +02:00
Geoff McLane f26a068809 Issue #572 - More conditions for #396 2017-07-02 21:10:20 +02:00
Geoff McLane 50859e8258 Issue #567 - add option, messages, and fix node iteration.
Add option TidyStyleTags, --fix-style-tags, Bool, to turn off
this action.

Add warning messages MOVED_STYLE_TO_HEAD, and FOUND_STYLE_IN_BODY.

Fully iterate ALL nodes in the body, in search of style tags...

Changes to be committed:
	modified:   include/tidyenum.h
	modified:   src/clean.c
	modified:   src/config.c
	modified:   src/language_en.h
	modified:   src/message.c
2017-06-28 20:41:46 +02:00
Geoff McLane d4ca02adfb Issue #567 - Branch 'issue-567-2' to move all 'style' to 'head' 2017-06-18 20:06:24 +02:00
Geoff McLane b32e14a8ea Issue #456 - add new option show-meta-change 2017-06-09 03:11:39 +02:00
Geoff McLane 97292646f6 Issue #456 - Add 'Info:' message when charset replaced 2017-06-05 17:16:53 +02:00
Geoff McLane a4770daa2b Issue #456 - Add 'Info:' message, when meta added.
It also fixes the addition of the constant 'http-equiv="Content-Type"
attribute.
2017-06-04 20:44:02 +02:00
Geoff McLane 13b34c9d8b Issue #456 - BAH! Fix a stupid logic reversal 2017-06-04 15:41:16 +02:00
Geoff McLane e28ec72301 Merge branch 'next' into issue-456
Continue WIP #456
2017-06-04 14:59:18 +02:00
Geoff McLane eb127a5c5b Issue #550 - K&R/MSVC10 fix - message.c 2017-05-30 18:14:58 +02:00
Geoff McLane 722a841ce2 Merge branch 'next' into issue-456
This was to pick up the fix for #395, PR #564, and bumps the version to
5.5.30...
2017-05-29 14:36:14 +02:00
Geoff McLane 4136d85a9c Issue #395, #564 - Oops, restore orig char if not closing 2017-05-29 14:26:55 +02:00
Geoff McLane 40e1d64963 Issue #456 - A desperate commit to get this WIP right, but... 2017-05-27 20:13:51 +02:00
Geoff McLane 8a932f96eb Issue #456 - Oops, incorrect merge conflict 2017-05-27 18:52:49 +02:00
Geoff McLane 049bc6c288 Merge branch 'next' into issue-456 2017-05-27 18:35:01 +02:00
Geoff McLane c61b5b7b0c Merge branch 'next' into issue-395 2017-05-27 18:20:28 +02:00
Geoff McLane 825ad59262 Merge branch 'next' into issue-392 2017-05-27 16:25:24 +02:00
Jim Derry 47c27ecf8e Generated French header file; bumped to 5.5.26 for updated French language. 2017-05-21 14:29:13 -04:00
Jim Derry 996ddb813d Merge pull request #554 from htacg/issue-365
Issue 365
2017-05-21 14:24:03 -04:00
Geoff McLane c9c1d7ae55 Issue #395 - a potential fix 2017-05-21 01:47:36 +02:00
Geoff McLane 6f05041b5e Issue #392 - a simple fix, but maybe incomplete 2017-05-21 00:18:43 +02:00
Geoff McLane ec03beb361 Issue #552 - remove no 'case default:' warning in most gcc versions
Seems too small for a version bump. Closes #552
2017-05-19 18:38:01 +02:00
Geoff McLane 21f008501a Issue #456 - Oops, also out of 'lexer.h' 2017-05-15 16:51:34 +02:00
Geoff McLane a7a4cd6a16 Issue #456 - avoid head work if showing body only 2017-05-15 16:42:49 +02:00
Geoff McLane f310f1d5de Issue #456 - Move new TidyMetaCharset to clean 2017-05-15 16:39:53 +02:00
Geoff McLane 6ebd12be67 Issue #456 - More work on this option 2017-05-14 19:08:29 +02:00
Jim Derry 9b2cd06711 Merge branch 'next' into issue-365 2017-05-13 22:27:14 -04:00
Jim Derry 66d0825e58 Merge pull request #557 from htacg/update_langs
Update languages against current English.
2017-05-13 22:24:43 -04:00
Jim Derry 5791c55081 Update languages against current English. 2017-05-13 21:07:02 -04:00
Jim Derry 0f1e625324 Address #378
Addresses issue #378 by NOT emitting warnings if `fix-uri` is `no`, for HTML5
documents. This preserves existing behavior for legacy document types.
2017-05-13 20:46:48 -04:00
Jim Derry d18b21b94c Merge branch 'next' into issue-365 2017-05-13 19:55:19 -04:00
Jim Derry b6bf48c24a Merge pull request #553 from htacg/new_picklists
New picklists and parsers
2017-05-13 19:50:20 -04:00
Jim Derry a399725a1e Fixed ParseAutoBool error. 2017-05-13 11:39:13 -04:00
Geoff McLane 8843199370 Issue #456 - Merge branch 'meta-charset' of tidy-html5-marco.
This pulls the work done by @marcoscaceres WIP #458 into the issue-456
branch, to complete the new add-meta-charset option.
2017-05-13 16:02:26 +02:00
Jim Derry 982504eee0 Case insensitive compare is safe here, and prevents erroneous proprietary attribute errors. 2017-05-12 08:28:11 -04:00
Jim Derry e7c28636b9 Fixed cause of assertions -- funny, these don't pop up in XCode. 2017-05-12 07:30:20 -04:00
Jim Derry 29766afcfd Initial take on issue 365. This is based off of the simplification of the
parser and picklist system. Console application needs to be updated to fix
the description, as it shows autobool, and for some reason on the current
system I'm not getting assertion failures.
2017-05-11 18:12:56 -04:00
Jim Derry 7112fba553 Merge pull request #549 from htacg/issue_391
Address #391. Tested on macOS and Win10.
2017-05-11 15:24:44 -04:00
Jim Derry aeb9a24fab Refactor Picklists and Option Parsers
This PR refactors how picklists and option parsers are implemented in LibTidy,
making it vastly easier to implement new picklists in the future, as well as
modify some of the existing picklists such that they have more logical names.

Picklist arrays are now arrays of structures that include the possible strings
capable of setting a particular option value, and a new parser has been written
to work with these structures.

In addition, several of the existing parsers were removed, as they are now
redundant, and a couple of the remaining parsers were refactored to take
advantage of the new parser.

In effect, this means that:

- New parsers don't have to be written in the majority of cases where new
  options are added that exceed yes/no/auto.
- Some of the existing options can have more meaningful names than yes/no/auto,
  in a backward compatible way. For example, vertical-spacing "auto" currently
  in no way reflects "auto" when used.
2017-05-11 14:40:21 -04:00
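
The picklist refactor described above (arrays of structures listing every string that can set a value, plus one generic parser) might look roughly like the sketch below. All names here (`PickListItem`, `triStatePicks`, `parse_pick`, `MAX_INPUTS`) are assumptions made for illustration and are not taken from the LibTidy sources.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_INPUTS 6  /* hypothetical cap on accepted spellings per value */

/* One picklist entry: a canonical label, the value it sets, and every
   spelling accepted for that value (NULL-terminated). */
typedef struct {
    const char *label;
    int value;
    const char *inputs[MAX_INPUTS];
} PickListItem;

/* Illustrative tri-state list; an entry with a NULL label ends the list. */
static const PickListItem triStatePicks[] = {
    { "no",   0, { "0", "n", "f", "no", "false", NULL } },
    { "yes",  1, { "1", "y", "t", "yes", "true", NULL } },
    { "auto", 2, { "auto", NULL } },
    { NULL,   0, { NULL } }
};

/* Case-insensitive string equality using only standard C. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;  /* equal only if both strings end together */
}

/* Generic parser: stores the matching value in *out and returns 1,
   or returns 0 if no entry accepts the given text. */
static int parse_pick(const PickListItem *list, const char *text, int *out)
{
    if (text == NULL)
        return 0;
    for (; list->label != NULL; ++list) {
        for (size_t i = 0; i < MAX_INPUTS && list->inputs[i] != NULL; ++i) {
            if (equal_ignore_case(list->inputs[i], text)) {
                *out = list->value;
                return 1;
            }
        }
    }
    return 0;
}
```

Under this layout, adding an option with new value names means adding a table, not a parser, and an older spelling can stay in `inputs` for backward compatibility while `label` carries the more meaningful name.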
Geoff McLane f7e7554c95 Close the file before the _WIN32 switch 2017-05-09 19:24:20 +02:00
Jim Derry acaab679c5 Merge pull request #547 from htacg/issue_352
Attempt to address issue #352.
2017-05-08 17:36:52 -04:00
Geoff McLane 77420b94d0 Fix for 'isalnum' in Windows
According to the MSDN documentation 'isalnum(c)' is only valid when c equals
EOF, or is in the range 0 to 255 inclusive. It states the behavior is
undefined outside this range, and in Debug mode triggers an assert dialog.
2017-05-08 18:42:33 +02:00
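
The range restriction described above can be enforced with a small wrapper. This is a sketch under the assumption that out-of-range values should simply be reported as "not alphanumeric"; `safe_isalnum` is a hypothetical name and not necessarily how the Tidy sources handle it.

```c
#include <ctype.h>

/* Hypothetical wrapper (not part of Tidy): only forwards values in the
   range 0..255 to isalnum(), so wide code points or negative chars never
   reach the C library. Returns 1 or 0. */
static int safe_isalnum(int c)
{
    if (c < 0 || c > 255)
        return 0;
    return isalnum(c) != 0;
}
```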
Jim Derry ce105dcf09 Address #391. Tested on macOS and Win10.
- Add a check upon opening a file for validity of the file.
- Add a new message to indicate that the path is not a file.
2017-05-07 17:04:53 -04:00
Jim Derry fd77312175 Attempt to address issue #352. This patch correctly addresses the specific issues
in #352, but I'm worried that there's some over-reach here.

Currently only implemented as a warning, with no switch to turn it off, which
maintains current behavior other than the warning.

In general, we're treating any string as a complete URL, rather than breaking
URL's into component parts. Thus the `IsURLCodePoint()` check includes a few
other generic characters that strictly speaking aren't valid codepoints, but
are valid as escape characters and delimiters.

When addressing #338, I ran into a similar situation in not having a built-in
method to separate path components (although a simple generalized solution was
good enough in that case).

Thus without introducing a new structure and functions to deconstruct a URL
into scheme, authority, path, parameters, etc., some variation of this patch
will have to be used to address #352.
2017-05-06 18:54:42 -04:00
Jim Derry 09d1802298 Merge branch 'next' into deprecations 2017-05-06 14:34:48 -04:00
Geoff McLane fd2400d55b Merge pull request #543 from htacg/issue-436
Small documentation change to close #436
2017-05-06 15:44:45 +02:00
Geoff McLane d4978608e7 Merge pull request #537 from deathbaba/next
Correctly process 'bookmarks' in html exported from Google Doc.
2017-05-06 15:35:57 +02:00
Geoff McLane 6839dfe601 Merge pull request #541 from htacg/issue_338
Issue #338 - fix 3 spurious access level 3 warnings...
2017-05-06 15:20:55 +02:00
Geoff McLane 6da0fff256 Merge pull request #532 from lhchavez/add-warn-prop-attrs
Add a flag to warn on proprietary attributes
2017-05-06 14:48:36 +02:00
Jim Derry 846b3cde55 Address #436 just to close it. 2017-05-04 13:45:06 -04:00
Geoff McLane d142527a8e Issue #338 - Deal with two other spurious access warnings 2017-05-04 17:36:39 +02:00
Jim Derry 49b833f63b WIP 2017-05-03 18:16:03 -04:00
Jim Derry 8b2f92f625 Issue #338 occurs because the existing routines assume that any URI with an
extension is a file, and so links to TLD's ending with .pl, .au, etc., will
cause accessibility warnings. This fix attempts to distinguish between URI's
that are likely to be files versus links to domains.
2017-05-03 16:15:44 -04:00
Geoff McLane b03598652f Issue #461 - alternative patch for this issue 2017-05-02 19:39:16 +02:00
Alexander Zolotarev 87169d8953 Correctly process 'bookmarks' in html exported from Google Doc. 2017-04-19 14:47:27 -10:00
lhchavez a19d271f47 Add a flag to warn on proprietary attributes
This change adds the TidyWarnPropAttrs flag (default=on) that emits a
warning for every proprietary attribute it finds.
2017-04-15 03:17:16 +00:00
Geoff McLane d8839485a4 Merge branch 'next' of github.com:htacg/tidy-html5 into next 2017-04-09 02:09:19 +02:00
Geoff McLane 219a5c797b Issue #524 - Remove obsolete message 2017-04-09 02:08:03 +02:00
Jim Derry d1e0b22be7 Removed TidyDropFontTags. Note that POs and POT were _not_ updated. 2017-04-04 14:42:47 -04:00
Jim Derry 24afc6a6fa Fixed some casting issues that Ubuntu objects to.
- Test on macOS, Win10, Ubuntu.
- No version bump for this change.
2017-04-04 14:33:56 -04:00
Geoff McLane 22dcea067e Issue #335, maybe #333, to output indent char, reduce if tab 2017-03-26 16:57:29 +02:00
Geoff McLane 5f88452487 Issue #333 - create exception for span/meta 2017-03-26 16:57:29 +02:00
Jim Derry 5f05add439 Continue the documentation effort!
- Many, many updates to the public header files.
- tidyenum.h was reorganized substantially in order to better generate
  documentation with Doxygen.
- This was also a good time to clean up all of the various enums for languages
  and strings. Everything is simple and in a single enum now, other than a
  couple of cases (TidyOptionId, for example, doesn't need to be redefined).
- A full and complete audit of the strings meant some opportunities to delete
  useless strings.
- Reorganized the order of the strings in language_en.h in order to better
  find things when programmers want to make changes. There are a lot fewer
  internal "sections" now, and everything has been painstakingly sorted within
  the remaining sections.
- Consequently rebased all of the PO's, POT, and other language files.
- Updated several of the READMEs with the newest information.
- Made the READMEs easier to copy into the Doxygen project by changing some of
  the code format for compatibility, mainly the use of tildes instead of
  backslashes for code blocks.
- Added tidyGetMessageCode() to message API. Despite the huge diff, this is the
  only externally-visible change, other than removing some enums (but not their
  values!).
- Passing `next` tests on Mac, Linux, Win10.
2017-03-22 16:05:13 -04:00
Jim Derry 929575afb4 Picklist enums should all start at zero for external LibTidy user compatibility.
Restore the new custom-tags enum to this state, and add separate string keys.
Updated PO's to reflect; no change to headers.
2017-03-20 12:20:51 -04:00
Jim Derry a4f752f274 Implement TODO:
- tidyDetectedHtmlVersion()
- tidyDetectedXhtml()
- added two new fields to W3C_Doctypes[] in order to simplify this.
- added TY_(HTMLVersionNumberFromCode)() to enable lookup.
- Implement tidyDetectedGenericXml()
- Added a warning message if an XML declaration exists but the document is not
  XHTML.
- Remove dead commented code.
- Updated POs and POT. Headers not affected, but translators should check
  their translations.
- Testing is clean on Mac OS X, Ubuntu 16.04, and Windows 10.
2017-03-19 15:41:51 -04:00
Jim Derry 13122e8862 Added tidyErrorCodeFromKey()
Added tidyGetMessageDoc()
Improved the Public API documentation.
2017-03-19 08:15:32 -04:00
Geoff McLane c8f366b76e Issue #119 - Remove 3 newline chars, that crept in... 2017-03-18 18:52:48 +01:00
Jim Derry da55a6e4ac Removed unused declaration. 2017-03-16 08:00:05 -04:00
Jim Derry 0c5550b06f I think the messages are where I want them to be. Will generate test cases
for comparison. Also regen'd all pots and language headers.
2017-03-15 17:36:05 -04:00
Jim Derry 5606f32f13 WIP; messaging much more logical, except @todo noted. 2017-03-14 21:50:10 -04:00
Jim Derry 66ade9def4 Still noisy, but adds HTML5 dependent output message upon detection. 2017-03-14 16:27:11 -04:00
Jim Derry ed5a1d84ea Add TY_(nodeIsAutonomousCustomTag), so we can use it elsewhere. 2017-03-14 15:44:46 -04:00
Jim Derry 8273491e16 Change allowed values for custom-tags, and make y equal to inline. 2017-03-14 15:16:11 -04:00
Jim Derry 66de84bc2b - Add support for the is attribute.
- Add support for autonomous custom elements.
2017-03-13 13:45:32 -04:00
Jim Derry 11178d775b Massive Revamp of the Messaging System
This is a rather large refactoring of Tidy's messaging system. This was done
mostly to allow non-C libraries that cannot adequately take advantage of
arg_lists a chance to query report filter information for information related
to arguments used in constructing an error message.

Three main goals were in mind for this project:

- Don't change the contents of Tidy's existing output sinks. This will ensure
  that changes do not affect console Tidy users, or LibTidy users who use the
  output sinks directly. This was accomplished 100% other than some improved
  cosmetics in the output. See tidy-html5-tests repository, the `refactor` and
  `more_messages_changes` branches for these minor diffs.
- Provide an API that is simple and also extensible without having to write new
  error filters all the time. This was accomplished by adding the new message
  callback `TidyMessageCallback` that provides callback functions an opaque
  object representing the message, and an API to query the message for wanted
  details. With this, we should never have to add a new callback routine again,
  as additional API can simply be written against the opaque object.
- The API should work the same as the rest of LibTidy's API in that it's
  consistent and only uses simple types with wide interoperability with other
  languages. Thanks to @gagern who suggested the model for the API in #409.
  Although the API uses the "Tidy" way of accessing data via an iterator
  rather than an index, this can be easily abstracted in the target language.

There are two *major* API breaking changes:

- Removed TidyReportFilter2
  - This was only used by one application in the entire world, and was a hacky
    kludge that served its purpose. TidyReportCallback (né TidyReportFilter3)
    is much better. If, for some reason, this affects you, I recommend using
    TidyReportCallback instead. It's a minor change for your application.
- Renamed TidyReportFilter3 to TidyReportCallback
  - This name is much more semantic, and much more sensible in light of
    improved callback system. As the name implies, it remains capable of
    *only* receiving callbacks for Tidy "reports."

Introducing TidyMessageCallback, and a new message interrogation API.

- As its name implies, it is able to capture (and optionally suppress) *all*
  of Tidy's output, including the dialogue messages that never make it to
  the existing report filters.
- Provides an opaque `TidyMessage` and an API that can be used to query against
  it to find the juicy goodness inside.
  - For example, `tidyGetMessageOutput( tmessage )` will return the complete,
    localized message.
  - Another example, `tidyGetMessageLine( tmessage )` will return the line the
    message applies to.
- You can also get information about the individual arguments that make up a
  message. By using the `tidyGetMessageArguments( tmessage )` iterator and
  `tidyGetNextMessageArgument` you will obtain an opaque `TidyMessageArgument`
  which has its own interrogation API. For example:
    - tidyGetArgType( tmessage, &iterator );
    - tidyGetArgFormat( tmessage, &iterator );
    - tidyGetArgValueString( tmessage, &iterator );
    - …and so on.

Other major changes include refactoring `messages.c` to use the new message
"object" directly when emitting messages to the console or output sinks. This
allowed replacement of a lot of specialized functions with generalized ones.

Some of this generalizing involved modifications to the `language_xx.h` header
files, and these are all positive improvements even without the above changes.
2017-03-13 13:28:57 -04:00
Jim Derry 4dc8a2cf9a Bump version to 5.5.5 for this fiasco, and fix poor planning and unfortunate
merge.
  - Sort all of the existing options and re-indent per Tidy standards. This is
    simply for cosmetic effect.
  - Allow the iterator to return all options again, even "internal" options.
    Things are too embedded with N_TIDY_OPTIONS, etc., to try to hide them.
  - Instead, simply add documentation to LibTidy users that they shouldn't use
    internal options.
  - Also added `TidyInternalCategory` to `TidyConfigCategory` without adding a
    new field to the struct. API users should check for this category before
    use.
  - Defined a two character macro for `TidyInternalCategory` for use in
    `option_defs[]`.
  - Changed struct `option_defs[]` to reflect the new category for affected
    options.
  - Removed string indicating * refers to internal options, since it no longer
    applies.
  - Regen'd all strings for previous point.
  - `tidy.c` now checks for `TidyInternalCategory` everywhere in order to
    suppress output.
2017-03-10 09:13:21 -05:00
Jim Derry ac242e9ea4 hotfix 2017-03-09 19:56:16 -05:00
Jim Derry e27cc262fe Bring the local vars into the context, which is allowed in C89. 2017-03-09 12:44:48 -05:00
Jim Derry 005127c733 Address issue #472. 2017-03-08 15:37:01 -05:00
Jim Derry 978756a482 Restore the previous status of gnu-emacs-file
- Updated strings files to match.
- Inhibit internal options from being output via the iterator. Internals should
  never have the chance to be exposed if they shouldn't be used.
- Added tidySetEmacsFile() and tidyGetEmacsFile() to the public API, and use it
  instead of secret API to set the filename in the console application.

The end result is that `gnu-emacs-file` (and also `doctype-mode`) officially no
longer exist to CLI users nor to API users, and tidy console behaves properly
by using a published API to set the filename for emacs.
2017-03-07 20:11:31 -05:00
Jim Derry 03f0192f51 How did this get back in there??? 2017-03-04 15:31:25 -05:00
Jim Derry 74a4fa4049 Merge branch 'next' into clean_deprecations 2017-03-02 11:40:14 -05:00
Jim Derry 3be515b1f9 Merge branch 'next' into messages_squashed 2017-03-02 09:34:58 -05:00
Jim Derry 92621d6f99 MSVC Compatibility
- Changed location of pointer operator in declarations.
  - Updated `CODESTYLE.md` to reflect this.
  - Updated `API_AND_NAMESPACE.md` to reflect this.
2017-03-02 09:32:02 -05:00
Geoff McLane a49890ee55 Issue #498 - parser.c - if a <table> in a <table> just close.
The previous action was to discard the second, while it is the second
table that browsers will render.

This conforms to the principle that the html output by tidy should render
in a browser like the original html.
2017-02-24 16:20:10 +01:00
Geoff McLane c4b5904e1c Issue #497 - lexer.c - Add comment for this PR @seaburg 2017-02-24 14:38:20 +01:00
Geoff McLane e44f4d1469 Merge pull request #497 from seaburg/fix_value_trimming
Fix leading white spaces trimming
2017-02-24 14:30:39 +01:00
Geoff McLane 27fe0548b9 Issue #468 - config.c - use RAW encoding for all cases 2017-02-23 16:28:19 +01:00
Geoff McLane 569ae4b435 Issue #329 - lexer.c - do not discard this newline here 2017-02-23 15:27:03 +01:00
Evgeniy Yurtaev bb1d62d3bd Fix leading white spaces trimming 2017-02-22 14:34:40 +03:00
Jim Derry c54c10f857 - Removed deprecated options:
  - TidySlideStyle
  - TidyBurstSlides

- Added documentation for TidyEmacsFile, since it's a valid option.

- Because TidyEmacsFile is a valid option, tweaked tidy.c so that it can
  be specified in a configuration file without being overwritten by the console
  app. Why a user might do this is dumb, but who are we to stop them.
2017-02-18 18:30:41 -05:00
Jim Derry edc548095c Removed language as tidy config option; it is only CLI option. 2017-02-18 17:16:35 -05:00
Jim Derry cbb8354f74 Combined leftover attribute API stuff into single, new file. 2017-02-18 16:57:11 -05:00
Jim Derry f6ce4d130e Removed deprecated tidyAttrGetSOMETHING from API. 2017-02-18 16:46:20 -05:00
Jim Derry 13c6387f47 Removed deprecated AttributeIsSOMETHING from API. 2017-02-18 16:43:47 -05:00
Jim Derry a16f36ce53 Removed deprecated NodeIsElementName from API. 2017-02-18 16:33:21 -05:00
Jim Derry 165acc4f3e Several foundational changes preparing for release of 5.4 and future 5.5:
  - Consolidated all output string definitions enums into `tidyenum.h`, which
    is where they belong, and where they have proper visibility.
  - Re-arranged `messages.c/h` with several comments useful to developers.
  - Properly added the key lookup functions and the language localization
    functions into tidy.h/tidylib.c with proper name-spacing.
  - Previous point restored a *lot* of sanity to the #include pollution that's
    been introduced in light of these.
  - Note that opaque types have been (properly) introduced. Look at the updated
    headers for `language.h`. In particular only an opaque structure is passed
    outside of LibTidy, and so use TidyLangWindowsName and TidyLangPosixName
    to poll these objects.
  - Console application updated as a result of this.
  - Removed dead code:
    - void TY_(UnknownOption)( TidyDocImpl* doc, char c );
    - void TY_(UnknownFile)( TidyDocImpl* doc, ctmbstr program, ctmbstr file );
  - Redundant strings were removed with the removal of this dead code.
  - Several enums were given fixed starting values. YOUR PROGRAMS SHOULD NEVER
    depend on enum values. `TidyReportLevel` is an example of such.
  - Some enums were removed as a result of this. `TidyReportLevel` now has
    matching strings, so the redundant `TidyReportLevelStrings` was removed.
  - All of the PO's and language header files were regenerated as a result of
    the string cleanup and header cleanup.
  - Made the interface to the library version and release date consistent.
  - CMakeLists.txt now supports SUPPORT_CONSOLE_APP. The intention is to
    be able to remove console-only code from LibTidy (for LibTidy users).
  - Updated README/MESSAGES.md, which is *vastly* more simple now.
2017-02-17 15:29:26 -05:00
Jim Derry e1f066fe14 Merge branch 'empretty_script' 2017-02-13 08:49:13 -05:00
Jim Derry b7c84b1b57 Merge branch 'surrogates' 2017-02-13 08:49:06 -05:00
Geoff McLane ea49ca0b1d Fix license for SPRTF modules.
Also correct the coding style to conform to HTML Tidy standard.
2017-02-12 17:38:44 +01:00
Geoff McLane 7f73d4f429 Issue #483 - Add ReportSurrogateError() service and connect. 2017-02-11 18:33:45 +01:00
Geoff McLane 75bc1f06c7 More updates for Issue #483 - Start warning msgs - WIP 2017-02-09 20:55:23 +01:00
Jim Derry 1ac50fccb3 Pretty up output of empty script tags.
  - No longer break script tags up on two lines if there is no content. However
    output is still subject to the `--wrap` behavior.
  - Previous behavior intact if there is content.

Todo.

  - Associate this with a new Tidy option.
2017-02-08 13:53:37 -05:00
Geoff McLane 9dc76c1e77 Issue #483 - Some fixes for error condition 2017-02-02 16:43:10 +01:00
Geoff McLane 259d330780 Issue #483 - First cut dealing with 'surrogate pairs'.
Only deals with a successful case.

TODO: Maybe add a warning/error if the trailing surrogate not found, and
maybe consider substituting to avoid invalid utf-8 output.
2017-02-01 13:50:33 +01:00
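
The surrogate-pair handling described in the Issue #483 commits can be sketched as follows. This is a minimal illustration of the standard UTF-16 arithmetic, not Tidy's actual code: the helper names are hypothetical, and substituting U+FFFD for an unpaired surrogate is an assumption based on the TODO above, not confirmed behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; these are not LibTidy functions. */
static bool is_high_surrogate(uint32_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static bool is_low_surrogate(uint32_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

/*
 * Combine a leading (high) and trailing (low) surrogate into one code point.
 * Returns true on success. On failure, *out is set to U+FFFD (the Unicode
 * replacement character) so that the caller never emits invalid UTF-8;
 * that substitution is an assumption of this sketch.
 */
static bool combine_surrogates(uint32_t high, uint32_t low, uint32_t *out)
{
    if (!is_high_surrogate(high) || !is_low_surrogate(low)) {
        *out = 0xFFFD;
        return false;
    }
    *out = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    return true;
}
```

For example, the pair 0xD83D, 0xDE00 combines to U+1F600, while a leading surrogate followed by an ordinary character takes the failure path, which is where a warning could be reported.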
Geoff McLane deebc93f97 Merge pull request #480 from onnimonni/feature-fix-xmlns-xlink
Add optional xmlns:xlink attributes as valid to support inline svg
2017-01-29 19:17:43 +01:00
Onni Hakala da27b5e339
Add optional xmlns:xlink attributes as valid to support inline svg
fixes #478
2017-01-09 01:38:16 +02:00
Marcos Caceres 91da8c6f74 style: ansi conforming comments 2016-12-20 16:51:09 +11:00
Geoff McLane fd0ccb2bbf Bad, repeated node iteration! closes #459 2016-10-30 23:37:31 +01:00
Marcos Caceres aff76bec38 fix(lexer.c): fixes from initial review 2016-10-17 17:00:58 +11:00
Marcos Caceres 523d58b004 refactor: ask for charset and http_equiv attrs 2016-10-06 19:30:23 +11:00
Marcos Caceres 932cc104a6 feat(attrask.c): learn about charset attr 2016-10-06 19:29:56 +11:00
Marcos Caceres 53ee94ddba fix: incorrect check for first element in head 2016-10-06 19:07:44 +11:00
Marcos Caceres b1629c4a4f fix(lexer): bad attribute reporting 2016-10-05 20:22:19 +11:00
Marcos Caceres 2d7ddfef94 Part 2.1 - Bug fixes and warning 2016-10-05 20:14:18 +11:00
Marcos Caceres cfc22ac46e Add garvankeeley's suggestions using calloc 2016-10-05 18:54:25 +11:00
Marcos Caceres 040c22c6dc Part 2 - Implement lexer logic 2016-10-04 21:23:57 +11:00
Marcos Caceres 169bd38adf Part 1 - Add basic infra for 'add-meta-charset' option 2016-10-04 17:56:29 +11:00
Geoff McLane d81a9ad901 Merge branch 'issue-428'
Conflicts:
	version.txt

This closes #428
2016-09-11 16:57:07 +02:00
Marcos Caceres e4ae9c064d Add support for link 'as' attribute (closes #449) 2016-08-23 18:46:04 +10:00
Geoff McLane 80e57b23bf Merge branch 'master' into issue-428
Conflicts:
	version.txt
2016-08-09 00:46:40 +02:00
Geoff McLane 7631f25ed2 rebase issue-428 2016-08-02 18:10:19 +02:00
Adam Majer 50557a4f63 Fix static buffer overrun (issue #443)
result[6] is a fixed array of size 6, but in the process
of copying data into it, we clobber the last allocated byte.

Simplify some of the code by not calling redundant functions.
2016-08-02 11:10:45 +02:00
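
The commit above does not show the affected code, so the following is only a generic sketch of how a fixed six-byte buffer can be filled without writing past its end. The function name `copy_bounded` and the `RESULT_SIZE` macro are hypothetical, and the actual fix in Tidy may look quite different.

```c
#include <stddef.h>
#include <string.h>

#define RESULT_SIZE 6  /* mirrors the size of result[6]; the name is illustrative */

/*
 * Copy at most RESULT_SIZE - 1 bytes from src and always terminate the
 * result, so neither the copy nor the terminator lands outside the array.
 * Returns the number of bytes copied, excluding the terminator.
 */
static size_t copy_bounded(char result[RESULT_SIZE], const char *src, size_t n)
{
    if (n > RESULT_SIZE - 1)
        n = RESULT_SIZE - 1;
    memcpy(result, src, n);
    result[n] = '\0';
    return n;
}
```

Clamping the length before the copy keeps the terminator inside the array even when the source is longer than the buffer.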
Benjamin Esham 54179386be Add support for the "integrity" attribute
This attribute may be used on "link" and "script" elements. See
http://www.w3.org/TR/2016/REC-SRI-20160623/#element-interface-extensions
2016-07-24 10:24:30 -04:00
Michal Čihař 10281040ca Avoid crash in tidyCleanAndRepair if document was not loaded
These services can only be used when there is a document loaded, i.e. a
lexer created.  But callers really should not be calling a Clean and Repair
service with no doc!
2016-07-07 16:38:05 +02:00
Geoff McLane 685f7a6c5b Issue #428 - Avoid adding form to input if html5 2016-07-02 20:13:01 +02:00
Geoff McLane 7bec2c2082 Merge pull request #422 from sesom42/master
prevent buffer overflow in debug output
2016-06-30 18:32:55 +02:00
Geoff McLane 97700044ce Merge pull request #410 from gagern/varargs
Pair va_copy calls with va_end
2016-06-18 18:53:53 +02:00
Jens Tautenhahn 84fc451a78 prevent buffer overflow in debug output 2016-06-14 15:42:18 +02:00
Benjamin Esham 941b763a8d Add support for "crossorigin" on audio too 2016-06-08 19:40:15 -04:00
Benjamin Esham d9d8e92e52 Allow "crossorigin" on img, script, and video tags too 2016-06-07 22:29:57 -04:00
Benjamin Esham 9377f65f89 Add support for the HTML5 "crossorigin" attribute
This attribute can only be used on "link" elements.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link#Attributes
2016-06-07 22:20:10 -04:00
Martin von Gagern 04bc8d3195 Pair va_copy calls with va_end
According to the specs, each va_copy call should be matched by a va_end call
to ensure proper cleanup.  Furthermore, since message filters might iterate
over the list of arguments, we should hand a new copy to each filter.
2016-05-17 22:37:32 +02:00
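
The pairing rule from the `va_copy` commit above can be sketched like this. The `message_filter` type and both function names are hypothetical stand-ins for illustration, not Tidy's message filter API; only `va_start`, `va_copy`, `va_end`, and `vsnprintf` are standard C.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical filter signature: each filter may consume the argument list. */
typedef int (*message_filter)(const char *fmt, va_list args);

/* Example filter: returns the length the formatted message would have. */
static int format_length(const char *fmt, va_list args)
{
    return vsnprintf(NULL, 0, fmt, args);
}

/*
 * Hand every filter its own copy of the arguments, and match each va_copy
 * with a va_end. Reusing one va_list after a callee has iterated over it
 * would be undefined behavior.
 */
static int run_filters(const message_filter *filters, int count,
                       const char *fmt, ...)
{
    va_list args;
    int total = 0;

    va_start(args, fmt);
    for (int i = 0; i < count; ++i) {
        va_list copy;
        va_copy(copy, args);
        total += filters[i](fmt, copy);
        va_end(copy);
    }
    va_end(args);
    return total;
}
```

Each loop iteration owns one copy and releases it before the next begins, so every filter sees the arguments from the start.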
Raphael Ackermann b704a4d0d4 allow zero LI in UL when html5. fix for #396 2016-04-08 23:08:56 +02:00
Geoff McLane 61a0a331fc Issue #390 - fix indent with --hide-endtags yes.
The problem was, with --hide-endtags yes, a conditional pprint buffer
flush had nothing to flush, thus the indent was not adjusted.

To track down this bug, a lot of MSVC Debug code was added, but it only
exists if some additional items are defined, so it has no effect on the
release code.

This bug, for which this feels like a good fix, was first reported about 12
years ago by @OlafvdSpek in SF Bugs 563. Hopefully finally closed.
2016-04-04 18:13:08 +02:00
Geoff McLane 7598fdfff2 avoid DEBUG duplicate newline 2016-04-03 17:54:46 +02:00
Geoff McLane 7777a71913 Issue #369 - Remove Debug asserts 2016-03-31 14:50:03 +02:00
Geoff rpi McLane 086e4c948c remove gcc comment warning 2016-03-30 15:02:19 +00:00
Geoff McLane 59d6fc7022 Issue #377 - If version XHTML5 available, return that. 2016-03-30 16:28:08 +02:00
Geoff McLane 1830fdb97c Issue #384 - insert comments 2016-03-30 14:18:04 +02:00
Geoff McLane 4b135d9b47 Merge pull request #384 from seaburg/master
Fix skipping parsing character
2016-03-30 14:08:40 +02:00
Geoff McLane e87f26c247 Merge pull request #388 from htacg/fr.po
Merge fr.po to master
2016-03-27 19:54:54 +02:00
Jim Derry 7d2ddee775 Add new rebase command to CLI.
This is intended to make it very, very easy to update the POT and all of the POs when
changes are made to `language_en.h`. Used without an sha-1 hash, untranslated strings
(i.e., the "source" strings) are updated in the POT/PO's.

However if you specify an --sha=HASH (or -c HASH) option, then the script will use git
to examine the `language_en.h` file from that specified commit, determine the strings
that have changed, and mark all of these strings as `fuzzy` in the POs. This will serve
as a flag to translators that the original has changed. In addition, this `fuzzy` flag
will appear in the headers as "(fuzzy) " in the item comments.

If a translator edits the header directly, he should remove the "(fuzzy) " in the
comment. Then when the PO is rebuilt, the fuzzy flag will be removed automatically.
The reverse is also true; if a translator is working with the PO, he or she should
clear the fuzzy flag and the comment will be adjusted accordingly in the generated
header.
2016-03-25 09:21:21 +08:00
Geoff McLane 8671544beb Issue #383 - Add a WIP language_fr.h to facilitate testing 2016-03-24 14:15:43 +01:00
Geoff McLane 5feca8cfd6 Issue #383 - correct another byte-by-byte output to message file.
As in the previous case these messages are already valid utf-8 text, and
thus, if output on a byte-by-byte basis, must not use WriteChar, except
for the EOL char.

Of course this output can be to either a user output file, if configured,
otherwise stderr.
2016-03-24 14:15:43 +01:00
Jim Derry ad7bdee3b9 Added translator comments to new TidyEscapeScripts option, and updated POT and POs to reflect this. 2016-03-24 11:00:47 +08:00
Jim Derry 71d6ca1392 Oops. Didn't commit es changes. This fixes that. 2016-03-23 15:10:07 +08:00
Jim Derry d54785c933 language help enhancements:
- Show the language Tidy is using.
- Update the POT and POs with the modified string.
- Regen language_es.h, which uses the string.

Note that the new header uses the new commentless behavior that's still
pending in another branch. In addition the proper c style hints have
been added to all PO's, as their previous absence was a bug.
2016-03-23 14:56:36 +08:00
Jim Derry 2cf03f7fa9 Fix two character lang codes not working. 2016-03-23 14:38:17 +08:00
Geoff McLane 000c6925bd Issue #348 - Add option 'escape-script', def = yes 2016-03-20 01:01:46 +01:00
Geoff McLane e6f1533d89 Issue #383 - Output message file text byte-by-byte 2016-03-18 18:47:00 +01:00
Evgeniy Yurtaev 7d28b21e60 Fix skipping parsing character 2016-03-17 23:30:11 +03:00
Geoff McLane 8dda04f1df Issue #379 - Care about 'ix' going negative.
How this lasted so long in the code is a mystery! But of course it will
only be a read out-of-bounds if testing the first character in the lexer,
and it is a spacey char.

A big thanks to @gaa-cifasis for running ASAN tests on Tidy.
2016-03-06 17:36:51 +01:00
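
The class of bug described in the Issue #379 commit, an index stepping below zero while scanning backwards over whitespace, can be sketched as below. This is a generic illustration, not the lexer code itself; `trim_trailing_space` is a hypothetical name, and only the variable name `ix` is borrowed from the commit message.

```c
#include <ctype.h>

/*
 * Return the length of buf once trailing whitespace is ignored.
 * The ix >= 0 test comes first, so buf[-1] is never read, even when the
 * buffer is empty or consists only of whitespace.
 */
static int trim_trailing_space(const char *buf, int len)
{
    int ix = len - 1;
    while (ix >= 0 && isspace((unsigned char)buf[ix]))
        --ix;
    return ix + 1;
}
```

Without the bounds test, a buffer whose first character is whitespace would drive `ix` to -1 and cause an out-of-bounds read, which is the kind of error ASAN reports.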