Commit graph

616 commits

Author SHA1 Message Date
Geoff McLane 825ad59262 Merge branch 'next' into issue-392 2017-05-27 16:25:24 +02:00
Jim Derry 47c27ecf8e Generated French header file; bumped to 5.5.26 for updated French language. 2017-05-21 14:29:13 -04:00
Jim Derry 996ddb813d Merge pull request #554 from htacg/issue-365
Issue 365
2017-05-21 14:24:03 -04:00
Geoff McLane c9c1d7ae55 Issue #395 - a potential fix 2017-05-21 01:47:36 +02:00
Geoff McLane 6f05041b5e Issue #392 - a simple fix, but maybe incomplete 2017-05-21 00:18:43 +02:00
Geoff McLane ec03beb361 Issue #552 - remove no 'case default:' warning in most gcc versions
Seems too small for a version bump. Closes #552
2017-05-19 18:38:01 +02:00
Geoff McLane 21f008501a Issue #456 - Oops, also out of 'lexer.h' 2017-05-15 16:51:34 +02:00
Geoff McLane a7a4cd6a16 Issue #456 - avoid head work if showing body only 2017-05-15 16:42:49 +02:00
Geoff McLane f310f1d5de Issue #456 - Move new TidyMetaCharset to clean 2017-05-15 16:39:53 +02:00
Geoff McLane 6ebd12be67 Issue #456 - More work on this option 2017-05-14 19:08:29 +02:00
Jim Derry 9b2cd06711 Merge branch 'next' into issue-365 2017-05-13 22:27:14 -04:00
Jim Derry 66d0825e58 Merge pull request #557 from htacg/update_langs
Update languages against current English.
2017-05-13 22:24:43 -04:00
Jim Derry 5791c55081 Update languages against current English. 2017-05-13 21:07:02 -04:00
Jim Derry 0f1e625324 Address #378
Addresses issue #378 by NOT emitting warnings if `fix-uri` is `no`, for HTML5
documents. This preserves existing behavior for legacy document types.
2017-05-13 20:46:48 -04:00
Jim Derry d18b21b94c Merge branch 'next' into issue-365 2017-05-13 19:55:19 -04:00
Jim Derry b6bf48c24a Merge pull request #553 from htacg/new_picklists
New picklists and parsers
2017-05-13 19:50:20 -04:00
Jim Derry a399725a1e Fixed ParseAutoBool error. 2017-05-13 11:39:13 -04:00
Geoff McLane 8843199370 Issue #456 - Merge branch 'meta-charset' of tidy-html5-marco.
This pulls the work done by @marcoscaceres WIP #458 into the issue-456
branch, to complete the new add-meta-charset option.
2017-05-13 16:02:26 +02:00
Jim Derry 982504eee0 Case insensitive compare is safe here, and prevents erroneous proprietary attribute errors. 2017-05-12 08:28:11 -04:00
Jim Derry e7c28636b9 Fixed cause of assertions -- funny, these don't pop up in XCode. 2017-05-12 07:30:20 -04:00
Jim Derry 29766afcfd Initial take on issue 365. This is based off of the simplification of the
parser and picklist system. Console application needs to be updated to fix
the description, as it shows autobool, and for some reason on the current
system I'm not getting assertion failures.
2017-05-11 18:12:56 -04:00
Jim Derry 7112fba553 Merge pull request #549 from htacg/issue_391
Address #391. Tested on macOS and Win10.
2017-05-11 15:24:44 -04:00
Jim Derry aeb9a24fab Refactor Picklists and Option Parsers
This PR refactors how picklists and option parsers are implemented in LibTidy,
making it vastly easier to implement new picklists in the future, as well as
modify some of the existing picklists such that they have more logical names.

Picklist arrays are now arrays of structures that include the possible strings
capable of setting a particular option value, and a new parser has been written
to work with these structures.

In addition, several of the existing parsers were removed, as they are now
redundant, and a couple of the remaining parsers were refactored to take
advantage of the new parser.

In effect, this means that:

- New parsers don't have to be written in the majority of cases where new
  options are added that exceed yes/no/auto.
- Some of the existing options can have more meaningful names than yes/no/auto,
  in a backward compatible way. For example, vertical-spacing "auto" currently
  in no way reflects "auto" when used.
2017-05-11 14:40:21 -04:00
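A minimal sketch of the picklist idea described above, assuming a table of structures where each entry lists the strings that select its value. The names `PickItem`, `parse_picklist`, and the `compact` label are hypothetical illustrations, not LibTidy's actual definitions:

```c
#include <stddef.h>

#define MAX_INPUTS 8

/* Hypothetical picklist entry: a canonical label, the value it sets,
 * and a NULL-terminated list of accepted input strings. */
typedef struct {
    const char *label;
    int value;
    const char *inputs[MAX_INPUTS];
} PickItem;

/* A plain yes/no picklist. */
const PickItem bool_picks[] = {
    { "no",  0, { "0", "n", "f", "no",  "false", NULL } },
    { "yes", 1, { "1", "y", "t", "yes", "true",  NULL } },
    { NULL,  0, { NULL } }
};

/* A picklist with a more meaningful third name; "auto" is kept as an
 * accepted input so existing configurations still parse. */
const PickItem spacing_picks[] = {
    { "no",      0, { "0", "n", "no",  NULL } },
    { "yes",     1, { "1", "y", "yes", NULL } },
    { "compact", 2, { "compact", "auto", NULL } },
    { NULL,      0, { NULL } }
};

int ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* ASCII case-insensitive string equality. */
int equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (ascii_lower((unsigned char)*a) != ascii_lower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* One generic parser serves every picklist: returns 1 and stores the
 * matching value, or returns 0 when no entry accepts the text. */
int parse_picklist(const PickItem *items, const char *text, int *value)
{
    size_t i, j;
    for (i = 0; items[i].label != NULL; i++) {
        for (j = 0; j < MAX_INPUTS && items[i].inputs[j] != NULL; j++) {
            if (equal_nocase(text, items[i].inputs[j])) {
                *value = items[i].value;
                return 1;
            }
        }
    }
    return 0;
}
```

With this shape, adding an option that goes beyond yes/no/auto means adding a table, not writing a parser.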
Geoff McLane f7e7554c95 Close the file before the _WIN32 switch 2017-05-09 19:24:20 +02:00
Jim Derry acaab679c5 Merge pull request #547 from htacg/issue_352
Attempt to address issue #352.
2017-05-08 17:36:52 -04:00
Geoff McLane 77420b94d0 Fix for 'isalnum' in Windows
According to the MSDN documentation 'isalnum(c)' is only valid when c equals
EOF, or is in the range 0 to 255 inclusive. It states the behavior is
undefined outside this range, and in Debug mode triggers an assert dialog.
2017-05-08 18:42:33 +02:00
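A minimal sketch of the kind of guard this calls for, assuming the caller holds a code point in an `int` that may fall outside 0 to 255. `safe_isalnum` is a hypothetical helper name, not a function in Tidy:

```c
#include <ctype.h>

/* Hypothetical helper: only hand isalnum() a value in the range for
 * which its behavior is defined; everything else is "not alnum". */
int safe_isalnum(int c)
{
    if (c < 0 || c > 255)
        return 0;
    return isalnum(c) != 0;
}
```

When the input is a plain `char`, casting through `unsigned char` before the call achieves the same thing.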
Jim Derry ce105dcf09 Address #391. Tested on macOS and Win10.
- Add a check upon opening a file for validity of the file.
- Add a new message to indicate that the path is not a file.
2017-05-07 17:04:53 -04:00
Jim Derry fd77312175 Attempt to address issue #352. This patch correctly addresses the specific issues
in #352, but I'm worried that there's some over-reach here.

Currently only implemented as a warning, with no switch to turn it off, which
maintains current behavior other than the warning.

In general, we're treating any string as a complete URL, rather than breaking
URL's into component parts. Thus the `IsURLCodePoint()` check includes a few
other generic characters that strictly speaking aren't valid codepoints, but
are valid as escape characters and delimiters.

When addressing #338, I ran into a similar situation in not having a built-in
method to separate path components (although a simple generalized solution was
good enough in that case).

Thus without introducing a new structure and functions to deconstruct a URL
into scheme, authority, path, parameters, etc., some variation of this patch
will have to be used to address #352.
2017-05-06 18:54:42 -04:00
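A rough sketch of a whole-string URL character check in the spirit described above. This is not the `IsURLCodePoint()` from the patch: the function name, the exact ASCII set, and the simplified non-ASCII range (noncharacters are not excluded) are all assumptions. `%` and `#` are accepted as the escape character and a delimiter:

```c
#include <string.h>

/* Hypothetical check: is c acceptable somewhere in a complete URL? */
int is_url_code_point(unsigned int c)
{
    /* ASCII letters and digits */
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9'))
        return 1;

    /* ASCII punctuation allowed in URLs, plus '%' and '#' because the
     * string is treated as a whole URL rather than split into parts */
    if (c > 0 && c < 0x80)
        return strchr("!$&'()*+,-./:;=?@_~%#", (int)c) != NULL;

    /* surrogates are never valid code points */
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;

    /* simplified non-ASCII range */
    return c >= 0xA0 && c <= 0x10FFFD;
}
```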
Jim Derry 09d1802298 Merge branch 'next' into deprecations 2017-05-06 14:34:48 -04:00
Geoff McLane fd2400d55b Merge pull request #543 from htacg/issue-436
Small documentation change to close #436
2017-05-06 15:44:45 +02:00
Geoff McLane d4978608e7 Merge pull request #537 from deathbaba/next
Correctly process 'bookmarks' in html exported from Google Doc.
2017-05-06 15:35:57 +02:00
Geoff McLane 6839dfe601 Merge pull request #541 from htacg/issue_338
Issue #338 - fix 3 spurious access level 3 warnings...
2017-05-06 15:20:55 +02:00
Geoff McLane 6da0fff256 Merge pull request #532 from lhchavez/add-warn-prop-attrs
Add a flag to warn on proprietary attributes
2017-05-06 14:48:36 +02:00
Jim Derry 846b3cde55 Address #436 just to close it. 2017-05-04 13:45:06 -04:00
Geoff McLane d142527a8e Issue #338 - Deal with two other spurious access warnings 2017-05-04 17:36:39 +02:00
Jim Derry 49b833f63b WIP 2017-05-03 18:16:03 -04:00
Jim Derry 8b2f92f625 Issue #338 occurs because the existing routines assume that any URI with an
extension is a file, and so links to TLD's ending with .pl, .au, etc., will
cause accessibility warnings. This fix attempts to distinguish between URI's
that are likely to be files versus links to domains.
2017-05-03 16:15:44 -04:00
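One way to draw that distinction, sketched under the assumption that a "file" means an extension in the last path segment, while a dot in the authority part (such as a TLD) does not count. `uri_has_file_path` is a hypothetical name and this is not the code from the fix:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical heuristic: return 1 when the URI's last path segment
 * looks like "name.ext", 0 when the URI is only a link to a domain
 * or has no extension in its path. */
int uri_has_file_path(const char *uri)
{
    const char *scheme = strstr(uri, "://");
    const char *path;
    const char *seg;
    const char *end;
    const char *q;
    size_t len, i;

    if (scheme != NULL) {
        /* skip the authority; with no '/' after it there is no path */
        path = strchr(scheme + 3, '/');
        if (path == NULL)
            return 0;
    } else {
        path = uri;   /* relative reference: everything is path */
    }

    /* ignore any query string or fragment */
    len = strcspn(path, "?#");
    end = path + len;

    /* locate the last path segment */
    seg = path;
    for (i = 0; i < len; i++)
        if (path[i] == '/')
            seg = path + i + 1;

    /* an extension is a '.' with text on both sides of it */
    for (q = seg; q < end; q++)
        if (*q == '.' && q > seg && q + 1 < end)
            return 1;
    return 0;
}
```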
Geoff McLane b03598652f Issue #461 - alternative patch for this issue 2017-05-02 19:39:16 +02:00
Alexander Zolotarev 87169d8953 Correctly process 'bookmarks' in html exported from Google Doc. 2017-04-19 14:47:27 -10:00
lhchavez a19d271f47 Add a flag to warn on proprietary attributes
This change adds the TidyWarnPropAttrs flag (default=on) that emits a
warning every proprietary attribute it finds.
2017-04-15 03:17:16 +00:00
Geoff McLane d8839485a4 Merge branch 'next' of github.com:htacg/tidy-html5 into next 2017-04-09 02:09:19 +02:00
Geoff McLane 219a5c797b Issue #524 - Remove obsolete message 2017-04-09 02:08:03 +02:00
Jim Derry d1e0b22be7 Removed TidyDropFontTags. Note that POs and POT were _not_ updated. 2017-04-04 14:42:47 -04:00
Jim Derry 24afc6a6fa Fixed some casting issues that Ubuntu objects to.
- Test on macOS, Win10, Ubuntu.
- No version bump for this change.
2017-04-04 14:33:56 -04:00
Geoff McLane 22dcea067e Issue #335, maybe #333, to output indent char, reduce if tab 2017-03-26 16:57:29 +02:00
Geoff McLane 5f88452487 Issue #333 - create exception for span/meta 2017-03-26 16:57:29 +02:00
Jim Derry 5f05add439 Continue the documentation effort!
- Many, many updates to the public header files.
- tidyenum.h was reorganized substantially in order to better generate
  documentation with Doxygen.
- This was also a good time to clean up all of the various enums for languages
  and strings. Everything is simple and in a single enum now, other than a
  couple of cases (TidyOptionId, for example, doesn't need to be redefined).
- A full and complete audit of the strings meant some opportunities to delete
  useless strings.
- Reorganized the order of the strings in language_en.h in order to better
  find things when programmers want to make changes. There are a lot fewer
  internal "sections" now, and everything has been painstakingly sorted within
  the remaining sections.
- Consequently rebased all of the PO's, POT, and other language files.
- Updated several of the READMEs with the newest information.
- Made the READMEs easier to copy into the Doxygen project by changing some of
  the code format for compatibility, mainly the use of tildes instead of
  backslashes for code blocks.
- Added tidyGetMessageCode() to message API. Despite the huge diff, this is the
  only externally-visible change, other than removing some enums (but not their
  values!).
- Passing `next` tests on Mac, Linux, Win10.
2017-03-22 16:05:13 -04:00
Jim Derry 929575afb4 Picklist enums should all start at zero for external LibTidy user compatibility.
Restore the new custom-tags enum to this state, and add separate string keys.
Updated PO's to reflect; no change to headers.
2017-03-20 12:20:51 -04:00
Jim Derry a4f752f274 Implement TODO:
- tidyDetectedHtmlVersion()
- tidyDetectedXhtml()
- added two new fields to W3C_Doctypes[] in order to simplify this.
- added TY_(HTMLVersionNumberFromCode)() to enable lookup.
- Implement tidyDetectedGenericXml()
- Added a warning message if an XML declaration exists but the document is not
  XHTML.
- Remove dead commented code.
- Updated POs and POT. Headers not affected, but translators should check
  their translations.
- Testing is clean on Mac OS X, Ubuntu 16.04, and Windows 10.
2017-03-19 15:41:51 -04:00
Jim Derry 13122e8862 Added tidyErrorCodeFromKey()
Added tidyGetMessageDoc()
Improved the Public API documentation.
2017-03-19 08:15:32 -04:00
Geoff McLane c8f366b76e Issue #119 - Remove 3 newline chars, that crept in... 2017-03-18 18:52:48 +01:00
Jim Derry da55a6e4ac Removed unused declaration. 2017-03-16 08:00:05 -04:00
Jim Derry 0c5550b06f I think the messages are where I want them to be. Will generate test cases
for comparison. Also regen'd all pots and language headers.
2017-03-15 17:36:05 -04:00
Jim Derry 5606f32f13 WIP; messaging much more logical, except @todo noted. 2017-03-14 21:50:10 -04:00
Jim Derry 66ade9def4 Still noisy, but adds HTML5 dependent output message upon detection. 2017-03-14 16:27:11 -04:00
Jim Derry ed5a1d84ea Add TY_(nodeIsAutonomousCustomTag), so we can use it elsewhere. 2017-03-14 15:44:46 -04:00
Jim Derry 8273491e16 Change allowed values for custom-tags, and make y equal to inline. 2017-03-14 15:16:11 -04:00
Jim Derry 66de84bc2b - Add support for the is attribute.
- Add support for autonomous custom elements.
2017-03-13 13:45:32 -04:00
Jim Derry 11178d775b Massive Revamp of the Messaging System
This is a rather large refactoring of Tidy's messaging system. This was done
mostly to allow non-C libraries that cannot adequately take advantage of
arg_lists a chance to query report filter information for information related
to arguments used in constructing an error message.

Three main goals were in mind for this project:

- Don't change the contents of Tidy's existing output sinks. This will ensure
  that changes do not affect console Tidy users, or LibTidy users who use the
  output sinks directly. This was accomplished 100% other than some improved
  cosmetics in the output. See tidy-html5-tests repository, the `refactor` and
  `more_messages_changes` branches for these minor diffs.
- Provide an API that is simple and also extensible without having to write new
  error filters all the time. This was accomplished by adding the new message
  callback `TidyMessageCallback` that provides callback functions an opaque
  object representing the message, and an API to query the message for wanted
  details. With this, we should never have to add a new callback routine again,
  as additional API can simply be written against the opaque object.
- The API should work the same as the rest of LibTidy's API in that it's
  consistent and only uses simple types with wide interoperability with other
  languages. Thanks to @gagern who suggested the model for the API in #409.
  Although the API uses the "Tidy" way of accessing data via an iterator
  rather than an index, this can be easily abstracted in the target language.

There are two *major* API breaking changes:

- Removed TidyReportFilter2
  - This was only used by one application in the entire world, and was a hacky
    kludge that served its purpose. TidyReportCallback (né TidyReportFilter3)
    is much better. If, for some reason, this affects you, I recommend using
    TidyReportCallback instead. It's a minor change for your application.
- Renamed TidyReportFilter3 to TidyReportCallback
  - This name is much more semantic, and much more sensible in light of
    improved callback system. As the name implies, it remains capable of
    *only* receiving callbacks for Tidy "reports."

Introducing TidyMessageCallback, and a new message interrogation API.

- As its name implies, it is able to capture (and optionally suppress) *all*
  of Tidy's output, including the dialogue messages that never make it to
  the existing report filters.
- Provides an opaque `TidyMessage` and an API that can be used to query against
  it to find the juicy goodness inside.
  - For example, `tidyGetMessageOutput( tmessage )` will return the complete,
    localized message.
  - Another example, `tidyGetMessageLine( tmessage )` will return the line the
    message applies to.
- You can also get information about the individual arguments that make up a
  message. By using the `tidyGetMessageArguments( tmessage )` iterator and
  `tidyGetNextMessageArgument` you will obtain an opaque `TidyMessageArgument`
  which has its own interrogation API. For example:
    - tidyGetArgType( tmessage, &iterator );
    - tidyGetArgFormat( tmessage, &iterator );
    - tidyGetArgValueString( tmessage, &iterator );
    - …and so on.

Other major changes include refactoring `messages.c` to use the new message
"object" directly when emitting messages to the console or output sinks. This
allowed replacement of a lot of specialized functions with generalized ones.

Some of this generalizing involved modifications to the `language_xx.h` header
files, and these are all positive improvements even without the above changes.
2017-03-13 13:28:57 -04:00
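The opaque-object-plus-iterator pattern described above can be sketched without LibTidy at all. Everything here (`Msg`, `Arg`, `msg_get_next_argument`, and so on) is a hypothetical stand-in for illustration and not the LibTidy API. In a real library the struct definitions would live in a private source file, so callers only ever see the handle types:

```c
#include <stddef.h>

typedef enum { ARG_INT, ARG_STRING } ArgType;

/* Private representation (would be hidden from API users). */
typedef struct {
    ArgType type;
    int i;
    const char *s;
} MsgArg;

typedef struct {
    int line;
    const char *output;
    size_t argc;
    const MsgArg *argv;
} MsgImpl;

/* Public handle types. The iterator is 0 when finished, otherwise it
 * holds the index of the next argument plus one. */
typedef const MsgImpl *Msg;
typedef const MsgArg *Arg;
typedef size_t MsgIterator;

int msg_get_line(Msg m)            { return m->line; }
const char *msg_get_output(Msg m)  { return m->output; }
ArgType arg_get_type(Arg a)        { return a->type; }
const char *arg_get_string(Arg a)  { return a->type == ARG_STRING ? a->s : NULL; }

/* Start iterating over a message's arguments. */
MsgIterator msg_get_arguments(Msg m)
{
    return m->argc > 0 ? 1 : 0;
}

/* Return the current argument and advance the iterator. */
Arg msg_get_next_argument(Msg m, MsgIterator *it)
{
    Arg a = NULL;
    if (*it >= 1 && *it <= m->argc) {
        a = &m->argv[*it - 1];
        *it = (*it < m->argc) ? *it + 1 : 0;
    }
    return a;
}

/* Example consumer: count the string arguments of a message using
 * only the accessor functions. */
int count_string_args(Msg m)
{
    int n = 0;
    MsgIterator it = msg_get_arguments(m);
    while (it) {
        Arg a = msg_get_next_argument(m, &it);
        if (arg_get_type(a) == ARG_STRING)
            n++;
    }
    return n;
}
```

Because consumers never touch the struct, new accessors can be added later without changing the callback signature, which is the extensibility argument the commit message makes.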
Jim Derry 4dc8a2cf9a Bump version to 5.5.5 for this fiasco, and fix poor planning and unfortunate
merge.
  - Sort all of the existing options and re-indent per Tidy standards. This is
    simply for cosmetic effect.
  - Allow the iterator to return all options again, even "internal" options.
    Things are too embedded with N_TIDY_OPTIONS, etc., to try to hide them.
  - Instead, simply add documentation to LibTidy users that they shouldn't use
    internal options.
  - Also added `TidyInternalCategory` to `TidyConfigCategory` without adding a
    new field to the struct. API users should check for this category before
    use.
  - Defined a two character macro for `TidyInternalCategory` for use in
    `option_defs[]`.
  - Changed struct `option_defs[]` to reflect the new category for affected
    options.
  - Removed string indicating * refers to internal options, since it no longer
    applies.
  - Regen'd all strings for previous point.
  - `tidy.c` now checks for `TidyInternalCategory` everywhere in order to
    suppress output.
2017-03-10 09:13:21 -05:00
Jim Derry ac242e9ea4 hotfix 2017-03-09 19:56:16 -05:00
Jim Derry e27cc262fe Bring the local vars into the context, which is allowed in C89. 2017-03-09 12:44:48 -05:00
Jim Derry 005127c733 Address issue #472. 2017-03-08 15:37:01 -05:00
Jim Derry 978756a482 Restore the previous status of gnu-emacs-file
- Updated strings files to match.
- Inhibit internal options from being output via the iterator. Internals should
  never have the chance to be exposed if they shouldn't be used.
- Added tidySetEmacsFile() and tidyGetEmacsFile() to the public API, and use it
  instead of secret API to set the filename in the console application.

The end result is that `gnu-emacs-file` (and also `doctype-mode`) officially no
longer exist to CLI users nor to API users, and tidy console behaves properly
by using a published API to set the filename for emacs.
2017-03-07 20:11:31 -05:00
Jim Derry 03f0192f51 How did this get back in there??? 2017-03-04 15:31:25 -05:00
Jim Derry 74a4fa4049 Merge branch 'next' into clean_deprecations 2017-03-02 11:40:14 -05:00
Jim Derry 3be515b1f9 Merge branch 'next' into messages_squashed 2017-03-02 09:34:58 -05:00
Jim Derry 92621d6f99 MSVC Compatibility
- Changed location of pointer operator in declarations.
  - Updated `CODESTYLE.md` to reflect this.
  - Updated `API_AND_NAMESPACE.md` to reflect this.
2017-03-02 09:32:02 -05:00
Geoff McLane a49890ee55 Issue #498 - parser.c - if a <table> in a <table> just close.
The previous action was to discard the second, while it is the second
table that browsers will render.

This conforms to the principle that the html output by tidy should render
in a browser like the original html.
2017-02-24 16:20:10 +01:00
Geoff McLane c4b5904e1c Issue #497 - lexer.c - Add comment for this PR @seaburg 2017-02-24 14:38:20 +01:00
Geoff McLane e44f4d1469 Merge pull request #497 from seaburg/fix_value_trimming
Fix leading white spaces trimming
2017-02-24 14:30:39 +01:00
Geoff McLane 27fe0548b9 Issue #468 - config.c - use RAW encoding for all cases 2017-02-23 16:28:19 +01:00
Geoff McLane 569ae4b435 Issue #329 - lexer.c - do not discard this newline here 2017-02-23 15:27:03 +01:00
Evgeniy Yurtaev bb1d62d3bd Fix leading white spaces trimming 2017-02-22 14:34:40 +03:00
Jim Derry c54c10f857 - Removed deprecated options:
  - TidySlideStyle
  - TidyBurstSlides

- Added documentation for TidyEmacsFile, since it's a valid option.

- Because TidyEmacsFile is a valid option, tweaked tidy.c so that it can
  be specified in a configuration file without being overwritten by the console
  app. Why a user might do this is dumb, but who are we to stop them.
2017-02-18 18:30:41 -05:00
Jim Derry edc548095c Removed language as tidy config option; it is only CLI option. 2017-02-18 17:16:35 -05:00
Jim Derry cbb8354f74 Combined leftover attribute API stuff into single, new file. 2017-02-18 16:57:11 -05:00
Jim Derry f6ce4d130e Removed deprecated tidyAttrGetSOMETHING from API. 2017-02-18 16:46:20 -05:00
Jim Derry 13c6387f47 Removed deprecated AttributeIsSOMETHING from API. 2017-02-18 16:43:47 -05:00
Jim Derry a16f36ce53 Removed deprecated NodeIsElementName from API. 2017-02-18 16:33:21 -05:00
Jim Derry 165acc4f3e Several foundational changes preparing for release of 5.4 and future 5.5:
- Consolidated all output string definitions enums into `tidyenum.h`, which
    is where they belong, and where they have proper visibility.
  - Re-arranged `messages.c/h` with several comments useful to developers.
  - Properly added the key lookup functions and the language localization
    functions into tidy.h/tidylib.c with proper name-spacing.
  - Previous point restored a *lot* of sanity to the #include pollution that's
    been introduced in light of these.
  - Note that opaque types have been (properly) introduced. Look at the updated
    headers for `language.h`. In particular only an opaque structure is passed
    outside of LibTidy, and so use TidyLangWindowsName and TidyLangPosixName
    to poll these objects.
  - Console application updated as a result of this.
  - Removed dead code:
    - void TY_(UnknownOption)( TidyDocImpl* doc, char c );
    - void TY_(UnknownFile)( TidyDocImpl* doc, ctmbstr program, ctmbstr file );
  - Redundant strings were removed with the removal of this dead code.
  - Several enums were given fixed starting values. YOUR PROGRAMS SHOULD NEVER
    depend on enum values. `TidyReportLevel` is an example of such.
  - Some enums were removed as a result of this. `TidyReportLevel` now has
    matching strings, so the redundant `TidyReportLevelStrings` was removed.
  - All of the PO's and language header files were regenerated as a result of
    the string cleanup and header cleanup.
  - Made the interface to the library version and release date consistent.
  - CMakeLists.txt now supports SUPPORT_CONSOLE_APP. The intention is to
    be able to remove console-only code from LibTidy (for LibTidy users).
  - Updated README/MESSAGES.md, which is *vastly* more simple now.
2017-02-17 15:29:26 -05:00
Jim Derry e1f066fe14 Merge branch 'empretty_script' 2017-02-13 08:49:13 -05:00
Jim Derry b7c84b1b57 Merge branch 'surrogates' 2017-02-13 08:49:06 -05:00
Geoff McLane ea49ca0b1d Fix license for SPRTF modules.
Also correct the coding style to conform to HTML Tidy standard.
2017-02-12 17:38:44 +01:00
Geoff McLane 7f73d4f429 Issue #483 - Add ReportSurrogateError() service and connect. 2017-02-11 18:33:45 +01:00
Geoff McLane 75bc1f06c7 More updates for Issue #483 - Start warning msgs - WIP 2017-02-09 20:55:23 +01:00
Jim Derry 1ac50fccb3 Pretty up output of empty script tags.
- No longer break script tags up on two lines if there is no content. However
    output is still subject to the `--wrap` behavior.
  - Previous behavior intact if there is content.

Todo.

  - Associate this with a new Tidy option.
2017-02-08 13:53:37 -05:00
Geoff McLane 9dc76c1e77 Issue #483 - Some fixes for error condition 2017-02-02 16:43:10 +01:00
Geoff McLane 259d330780 Issue #483 - First cut dealing with 'surrogate pairs'.
Only deals with a successful case.

TODO: Maybe add a warning/error if the trailing surrogate not found, and
maybe consider substituting to avoid invalid utf-8 output.
2017-02-01 13:50:33 +01:00
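For reference, the successful case comes down to the standard UTF-16 arithmetic. This is a sketch, not the code from the commit, and `combine_surrogates` is a hypothetical name:

```c
/* Combine a leading (0xD800-0xDBFF) and trailing (0xDC00-0xDFFF)
 * surrogate into one code point. Returns 1 on success, 0 when the
 * two values do not form a valid pair. */
int combine_surrogates(unsigned int hi, unsigned int lo, unsigned long *out)
{
    if (hi < 0xD800 || hi > 0xDBFF)
        return 0;
    if (lo < 0xDC00 || lo > 0xDFFF)
        return 0;
    *out = 0x10000UL + ((unsigned long)(hi - 0xD800) << 10) + (lo - 0xDC00);
    return 1;
}
```

The failure return is where a warning or a substitution, as floated in the TODO, would hook in.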
Geoff McLane deebc93f97 Merge pull request #480 from onnimonni/feature-fix-xmlns-xlink
Add optional xmlns:xlink attributes as valid to support inline svg
2017-01-29 19:17:43 +01:00
Onni Hakala da27b5e339
Add optional xmlns:xlink attributes as valid to support inline svg
fixes #478
2017-01-09 01:38:16 +02:00
Marcos Caceres 91da8c6f74 style: ansi conforming comments 2016-12-20 16:51:09 +11:00
Geoff McLane fd0ccb2bbf Bad, repeated node iteration! closes #459 2016-10-30 23:37:31 +01:00
Marcos Caceres aff76bec38 fix(lexer.c): fixes from initial review 2016-10-17 17:00:58 +11:00
Marcos Caceres 523d58b004 refactor: ask for charset and http_equiv attrs 2016-10-06 19:30:23 +11:00
Marcos Caceres 932cc104a6 feat(attrask.c): learn about charset attr 2016-10-06 19:29:56 +11:00
Marcos Caceres 53ee94ddba fix: incorrect check for first element in head 2016-10-06 19:07:44 +11:00
Marcos Caceres b1629c4a4f fix(lexer): bad attribute reporting 2016-10-05 20:22:19 +11:00
Marcos Caceres 2d7ddfef94 Part 2.1 - Bug fixes and warning 2016-10-05 20:14:18 +11:00
Marcos Caceres cfc22ac46e Add garvankeeley's suggestions using calloc 2016-10-05 18:54:25 +11:00
Marcos Caceres 040c22c6dc Part 2 - Implement lexer logic 2016-10-04 21:23:57 +11:00
Marcos Caceres 169bd38adf Part 1 - Add basic infra for 'add-meta-charset' option 2016-10-04 17:56:29 +11:00
Geoff McLane d81a9ad901 Merge branch 'issue-428'
Conflicts:
	version.txt

This closes #428
2016-09-11 16:57:07 +02:00
Marcos Caceres e4ae9c064d Add support for link 'as' attribute (closes #449) 2016-08-23 18:46:04 +10:00
Geoff McLane 80e57b23bf Merge branch 'master' into issue-428
Conflicts:
	version.txt
2016-08-09 00:46:40 +02:00
Geoff McLane 7631f25ed2 rebase issue-428 2016-08-02 18:10:19 +02:00
Adam Majer 50557a4f63 Fix static buffer overrun (issue #443)
result[6] is a fixed array of size 6, but in the process
of copying data into it, we clobber the last allocated byte.

Simplify some of the code by not calling redundant functions.
2016-08-02 11:10:45 +02:00
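The general shape of a copy that cannot run past a fixed array, sketched with a hypothetical helper `copy_bounded`. This is not the patched function, only an illustration of reserving the last byte for the terminator:

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst, never writing more than dst_size bytes in total,
 * terminator included. Returns the number of characters copied. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t n;
    if (dst_size == 0)
        return 0;
    n = strlen(src);
    if (n > dst_size - 1)
        n = dst_size - 1;      /* leave room for the '\0' */
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

For a `char result[6]`, at most five characters plus the terminator are written.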
Benjamin Esham 54179386be Add support for the "integrity" attribute
This attribute may be used on "link" and "script" elements. See
http://www.w3.org/TR/2016/REC-SRI-20160623/#element-interface-extensions
2016-07-24 10:24:30 -04:00
Michal Čihař 10281040ca Avoid crash in tidyCleanAndRepair if document was not loaded
These services can only be used when there is a document loaded, ie a
lexer created.  But really should not be calling a Clean and Repair
service with no doc!
2016-07-07 16:38:05 +02:00
Geoff McLane 685f7a6c5b Issue #428 - Avoid adding form to input if html5 2016-07-02 20:13:01 +02:00
Geoff McLane 7bec2c2082 Merge pull request #422 from sesom42/master
prevent buffer overflow in debug output
2016-06-30 18:32:55 +02:00
Geoff McLane 97700044ce Merge pull request #410 from gagern/varargs
Pair va_copy calls with va_end
2016-06-18 18:53:53 +02:00
Jens Tautenhahn 84fc451a78 prevent buffer overflow in debug output 2016-06-14 15:42:18 +02:00
Benjamin Esham 941b763a8d Add support for "crossorigin" on audio too 2016-06-08 19:40:15 -04:00
Benjamin Esham d9d8e92e52 Allow "crossorigin" on img, script, and video tags too 2016-06-07 22:29:57 -04:00
Benjamin Esham 9377f65f89 Add support for the HTML5 "crossorigin" attribute
This attribute can only be used on "link" elements.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/link#Attributes
2016-06-07 22:20:10 -04:00
Martin von Gagern 04bc8d3195 Pair va_copy calls with va_end
According to the specs, each va_copy call should be matched by a va_end call
to ensure proper cleanup.  Furthermore, since message filters might iterate
over the list of arguments, we should hand a new copy to each filter.
2016-05-17 22:37:32 +02:00
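A small sketch of the pairing rule, assuming a list of filter callbacks that each consume a `va_list`. `Filter`, `filter_len`, and `run_filters` are hypothetical names, not Tidy's:

```c
#include <stdarg.h>
#include <stdio.h>

typedef int (*Filter)(const char *fmt, va_list args);

/* Example filter: formats the message and returns its length. It
 * consumes the va_list it is given. */
int filter_len(const char *fmt, va_list args)
{
    char buf[128];
    return vsnprintf(buf, sizeof buf, fmt, args);
}

/* Hand every filter its own copy of the arguments, and match each
 * va_copy with a va_end. */
int run_filters(const Filter *filters, int count, const char *fmt, ...)
{
    va_list args;
    int i, total = 0;

    va_start(args, fmt);
    for (i = 0; i < count; i++) {
        va_list copy;
        va_copy(copy, args);
        total += filters[i](fmt, copy);
        va_end(copy);
    }
    va_end(args);
    return total;
}
```

Passing `args` itself to more than one filter would leave the second filter reading an already-consumed list.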
Raphael Ackermann b704a4d0d4 allow zero LI in UL when html5. fix for #396 2016-04-08 23:08:56 +02:00
Geoff McLane 61a0a331fc Issue #390 - fix indent with --hide-endtags yes.
The problem was, with --hide-endtags yes, a conditional pprint buffer
flush had nothing to flush, thus the indent was not adjusted.

To track down this bug added a lot of MSVC Debug code, but is only
existing if some additional items defined, so has no effect on the release
code.

This, what feels like a good fix, was first reported about 12 years ago by
@OlafvdSpek in SF Bugs 563. Hopefully finally closed.
2016-04-04 18:13:08 +02:00
Geoff McLane 7598fdfff2 avoid DEBUG duplicate newline 2016-04-03 17:54:46 +02:00
Geoff McLane 7777a71913 Issue #369 - Remove Debug asserts 2016-03-31 14:50:03 +02:00
Geoff rpi McLane 086e4c948c remove gcc comment warning 2016-03-30 15:02:19 +00:00
Geoff McLane 59d6fc7022 Issue #377 - If version XHTML5 available, return that. 2016-03-30 16:28:08 +02:00
Geoff McLane 1830fdb97c Issue #384 - insert comments 2016-03-30 14:18:04 +02:00
Geoff McLane 4b135d9b47 Merge pull request #384 from seaburg/master
Fix skipping parsing character
2016-03-30 14:08:40 +02:00
Geoff McLane e87f26c247 Merge pull request #388 from htacg/fr.po
Merge fr.po to master
2016-03-27 19:54:54 +02:00
Jim Derry 7d2ddee775 Add new rebase command to CLI.
This is intended to make it very, very easy to update the POT and all of the POs when
changes are made to `language_en.h`. Used without an sha-1 hash, untranslated strings
(i.e., the "source" strings) are updated in the POT/PO's.

However if you specify an --sha=HASH (or -c HASH) option, then the script will use git
to examine the `language_en.h` file from that specified commit, determine the strings
that have changed, and mark all of these strings as `fuzzy` in the POs. This will serve
as a flag to translators that the original has changed. In addition, this `fuzzy` flag
will appear in the headers as "(fuzzy) " in the item comments.

If a translator edits the header directly, he or she should remove the "(fuzzy) " in the
comment. Then when the PO is rebuilt, the fuzzy flag will be removed automatically.
The reverse is also true; if a translator is working with the PO, he or she should
clear the fuzzy flag and the comment will be adjusted accordingly in the generated
header.
2016-03-25 09:21:21 +08:00
Geoff McLane 8671544beb Issue #383 - Add a WIP language_fr.h to facilitate testing 2016-03-24 14:15:43 +01:00
Geoff McLane 5feca8cfd6 Issue #383 - correct another byte-by-byte output to message file.
As in the previous case these messages are already valid utf-8 text, and
thus, if output on a byte-by-byte basis, must not use WriteChar, except
for the EOL char.

Of course this output can be to either a user output file, if configured,
otherwise stderr.
2016-03-24 14:15:43 +01:00
Jim Derry ad7bdee3b9 Added translator comments to new TidyEscapeScripts option, and updated POT and POs to reflect this. 2016-03-24 11:00:47 +08:00
Jim Derry 71d6ca1392 Oops. Didn't commit es changes. This fixes that. 2016-03-23 15:10:07 +08:00
Jim Derry d54785c933 language help enhancements:
- Show the language Tidy is using.
- Update the POT and POs with the modified string.
- Regen language_es.h, which uses the string.

Note that the new header uses the new commentless behavior that's still
pending in another branch. In addition the proper c style hints have
been added to all PO's, as their previous absence was a bug.
2016-03-23 14:56:36 +08:00
Jim Derry 2cf03f7fa9 Fix two character lang codes not working. 2016-03-23 14:38:17 +08:00
Geoff McLane 000c6925bd Issue #348 - Add option 'escape-script', def = yes 2016-03-20 01:01:46 +01:00
Geoff McLane e6f1533d89 Issue #383 - Output message file text byte-by-byte 2016-03-18 18:47:00 +01:00
Evgeniy Yurtaev 7d28b21e60 Fix skipping parsing character 2016-03-17 23:30:11 +03:00
Geoff McLane 8dda04f1df Issue #379 - Care about 'ix' going negative.
How this lasted so long in the code is a mystery! But of course it will
only be a read out-of-bounds if testing the first character in the lexer,
and it is a spacey char.

A big thanks to @gaa-cifasis for running ASAN tests on Tidy.
2016-03-06 17:36:51 +01:00
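The class of bug can be illustrated with a backwards scan over a buffer. This is a hypothetical sketch (`trim_trailing_space` is not a Tidy function), showing the index being tested before it is used to read:

```c
#include <stddef.h>

/* Return the length of buf with trailing whitespace removed. The
 * "len > 0" test comes first, so buf[len - 1] is never read when the
 * scan has reached the start of the buffer. */
size_t trim_trailing_space(const char *buf, size_t len)
{
    while (len > 0 &&
           (buf[len - 1] == ' ' || buf[len - 1] == '\t' ||
            buf[len - 1] == '\n' || buf[len - 1] == '\r'))
        len--;
    return len;
}
```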
Geoff McLane 8eee85cb9e Issue #380 - Experimental patch in issue-380 branch 2016-03-05 17:39:14 +01:00
Geoff McLane 0e6ed639d6 Issue #380 - Add more MSVC debug 2016-03-04 19:28:49 +01:00
Geoff McLane d091027089 Issue #377 add debug only output of constrained versions 2016-03-03 20:21:35 +01:00
Geoff McLane 7bdc31af76 Issue #377 - Table summary attribute also applies to XHTML5 2016-02-29 19:58:55 +01:00
Geoff McLane 24c62cf0df Issue #314 - Avoid head warning if show-body-only 2016-02-29 18:49:15 +01:00
Geoff McLane 23e689d145 Issue #373 - Merge branch 'issue-373' of github.com:htacg/tidy-html5 into issue-373
Conflicts: version.txt - set version 5.1.41issue-373
2016-02-18 15:18:39 +01:00
Geoff McLane 8c13d270ed Merge branch 'master' of github.com:htacg/tidy-html5 2016-02-18 13:58:23 +01:00
Geoff McLane b91d52592b Fix to K&R C to compile with MSVC 2016-02-18 13:57:47 +01:00
Jim Derry 63c0327de1 Fixed typo in output strings. 2016-02-18 15:40:10 +08:00
Jim Derry e00f419f5d Discovered some missing strings from tidyErrorFilterKeysStruct. 2016-02-18 10:19:57 +08:00
Jim Derry da8205b2dc Regen'd POT, POs, and headers in order to capture documentation changes in all of them. 2016-02-17 20:07:00 +08:00
Jim Derry 7fbe76be0b Finished semantic html. 2016-02-17 20:02:38 +08:00
Jim Derry a78daccd3c Through TidyIndentSpaces. 2016-02-17 17:43:09 +08:00
Jim Derry a16e89c4f8 Updated translator comments. 2016-02-17 17:27:57 +08:00
Jim Derry d30c2d7747 XSL for man handles <var>. Updated comment and sample string. 2016-02-17 17:20:02 +08:00
Jim Derry cc59efb23d Add a xml-error-strings service to console app providing symbols developers can use with TidyReportFilter3. 2016-02-17 12:35:20 +08:00
Jim Derry bc1e54d5b5 Externalize the TidyReportFilter3 error codes, and provide iterators to loop through them. 2016-02-17 12:27:11 +08:00
Jim Derry 720d5c25d2 Squelch compiler warning default type. 2016-02-17 10:56:21 +08:00
Jim Derry 97abad0c05 Bump to 5.1.39 for merging.
Merge branch 'master' into attrdict_phase2
2016-02-16 11:11:36 +08:00
Jim Derry 3431dd05a4 Merge branch 'master' into attrdict_phase1
Bump version to 5.1.38
2016-02-16 11:07:32 +08:00
Jim Derry 1e4f7dd0f1 Merge pull request #368 from htacg/issue-341
Issue #341
2016-02-16 10:18:26 +08:00
Geoff McLane 9cf97d536b Issue #373 - Avoid a null added to output.
This bug was first opened in 2009 by Christophe Chenon, as bug sf905, but
the patch provided then never made it into the source.

Now appears fixed, 7 years later!
2016-02-15 13:02:10 +01:00
Geoff McLane a4f425546f Improve MSVC DEBUG output.
Previously this output only the first 8 characters, followed by an ellipsis
if there were more than 8. Now it returns up to the first 19 chars. If there
are more than 19, it returns the first 8, followed by an ellipsis, followed
by the last 8 characters.

This is in the get_text_string service, which is only used if MSVC and not
NDEBUG.
2016-02-14 18:17:46 +01:00
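The abbreviation rule described in commit a4f425546f can be sketched roughly as follows. This is not Tidy's actual `get_text_string` code: the function name `abbrev_text`, the two constants, and the use of three ASCII dots as the ellipsis are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical constants taken from the numbers in the commit message. */
#define ABBREV_MAX  19  /* longest text that is copied unchanged */
#define ABBREV_EDGE 8   /* characters kept from each end when abbreviating */

/* Hypothetical helper, not part of Tidy. Copies src into dst, abbreviating
   it to "first 8 + ... + last 8" when it is longer than 19 characters.
   dst must have room for at least ABBREV_MAX + 1 bytes. Returns dst. */
char *abbrev_text(char *dst, const char *src)
{
    size_t len = strlen(src);

    if (len <= ABBREV_MAX) {
        /* Short enough: copy the whole string, including the NUL. */
        memcpy(dst, src, len + 1);
    } else {
        /* First 8 characters. */
        memcpy(dst, src, ABBREV_EDGE);
        /* Ellipsis (assumed here to be three ASCII dots). */
        memcpy(dst + ABBREV_EDGE, "...", 3);
        /* Last 8 characters. */
        memcpy(dst + ABBREV_EDGE + 3, src + len - ABBREV_EDGE, ABBREV_EDGE);
        dst[ABBREV_EDGE + 3 + ABBREV_EDGE] = '\0';
    }
    return dst;
}
```

With these assumptions the abbreviated form is itself exactly 19 characters long (8 + 3 + 8), so a 20-byte buffer is enough for every input.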
Jim Derry c62127b9bd Default to NO at this point. 2016-02-13 12:33:02 +08:00
Jim Derry 8b5771cf24 Word2000
Added messages that would otherwise be missed in post-processing, after cleanup.
2016-02-13 12:26:19 +08:00
Jim Derry 2cdedb4a63 Forgot one file... 2016-02-13 11:53:53 +08:00
Jim Derry 896b00238b Forgot one file... 2016-02-13 11:53:40 +08:00
Jim Derry 2ade3357a9 Phase 2
This is a MUCH SANER approach to what I was trying to do (now that I screwed up enough internals to understand some of them!).
At this point there are zero exit state reversions, and zero markup reversions! There are still 21 errout reversions; I'll
annotate and adjust as necessary.
2016-02-13 11:31:16 +08:00
Jim Derry e947d296e4 Handle some issues with misusing VERS_HTML5 in the doctype. 2016-02-12 20:49:14 +08:00
Jim Derry c81a151da5 Add VERS_STRICT to identify future strict document types. 2016-02-12 20:46:49 +08:00
Jim Derry 74604fd52b Hard-coded checks are redundant with updates to attrdict.c. 2016-02-12 20:44:03 +08:00
Jim Derry 429703dce4 Because the previous effort #350 grew too fast and there were a LOT of side effects to
my changes, I'm starting over with this. Comments in the PR thread.

This commit reduces the size of attrdict.c while causing only a single errout
regression that is justified.
2016-02-12 19:34:19 +08:00
Geoff McLane 03a643f781 Issue #341 - No token can be inserted if istacksize == 0! 2016-02-08 15:12:23 +01:00
Geoff McLane 7d0d8a853a Issue #345 - discard leading spaces in href 2016-02-01 20:07:55 +01:00
Geoff McLane 7f0d5c31e6 If no doctype, allow user doctype to reset table - Issue #342 2016-02-01 19:44:30 +01:00
Geoff McLane c1f94c066c Tidy up some debug only code.
After @sria91 added #360 merge, added a little more improvement...
2016-01-30 20:51:27 +01:00
Srikanth Anantharam 9a0af48a4e fixed a NULL node bug in debug build 2016-01-30 22:03:52 +05:30
Jim Derry 9ae15f45a7 Consistent tabs
Fixed tabs in template file, and regen'd all related files.
2016-01-30 15:51:54 +08:00
Jim Derry 53f2a2da2a msgunfmt works properly with escaped hex. 2016-01-30 15:51:53 +08:00
Martin von Gagern 17e50f2642 Encode UTF-8 strings to hex escapes in header files 2016-01-30 15:51:53 +08:00
Jim Derry bf70824cc2 - Add TidyReportFilter3, which removes translation strings completely from the equation. It would be a good idea to deprecate TidyReportFilter2, which is vulnerable to changing strings in Tidy source.
- Documentation reminders for future enum changes.
- Documentation updates.
2016-01-30 15:51:53 +08:00
Jim Derry d505869910 Localization Support added to HTML Tidy
- Languages can now be added to Tidy using standard toolchains.
- Tidy's help output is improved with new options and some reorganization.
2016-01-30 15:51:53 +08:00
Jim Derry 26e7d9d4b0 Fixes Mac OS X encoding issues and harmonizes output across platforms.
Previously Tidy produced different output based on the compilation target, NOT based on
the file encoding and specified options. Every platform was equal except Mac OS. Now unless
the encoding is specifically set to a Mac file type, all encoding assumptions are the same
across platforms.
2015-12-31 13:57:34 +08:00
Geoff McLane 78f2d52cdd Issue #308 - remove bad warn, bad assert, and free discarded 2015-12-05 15:03:41 +01:00
Geoff McLane 9caecb80cf Revert "Fix for head closing tag not reported (#327)"
This reverts commit 61cfcb1555.

This added an inconsistent warning about a missing optional close tag. In
general tidy does not report such optional close tags. See issue #327 for
some discussion on this.
2015-12-05 12:59:43 +01:00
Geoff McLane 3b13cd8076 Merge branch 'mingw-build' 2015-12-03 19:18:07 +01:00
Jim Derry 61cfcb1555 Fix for head closing tag not reported (#327) 2015-11-29 13:21:49 +08:00
Jim Derry 873794162a Callback added to XML printer, too; fixed off-by-one error. 2015-11-29 07:39:33 +08:00
Geoff McLane dc969f30d5 Issue #311 - small changes for MinGW32 build 2015-11-28 15:14:53 +01:00
Jim Derry 4adc07fd65 Removed the one callback per line filter. Library user can filter this himself. 2015-11-28 15:43:34 +08:00
Jim Derry dcd8f16f73 Tidying progress callback implemented. 2015-11-28 15:34:23 +08:00
Jim Derry 34d456aa80 Make pretty printer keep track of line numbers as it prints. 2015-11-28 14:16:17 +08:00
Jim Derry 9834cc17ad Style cleanup for previous commit. 2015-11-27 09:45:26 +08:00
Jim Derry 1c963acb58 Merge branch 'master' into fix_img_alt 2015-11-27 09:36:32 +08:00
Jim Derry 933fc3d236 - Addresses #320
- Different error output depending on whether or not the `alt-text` option was given a value.
2015-11-26 13:23:43 +08:00
Jim Derry 63234735d8 Allows null value css-prefix to be used in a config file without issuing a warning. 2015-11-26 11:21:48 +08:00
Ben Bullock 71d9638448 Don't push back non-A tokens. 2015-11-25 18:00:45 +09:00
Christopher Brannon 1ef5ba7968 Fix a tiny buffer overflow. 2015-11-23 12:28:00 -08:00
Geoff McLane b58aa1c26a Issue #307 - add a ref link in comments 2015-11-22 20:43:12 +01:00
Geoff McLane 2388fb0175 Issue #307, #167, #169 - regression of nested anchors 2015-11-22 18:46:00 +01:00
Geoff McLane bbc72a9297 Issue #306 - fix an old typo hidden by a cast!
Thanks to @benkasminbullock for spotting this fix.
2015-11-18 20:01:21 +01:00
Geoff McLane e2feed485c gcc warning - if 0 an unused static table 2015-11-18 17:06:13 +01:00
Geoff R. McLane b98061ff62 fix gcc warning parentheses in pprint.c 2015-11-18 16:47:58 +01:00
Geoff McLane 768ad46968 Issue #304 - remove duplicated TidyAttr_ARIA_ORIENTATION 2015-11-17 15:06:23 +01:00
Shane McCarron c0b769c5c7 Initial cut at RDFa support (again)
New branch that implements support for RDFa attributes.  Should be
cleaner than my first attempt in PR #299 - also references issue #209
2015-11-16 11:29:23 -06:00
Paul Howarth baad0b0064 Don't mangle the output filename
Attached patch works for me, and shouldn't affect any other option
processing.
2015-11-11 11:28:47 +01:00
Geoff McLane c68ad42482 Revert 22a1922c35 2015-11-07 14:50:10 +01:00
Shane McCarron c572e3e3c8 Initial cut at supporting RDFa attributes. 2015-11-06 12:19:05 -06:00
Geoff McLane 800b91e576 Issue #65 - effect name change to skip-nested, and default to on 2015-11-05 15:19:39 +01:00
Jim Derry 32ce272f75 Fix indent-with-tabs for library use. 2015-11-04 12:44:15 +08:00
Jim Derry dec6356a6f Deleted multiple equal id attributes. 2015-11-02 15:31:47 +08:00
Jim Derry d0ac990636 More description beautification. 2015-11-02 12:06:37 +08:00
Jim Derry 807fed4ff6 Documentation improvements. 2015-11-01 19:05:03 +08:00
Jim Derry 2613f02dc5 More documentation beautification. 2015-10-31 22:03:33 +08:00
Jim Derry 565d2ec232 Documentation beautification underway. 2015-10-31 18:30:02 +08:00
Jim Derry cf3c0293c0 Additional tests with our troublesome option. 2015-10-31 14:45:51 +08:00
Jim Derry 8c5fae8c09 - documentation/quickref.xsl
  - Includes <p> support.
  - Matches the description class name in quickref.include.xsl
  - Styles <br /> to enforce vertical spacing (in the reference table only).
- documentation/style.css
  - Styles <br /> to enforce vertical spacing (in the reference table only).
- documentation/tidy1.xsl.in
  - Includes <p> support.
  - Better manages line breaks with .sp1 instead of .br.
- src/localize.c
  - Legibility to the troublesome `drop-font-tags` description.
2015-10-30 23:58:43 +08:00
Jim Derry 709ac8cb4c Support HTML in descriptions. 2015-10-30 18:17:40 +08:00
Jim Derry 09b0698c56 Typo. 2015-10-30 12:58:11 +08:00
Jim Derry a3138cb142 URL cleanup. 2015-10-30 12:23:20 +08:00
Jim Derry 2d0f971747 Update documentation to address #288. 2015-10-30 10:19:47 +08:00
Geoff McLane c8751f60e7 Issue #286 - use AddByte for internal transfer 2015-10-20 15:04:18 +02:00
Geoff McLane d75c82275d Issue #285 - Add a ResetTags func to reset html5 mode before each document 2015-10-14 16:55:35 +02:00
Geoff McLane adbad0379e Issue #65 - if nonested then no endtag needed to decrement.
This is only if nonested is on, then a <script> tag has not incremented
the nested, so likewise no need to treat an escaped close tag <\/script>
as an end tag to decrement nested.
2015-10-08 17:06:03 +02:00
Geoff McLane 7e69ceb3d1 Issue #281 - only warn BAD_CDATA_CONTENT if inserting an escape. 2015-10-07 16:17:42 +02:00
Geoff McLane b63c1090c2 option to avoid incrementing nested containers.
This is in the GetCDATA function. If the container is script or style and
this option is on, avoid bumping nested.

This addresses issues #65 (1642186) and #280.

All attempts at parsing script data are now abandoned as a bad direction.
2015-10-07 15:11:25 +02:00
Geoff McLane b4efe7464a small enhancement of debug only code 2015-10-05 15:08:20 +02:00
Geoff McLane 6c1a2acea2 #273 - avoid xhtml doctype flip/flop 2015-09-27 17:36:57 +02:00
Christopher Brannon 94b0647c08 Issue #65, fix for ignoring cdata. 2015-09-24 18:13:57 -07:00
Geoff McLane 04ca419080 Issue #64 - Try hard to skip '<![CDATA[ ... ]]>' 2015-09-24 14:21:55 +02:00
Geoff McLane 96589c6f57 #65 Skip esc'd esc, and only for script containers 2015-09-21 12:33:53 +02:00
Geoff McLane eda37c5adb Issue #65 - avoid new quotes if in quotes 2015-09-19 14:58:42 +02:00
Geoff McLane d541405a2a Eventually complete a 2007 fix 2015-09-16 13:17:50 +02:00
Geoff McLane 9960f7c6dd Protect against a NULL node in the Debug only code 2015-09-12 13:06:14 +02:00
Srikanth Anantharam be9f1d4203 using _fileno(fout) instead of fout->_file makes it more portable across different MSVC versions 2015-09-11 00:27:17 +05:30
Geoff McLane c48680cc01 Issue #180 - fix indenting when -omit used 2015-09-10 15:01:48 +02:00
Geoff McLane 66e288a8e2 Issue #239 - no warn for apos entity in html5++ mode 2015-08-22 14:03:02 +02:00
Geoff McLane e79137de7f Issue #238 - only except the pre element 2015-08-22 14:00:18 +02:00
Geoff McLane 1d67dc940a Merge branch 'Andrew-Dunn-patch-1' into issue-228.
That is reordering windows includes per #234

In general the order of includes should be system <headers>,
then local "headers", except perhaps for the occasional local
"version" or "config" header...

Resolved conflicts in src/pprint.c by reverting to current master, and in
version.txt by increasing the version.
2015-08-10 18:49:13 +02:00
Andrew Dunn dfdffd0cb3 Reordered Windows Includes
Moved the <windows.h> include above the "streamio.h" include to fix compilation with the latest Windows SDK.

<winnt.h> now has the following struct. In particular the `CR` member of this struct conflicts with a define in streamio.h.

    typedef struct _IMAGE_ARM64_RUNTIME_FUNCTION_ENTRY {
        DWORD BeginAddress;
        union {
            DWORD UnwindData;
            struct {
                DWORD Flag : 2;
                DWORD FunctionLength : 11;
                DWORD RegF : 3;
                DWORD RegI : 4;
                DWORD H : 1;
                DWORD CR : 2; // This line causes a compile error because CR is redefined in streamio.h
                DWORD FrameSize : 9;
            } DUMMYSTRUCTNAME;
        } DUMMYUNIONNAME;
    } IMAGE_ARM64_RUNTIME_FUNCTION_ENTRY, * PIMAGE_ARM64_RUNTIME_FUNCTION_ENTRY;
2015-08-07 17:06:33 +10:00
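The include-order problem described in commit dfdffd0cb3 comes down to a macro being defined before a declaration that uses the same name as an identifier. A minimal sketch, with a simplified stand-in struct in place of the `<winnt.h>` one and an assumed macro value in place of the real `streamio.h` definition:

```c
/* Stand-in for the system header: a struct with a bit-field member named
   CR. Hypothetical and simplified from the struct quoted in the commit. */
struct runtime_entry {
    unsigned Flag : 2;
    unsigned CR   : 2;
};

/* The accessor is compiled while CR is still an ordinary identifier. */
unsigned entry_cr(const struct runtime_entry *e)
{
    return e->CR;
}

/* Stand-in for the local header. The value 0xD (carriage return) is an
   assumption for illustration. Because the macro is defined AFTER the
   struct, the member declaration above is not affected by it. */
#define CR 0xD

/* From here on, CR means the macro. Had the #define come first, the member
   line would have been preprocessed into `unsigned 0xD : 2;`, which does
   not compile. */
int local_cr(void)
{
    return CR;
}
```

Putting the system header first works because the preprocessor only substitutes a macro in text that follows its definition. It does not remove the name clash itself: code after the `#define` still cannot refer to the struct member by name.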
Geoff McLane cbae924a40 Oops, missed setting 'type' for TidyVertSpace.
This was evidenced by an 'assert' failure, that the type was not an 'int'!

And also in the -xml-help output, thus affecting the tidy.1 manual page
for this new feature --vertical-space auto, which produces almost single
line html output.

This 'fix' began in the issue-228 branch - see Issue #231
2015-07-31 13:39:06 +02:00
Geoff McLane 38ef5bfe85 Issue #232 remove CM_HEAD from 'object' tag 2015-07-30 14:50:15 +02:00
Geoff McLane ae620a63a2 merge @camoy fix #158 to this branch 2015-07-17 19:00:16 +02:00
Geoff McLane d26cd72084 Add macros to get TidyVertSpace config, and implement 2015-07-15 20:58:00 +02:00
Geoff McLane 154a61543b Expand xml TidyVertSpace text to include tri-state 2015-07-15 20:56:22 +02:00
Geoff McLane 16580e0926 Revert TidyVertSpace to 'no', and make AutoBool option 2015-07-15 20:54:50 +02:00
Geoff McLane 4246c2c462 Issue #230: Need to KEEP this newline char sometimes.
This is a case where the lexer, in GetTokenfromStream, does NOT eat any
trailing newline after a LEX_STARTTAG: case...

So far have identified pre, script, style as NEEDING this user newline
character for later pprint output. Any others?
2015-07-15 19:41:02 +02:00
Cameron Moy d50391a984 Fix #158 - remove inserted newlines in pre 2015-07-13 16:31:52 -04:00
Geoff McLane cb2543efac Merge branch 'master' of https://github.com/stencila/tidy-html5 into issue-228 2015-07-13 19:11:30 +02:00
Nokome Bentley 991630e523 Changes default for vertical-space to yes
Makes this more similar (but not the same) as the previous default
behaviour.
2015-07-13 15:56:15 +12:00
Nokome Bentley b6bcf0408c Applies "smart" new lines to start of script like tags 2015-07-13 15:49:07 +12:00
Nokome Bentley f6979787d1 Adds "smart" line flushing functions.
See in-code comments for more details
2015-07-13 15:40:59 +12:00
Folkert van Heusden 784c7d7f79 Added methods for deleting nodes and/or attributes.
This is useful when e.g. writing an HTML cleaner.
2015-07-12 18:34:35 +00:00