Modify the build system to assume config files by default. Modify tidyplatform
to accommodate these changes. Reformat tidyplatform for friendliness to new
developers.
- Change default value of `--fix-bad-comments` to `no`.
- Ensure that when _not_ fixing, nothing is actually fixed.
- Ensure that when fixing, initial adjacent hyphens actually are fixed.
- Issue tidyinfo for all fixes made.
- Issue tidywarning when not making fixes for non-HTML5 doctypes.
output, classify and organize all of the dialogue type of messages. This paves
the way towards formalizing (and expanding!) the footnotes system with much
greater explanatory text, as well as providing much better fine-grained control
over which types of output Tidy will produce.
Moved STRING_DOCTYPE_GIVEN, STRING_CONTENT_LOOKS, and STRING_NO_SYSID to the
Report paradigm from the Dialogue paradigm, as these are items that are
traditionally TidyInfo and included in the Report table, rather than any type
of dialogue.
At this point, we are exactly passing all tests.
Note that there are several regressions in the accessibility test suite that
are not related to output messages. These are a result of previous work, and
these results should be updated in the test suite when this item is merged.
reflects such. Some fleshed-out report formatters are included with cases for
several of Tidy's reports, but nothing is yet enabled. All reporting is status
quo, and this is just a bunch of dead code at this point.
the basic reporting functions that share the same signature. This also resulted
in eliminating a string, and adding a new string to disambiguate between
errors and warnings.
This appears to be an issue with Word2000 handling of empty attributes.
A reproduction case can be seen here:
```
$ cat test.html
<html xmlns:o="urn:schemas-microsoft-com:office:office">
<body>
<table>
<img class="" />
</table>
</body>
</html>
$ ./tidy --tidy-mark no --word-2000 yes test.html
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 3 column 1 - Warning: <img> isn't allowed in <table> elements
line 2 column 1 - Info: <table> previously mentioned
line 1 column 57 - Warning: inserting missing 'title' element
line 3 column 1 - Warning: <img> lacks "alt" attribute
line 3 column 1 - Warning: <img> lacks "src" attribute
line 2 column 1 - Warning: trimming empty <table>
line 1 column 1 - Warning: <html> proprietary attribute "xmlns:o"
[2] 52405 segmentation fault ./tidy --tidy-mark no --word-2000 yes test.html
```
This was called from 6f2fb6e0e7/src/clean.c (L1710).
(Technically, however, calling strncmp with `NULL` pointers is undefined behaviour.)
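One way to guard against this class of crash is to check for `NULL` before comparing. The following is a minimal sketch only; `safe_strncmp` is a hypothetical helper, not a LibTidy function, and the choice to sort `NULL` before any non-`NULL` string is an assumption made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of LibTidy): compares at most n bytes,
 * but never passes a NULL pointer to strncmp. A NULL argument is
 * treated as ordering before any non-NULL string. */
static int safe_strncmp(const char *a, const char *b, size_t n)
{
    if (a == NULL || b == NULL)
        return (a != NULL) - (b != NULL); /* 0 if both NULL, else -1 or 1 */
    return strncmp(a, b, n);
}
```

A caller that previously passed an attribute value straight to `strncmp` could call the wrapper instead, so an empty or missing value no longer reaches the library routine as `NULL`.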
the remainder of the callbacks. TidyConfigCallback is now given a reference
to the instance of the TidyDoc that caused the callback to occur.
+ TidyConfigCallback
An earlier patch now passes back an all space text node. Previously this
would have been skipped. So add code in ParseList to detect, and discard
such a node.
Change committed:
modified: src/parser.c
Add option TidyStyleTags, --fix-style-tags, Bool, to turn off
this action.
Add warning messages MOVED_STYLE_TO_HEAD, and FOUND_STYLE_IN_BODY.
Fully iterate ALL nodes in the body, in search of style tags...
Changes to be committed:
modified: include/tidyenum.h
modified: src/clean.c
modified: src/config.c
modified: src/language_en.h
modified: src/message.c
parser and picklist system. Console application needs to be updated to fix
the description, as it shows autobool, and for some reason on the current
system I'm not getting assertion failures.
This PR refactors how picklists and option parsers are implemented in LibTidy,
making it vastly easier to implement new picklists in the future, as well as to
modify some of the existing picklists such that they have more logical names.
Picklist arrays are now arrays of structures that include the possible strings
capable of setting a particular option value, and a new parser has been written
to work with these structures.
In addition, several of the existing parsers were removed, as they are now
redundant, and a couple of the remaining parsers were refactored to take
advantage of the new parser.
In effect, this means that:
- New parsers don't have to be written in the majority of cases where new
options are added that exceed yes/no/auto.
- Some of the existing options can have more meaningful names than yes/no/auto,
in a backward compatible way. For example, vertical-spacing "auto" currently
in no way reflects "auto" when used.
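The idea of a picklist as an array of structures, each listing the strings that can set a value, can be sketched roughly as follows. This is an illustration under assumptions: the names `PickItem`, `triStatePicks`, and `parse_pick`, the field layout, and the accepted input strings are all hypothetical and are not taken from the LibTidy sources.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_INPUTS 6 /* hypothetical limit on accepted strings per item */

/* Hypothetical picklist entry: a canonical label, the value it sets, and
 * a NULL-terminated list of input strings that select that value. */
typedef struct {
    const char *label;
    int value;
    const char *inputs[MAX_INPUTS];
} PickItem;

/* Example list; the final entry with a NULL label terminates it. */
static const PickItem triStatePicks[] = {
    { "no",   0, { "0", "n", "f", "no", "false", NULL } },
    { "yes",  1, { "1", "y", "t", "yes", "true", NULL } },
    { "auto", 2, { "auto", NULL } },
    { NULL,   0, { NULL } }
};

/* Case-insensitive string equality using only standard C. */
static int equal_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* One generic parser for every picklist: returns 1 and stores the value
 * in *out when text matches any accepted input, otherwise returns 0. */
static int parse_pick(const PickItem *list, const char *text, int *out)
{
    size_t i;
    for (; list->label != NULL; ++list)
        for (i = 0; i < MAX_INPUTS && list->inputs[i] != NULL; ++i)
            if (equal_ignore_case(text, list->inputs[i])) {
                *out = list->value;
                return 1;
            }
    return 0;
}
```

With this shape, giving an option a more meaningful name while staying backward compatible amounts to adding the new string to `inputs` alongside the old ones; no new parser is needed.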
According to the MSDN documentation 'isalnum(c)' is only valid when c equals
EOF, or is in the range 0 to 255 inclusive. It states the behavior is
undefined outside this range, and in Debug mode triggers an assert dialog.
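A range check before the call avoids the undefined behavior. The wrapper below is a minimal sketch; `is_alnum_safe` is a hypothetical name and is not the function used in the Tidy sources.

```c
#include <ctype.h>

/* Hypothetical wrapper: rejects any value outside 0..255 before calling
 * isalnum, so wide code points and negative values never reach it. */
static int is_alnum_safe(int c)
{
    if (c < 0 || c > 255)
        return 0;
    return isalnum(c) != 0;
}
```

Code points above 255 are simply reported as non-alphanumeric here, which is an assumption about the desired behavior rather than something the documentation prescribes.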
in #352, but I'm worried that there's some over-reach here.
Currently only implemented as a warning, with no switch to turn it off, which
maintains current behavior other than the warning.
In general, we're treating any string as a complete URL, rather than breaking
URLs into component parts. Thus the `IsURLCodePoint()` check includes a few
other generic characters that strictly speaking aren't valid codepoints, but
are valid as escape characters and delimiters.
When addressing #338, I ran into a similar situation in not having a built-in
method to separate path components (although a simple generalized solution was
good enough in that case).
Thus without introducing a new structure and functions to deconstruct a URL
into scheme, authority, path, parameters, etc., some variation of this patch
will have to be used to address #352.
extension is a file, and so links to TLDs ending with .pl, .au, etc., will
cause accessibility warnings. This fix attempts to distinguish between URIs
that are likely to be files versus links to domains.
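One possible heuristic for telling the two apart is to look for an extension only in the path that follows the authority, so a bare domain never yields one. The sketch below is an assumption about how such a check could work, not the code in this fix; `path_extension` is a hypothetical name, and query strings and fragments are deliberately ignored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the extension (including the
 * dot) when the URI's path ends in something that looks like a file name,
 * or NULL when the URI is a bare domain or has no extension. */
static const char *path_extension(const char *uri)
{
    const char *scheme_end = strstr(uri, "://");
    const char *path, *slash, *dot;

    if (scheme_end != NULL) {
        path = strchr(scheme_end + 3, '/'); /* first slash after the host */
        if (path == NULL)
            return NULL;                    /* e.g. http://example.pl */
    } else {
        path = uri;                         /* relative reference */
    }

    slash = strrchr(path, '/');
    dot = strrchr(path, '.');
    if (dot == NULL || (slash != NULL && dot < slash))
        return NULL;                        /* no dot in the last segment */
    return dot;
}
```

Under this sketch, `http://example.pl` has no extension, while `http://example.pl/script.pl` reports `.pl`.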
- Many, many updates to the public header files.
- tidyenum.h was reorganized substantially in order to better generate
documentation with Doxygen.
- This was also a good time to clean up all of the various enums for languages
and strings. Everything is simple and in a single enum now, other than a
couple of cases (TidyOptionId, for example, doesn't need to be redefined).
- A full and complete audit of the strings meant some opportunities to delete
useless strings.
- Reorganized the order of the strings in language_en.h in order to better
find things when programmers want to make changes. There are a lot fewer
internal "sections" now, and everything has been painstakingly sorted within
the remaining sections.
- Consequently rebased all of the POs, POT, and other language files.
- Updated several of the READMEs with the newest information.
- Made the READMEs easier to copy into the Doxygen project by changing some of
the code format for compatibility, mainly the use of tildes instead of
backslashes for code blocks.
- Added tidyGetMessageCode() to message API. Despite the huge diff, this is the
only externally-visible change, other than removing some enums (but not their
values!).
- Passing `next` tests on Mac, Linux, Win10.
- tidyDetectedHtmlVersion()
- tidyDetectedXhtml()
- Added two new fields to W3C_Doctypes[] in order to simplify this.
- Added TY_(HTMLVersionNumberFromCode)() to enable lookup.
- Implement tidyDetectedGenericXml()
- Added a warning message if an XML declaration exists but the document is not
XHTML.
- Remove dead commented code.
- Updated POs and POT. Headers not affected, but translators should check
their translations.
- Testing is clean on Mac OS X, Ubuntu 16.04, and Windows 10.