Doc folder.

This commit is contained in:
Jim Derry 2021-07-30 18:57:02 -04:00
parent 1213047d42
commit 995c20e9e4
17 changed files with 1352 additions and 0 deletions

docs/API_AND_NAMESPACE.md Normal file

@ -0,0 +1,104 @@
# The `LibTidy` API and Namespacing
## Introduction
If you're just getting started working with `LibTidy`, some of the design choices may seem overwhelming if you're not a seasoned C veteran. Hopefully this article will give a decent overview, encouraging you to explore and contribute to the `LibTidy` code.
This article will discuss briefly:
- How `LibTidy` achieves namespacing in C
- Explanations for some of the bizarre, do-nothing macros.
- Opaque types
- How to add new functions to the `LibTidy` API.
# Namespacing
The C language does not support built-in namespacing, so it is subject to namespace collisions, especially when a library is statically linked. `LibTidy` works around this with a compromise between human-readable names and names unusual enough to avoid a collision.
As you browse Tidy's code, you'll notice many uses of a macro function `TY_()` applied to the names of non-static functions. The preprocessor resolves each `TY_(Function)` to `prvTidyFunction`, ensuring a clear namespace and avoiding the possibility of collisions (unless some other library has thoughtlessly borrowed our prefix for the same purpose). For example, `TY_(getNextOptionPick)` will resolve to `prvTidygetNextOptionPick` when compiled.
Of course, `static` functions are immune to the issue of namespace pollution, so in general you will really only use this technique for functions that must be accessible from outside of your new file, such as functions that you want to expose to the API.
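The renaming relies on the preprocessor's token-pasting operator (`##`). The snippet below is a minimal sketch that assumes a definition along the lines of `#define TY_(name) prvTidy##name`. Check Tidy's own headers for the exact form. `addOne` and `addOneToDouble` are hypothetical functions used only for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Assumed definition: paste the "prvTidy" prefix onto the given name. */
#define TY_(name) prvTidy##name

/* Declared and defined through the macro, so the compiler and linker
   only ever see the symbol prvTidyaddOne. (addOne is hypothetical.) */
int TY_(addOne)( int value );

int TY_(addOne)( int value )
{
    assert( value < INT_MAX );   /* guard against signed overflow */
    return value + 1;
}

/* A static helper needs no prefix: it is invisible outside this file. */
static int twice( int value )
{
    return 2 * value;
}

/* Callers in other files also spell the name through the macro. */
int addOneToDouble( int value )
{
    return TY_(addOne)( twice( value ) );
}
```

After preprocessing, `TY_(addOne)(41)` and `prvTidyaddOne(41)` are the same call, and both return 42.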
# Macros for documentation
`TIDY_EXPORT` and `TIDY_CALL` are normally defined as empty macros, i.e., when compiled they resolve to nothing. These are used exclusively for documenting functions that are part of the API declared in `tidy.h` and implemented in `tidylib.c`. For example, in `tidy.h`:
~~~
TIDY_EXPORT TidyIterator TIDY_CALL getWindowsLanguageList(void);
~~~
The `TIDY_EXPORT` call clearly indicates that this function prototype is meant to be exported from the API, and `TIDY_CALL` clearly indicates that the function is called from within `LibTidy`.
Although this makes things obvious from the documentation perspective, the truth is a little murkier. In some environments one might define `TIDY_EXPORT` and `TIDY_CALL` differently in order to control compiler behavior, especially in environments that have special requirements for dynamic libraries. In general, though, you shouldn't have to worry about this.
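Here is a minimal sketch of how such do-nothing macros can be defined while still letting a platform override them before the header is read. It is an illustration under that assumption, not a copy of Tidy's platform header. `demoAnswer` is a hypothetical function.

```c
#include <assert.h>

/* Default to empty macros unless the build has already defined them,
   e.g. as a DLL-export attribute or a calling convention. */
#ifndef TIDY_EXPORT
#define TIDY_EXPORT
#endif

#ifndef TIDY_CALL
#define TIDY_CALL
#endif

/* With the defaults above, this preprocesses to: int demoAnswer(void); */
TIDY_EXPORT int TIDY_CALL demoAnswer(void);

/* The implementation repeats TIDY_CALL, as tidylib.c does. */
int TIDY_CALL demoAnswer(void)
{
    return 42;
}
```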
The preferred placement of the pointer operator when documenting with macros is this:
~~~
const tidyLocaleMapItem* TIDY_CALL getNextWindowsLanguage( TidyIterator* iter )
~~~
…instead of this:
~~~
const tidyLocaleMapItem TIDY_CALL *getNextWindowsLanguage( TidyIterator* iter )
~~~
# External types are opaque
In several spots the source code indicates that a particular structure is "opaque." This simply means that API users cannot see inside of them, and they have to depend on accessor functions to gain access to the sweet fruit that is within. This is a design choice that makes `LibTidy` highly portable and makes it accessible to multitudes of other languages that can communicate with a C API.
Take `TidyDoc`, for example, as it's the most fundamental data type within `LibTidy`. As an API user, you can hold a reference to a `TidyDoc`, and you're going to pass it around a lot to API functions (such as `tidyCleanAndRepair()`). You know that it contains lots of good stuff, but you're not allowed to peek inside it unless an accessor function is provided. Think of it as a token that you pass around, and nothing more.
Internally, the type is cast to a native C structure of type `TidyDocImpl`, so if you decide to become a Tidy developer, you can access the item fully.
If you extend Tidy's API, it's important to respect this design choice, even if you are only writing functionality for the console application (which is, of course, simply a client of `LibTidy`).
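The pattern can be sketched in a few lines of C. All names here are hypothetical: `DemoDoc`, `DemoDocImpl`, and the `demo*` functions stand in for `TidyDoc`, `TidyDocImpl`, and the real API. The public side only ever sees a pointer to a struct that is declared but never defined. The library casts that pointer back to its internal structure.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- Public header: the struct is declared but never defined, so
   API users can hold and pass a DemoDoc but cannot look inside. ---- */
typedef struct DemoDocOpaque* DemoDoc;

DemoDoc demoCreate(void);
void    demoRecordError(DemoDoc doc);
int     demoGetErrorCount(DemoDoc doc);   /* accessor */
void    demoRelease(DemoDoc doc);

/* ---- Library internals: the real structure behind the token. ---- */
typedef struct {
    int errorCount;
    /* ... many more private fields ... */
} DemoDocImpl;

/* Convert between the opaque handle and the implementation type. */
static DemoDocImpl* toImpl(DemoDoc doc)        { return (DemoDocImpl*)doc; }
static DemoDoc      toPublic(DemoDocImpl* impl) { return (DemoDoc)impl; }

DemoDoc demoCreate(void)
{
    DemoDocImpl* impl = calloc(1, sizeof *impl);
    return toPublic(impl);   /* NULL if allocation failed */
}

void demoRecordError(DemoDoc doc)
{
    assert(doc != NULL);
    toImpl(doc)->errorCount++;
}

int demoGetErrorCount(DemoDoc doc)
{
    assert(doc != NULL);
    return toImpl(doc)->errorCount;
}

void demoRelease(DemoDoc doc)
{
    free(toImpl(doc));
}
```

API users can do everything they need through `demoCreate()`, the accessors, and `demoRelease()`. The fields of `DemoDocImpl` can change without breaking any caller.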
# How to add new functions to `LibTidy`
All of the information above is useful for anyone who wants to browse Tidy's source code, or use the API, or understand Tidy better, but it all comes together nicely when you want to extend the API. This quick lesson will show you how to do so, using `tidyLocalizedString()` as an example.
## Behind the scenes
The first thing we need is an internal version of the function that we want to add. Tidy has a module that handles localization: `language.h/c`. The header is where we declare the function's interface within `LibTidy`, and it should be namespaced according to the discussion above. We can declare:
~~~
ctmbstr TY_(tidyLocalizedString)( uint messageType );
~~~
…and of course implement it in the `.c` file.
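As a rough illustration of what the `.c` side might contain, here is a sketch of an internal, namespaced lookup. Several parts are stand-ins assumed for this example: the `ctmbstr` and `uint` typedefs, the `TY_` definition, the message codes, and the table. Tidy's real `language.c` also has to handle multiple languages and is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for Tidy's own typedefs and namespacing macro (assumed). */
typedef const char* ctmbstr;
typedef unsigned int uint;
#define TY_(name) prvTidy##name

/* Hypothetical message codes and their English strings. */
enum { MSG_HELLO = 1, MSG_GOODBYE = 2 };

static const struct {
    uint    key;
    ctmbstr value;
} messages[] = {
    { MSG_HELLO,   "Hello" },
    { MSG_GOODBYE, "Goodbye" },
};

/* Internal (namespaced) lookup: returns NULL for unknown codes. */
ctmbstr TY_(tidyLocalizedString)( uint messageType )
{
    size_t i;
    for ( i = 0; i < sizeof messages / sizeof messages[0]; ++i )
    {
        if ( messages[i].key == messageType )
            return messages[i].value;
    }
    assert( messageType != MSG_HELLO && messageType != MSG_GOODBYE );
    return NULL;
}
```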
Now you have a decision to make. If you plan to use this function internally, you're going to have to include the header in other modules that require the function, which can lead to painful compile-time consequences. However, since we want to expose this particular function through the API, it will be visible throughout `LibTidy`, so we can use the public API internally, too.
## The API
Once it's implemented, we want a pretty, public-facing name for our internal `TY_(tidyLocalizedString)()` function, which, appropriately, is `tidyLocalizedString()`. Add the declaration to `tidy.h`:
~~~
TIDY_EXPORT ctmbstr TIDY_CALL tidyLocalizedString( uint messageType );
~~~
…and now the publicly exposed interface knows that your function exists. All that's left to do is include the `language.h` header in `tidylib.c`, and then implement the public function there:
~~~
ctmbstr TIDY_CALL tidyLocalizedString( uint messageType )
{
    return TY_(tidyLocalizedString)( messageType );
}
~~~
Congratulations, you can now expose new functionality to the API.
## API functions for opaque types
For a more complicated example that demonstrates how to use opaque types (and also the `TidyIterator` type), have a look at the implementation of `getWindowsLanguageList()` and its partners `getNextWindowsLanguage()`, `TidyLangWindowsName()`, and `TidyLangPosixName()`. These demonstrate how to:
- implement iteration for structures with multiple records.
- write a function in `tidylib.c` that converts between the exposed, opaque type and the internal, implementation type.
- further reinforce how functionality is added to the API.
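The iteration pattern can be sketched as follows. This is one possible approach, not a copy of `tidylib.c`. `DemoIterator` plays the role of `TidyIterator`, and the hypothetical record table plays the role of the Windows language list. The iterator stores a 1-based index in the pointer value, so that `NULL` can mean "no more records". That integer/pointer round trip is implementation-defined in C, but it is common practice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Public side: opaque tokens only. ---- */
typedef struct DemoIteratorOpaque* DemoIterator;
typedef struct DemoLocaleItemOpaque DemoLocaleItem;   /* never defined */

/* ---- Internal side: the real records (hypothetical data). ---- */
typedef struct {
    const char* windowsName;
    const char* posixName;
} LocaleRecord;

static const LocaleRecord localeMap[] = {
    { "english", "en" },
    { "french",  "fr" },
    { "german",  "de" },
};
static const size_t localeCount = sizeof localeMap / sizeof localeMap[0];

/* Begin iteration; returns NULL if there are no records at all. */
DemoIterator demoGetLocaleList(void)
{
    return localeCount > 0 ? (DemoIterator)(uintptr_t)1 : NULL;
}

/* Return the current record as an opaque pointer, then advance.
   *iter becomes NULL after the last record. */
const DemoLocaleItem* demoGetNextLocale(DemoIterator* iter)
{
    size_t position;

    assert(iter != NULL && *iter != NULL);
    position = (size_t)(uintptr_t)*iter;   /* 1-based */
    *iter = position < localeCount ? (DemoIterator)(uintptr_t)(position + 1) : NULL;
    return (const DemoLocaleItem*)&localeMap[position - 1];
}

/* Accessors convert the opaque pointer back to the internal record. */
const char* demoLangWindowsName(const DemoLocaleItem* item)
{
    return ((const LocaleRecord*)item)->windowsName;
}

const char* demoLangPosixName(const DemoLocaleItem* item)
{
    return ((const LocaleRecord*)item)->posixName;
}
```

A caller loops with `while (iter) { item = demoGetNextLocale(&iter); ... }` and reads each item only through the accessors. It never sees `LocaleRecord`.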

docs/ATTRIBUTES.md Normal file

@ -0,0 +1,26 @@
# Tidy Element Attributes
This is about adding a **new** HTML attribute to one or more HTML tags, i.e., a new attribute such as `attribute=value`.
Tidy's large number of attributes is supported via a number of files:
- `tidyenum.h` is where you first define a new attribute in order to give it an internal value.
- `attrs.c` is where you give a unique **string** name to the attribute, as well as a **function** to verify the **value**.
- `attrdict.c` further refines the definition of your attribute, specifying which version(s) of HTML support this attribute.
- `tags.c`, finally, determines which tags support the attribute, in the `tag_defs[]` table.
So, adding a new `attribute=value` to one or more existing tags consists of the following simple steps:
1. `tidyenum.h` - Give the attribute an internal name, like `TidyAttr_XXXX`, and thus a value. Please try to keep this enumeration in alphabetical order.
2. `attrs.c` - Assign the string value of the attribute, which of course must be unique. Then assign a `function` to verify the attribute value. A considerable number of functions are already defined to verify specific attribute values, but if this new attribute requires a new one, that function must be written and defined.
3. `attrdict.c` - If this attribute only relates to specific tags, then it should be added to their lists. Some general attributes are allowed on all, or most, tags, so add this new attribute and value accordingly.
4. `tags.c` - Now the new attribute will be verified for each tag it is associated with in the `tag_defs[]` table. For example, for `<button ...>`, the entry `{ TidyTag_BUTTON, ...` has `&TY_(W3CAttrsFor_BUTTON)[0]` assigned.
So, normally, changing just three files, `tidyenum.h`, `attrs.c`, and `attrdict.c`, is enough: `tags.c` will then accept the new `attribute=value` for any tag, or all tags, without further changes. Simple...
Now, one could argue that this is not the **best** way to verify every attribute and value for every tag, but that is a moot point: that is how Tidy does it!
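The division of labor among those files can be illustrated with a deliberately tiny model. All names below (`DemoAttrId`, `attrDefs`, `checkButtonType`, `buttonAttrs`, `demoCheckAttr`) are hypothetical, and the model is greatly simplified compared with Tidy's real `tidyenum.h`, `attrs.c`, `attrdict.c`, and `tags.c`. An enum gives each attribute an internal value. A table pairs that value with a unique string name and a value-checking function. A per-tag list says which attributes the tag accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Step 1 (cf. tidyenum.h): internal values; UNKNOWN doubles as a list terminator. */
typedef enum {
    DemoAttr_UNKNOWN,
    DemoAttr_DISABLED,
    DemoAttr_TYPE
} DemoAttrId;

/* A verifier returns nonzero if the value is acceptable. */
typedef int (*AttrCheck)(const char* value);

static int checkAnyValue(const char* value)
{
    (void)value;
    return 1;
}

static int checkButtonType(const char* value)
{
    return value != NULL && (strcmp(value, "button") == 0 ||
                             strcmp(value, "submit") == 0 ||
                             strcmp(value, "reset") == 0);
}

/* Step 2 (cf. attrs.c): unique string name plus a value verifier. */
typedef struct {
    DemoAttrId  id;
    const char* name;
    AttrCheck   check;
} AttrDef;

static const AttrDef attrDefs[] = {
    { DemoAttr_DISABLED, "disabled", checkAnyValue   },
    { DemoAttr_TYPE,     "type",     checkButtonType },
};

/* Steps 3-4 (cf. attrdict.c and tags.c): the attributes a tag accepts. */
static const DemoAttrId buttonAttrs[] = { DemoAttr_DISABLED, DemoAttr_TYPE, DemoAttr_UNKNOWN };

/* Returns 1 if `name`=`value` is valid for a tag with the given list, else 0. */
int demoCheckAttr(const DemoAttrId* allowed, const char* name, const char* value)
{
    size_t i;
    const AttrDef* def = NULL;

    assert(allowed != NULL && name != NULL);
    for (i = 0; i < sizeof attrDefs / sizeof attrDefs[0]; ++i)
        if (strcmp(attrDefs[i].name, name) == 0)
            def = &attrDefs[i];
    if (def == NULL)
        return 0;                      /* unknown attribute */

    for (; *allowed != DemoAttr_UNKNOWN; ++allowed)
        if (*allowed == def->id)
            return def->check(value);  /* allowed here: verify the value */
    return 0;                          /* not allowed on this tag */
}
```

In this model, supporting a new attribute means adding an enum value, a row in `attrDefs`, and an entry in the relevant tag lists. The lookup code itself does not change.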
; eof 20170205

docs/BRANCHES.md Normal file

@ -0,0 +1,28 @@
# HTML Tidy Branches
## About Branches
Starting with **HTML Tidy** 5.4.0, HTACG will adopt a new branch management strategy utilizing **master** as the _release branch_, and **next** as the active development branch.
As described thoroughly in our [VERSION.md](VERSION.md) document, this means that **master** will always consist of an even-numbered minor version, and activity will remain relatively quiet unless we backport a critical bug fix from **next**.
The **next** branch, then, will host the majority of our development activity, and any contributions and PRs should be made against this branch. This means that **next** will always carry an odd minor version number.
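The parity rule is mechanical enough to express in code. The sketch below is hypothetical helper code, not part of Tidy. It parses a `major.minor.patch` string such as `5.8.0` and reports whether the minor number marks a **master** release or a **next** development version.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 for an even minor version (release, master),
   0 for an odd minor version (development, next),
   or -1 if the string is not of the form major.minor.patch. */
int isReleaseVersion(const char* version)
{
    int major, minor, patch;

    assert(version != NULL);
    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) != 3)
        return -1;
    return minor % 2 == 0;
}
```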
## About Versioning
You can read the specifics about version numbers in our [VERSION.md](VERSION.md) document.
## FAQs
### Which version or branch should I choose?
As described above, the branch is very strongly correlated with the version. If you require a stable API and relatively stable output, and don't require the features and enhancements of an odd-numbered **next** version, then you should stick to **master** and its even-numbered versions.
On the other hand, if you are primarily a console application user, then the API probably isn't as important to you, and you probably want the latest and greatest. If this describes you, you should at least try out **next**.
If you are developing for Tidy, then you _definitely_ want to stick to **next**, even for bug fixes meant for **master**. If it's a critical enough bug fix, then one of our friendly team will backport the fix to **master**.

docs/BUILD.md Normal file

@ -0,0 +1,91 @@
# HTACG HTML Tidy
## Prerequisites
1. git - [https://git-scm.com/book/en/v2/Getting-Started-Installing-Git][1]
2. cmake - [https://cmake.org/download/][2]
3. appropriate build tools for the platform
4. the [xsltproc][3] tool is required to build and install the `tidy.1` man page on Unix-like platforms.
CMake comes in two forms - command line and GUI. Some installations only install one or the other, but sometimes both. The build commands below are only for command line use.
The actual build tools also vary by platform, but that is one of the great features of CMake: it can generate various 'native' build files. Running `cmake --help` should list the generators available on that platform. One of the most common is "Unix Makefiles", which requires a `make` tool to be installed, but many other generators are supported.
On Windows, CMake offers generators for various versions of MSVC. Again, only command-line use of MSVC is shown below, but the tidy solution (`*.sln`) file can be loaded into the MSVC IDE and the build done there.
## Build the tidy library and command line tool
### macOS/Linux/Unix
~~~
cd build/cmake
cmake ../.. -DCMAKE_BUILD_TYPE=Release [-DCMAKE_INSTALL_PREFIX=/path/for/install]
make
[sudo] make install
~~~
### macOS (multi-architecture)
~~~
cd build/cmake
cmake ../.. -DCMAKE_BUILD_TYPE=Release "-DCMAKE_OSX_ARCHITECTURES=x86_64;arm64"
make
[sudo] make install
~~~
### Windows
~~~
cd build/cmake
cmake ../.. -DCMAKE_BUILD_TYPE=Release [-DCMAKE_INSTALL_PREFIX=/path/for/install]
cmake --build . --config Release
cmake --build . --config Release --target INSTALL
~~~
### Build options
By default, CMake installs the binary to `/usr/local/bin` on Unix. If you want the binary in, say, `/usr/bin` instead, then in the second step use `-DCMAKE_INSTALL_PREFIX=/usr`.
Also, on Unix, if you want to build the release library without any debug `assert` checks in the code, add `-DCMAKE_BUILD_TYPE=Release` in the second step (the commands above already do). This adds the `-DNDEBUG` macro definition to the compiler switches, just as the Windows build normally does for the `Release` configuration.
On Windows, the default install location is `C:\Program Files\tidy` or `C:\Program Files (x86)\tidy`, which is not very useful. After the build, `tidy.exe` is in the `Release` directory and can be copied to any directory in your `PATH` environment variable for global use.
On macOS, you can build for both Intel and Apple Silicon by adding `"-DCMAKE_OSX_ARCHITECTURES=x86_64;arm64"` in the second step. The quotes keep the shell from treating `;` as a command separator.
If you do **not** need the tidy library built as a 'shared' (DLL) library, then in the second step add the option `-DBUILD_SHARED_LIB:BOOL=OFF`. This option is **ON** by default. The static library is always built and linked with the command line tool. This is a convenience on Windows, and on Unix it lets the binary run as part of the man page build without the shared library being installed.
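For readers unfamiliar with `NDEBUG`: when that macro is defined, the standard `<assert.h>` header turns every `assert()` into a no-op, so the checked expression is not even evaluated. This small, hypothetical example makes the difference observable. It deliberately puts a side effect inside `assert`, which real code should avoid.

```c
#include <assert.h>

static int checksEvaluated = 0;

/* Always passes, but records that it ran. */
static int countedCheck(void)
{
    checksEvaluated++;
    return 1;
}

/* Returns 1 in a debug build and 0 when compiled with -DNDEBUG,
   because assert() then discards its argument unevaluated. */
int assertionsActive(void)
{
    checksEvaluated = 0;
    assert(countedCheck());
    return checksEvaluated;
}
```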
See the `CMakeLists.txt` file for other CMake **options** offered.
## Build the tidy packages
1. `cd build/cmake`
2. `cmake ../.. -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr`
3. Unix/OS X: `make package`
## Build PHP with the tidy-html5 library
Before PHP 7.1, PHP's source still includes the old header name `buffio.h`, which tidy-html5 has renamed to `tidybuffio.h`. The include in the file `ext/tidy/tidy.c` in PHP's source therefore needs to be renamed.
That is, before configuring PHP, run this in the PHP source directory:
~~~
sed -i 's/buffio.h/tidybuffio.h/' ext/tidy/*.c
~~~
And then continue with (just an example here, use your own PHP config options):
~~~
./configure --with-tidy=/usr/local
make
make test
make install
~~~
[1]: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
[2]: https://cmake.org/download/
[3]: http://xmlsoft.org/XSLT/xsltproc2.html

docs/CHANGELOG.md Normal file

@ -0,0 +1,246 @@
# Changelog
## [5.8.0](https://github.com/htacg/tidy-html5/tree/5.8.0) (2021-07-10)
[Full Changelog](https://github.com/htacg/tidy-html5/compare/5.6.0...5.8.0)
**Fixed bugs:**
- Details open, Value added to Attribute [\#925](https://github.com/htacg/tidy-html5/issues/925)
- Fix handling of percent symbols in CheckLength validation routine [\#910](https://github.com/htacg/tidy-html5/issues/910)
- What is the true purpose and use case of the --bare option? [\#896](https://github.com/htacg/tidy-html5/issues/896)
- Warning about missing \</summary\> [\#895](https://github.com/htacg/tidy-html5/issues/895)
- DecodeMacRoman\(\) is missing an upper bounds check before indexing into Mac2Unicode array [\#891](https://github.com/htacg/tidy-html5/issues/891)
- Can't disable wrap [\#858](https://github.com/htacg/tidy-html5/issues/858)
- Recursion limit exceeded [\#850](https://github.com/htacg/tidy-html5/issues/850)
- template tag should be allowed in head [\#836](https://github.com/htacg/tidy-html5/issues/836)
- tag\_defs + AdjustTags\(\) and ResetTags\(\) during parsing is not thread-safe \(tags.c\) [\#816](https://github.com/htacg/tidy-html5/issues/816)
- Unexpected parsing with uppercase DOCTYPE [\#815](https://github.com/htacg/tidy-html5/issues/815)
- bugfix for messageobj.c for windows vc++ [\#800](https://github.com/htacg/tidy-html5/issues/800)
- Tidy 5.7.20 GetSurrogatePair can use uninitialised value processing malformed entity refs [\#798](https://github.com/htacg/tidy-html5/issues/798)
- regression tests fail if /etc/tidy.conf or ~/.tidyrc exists [\#778](https://github.com/htacg/tidy-html5/issues/778)
- AddByte allocAmt overflows for large input files [\#761](https://github.com/htacg/tidy-html5/issues/761)
- --strict-tags-attributes no doesn't ignore \<td align\> [\#729](https://github.com/htacg/tidy-html5/issues/729)
- "Too many title elements in \<title\>" should say "Too many title elements in \<head\>" [\#692](https://github.com/htacg/tidy-html5/issues/692)
- Tidy 5.6.0 on Mac says Not a file when file is not writeable [\#681](https://github.com/htacg/tidy-html5/issues/681)
- Tidy fails if html contains a section \<!\[endif\]—\> [\#487](https://github.com/htacg/tidy-html5/issues/487)
- "Malformed" Word 2000 sequence may cause Tidy to skip document content [\#462](https://github.com/htacg/tidy-html5/issues/462)
- Change open tag to Boolean [\#932](https://github.com/htacg/tidy-html5/pull/932) (@arrmo)
- Is \#729 - Show 'warnings' in all td cases [\#928](https://github.com/htacg/tidy-html5/pull/928) (@geoffmcl)
- Issue \#692 - too many titles [\#927](https://github.com/htacg/tidy-html5/pull/927) (@geoffmcl)
- Is. \#681 - read-only files, and dirs [\#926](https://github.com/htacg/tidy-html5/pull/926) (@geoffmcl)
- Free attributes before return NULL [\#899](https://github.com/htacg/tidy-html5/pull/899) (@ltx2018)
- Is. \#896 - make 'bear' docs match code [\#898](https://github.com/htacg/tidy-html5/pull/898) (@geoffmcl)
- Correction for issue-895 [\#897](https://github.com/htacg/tidy-html5/pull/897) (@arrmo)
- fix memleak in GetTokenFromStream [\#884](https://github.com/htacg/tidy-html5/pull/884) (@ltx2018)
- Protect against NULL in PruneSection. [\#853](https://github.com/htacg/tidy-html5/pull/853) (@esclim)
- Is \#815 - Use case-insensitive test 'html' [\#832](https://github.com/htacg/tidy-html5/pull/832) (@geoffmcl)
- Is. \#761 - just deal with the 'uint' wrap [\#830](https://github.com/htacg/tidy-html5/pull/830) (@geoffmcl)
- Tidy 5.7.20 crashes if allocator replaced [\#797](https://github.com/htacg/tidy-html5/issues/797)
- --mute should suppress non-zero exit code [\#794](https://github.com/htacg/tidy-html5/issues/794)
- Seems tidy.c has sprung a leak [\#791](https://github.com/htacg/tidy-html5/issues/791)
- Cannot handle read-only html files \(possibly regression?\) [\#789](https://github.com/htacg/tidy-html5/issues/789)
- setlocale\( LC\_ALL, ""\) changes the locale for the entire application [\#770](https://github.com/htacg/tidy-html5/issues/770)
- mute in ~/.tidyrc runs fine but triggers exit\(1\) [\#752](https://github.com/htacg/tidy-html5/issues/752)
- Duplicate IDs are not detected if the ID has an uppercase letter [\#726](https://github.com/htacg/tidy-html5/issues/726)
- Tidy gets confused with a \<span\> around a block element [\#709](https://github.com/htacg/tidy-html5/issues/709)
- Tidy seems to get confused by HTML strings in JavaScript blocks. [\#700](https://github.com/htacg/tidy-html5/issues/700)
- tidy indent+wrap breaks \<pre\> formatting [\#697](https://github.com/htacg/tidy-html5/issues/697)
- -export-config creates invalid configuration file [\#679](https://github.com/htacg/tidy-html5/issues/679)
- Segmentation Fault [\#656](https://github.com/htacg/tidy-html5/issues/656)
- Maybe a problem with some vsnprintf implementations? [\#655](https://github.com/htacg/tidy-html5/issues/655)
- Why is libtidy complaining \<data\> isn't approved by W3C? [\#649](https://github.com/htacg/tidy-html5/issues/649)
- Is. \#791 - free some allocations [\#809](https://github.com/htacg/tidy-html5/pull/809) (@geoffmcl)
- Issue 726 upper case anchors [\#731](https://github.com/htacg/tidy-html5/pull/731) (@geoffmcl)
- Is \#673 - Revert 350f7b4 and 86e62db AdjustConfig logic [\#705](https://github.com/htacg/tidy-html5/pull/705) (@geoffmcl)
- Issue \#655 - Fix unsafe use of output buffer as input param [\#662](https://github.com/htacg/tidy-html5/pull/662) (@geoffmcl)
- Issue \#656 - protect against NULL node set in loop [\#661](https://github.com/htacg/tidy-html5/pull/661) (@geoffmcl)
**Closed issues:**
- No NPM? [\#960](https://github.com/htacg/tidy-html5/issues/960)
- Where can I find the list of known tags? [\#958](https://github.com/htacg/tidy-html5/issues/958)
- fix non-standard static library name [\#952](https://github.com/htacg/tidy-html5/issues/952)
- Lot of config options; but where are the defaults specified [\#948](https://github.com/htacg/tidy-html5/issues/948)
- Any Windows Binaries for 5.7.28, just like 5.6? [\#947](https://github.com/htacg/tidy-html5/issues/947)
- Setup continuous integration and testing [\#944](https://github.com/htacg/tidy-html5/issues/944)
- Linux binaries for latest releases [\#939](https://github.com/htacg/tidy-html5/issues/939)
- Outdated warnings [\#938](https://github.com/htacg/tidy-html5/issues/938)
- Umlauts/special characters not converted to correct html entities [\#936](https://github.com/htacg/tidy-html5/issues/936)
- tidy hanging [\#935](https://github.com/htacg/tidy-html5/issues/935)
- Tidy catches repeated attributes, but misses identical ids [\#924](https://github.com/htacg/tidy-html5/issues/924)
- drop-empty-elements is not removing empty Table elements [\#923](https://github.com/htacg/tidy-html5/issues/923)
- Tag "main" is shown as error [\#922](https://github.com/htacg/tidy-html5/issues/922)
- Unexpected parsing a tag in table [\#919](https://github.com/htacg/tidy-html5/issues/919)
- beginner on windows -- tidy reports: document: "a0.htm" is not a file! -- But it is [\#918](https://github.com/htacg/tidy-html5/issues/918)
- tidy says this misplaced \</dl\> is OK [\#917](https://github.com/htacg/tidy-html5/issues/917)
- Tidy can't deal with \<中文\> XML tags [\#913](https://github.com/htacg/tidy-html5/issues/913)
- Support extended color names in HTML 5 [\#908](https://github.com/htacg/tidy-html5/issues/908)
- Unknown type uint trying to use the shared lib. [\#906](https://github.com/htacg/tidy-html5/issues/906)
- \</select\> ending tag missing [\#904](https://github.com/htacg/tidy-html5/issues/904)
- SVG attributes flagged as proprietary [\#903](https://github.com/htacg/tidy-html5/issues/903)
- tidy-mark option is not working [\#901](https://github.com/htacg/tidy-html5/issues/901)
- Need help controlling output [\#894](https://github.com/htacg/tidy-html5/issues/894)
- Say how to deal with XHTML input [\#893](https://github.com/htacg/tidy-html5/issues/893)
- Help output refers to a non-existent -options option [\#892](https://github.com/htacg/tidy-html5/issues/892)
- Tidy gets confused with u tags and underline styles [\#890](https://github.com/htacg/tidy-html5/issues/890)
- how to forbid auto insert tag? [\#889](https://github.com/htacg/tidy-html5/issues/889)
- Incorrectly changing — to - \(emdash \[alt 0151\]\) to hypens. [\#885](https://github.com/htacg/tidy-html5/issues/885)
- html-tidy site does not work with https [\#883](https://github.com/htacg/tidy-html5/issues/883)
- Use with TextPad 8 [\#882](https://github.com/htacg/tidy-html5/issues/882)
- Translation: TidyKeepTabs [\#880](https://github.com/htacg/tidy-html5/issues/880)
- \<img\> proprietary attribute "loading" [\#879](https://github.com/htacg/tidy-html5/issues/879)
- Versioning seems a bit off [\#877](https://github.com/htacg/tidy-html5/issues/877)
- --quote-ampersand yes doesn't work [\#876](https://github.com/htacg/tidy-html5/issues/876)
- Convert spaces to non-breaking space [\#875](https://github.com/htacg/tidy-html5/issues/875)
- Tidy 5.6.0 mangled html / php code. [\#872](https://github.com/htacg/tidy-html5/issues/872)
- Even with -utf8 tidy replaces UTF8 code U+00A0 into numeric entity &\#160; [\#871](https://github.com/htacg/tidy-html5/issues/871)
- http-equiv metas should trigger helpful upgrade messages [\#868](https://github.com/htacg/tidy-html5/issues/868)
- HTML Tidy website does not render propertly when using HTTPS [\#867](https://github.com/htacg/tidy-html5/issues/867)
- \[-Wignored-qualifiers\] warning in tidy [\#866](https://github.com/htacg/tidy-html5/issues/866)
- Wrong character encoding [\#863](https://github.com/htacg/tidy-html5/issues/863)
- Missing semicolon after html entity sometimes returns generic 'unknown entity' warning instead of specific 'missing semicolon' [\#862](https://github.com/htacg/tidy-html5/issues/862)
- Warning: unescaped & or unknown entity "&P" when encoding as utf-8 [\#861](https://github.com/htacg/tidy-html5/issues/861)
- Tidy output clutter [\#857](https://github.com/htacg/tidy-html5/issues/857)
- Trailing backspace removed [\#856](https://github.com/htacg/tidy-html5/issues/856)
- Only wrap at tags [\#854](https://github.com/htacg/tidy-html5/issues/854)
- ENABLE\_DEBUG\_LOG is ignored on Windows [\#852](https://github.com/htacg/tidy-html5/issues/852)
- Kill off alphabetical ordering clause for publicly-exposed enum defs [\#851](https://github.com/htacg/tidy-html5/issues/851)
- For Sublime Text 3 [\#849](https://github.com/htacg/tidy-html5/issues/849)
- \<li\> tags skipped in tidy result shown on screen [\#847](https://github.com/htacg/tidy-html5/issues/847)
- man page missing header causing appending to XML discussion [\#846](https://github.com/htacg/tidy-html5/issues/846)
- \<input type="file"\> needs name= [\#845](https://github.com/htacg/tidy-html5/issues/845)
- Expose node-\>last in the public API [\#844](https://github.com/htacg/tidy-html5/issues/844)
- Support EJS? [\#842](https://github.com/htacg/tidy-html5/issues/842)
- Tidy 5.2 cleaned up curly quotes but 5.6 doesn't [\#841](https://github.com/htacg/tidy-html5/issues/841)
- Jekyll headings removed [\#840](https://github.com/htacg/tidy-html5/issues/840)
- Should tidy allow an empty title element? [\#839](https://github.com/htacg/tidy-html5/issues/839)
- Missing tags for 5.7.\* [\#834](https://github.com/htacg/tidy-html5/issues/834)
- Python binding? [\#826](https://github.com/htacg/tidy-html5/issues/826)
- Self-closing tags are not correctly recognized [\#813](https://github.com/htacg/tidy-html5/issues/813)
- Different output when parsing HTML [\#790](https://github.com/htacg/tidy-html5/issues/790)
- Continuously fuzzing tidy-html5 with OSS-Fuzz [\#788](https://github.com/htacg/tidy-html5/issues/788)
- I18N isn't working \(mostly\) via changing the environment variables [\#783](https://github.com/htacg/tidy-html5/issues/783)
- 5.6.0 and breakage with php-tidy [\#780](https://github.com/htacg/tidy-html5/issues/780)
- Tidy needs a changelog [\#776](https://github.com/htacg/tidy-html5/issues/776)
- TidyNodeGetText returns text with a new line appended [\#775](https://github.com/htacg/tidy-html5/issues/775)
- Breaks microseconds after call tidy\_repair\_string [\#771](https://github.com/htacg/tidy-html5/issues/771)
- Typos in language\_en.h, etc [\#765](https://github.com/htacg/tidy-html5/issues/765)
- Document accessibility priority numbers better [\#756](https://github.com/htacg/tidy-html5/issues/756)
- Xcode not working with tidylib [\#751](https://github.com/htacg/tidy-html5/issues/751)
- Intent-To-Package: Snaps are Universal Linux Packages [\#748](https://github.com/htacg/tidy-html5/issues/748)
- Can't parse UTF16 html string [\#744](https://github.com/htacg/tidy-html5/issues/744)
- libtidy.so.5 has removed symbols between 5.2.0 and 5.6.0, but kept SONAME [\#743](https://github.com/htacg/tidy-html5/issues/743)
- Tidy 5.7.16 -\> empty result [\#740](https://github.com/htacg/tidy-html5/issues/740)
- Crash with malformed \<meta\> tag [\#739](https://github.com/htacg/tidy-html5/issues/739)
- bug\(encoding\): non-ASCII characters in configuration file [\#737](https://github.com/htacg/tidy-html5/issues/737)
- Improve documentation re: wrap-script-literals [\#736](https://github.com/htacg/tidy-html5/issues/736)
- feature\_request\(validation\): “preserve-entities yes” by default [\#732](https://github.com/htacg/tidy-html5/issues/732)
- Tidy emits warnings that aren't in order [\#696](https://github.com/htacg/tidy-html5/issues/696)
- Option to disable tidy code fixing option [\#693](https://github.com/htacg/tidy-html5/issues/693)
- tidy change html view when deal with white-space:pre tag [\#685](https://github.com/htacg/tidy-html5/issues/685)
- CLI option to stop insertion/deletion of tags [\#682](https://github.com/htacg/tidy-html5/issues/682)
- Tidy does not strip leading and trailing spaces in HTML href [\#678](https://github.com/htacg/tidy-html5/issues/678)
- Use tidy with json custom attributes on custom components [\#677](https://github.com/htacg/tidy-html5/issues/677)
- \[Question\] How to use tidy for multiple files? [\#668](https://github.com/htacg/tidy-html5/issues/668)
- How to run a test-kit from terminal? [\#667](https://github.com/htacg/tidy-html5/issues/667)
- Dependency on DLLs not Documented [\#666](https://github.com/htacg/tidy-html5/issues/666)
- tidylib.c fails to compile on Visual Studio 2010 [\#665](https://github.com/htacg/tidy-html5/issues/665)
- Minify HTML [\#628](https://github.com/htacg/tidy-html5/issues/628)
- Do not insert newlines into TEXT when wrapping! [\#625](https://github.com/htacg/tidy-html5/issues/625)
- Configuration Options "cleanup" [\#609](https://github.com/htacg/tidy-html5/issues/609)
- Next Release 5.6.0 [\#600](https://github.com/htacg/tidy-html5/issues/600)
- anchor-as-name: false replaces name attribute of a form tag with id attribute [\#571](https://github.com/htacg/tidy-html5/issues/571)
- Why does tidy format the '\<' and '\>' numeric operator? [\#485](https://github.com/htacg/tidy-html5/issues/485)
- span with display: inline-block is treated as inline [\#448](https://github.com/htacg/tidy-html5/issues/448)
- wrap-php multiple lines [\#437](https://github.com/htacg/tidy-html5/issues/437)
- Option to always encode double ampersands [\#827](https://github.com/htacg/tidy-html5/issues/827)
- \[ENH\] Add meta options to disable/enable cleanup and repair option [\#819](https://github.com/htacg/tidy-html5/issues/819)
- --vertical-space yes adds too much after comment [\#811](https://github.com/htacg/tidy-html5/issues/811)
- Line breaking on "|" [\#810](https://github.com/htacg/tidy-html5/issues/810)
- Installs library in /usr/local/lib/lib instead of /usr/local/lib [\#807](https://github.com/htacg/tidy-html5/issues/807)
- Publishing in VS2015 - System.DllNotFoundException [\#804](https://github.com/htacg/tidy-html5/issues/804)
- can not fix script async Attr to async="async" [\#799](https://github.com/htacg/tidy-html5/issues/799)
- Feature Request: Omit boilerplate [\#795](https://github.com/htacg/tidy-html5/issues/795)
- html conversion to xml leaves many tags unclosed [\#792](https://github.com/htacg/tidy-html5/issues/792)
- NppTidy 5.6.0 quickref.html link broken - please fix [\#787](https://github.com/htacg/tidy-html5/issues/787)
- Redundant blank lines when printing -help [\#781](https://github.com/htacg/tidy-html5/issues/781)
- --css-prefix option no longer adds a hyphen to its built classes [\#777](https://github.com/htacg/tidy-html5/issues/777)
- Build error on Android \(Termux\): unknown type name 'ulong' [\#773](https://github.com/htacg/tidy-html5/issues/773)
- alter default config file processing [\#772](https://github.com/htacg/tidy-html5/issues/772)
- Tidy output going to stderr [\#763](https://github.com/htacg/tidy-html5/issues/763)
- --tidy-mark no inserts blank line [\#760](https://github.com/htacg/tidy-html5/issues/760)
- tidy -access: \<doctype\> NOT missing [\#758](https://github.com/htacg/tidy-html5/issues/758)
- type qualifiers ignored on function return type \[-Werror=ignored-qualifiers\] [\#746](https://github.com/htacg/tidy-html5/issues/746)
- tidy dies on unexpected character [\#745](https://github.com/htacg/tidy-html5/issues/745)
- tidy 5.6.0 warning `inserting missing 'title' element` appears in php-only files [\#728](https://github.com/htacg/tidy-html5/issues/728)
- bug\(build\): tidyBufAppend\(&buf1, d-\>def, strlen\(d-\>def\)\); [\#721](https://github.com/htacg/tidy-html5/issues/721)
- Allow specify ranges of code that do not get checked [\#720](https://github.com/htacg/tidy-html5/issues/720)
- README/CONTRIBUTING.md [\#718](https://github.com/htacg/tidy-html5/issues/718)
- tidy's error messages should include filename somewhere [\#713](https://github.com/htacg/tidy-html5/issues/713)
- Tidy does not complain about valign in \<tr\>, \<th\> or \<td\> [\#711](https://github.com/htacg/tidy-html5/issues/711)
- tidy converts '&' in query parameters \(&aen=true =\> &amp;aen=true\) in relative paths [\#710](https://github.com/htacg/tidy-html5/issues/710)
- TidyHtml not working properly in C++ [\#707](https://github.com/htacg/tidy-html5/issues/707)
- Unescaped `&` emitted despite using \*\*output-xhtml\*\* key bindings in 5.6.0 in PHP bindings [\#704](https://github.com/htacg/tidy-html5/issues/704)
- How to ignore specific warnings [\#699](https://github.com/htacg/tidy-html5/issues/699)
- Mention the need for a `:` before options' value in the configuration file [\#698](https://github.com/htacg/tidy-html5/issues/698)
- Tidy 5.6.0 -\> bug with pre tag [\#690](https://github.com/htacg/tidy-html5/issues/690)
- Is there any way to remove inline styles? [\#689](https://github.com/htacg/tidy-html5/issues/689)
- feature request\(safari\): Pinned Tab Icons support [\#686](https://github.com/htacg/tidy-html5/issues/686)
- Adopt Cygwin tidy package [\#680](https://github.com/htacg/tidy-html5/issues/680)
- Clarification on releases / release tarballs missing [\#676](https://github.com/htacg/tidy-html5/issues/676)
- --fix-uri no does not turn off check [\#675](https://github.com/htacg/tidy-html5/issues/675)
- Unexpected behavior of 'add-xml-space' setting when used with 'wrap' =\> 0 and saveBuffer is called twice in tidy-html5 5.6.0 [\#673](https://github.com/htacg/tidy-html5/issues/673)
- show-body-only [\#672](https://github.com/htacg/tidy-html5/issues/672)
- Tidy deletes empty tags [\#669](https://github.com/htacg/tidy-html5/issues/669)
- unbalanced \#endif's [\#663](https://github.com/htacg/tidy-html5/issues/663)
- Feature request: option to replace inline styles with classes + `<style>` tag styles [\#638](https://github.com/htacg/tidy-html5/issues/638)
- Windows 32-bit XP Release [\#568](https://github.com/htacg/tidy-html5/issues/568)
- Release an updated HTML::Tidy perl library [\#562](https://github.com/htacg/tidy-html5/issues/562)
- \<Script\> tag gets removed [\#528](https://github.com/htacg/tidy-html5/issues/528)
- Word filtered html doesn't convert accents to utf8 [\#512](https://github.com/htacg/tidy-html5/issues/512)
- option to ignore attribute-errors if attribute contains pseudo-elements [\#505](https://github.com/htacg/tidy-html5/issues/505)
- Allow \<div\> inside \<pre\> [\#479](https://github.com/htacg/tidy-html5/issues/479)
**Merged pull requests:**
- Fixes \#743. [\#966](https://github.com/htacg/tidy-html5/pull/966) (@balthisar)
- Fixed merge conflict; fixed non-build issue on macOS. RC for testing. [\#965](https://github.com/htacg/tidy-html5/pull/965) (@balthisar)
- README.md: add Wikidata link [\#961](https://github.com/htacg/tidy-html5/pull/961) (@vitaly-zdanevich)
- Fix issues with user-specified settings changing [\#959](https://github.com/htacg/tidy-html5/pull/959) (@balthisar)
- Automated Testing [\#957](https://github.com/htacg/tidy-html5/pull/957) (@balthisar)
- simple fix for the range of the condition. [\#953](https://github.com/htacg/tidy-html5/pull/953) (@ihsinme)
- Add muted and playsinline video attributes for HTML5. [\#949](https://github.com/htacg/tidy-html5/pull/949) (@drichardson)
- Add German Language [\#943](https://github.com/htacg/tidy-html5/pull/943) (@balthisar)
- Link macOS console application with required plist [\#942](https://github.com/htacg/tidy-html5/pull/942) (@balthisar)
- Is. \#839 - new message for 'blank' title [\#930](https://github.com/htacg/tidy-html5/pull/930) (@geoffmcl)
- Support extended color names in HTML 5 validation [\#914](https://github.com/htacg/tidy-html5/pull/914) (@cqcallaw)
- Fix percentage validation in CheckLength [\#912](https://github.com/htacg/tidy-html5/pull/912) (@cqcallaw)
- Add SVG paint attributes [\#907](https://github.com/htacg/tidy-html5/pull/907) (@cqcallaw)
- Is. \#879: add loading attribute for img, iframe [\#902](https://github.com/htacg/tidy-html5/pull/902) (@sidvishnoi)
- COMPILE\_FLAGS property only once per target, avoid overwriting. [\#886](https://github.com/htacg/tidy-html5/pull/886) (@SvenPStarFinanz)
- Complete pt\_br translation [\#881](https://github.com/htacg/tidy-html5/pull/881) (@hugotiburtino)
- Support the \<slot\> tag [\#848](https://github.com/htacg/tidy-html5/pull/848) (@lhchavez)
- Issue \#437 - re-use of 'wrap-php' option [\#645](https://github.com/htacg/tidy-html5/pull/645) (@geoffmcl)
- Change "tidyLocalMapItem" to "tidyLocaleMapItem" [\#829](https://github.com/htacg/tidy-html5/pull/829) (@MrSorcus)
- added OS \_\_ANDROID\_\_ in tidyplatform.h [\#823](https://github.com/htacg/tidy-html5/pull/823) (@naveedpash)
- Update BRANCHES.md [\#793](https://github.com/htacg/tidy-html5/pull/793) (@SConaway)
- Is. \#783 - Fix language detection [\#785](https://github.com/htacg/tidy-html5/pull/785) (@Lin-Buo-Ren)
- Is. \#781 - Drop redundant blank lines in -help [\#782](https://github.com/htacg/tidy-html5/pull/782) (@Lin-Buo-Ren)
- Issue 649 adding tag \<data\> [\#769](https://github.com/htacg/tidy-html5/pull/769) (@AntoniosHadji)
- Issue 752 [\#764](https://github.com/htacg/tidy-html5/pull/764) (@geoffmcl)
- PHP ≥ 7.1.0 recognizes tidy-html5 [\#762](https://github.com/htacg/tidy-html5/pull/762) (@cmb69)
- Fix typo [\#753](https://github.com/htacg/tidy-html5/pull/753) (@Lin-Buo-Ren)
- Fix extra const modifier [\#747](https://github.com/htacg/tidy-html5/pull/747) (@drizt)
- Is \#721 - cast away some gcc warnings [\#722](https://github.com/htacg/tidy-html5/pull/722) (@geoffmcl)
- Doc nits [\#717](https://github.com/htacg/tidy-html5/pull/717) (@ler762)
- Is \#709 - Improve message if 'implict' [\#714](https://github.com/htacg/tidy-html5/pull/714) (@geoffmcl)
- Make global attribute `dir` accept auto as well. [\#712](https://github.com/htacg/tidy-html5/pull/712) (@doronbehar)
- Is \#697 - Add NOWRAP to print of pre tag [\#708](https://github.com/htacg/tidy-html5/pull/708) (@geoffmcl)
- Is \#700 - change script parsing if in html5 mode [\#703](https://github.com/htacg/tidy-html5/pull/703) (@geoffmcl)
- Issue 698 - docs update [\#702](https://github.com/htacg/tidy-html5/pull/702) (@geoffmcl)
- Is \#686 - Add attr COLOR to W3CAttrsFor\_LINK [\#701](https://github.com/htacg/tidy-html5/pull/701) (@geoffmcl)
- Issue 679 [\#695](https://github.com/htacg/tidy-html5/pull/695) (@geoffmcl)
- Issue 663 - fixes for Haiku port [\#664](https://github.com/htacg/tidy-html5/pull/664) (@geoffmcl)
\* *This Changelog was automatically generated by [github_changelog_generator](https://github.com/github-changelog-generator/github-changelog-generator)*

27
docs/CODESTYLE.md Normal file

@ -0,0 +1,27 @@
# HTML Tidy Code Style
The source code of **libTidy** and console app **tidy** mostly follow the preferences of the original maintainers. Perhaps some of these decisions were arbitrary and based on their sense of aesthetics at the time, but it is good to have all the code looking the same even if it is not exactly what everyone would prefer.
Developers adding code to HTML Tidy are urged to try to follow the existing code style. Code that does not follow these conventions may be accepted, but may be modified as time goes by to best fit the “Tidy Style.”
There has been a suggestion of using available utilities to make the style consistent, like [Uncrustify](https://github.com/uncrustify/uncrustify) - see [issue #245](https://github.com/htacg/tidy-html5/issues/245), and maybe others.
Others have suggested the [AStyle](http://astyle.sourceforge.net/) formatting program with say `-taOHUKk3 -M8` arguments, to conform, but there are a few bugs in AStyle.
But again, these and other tools may not produce code that everybody agrees with, and are presently not formally used in Tidy!
#### Known Conventions
From reading the Tidy source, some conventions are self-evident (in no particular order):
- Use of 4-space indenting, and no tabs.
- No C++ single line comments using `//`.
- The opening `{` goes on its own line, aligned with the statement that introduces it.
- While the maximum code line length varies, generally long `if`, `while`, ... statements are wrapped to newlines.
- Pointer operators in declarations must precede any macro documentation, e.g., `const tidyLocaleMapItem* TIDY_CALL getNextWindowsLanguage( TidyIterator* iter )` instead of `const tidyLocaleMapItem TIDY_CALL *getNextWindowsLanguage( TidyIterator* iter )`, in case `TIDY_CALL` is defined to something non-empty.
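As a rough illustration, the conventions above might look like this in one small function. This is only a sketch: `exampleItem` and `findExampleItem` are hypothetical names, and `TIDY_CALL` is stubbed to expand to nothing, which matches how the API notes describe it for most builds but should be checked against the real platform header.

```c
#include <assert.h>
#include <stddef.h>

/* Stub for Tidy's documentation macro; assumed to expand to nothing here. */
#ifndef TIDY_CALL
#define TIDY_CALL
#endif

/* Hypothetical item type, used only to illustrate the layout. */
typedef struct _exampleItem
{
    const char* name;
    int value;
} exampleItem;

/* The '*' stays with the return type, before TIDY_CALL. */
const exampleItem* TIDY_CALL findExampleItem( const exampleItem* items,
                                              size_t count, int minValue )
{
    size_t i;

    assert( items != NULL || count == 0 );
    for ( i = 0; i < count; ++i )
    {
        /* Long conditions are wrapped onto continuation lines. */
        if ( items[i].value >= minValue
             && items[i].name != NULL )
        {
            return &items[i];
        }
    }
    return NULL;
}
```

The example uses 4-space indentation, `/* */` comments only, and braces on their own lines.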
Look forward to this document being filled out in detail...
Date: 20150904

99
docs/CONTRIBUTING.md Normal file

@ -0,0 +1,99 @@
# Contributing to HTML Tidy
So you want to contribute to Tidy? Fantastic! Here's a brief overview on how best to do so.
### Support request
If you are having trouble running console `Tidy`, or using the `LibTidy` API in your own project, then the best places to get help are either a comment in [Tidy Issues](https://github.com/htacg/tidy-html5/issues) or the [Tidy Mail Archive](https://lists.w3.org/Archives/Public/html-tidy/) list.
And please do a **search** using different **key** words - see [searching](https://help.github.com/articles/searching-issues-and-pull-requests/) - to make sure it is **not** a duplicate. If something similar has been discussed before, but you still feel this is **different**, then add that related reference in your post...
In either place, please start with a short subject that describes the issue. If it involves running Tidy on an HTML file, or if it's an API question, make sure to include:
- the version: `$ tidy -v`
- what was the configuration used
- a small sample input
- the output
- the _expected_ output
- some sample code (if an API question).
This information will make replication of your issue much simpler for us.
If you do add sample HTML input, then it can also be very helpful if that sample **passes** the W3C [validator](https://validator.w3.org/#validate_by_upload). Tidy attempts to follow all current W3C standards.
If you are able to build tidy from [source](https://github.com/htacg/tidy-html5) (requires [CMake](https://cmake.org/download/)), and you can find the problem in the source code, then read on about how you can create a Pull Request (“PR”) to share your code and ideas.
### What to change
Here are some examples of things you might want to make a PR for:
- New features
- Bug fixes
- Inefficient blocks of code
- Memory problems
- Language translations
If you have a more deeply-rooted problem with how the program is built or some of the stylistic decisions made in the code, it is best to [create an issue](https://github.com/htacg/tidy-html5/issues/new) before putting the effort into a pull request. The same goes for new features - it might be best to check the project's direction, existing pull requests, and currently open and closed issues first.
Concerning the “Tidy Code Style,” check out [CODESTYLE.md](CODESTYLE.md), but looking at existing code is the best way to get a good feel for the patterns we use.
### Using Git appropriately
1. Fork tidy to your own GitHub account. Use the `Fork` icon at the top right.
2. Optional: Generate an SSH key, and add it to your `https://github.com/<name>` settings under SSH and GPG keys.
3. Clone your own fork - `$ git clone git@github.com:<name>/tidy-html5.git [tidy-fork]`, or use `https`.
4. Create a branch - `$ cd tidy-fork; git checkout -b <branch-name>`
5. Edit, and commit your changes to this `branch` of your fork.
6. Test your changes, and if appropriate run [regression](https://github.com/htacg/tidy-html5-tests/blob/next/README/RUNTESTS.md) tests.
7. Publish the branch - `$ git push -u origin <branch-name>` - to your remote fork.
8. Create a [Pull Request](https://help.github.com/articles/about-pull-requests/), a **PR**, here.
9. Watch for comments, acceptance.
Item 2., the SSH key, is optional and only required if you want to use `git clone git@github.com...`. If you generate the SSH key without a `passphrase`, commands like `git push` can be run without entering a password. This is just a convenience; alternatively you can use the `HTTPS` protocol...
Concerning 5., editing and committing your changes, **generally** it is better to `commit` changes often, adding an appropriate commit message to each, like `$ git commit -m "Is. #NNN - reason for change" <file[s]>`. This also aids in the **PR** review.
But the situation varies. Adding an option, for example, can mean editing several files, in which case it is likely appropriate to combine a considerable number of edits into one commit. There are no hard and fast rules on this.
Please note: if you want to change **multiple** things that don't depend on each other, use **different** `branches`, and make sure you check the `next` branch back out before making more changes in a **new** branch. That way we can take in each **change** separately; otherwise GitHub will **combine** all your branch commits into one **PR**.
See below on keeping your fork's `next` fully in sync with this repository, called `upstream` - **this is important**.
```
$ git remote add upstream git@github.com:htacg/tidy-html5.git # once only
$ git checkout next
$ git status
$ git stash # if not clean
$ git fetch upstream
$ git rebase upstream/next
$ git stash pop # if required, and fix conflicts
$ git push # update the fork next
```
This has to be repeated for other branches, too. `$ git checkout <your-branch>`, `$ git rebase next`, fix conflict, if any, and `$ git push`, for **each** branch. It is **not** fun to keep multiple `branches` fully up-to-date with an active `upstream`...
Of course, the **regression** tests, 6., are only really needed if you have made `code` changes, but running them is a good habit to get into. The `tests` are in a **separate** repo, so you must also clone that, or **fork** and clone it, to be able to present a **PR**. This is best done in the same `root` folder where you cloned `tidy-html5` and your `tidy-fork`. See [RUNTESTS.md](https://github.com/htacg/tidy-html5-tests/blob/next/README/RUNTESTS.md).
In brief, for unix, to use your potentially **new** `tidy-fork` tidy executable -
```
$ git clone git@github.com:htacg/tidy-html5-tests.git
$ cd tidy-html5-tests/tools-sh
$ ./testall.sh ../../tidy-fork/build/cmake/tidy
$ diff -u ../cases/testbase-expects ../cases/testbase-results
```
On Windows, use the `tools-cmd` folder and run `alltest.bat --help`.
If the tests show a different exit value, or there are differences between the `expects` and `results`, these **must** be studied and checked very carefully. There may be cases where the **new** `results` are correct, in which case a simultaneous **PR** for the forked `tests` **must** be created to match your forked source **PR**.
Do **NOT** change either the root `version.txt` here or the `cases/_version.txt` in `tests`. This will be handled by the person who merges the **PR**. To differentiate your modified `tidy`, there is a cmake option like `-DTIDY_RC_NUMBER=I123`, which will appear in `tidy -v` as `5.7.16.I123`. The number can be anything, but using the relevant issue number is a good choice.
Add an `issue` if you need further **help**... thanks...
### Help Tidy Get Better
It goes without saying **all help is appreciated**. We need to work together to make Tidy better!

50
docs/LICENSE.md Normal file

@ -0,0 +1,50 @@
# HTML Tidy
## HTML parser and pretty printer
Copyright (c) 1998-2016 World Wide Web Consortium
(Massachusetts Institute of Technology, European Research
Consortium for Informatics and Mathematics, Keio University).
All Rights Reserved.
Additional contributions (c) 2001-2016 University of Toronto, Terry Teague,
@geoffmcl, HTACG, and others.
### Contributing Author(s):
Dave Raggett <dsr@w3.org>
The contributing author(s) would like to thank all those who
helped with testing, bug fixes and suggestions for improvements.
This wouldn't have been possible without your help.
## COPYRIGHT NOTICE:
This software and documentation is provided "as is," and
the copyright holders and contributing author(s) make no
representations or warranties, express or implied, including
but not limited to, warranties of merchantability or fitness
for any particular purpose or that the use of the software or
documentation will not infringe any third party patents,
copyrights, trademarks or other rights.
The copyright holders and contributing author(s) will not be held
liable for any direct, indirect, special or consequential damages
arising out of any use of the software or documentation, even if
advised of the possibility of such damage.
Permission is hereby granted to use, copy, modify, and distribute
this source code, or portions hereof, documentation and executables,
for any purpose, without fee, subject to the following restrictions:
1. The origin of this source code must not be misrepresented.
2. Altered versions must be plainly marked as such and must
not be misrepresented as being the original source.
3. This Copyright notice may not be removed or altered from any
source or altered source distribution.
The copyright holders and contributing author(s) specifically
permit, without fee, and encourage the use of this source code
as a component for supporting the Hypertext Markup Language in
commercial products. If you use this source code in a product,
acknowledgement is not required but would be appreciated.

50
docs/LICENSE.txt Normal file

@ -0,0 +1,50 @@
# HTML Tidy
## HTML parser and pretty printer
Copyright (c) 1998-2016 World Wide Web Consortium
(Massachusetts Institute of Technology, European Research
Consortium for Informatics and Mathematics, Keio University).
All Rights Reserved.
Additional contributions (c) 2001-2016 University of Toronto, Terry Teague,
@geoffmcl, HTACG, and others.
### Contributing Author(s):
Dave Raggett <dsr@w3.org>
The contributing author(s) would like to thank all those who
helped with testing, bug fixes and suggestions for improvements.
This wouldn't have been possible without your help.
## COPYRIGHT NOTICE:
This software and documentation is provided "as is," and
the copyright holders and contributing author(s) make no
representations or warranties, express or implied, including
but not limited to, warranties of merchantability or fitness
for any particular purpose or that the use of the software or
documentation will not infringe any third party patents,
copyrights, trademarks or other rights.
The copyright holders and contributing author(s) will not be held
liable for any direct, indirect, special or consequential damages
arising out of any use of the software or documentation, even if
advised of the possibility of such damage.
Permission is hereby granted to use, copy, modify, and distribute
this source code, or portions hereof, documentation and executables,
for any purpose, without fee, subject to the following restrictions:
1. The origin of this source code must not be misrepresented.
2. Altered versions must be plainly marked as such and must
not be misrepresented as being the original source.
3. This Copyright notice may not be removed or altered from any
source or altered source distribution.
The copyright holders and contributing author(s) specifically
permit, without fee, and encourage the use of this source code
as a component for supporting the Hypertext Markup Language in
commercial products. If you use this source code in a product,
acknowledgement is not required but would be appreciated.

19
docs/LOCALIZE.md Normal file

@ -0,0 +1,19 @@
# Localize HTML Tidy
HTML Tidy is used worldwide but is not very friendly to non-English speakers.
The latest versions of HTML Tidy and `libtidy` now support other languages and
regional variations, but we need your help to make it accessible to these users
by using your knowledge of other languages to make Tidy better.
Help us translate HTML Tidy into another language and as part of our project
team you will certainly earn the admiration of fellow Tidy users worldwide.
## How to Contribute
All READMEs (including [instructions][2] on how to localize Tidy) and related
materials can be found in [localize][1].
[1]: ../localize
[2]: ../localize/README.md

41
docs/MESSAGES.md Normal file

@ -0,0 +1,41 @@
# Message System
Tidy has a quite complex warning/error report and footnote messaging system, but most of this complexity is hidden away from you in order to make adding messages as simple as possible. This README explains how to add a new warning/error report to **libTidy**.
First, assign the message a **key** value. This is done in `tidyenum.h`, in one of the enumerations listed there:
1. `tidyStrings` - starts with the value `TIDYSTRINGS_FIRST = 500`, and it must be first. This is the list of all strings available in Tidy with the exception of strings provided by other enumerations. **However** don't modify this enum directly. You'll modify a preprocessor macro instead.
2. `TidyOptionId` - You probably won't need this unless you're adding new options, and there's another readme for that.
3. `TidyConfigCategory` - You probably won't need this, either, unless you're adding a whole new category for options.
4. `TidyReportLevel` - And you probably won't need this, either.
All enum values are only ever used by name within **libTidy** (and incidentally, should only ever be used by name in your client applications; never trust the value!), so feel free to enter new strings in English alphabetical order (this helps us audit all of the strings from time to time).
As mentioned above, `tidyStrings` messages must be defined in one of the existing macros named like `FOREACH_...(FN)`, such as `FOREACH_DIALOG_MSG(FN)`. These macros ensure that another data structure used for localization and key lookup is updated automatically any time strings are added or removed, thus limiting the possibility of developer error.
## Step 1
In this case I want to add three warning messages: `BAD_SURROGATE_PAIR`, `BAD_SURROGATE_TAIL`, and `BAD_SURROGATE_LEAD`. Because these are error reports, they belong in the `tidyStrings` enum, and they fit nicely into the macro beginning `FOREACH_REPORT_MSG(FN)`. Add the message key values to this macro, wrapping each one in the `FN()` syntax.
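To see why listing each key once inside `FN()` is enough, here is a simplified, self-contained sketch of the "X-macro" technique that `FOREACH_...(FN)` macros rely on. The macro and type names (`FOREACH_EXAMPLE_MSG`, `EX_FN_ENUM`, `exampleKeyList`, and so on) are hypothetical; Tidy's real macros and lookup tables differ in detail.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for FOREACH_REPORT_MSG(FN): each key listed once. */
#define FOREACH_EXAMPLE_MSG(FN) \
    FN(BAD_SURROGATE_LEAD)      \
    FN(BAD_SURROGATE_PAIR)      \
    FN(BAD_SURROGATE_TAIL)

/* Expansion 1: one enum member per key. */
#define EX_FN_ENUM(name) name,
/* Expansion 2: one { key, "KEY" } row per key, for lookup by name. */
#define EX_FN_STRUCT(name) { name, #name },

typedef enum
{
    EX_STRINGS_FIRST = 500,   /* plays the role of TIDYSTRINGS_FIRST */
    FOREACH_EXAMPLE_MSG(EX_FN_ENUM)
    EX_STRINGS_LAST
} exampleStrings;

typedef struct
{
    exampleStrings key;
    const char* name;
} exampleKeyItem;

/* Generated from the same list, so it cannot drift from the enum. */
static const exampleKeyItem exampleKeyList[] =
{
    FOREACH_EXAMPLE_MSG(EX_FN_STRUCT)
    { EX_STRINGS_LAST, NULL }
};

/* Map a key name to its enum value; EX_STRINGS_LAST if unknown. */
exampleStrings exampleKeyFromName( const char* name )
{
    const exampleKeyItem* item;
    for ( item = exampleKeyList; item->name != NULL; ++item )
    {
        if ( strcmp( item->name, name ) == 0 )
            return item->key;
    }
    return EX_STRINGS_LAST;
}
```

When you add a key to the list, both the enum and the name table pick it up automatically. This is the "limiting developer error" property described above.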
## Step 2
The next step is adding a `format` string to `language_en.h`. This string may later be translated into other supported languages, but even if you wish to support another language, it's critical that you add the message format string to `language_en.h`, which serves as the base language for `LibTidy`.
Where to add this seems a bit of a mess, but in general things are grouped by where they're used in `libTidy`, and often in alphabetical order within those groups. Here I've added them in alphabetical order in the section where all of the other report messages are.
Depending on which of the output routines you use (consult `message.c`) you may be able to use parameters such as `%u` and `%s` in your format strings. The available data is currently limited to the available message output routines. Please don't use `printf` for message output within **libTidy**.
Note that Tidy doesn't currently support numbered `printf` parameters; parameters will be consumed in the order the report output function calls them.
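The sketch below shows the consequence of the in-order rule. It assumes a base-language table shaped like Tidy's `languageDictionaryEntry` (key, plural form, format string). The keys, the strings, and `exampleFormat` are all illustrative, not Tidy's real entries or routines. The arguments are handed to `vsnprintf` in the order the caller supplies them, so every translation of a string must keep the same parameter order.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical keys; the real ones come from the tidyStrings enum. */
enum { EX_MISSING_ATTR = 501, EX_BAD_VALUE = 502 };

/* Same shape as Tidy's languageDictionaryEntry. */
typedef struct
{
    unsigned key;
    unsigned pluralForm;
    const char* value;
} exampleDictEntry;

/* Illustrative base strings; parameters are consumed strictly left to right. */
static const exampleDictEntry exampleEnglish[] =
{
    { EX_MISSING_ATTR, 0, "%s lacks \"%s\" attribute" },
    { EX_BAD_VALUE,    0, "%s attribute \"%s\" has invalid value \"%s\"" },
    { 0,               0, NULL }
};

/* Find the format string for a key; NULL if it is missing. */
static const char* exampleLookup( unsigned key )
{
    const exampleDictEntry* entry;
    for ( entry = exampleEnglish; entry->value != NULL; ++entry )
    {
        if ( entry->key == key )
            return entry->value;
    }
    return NULL;
}

/* Format the message for key into out; returns vsnprintf's result, or -1. */
int exampleFormat( char* out, size_t size, unsigned key, ... )
{
    const char* fmt = exampleLookup( key );
    va_list args;
    int n;

    if ( fmt == NULL )
        return -1;
    va_start( args, key );
    n = vsnprintf( out, size, fmt, args );
    va_end( args );
    return n;
}
```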
## Step 3
The last step — hopefully — is adding the message key to the `dispatchTable[]` structure in `message.c`. This structure determines the `TidyReportLevel` (report severity) and message formatter (how to print the message). Then whenever you issue the report with `TY_(Report)()` or one of the existing convenience report functions, the correct message formatter will be used for the parameters that you specify.
Please read the source code in `message.c` for help on how to choose a message formatter, or how to modify one of the existing message formatters if you need to accommodate a new function signature for your report.
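To make the idea of a dispatch table concrete, here is a hedged, self-contained sketch. Each row pairs a message key with a report level and a formatter function that knows which arguments to consume. Every name here is hypothetical (`exampleLevel` merely stands in for `TidyReportLevel`). For self-containment the format string sits in the same row, whereas in Tidy it comes from `language_en.h`. Consult `message.c` for the real structure.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { EX_INFO, EX_WARNING, EX_ERROR } exampleLevel;

typedef int (*exampleFormatter)( char* out, size_t size, const char* fmt, va_list args );

/* Formatter 1: pass the arguments straight through to the format string. */
static int formatPlain( char* out, size_t size, const char* fmt, va_list args )
{
    return vsnprintf( out, size, fmt, args );
}

/* Formatter 2: the first argument is a line number, printed as a prefix. */
static int formatWithLine( char* out, size_t size, const char* fmt, va_list args )
{
    unsigned line = va_arg( args, unsigned );
    int n = snprintf( out, size, "line %u: ", line );

    if ( n < 0 || (size_t)n >= size )
        return n;
    return n + vsnprintf( out + n, size - (size_t)n, fmt, args );
}

typedef struct
{
    unsigned key;
    exampleLevel level;        /* report severity */
    exampleFormatter handler;  /* how to print the message */
    const char* fmt;           /* simplification: Tidy looks this up separately */
} exampleDispatch;

static const exampleDispatch exampleTable[] =
{
    { 501, EX_WARNING, formatWithLine, "missing </%s>" },
    { 502, EX_ERROR,   formatPlain,    "unexpected end of file" },
    { 0,   EX_INFO,    NULL,           NULL }
};

/* Find the key's row, then let its formatter consume the arguments. */
int exampleReport( char* out, size_t size, exampleLevel* level, unsigned key, ... )
{
    const exampleDispatch* d;
    for ( d = exampleTable; d->handler != NULL; ++d )
    {
        if ( d->key == key )
        {
            va_list args;
            int n;
            va_start( args, key );
            n = d->handler( out, size, d->fmt, args );
            va_end( args );
            *level = d->level;
            return n;
        }
    }
    return -1;
}
```

Adding a message then means adding one row that names an existing formatter. You only write a new formatter when no existing one matches the arguments your report needs.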
eof;

134
docs/OPTIONS.md Normal file

@ -0,0 +1,134 @@
# Tidy Config Options
Tidy supports a quite large number of configuration options. The full list can be output using `-help-config`. This will show the option to be used either on the command line or in a configuration file, the type of option, and the value(s) that can be used. The current default value for each option can be seen using `-show-config`.
The options can also be listed in xml format. `-xml-help` will output each option plus a description. `-xml-config` will not only output the option and a description, but will include the type, default and examples. These xml outputs are used, with the aid of `xsltproc` and `doxygen`, to generate the [API Documentation](https://api.html-tidy.org/).
These options can also be used by applications linking with `libtidy`. For each option there is a `TidyOptionId` enumeration value in the `tidyenum.h` file, and there are get/set functions for each option type.
This file indicates how to add a new option to tidy, here adding an option `TidyEscapeScripts`. In essence it consists of 4 steps:
1. Add the option `ID` to `tidyenum.h`.
2. Add to the `table` `TidyOptionImpl option_defs[]` in `config.c`
3. Add the id, with a `description` to `language_en.h`
4. Use the option in the code.
#### 1. Option ID
In `tidyenum.h` the `TidyOptionId` entries can be in any order, but please try to keep them alphabetical, and keep in mind that `N_TIDY_OPTIONS` must remain last. The ID name can be any string, but by convention it begins with `Tidy` followed by brief descriptive text.
Naturally it cannot be the same as any existing option; it must be unique. It is followed by a brief, doxygen-formatted descriptive comment. So for this new option I have chosen -
~~~
TidyEscapeScripts, /**< Escape items that look like closing tags */
~~~
#### 2. Table Definition
In `config.c`, add an entry to `TidyOptionImpl option_defs[]`. Again it can be in any order, but normally a new option would be added just before the last `N_TIDY_OPTIONS`, which must remain last.
The structure definition of the table entries is simple -
~~~
struct _tidy_option
{
    TidyOptionId id;
    TidyConfigCategory category;   /* put 'em in groups */
    ctmbstr name;                  /* property name */
    TidyOptionType type;           /* string, int or bool */
    ulong dflt;                    /* default for TidyInteger and TidyBoolean */
    ParseProperty* parser;         /* parsing method, read-only if NULL */
    PickListItems* pickList;       /* pick list */
    ctmbstr pdflt;                 /* default for TidyString */
};
~~~
Naturally it will commence with the above chosen unique `id`.
The `category` will be one of this enumeration -
~~~
typedef enum
{
    TidyMarkup,           /**< Markup options: (X)HTML version, etc */
    TidyDiagnostics,      /**< Diagnostics */
    TidyPrettyPrint,      /**< Output layout */
    TidyEncoding,         /**< Character encodings */
    TidyMiscellaneous,    /**< File handling, message format, etc. */
    TidyInternalCategory  /**< Option is internal only. */
} TidyConfigCategory;
~~~
Note that each of these enumeration values also has a two-letter uppercase alias (for example `PP` for `TidyPrettyPrint`, as used in the table entry below). If you feel there should be another `category` or group, this can be discussed and added.
The `name` can be anything, but should be somewhat descriptive of the option. Again, this string must be unique. It should consist of lowercase alphanumeric characters and may contain `-` as a separator. Remember that this is the name used on the command line or in a configuration file to set the option.
The `type` is one of the following enumeration items -
~~~
typedef enum
{
    TidyString,    /**< String */
    TidyInteger,   /**< Integer or enumeration */
    TidyBoolean    /**< Boolean flag */
} TidyOptionType;
~~~
Likewise, each of these enumeration values has a two-letter uppercase alias (`BL` for `TidyBoolean` in the table entry below). If you feel there should be another `type`, this can be discussed, but it would require additional changes elsewhere. Also note that `TidyTriState` is the same as a `TidyInteger`, except that it uses its own parser.
The next item is the default value (`dflt`) for a boolean, tristate or integer option. Note that Tidy defines `no=0` and `yes=1` in its own `Bool` enumeration.
There are a number of parsers for the options, and likewise a number of pick lists. Find an existing option similar to your new one and use the same values. The `parser` is the function that parses config-file or command-line text input, and the `pickList` holds the canonical values for the option. Some kinds of values logically have no pick list, such as strings or pure integers.
Presently no options use the final `pdflt` (default string) member, so it is left out of the table and the compiler initializes it to NULL.
Here is the final table entry as added. The spacing has been compressed here, but in the actual code the existing column alignment should be maintained if possible -
~~~
{ TidyEscapeScripts, PP, "escape-scripts", BL, yes, ParseBool, boolPicks }, /* 20160227 - Issue #348 */
~~~
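Here is a self-contained model of how such a table fits together: an ID enum with its sentinel last, one row per ID, and a lookup by the command-line name. All the `Ex...` names are hypothetical, and the struct keeps only a few of the real members. The compile-time size check shows one reason `N_TIDY_OPTIONS` must stay last: it doubles as the count of options.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for Tidy's own types. */
typedef enum { no = 0, yes = 1 } Bool;            /* mirrors Tidy's Bool */
typedef enum { ExString, ExInteger, ExBoolean } ExOptionType;

typedef enum
{
    ExIndentSpaces,
    ExEscapeScripts,
    N_EX_OPTIONS        /* must stay last: it counts the options */
} ExOptionId;

typedef struct
{
    ExOptionId    id;
    const char*   name;  /* name used on the command line or in a config file */
    ExOptionType  type;
    unsigned long dflt;  /* default for integer and boolean options */
} ExOption;

/* One row per option, in the same order as the enum. */
static const ExOption exOptionDefs[] =
{
    { ExIndentSpaces,  "indent-spaces",  ExInteger, 2   },
    { ExEscapeScripts, "escape-scripts", ExBoolean, yes },
    { N_EX_OPTIONS,    NULL,             ExString,  0   }
};

/* Compile-time check: one row per id, plus the sentinel row. */
typedef char exTableSizeCheck[
    ( sizeof exOptionDefs / sizeof exOptionDefs[0] == (size_t)N_EX_OPTIONS + 1 ) ? 1 : -1 ];

/* Find an option by its configuration name; NULL if unknown. */
const ExOption* exLookupOption( const char* name )
{
    const ExOption* opt;
    for ( opt = exOptionDefs; opt->name != NULL; ++opt )
    {
        if ( strcmp( opt->name, name ) == 0 )
            return opt;
    }
    return NULL;
}
```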
#### 3. Option Description
In `language_en.h`, add the description in the section labelled **Options Documentation**. Please try to keep this section in alphabetical order.
Each entry is a structure with 3 members -
~~~
typedef struct languageDictionaryEntry {
    uint key;
    uint pluralForm;
    ctmbstr value;
} languageDictionaryEntry;
~~~
The `key` is the option `ID`; the `pluralForm` is not used for options and should be `0`; the `value` is the description string.
Some care has to be taken with the description string. The only html allowed here is `<code>...</code>`, `<var>...</var>`, `<em>...</em>`, `<strong>...</strong>`, and `<br/>`. Entities, tags, attributes, etc., should be enclosed in `<code>...</code>`. Option values should be enclosed in `<var>...</var>`. It's very important that `<br/>` be self-closing! This string is processed to build the API documentation.
This is the description added for this new option.
~~~
    {
        TidyEscapeScripts, 0,
        "This option causes items that look like closing tags, like <code>&lt;/g</code> to be "
        "escaped to <code>&lt;\\/g</code>. Set this option to 'no' if you do not want this."
    },
~~~
#### 4. Use in Code
An option can be tested anywhere in the code to change Tidy's behavior. How you test it depends on the option type, but the most common check is `cfgBool( doc, id )`. Here is an example of where this new option is used -
~~~
/*\ if javascript insert backslash before /
 * Issue #348 - Add option, escape-scripts, to skip
\*/
if ((TY_(IsJavaScript)(container)) && cfgBool(doc, TidyEscapeScripts))
{
    /* ... insert the backslash here ... */
}
~~~
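For intuition about what such a check does, here is a hedged, self-contained sketch. It assumes the current option values live in an array indexed by option ID, with `cfgBool()` reading one slot as a `Bool`. The `ExDoc` type, the `exCfg`/`exCfgBool` macros, and `exCopyScript` are hypothetical stand-ins, not Tidy's real definitions. The function performs the escaping the option describes, turning `</` into `<\/`.

```c
#include <assert.h>
#include <string.h>

typedef enum { no = 0, yes = 1 } Bool;

typedef enum
{
    ExEscapeScripts,
    ExIndentSpaces,
    N_EX_OPTIONS
} ExOptionId;

/* Hypothetical document: one current value slot per option id. */
typedef struct
{
    unsigned long values[ N_EX_OPTIONS ];
} ExDoc;

/* Simplified stand-ins for Tidy's cfg() / cfgBool() accessors. */
#define exCfg( doc, id )     ( (doc)->values[ (id) ] )
#define exCfgBool( doc, id ) ( (Bool) exCfg( doc, id ) )

/* Copy src to dst, turning "</" into "<\/" when escape-scripts is on.
   dst needs room for strlen(src) + strlen(src) / 2 + 1 characters. */
void exCopyScript( const ExDoc* doc, const char* src, char* dst )
{
    while ( *src )
    {
        *dst++ = *src;
        if ( src[0] == '<' && src[1] == '/' && exCfgBool( doc, ExEscapeScripts ) )
            *dst++ = '\\';
        ++src;
    }
    *dst = '\0';
}
```

Reading the option at the point of use, rather than caching it, means a user's change to the configuration takes effect the next time the document is processed.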
#### Summary
That's about it: just 4 places. The best approach is to search for an existing option `ID`, follow where it is defined and used, and copy that. It is not difficult.
; eof 20160310

57
docs/README.html Normal file

@ -0,0 +1,57 @@
<h1 id="htacghtmltidy">HTACG HTML Tidy</h1>
<h2 id="prerequisites">Prerequisites</h2>
<ol>
<li><p>git - <a href="https://git-scm.com/book/en/v2/Getting-Started-Installing-Git">https://git-scm.com/book/en/v2/Getting-Started-Installing-Git</a></p></li>
<li><p>cmake - <a href="https://cmake.org/download/">https://cmake.org/download/</a></p></li>
<li><p>appropriate build tools for the platform</p></li>
<li><p>the <a href="http://xmlsoft.org/XSLT/xsltproc2.html">xsltproc</a> tool is required to build and install the <code>tidy.1</code> man page on Unix-like platforms.</p></li>
</ol>
<p>CMake comes in two forms - command line and GUI. Some installations only install one or the other, but sometimes both. The build commands below are only for command line use.</p>
<p>The actual build tools also vary by platform, but that is one of the great features of CMake: it can generate various &#8216;native&#8217; build files. Running <code>cmake --help</code> should list the generators available on that platform. One of the most common is &#8220;Unix Makefiles&#8221;, which requires a <code>make</code> tool to be installed, but many other generators are supported.</p>
<p>In Windows CMake offers various versions for MSVC. Again below only the command line use of MSVC is shown, but the tidy solution (*.sln) file can be loaded into the MSVC IDE, and the building done in there.</p>
<h2 id="buildthetidylibraryandcommandlinetool">Build the tidy library and command line tool</h2>
<ol>
<li><p><code>cd build/cmake</code></p></li>
<li><p><code>cmake ../.. -DCMAKE_BUILD_TYPE=Release [-DCMAKE_INSTALL_PREFIX=/path/for/install]</code></p></li>
<li><p>Windows: <code>cmake --build . --config Release</code><br/>
Unix/OS X: <code>make</code></p></li>
<li><p>Install, if desired:<br/>
Windows: <code>cmake --build . --config Release --target INSTALL</code><br/>
Unix/OS X: <code>[sudo] make install</code></p></li>
</ol>
<p>By default, CMake sets the install prefix to <code>/usr/local</code> on Unix, so the binary is installed to <code>/usr/local/bin</code>. If you want the binary in, say, <code>/usr/bin</code> instead, use <code>-DCMAKE_INSTALL_PREFIX=/usr</code> in step 2 above.</p>
<p>Also, on Unix, if you want to build the release library without any debug <code>assert</code> in the code, add <code>-DCMAKE_BUILD_TYPE=Release</code> in step 2. This adds <code>-DNDEBUG</code> to the compile switches, which is normally added in Windows builds for the <code>Release</code> config.</p>
<p>On Windows the default install location is <code>C:\Program Files\tidy</code> or <code>C:\Program Files (x86)\tidy</code>, which is not very useful. After the build, <code>tidy.exe</code> is in the <code>Release</code> directory and can be copied to any directory in your <code>PATH</code> environment variable for global use.</p>
<p>If you do <strong>not</strong> need the tidy library built as a &#8216;shared&#8217; (DLL) library, add the option <code>-DBUILD_SHARED_LIB:BOOL=OFF</code> in step 2. This option is <strong>ON</strong> by default. The static library is always built and linked with the command line tool, both for convenience on Windows and so that on Unix the binary can be run as part of the man page build without the shared library being installed.</p>
<p>See the <code>CMakeLists.txt</code> file for other CMake <strong>options</strong> offered.</p>
<h2 id="buildphpwiththetidy-html5library">Build PHP with the tidy-html5 library</h2>
<p>Due to API changes in tidy-html5, the header <code>buffio.h</code> is now named <code>tidybuffio.h</code>, so the include must be renamed in the file <code>ext/tidy/tidy.c</code> in PHP&#8217;s source.</p>
<p>That is, prior to configuring PHP, run this in the PHP source directory:</p>
<pre><code>sed -i 's/buffio.h/tidybuffio.h/' ext/tidy/*.c
</code></pre>
<p>And then continue with (just an example here, use your own PHP config options):</p>
<pre><code>./configure --with-tidy=/usr/local
make
make test
make install
</code></pre>
<p>; eof</p>

docs/RELEASE.md Normal file

@@ -0,0 +1,186 @@
HTML Tidy Release Process Single Point Lesson (SPL)
===================================================
Purpose
-------
This lesson documents how to release a new, officially released version of HTML Tidy. Following the steps in this SPL ensures consistency between releases and results in a predictable experience for the end users of HTML Tidy.
Definition
----------
HTACG HTML Tidy is a library and console application, and the release process is intended to officially designate a point of stability, where "stability" means ABI and API stability. Stability is represented by an even minor version number, and "released" means that the HTACG has published and made available a version of HTML Tidy with an even minor version number.
Due to the number of platforms and language bindings which have adopted HTML Tidy, we are highly dependent on package managers to track the latest release versions of HTML Tidy and to release them on those platforms. We cannot maintain distributions and language bindings for your favorite operating system or language package manager.
Release Process Overview
------------------------
The release process consists largely of the following steps:
### Lead up:
- Create the next release milestone on Github if not already done.
- Decide which PRs to include in the release, bumping `version.txt` accordingly.
- Decide on any show-stopper outstanding issues, and action them.
- Change milestone of all excluded-this-time issues to the next milestone, or to indefinite.
- Decide target date for release.
### Release:
- Update the version number to the next release version, e.g., from 5.5.xx to 5.6.0.
- Generate a change log.
- Commit this change to `next`.
- Merge this branch to `master`.
- Tag the release on `master`.
- Update the version number of the `next` branch to the next development version, e.g., from 5.8.0 to 5.9.0. At this point, both versions are identical except for the version number.
- Generate the API documentation for `next`.
- Generate the API documentation for `master`.
- Update the https://api.html-tidy.org/ website with the new API documentation.
- Update the https://www.html-tidy.org/ website with the new release version.
- Update the https://binaries.html-tidy.org/ website.
- Push all of the changes.
- Build binaries.
- Create a Github release per the Git tag created above.
- Post the binaries in the release.
Release Steps in Detail
--------------------------
### Update the version number to the next release version
In the `next` branch, modify the `version.txt` file to the next release version, for example, 5.8.0, and set the date.
### Generate a change log
If necessary, install the `github_changelog_generator`. This requires that you have a Ruby environment on your computer, which you probably do because the regression tests require it.
~~~
gem install github_changelog_generator
~~~
You'll also need a Github personal access token in order for the tool to pull information from Github. You can acquire one [here](https://github.com/settings/tokens).
Generate the change log like so:
~~~
github_changelog_generator -u htacg -p tidy-html5 \
--token [github_access_token] \
--since-tag 5.6.0 \
--usernames-as-github-logins \
--future-release 5.8.0 \
-o README/CHANGELOG.md
~~~
**Important**: the `--since-tag` value should be the git tag of the previous release, because we're only interested in the changes since then. The `--future-release` value should be the git tag that you _will apply_ (but have not yet) for this release.
### Commit this change to `next`
~~~
git add .
git commit -m 'Releasing HTML Tidy 5.8.0'
~~~
### Merge this branch to `master`
~~~
git checkout master
git merge next
~~~
### Tag the release on `master`
~~~
git tag -a 5.8.0 -m "HTML Tidy version 5.8.0"
~~~
### Update the version number of the `next` branch
For example, from 5.8.0 to 5.9.0. At this point, both versions are identical except for the version number. Edit the `version.txt` file to make this change, and then commit it to the repository.
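The version arithmetic spread across these steps can be summarised in a small C sketch. It assumes that `next` carries an odd (development) minor version such as 5.7.xx, and that the previous release was never patched, so its tag ends in `.0`; if a patch release such as 5.6.1 exists, use that tag for `--since-tag` instead. The function `plan_release()` is hypothetical and not part of any HTML Tidy tooling.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: from the development version on `next` (odd minor,
   e.g. "5.7.45"), derive the version strings used in the release steps.
   Each output buffer holds `size` bytes (longer results are truncated).
   Returns 1 on success, 0 if the input is not an odd-minor version. */
int plan_release(const char *next_version, char *release,
                 char *since_tag, char *next_dev, size_t size)
{
    int major, minor, patch;

    if (sscanf(next_version, "%d.%d.%d", &major, &minor, &patch) != 3)
        return 0;
    if (minor < 1 || minor % 2 == 0)   /* must be an odd, development minor */
        return 0;

    /* New release number, also used for --future-release and the git tag. */
    snprintf(release, size, "%d.%d.0", major, minor + 1);
    /* Previous release, used for --since-tag (assumed to be unpatched). */
    snprintf(since_tag, size, "%d.%d.0", major, minor - 1);
    /* Version that `next` moves to after the release. */
    snprintf(next_dev, size, "%d.%d.0", major, minor + 2);
    return 1;
}
```

For `5.7.45` this yields `5.8.0` for the release, `5.6.0` for `--since-tag`, and `5.9.0` for the new `next` version, matching the examples above; an even minor is rejected because it is already a release number.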
### Generate the API documentation for `next`
First, build the `next` binary in `build/cmake`, so that API documentation tools have a correct version of HTML Tidy to work with. Perform a `git clean -xdf` in this directory prior to building for good measure.
After building, `cd` to the correct directory and build the documentation:
~~~
cd html-tidy.org.api/tidy-html5-doxygen
./build_docs.sh
~~~
Then in the `html-tidy.org.api/tidy-html5-doxygen/output` directory, rename the resulting directory and file to `tidylib_api_next/` and `quickref_next.html`, and move them into `html-tidy.org.api/tidy/`.
### Generate the API documentation for `master`
First, check out `master`, then build the binary in `build/cmake` so that the API documentation tools have a correct version of HTML Tidy to work with. Perform a `git clean -xdf` in this directory prior to building for good measure.
After building, `cd` to the correct directory and build the documentation:
~~~
cd html-tidy.org.api/tidy-html5-doxygen
./build_docs.sh
~~~
Move the resulting directory and file into `html-tidy.org.api/tidy/` directly.
### Update the https://api.html-tidy.org/ website with the new API documentation
Check the copyright dates in `_includes/footer.md`.
~~~
cd html-tidy.org.api/
git add .
git commit -m "Added API documentation for master and next."
git push
~~~
### Update the https://www.html-tidy.org/ website with the new release version
Check the copyright dates in `_includes/footer.md`.
Update `html-tidy.org/homepage/_posts/1970-01-01-htmltidy.md` for the current release version and release year.
~~~
cd html-tidy.org/
git add .
git commit -m "Added API documentation for master and next."
git push
~~~
### Push all of the changes
Back in `tidy-html5/`
~~~
git checkout master
git push
git push origin <tag_name>
git checkout next
git push
~~~
### Build binaries
This is OS specific.
### Create a Github release per the Git tag created above
Do this on Github.
### Post the binaries in the release
Attach the built binaries to the Github release created in the previous step.
### Update the https://binaries.html-tidy.org/ website
Modify the files to point to the binaries in the Github releases.

docs/TAGS.md Normal file

@@ -0,0 +1,19 @@
# Tidy HTML Elements
This is about adding a new HTML **tag**.
Tidy tries to support all **tags** supported by the W3C. To add a new supported **tag**, the definition begins in `tidyenum.h`, to give it a value. Then it is added to the `tag_defs[]` table in `tags.c`, where it is given a unique string, supported html versions, attributes support, and a bit `type`.
Note that there is a group of configuration options for adding **tags** not yet approved by the W3C: [new-blocklevel-tags](https://api.html-tidy.org/tidy/quickref_next.html#new-blocklevel-tags), [new-empty-tags](https://api.html-tidy.org/tidy/quickref_next.html#new-empty-tags), [new-inline-tags](https://api.html-tidy.org/tidy/quickref_next.html#new-inline-tags), and [new-pre-tags](https://api.html-tidy.org/tidy/quickref_next.html#new-pre-tags). These provide a way to extend the `tag_defs[]` table just for that tidy session.
So, adding a new HTML **tag** consists of the following simple steps:
1. `tidyenum.h` - Give the element an internal name, like `TidyTag_XXXX`, and thus a value. Please keep this list in alphabetical order.
2. `tags.c` - Add a line to the `tag_defs[]` table. This assigns the unique string value of the element, the HTML versions that support the element, a pointer to the attributes supported by that element, and a bit field of the element's characteristics (inline, block, etc.).
So, by changing just 2 files, `tidyenum.h` and `tags.c`, libTidy will now support that element (tag) as W3C approved. Simple... And at times there is a case for adding **tags** that are still at the `Working Draft` stage, especially when there has been widespread support in the community, even before they reach the `REC` stage.
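As a rough illustration of how the two steps fit together, here is a small, self-contained C model. It is not Tidy's real code: the actual `tag_defs[]` entries use Tidy's own structure, version constants, attribute tables, and content-model bits, and carry more fields than shown. Apart from the `TidyTag_` naming pattern, every name below (`SketchTagDef`, `sketch_tag_defs`, `VER_*`, `MODEL_*`, `find_tag()`, and the element `xxxx`) is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Step 1 (cf. tidyenum.h): give each element an internal name and value.
   Trimmed, hypothetical list, kept in alphabetical order. */
typedef enum {
    TidyTag_UNKNOWN,
    TidyTag_A,
    TidyTag_P,
    TidyTag_XXXX      /* the new element being added in this sketch */
} SketchTagId;

/* Hypothetical bits for "supported HTML versions" and "characteristics". */
enum { VER_HTML4 = 1 << 0, VER_HTML5 = 1 << 1 };
enum { MODEL_INLINE = 1 << 0, MODEL_BLOCK = 1 << 1, MODEL_EMPTY = 1 << 2 };

typedef struct {
    SketchTagId id;       /* value from step 1 */
    const char *name;     /* unique string matched against the markup */
    unsigned    versions; /* HTML versions that support the element */
    unsigned    model;    /* inline, block, empty, ... */
} SketchTagDef;

/* Step 2 (cf. tag_defs[] in tags.c): one row per element. The real table
   also points to each element's supported attributes. */
static const SketchTagDef sketch_tag_defs[] = {
    { TidyTag_A,       "a",    VER_HTML4 | VER_HTML5, MODEL_INLINE },
    { TidyTag_P,       "p",    VER_HTML4 | VER_HTML5, MODEL_BLOCK  },
    { TidyTag_XXXX,    "xxxx", VER_HTML5,             MODEL_BLOCK  },
    { TidyTag_UNKNOWN, NULL,   0,                     0            }  /* end marker */
};

/* Return the definition for an element name, or NULL if it is not in the table. */
const SketchTagDef *find_tag(const char *name)
{
    const SketchTagDef *def;
    for (def = sketch_tag_defs; def->name != NULL; ++def)
        if (strcmp(def->name, name) == 0)
            return def;
    return NULL;
}
```

With the new row in place, `find_tag("xxxx")` returns its entry, while an unlisted name yields `NULL`; in this model that corresponds to an element Tidy does not know about.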
Now, one could argue that this is not the **best** way to verify every attribute and value, for every tag, but that is a moot point - that is how tidy does it!
; eof 20170205

docs/TESTING.md Normal file

@@ -0,0 +1,72 @@
# Testing
It's critical that changes you introduce do not cause regressions, i.e., that
Tidy's output remains consistent with the introduction of your changes, except
for very specific circumstances.
Additionally, changes that you introduce to Tidy must usually be accompanied by
one or more test cases demonstrating the new feature or changed behavior.
Both of these concerns can be addressed with the Tidy repository's automated
regression testing features, which are enabled by Github Actions. Any pull
request you make will automatically test your PR against the existing set of
test cases, and any failures are prima facie grounds for rejecting the PR.
You _must_ test your changes locally using the tools and test cases provided in
the `regression_testing/` directory prior to submitting a PR, including adding
test cases to this directory as needed.
## Changes to Existing Output
If your changes affect existing output, it's critical to understand _why_, and
if necessary, regenerate the `-expects` files so that the regression testing
tool will pass with your new changes. These `-expects` changes, of course,
become part of your Pull Request, and will be subject to review and conversation
in the Pull Request thread.
If you do cause such regressions, please be prepared to defend why they are
needed.
## New Tests
If you're adding new features to Tidy, code reviewers need to be able to see the
intended effect of your changes via some type of demonstration. As such, please
write at least one test case in `github-cases` and put the expected results in
`github-expects`. These also constitute a part of your Pull Request, and more
importantly, will become part of the standard regression testing suite once the
PR is merged.
Try to keep your test case(s) as succinct as possible, and do try to put some HTML
comments in the file explaining the purpose of the test case, and if applicable,
the Github issue and/or PR number.
Note that the files generated in `github-results` for your new test cases are
suitable for use in `github-expects` when you are satisfied with the results.
A sample `case-123a@0.html` might represent issue #123, test **a** in a series
of multiple tests for this issue number, expecting Tidy exit code 0, and might
look something like this:
```
<!DOCTYPE html>
<html>
<!--
This test case represents HTML Tidy issue #123, and demonstrates
the use of new feature #xxx. Tidy should exit with status code 0.
The reason this change is needed is because WHATWG suddenly
determined that a standards change #yyyy impacts us because of zzz.
-->
<head>
<title>Case #123a</title>
</head>
<p>The quick brown fox jumps over the lazy dog.<//p>
<body>
</body>
</html>
```
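Because the file name encodes the issue number, the optional test letter, and the expected exit code, it can be checked mechanically. The following C sketch is hypothetical: neither `parse_case_name()` nor `CaseName` exists in the regression-testing tools, and it assumes names of exactly the form `case-<digits>[letter]@<digit>.html`.

```c
#include <ctype.h>
#include <string.h>

/* Parsed pieces of a test-case file name such as "case-123a@0.html". */
typedef struct {
    long issue;      /* Github issue number, e.g. 123 */
    char series;     /* test letter within the series, or '\0' if absent */
    int  exit_code;  /* exit status Tidy is expected to return */
} CaseName;

/* Hypothetical helper: returns 1 and fills *out on success, 0 otherwise. */
int parse_case_name(const char *name, CaseName *out)
{
    const char *p = name;

    if (strncmp(p, "case-", 5) != 0)
        return 0;
    p += 5;

    /* Issue number: one or more digits. */
    if (!isdigit((unsigned char)*p))
        return 0;
    out->issue = 0;
    while (isdigit((unsigned char)*p))
        out->issue = out->issue * 10 + (*p++ - '0');

    /* Optional series letter. */
    out->series = '\0';
    if (isalpha((unsigned char)*p))
        out->series = *p++;

    /* '@' followed by a single-digit expected exit code. */
    if (*p++ != '@' || !isdigit((unsigned char)*p))
        return 0;
    out->exit_code = *p++ - '0';

    return strcmp(p, ".html") == 0;
}
```

For the sample above, `parse_case_name("case-123a@0.html", &c)` fills in issue 123, series `a`, and exit code 0.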
## Regression Testing Specifics
The regression testing mechanism is described more fully in [regression_testing/README.md](../regression_testing/README.md).

docs/VERSION.md Normal file

@@ -0,0 +1,103 @@
# HTML Tidy Versioning
This document provides an explanation of how to interpret HTML Tidy's version number, e.g. from command line output as a result of `tidy -v`, and also addresses the role of the file `version.txt` in Tidy's build process.
## Background
**HTML Tidy** uses a modified version of [Semantic Versioning](https://semver.org/), and so it's important to understand what the version number of **HTML Tidy** means to you, and how it might impact your workflow.
When you execute `tidy -v` on the command line, you might see responses such as:
~~~
HTML Tidy for Mac OS X version 5.1.24
HTML Tidy for Mac OS X version 5.2.0
HTML Tidy for Mac OS X version 5.3.15
HTML Tidy for Mac OS X version 5.4.0
HTML Tidy for Mac OS X version 5.5.1
~~~
_Obviously_ 5.5.1 is higher than 5.2.0, right? This might lead you to consider replacing your stable installations of **HTML Tidy** 5.2.0 across the board in your production process, but this might lead you to trouble. A little word about the meaning of our version numbers might help clear things up.
## Major, Minor, Patch
### Major
When HTACG assumed responsibility of **HTML Tidy** from previous maintainers, we immediately declared that the first release would be version 5.0, in honor of finally being able to offer modern HTML5 support.
Barring some major, monumental, massive change to Tidy, or the [release of HTML6](https://blog.whatwg.org/html-is-the-new-html5), **HTML Tidy** will probably be major version 5 forever. Maybe future maintainers will want to be trendy and release **HTML Tidy 2025**, but that's for them to decide.
### Minor
The minor version tells a lot more about the true version of Tidy that you have, but even so it's not a simple matter that 5 > 2 and must be better. The minor number indicates **HTML Tidy** _release versions_ or _development versions_.
- **even numbered minor versions** indicate released versions of **HTML Tidy**. We provide binaries for releases, API documentation, and full support including cherry picking bug fixes back to them. In standard parlance, _released_ versions are _stable_ versions, meaning that the API is stable and you can generally expect Tidy's output to be the same (other than as a result of bug fixes).
- **odd numbered minor versions** are development versions, or as is considered in many contexts _bleeding edge_ or _next_ versions. HTACG do not provide binaries, and API documentation is not usually up to date, but you do have access to the latest bug fixes, newest features, and knowledge of where Tidy is going. The downside, though, is that development versions are bleeding edge and likely to be unstable in the API sense (we try never to commit code that will crash). In particular, we make absolutely no guarantees that:
    - Output remains the same as in previous release versions.
    - Output remains the same as in earlier patch versions in the same development series.
    - Configuration options will not be added.
    - Configuration options will not be deleted.
    - Parts of the C API will not be added or deleted without warning.
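The even/odd rule is simple enough to check programmatically, for example against the string returned by `tidyLibraryVersion()`. This C sketch is hypothetical: `classify_version()` and `VersionChannel` are illustrative names, not part of libTidy, and the sketch only looks at the first three dot-separated numbers.

```c
#include <stdio.h>

/* Channels implied by the minor version number (illustrative names). */
typedef enum {
    VERSION_RELEASE,      /* even minor: released, stable */
    VERSION_DEVELOPMENT,  /* odd minor: development, `next` */
    VERSION_INVALID       /* not in major.minor.patch form */
} VersionChannel;

/* Classify a "major.minor.patch" string; anything after the patch is ignored. */
VersionChannel classify_version(const char *version)
{
    int major, minor, patch;

    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) != 3 || minor < 0)
        return VERSION_INVALID;
    return (minor % 2 == 0) ? VERSION_RELEASE : VERSION_DEVELOPMENT;
}
```

So `classify_version("5.2.0")` reports a release and `classify_version("5.5.1")` a development version, even though 5.5.1 is numerically higher.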
### Patch
The patch indicates the latest version of the current minor version number. As with minor version numbers, there is some variation in their meaning:
- **patches for even numbered minor versions** indicate the latest supported update for that released version. In general, we hope that the patch number remains at 0 forever, e.g., 5.2.0. However, if we do backport one or more bug fixes, you might see, e.g., 5.2.1.
- **patches for odd numbered minor versions** indicate our progress in committing code that changes some aspect of **HTML Tidy**'s operation, such as the output it generates or a bug fix. Because **git** offers robust commit logging of its own, we won't generally bump the patch number for things such as documenting code, converting tabs to spaces, or even simplifying some piece of logic. You'll see that lots of good things are happening if the patch number is really large, e.g., 5.7.289!
## Development
### The `version.txt` File
The **libTidy** version is controlled by the contents of `version.txt` in the root.
This file consists of two lines of dot (.) separated items: the first is the **major**, **minor**, and **patch** version values, and the second is a date. Example:
~~~
5.3.15
2017.01.29
~~~
When **CMake** is run, this file is read and two macros are added to the compile flags:
~~~
add_definitions ( -DLIBTIDY_VERSION="${LIBTIDY_VERSION}" )
add_definitions ( -DRELEASE_DATE="${tidy_YEAR}/${tidy_MONTH}/${tidy_DAY}" )
~~~
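CMake does this conversion itself in `CMakeLists.txt`; the C function below merely mimics it so that the mapping from the two lines of `version.txt` to the two macro values is explicit. It is a hypothetical sketch (the name `version_txt_to_macros()` is illustrative) that assumes both lines contain exactly three dot-separated integers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: convert the contents of version.txt into the values
   CMake passes as LIBTIDY_VERSION and RELEASE_DATE.
   Returns 1 on success, 0 on malformed input or a too-small buffer. */
int version_txt_to_macros(const char *text,
                          char *version, size_t version_size,
                          char *date, size_t date_size)
{
    int major, minor, patch, year, month, day, n;

    /* Line 1: major.minor.patch; line 2: YYYY.MM.DD.
       The space in the format also matches the newline between them. */
    if (sscanf(text, "%d.%d.%d %d.%d.%d",
               &major, &minor, &patch, &year, &month, &day) != 6)
        return 0;

    /* LIBTIDY_VERSION, e.g. "5.3.15" */
    n = snprintf(version, version_size, "%d.%d.%d", major, minor, patch);
    if (n < 0 || (size_t)n >= version_size)
        return 0;

    /* RELEASE_DATE, e.g. "2017/01/29" (slashes instead of dots) */
    n = snprintf(date, date_size, "%d/%02d/%02d", year, month, day);
    if (n < 0 || (size_t)n >= date_size)
        return 0;

    return 1;
}
```

Given the example file above, this produces `5.3.15` and `2017/01/29`, the same values the two `add_definitions` lines pass to the compiler.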
And in `CMakeLists.txt` there is the possibility to define another macro, when and if required:
~~~
# add_definitions ( -DRC_NUMBER="D231" )
~~~
These macros are put in `static const char` strings in **libTidy**'s internal-only `src/version.h` file:
~~~
static const char TY_(release_date)[] = RELEASE_DATE;
#ifdef RC_NUMBER
static const char TY_(library_version)[] = LIBTIDY_VERSION "." RC_NUMBER;
#else
static const char TY_(library_version)[] = LIBTIDY_VERSION;
#endif
~~~
These strings are returned respectively by the **libTidy** API functions:
~~~
TIDY_EXPORT ctmbstr TIDY_CALL tidyLibraryVersion(void);
TIDY_EXPORT ctmbstr TIDY_CALL tidyReleaseDate(void);
~~~
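To see what the preprocessor makes of this, the self-contained C sketch below reproduces the pattern with example values standing in for the `-D` definitions CMake would supply. Assumptions: `TY_()` is written as a token-pasting macro producing the `prvTidy` prefix described in API_AND_NAMESPACE.md, `RC_NUMBER` is defined here only to show its effect, and the `sketch_*` accessors are hypothetical stand-ins for `tidyLibraryVersion()` and `tidyReleaseDate()`.

```c
#include <string.h>

/* Example values standing in for CMake's -D definitions. */
#ifndef LIBTIDY_VERSION
#define LIBTIDY_VERSION "5.3.15"
#endif
#ifndef RELEASE_DATE
#define RELEASE_DATE "2017/01/29"
#endif
#ifndef RC_NUMBER
#define RC_NUMBER "D231"   /* normally undefined; defined to show the effect */
#endif

/* Token pasting: TY_(release_date) becomes prvTidyrelease_date. */
#define TY_(str) prvTidy##str

static const char TY_(release_date)[] = RELEASE_DATE;
#ifdef RC_NUMBER
/* Adjacent string literals are joined: "5.3.15" "." "D231" -> "5.3.15.D231" */
static const char TY_(library_version)[] = LIBTIDY_VERSION "." RC_NUMBER;
#else
static const char TY_(library_version)[] = LIBTIDY_VERSION;
#endif

/* Hypothetical accessors playing the role of the two API functions. */
const char *sketch_library_version(void) { return TY_(library_version); }
const char *sketch_release_date(void)    { return TY_(release_date); }
```

With `RC_NUMBER` defined, `sketch_library_version()` returns `"5.3.15.D231"`; without that definition it returns `"5.3.15"`.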
### Git branches
Starting with HTML Tidy 5.4.0 release, our branching scheme aligns nicely with our version numbering scheme. Please consult [BRANCHES.md](BRANCHES.md).
Updated: 20170210
Date: 20150904