/*!

\mainpage Tidy home

\note The repository <a href="https://github.com/htacg/tidy-html5">github.com/htacg/tidy-html5</a> and this documentation should be considered canonical for HTML Tidy as of 2015-January-15.

<h4>What is Tidy?</h4>

- \b `tidy`
    - is a console application for Mac OS X, Linux, Windows, UNIX, and more.
    - It corrects and cleans up HTML and XML documents by fixing markup errors and upgrading legacy code to modern standards.
- \b `tidylib`
    - is a C static or dynamic library that developers can integrate into their applications
      in order to bring all of Tidy’s power to their favorite tools.
    - `tidylib` is used today in desktop applications, web servers, and more.

\section content Contents

- \ref tidy5_cmd
- \ref tidy5_lib
- \ref building_tidy
- \ref history


\page tidy5_cmd `tidy5` command

<pre>
\htmlinclude tidy5.cmd.txt
</pre>

\page TidyLib TidyLib

- \b TidyLib - is easy to integrate. Because C linkage is supported almost universally, a C interface can be called from a great number of programming languages.

- \b TidyLib - is designed to use opaque types in the public interface. The application only has to pass a simple handle around, which minimizes the need to transform data types between languages. As a result, it is straightforward to write very thin library wrappers for C++, Pascal, and COM/ATL.

- \b TidyLib - eats its own dog food. The `tidy` console application links directly to TidyLib.

- \b TidyLib - is thread safe and re-entrant. HTML Tidy has many uses, including content validation, content scraping, and conversion to XHTML, so it was important that TidyLib run reasonably well within server applications as well as on the client side.

- \b TidyLib - uses adaptable I/O. As part of the larger integration strategy, all I/O is fully abstracted. This means a (relatively) clean separation between character encoding processing and shovelling bytes back and forth. Internally, the library reads from sources and writes to sinks. This abstraction is used for both markup and configuration “files”. Concrete implementations are provided for file and memory I/O, and new sources and sinks may be provided via the public interface.

\section example_hello Example
\code{.c}
#include <tidy.h>
#include <buffio.h>
#include <stdio.h>
#include <errno.h>

int main(int argc, char **argv )
{
    const char* input = "<title>Hello</title><p>World!";
    TidyBuffer output = {0};
    TidyBuffer errbuf = {0};
    int rc = -1;
    Bool ok;

    TidyDoc tdoc = tidyCreate();                     // Initialize "document"
    printf( "Tidying:\t%s\n", input );

    ok = tidyOptSetBool( tdoc, TidyXhtmlOut, yes );  // Convert to XHTML
    if ( ok )
        rc = tidySetErrorBuffer( tdoc, &errbuf );    // Capture diagnostics
    if ( rc >= 0 )
        rc = tidyParseString( tdoc, input );         // Parse the input
    if ( rc >= 0 )
        rc = tidyCleanAndRepair( tdoc );             // Tidy it up!
    if ( rc >= 0 )
        rc = tidyRunDiagnostics( tdoc );             // Kvetch
    if ( rc > 1 )                                    // If error, force output.
        rc = ( tidyOptSetBool(tdoc, TidyForceOutput, yes) ? rc : -1 );
    if ( rc >= 0 )
        rc = tidySaveBuffer( tdoc, &output );        // Pretty Print

    if ( rc >= 0 )
    {
        if ( rc > 0 )
            printf( "\nDiagnostics:\n\n%s", errbuf.bp );
        printf( "\nAnd here is the result:\n\n%s", output.bp );
    }
    else
        printf( "A severe error (%d) occurred.\n", rc );

    tidyBufFree( &output );
    tidyBufFree( &errbuf );
    tidyRelease( tdoc );
    return rc;
}
\endcode


\page building_tidy Building Tidy

\section prerequisites Prerequisites

- \b git - <a href="http://git-scm.com/book/en/v2/Getting-Started-Installing-Git">git-scm.com/book/en/v2/Getting-Started-Installing-Git</a>
- \b cmake - <a href="http://www.cmake.org/download/">cmake.org/download/</a>
- Appropriate build tools for the platform

CMake comes in two forms: command line and GUI. Some installations provide only one of them, others both. The build
commands below are for command line use only.

The actual build tools vary by platform, but one of the great features of cmake is that it can generate
various 'native' build files. Running `cmake --help` lists the generators
available on that platform. One of the most common is "Unix Makefiles", which needs
`make` installed, but many other generators are supported.

On Windows, cmake offers generators for various versions of MSVC. Again, only the command line use of MSVC is shown below, but the
tidy solution (*.sln) file can be loaded into the MSVC IDE and built there.
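
As a minimal sketch of the prerequisites above, the following shell snippet reports which tools are on the `PATH` and then prints cmake's help text, which ends with the list of generators available on that platform. The `check_tool` helper is a hypothetical name introduced only for this illustration, and `make` matters only if you intend to use the "Unix Makefiles" generator.

\code{.sh}
# Hypothetical helper: print whether a tool is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_tool git
check_tool cmake
check_tool make    # only needed for the "Unix Makefiles" generator

# List the generators this cmake installation supports;
# they appear at the end of the help text.
if command -v cmake >/dev/null 2>&1; then
    cmake --help
fi
\endcode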

\section get_source Get the source code

Tidy’s source code can be found at <a href="https://github.com/htacg/tidy-html5">github.com/htacg/tidy-html5</a>. There are sometimes
several branches, but in general `master` is the most recently updated version.

\note Because `master` is the “cutting edge,” it may have bugs or other
unstable behavior. If you prefer a stable, officially released version, have a look
at Releases on the GitHub page.

In general you can use the <b>Download ZIP</b> button on the GitHub page to download the most recent version of a branch. If you prefer
Git, you can clone the repository to a working machine with:

\code{.sh}
git clone git@github.com:htacg/tidy-html5.git
\endcode

\section compile Compile

<h4>Enter the `build/cmake` directory</h4>
\code{.sh}
# *nix
cd {your-tidy-html5-directory}/build/cmake

# windows
cd {your-tidy-html5-directory}\build\cmake
\endcode

<h4>Configure the build</h4>
\code{.sh}
# *nix
cmake ../../ [-DCMAKE_INSTALL_PREFIX=/path/for/install]

# windows
cmake ..\..\
\endcode
By default cmake sets the install path to `/usr/local` on Unix.

If you want the binary in, say, `/usr/bin` instead, use `-DCMAKE_INSTALL_PREFIX=/usr`.

On Windows the default install location is `C:\Program Files\tidy5` or `C:\Program Files (x86)\tidy5`, which is not very useful. After
the build, `tidy[n].exe` is in the `Release\` directory and can be copied to any directory in your `PATH` environment variable for global use.

If you need the tidy library built as a 'shared' (DLL) library, add the option `-DBUILD_SHARED_LIB:BOOL=ON` to the cmake command.
This option is `OFF` by default, so the static library is built and linked with the command line tool for convenience.
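
For illustration, the two options described above can be combined in a single configure step. This sketch assumes a Unix-like system, that the current directory is `build/cmake`, and that you want both an install prefix of `/usr` and a shared library; drop either option if you do not.

\code{.sh}
# Assumed: run from {your-tidy-html5-directory}/build/cmake on a Unix-like system.
# Install under /usr (binary in /usr/bin) and build the library as a shared library.
cmake ../../ -DCMAKE_INSTALL_PREFIX=/usr -DBUILD_SHARED_LIB:BOOL=ON
\endcode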


<h4>Compile</h4>
\code{.sh}
# *nix
make

# windows
cmake --build . --config Release
\endcode

<h4>Install</h4>
\code{.sh}
# *nix
[sudo] make install

# windows
cmake --build . --config Release --target INSTALL
\endcode


\page history History

- This repository was originally transferred from w3c.github.com/tidy-html5.

- It first moved to <a href="https://github.com/htacg/tidy-html5">GitHub</a> from <a href="http://tidy.sourceforge.net/">tidy.sourceforge.net</a>.


<p><strong>HTML Tidy</strong> was created by the <a href="http://www.w3.org/">W3C’s</a> own <a href="http://www.w3.org/People/Raggett/">Dave Raggett</a> back in the
dawn of the Internet age. His original web page is still available and
gives a sense of the early history: <a href="http://www.w3.org/People/Raggett/tidy/">Clean up your Web pages with HTML TIDY</a>.</p>

<p>Satisfied with his work, Dave passed the torch to a dedicated group of
maintainers at <a href="http://tidy.sourceforge.net/">tidy.sourceforge.net</a>, where the important tasks of turning
<strong>Tidy</strong> into a C library and keeping up with developing standards were
performed.</p>

<p>W3C members took a renewed interest in <strong>Tidy</strong> in 2011 and forked the
project to <a href="https://github.com/w3c/tidy-html5">GitHub</a> (which now redirects to the new maintainers), where it gained
compatibility with HTML5 via a <a href="https://lists.w3.org/Archives/Public/www-archive/2011Nov/0007.html">key contribution</a> from one of the key SourceForge
members.</p>

<p>In 2015 a group of concerned developers, users, and software integrators formed
<a href="http://www.htacg.org">HTACG</a> with the goal of revitalizing <strong>Tidy</strong>, which had fallen into an
unmaintained state. As a W3C Community Group, HTACG was judged a worthy steward by the
W3C, which passed ownership of its project to HTACG, where it is currently
being developed and prepared for a new, stable, and modern release.</p>

<p>HTACG is also working diligently with the SourceForge maintainers in an effort
to harmonize <strong>HTML Tidy</strong> into a single, stable, solid release once again.</p>

*/