<!doctype html>
<meta charset=utf-8>
<title>HTML Tidy for HTML5 (experimental)</title>
<style type="text/css">
html {
  background: #DDE5D9 url() repeat 0 0;
  font-family: "Lucida Sans Unicode", "Lucida Sans", verdana, arial, helvetica;
}
body {
  border: solid 1px #CED4CA;
  background-color: #FFF;
  padding: 4px 40px 40px 40px;
  margin: 20px 20px 20px 20px;
  padding-right: 20%;
}
h1, h2 {
  color: #0B5B9D;
}
h1 {
  font-size: 39px;
  font-weight: normal;
  vertical-align: top;
  margin-bottom: 0px;
}
a {
  text-decoration: none;
  color: #0B5B9D;
  padding: 2px;
}

a:hover {
  text-decoration: none;
  background-color: #0B5B9D;
  color: white;
}
a:active {
  text-decoration: none;
  background-color: white;
  color: black;
}
#toc {
  position: fixed;
  top: 10px;
  right: 10px;
  border: 2px solid #0B5B9D;
  background: rgba(255,255,255,.9);
  padding: 15px;
  z-index: 999;
  max-height: 400px;
  overflow: auto;
  font-size: 11px;
  font-family: Verdana, sans-serif;
}
#toc-button {
  position:fixed;
  top:10px;
  right:10px;
  background:transparent;
  padding:15px;
  z-index:999;
  max-height:400px;
  overflow:auto;
  font-size:11px;
  font-family:Verdana, sans-serif;
}
#toc .button,
#toc-button .button {
  float: right;
  margin: 0 0 5px 5px;
  padding: 5px;
  border: 1px #008 solid;
  color:#00f;
  background-color:#ccf;
}
#toc ol {
  margin: 0;
  padding: 0;
  font-size: 11px;
  font-family: Verdana, sans-serif;
}
#toc li {
  list-style: decimal outside;
  margin-left: 20px;
  font-size: 11px;
  font-family: Verdana, sans-serif;
}
#toc li a {
  font-size: 11px;
  font-family: Verdana, sans-serif;
}
.hide {
  display: none;
}
.show {
  display: block;
}
code { color: green; font-weight: bold; }
pre { color: green; font-weight: bold; font-family: monospace}
em { font-style: italic; color: rgb(0, 0, 153) }
:link { color: rgb(0, 0, 153) }
:visited { color: rgb(153, 0, 153) }
</style>

<h1 id=intro>HTML Tidy for HTML5 (experimental)</h1>
<p>This page documents the experimental HTML5 fork of HTML Tidy available
at
<a href="https://github.com/w3c/tidy-html5">https://github.com/w3c/tidy-html5</a>.

<p>File bug reports and enhancement requests at
<a href="https://github.com/w3c/tidy-html5/issues">https://github.com/w3c/tidy-html5/issues</a>.</p>

<p>The W3C public mailing list for HTML Tidy discussion is
<b>html-tidy@w3.org</b> (<a href="http://lists.w3.org/Archives/Public/html-tidy/">list archive</a>).

<p>For more information on HTML5:</p>
<ul>
<li>
<a href="http://dev.w3.org/html5/spec-author-view">HTML: Edition for Web Authors</a> (the latest HTML specification)
<li>
<a href="http://dev.w3.org/html5/markup/">HTML: The Markup Language</a> (an HTML language reference)
</ul>
<p>
Validate your HTML documents using the
<a href="http://validator.w3.org/nu/">W3C Nu Markup Validator</a>.

<h2 id=what-tidy-does>What Tidy does</h2>
<p>Tidy corrects and cleans up HTML content by fixing markup errors.
Here are a few examples:
<ul>
<li><b>Mismatched end tags:</b>
<pre>
&lt;h2&gt;subheading&lt;/h3&gt;
</pre>
<p>…is converted to:</p>
<pre>
&lt;h2&gt;subheading&lt;/h2&gt;
</pre></li>
<li><b>Misnested tags:</b>
<pre>
&lt;p&gt;here is a para &lt;b&gt;bold &lt;i&gt;bold italic&lt;/b&gt; bold?&lt;/i&gt; normal?
</pre>
<p>…is converted to:</p>
<pre>
&lt;p&gt;here is a para &lt;b&gt;bold &lt;i&gt;bold italic&lt;/i&gt; bold?&lt;/b&gt; normal?
</pre></li>
<li><b>Missing end tags:</b>
<pre>
&lt;h1&gt;heading
&lt;h2&gt;subheading&lt;/h2&gt;
</pre>
<p>…is converted to:</p>
<pre>
&lt;h1&gt;heading&lt;/h1&gt;
&lt;h2&gt;subheading&lt;/h2&gt;
</pre>
<p>…and</p>
<pre>
&lt;h1&gt;&lt;i&gt;italic heading&lt;/h1&gt;
</pre>
<p>…is converted to:</p>
<pre>
&lt;h1&gt;&lt;i&gt;italic heading&lt;/i&gt;&lt;/h1&gt;
</pre></li>
<li><b>Mixed-up tags</b>
<pre>
&lt;i&gt;&lt;h1&gt;heading&lt;/h1&gt;&lt;/i&gt;
&lt;p&gt;new paragraph &lt;b&gt;bold text
&lt;p&gt;some more bold text
</pre>
<p>…is converted to:</p>
<pre>
&lt;h1&gt;&lt;i&gt;heading&lt;/i&gt;&lt;/h1&gt;
&lt;p&gt;new paragraph &lt;b&gt;bold text&lt;/b&gt;
&lt;p&gt;&lt;b&gt;some more bold text&lt;/b&gt;
</pre></li>
<li><b>Tag in the wrong place:</b>
<pre>
&lt;h1&gt;&lt;hr&gt;heading&lt;/h1&gt;
&lt;h2&gt;sub&lt;hr&gt;heading&lt;/h2&gt;
</pre>
<p>…is converted to:</p>
<pre>
&lt;hr&gt;
&lt;h1&gt;heading&lt;/h1&gt;
&lt;h2&gt;sub&lt;/h2&gt;
&lt;hr&gt;
&lt;h2&gt;heading&lt;/h2&gt;
</pre></li>
<li><b>Missing "/" in end tags:</b>
<pre>
&lt;a href="#refs"&gt;References&lt;a&gt;
</pre>
<p>…is converted to:</p>
<pre>
&lt;a href="#refs"&gt;References&lt;/a&gt;
</pre></li>
<li><b>List markup with missing tags:</b>
<pre>
&lt;body&gt;
&lt;li&gt;1st list item
&lt;li&gt;2nd list item
</pre>
<p>…is converted to:</p>
<pre>
&lt;body&gt;
&lt;ul&gt;
&lt;li&gt;1st list item&lt;/li&gt;
&lt;li&gt;2nd list item&lt;/li&gt;
&lt;/ul&gt;
</pre></li>
<li><b>Missing quotation marks around attribute values</b>
<p>Tidy inserts quotation marks around all attribute values for you. It
can also detect when you have forgotten the closing quotation mark,
although this is something you will have to fix yourself.</p>
</li>
<li><b>Unknown/proprietary attributes</b>
<p>Tidy knows the full set of attributes defined in HTML5, which often
helps you spot where you have mistyped an attribute name.</p>
</li>
<li><b>Tags lacking a terminating "&gt;"</b>
<p>You have to fix this yourself, because Tidy cannot determine where
the "&gt;" was meant to go.</p>
</li>
</ul>

<h2 id="help">How to run Tidy from the command line</h2>
<p>This is the syntax for invoking Tidy from the command line:
<pre>
<code>tidy <em>[[options] filename]*</em></code>
</pre>
<p>
Tidy reads from standard input by default, so if you run Tidy without
specifying the <code><em>filename</em></code> argument, it waits for
input on standard input.
Tidy writes to standard output by default, so you can pipe output
from Tidy to other programs, as well as pipe output from other programs to
Tidy. You can page through the output from Tidy by piping it to a pager:</p>
<pre>
tidy file.html | less
</pre>
<p>
To have Tidy write its output to a file instead, either use the
<code>-o <em>filename</em></code> or <code>-output <em>filename</em></code>
option, or redirect standard output to the file; for example:
<pre>
tidy -o output.html index.html
tidy index.html &gt; output.html
</pre>
<p>Both of those run Tidy on the file <b>index.html</b> and write the
output to the file <b>output.html</b>, while writing any error messages to
standard error.
<p>
Tidy writes its error messages to standard error by default (that is, to
the console where you’re running Tidy). To page through the error messages
along with the output, redirect standard error to standard output, and pipe
the result to your pager:
<pre>
tidy index.html 2&gt;&amp;1 | less
</pre>
<p>
To have Tidy write the errors to a file instead, either use the
<code>-f <em>filename</em></code> or <code>-file <em>filename</em></code>
option, or redirect standard error to a file:</p>
<pre>
tidy -o output.html -f errs.txt index.html
tidy index.html &gt; output.html 2&gt; errs.txt
</pre>
<p>Both of those run Tidy on the file <b>index.html</b> and write the
output to the file <b>output.html</b>, while writing any error messages to
the file <b>errs.txt</b>.
<p>
Writing the error messages to a file is especially useful if the file you
are checking has many errors; reading them from a file instead of the
console or pager can make it easier to review them.
<p>You can use the <code>-m</code> or <code>-modify</code> option to
modify (in place) the contents of the input file you are checking; that is,
to overwrite those contents with the output from Tidy. Example:
<pre>
tidy -f errs.txt -m index.html
</pre>
<p>That runs Tidy on the file <b>index.html</b>, modifying it in place
and writing the error messages to the file <b>errs.txt</b>.
<p>
<b>Caution:</b> If you use the <code>-m</code> option, you should first save a copy of your file.
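<p>The command lines in this section can also be assembled from a script.
The following Python sketch is illustrative only: the helper name
<code>build_tidy_command</code> is hypothetical, and it uses just the
<code>-o</code>, <code>-f</code>, and <code>-m</code> options described
above. It builds an argument list without running Tidy. Treating
<code>-m</code> and <code>-o</code> as mutually exclusive is a choice made
by this sketch, not a documented Tidy rule.</p>

```python
from typing import List, Optional


def build_tidy_command(
    input_file: str,
    output_file: Optional[str] = None,
    error_file: Optional[str] = None,
    modify: bool = False,
) -> List[str]:
    """Return an argument list for the tidy command line (hypothetical helper)."""
    if modify and output_file is not None:
        # Assumption of this sketch: in-place modification and a separate
        # output file are not requested together.
        raise ValueError("use either modify=True or output_file, not both")
    command = ["tidy"]
    if output_file is not None:
        command += ["-o", output_file]   # write cleaned markup to this file
    if error_file is not None:
        command += ["-f", error_file]    # write error messages to this file
    if modify:
        command.append("-m")             # overwrite the input file in place
    command.append(input_file)
    return command


print(build_tidy_command("index.html", output_file="output.html", error_file="errs.txt"))
# prints ['tidy', '-o', 'output.html', '-f', 'errs.txt', 'index.html']
```

<p>Keeping the arguments in a list, rather than joining them into one
string, avoids shell-quoting problems when file names contain spaces.</p>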
<h2 id=options>Options and configuration settings</h2>
<p>To get a list of available options, use:</p>
<pre>
tidy -help
</pre>
<p>To get a list of all configuration settings, use:</p>
<pre>
tidy -help-config
</pre>
<p>To read the help output a page at a time, pipe it to a pager:
<pre>
tidy -help | less
tidy -help-config | less
</pre>
<p>Single-letter options other than <code>-f</code> may be combined; for example:
<pre>
tidy -f errs.txt -imu foo.html
</pre>

<h2 id="config">Using a config file</h2>
<p>The most convenient way to configure Tidy is by using a separate
config file.
Assuming you have created a
Tidy config file named <b>config.txt</b> (the name doesn't matter), you can
instruct Tidy to use it via the command line option
<code>-config config.txt</code>; for example:
<pre>
tidy -config config.txt file1.html file2.html
</pre>
<p>Alternatively, you can name the default config file via the
environment variable named <b>HTML_TIDY</b>, the value of which is
the absolute path for the config file.
<p>You can also set config options on the command line by prefixing
the name of the option with the string "<code>--</code>" (no intervening space);
for example:</p>
<pre>
tidy --break-before-br true --show-warnings false
</pre>
<p>You can find documentation for the full set of configuration options
on the
<a href="quickref.html">Quick Reference</a>
page.

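<p>As a rough illustration of the two mechanisms above, the following
Python sketch turns a set of config settings into <code>--name value</code>
arguments and prepares an environment in which <b>HTML_TIDY</b> holds an
absolute path. The function names <code>config_to_args</code> and
<code>environment_with_config</code> are hypothetical; only the
<code>--</code> prefix and the <b>HTML_TIDY</b> variable come from the text
above.</p>

```python
import os
from typing import Dict, List


def config_to_args(options: Dict[str, str]) -> List[str]:
    """Turn config settings into "--name value" command-line arguments."""
    args: List[str] = []
    for name, value in options.items():
        args += ["--" + name, value]   # "--" directly precedes the option name
    return args


def environment_with_config(config_path: str) -> Dict[str, str]:
    """Return a copy of the environment with HTML_TIDY set to an absolute path."""
    env = dict(os.environ)
    env["HTML_TIDY"] = os.path.abspath(config_path)
    return env


print(config_to_args({"break-before-br": "true", "show-warnings": "false"}))
# prints ['--break-before-br', 'true', '--show-warnings', 'false']
```

<p>The returned environment can be passed to whatever mechanism your
script uses to start Tidy, so that the calling process's own environment
stays unchanged.</p>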
<h2 id=sample-config>Sample config file</h2>
<p>The following is an example of a Tidy config file.</p>
<pre>
// sample config file for HTML tidy
indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xml: no
input-xml: no
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: latin1
new-inline-tags: cfif, cfelse, math, mroot,
  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
  munder, mover, mmultiscripts, msup, msub, mtext,
  mprescripts, mtable, mtr, mtd, mth
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
</pre>

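<p>If you generate config files from a script, the
<code><em>name</em>: <em>value</em></code> layout of the sample above is
easy to produce. The following Python sketch is one possible way to do it;
<code>render_config</code> is a hypothetical helper, and writing booleans as
<code>yes</code>/<code>no</code> and tag lists as comma-separated values
simply mirrors the sample rather than any formal grammar.</p>

```python
from typing import Dict, List, Union

ConfigValue = Union[str, int, bool, List[str]]


def render_config(settings: Dict[str, ConfigValue]) -> str:
    """Render settings as "name: value" lines, one setting per line."""
    lines = []
    for name, value in settings.items():
        if isinstance(value, bool):
            text = "yes" if value else "no"   # booleans as written in the sample
        elif isinstance(value, list):
            text = ", ".join(value)           # tag lists are comma-separated
        else:
            text = str(value)
        lines.append(f"{name}: {text}")
    return "\n".join(lines) + "\n"


sample = {
    "indent": "auto",
    "wrap": 72,
    "markup": True,
    "new-blocklevel-tags": ["cfoutput", "cfquery"],
}
print(render_config(sample), end="")
```

<p>The result can be written to a file and passed to Tidy with the
<code>-config</code> option.</p>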
<h2 id=indenting>Indenting output for readability</h2>
<p>Indenting the source markup of an HTML document makes the markup easier
to read. Tidy can indent the markup for an HTML document while recognizing
elements whose contents should not be indented. In the example below, Tidy
indents the output while preserving the formatting of the &lt;pre&gt;
element:</p>
<p>Input:</p>
<pre>
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Test document&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;This example shows how Tidy can indent output while preserving
formatting of particular elements.&lt;/p&gt;

&lt;pre&gt;This is
&lt;em&gt;genuine
preformatted&lt;/em&gt;
text
&lt;/pre&gt;
&lt;/body&gt;
&lt;/html&gt;

</pre>
<p>Output:</p>
<pre>
&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;Test document&lt;/title&gt;
  &lt;/head&gt;

  &lt;body&gt;
    &lt;p&gt;This example shows how Tidy can indent output while preserving
    formatting of particular elements.&lt;/p&gt;
&lt;pre&gt;
This is
&lt;em&gt;genuine
preformatted&lt;/em&gt;
text
&lt;/pre&gt;
  &lt;/body&gt;
&lt;/html&gt;
</pre>
<p>Tidy’s indenting behavior is not perfect and can sometimes cause your
output to be rendered by browsers in a different way than the input.
You can avoid unexpected indenting-related rendering problems by setting
<code>indent: no</code> or <code>indent: auto</code> in a config file.</p>

<h2 id=preserve-indenting>Preserving original indenting not possible</h2>
<p>Tidy is not capable of preserving the original indenting of the markup
from the input it receives. That’s because Tidy starts by building a clean
parse tree from the input, and that parse tree doesn’t contain any
information about the original indenting. Tidy then pretty-prints the parse
tree using the current config settings. Trying to preserve the original
indenting from the input would interact badly with the repair operations
needed to build a clean parse tree, and would considerably complicate the
code.</p>

<h2 id=encodings>Encodings and character references</h2>
<p>
By default, Tidy assumes you want output to be encoded in UTF-8.
Tidy also offers you a choice of other character encodings: US-ASCII, ISO
Latin-1, and the ISO 2022 family of 7-bit encodings.
<p>
Tidy doesn't yet recognize the use of the HTML &lt;meta&gt; element for
specifying the character encoding.</p>
<p>
The full set of HTML character references is defined. Cleaned-up output
uses named character references for characters when appropriate. Otherwise,
characters outside the normal range are output as numeric character
references.

<h2 id=accessibility>Accessibility</h2>
<p>Tidy offers advice on potential accessibility problems for people using
non-graphical browsers.

<h2 id=presentational-markup>Cleaning up presentational markup</h2>
<p>Some tools generate HTML with presentational elements such as &lt;font&gt;,
&lt;nobr&gt;, and &lt;center&gt;.
Tidy's <code>-clean</code> option will replace those elements with CSS style
properties.
<p>Some HTML documents rely on the presentational effects of &lt;p&gt; start
tags that are not followed by any content. Tidy deletes such &lt;p&gt; tags
(as well as any headings that don’t have content). So do not use &lt;p&gt;
tags simply for adding vertical whitespace; instead use CSS, or the
&lt;br&gt; element. However, note that Tidy won’t discard &lt;p&gt; tags that
are followed by any nonbreaking space (that is, the &amp;nbsp; named
character reference).

<h2 id=new-tags>Teaching Tidy about new tags</h2>
<p>You can teach Tidy about new tags by declaring them in the
configuration file. The syntax is:</p>
<pre>
new-inline-tags: <em>tag1, tag2, tag3</em>
new-empty-tags: <em>tag1, tag2, tag3</em>
new-blocklevel-tags: <em>tag1, tag2, tag3</em>
new-pre-tags: <em>tag1, tag2, tag3</em>
</pre>
<p>These declarations can be combined: the same tag can be declared as
both empty and inline, or as both empty and block-level. But you are not
advised to declare tags as being both inline and block-level.</p>
<p>Note that new tags can appear only where Tidy expects inline
or block-level tags, respectively. That means you can’t place
new tags within the document head or in other contexts with restricted
content models.

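<p>The advice against declaring a tag as both inline and block-level can
be checked before a config file is written. The following Python sketch
shows one way to do that; <code>conflicting_tags</code> is a hypothetical
helper, and the dictionary layout is an assumption of this example, not
something Tidy defines.</p>

```python
from typing import Dict, List, Set


def conflicting_tags(declarations: Dict[str, List[str]]) -> Set[str]:
    """Return tags declared as both inline and block-level."""
    inline = set(declarations.get("new-inline-tags", []))
    block = set(declarations.get("new-blocklevel-tags", []))
    return inline & block


declarations = {
    "new-inline-tags": ["cfif", "cfelse"],
    "new-blocklevel-tags": ["cfoutput", "cfquery"],
    "new-empty-tags": ["cfelse"],   # empty plus inline is an allowed combination
}
print(conflicting_tags(declarations))   # prints set()
```

<p>An empty result means that no tag appears in both lists, so the
declarations follow the advice above.</p>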
<h2 id=php-asp-jste>Ignoring PHP, ASP, and JSTE instructions</h2>
<p>Tidy will gracefully ignore many cases of PHP, ASP, and JSTE
instructions within element content and as replacements for attributes,
and preserve them as-is in output; for example:</p>
<pre>
&lt;option &lt;% if rsSchool.Fields("ID").Value
= session("sessSchoolID")
then Response.Write("selected") %&gt;
value='&lt;%=rsSchool.Fields("ID").Value%&gt;'&gt;
&lt;%=rsSchool.Fields("Name").Value%&gt;
(&lt;%=rsSchool.Fields("ID").Value%&gt;)
&lt;/option&gt;
</pre>
<p>But note that Tidy may report missing attributes when those are “hidden”
within the PHP, ASP, or JSTE code. If you use PHP, ASP, or JSTE code to
create a start tag, but place the end tag explicitly in the HTML markup, Tidy
won’t be able to match them up, and will delete the end tag. So in that
case you are advised to make the start tag explicit and to use PHP, ASP, or
JSTE code for just the attributes; for example:</p>
<pre>
&lt;a href="&lt;%=random.site()%&gt;"&gt;do you feel lucky?&lt;/a&gt;
</pre>
<p>
Tidy can also get things wrong if the PHP, ASP, or JSTE code includes
quotation marks; for example:
</p>
<pre>
value="&lt;%=rsSchool.Fields("ID").Value%&gt;"
</pre>
<p>Tidy will see the quotation mark preceding <i>ID</i> as ending the
attribute value, and proceed to complain about what follows.
<p>Tidy allows you to control whether line wrapping on spaces within
PHP, ASP, and JSTE
instructions is enabled; see the <b>wrap-php</b>, <b>wrap-asp</b>,
and <b>wrap-jste</b> config options.</p>

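<p>The quotation-mark problem described in this section can be spotted
before running Tidy. The following Python sketch is a rough heuristic, not
a parser: it looks only for double-quoted attribute values that consist of a
single <code>&lt;% … %&gt;</code> block and reports those that contain a
double quotation mark. The names <code>EMBEDDED_BLOCK</code> and
<code>risky_attribute_values</code> are hypothetical.</p>

```python
import re
from typing import List

# Matches a double-quoted attribute value that consists of one <% ... %> block.
EMBEDDED_BLOCK = re.compile(r'="(<%.*?%>)"', re.DOTALL)


def risky_attribute_values(markup: str) -> List[str]:
    """Return <% ... %> attribute values that contain a double quotation mark."""
    return [block for block in EMBEDDED_BLOCK.findall(markup) if '"' in block]


risky = 'value="<%=rsSchool.Fields("ID").Value%>"'
safe = '<a href="<%=random.site()%>">do you feel lucky?</a>'
print(risky_attribute_values(risky))   # one match: the block containing "ID"
print(risky_attribute_values(safe))    # prints []
```

<p>Values flagged this way can be rewritten to use single quotation marks
around the attribute value, as in the first example of this section.</p>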
<h2 id=xml>Correcting well-formedness errors in XML markup</h2>
<p>Tidy can help you to correct well-formedness errors in XML markup. Tidy
doesn't yet recognize all XML features, though; for example, it doesn't
understand CDATA sections or DTD subsets.</p>

<h2 id="scripts">Using Tidy from scripts</h2>
<p>If you run Tidy from Perl or another scripting language,
you may want to inspect the exit status that Tidy returns
when it exits: 0 if everything is fine, 1 if there were warnings,
and 2 if there were errors. This is an example using Perl:</p>
<pre>
# TIDY is assumed to be a filehandle opened on a pipe to the tidy command.
if (close(TIDY) == 0) {
  # close() returns false when tidy exits with a nonzero status
  my $exitcode = $? &gt;&gt; 8;
  if ($exitcode == 1) {
    print STDERR "tidy issued warning messages\n";
  } elsif ($exitcode == 2) {
    print STDERR "tidy issued error messages\n";
  } else {
    die "tidy exited with code: $exitcode\n";
  }
} else {
  print STDERR "tidy detected no errors\n";
}
</pre>

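<p>The same check can be written in other scripting languages. The
following Python sketch maps the exit status described above to a short
message; the function names are hypothetical, and <code>run_tidy</code>
assumes that an executable named <code>tidy</code> is on your
<code>PATH</code>.</p>

```python
import subprocess
from typing import List


def describe_exit_code(code: int) -> str:
    """Map Tidy's exit status to a short description."""
    if code == 0:
        return "no errors or warnings"
    if code == 1:
        return "warnings"
    if code == 2:
        return "errors"
    return f"unexpected exit code {code}"


def run_tidy(arguments: List[str]) -> str:
    """Run tidy with the given arguments and describe how it exited."""
    # Assumes a "tidy" executable is on PATH; raises FileNotFoundError otherwise.
    result = subprocess.run(["tidy", *arguments], capture_output=True, text=True)
    return describe_exit_code(result.returncode)


print(describe_exit_code(1))   # prints warnings
```

<p>For example, <code>run_tidy(["-f", "errs.txt", "-m", "index.html"])</code>
would run the in-place example shown earlier and report how Tidy
exited.</p>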
<h2 id="implementation">Source code</h2>
<p>The source code for the experimental HTML5 fork of Tidy can be found at
<a href="https://github.com/w3c/tidy-html5">https://github.com/w3c/tidy-html5</a>.

<h2 id=acks>Acknowledgements</h2>
<p>Dave Raggett maintains a list of
<a href="http://www.w3.org/People/Raggett/tidy/#acks">Acknowledgements</a>
for the people who made suggestions or reported bugs for the
original version of Tidy.

<div id=toc-button style="">
<a class=button href="
javascript:document.getElementById('toc').className = 'show';
document.getElementById('toc-button').className = 'hide';">Show TOC</a>
</div>
<div id=toc class=hide>
<a class=button href="
javascript:document.getElementById('toc').className = 'hide';
document.getElementById('toc-button').className = 'show';">Close</a>
<ol>
<li><a href="#what-tidy-does">What Tidy does</a>
<li><a href="#help">How to run Tidy from the command line</a>
<li><a href="#options">Options and configuration settings</a>
<li><a href="#config">Using a config file</a>
<li><a href="#sample-config">Sample config file</a>
<li><a href="#indenting">Indenting output for readability</a>
<li><a href="#preserve-indenting">Preserving original indenting not possible</a>
<li><a href="#encodings">Encodings and character references</a>
<li><a href="#accessibility">Accessibility</a>
<li><a href="#presentational-markup">Cleaning up presentational markup</a>
<li><a href="#new-tags">Teaching Tidy about new tags</a>
<li><a href="#php-asp-jste">Ignoring PHP, ASP, and JSTE instructions</a>
<li><a href="#xml">Correcting well-formedness errors in XML markup</a>
<li><a href="#scripts">Using Tidy from scripts</a>
<li><a href="#implementation">Source code</a>
<li><a href="#acks">Acknowledgements</a>
</ol>
</div>