<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<meta name="generator" content="HTML Tidy, see www.w3.org" />
|
|
<title>Clean up your Web pages with HTML TIDY</title>
|
|
<meta name="keywords"
|
|
content="HTML, validation, error correction, pretty-printing" />
|
|
<meta name="author" content="Dave Raggett &lt;dsr@w3.org&gt;" />
|
|
<style type="text/css">
|
|
body {
|
|
margin-left: 10%;
|
|
margin-right: 10%;
|
|
font-family: sans-serif
|
|
}
|
|
h1 { margin-left: -8% }
|
|
h2,h3 { margin-left: -4% }
|
|
pre { color: green; font-weight: bold; font-family: monospace}
|
|
em { font-style: italic; color: rgb(0, 0, 153) }
|
|
strong { text-transform: uppercase; font-weight: bold }
|
|
.note {font-style: italic; color: rgb(192, 101, 101) }
|
|
/* hr {text-align: center; width: 60% } */
|
|
blockquote {
|
|
color: navy;
|
|
font-family: "Comic Sans MS", "Times New Roman", serif
|
|
}
|
|
blockquote.people { text-align: center; }
|
|
p.splash { color: maroon}
|
|
div h4 {margin-left: 3%}
|
|
div p {margin-left: 5%}
|
|
table {
|
|
font-family: sans-serif;
|
|
font-size: 80%;
|
|
background: rgb(255,255,153)
|
|
}
|
|
td {
|
|
font-size: 80%
|
|
}
|
|
.people {font-family: "Lucida Calligraphy", serif}
|
|
:link { color: rgb(0, 0, 153) }
|
|
:visited { color: rgb(153, 0, 153) }
|
|
:active { color: rgb(255, 0, 102) }
|
|
a:hover { color: rgb(0, 0, 255) }
|
|
</style>
|
|
|
|
<style type="text/css">
|
|
blockquote.c9 {font-style: italic}
|
|
span.c8 {color: maroon}
|
|
p.c7 {font-style: italic}
|
|
a.c6 {font-weight: bold}
|
|
div.c5 {text-align: center}
|
|
hr.c4 {text-align: center}
|
|
p.c3 {text-align: center}
|
|
p.c2 {font-weight: bold; text-align: center}
|
|
h1.c1 {text-align: center}
|
|
</style>
|
|
|
|
<style type="text/css">
|
|
p.c1 {font-weight: bold}
|
|
</style>
|
|
</head>
|
|
<body bgcolor="#FFFFFF" background="grid.gif" text="black"
|
|
link="navy" vlink="black" alink="red">
|
|
<h1 class="c1"><img src="tidy.gif" width="32" height="32"
|
|
align="top" alt="icon" /> Clean up your Web pages<br />
|
|
with HTML TIDY</h1>
|
|
|
|
<p class="c2">This version 4th August 2000</p>
|
|
|
|
<p class="c3"><small>Copyright © 1998-2000 <a
|
|
href="http://www.w3.org/">W3C</a>, see <a
|
|
href="tidy.c">tidy.c</a> for copyright notice.</small></p>
|
|
|
|
<blockquote>With many thanks to <a
|
|
href="http://www.hp.com/">Hewlett Packard</a> for financial
|
|
support during the development of this software!</blockquote>
|
|
|
|
<hr width="80%" class="c4" />
|
|
<p class="c3"><a href="#help">How to use Tidy</a> | <a
|
|
href="#download">Downloading Tidy</a> | <a
|
|
href="release-notes.html">Release Notes</a><br />
|
|
<a href="#quotes">Integration with other Software</a> | <a
|
|
href="#acks">Acknowledgements</a></p>
|
|
|
|
<hr width="80%" class="c4" />
|
|
<p>To get the latest version of Tidy please visit the original
|
|
version of this page at: <a
|
|
href="http://www.w3.org/People/Raggett/tidy/">http://www.w3.org/People/Raggett/tidy/</a>.
|
|
Courtesy of Netmind, you can register for email reminders when
|
|
new versions of tidy become available.</p>
|
|
|
|
<form method="get"
|
|
action="http://www.netmind.com/cgi-bin/uncgi/url-mind">
|
|
<div class="c5"><input type="submit"
|
|
value="Press Here to Register" /></div>
|
|
</form>
|
|
|
|
<p>The public email list devoted to HTML Tidy is: &lt;<a
href="mailto:html-tidy@w3.org">html-tidy@w3.org</a>&gt;. To
|
|
subscribe send an email to html-tidy-request@w3.org with the word
|
|
subscribe in the subject line (include the word unsubscribe if
|
|
you want to unsubscribe). The <a
|
|
href="http://lists.w3.org/Archives/Public/html-tidy/">archive</a>
|
|
for this list is accessible online. Please use this list to
|
|
report errors or enhancement requests. See the <a
|
|
href="release-notes.html" class="c6">release notes</a> for
|
|
information on recent changes. Your feedback is welcome!</p>
|
|
|
|
<p>If you find HTML Tidy useful and you would like to say thanks,
|
|
then please send me a (paper) postcard or other souvenir from the
|
|
area in which you live along with a few words on what you are
|
|
using Tidy for. It will be fun to map out where Tidy users are to
|
|
be found! My <a href="#address">postal address</a> is given at
|
|
the end of this file.</p>
|
|
|
|
<h3>Tutorials for HTML and CSS</h3>
|
|
|
|
<p>If you are just starting off and would like to know more about
|
|
how to author Web pages, you may find my <a
|
|
href="http://www.w3.org/MarkUp/Guide/">guide to HTML and CSS</a>
|
|
helpful. Please send me feedback on this, and I will do my best
|
|
to further improve it.</p>
|
|
|
|
<h4>Support for Word2000</h4>
|
|
|
|
<p>Tidy can now perform wonders on HTML saved from Microsoft Word
|
|
2000! Word bulks out HTML files with stuff for round-tripping
|
|
presentation between HTML and Word. If you are more concerned
|
|
about using HTML on the Web, check out Tidy's "<a
href="#word2000">Word-2000</a>" config option! Of course Tidy
does a good job on Word'97 files as well!</p>
|
|
|
|
<h3>Introduction to TIDY</h3>
|
|
|
|
<p>When editing HTML it's easy to make mistakes. Wouldn't it be
|
|
nice if there was a simple way to fix these mistakes
|
|
automatically and tidy up sloppy editing into nicely laid out
|
|
markup? Well now there is! Dave Raggett's HTML TIDY is a free
|
|
utility for doing just that. It also works great on the
|
|
atrociously hard to read markup generated by specialized HTML
|
|
editors and conversion tools, and can help you identify where you
|
|
need to pay further attention on making your pages more
|
|
accessible to people with disabilities.</p>
|
|
|
|
<p>Tidy is able to fix up a wide range of problems and to bring
|
|
to your attention things that you need to work on yourself. Each
|
|
item found is listed with the line number and column so that you
|
|
can see where the problem lies in your markup. Tidy won't
|
|
generate a cleaned up version when there are problems that it
|
|
can't be sure of how to handle. These are logged as "errors"
|
|
rather than "warnings".</p>
|
|
|
|
<p class="c7">Tidy features in a <a
|
|
href="http://webreview.com/wr/pub/1999/07/16/feature/index.html">recent
|
|
article on XHTML</a> by webreview.com.</p>
|
|
|
|
<!-- is the final "index.html" needed or appropriate? -->
|
|
<h3>Examples of TIDY at work</h3>
|
|
|
|
<p>Tidy corrects the markup in a way that matches where possible
|
|
the observed rendering in popular browsers from Netscape and
|
|
Microsoft. Here are just a few examples of how TIDY perfects your
|
|
HTML for you:</p>
|
|
|
|
<ul>
|
|
<li><b>Missing or mismatched end tags are detected and
|
|
corrected</b>
|
|
|
|
<pre>
&lt;h1&gt;heading
&lt;h2&gt;subheading&lt;/h3&gt;
</pre>

<p>is mapped to</p>

<pre>
&lt;h1&gt;heading&lt;/h1&gt;
&lt;h2&gt;subheading&lt;/h2&gt;
</pre>
|
|
</li>
|
|
|
|
<li><b>End tags in the wrong order are corrected:</b>
|
|
|
|
<pre>
&lt;p&gt;here is a para &lt;b&gt;bold &lt;i&gt;bold italic&lt;/b&gt; bold?&lt;/i&gt; normal?
</pre>

<p>is mapped to</p>

<pre>
&lt;p&gt;here is a para &lt;b&gt;bold &lt;i&gt;bold italic&lt;/i&gt; bold?&lt;/b&gt; normal?
</pre>
|
|
</li>
|
|
|
|
<li><b>Fixes problems with heading emphasis</b>
|
|
|
|
<pre>
&lt;h1&gt;&lt;i&gt;italic heading&lt;/h1&gt;
&lt;p&gt;new paragraph
</pre>

<p>In Netscape and Internet Explorer this causes everything
following the heading to be in the heading font size, not the
desired effect at all!</p>

<p>Tidy maps the example to</p>

<pre>
&lt;h1&gt;&lt;i&gt;italic heading&lt;/i&gt;&lt;/h1&gt;
&lt;p&gt;new paragraph
</pre>
|
|
</li>
|
|
|
|
<li><b>Recovers from mixed up tags</b>
|
|
|
|
<pre>
&lt;i&gt;&lt;h1&gt;heading&lt;/h1&gt;&lt;/i&gt;
&lt;p&gt;new paragraph &lt;b&gt;bold text
&lt;p&gt;some more bold text
</pre>

<p>Tidy maps this to</p>

<pre>
&lt;h1&gt;&lt;i&gt;heading&lt;/i&gt;&lt;/h1&gt;
&lt;p&gt;new paragraph &lt;b&gt;bold text&lt;/b&gt;
&lt;p&gt;&lt;b&gt;some more bold text&lt;/b&gt;
</pre>
|
|
</li>
|
|
|
|
<li><b>Getting the &lt;hr&gt; in the right place:</b>

<pre>
&lt;h1&gt;&lt;hr&gt;heading&lt;/h1&gt;
&lt;h2&gt;sub&lt;hr&gt;heading&lt;/h2&gt;
</pre>

<p>Tidy maps this to</p>

<pre>
&lt;hr&gt;
&lt;h1&gt;heading&lt;/h1&gt;
&lt;h2&gt;sub&lt;/h2&gt;
&lt;hr&gt;
&lt;h2&gt;heading&lt;/h2&gt;
</pre>
|
|
</li>
|
|
|
|
<li><b>Adding the missing "/" in end tags for anchors:</b>
|
|
|
|
<pre>
&lt;a href="#refs"&gt;References&lt;a&gt;
</pre>

<p>Tidy maps this to</p>

<pre>
&lt;a href="#refs"&gt;References&lt;/a&gt;
</pre>
|
|
</li>
|
|
|
|
<li><b>Perfecting lists by putting in tags missed out:</b>
|
|
|
|
<pre>
&lt;body&gt;
&lt;li&gt;1st list item
&lt;li&gt;2nd list item
</pre>

<p>is mapped to</p>

<pre>
&lt;body&gt;
&lt;ul&gt;
&lt;li&gt;1st list item&lt;/li&gt;
&lt;li&gt;2nd list item&lt;/li&gt;
&lt;/ul&gt;
</pre>
|
|
</li>
|
|
|
|
<li><b>Missing quotes around attribute values are added</b>
|
|
|
|
<p>Tidy inserts quote marks around all attribute values for you.
|
|
It can also detect when you have forgotten the closing quote
|
|
mark, although this is something you will have to fix
|
|
yourself.</p>
|
|
</li>
|
|
|
|
<li><b>Unknown/Proprietary attributes are reported</b>
|
|
|
|
<p>Tidy has a comprehensive knowledge of the attributes defined
|
|
in the HTML 4.0 recommendation from W3C. This often allows you to
|
|
spot where you have mistyped an attribute or value.</p>
|
|
</li>
|
|
|
|
<li><b>Proprietary elements are recognized and reported as
|
|
such.</b>
|
|
|
|
<p>Tidy will even work out which version of HTML you are using
|
|
and insert the appropriate DOCTYPE element, as per the W3C
|
|
recommendations.</p>
|
|
</li>
|
|
|
|
<li><b>Tags lacking a terminating '>' are spotted</b>
|
|
|
|
<p>This is something you then have to fix yourself as Tidy is
|
|
unsure of where the > should be inserted.</p>
|
|
</li>
|
|
</ul>
|
|
|
|
<h3>Layout style</h3>
|
|
|
|
<p>You can choose which style you want Tidy to use when it
|
|
generates the cleaned up markup: for instance whether you like
|
|
elements to indent their contents or not. Several people have
|
|
asked if Tidy could preserve the original layout. I am sorry to
|
|
say that this would be very hard to support due to the way Tidy
|
|
is implemented. Tidy starts by building a clean parse tree from
|
|
the source file. The parse tree doesn't contain any information
|
|
about the original layout. Tidy then pretty prints the parse tree
|
|
using the current layout options. Trying to preserve the original
|
|
layout would interact badly with the repair operations needed to
|
|
build a clean parse tree and considerably complicate the
|
|
code.</p>
|
|
|
|
<p>Some browsers can screw up the right alignment of text
depending on how you lay out headings. As an example,
consider:</p>

<pre>
&lt;h1 align="right"&gt;
Heading
&lt;/h1&gt;

&lt;h1 align="right"&gt;Heading&lt;/h1&gt;
</pre>
|
|
|
|
<p>Both of these should be rendered the same. Sadly a common
|
|
browser bug fails to trim trailing whitespace and misaligns the
|
|
first heading. HTML Tidy will protect you from this bug, except
|
|
when you set the indent option to "yes".</p>
|
|
|
|
<p>Setting the indent option to yes can also cause problems with
|
|
table layout for some browsers:</p>
|
|
|
|
<pre>
&lt;td&gt;&lt;img src="foo.gif"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="foo.gif"&gt;&lt;/td&gt;
</pre>

<p>will look slightly different from:</p>

<pre>
&lt;td&gt;
  &lt;img src="foo.gif"&gt;
&lt;/td&gt;
&lt;td&gt;
  &lt;img src="foo.gif"&gt;
&lt;/td&gt;
</pre>
|
|
|
|
<p>You can avoid such quirks by using indent: no or
|
|
indent: auto in the config file.</p>
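
<p>As a minimal sketch, a config file along the following lines
should keep the pretty printed output readable without triggering
these browser quirks. It uses only the indent and indent-spaces
options described in the configuration file section of this page;
the values are illustrative rather than recommended settings:</p>

<pre>
indent: auto
indent-spaces: 2
</pre>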
|
|
|
|
<h3>Internationalization issues</h3>
|
|
|
|
<p>Tidy offers you a choice of character encodings: US ASCII, ISO
|
|
Latin-1, UTF-8 and the ISO 2022 family of 7 bit encodings. The
|
|
full set of HTML 4.0 entities is defined. Cleaned up output uses
|
|
HTML entity names for characters when appropriate. Otherwise
|
|
characters outside the normal range are output as numeric
|
|
character entities. Tidy defaults to assuming you want the output
|
|
to be in US ASCII. Tidy doesn't yet recognize the use of the HTML
|
|
meta element for specifying the character encoding.</p>
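
<p>Since Tidy does not pick up the encoding from the meta element,
you need to state it yourself. A sketch, assuming your files are
encoded as UTF-8 and using the char-encoding option listed in the
configuration file section:</p>

<pre>
char-encoding: utf8
</pre>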
|
|
|
|
<h3>Accessibility</h3>
|
|
|
|
<p>Tidy offers advice on accessibility problems for people using
|
|
non-graphical browsers. The most common thing you will see is the
|
|
suggestion you add a summary attribute to table elements. The
|
|
idea is to provide a summary of the table's role and structure
|
|
suitable for use with aural browsers.</p>
|
|
|
|
<h3>Cleaning up presentational markup</h3>
|
|
|
|
<p>Many tools generate HTML with an excess of FONT, NOBR and
|
|
CENTER tags. Tidy's <em>-clean</em> option will replace them by
|
|
style properties and rules using CSS. This makes the markup
|
|
easier to read and maintain as well as reducing the file size!
|
|
Tidy is expected to get smarter at this in the future.</p>
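
<p>For example, a config file that enables this clean up might look
like the sketch below. Both options are taken from the list in the
configuration file section; whether you want logical-emphasis as
well as clean is a matter of taste, so treat the combination as an
assumption:</p>

<pre>
clean: yes
logical-emphasis: yes
</pre>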
|
|
|
|
<p>Some pages rely on the presentation effects of isolated
&lt;p&gt; or &lt;/p&gt; tags. Tidy deletes empty paragraph and
heading elements etc. The use of empty paragraph elements is not
recommended for adding vertical whitespace. Instead use style
sheets, or the &lt;br&gt; element. Tidy won't discard paragraphs
only containing a nonbreaking space &amp;nbsp;.</p>
|
|
|
|
<h3>Teaching Tidy about new tags!</h3>
|
|
|
|
<p>You can teach Tidy about new tags by declaring them in the
|
|
configuration file, the syntax is:</p>
|
|
|
|
<pre>
|
|
new-inline-tags: <em>tag1, tag2, tag3</em>
|
|
new-empty-tags: <em>tag1, tag2, tag3</em>
|
|
new-blocklevel-tags: <em>tag1, tag2, tag3</em>
|
|
new-pre-tags: <em>tag1, tag2, tag3</em>
|
|
</pre>
|
|
|
|
<p>The same tag can be defined as empty and as inline or as empty
|
|
and as block.</p>
|
|
|
|
<p>These declarations can be combined to define a new empty
|
|
inline or empty block element, but you are not advised to declare
|
|
tags as being both inline and block!</p>
|
|
|
|
<p>Note that the new tags can only appear where Tidy expects
|
|
inline or block-level tags respectively. This means you can't
|
|
(yet) place new tags within the document head or other contexts
|
|
with restricted content models. So far the most popular use of
|
|
this feature is to allow Tidy to be applied to Cold Fusion
|
|
files.</p>
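
<p>As a sketch of what this might look like for Cold Fusion files,
the config below declares a handful of tags. The tag names, and the
choice of block-level, inline or empty for each, are assumptions
made for illustration; adjust them to the tags your pages actually
use:</p>

<pre>
new-blocklevel-tags: cfoutput, cfquery
new-inline-tags: cfif, cfelse
new-empty-tags: cfelse
</pre>

<p>Here cfelse is declared as both empty and inline, which is the
kind of combination described above.</p>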
|
|
|
|
<p class="c7">I am working on ways to make it easy to customize
|
|
the permitted document syntax using <a
|
|
href="http://www.w3.org/People/Raggett/dtdgen/Docs/">assertion
|
|
grammars</a>, and hope to apply this to a much smarter version of
|
|
Tidy for release later this year or early next year.</p>
|
|
|
|
<h3>Limited support for ASP, JSTE and PHP</h3>
|
|
|
|
<p>Tidy is somewhat aware of the preprocessing language called
ASP which uses a pseudo element syntax &lt;% ... %&gt;
to include preprocessor directives. ASP is normally interpreted
by the web server before delivery to the browser. JSTE shares the
same syntax, but sometimes also uses &lt;# ... #&gt;.
Tidy can also cope with another such language called PHP, which
uses the syntax &lt;?php ... ?&gt;</p>
|
|
|
|
<p>Tidy will cope with ASP, JSTE and PHP pseudo elements within
|
|
element content and as replacements for attributes, for
|
|
example:</p>
|
|
|
|
<pre>
&lt;option &lt;% if rsSchool.Fields("ID").Value
= session("sessSchoolID")
then Response.Write("selected") %&gt;
value='&lt;%=rsSchool.Fields("ID").Value%&gt;'&gt;
&lt;%=rsSchool.Fields("Name").Value%&gt;
(&lt;%=rsSchool.Fields("ID").Value%&gt;)
&lt;/option&gt;
</pre>
|
|
|
|
<p>Note that Tidy doesn't understand the scripting language used
|
|
within pseudo elements and attributes, and can easily get
|
|
confused. Tidy may report missing attributes when these are
|
|
hidden within preprocessor code. Tidy can also get things wrong
|
|
if the code includes quote marks, e.g. if the example above is
|
|
changed to:</p>
|
|
|
|
<pre>
value="&lt;%=rsSchool.Fields("ID").Value%&gt;"
</pre>
|
|
|
|
<p>Tidy will now see the quote mark preceding ID as ending the
|
|
attribute value, and proceed to complain about what follows. Note
|
|
you can choose whether to allow line wrapping on spaces within
|
|
pseudo elements or not using the <tt>wrap-asp</tt> option. If you
|
|
used ASP, JSTE or PHP to create a start tag, but placed the end
|
|
tag explicitly in the markup, Tidy won't be able to match them
|
|
up, and will delete the end tag for you. So in this case you are
|
|
advised to make the start tag explicit and to use ASP, JSTE or PHP
|
|
for just the attributes, e.g.</p>
|
|
|
|
<pre>
&lt;a href="&lt;%=random.site()%&gt;"&gt;do you feel lucky?&lt;/a&gt;
</pre>
|
|
|
|
<p>Tidy allows you to control whether line wrapping is enabled
|
|
for ASP, JSTE and PHP instructions, see the wrap-asp, wrap-jste
|
|
and wrap-php config options, respectively.</p>
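
<p>A sketch of a config file that turns wrapping off inside all
three kinds of pseudo element, using the option names given above
(each defaults to <em>yes</em> according to the option list in the
configuration file section):</p>

<pre>
wrap-asp: no
wrap-jste: no
wrap-php: no
</pre>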
|
|
|
|
<p>I regret that Tidy does <b>not</b> support Tango preprocessing
|
|
instructions which look like:</p>
|
|
|
|
<pre>
&lt;@if variable_1='a'&gt;
do something
&lt;@else&gt;
do nothing
&lt;/@if&gt;

&lt;@include &lt;@cgi&gt;&lt;@appfilepath&gt;includes/message.html&gt;
</pre>
|
|
|
|
<p>Tidy supports another preprocessing syntax called "Tango", but
|
|
only for attribute values. Adding support for pseudo elements
|
|
written in Tango looks as if it would be quite tough, so I would
|
|
like to gauge the level of interest before committing to this
|
|
work.</p>
|
|
|
|
<h3>Limited support for XML</h3>
|
|
|
|
<p>XML processors compliant with W3C's XML 1.0 recommendation are
|
|
very picky about which files they will accept. Tidy can help you
|
|
to fix errors that cause your XML files to be rejected. Tidy
|
|
doesn't yet recognize all XML features though, e.g. it doesn't
|
|
understand CDATA sections or DTD subsets.</p>
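
<p>A minimal sketch of a config for repairing XML files, assuming
the input-xml and output-xml options listed in the configuration
file section are the relevant ones for your documents:</p>

<pre>
input-xml: yes
output-xml: yes
</pre>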
|
|
|
|
<h3>Creating Slides</h3>
|
|
|
|
<p>The <em>-slides</em> option allows you to burst a single HTML
|
|
file into a number of linked slides. Each H2 element in the input
|
|
file is treated as delimiting the start of the next slide. The
|
|
slides are named slide1.html, slide2.html, slide3.html etc. This
|
|
is a relatively new feature and ideas are welcomed as to how to
|
|
improve it. In particular, I plan to add support to the
|
|
configuration file for setting the style sheet for slides and for
|
|
customizing the slides via a template.</p>
|
|
|
|
<p>I would be interested in hearing from anyone who can offer
|
|
help with using JavaScript for adding dynamic effects to slides,
|
|
for instance similar to those available in Microsoft
|
|
PowerPoint.</p>
|
|
|
|
<h3>Indenting text for a better layout</h3>
|
|
|
|
<p>Indenting the content of elements makes the markup easier to
|
|
read. Tidy can do this for all elements or just for those where
|
|
it's needed. The auto-indent mode has been used below to avoid
|
|
indenting the content of title, p and li elements:</p>
|
|
|
|
<pre>
&lt;html&gt;
  &lt;head&gt;
    &lt;title&gt;Test document&lt;/title&gt;
  &lt;/head&gt;

  &lt;body&gt;
    &lt;p&gt;para which has enough text to cause a line break,
    and so test the wrapping mechanism for long lines.&lt;/p&gt;
&lt;pre&gt;
This is
&lt;em&gt;genuine
preformatted&lt;/em&gt;
text
&lt;/pre&gt;

    &lt;ul&gt;
      &lt;li&gt;1st list item&lt;/li&gt;

      &lt;li&gt;2nd list item&lt;/li&gt;
    &lt;/ul&gt;
    &lt;!-- end comment --&gt;
  &lt;/body&gt;
&lt;/html&gt;
</pre>
|
|
|
|
<p>Indenting the content does increase the size of the file, so
|
|
you may prefer Tidy's default style:</p>
|
|
|
|
<pre>
&lt;html&gt;
&lt;head&gt;
&lt;title&gt;Test document&lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
&lt;p&gt;para which has enough text to cause a line break,
and so test the wrapping mechanism for long lines.&lt;/p&gt;

&lt;pre&gt;This is
&lt;em&gt;genuine
preformatted&lt;/em&gt;
text
&lt;/pre&gt;

&lt;ul&gt;
&lt;li&gt;1st list item &lt;/li&gt;

&lt;li&gt;2nd list item&lt;/li&gt;
&lt;/ul&gt;

&lt;!-- end comment --&gt;
&lt;/body&gt;
&lt;/html&gt;

</pre>
|
|
|
|
<h3><a id="help" name="help">How to run tidy</a></h3>
|
|
|
|
<pre>
|
|
<span class="c8">tidy</span> <em>[[options] filename]*</em>
|
|
</pre>
|
|
|
|
<p>HTML tidy is not (yet) a Windows program. If you run tidy
|
|
without any arguments, it will just sit there waiting to read
|
|
markup on the stdin stream. Tidy's input and output default to
|
|
stdin and stdout respectively. Errors are written to stderr but
|
|
can be redirected to a file with the -f <em>filename</em>
|
|
option.</p>
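
<p>For example, on a shell that supports redirection, something
like the following should read index.html on stdin, write the
tidied markup to a new file via stdout, and collect the error
messages in errs.txt. The file names are placeholders:</p>

<pre>
tidy -f errs.txt &lt; index.html &gt; tidied.html
</pre>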
|
|
|
|
<p>I generally use the -m option to get tidy to update the
|
|
original file, and if the file is particularly bad I also use the
|
|
-f option to write the errors to a file to make it easier to
|
|
review them. Tidy supports a small set of character encoding
|
|
options. The default is ASCII, which makes it easy to edit markup
|
|
in regular text editors.</p>
|
|
|
|
<p>For instance:</p>
|
|
|
|
<pre>
|
|
tidy -f errs.txt -m index.html
|
|
</pre>
|
|
|
|
<p>which runs tidy on the file "index.html" updating it in place
|
|
and writing the error messages to the file "errs.txt". It's a good
idea to save your work before tidying it since, as with all complex
software, tidy may have bugs. If you find any please let me
|
|
know!</p>
|
|
|
|
<p>Thanks to Jacek Niedziela, the Win32 executable for tidy is
now able to expand wild cards in filenames. This utilizes the
setargv library supplied with VC++.</p>
|
|
|
|
<p>Tidy writes errors to stderr, and won't be paused by the more
|
|
command. A work around is to redirect stderr to stdout as
|
|
follows. This works on Unix and Windows NT, but not on other
|
|
platforms. My thanks to Markus Wolf for this tip!</p>
|
|
|
|
<pre>
|
|
tidy file.html 2&gt;&amp;1 | more
|
|
</pre>
|
|
|
|
<h4>Tidy's Options</h4>
|
|
|
|
<p>To get a list of available options use:</p>
|
|
|
|
<pre>
|
|
tidy -help
|
|
</pre>
|
|
|
|
<p>You may want to run it through more to view the help a page at
|
|
a time.</p>
|
|
|
|
<pre>
|
|
tidy -help | more
|
|
</pre>
|
|
|
|
<p>Input and Output default to stdin/stdout respectively. Single
|
|
letter options apart from -f may be combined as in: tidy -f
|
|
errs.txt -imu foo.html</p>
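
<p>In other words, the combined form is shorthand for giving each
single letter option on its own, so the command above should be
equivalent to this longer spelling:</p>

<pre>
tidy -f errs.txt -i -m -u foo.html
</pre>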
|
|
|
|
<p>Matej Vela &lt;<a
href="mailto:vela@debian.org">vela@debian.org</a>&gt; has written
|
|
a <a href="man_page.txt">Unix man page for Tidy</a>, but for the
|
|
latest details on config options and for the release notes please
|
|
visit this page: <a
|
|
href="http://www.w3.org/People/Raggett/tidy">http://www.w3.org/People/Raggett/tidy</a>.</p>
|
|
|
|
<h3><a id="config" name="config">Using a Configuration
|
|
File</a></h3>
|
|
|
|
<p>Tidy now supports a configuration file, and this is now by far
the most convenient way to configure Tidy. Assuming you have
|
|
created a config file named "config.txt" (the name doesn't
|
|
matter), you can instruct Tidy to use it via the command line
|
|
option <tt>-config config.txt</tt>, e.g.</p>
|
|
|
|
<pre>
|
|
tidy -config config.txt file1.html file2.html
|
|
</pre>
|
|
|
|
<p>Alternatively, you can name the default config file via the
|
|
environment variable named "HTML_TIDY". Note this should be the
|
|
absolute path since you are likely to want to run Tidy in
|
|
different directories. You can also set a config file at compile
|
|
time by defining TIDY_CONFIG_FILE as the path string, see
|
|
platform.h.</p>
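
<p>On Unix with a Bourne-style shell, setting the variable might
look like the sketch below. The path is a placeholder for wherever
you keep your config file, and index.html stands for any file you
want to tidy:</p>

<pre>
HTML_TIDY=/home/me/tidy-config.txt
export HTML_TIDY
tidy -m index.html
</pre>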
|
|
|
|
<p>You can now set config options on the command line by
|
|
preceding the name of the option immediately (no intervening
|
|
space) by "--", for example:</p>
|
|
|
|
<pre>
|
|
tidy --break-before-br true --show-warnings false
|
|
</pre>
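
<p>Assuming the same option names apply in both places, and that
<em>true</em>/<em>false</em> are read the same way as
<em>yes</em>/<em>no</em>, the corresponding lines in a config file
would be roughly:</p>

<pre>
break-before-br: yes
show-warnings: no
</pre>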
|
|
|
|
<p>The following options are supported:</p>
|
|
|
|
<dl>
|
|
<dt>tidy-mark: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em> (the default) Tidy will add a meta
|
|
element to the document head to indicate that the document has
|
|
been tidied. To suppress this, set tidy-mark to <em>no</em>. Tidy
|
|
won't add a meta element if one is already present.</dd>
|
|
|
|
<dt>markup: <em>bool</em></dt>
|
|
|
|
<dd>Determines whether Tidy generates a pretty printed version of
|
|
the markup. Bool values are either <em>yes</em> or <em>no</em>.
|
|
Note that Tidy won't generate a pretty printed version if it
|
|
finds unknown tags, or missing trailing quotes on attribute
|
|
values, or missing trailing '>' on tags. The default is
|
|
<em>yes</em>.</dd>
|
|
|
|
<dt>wrap: <em>number</em></dt>
|
|
|
|
<dd>Sets the right margin for line wrapping. Tidy tries to wrap
|
|
lines so that they do not exceed this length. The default is 66.
|
|
Set wrap to zero if you want to disable line wrapping.</dd>
|
|
|
|
<dt>wrap-attributes: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, attribute values may be wrapped
|
|
across lines for easier editing. The default is <em>no</em>. This option
can be set independently of wrap-script-literals.</dd>
|
|
|
|
<dt>wrap-script-literals: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this allows lines to be wrapped
|
|
within string literals that appear in script attributes. The
|
|
default is <em>no</em>. The example shows how Tidy wraps a really
|
|
really long script string literal inserting a backslash character
|
|
before the linebreak:
|
|
|
|
<pre>
&lt;a href="somewhere.html" onmouseover="document.status = '...some \
really, really, really, really, really, really, really, really, \
really, really long string..';"&gt;test&lt;/a&gt;
</pre>
|
|
</dd>
|
|
|
|
<dt>wrap-asp: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>no</em>, this prevents lines from being wrapped
|
|
within ASP pseudo elements, which look like:
|
|
&lt;% ... %&gt;. The default is <em>yes</em>.</dd>
|
|
|
|
<dt>wrap-jste: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>no</em>, this prevents lines from being wrapped
|
|
within JSTE pseudo elements, which look like:
|
|
&lt;# ... #&gt;. The default is <em>yes</em>.</dd>
|
|
|
|
<dt>wrap-php: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>no</em>, this prevents lines from being wrapped
|
|
within PHP pseudo elements. The default is <em>yes</em>.</dd>
|
|
|
|
<dt>literal-attributes: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this ensures that whitespace
|
|
characters within attribute values are passed through unchanged.
|
|
The default is <em>no</em>.</dd>
|
|
|
|
<dt>tab-size: <em>number</em></dt>
|
|
|
|
<dd>Sets the number of columns between successive tab stops. The
|
|
default is 4. It is used to map tabs to spaces when reading
|
|
files. Tidy never outputs files with tabs.</dd>
|
|
|
|
<dt>indent: <em>no, yes</em> or <em>auto</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy will indent block-level tags.
|
|
The default is <em>no</em>. If set to <em>auto</em> Tidy will
|
|
decide whether or not to indent the content of tags such as
|
|
title, h1-h6, li, td, th, or p depending on whether or not the
|
|
content includes a block-level element. You are advised to avoid
|
|
setting indent to yes as this can expose layout bugs in some
|
|
browsers.</dd>
|
|
|
|
<dt>indent-spaces: <em>number</em></dt>
|
|
|
|
<dd>Sets the number of spaces to indent content when indentation
|
|
is enabled. The default is 2 spaces.</dd>
|
|
|
|
<dt>indent-attributes: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, each attribute will begin on a new
|
|
line. The default is <em>no</em>.</dd>
|
|
|
|
<dt>hide-endtags: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, optional end-tags will be omitted
|
|
when generating the pretty printed markup. This option is ignored
|
|
if you are outputting to XML. The default is <em>no</em>.</dd>
|
|
|
|
<dt>input-xml: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy will use the XML parser rather
|
|
than the error correcting HTML parser. The default is
|
|
<em>no</em>.</dd>
|
|
|
|
<dt>output-xml: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy will generate the pretty
|
|
printed output writing it as well-formed XML. Any entities not
|
|
defined in XML 1.0 will be written as numeric entities to allow
|
|
them to be parsed by an XML parser. The tags and attributes will
|
|
be in the case used in the input document, regardless of other
|
|
options. The default is <em>no</em>.</dd>
|
|
|
|
<dt>add-xml-pi: <em>bool</em></dt>
|
|
|
|
<dt>add-xml-decl: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy will add the XML declaration
when outputting XML or XHTML. The default is <em>no</em>. Note
that if the input document includes an &lt;?xml?&gt; declaration
|
|
then it will appear in the output independent of the value of
|
|
this option.</dd>
|
|
|
|
<dt>output-xhtml: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy will generate the pretty printed
|
|
output writing it as extensible HTML. The default is <em>no</em>.
|
|
This option causes Tidy to set the doctype and default namespace
|
|
as appropriate to XHTML. If a doctype or namespace is given they
|
|
will be checked for consistency with the content of the document. In
|
|
the case of an inconsistency, the corrected values will appear in
|
|
the output. For XHTML, entities can be written as named or
|
|
numeric entities according to the value of the "numeric-entities"
|
|
property. The tags and attributes will be output in the case used
|
|
in the input document, regardless of other options.</dd>
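
<dd>
<p>A sketch of a config for converting pages to XHTML, using only
options from this list. The particular combination is an
assumption made for illustration, not a recommended default:</p>

<pre>
output-xhtml: yes
add-xml-decl: yes
numeric-entities: yes
</pre>
</dd>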
|
|
|
|
<dt>doctype: <em>omit, auto, strict, loose</em> or
|
|
&lt;<em>fpi</em>&gt;</dt>
|
|
|
|
<dd>This property controls the doctype declaration generated by
|
|
Tidy. If set to <em>omit</em> the output file won't contain a
|
|
doctype declaration. If set to <em>auto</em> (the default) Tidy
|
|
will use an educated guess based upon the contents of the
|
|
document. If set to <em>strict</em>, Tidy will set the doctype to
|
|
the strict DTD. If set to <em>loose</em>, the doctype is set to
|
|
the loose (transitional) DTD. Alternatively, you can supply a
|
|
string for the formal public identifier (fpi) for example:</dd>
|
|
|
|
<dd>
|
|
<pre>
|
|
doctype: "-//ACME//DTD HTML 3.14159//EN"
|
|
</pre>
|
|
</dd>
|
|
|
|
<dd>If you specify the fpi for an XHTML document, Tidy will set
|
|
the system identifier to the empty string. Tidy leaves the
|
|
document type for generic XML documents unchanged.</dd>
|
|
|
|
<dt>char-encoding: <em>raw, ascii, latin1, utf8</em> or
|
|
<em>iso2022</em></dt>
|
|
|
|
<dd>Determines how Tidy interprets character streams. For
|
|
<em>ascii</em>, Tidy will accept Latin-1 character values, but
|
|
will use entities for all characters whose value > 127. For
|
|
<em>raw</em>, Tidy will output values above 127 without
|
|
translating them into entities. For <em>latin1</em> characters
|
|
above 255 will be written as entities. For <em>utf8</em>, Tidy
|
|
assumes that both input and output are encoded as UTF-8. You can
|
|
use <em>iso2022</em> for files encoded using the ISO2022 family
|
|
of encodings e.g. ISO 2022-JP. The default is
|
|
<em>ascii</em>.</dd>
|
|
|
|
<dt>numeric-entities: <em>bool</em></dt>
|
|
|
|
<dd>Causes entities other than the basic XML 1.0 named entities
|
|
to be written in the numeric rather than the named entity form.
|
|
The default is <em>no</em>.</dd>
|
|
|
|
<dt>quote-marks: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this causes &quot; characters to be
written out as &amp;quot; as is preferred by some editing
environments. The apostrophe character ' is written out as
&amp;#39; since many web browsers don't yet support &amp;apos;.
|
|
The default is <em>no</em>.</dd>
|
|
|
|
<dt>quote-nbsp: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this causes non-breaking space
|
|
characters to be written out as entities, rather than as the
|
|
Unicode character value 160 (decimal). The default is
|
|
<em>yes</em>.</dd>
|
|
|
|
<dt>quote-ampersand: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this causes unadorned &amp;
characters to be written out as &amp;amp;. The default is
|
|
<em>yes</em>.</dd>
|
|
|
|
<dt>assume-xml-procins: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this changes the parsing of
|
|
processing instructions to require ?> as the terminator rather
|
|
than >. The default is <em>no</em>. This option is
|
|
automatically set if the input is in XML.</dd>
|
|
|
|
<dt>fix-backslash: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this causes backslash characters "\"
|
|
in URLs to be replaced by forward slashes "/". The default is
|
|
<em>yes</em>.</dd>
|
|
|
|
<dt>break-before-br: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy will output a line break before
|
|
each &lt;br&gt; element. The default is <em>no</em>.</dd>
|
|
|
|
<dt>uppercase-tags: <em>bool</em></dt>
|
|
|
|
<dd>Causes tag names to be output in upper case. The default is
|
|
<em>no</em> resulting in lowercase, except for XML input where
|
|
the original case is preserved.</dd>
|
|
|
|
<dt>uppercase-attributes: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, attribute names are output in upper
|
|
case. The default is <em>no</em> resulting in lowercase, except
|
|
for XML where the original case is preserved.</dd>
|
|
|
|
<dt><a id="word2000" name="word2000">word-2000:
|
|
<em>bool</em></a></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy will go to great pains to strip
|
|
out all the surplus stuff Microsoft Word 2000 inserts when you
|
|
save Word documents as "Web pages". The default is <em>no</em>.
|
|
Note that Tidy doesn't yet know what to do with VML markup from
|
|
Word, but in future I hope to be able to map VML to SVG.<br />
|
|
<br />
|
|
Microsoft has developed its own optional filter for exporting to
|
|
HTML, and the 2.0 version is much improved. You can download the
|
|
filter free from the <a
|
|
href="http://officeupdate.microsoft.com/2000/downloadDetails/Msohtmf2.htm">
|
|
Microsoft Office Update site</a>.</dd>
|
|
|
|
<dt>clean: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, causes Tidy to strip out surplus
|
|
presentational tags and attributes replacing them by style rules
|
|
and structural markup as appropriate. It works well on the HTML
|
|
saved from Microsoft Office'97. The default is <em>no</em>.</dd>
|
|
|
|
<dt>logical-emphasis: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, causes Tidy to replace any occurrence
|
|
of i by em and any occurrence of b by strong. In both cases, the
|
|
attributes are preserved unchanged. The default is <em>no</em>.
|
|
This option can now be set independently of the clean and
|
|
drop-font-tags options.</dd>
|
|
|
|
<dt>drop-empty-paras: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, empty paragraphs will be discarded.
|
|
If set to no, empty paragraphs are replaced by a pair of
|
|
<code>br</code> elements as HTML4 precludes empty paragraphs. The
|
|
default is <em>yes</em>.</dd>
|
|
|
|
<dt>drop-font-tags: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em> together with the clean option (see
|
|
above), Tidy will discard font and center tags rather than
|
|
creating the corresponding style rules. The default is
|
|
<em>no</em>.</dd>
|
|
|
|
<dt>enclose-text: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this causes Tidy to enclose any text
|
|
it finds in the body element within a p element. This is useful
|
|
when you want to take an existing html file and use it with a
|
|
style sheet. Any text at the body level will screw up the
|
|
margins, but wrap the text within a p element and all is well!
|
|
The default is <em>no</em>.</dd>
|
|
|
|
<dt>enclose-block-text: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this causes Tidy to insert a p
|
|
element to enclose any text it finds in any element that allows
|
|
mixed content for HTML transitional but not HTML strict. The
|
|
default is <em>no</em>.</dd>
|
|
|
|
<dt>fix-bad-comments: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this causes Tidy to replace
|
|
unexpected hyphens with "=" characters when it comes across
|
|
adjacent hyphens. The default is <em>yes</em>. This option is
|
|
provided for users of Cold Fusion which uses the comment syntax:
|
|
&lt;!--- ---&gt;</dd>
|
|
|
|
<dt>add-xml-space: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, this causes Tidy to add
|
|
xml:space="preserve" to elements such as pre, style and script
|
|
when generating XML. This is needed if the whitespace in such
|
|
elements is to be parsed appropriately without having access to
|
|
the DTD. The default is <em>no</em>.</dd>
|
|
|
|
<dt>alt-text: <em>string</em></dt>
|
|
|
|
<dd>This allows you to set the default alt text for img
|
|
elements that have no alt attribute. This feature is dangerous as it suppresses further
|
|
accessibility warnings. <b>YOU ARE RESPONSIBLE FOR MAKING YOUR
|
|
DOCUMENTS ACCESSIBLE TO PEOPLE WHO CAN'T SEE THE
|
|
IMAGES!!!</b></dd>
|
|
|
|
<dt>write-back: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy will write back the tidied
|
|
markup to the same file it read from. The default is <em>no</em>.
|
|
You are advised to keep copies of important files before tidying
|
|
them as on rare occasions the result may not be what you
|
|
expect.</dd>
|
|
|
|
<dt>keep-time: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy won't alter the last modified
|
|
time for files it writes back to. The default is <em>yes</em>.
|
|
This allows you to tidy files without affecting which ones will
|
|
be uploaded to the Web server when using a tool such as
|
|
'SiteCopy'. Note that this feature may not work on some
|
|
platforms.</dd>
|
|
|
|
<dt>error-file: <em>filename</em></dt>
|
|
|
|
<dd>Writes errors and warnings to the named file rather than to
|
|
stderr.</dd>
|
|
|
|
<dt>show-warnings: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>no</em>, warnings are suppressed. This can be
|
|
useful when a few errors are hidden in a flurry of warnings. The
|
|
default is <em>yes</em>.</dd>
|
|
|
|
<dt>quiet: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy won't output the welcome message
|
|
or the summary of the numbers of errors and warnings. The default
|
|
is <em>no</em>.</dd>
|
|
|
|
<dt>gnu-emacs: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy changes the format for reporting
|
|
errors and warnings to a format that is more easily parsed by GNU
|
|
Emacs. The default is <em>no</em>.</dd>
|
|
|
|
<dt>split: <em>bool</em></dt>
|
|
|
|
<dd>If set to <em>yes</em>, Tidy will use the input file to create
|
|
a sequence of slides, splitting the markup prior to each
|
|
successive &lt;h2&gt;. You can see an example of the results in a
|
|
<a
|
|
href="http://www.w3.org/Talks/1999/03/24-stockholm-xhtml/">recent
|
|
talk I gave on XHTML</a>. The slides are written to
|
|
"slide1.html", "slide2.html" etc. The default is
|
|
<em>no</em>.</dd>
|
|
|
|
<dt>new-empty-tags: <em>tag1, tag2, tag3</em></dt>
|
|
|
|
<dd>Use this to declare new empty inline tags. The option takes a
|
|
space or comma separated list of tag names. Unless you declare
|
|
new tags, Tidy will refuse to generate a tidied file if the input
|
|
includes previously unknown tags. Remember to also declare empty
|
|
tags as either inline or blocklevel, see below.</dd>
|
|
|
|
<dt>new-inline-tags: <em>tag1, tag2, tag3</em></dt>
|
|
|
|
<dd>Use this to declare new non-empty inline tags. The option
|
|
takes a space or comma separated list of tag names. Unless you
|
|
declare new tags, Tidy will refuse to generate a tidied file if
|
|
the input includes previously unknown tags.</dd>
|
|
|
|
<dt>new-blocklevel-tags: <em>tag1, tag2, tag3</em></dt>
|
|
|
|
<dd>Use this to declare new block-level tags. The option takes a
|
|
space or comma separated list of tag names. Unless you declare
|
|
new tags, Tidy will refuse to generate a tidied file if the input
|
|
includes previously unknown tags. Note you can't change the
|
|
content model for elements such as table, ul, ol and dl. This is
|
|
explained in more detail in the <a
|
|
href="release-notes.html">release notes</a>.</dd>
|
|
|
|
<dt>new-pre-tags: <em>tag1, tag2, tag3</em></dt>
|
|
|
|
<dd>Use this to declare new tags that are to be processed in
|
|
exactly the same way as HTML's pre element. The option takes a
|
|
space or comma separated list of tag names. Unless you declare
|
|
new tags, Tidy will refuse to generate a tidied file if the input
|
|
includes previously unknown tags. Note you can't as yet add new
|
|
CDATA elements (similar to script).</dd>
|
|
</dl>
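<p>As a rough illustration of how the <em>char-encoding</em> and
<em>numeric-entities</em> settings described above interact, here is
a minimal Python sketch. It is not Tidy's own code: the function name
<code>write_char</code> and the table of limits are hypothetical, it
only decides how a single character is written, it ignores the
escaping of markup characters, and it treats <em>raw</em>,
<em>utf8</em> and <em>iso2022</em> alike as "pass everything through",
which is a simplification.</p>

```python
from html.entities import codepoint2name


def write_char(ch: str, char_encoding: str = "ascii",
               numeric_entities: bool = False) -> str:
    """Return ch as it might be written for the given settings.

    Illustrative only: a hypothetical helper, not part of Tidy.
    """
    code = ord(ch)
    # Highest code point written literally under each setting,
    # distilled from the option descriptions (an assumption).
    limit = {"ascii": 127, "latin1": 255}.get(char_encoding)
    if limit is None or code <= limit:
        return ch  # raw, utf8 and iso2022 pass the character through
    # Beyond the limit the character becomes an entity. The basic
    # XML 1.0 named entities all name ASCII characters, so here
    # numeric_entities simply selects the numeric form.
    name = codepoint2name.get(code)
    if name and not numeric_entities:
        return f"&{name};"
    return f"&#{code};"
```

<p>With the default <em>ascii</em> setting this sketch writes the
character 233 (e acute) as &amp;eacute;, or as &amp;#233; once
numeric-entities is set to <em>yes</em>, while with <em>latin1</em>
the same character is written unchanged.</p>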
|
|
|
|
<h4>Sample Config File</h4>
|
|
|
|
<p>This is just an example to get you started.</p>
|
|
|
|
<pre>
|
|
// sample config file for HTML tidy
|
|
indent: auto
|
|
indent-spaces: 2
|
|
wrap: 72
|
|
markup: yes
|
|
output-xml: no
|
|
input-xml: no
|
|
show-warnings: yes
|
|
numeric-entities: yes
|
|
quote-marks: yes
|
|
quote-nbsp: yes
|
|
quote-ampersand: no
|
|
break-before-br: no
|
|
uppercase-tags: no
|
|
uppercase-attributes: no
|
|
char-encoding: latin1
|
|
new-inline-tags: cfif, cfelse, math, mroot,
|
|
mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
|
|
munder, mover, mmultiscripts, msup, msub, mtext,
|
|
mprescripts, mtable, mtr, mtd, mth
|
|
new-blocklevel-tags: cfoutput, cfquery
|
|
new-empty-tags: cfelse
|
|
</pre>
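<p>If you need to read such a file from your own tools, the layout
is simple enough to parse by hand. The Python sketch below is only an
illustration and is not how Tidy's config.c works. It assumes that
lines starting with // are comments, that each option is written as
<em>name: value</em>, and that any other non-blank line continues the
previous value (as with the long new-inline-tags list above). The
names <code>parse_config</code> and <code>tag_list</code> are made up
for this example.</p>

```python
import re

# An option line looks like "name: value"; names use letters,
# digits and hyphens (an assumption based on the sample file).
OPTION = re.compile(r"^([A-Za-z][A-Za-z0-9-]*)\s*:\s*(.*)$")


def parse_config(text: str) -> dict[str, str]:
    """Parse 'name: value' lines into a dict (illustrative only)."""
    options: dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        match = OPTION.match(line)
        if match:
            current, value = match.groups()
            options[current] = value
        elif current is not None:
            # Assumed continuation of the previous option's value.
            options[current] = f"{options[current]} {line}".strip()
    return options


def tag_list(value: str) -> list[str]:
    """Split a space or comma separated list of tag names."""
    return [tag for tag in re.split(r"[,\s]+", value) if tag]
```

<p>Calling <code>tag_list(parse_config(text)["new-empty-tags"])</code>
on the sample above would then give a list holding just cfelse.</p>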
|
|
|
|
<h3><a id="scripts" name="scripts">Using Tidy from
|
|
scripts</a></h3>
|
|
|
|
<p>If you want to run Tidy from Perl or another scripting
|
|
language you may find it of value to inspect the result returned
|
|
by Tidy when it exits: 0 if everything is fine, 1 if there were
|
|
warnings and 2 if there were errors. This is an example using
|
|
Perl:</p>
|
|
|
|
<pre>
|
|
# close() on a piped filehandle returns false when the command exits
# with a non-zero status; the status itself is left in $?.
if (!close(TIDY)) {
    my $exitcode = $? &gt;&gt; 8;   # the high byte of $? is the exit code
    if ($exitcode == 1) {
        print STDERR "tidy issued warning messages\n";
    } elsif ($exitcode == 2) {
        print STDERR "tidy issued error messages\n";
    } else {
        die "tidy exited with code: $exitcode\n";
    }
} else {
    print STDERR "tidy detected no errors\n";
}
|
|
</pre>
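<p>The same check can be made from other scripting languages. Here is
a minimal sketch in Python that relies only on the exit codes listed
above. The helper name <code>run_and_classify</code> is hypothetical,
and the command you pass in is up to you; nothing here is specific to
any particular Tidy release.</p>

```python
import subprocess

# Exit codes as described in the text: 0 fine, 1 warnings, 2 errors.
MESSAGES = {
    0: "tidy detected no errors",
    1: "tidy issued warning messages",
    2: "tidy issued error messages",
}


def run_and_classify(command: list[str]) -> tuple[int, str]:
    """Run command and describe its exit status (illustrative only)."""
    result = subprocess.run(command, capture_output=True, text=True)
    code = result.returncode
    # Any other value is reported as-is rather than guessed at.
    return code, MESSAGES.get(code, f"tidy exited with code: {code}")
```

<p>Assuming a tidy executable is on your path, you might call
<code>run_and_classify(["tidy", "index.html"])</code> and print the
returned message to stderr.</p>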
|
|
|
|
<h3><a id="download" name="download">Downloadable
|
|
Binaries</a></h3>
|
|
|
|
<p class="note">If you are prepared to maintain a public URL for
|
|
HTML Tidy compiled for a specific platform, please let me know so
|
|
that I can add a link to your page. This will avoid the need for
|
|
me to update this page whenever you recompile.</p>
|
|
|
|
<div class="platforms">
|
|
<h4>Windows 95/98/NT/2000</h4>
|
|
|
|
<p><b><a
|
|
href="http://www.w3.org/People/Raggett/tidy.exe">tidy.exe</a></b>.
|
|
Windows 95/98/NT/2000 executable (32-bit Windows console-mode
|
|
program). This is the executable that I maintain as part of the
|
|
HTML Tidy distribution. The command line parameters are described
|
|
above, along with the extensive configuration file options.</p>
|
|
|
|
<p><b><a
|
|
href="http://www.chami.com/free/html-kit/">HTML-Kit</a></b> - a
|
|
free HTML editor for Windows 95/98/NT/2000 with integrated
|
|
support for Tidy.</p>
|
|
|
|
<p><b><a
|
|
href="http://perso.wanadoo.fr/ablavier/TidyGUI/">TidyGUI</a></b>.
|
|
Windows front end for running Tidy, written by André
|
|
Blavier. André has also written a <b><a
|
|
href="http://perso.wanadoo.fr/ablavier/TidyCOM/">Windows COM
|
|
wrapper</a></b> for Tidy. He describes how to use this from
|
|
Visual Basic.</p>
|
|
|
|
<p><b><a href="http://www.evrsoft.com/">Evrsoft's 1st Page
|
|
2000</a></b> - a free HTML editor for Windows 95/98/NT/2000 with
|
|
integrated support for Tidy. 1st Page 2000 is a high-end
|
|
authoring tool that makes it easy to add effects based upon
|
|
scripting.</p>
|
|
|
|
<p><b><a href="http://www.notetab.com/">NoteTab</a></b> - an
|
|
award-winning text and HTML editor for Windows with built-in
|
|
support for running HTML Tidy. NoteTab is written by Eric
|
|
Fookes.</p>
|
|
|
|
<h4>Mac OS</h4>
|
|
|
|
<p>Several versions of <a
href="http://www.geocities.com/SiliconValley/1057/tidy.html">HTML
Tidy for Mac OS</a> are available, including a standalone
Macintosh application with a graphical user interface, a BBEdit
plugin, an MPW tool, and a FilterTop filter (<a
href="http://www.geocities.com/SiliconValley/1057/images/TidyHTML.GIF">
Screenshot</a>). My thanks to <a
href="mailto:teague@mailandnews.com">Terry Teague</a> for this
port.</p>
|
|
|
|
|
|
<h4>Atari</h4>
|
|
|
|
<p>Arnaud Bercegeay's site for the <a
|
|
href="http://tidy.atari.org">Atari binary for Tidy</a>.</p>
|
|
|
|
<h4>Amiga</h4>
|
|
|
|
<p>Keith Blakemore-Noble maintains a page for <a
|
|
href="http://www.amiga.u-net.com/MadDogSoftware/Tidy.html">Tidy
|
|
on Amiga</a>.</p>
|
|
|
|
<h4>BeOS</h4>
|
|
|
|
<p>Peter Enzerink is maintaining <a
|
|
href="http://www.bytepeople.com/beos/apps/htmltidy.html">HTML
|
|
Tidy</a> for BeOS. Link points to download for HTML Tidy as well
|
|
as HTML Tidy editor addons for BeOS.</p>
|
|
|
|
<h4>AIX</h4>
|
|
|
|
<p>Ciaran Deignan maintains an <a
|
|
href="http://www-frec.bull.com/cgi-bin/list_dir.cgi/download/">AIX
|
|
binary for Tidy</a>. The link is to a general download page. The
|
|
executable is available for AIX 4.3.2 and later.</p>
|
|
|
|
<h4>Linux</h4>
|
|
|
|
<p>Dimitri Papadopoulos maintains a <a
|
|
href="http://perso.club-internet.fr/dpo/rpm/">Tidy RPM package
|
|
for Red Hat Linux</a>. You may also be able to find Tidy on other
|
|
Linux distribution sites, e.g. <a
|
|
href="http://rpmfind.net/">http://rpmfind.net/</a>.</p>
|
|
|
|
<!-- no longer accessible :-(
|
|
<p><b><a href=
|
|
"http://www.astro.uni-bonn.de/~webstw/cm/w3c_tidy/index.html">
|
|
Linux users</a></b>! ochen M. Braun is maintaining Tidy binary
|
|
for Linux (ELF 32-bit LSB executable using '<tt>libc.so.5</tt>'
|
|
for Intel 80386): '<a href=
|
|
"ftp://ftp.astro.uni-bonn.de/pub/webstw/linsoft/tidy"><tt>tidy</tt></a>
|
|
'. Additionally a man page can be downloaded: <a href=
|
|
"ftp://ftp.astro.uni-bonn.de/pub/webstw/linsoft/tidy.1"><tt>
|
|
tidy.1</tt></a>.</p>
|
|
-->
|
|
<h4>UnixWare</h4>
|
|
|
|
<p>Simon Trimmer &lt;<a
href="mailto:simon@ocston.org">simon@ocston.org</a>&gt; maintains
|
|
a <a href="http://www.ocston.org/~simon/tidy/">Tidy binary for
|
|
UnixWare</a>.</p>
|
|
|
|
<h4>HP-UX</h4>
|
|
|
|
<p>You can get precompiled versions of Tidy for HP-UX from <a
|
|
href="http://www.informatik.uni-stuttgart.de/ifi/gr/mitarbeiter/hopp/tidy/tidy.html">
|
|
Olaf Hopp</a>, and from <a
|
|
href="http://geocities.com/ian_springer/hpux_tidy.html">Ian
|
|
Springer</a>.</p>
|
|
|
|
<h4>MSDOS</h4>
|
|
|
|
<p>Nick B. maintains <a
|
|
href="http://members.xoom.com/nickbeee/tidy386/">Tidy386 for
|
|
DOS</a>. This exploits the DPMI mechanism for the memory
|
|
management.</p>
|
|
|
|
<h4>Solaris</h4>
|
|
|
|
<p>Stephen Fuqua maintains a page for <a
|
|
href="http://www.hep.utexas.edu/~sfuqua/unix">Tidy on
|
|
Solaris</a>.</p>
|
|
|
|
<h4>OS/2</h4>
|
|
|
|
<p>Kaz SHiMZ &lt;<a
href="mailto:kshimz@sfc.co.jp">kshimz@sfc.co.jp</a>&gt; maintains
|
|
an <a
|
|
href="http://www.dd.iij4u.or.jp/~kshimz/warp/tidy/index.html">OS/2
|
|
binary for Tidy</a>.</p>
|
|
|
|
<h4>FreeBSD</h4>
|
|
|
|
<p>Martin Fouts maintains <a
|
|
href="http://www.fogey.com/fouts/tidy.htm">Tidy on
|
|
FreeBSD</a>.</p>
|
|
|
|
<h4>RISC OS</h4>
|
|
|
|
<p><a href="mailto:archifishal@altavista.net">Alex Macfarlane
|
|
Smith</a> maintains a <a
|
|
href="http://www.toth.org.uk/~aardvark/programs/tidy.shtml">port
|
|
of Tidy to the RISC OS</a>.</p>
|
|
|
|
<h4>MiNT (Atari) OS</h4>
|
|
|
|
<p><a href="mailto:eaiching@t0.or.at">Edgar Aichinger</a>
|
|
maintains a <a
|
|
href="http://wh58-508.st.uni-magdeburg.de/sparemint/html/packages/tidy.html">
|
|
port of Tidy to the MiNT OS</a>. MiNT is a UNIX for m68k Atari
|
|
computers and is nearly FHS compliant (we don't use bootable OS
|
|
images nor have any mounting capabilities, so neither /boot nor
|
|
/mnt are used). The binary also runs on ordinary TOS, since the
|
|
MiNT libraries cover all GEMDOS/GEM functions.</p>
|
|
</div>
|
|
|
|
<h3><a id="quotes" name="quotes">Integrating Tidy as part of
|
|
other Software</a></h3>
|
|
|
|
<p>You can also incorporate Tidy as part of a larger program, for
|
|
instance in HTML editors or HTML transformation tools used for
|
|
import filters, or for when you want to customize Web content to
|
|
get the best out of different kinds of browsers. Imagine
|
|
authoring clean HTML with CSS and at a touch of a button
|
|
producing variants that look great and work reliably on a large
|
|
variety of different browsers, taking into account the quirks of
|
|
each. For instance, providing the ability to tune content for
|
|
different versions of Netscape and Internet Explorer, and for
|
|
browsers running on set-top boxes for televisions, handheld and
|
|
palmtop devices, cell phones, and voice browsers. I am happy to
|
|
quote for software development for such tools.</p>
|
|
|
|
<p>Sebastian Lange has contributed a Perl wrapper for calling
Tidy from your Perl scripts, see <a
|
|
href="sl-tidy.pl">sl-tidy.pl</a>.</p>
|
|
|
|
<h4>Using Tidy from Emacs</h4>
|
|
|
|
<p>Pete Gelbman emailed this <a
|
|
href="http://lists.w3.org/Archives/Public/html-tidy/2000AprJun/0047.html">
|
|
tip</a> for using Tidy with the Unix version of Emacs. It lets you
|
|
highlight a region of text and run Tidy on it. Tidy's "fixed"
|
|
output will replace your highlighted region right in place. The
|
|
error/warnings output will be directed into a separate
|
|
mini-buffer below in your main screen.</p>
|
|
|
|
<h3><a id="java" name="java">Java port of HTML Tidy</a></h3>
|
|
|
|
<p>Andy Quick &lt;<a
href="mailto:ac.quick@sympatico.ca">ac.quick@sympatico.ca</a>&gt;
|
|
maintains a Java port of Tidy, so you can now integrate Tidy into
|
|
your Java applications. Andy is tracking the releases of Tidy in
|
|
C (this page). More information is available on <a
|
|
href="http://www3.sympatico.ca/ac.quick/">Andy's home
|
|
page</a>.</p>
|
|
|
|
<h3><a id="implementation" name="implementation">Source
|
|
Code</a></h3>
|
|
|
|
<p>The code is in ANSI C and uses the C standard library for I/O.
|
|
The parser works top down, building a complete parse tree in
|
|
memory. Document text is held as Unicode represented as UTF-8 in
|
|
a character buffer that expands as needed. The code has so far
|
|
been tested on Windows'95, Windows'98, Windows NT, Windows 2000,
|
|
Linux, FreeBSD, NetBSD, Ultrix, OSF, OS/MP, IRIX, NeXtStep,
|
|
MacOS, BeOS, OS/2, AIX, Amiga, Atari, SunOS, Solaris and
|
|
HP-UX, amongst others.</p>
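<p>To give a feel for what "Unicode represented as UTF-8 in a
character buffer that expands as needed" means in practice, here is a
small sketch. It is written in Python rather than C for brevity and is
not a transcription of lexer.c: the class name
<code>TextBuffer</code>, the doubling growth policy and the starting
capacity are all assumptions made for the example, and no validation
of code points (such as surrogates) is attempted.</p>

```python
class TextBuffer:
    """Growable byte buffer holding text as UTF-8 (illustrative)."""

    def __init__(self, capacity: int = 16) -> None:
        self.data = bytearray(capacity)  # allocated storage
        self.size = 0                    # number of bytes in use

    def _add_byte(self, byte: int) -> None:
        if self.size == len(self.data):
            # Storage is full: double it (assumed growth policy).
            self.data.extend(bytes(max(len(self.data), 1)))
        self.data[self.size] = byte
        self.size += 1

    def add_char(self, code: int) -> None:
        """Append one Unicode code point as 1 to 4 UTF-8 bytes."""
        if code < 0x80:
            self._add_byte(code)
        elif code < 0x800:
            self._add_byte(0xC0 | (code >> 6))
            self._add_byte(0x80 | (code & 0x3F))
        elif code < 0x10000:
            self._add_byte(0xE0 | (code >> 12))
            self._add_byte(0x80 | ((code >> 6) & 0x3F))
            self._add_byte(0x80 | (code & 0x3F))
        else:
            self._add_byte(0xF0 | (code >> 18))
            self._add_byte(0x80 | ((code >> 12) & 0x3F))
            self._add_byte(0x80 | ((code >> 6) & 0x3F))
            self._add_byte(0x80 | (code & 0x3F))

    def text(self) -> str:
        """Decode the bytes in use back into a string."""
        return bytes(self.data[:self.size]).decode("utf-8")
```

<p>Appending characters one code point at a time lets the buffer
start small and grow only when the document demands it.</p>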
|
|
|
|
<p>Here is a link to the Open Source <a href="tidy.c">copyright
|
|
notice and license</a>.</p>
|
|
|
|
<dl>
|
|
<dt><a href="../tidy4aug00.tgz">tidy4aug00.tgz</a></dt>
|
|
|
|
<dd>gzipped tar file for source code (Unix line ends)</dd>
|
|
|
|
<dt><a href="../tidy4aug00.zip">tidy4aug00.zip</a></dt>
|
|
|
|
<dd>zipped source code (Windows line ends)</dd>
|
|
|
|
<dt><a href="platform.h">platform.h</a>, <a
|
|
href="html.h">html.h</a></dt>
|
|
|
|
<dd>the include files with common definitions</dd>
|
|
|
|
<dt><a href="config.c">config.c</a></dt>
|
|
|
|
<dd>support for customizing Tidy via config files</dd>
|
|
|
|
<dt><a href="lexer.c">lexer.c</a></dt>
|
|
|
|
<dd>lexical analysis and buffer management</dd>
|
|
|
|
<dt><a href="parser.c">parser.c</a></dt>
|
|
|
|
<dd>HTML and XML parsers</dd>
|
|
|
|
<dt><a href="tags.c">tags.c</a></dt>
|
|
|
|
<dd>dictionary of tags and their properties</dd>
|
|
|
|
<dt><a href="attrs.c">attrs.c</a></dt>
|
|
|
|
<dd>dictionary of attributes and their properties</dd>
|
|
|
|
<dt><a href="istack.c">istack.c</a></dt>
|
|
|
|
<dd>stack of active inline elements</dd>
|
|
|
|
<dt><a href="entities.c">entities.c</a></dt>
|
|
|
|
<dd>dictionary of entities</dd>
|
|
|
|
<dt><a href="clean.c">clean.c</a></dt>
|
|
|
|
<dd>smarts for cleaning up presentational markup</dd>
|
|
|
|
<dt><a href="pprint.c">pprint.c</a></dt>
|
|
|
|
<dd>pretty printing for HTML and XML</dd>
|
|
|
|
<dt><a href="localize.c">localize.c</a></dt>
|
|
|
|
<dd>Change this file to localize tidy's messages</dd>
|
|
|
|
<dt><a href="tidy.c">tidy.c</a></dt>
|
|
|
|
<dd>main() and error reporting routines</dd>
|
|
|
|
<dt><a href="Makefile">Makefile</a></dt>
|
|
|
|
<dd>Makefile for gcc</dd>
|
|
|
|
<dt><a href="man_page.txt">Unix Man page</a></dt>
|
|
|
|
<dd>Maintained by Matej Vela &lt;vela@debian.org&gt;</dd>
|
|
</dl>
|
|
|
|
<p>Conventions for whether lines end with CRLF, LF or CR vary
|
|
from one system to another. I have included the C source for a
|
|
utility <b>tab2space</b> which can be used to ensure that files
|
|
use the line end convention of your choice, and to expand tabs to
|
|
spaces.</p>
|
|
|
|
<pre>
|
|
tab2space -t4 -unix *.h *.c
|
|
tab2space -tabs -unix Makefile
|
|
</pre>
|
|
|
|
<p>Note use of "-tabs" to ensure that tabs are preserved in the
|
|
Makefile (it won't work without them!).</p>
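<p>If you don't want to build tab2space, the two jobs described
above (choosing a line end convention and expanding tabs) are easy to
approximate. The following Python sketch is an assumption about the
behaviour, not the tab2space source: the function name
<code>normalize</code> and its parameters are invented for the
example, and tabs are expanded to the next multiple of the tab size,
which may differ from what tab2space does.</p>

```python
def normalize(text: str, tab_size: int = 4, keep_tabs: bool = False,
              line_end: str = "\n") -> str:
    """Normalise line ends and optionally expand tabs (illustrative)."""
    # Treat CRLF, a lone CR and LF all as line breaks.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    if not keep_tabs:
        # str.expandtabs pads to the next multiple of tab_size columns.
        lines = [line.expandtabs(tab_size) for line in lines]
    return line_end.join(lines)
```

<p>Passing <code>keep_tabs=True</code> plays the role of "-tabs" for
files such as the Makefile, where the tab characters must survive.</p>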
|
|
|
|
<p>For those of you on Unix, here is a script you can use to
|
|
strip carriage returns:</p>
|
|
|
|
<pre>
#!/bin/sh
# Strip carriage returns from each file named on the command line.
echo "Stripping Carriage Returns from files..."
for i
do
  # Only process regular files that are writable
  if [ -f "$i" ]
  then
    if [ -w "$i" ]
    then
      echo "$i"
      # strip CRs from input and output to temp file
      tr -d '\015' &lt; "$i" &gt; toix.tmp
      mv toix.tmp "$i"
    else
      echo "$i: write-protected"
    fi
  else
    echo "$i: not a file"
  fi
done
</pre>
|
|
|
|
<p>Save this script to a file, e.g. "<em>stripcr</em>" and use
|
|
"<em>chmod +x stripcr</em>" to make it executable. You can then
|
|
run it as "<em>stripcr *.c *.h Overview.html Makefile</em>".</p>
|
|
|
|
<h2><a id="acks" name="acks">Acknowledgements</a></h2>
|
|
|
|
<p>I would like to thank the many people who have written to me
|
|
with suggestions for improvements or reporting bugs. Your help
|
|
has been invaluable.</p>
|
|
|
|
<blockquote class="people">Jonathan Adair, Drew Adams, Osma
|
|
Ahvenlampi, Carsten Allefeld, Richard Allsebrook, Jacob Sparre
|
|
Andersen, Joe D'Andrea, Jerry Andrews, Bruce Aron, Takuya Asada,
|
|
Edward Avis, Carlos Piqueres Ayela, Nick B, Chang Hyun Baek,
Denis Barbier, Chuck Baslock, Christer Bernerus, David J.
|
|
Biesack, John Bigby, Yu Jian Bin, Alexander Biron, Keith
|
|
Blakemore-Noble, Eric Blossom, Berend de Boer, Ochen M. Braun,
|
|
Dave Bryan, David Brooke, Andy Brown, Keith B. Brown, Andreas
|
|
Buchholz, Maurice Buxton, Jelks Cabaniss, John Cappelletti,
|
|
Trevor Carden, Terry Cassidy, Mathew Cepl, Kendall Clark, Rob
|
|
Clark, Jeremy Clulow, Dan Connolly, Larry Cousin, Ken Cox, Luis
|
|
M. Cruz, John Cumming, Ian Davey, Keith Davies, Ciaran Deignan,
|
|
David Duffy, Emma Duke-Williams, Tamminen Eero, Bodo Eing, Peter
|
|
Enzerink, Baruch Even, David Fallon, Claus André
|
|
Färber, Stephanie Foott, Darren Forcier, Martin Fouts,
|
|
Frederik Fouvry, Rene Fritz, Stephen Fuqua, Martin Gallwey, Pete
|
|
Gelbman, Francisco Guardiola, David Getchell, Michael Giroux,
|
|
Davor Golek, Guus Goos, Léa Gris, Rainer Gutsche, Kai
|
|
Hackemesser, Juha Häikiö, David Halliday,
|
|
Johann-Christian Hanke, Vlad Harchev, Shane Harrelson, Andre
|
|
Hinrichs, Bjoern Hoehrmann, G. Ken Holman, Bill Homer, Olaf Hopp,
|
|
Craig Horman, Jack Horsfield, Nigel Horspool, Pao-Hsi Huang,
|
|
Stuart Hungerford, Marc Jauvin, Rick Jelliffe, Peter Jeremy,
|
|
Craig Johnson, Charles LaFountain, Steven Lobo, Zdenek Kabelac,
|
|
Michael Kay, Jeffery Kendall, Axel Kielhorn, Konstantinos
|
|
Kleisouris, Johannes Koch, Daniel Kohn, Rudy Kohut, Allan
|
|
Kuchinsky, Volker Kuhlmann, Michael LaStella, Johnny Lee, Steve
|
|
Lee, Tony Leneis, Nick Leverton, Todd Lewis, Dietmar Lippold,
|
|
Gert-Jan C. Lokhorst, Murray Longmore, John Love-Jensen,
|
|
Satwinder Mangat, Carole Mah, Anton Marsden, Bede McCall, Shane
|
|
McCarron, Thomas McGuigan, Ian McKellar, Al Medeiros, Chris
|
|
Nappin, Ann Navarro, Jacek Niedziela, Morten Blinksbjerg Nielsen,
|
|
Kenichi Numata, Allan Odgaard, Matt Oshry, Gerald Oskoboiny, Paul
|
|
Ossenbruggen, Ernst Paalvast, Christian Pantel, Dimitri
|
|
Papadopoulos, Rick Parsons, Steven Pemberton, Daniel Persson, Lee
|
|
Anne Phillips, Xavier Plantefeve, Karl Prinz, Andy Quick, Jany
|
|
Quintard, Julian Reschke, Stephen Reynolds, Thomas Ribbrock, Ross
|
|
L. Richardson, Philip Riebold, Erik Rossen, Dan Rudman, Peter
|
|
Ruevski, Christian Ruetgers, Klaus Johannes Rusch, John Russell,
|
|
Eric Schindler, J. Schlauch, Christian Schüler, Klaus
|
|
Alexander Seistrup, Jim Seymour, Kazuyoshi Shimizu, Geoff
|
|
Sinclair, Jo Smith, Paul Smith, Steve Spilker, Rafi Stern,
|
|
Jacques Steyn, Michael J. Suzio, Zac Thompson, Eric Thorbjornsen,
|
|
Oren Tirosh, John Tobler, Omri Traub, Loïc Trégan,
|
|
Jason Tribbeck, Simon Trimmer, Steffen Ullrich, Stuart Updegrave,
|
|
Charles A. Upsdell, Jussi Vestman, Larry W. Virden, Daniel
|
|
Vogelheim, Nigel Wadsworth, Jez Wain, Randy Waki, Paul Ward, Neil
|
|
Weber, Bertilo Wennergren, Yudong Yang, Jeff Young, Edward Zalta,
|
|
Johannes Zellner, Christian Zuckschwerdt</blockquote>
|
|
|
|
<h3><a id="address" name="address">Dave's Address</a></h3>
|
|
|
|
<pre>
|
|
73b Ground Corner
|
|
Holt
|
|
Wiltshire
|
|
BA14 6RT
|
|
United Kingdom
|
|
</pre>
|
|
|
|
<p><small><a href="http://www.w3.org/People/Raggett">Dave
|
|
Raggett</a> &lt;<a href="mailto:dsr@w3.org">dsr@w3.org</a>&gt; is
|
|
an engineer from <a href="http://www.hp.com/">Hewlett
|
|
Packard</a>'s <a href="http://www.hpl.hp.co.uk">UK
|
|
Laboratories</a>, and works on assignment to the World Wide Web
|
|
Consortium, where he is the W3C lead for HTML, XForms, Voice
|
|
Browsers and Math.</small></p>
|
|
</body>
|
|
</html>
|
|
|