Updated the docs.
This commit is contained in:
parent
701a17400a
commit
db464df7d9
300
htmldoc/faq.html
|
@@ -1,300 +0,0 @@
|
|||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
||||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
<head>
|
||||
<meta name="generator" content=
|
||||
"HTML Tidy for Mac OS X (vers 1st June 2003), see www.w3.org" />
|
||||
<link type="text/css" rel="stylesheet" href="tidy.css" />
|
||||
<title>HTML Tidy - Frequently Asked Questions</title>
|
||||
<style type="text/css">
|
||||
code { font-weight: bold; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>HTML Tidy - Frequently Asked Questions</h1>
|
||||
|
||||
<h2>Overview</h2>
|
||||
|
||||
<p class="abstract">Certain questions about Tidy come up on a
|
||||
regular basis. These are some that have been culled from postings
|
||||
to the html-tidy@w3.org and tidy-develop@lists.sourceforge.net
|
||||
mailing lists. If you don't see your question addressed here, see
|
||||
<a href="#support">How To Get Support</a> below.</p>
|
||||
|
||||
<ul>
|
||||
<li><a href="#what-now">What Now?</a></li>
|
||||
|
||||
<li><a href="#support">How to Get Support?</a></li>
|
||||
|
||||
<li><a href="#bug">How to Submit A Bug Report</a></li>
|
||||
|
||||
<li><a href="#feature">How to Submit A Feature Request</a></li>
|
||||
|
||||
<li><a href="#layout">How Do I Control the Output Layout?</a></li>
|
||||
|
||||
<li><a href="#version">What Version of Tidy Should I Use?</a></li>
|
||||
|
||||
<li><a href="#regression">How Do I Run A Regression Test?</a></li>
|
||||
</ul>
|
||||
|
||||
<hr />
|
||||
<dl>
|
||||
<dt><a name="what-now" id="what-now"></a>What Now?</dt>
|
||||
|
||||
<dd><p>If you have a popup screen that reads as follows:</p>
|
||||
<pre>
|
||||
HTML Tidy for Windows <vers 1st August 2002; built on Aug 8 2002, at 15:41:13>
|
||||
Parsing Console input <stdin>
|
||||
</pre>
|
||||
|
||||
<p>and do not know what to do next, read on.</p>
|
||||
|
||||
<p>Tidy is waiting for your HTML to come in, so it can parse it.
|
||||
Tidy is fundamentally a tool that reads in HTML, cleans it up, and
|
||||
writes it out again. It was developed as a program you run from the
|
||||
console prompt, but there are GUI encapsulations available, e.g.
|
||||
HTML-Kit, which you might prefer.</p>
|
||||
|
||||
<p>If you are using Windows, the first step is to unzip the zip file
|
||||
and place the tidy.exe file in a folder somewhere on your executables
|
||||
path. You may also want to set up a config file to save having to type
|
||||
lots of options each time you run Tidy. From the console prompt you can
|
||||
run Tidy like this:</p>
|
||||
|
||||
<pre>
|
||||
C> tidy -m mywebpage.html
|
||||
</pre>
|
||||
|
||||
<p>In this case, the <code>-m</code> option requests Tidy to write
|
||||
the tidied file back to the same filename as it read from
|
||||
(mywebpage.html). Tidy will give you a breakdown of the problems it
|
||||
found and the version of HTML the file appears to be using.</p>
|
||||
|
||||
<p>To get a listing of Tidy command line options, just type
|
||||
<code>tidy -?</code>. To see a listing of configuration options,
|
||||
try <code>tidy -help-config</code>. To get more info on the
|
||||
config options, see the <a
|
||||
href="http://tidy.sourceforge.net/docs/quickref.html">Quick Reference</a>.</p>
|
||||
|
||||
<p>See also Dave Raggett's <a href="http://tidy.sourceforge.net/docs/Overview.html#help">User Guide</a>.</p>
|
||||
|
||||
<p>If you're not comfortable with the DOS command line, you should
|
||||
try one of the <a href="http://tidy.sourceforge.net/#tidylibapps">GUI
|
||||
Applications</a>.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="support" id="support"></a>How To Get Support</dt>
|
||||
|
||||
<dd>
|
||||
<p>For general HTML Tidy support, the original mailing list
|
||||
html-tidy@w3.org is best. Sometimes developers are the last to
|
||||
know... Also, this list covers both Java and C versions, not to
|
||||
mention various value-added products such as GUI front ends, Perl
|
||||
and Python integration, etc. If you don't get a response after a
|
||||
couple of tries, or if you have a bug fix, bump it over to the
|
||||
developer list at tidy-develop@lists.sourceforge.net. It's not a
|
||||
hard line, but that is the general arrangement.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="bug" id="bug"></a>How to Submit A Bug Report</dt>
|
||||
|
||||
<dd>
|
||||
<p>You are encouraged to report any bugs you find to the Tidy
|
||||
developer team. Tidy's quality depends on your feedback. You can
|
||||
either file your bug report in the Sourceforge <a
|
||||
href="http://sourceforge.net/tracker/?func=add&group_id=27659&atid=390963">
|
||||
bug tracker</a> for HTML Tidy (<em>recommended</em>) or send a mail
|
||||
to the mailing list at html-tidy@w3.org. Note you do <em>not</em>
|
||||
have to have a Sourceforge account in order to file bug reports, or
|
||||
be subscribed to html-tidy@w3.org in order to post messages to the
|
||||
list.</p>
|
||||
|
||||
<p>Prior to submitting a bug report, please check that the bug is
|
||||
not already known. Many are. If you are not sure, just ask. If it
|
||||
is a new bug, make sure to include at least the following information
|
||||
in your report:</p>
|
||||
|
||||
<ul>
|
||||
<li>A description of what you think went wrong.</li>
|
||||
|
||||
<li>The HTML Tidy version (find it out by running <code>tidy
|
||||
-v</code>) and operating system you are running.</li>
|
||||
|
||||
<li>The input that exposes the bug.<br />
|
||||
A small HTML document that reproduces the problem is best.</li>
|
||||
|
||||
<li>The configuration options you've used. Command line options
|
||||
like<br />
|
||||
<code>-asxml</code>, configuration files, etc. You may use
|
||||
<code>tidy -show-config</code> to get an overview of the active
|
||||
Tidy settings.</li>
|
||||
|
||||
<li>Your e-mail address for further questions and comments.</li>
|
||||
</ul>
|
||||
|
||||
<p>This information is necessary to reproduce whatever is
failing; without it we cannot help you. Additional information -
|
||||
and patches - are very welcome!</p>
|
||||
|
||||
<p><em>Please include only one bug per report.</em> Reports with
|
||||
multiple bugs are less easy to track and some bugs may get
|
||||
missed.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="feature" id="feature"></a>How to Submit A Feature
|
||||
Request</dt>
|
||||
|
||||
<dd>
|
||||
<p>If you want Tidy to do something new that it doesn't do today
|
||||
(or stop doing something), then it is probably a feature
|
||||
request.</p>
|
||||
|
||||
<p>The process for submitting a feature request is very similar to
|
||||
the process for bug reports. A different <a
|
||||
href="http://sourceforge.net/tracker/?atid=390966&group_id=27659">
|
||||
tracker</a> is used on SourceForge to denote the difference in
|
||||
subject matter.</p>
|
||||
|
||||
<p>As with bugs, please be sure that the feature has not already
|
||||
been requested. If the feature has already been requested, you can add
|
||||
your comments to the feature request tracker, or send mail to the
|
||||
<a href="mailto:html-tidy@w3.org">mailing list</a> indicating your
|
||||
wish to also have the feature implemented. If the feature has not
|
||||
already been requested, send the same information as for a bug
|
||||
report, but place special emphasis on the desired output for a
|
||||
given input, desired options, etc. - please be as specific as
|
||||
possible about what you want Tidy to <em>do</em>.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="layout" id="layout"></a>How Do I Control the Output Layout?</dt>
|
||||
|
||||
<dd>
|
||||
<p>There are three primary options that control how Tidy
|
||||
formats your markup:</p>
|
||||
<ul>
|
||||
<li><a class="code"
|
||||
href="quickref.html#indent">indent</a></li>
|
||||
<li><a class="code"
|
||||
href="quickref.html#indent-attributes">indent-attributes</a></li>
|
||||
<li><a class="code"
|
||||
href="quickref.html#vertical-space">vertical-space</a></li>
|
||||
</ul>
|
||||
|
||||
<p>Briefly, <code>indent</code> sets the level of left-to-right indenting
|
||||
and, somewhat, how often elements are put onto a new line. The options
|
||||
are <code>yes</code>, <code>no</code>, and <code>auto</code>.
|
||||
<code>indent-attributes</code> is a flag that, when set, tells Tidy to
|
||||
put each attribute on a new line. <code>vertical-space</code> is a flag
|
||||
that, when set, tells Tidy to add some empty lines for readability. The
|
||||
default for all three is <code>no</code>. These options may be used in
|
||||
any combination to control how you want your markup to look. The best
|
||||
thing is to experiment a bit to see what you like. Be aware that
|
||||
<code>indent yes</code> is deprecated for production use as it will
|
||||
cause visual changes in most browsers.</p>
|
||||
|
||||
<p>To get Tidy <em>Classic</em> <code>--indent auto</code> layout, use the following options:</p>
|
||||
|
||||
<pre>
|
||||
indent: auto
|
||||
indent-attributes: no
|
||||
vertical-space: yes
|
||||
</pre>
|
||||
|
||||
<p>You can read about more <em>Pretty Print</em> options
|
||||
<a href="quickref.html#PrettyPrintHeader">here</a>.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="version" id="version"></a>What Version of Tidy Should
|
||||
I Use?</dt>
|
||||
|
||||
<dd>
|
||||
<p>The current SourceForge builds are recommended. You can find these at
|
||||
<a href="http://tidy.sourceforge.net">http://tidy.sourceforge.net</a>.
|
||||
People continue to report examples where Tidy does not catch some
|
||||
ill-formed HTML or, worse, generates ill-formed HTML. These cases have
|
||||
been significantly reduced. That said, be sure to test Tidy with some
|
||||
representative files from your environment.</p>
|
||||
|
||||
<p>For development work, use CVS directly on your development
|
||||
system. For information on how to pull Tidy sources, see <a
href="http://sourceforge.net/cvs/?group_id=27659">CVS</a>. This way
|
||||
you can keep abreast of changes to Tidy and quickly resolve
|
||||
conflicts.</p>
|
||||
|
||||
<p>For building a front end (e.g. GUI or language binding), the
|
||||
simplest approach is to use TidyLib. For more information
|
||||
about building and coding with TidyLib, see the <a
|
||||
href="http://tidy.sourceforge.net/libintro.html">Introduction To TidyLib</a>.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="regression" id="regression">How Do I Run A
|
||||
Regression Test?</a></dt>
|
||||
<dd>
|
||||
<p>You might ask, "Why should I run a regression test?". If you
|
||||
are a Tidy user, you might want to compare a new version of Tidy
|
||||
to the version you are currently running. This is a good idea
|
||||
if you are using Tidy in production applications such as web
|
||||
publishing. If you are a Tidy developer, it is a good idea to
|
||||
run the regression test suite to make sure your fix or enhancement
|
||||
doesn't add new bugs.</p>
|
||||
|
||||
<p>Detecting new bugs is easier said than done, because sometimes
|
||||
they are subtle and can only be seen in browsers (or one particular
|
||||
browser you don't even have). But you can catch most crashes and
|
||||
many layout problems by running the test suite as described here.</p>
|
||||
|
||||
<p>The basic process is simple: run the test suite <strong>before</strong>
|
||||
and <strong>after</strong> making changes to TidyLib and compare the output
|
||||
markup and messages. Be aware that the test scripts for WinNT/2K/XP
|
||||
(alltest.cmd) and Linux/Unix (testall.sh) place the output files in
|
||||
<code>tidy/test/tmp</code>. If you forget to run the <strong>before</strong>
|
||||
test, you can always download a binary from the <a
|
||||
href="http://tidy.sourceforge.net/#binaries">Project Page</a>. If you
|
||||
are not a TidyLib developer, you can download the <a
|
||||
href="http://tidy.sourceforge.net/test/tidy_test.tgz">Test Suite</a>
|
||||
directly. Here are the steps to evaluate the impact of a TidyLib change.</p>
|
||||
|
||||
<h3>For Windows</h3>
|
||||
<p><strong>Before</strong> making changes:</p>
|
||||
<pre>
|
||||
C:\tidy\test> alltest.cmd
|
||||
C:\tidy\test> ren tmp baseline
|
||||
</pre>
|
||||
|
||||
<p><strong>After</strong> making changes and building Tidy:</p>
|
||||
<pre>
|
||||
C:\tidy\test> alltest.cmd
|
||||
C:\tidy\test> windiff tmp baseline
|
||||
</pre>
|
||||
|
||||
<h3>For Linux/Unix</h3>
|
||||
<p><strong>Before</strong> making changes:</p>
|
||||
<pre>
|
||||
~/tidy/test$ ./testall.sh
|
||||
~/tidy/test$ mv tmp baseline
|
||||
</pre>
|
||||
|
||||
<p><strong>After</strong> making changes and building Tidy:</p>
|
||||
<pre>
|
||||
~/tidy/test$ ./testall.sh
|
||||
~/tidy/test$ diff -u tmp baseline > diff.txt
|
||||
</pre>
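<p>The before/after comparison can also be scripted. The following is a
minimal sketch in Python, not part of the Tidy test suite: the function
names <code>compare_runs</code> and <code>demo</code> are hypothetical, and
it assumes the two runs were saved as flat directories of text files, as
with <code>baseline</code> and <code>tmp</code> above.</p>

```python
import difflib
import tempfile
from pathlib import Path


def compare_runs(baseline: Path, current: Path) -> dict:
    """Return {file name: unified diff lines} for every file that differs.

    A file that exists in only one directory is compared against an
    empty file, so it shows up as all additions or all deletions.
    """
    names = {p.name for p in baseline.iterdir() if p.is_file()}
    names |= {p.name for p in current.iterdir() if p.is_file()}
    changes = {}
    for name in sorted(names):
        old, new = baseline / name, current / name
        old_lines = old.read_text(errors="replace").splitlines() if old.exists() else []
        new_lines = new.read_text(errors="replace").splitlines() if new.exists() else []
        if old_lines != new_lines:
            changes[name] = list(difflib.unified_diff(
                old_lines, new_lines,
                fromfile=f"baseline/{name}", tofile=f"tmp/{name}",
                lineterm=""))
    return changes


def demo() -> dict:
    """Build two tiny throwaway directories and compare them."""
    with tempfile.TemporaryDirectory() as root:
        baseline, current = Path(root, "baseline"), Path(root, "tmp")
        baseline.mkdir()
        current.mkdir()
        (baseline / "case1.html").write_text("<p>same</p>\n")
        (current / "case1.html").write_text("<p>same</p>\n")
        (baseline / "case2.html").write_text("<p>old</p>\n")
        (current / "case2.html").write_text("<p>new</p>\n")
        return compare_runs(baseline, current)
```

<p>Calling <code>compare_runs(Path("baseline"), Path("tmp"))</code> from the
test directory returns only the files whose output changed, which keeps the
review focused on real differences.</p>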
|
||||
</dd>
|
||||
|
||||
<!--
|
||||
<dt><a name="" id=""></a></dt>
|
||||
<dd>
|
||||
</dd>
|
||||
|
||||
<dt><a name="" id=""></a></dt>
|
||||
<dd>
|
||||
</dd>
|
||||
-->
|
||||
<!-- Save for future questions
|
||||
<dt><a name="" id=""></a></dt>
|
||||
<dd>
|
||||
</dd>
|
||||
-->
|
||||
</dl>
|
||||
</body>
|
||||
</html>
|
|
@@ -1,554 +0,0 @@
|
|||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
||||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
<head>
|
||||
<meta name="generator" content="HTML Tidy, see www.w3.org" />
|
||||
<title>HTML TIDY - Notes on pending work</title>
|
||||
<meta name="keywords"
|
||||
content="HTML, validation, error correction, pretty-printing" />
|
||||
<meta name="author" content="Dave Raggett <dsr@w3.org>" />
|
||||
<style type="text/css">
|
||||
body {
|
||||
margin-left: 10%;
|
||||
margin-right: 10%;
|
||||
font-family: sans-serif
|
||||
}
|
||||
h1 { margin-left: -8% }
|
||||
h2,h3,h4,h5,h6 { margin-left: -4% }
|
||||
pre { color: green; font-weight: bold;
|
||||
font-size: 80%; font-family: monospace}
|
||||
em { font-style: italic; font-weight: bold }
|
||||
strong { text-transform: uppercase; font-weight: bold }
|
||||
.note {font-style: italic; color: rgb(192, 101, 101) }
|
||||
/* hr {text-align: center; width: 60% } */
|
||||
blockquote {
|
||||
color: navy;
|
||||
margin-left: 1%;
|
||||
margin-right: 1%;
|
||||
text-align: center;
|
||||
font-family: "Comic Sans MS", "Times New Roman", serif
|
||||
}
|
||||
table {
|
||||
font-family: sans-serif;
|
||||
font-size: 80%;
|
||||
background: rgb(255,255,153)
|
||||
}
|
||||
td {
|
||||
font-size: 80%
|
||||
}
|
||||
.people {font-family: "Lucida Calligraphy", serif}
|
||||
:link { color: rgb(0, 0, 153) }
|
||||
:visited { color: rgb(153, 0, 153) }
|
||||
:active { color: rgb(255, 0, 102) }
|
||||
a:hover { color: rgb(0, 0, 255) }
|
||||
</style>
|
||||
|
||||
<style type="text/css">
|
||||
p.c1 {font-style: italic}
|
||||
</style>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" background="grid.gif" text="black"
|
||||
link="navy" vlink="black" alink="red">
|
||||
<h1>HTML TIDY - Notes on Pending Work</h1>
|
||||
|
||||
<p><a href="http://www.w3.org/People/Raggett">Dave Raggett</a> <a
|
||||
href="mailto:dsr@w3.org">dsr@w3.org</a></p>
|
||||
|
||||
<p>This is a page where I am keeping the suggestions for
|
||||
improvements or bug fixes. My current work load means that I
|
||||
don't get much time to work on HTML Tidy, so I am interested in
|
||||
offers of help!</p>
|
||||
|
||||
<h4>Public Email List for Tidy: <<a
|
||||
href="mailto:html-tidy@w3.org">html-tidy@w3.org</a>></h4>
|
||||
|
||||
<p>I have set up an archived mailing list devoted to Tidy. To
|
||||
subscribe send an email to html-tidy-request@w3.org with the word
|
||||
subscribe in the subject line (include the word unsubscribe if
|
||||
you want to unsubscribe). The <a
|
||||
href="http://lists.w3.org/Archives/Public/html-tidy/">archive</a>
|
||||
for this list is accessible online. Please use this list to
|
||||
report errors or enhancement requests.</p>
|
||||
|
||||
<h2>Things awaiting further attention</h2>
|
||||
|
||||
<ul>
|
||||
<li>Support for BIG5 and ShiftJIS (Rick Jelliffe)</li>
|
||||
|
||||
<li>Stronger checking on which attributes appear on what
|
||||
elements</li>
|
||||
|
||||
<li>Sorting attributes in a canonical order</li>
|
||||
|
||||
<li>Version checking for HTML 4.01 vs 4.0 (Tidy currently will
|
||||
set the document type to 4.01 in preference to 4.0)</li>
|
||||
|
||||
<li>Noticing that the document isn't really XHTML if it isn't
|
||||
wellformed, i.e. it lacks end tags and quotes on attribute
|
||||
values</li>
|
||||
|
||||
<li>Converting <font face="Symbol">a</font> etc. to
|
||||
the corresponding Unicode characters, when cleaning HTML.</li>
|
||||
|
||||
<li>link checking - this would involve some platform dependent
|
||||
code as the network interface varies significantly from one
|
||||
platform to the next.</li>
|
||||
|
||||
<li>When exporting Word2000 to Web page, there is a need for
|
||||
smarter rules of thumb for working out whether the paragraph is a
|
||||
bulleted or numbered list item, and determining the level of
|
||||
nesting. Perhaps the style attribute holds the key? This tends to
|
||||
include substrings like: "mso-list:l0 level1 lfo2;" and
|
||||
"mso-list:l1 level1 lfo1;". Unfortunately, these aren't always
|
||||
present, and I have yet to figure out a foolproof heuristic.</li>
|
||||
</ul>
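<p>As a starting point for the Word2000 list heuristic, here is a minimal
Python sketch that pulls the numbers out of an <code>mso-list</code>
declaration. It is not Tidy code: <code>parse_mso_list</code> is a
hypothetical name, and reading the three fields as list id, nesting level
and format override is an assumption. Since the declaration is not always
present, the function returns <code>None</code> when it is missing.</p>

```python
import re
from typing import Optional, Tuple

# Matches declarations such as "mso-list:l0 level1 lfo2;"
MSO_LIST = re.compile(r"mso-list:\s*l(\d+)\s+level(\d+)\s+lfo(\d+)", re.IGNORECASE)


def parse_mso_list(style: str) -> Optional[Tuple[int, int, int]]:
    """Return (list_id, level, lfo) from a style attribute, or None.

    The interpretation of the three numbers is an assumption; only the
    textual pattern comes from the examples quoted above.
    """
    match = MSO_LIST.search(style)
    if match is None:
        return None
    list_id, level, lfo = (int(group) for group in match.groups())
    return list_id, level, lfo
```

<p>A caller would still need a fallback for paragraphs where the function
returns <code>None</code>, which is the hard part of the heuristic.</p>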
|
||||
|
||||
<p>I need to set up an index of precisely what attributes are
|
||||
supported on each element. Right now, some elements check their
|
||||
own attributes, whilst others are checked via default checks
|
||||
defined for each attribute independently of the element. Until
|
||||
this is done, you will sometimes find validation services
|
||||
discovering errors unnoticed by Tidy itself.</p>
|
||||
|
||||
<p>Jelks Cabaniss asks: <i>Could Tidy be made to automatically
|
||||
"clean" (FONTs to CSS) if the Strict DOCTYPE is requested? An
|
||||
HTML or XHTML Strict document can't have FONT tags according to
|
||||
the DTDs</i>. Jelks has a bunch of other good ideas such as
|
||||
converting the bgcolor attribute over to CSS.</p>
|
||||
|
||||
<p>Adding an option to select slide transition effects. I would
|
||||
also like to provide an optional feature for sorting attribute
|
||||
values.</p>
|
||||
|
||||
<p>I am having problems with form elements as direct children of
|
||||
tr or table. It is dangerous to create an implicit table cell,
|
||||
and what is needed is a way to move the form element into the
|
||||
next cell. If this can't be done an error needs to be raised
|
||||
since Tidy will be stuck. On a separate note, Tidy is still
|
||||
breaking lines between <img> and </a> which in
|
||||
Netscape shows as an underlined space. It's fine in IE.</p>
|
||||
|
||||
<p>Benjamin Holzman <bah@orientation.com> writes: I'm
|
||||
wrapping tidy (release-date 2000.01.13) in some perl objects
|
||||
(using SWIG), and CharEncoding being a global is a bit of a pain.
|
||||
I was wondering what your thoughts would be on how to fix that.
|
||||
The character encoding is already a property of struct Out; is
|
||||
there any reason why making it part of struct StreamIn as well,
|
||||
and perhaps setting that property in OpenInput, based on the
|
||||
existing CharEncoding variable, wouldn't allow us to move
|
||||
CharEncoding to be local to main?</p>
|
||||
|
||||
<p>Oh, in case you're curious about the API, here's a short
|
||||
script using my wrappers to be an html to xhtml filter:</p>
|
||||
|
||||
<pre>
|
||||
#!/usr/bin/perl
|
||||
|
||||
require tidy;
|
||||
|
||||
my $tidy = Tidy->new(*STDIN);
|
||||
my $document = $tidy->parse;
|
||||
$tidy->as_xhtml(*STDOUT);
|
||||
</pre>
|
||||
|
||||
<p>Rick Parsons would like there to be a new wrap-attributes
|
||||
option that can be used to suppress line wrapping within
|
||||
attributes. There is already a similar option for JavaScript
|
||||
literals.</p>
|
||||
|
||||
<p>Vijay Patil would like tidy -h to display options sorted
|
||||
alphabetically.</p>
|
||||
|
||||
<p>Julian Reschke would like there to be an option to add the
|
||||
xml:space="preserve" attribute to pre elements when outputting
|
||||
xml.</p>
|
||||
|
||||
<p>Armando Asantos would like to use Tidy to produce a list of
|
||||
URLs for images or hypertext links according to a config option.
|
||||
This would be straightforward, but is a lower priority than bug
|
||||
fixes etc.</p>
|
||||
|
||||
<p>Omri Traub would like an option to wrap the contents of style
|
||||
and script elements in CDATA marked sections when converting to
|
||||
XHTML. He is also interested in direct support for 16 bit
|
||||
character file I/O.</p>
|
||||
|
||||
<p>Bertilo Wennergren notes:</p>
|
||||
|
||||
<blockquote>If I configure Tidy to "upgrade to style sheets", it
|
||||
does so for a few things in my main document, but the code thus
|
||||
created get error reports if I feed it back to Tidy. It turns out
|
||||
that Tidy creates extra "class" attributes on tags that already
|
||||
have "class" attributes set. This happens with this page:
|
||||
<http://www.concinnity.se/bertilow/index.htm>.</blockquote>
|
||||
|
||||
<p>Randi Waki notes:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>If a quoted URL attribute value (e.g., href in <a>
|
||||
elements) contains a line break, 13-Jan-2000 Tidy changes the
|
||||
line break to a space while IE and Netscape discard the line
|
||||
break. This can result in a broken link in the tidied
|
||||
document.</p>
|
||||
|
||||
<p>I believe the following change fixes the problem. In lexer.c,
|
||||
insert the following lines before line 2502:</p>
|
||||
|
||||
<pre>
|
||||
/* discard line breaks in quoted URLs */
|
||||
if (c == '\n' && IsUrl(name))
|
||||
continue;
|
||||
|
||||
/* existing line 2502 */ c = ' ';
|
||||
</pre>
|
||||
</blockquote>
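<p>The intended behaviour of this patch can be illustrated outside of
lexer.c. The Python sketch below is only an illustration: the function name
and the set of URL attributes are assumptions and do not reproduce Tidy's
own <code>IsUrl</code> table.</p>

```python
# Hypothetical set of attributes treated as URLs for this illustration.
URL_ATTRIBUTES = {"href", "src", "action", "background", "cite", "longdesc", "usemap"}


def normalize_attr_value(name: str, value: str) -> str:
    """Drop line breaks inside URL attributes; map them to spaces elsewhere."""
    if name.lower() in URL_ATTRIBUTES:
        # Browsers discard the break, so the tidied URL should too.
        return value.replace("\r\n", "").replace("\n", "").replace("\r", "")
    # Non-URL attributes keep the existing behaviour: break becomes a space.
    return value.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
```

<p>With this rule a link split across two lines stays intact, while ordinary
text attributes such as alt still get a space where the line break was.</p>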
|
||||
|
||||
<p>Stephen Reynolds would like Tidy to keep track of whether a
|
||||
comment started on a new line and preserve this in the
|
||||
output.</p>
|
||||
|
||||
<p>Terry Teague says:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>Sorry, I should have been more clear. Part of the problem is
|
||||
the current HelpText() function in localize.c doesn't actually
|
||||
reflect current reality.</p>
|
||||
|
||||
<p>You need to at least add the following line to HelpText()
|
||||
:</p>
|
||||
|
||||
<pre>
|
||||
tidy_out(out, " -version or -v show version\n");
|
||||
</pre>
|
||||
|
||||
<p>And I suppose it should mention the use of the new
|
||||
"--<config options>" type syntax.</p>
|
||||
|
||||
<p>Regards, Terry</p>
|
||||
</blockquote>
|
||||
|
||||
<p>John Russel notes:</p>
|
||||
|
||||
<pre>
|
||||
what i wonder is
|
||||
1] does the specification indicate these are WRONG
|
||||
2] if so why do they pass thru tidy ....
|
||||
is url syntax such a can of worms that it is left to user
|
||||
to check .......
|
||||
|
||||
CASE 1: misuse of slash for folders
|
||||
site had background="pics\fancy.jpg"
|
||||
instead of "pics/fancy.jpg"
|
||||
|
||||
CASE 2: spaces in filename
|
||||
site had href="coin album.html"
|
||||
instead of "coin%20album.html"
|
||||
</pre>
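<p>If Tidy were to repair these two cases, the fix itself would be small. A
minimal Python sketch, assuming that a backslash is never intended in a URL
path and that only literal spaces need escaping (the name
<code>repair_url</code> is hypothetical):</p>

```python
def repair_url(url: str) -> str:
    """Apply the two repairs from the cases above.

    CASE 1: backslashes used as folder separators become forward slashes.
    CASE 2: literal spaces become %20. Existing escapes are left untouched.
    """
    url = url.replace("\\", "/")
    return url.replace(" ", "%20")
```

<p>Whether such repairs should be silent or reported as warnings is a
separate question that this sketch does not answer.</p>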
|
||||
|
||||
<p>Andre Stechert would like a way to prevent Tidy from
|
||||
"cleaning" newly declared elements which don't have any content
|
||||
but do have end tags, see his mail of 17th January 2000.</p>
|
||||
|
||||
<p>Todd Clark would like to use Tidy with Microsoft's WebClass
|
||||
tags. Unfortunately these include unusual characters in the tag
|
||||
names such as @ which Tidy objects to, for instance:</p>
|
||||
|
||||
<pre>
|
||||
<WC@DOMAINNAME>test.com</WC@DOMAINNAME>
|
||||
</pre>
|
||||
|
||||
<p>Perhaps it makes sense to offer an option to make Tidy less
|
||||
picky about what characters it accepts in tag names. Or perhaps
|
||||
"WebClass: yes".</p>
|
||||
|
||||
<p>Jelks Cabaniss suggests an option to control dropping of empty
|
||||
elements, e.g. according to what attributes they have.</p>
|
||||
|
||||
<p>Paavo Hartikainen writes:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>Tidy always expands '&' to '&amp;' even if I have
|
||||
'quote-ampersand: no' defined in configuration file. This is not
|
||||
a good thing to do for URLs that have '&' characters in them.
|
||||
OS is Debian GNU/Linux 2.1 SPARC. Same thing happens on Alpha.
|
||||
Other architectures I have not tried.</p>
|
||||
|
||||
<p>My configuration looks like this:</p>
|
||||
|
||||
<pre>
|
||||
char-encoding: latin1
|
||||
error-file: ./errors
|
||||
indent-spaces: 2
|
||||
logical-emphasis: yes
|
||||
output-xhtml: yes
|
||||
quiet: no
|
||||
quote-ampersand: no
|
||||
show-warnings: yes
|
||||
tidy-mark: yes
|
||||
wrap: 78
|
||||
wrap-attributes: no
|
||||
write-back: yes
|
||||
keep-time: yes
|
||||
</pre>
|
||||
</blockquote>
|
||||
|
||||
<p>Paul White reports that Tidy isn't recognizing HTML 3.2 when
|
||||
the doctype is "-//W3C//DTD HTML 3.2 Final//EN" (as per the REC),
|
||||
and similarly for HTML 4.01. This would appear to call for a
|
||||
change to the table of names in lexer.c.</p>
|
||||
|
||||
<p>Stuart Hungerford would like Tidy to detect and fix duplicate
|
||||
attributes e.g. multiple class attributes. Celeste Suliin Burris
|
||||
would like Tidy to replace spaces in URLs by %20 as some versions
|
||||
of Netscape "croak big time" on this. Denis Kokarev also wants
|
||||
Tidy to remove duplicate attributes when the values are the same.
|
||||
This apparently stops XSLT from working. Brian Schweitzer notes
|
||||
that Tidy adds a 2nd class attribute rather than merging the
|
||||
classes into a space separated list.</p>
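<p>One possible policy that covers these requests is sketched below in
Python. It is an assumption, not current Tidy behaviour: duplicate class
attributes are merged into one space separated list, and for any other
duplicated attribute the first value wins. The name
<code>merge_attributes</code> is hypothetical.</p>

```python
def merge_attributes(attrs):
    """Resolve duplicate attributes in a list of (name, value) pairs.

    Duplicate "class" attributes are merged into a single space separated
    list without repeated tokens. For other duplicates the first value is
    kept, which also covers duplicates whose values are identical.
    """
    merged = {}
    for name, value in attrs:
        key = name.lower()
        if key not in merged:
            merged[key] = value
        elif key == "class":
            tokens = merged[key].split()
            for token in value.split():
                if token not in tokens:
                    tokens.append(token)
            merged[key] = " ".join(tokens)
        # Any other duplicate is dropped in favour of the first value.
    return list(merged.items())
```

<p>Attribute order follows the first occurrence of each name, so the output
stays close to the source markup.</p>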
|
||||
|
||||
<p>Bertilo Wennergren writes: Tidy seems not to recognize frame
|
||||
elements with a closing "/". It actually removes them. Try his <a
|
||||
href="http://www.concinnity.se/bertilow/pmeg/pmeg9/k_bazo.htm">example</a>.
|
||||
Tidy can produce XHTML Frameset docs, but when fed them back
again it cries foul.</p>
|
||||
|
||||
<p>Jose Manuel Cerqueira Esteves notes:</p>
|
||||
|
||||
<pre>
|
||||
I've used `tidy' to convert a few HTML 4.0 files to XHTML 1.0 and noticed
|
||||
a problem when dealing with constructs like
|
||||
|
||||
<small><small>some text</small></small>
|
||||
|
||||
First, `tidy' acts as if the second "<small>" was meant as a closing tag:
|
||||
|
||||
Warning: "<small> is probably intended as </small>"
|
||||
|
||||
Then it trims the resulting empty <small></small>:
|
||||
|
||||
Warning: trimming empty <small>
|
||||
|
||||
And finally both remaining closing tags ("</small>"), now spurious,
|
||||
are removed:
|
||||
|
||||
Warning: discarding unexpected </small>
|
||||
Warning: discarding unexpected </small>
|
||||
|
||||
It would be convenient to have at least some `tidy' option to prevent this
|
||||
from happening (or perhaps some different heuristics?).
|
||||
</pre>
|
||||
|
||||
<p>Robbert Hans Baron would like to see Tidy warning about
|
||||
duplicate attributes and fixing these when the values are
|
||||
identical.</p>
|
||||
|
||||
<p>Jutta Wrage notes that: When parsing HTML 3.2 Pages, tidy
|
||||
doesn't accept textareas in forms correctly. The HTML Reference
|
||||
specification (HTML 3.2 Final) allows: name, rows and cols, but
|
||||
upon seeing these Tidy thinks the document is 4.0.</p>
|
||||
|
||||
<p>Matthew Brealey notes that a heading start tag is coerced to
|
||||
an end heading tag when the end tag is missing. This is
|
||||
deliberate, but perhaps not the best heuristic.</p>
|
||||
|
||||
<p>HIYAMA Masayuki notes that Tidy should set the encoding
|
||||
attribute to match the language encoding, e.g. <?xml version="1.0"
encoding="iso-2022-jp"?>.</p>
|
||||
|
||||
<p>Mark Modrall has extended Tidy to support selectively
|
||||
stripping out listed tags and attributes, see his email of March
|
||||
14th.</p>
|
||||
|
||||
<p>Yong Taek Bae notes that with the omit end tags option Tidy
|
||||
omits the body tag even if it has attributes. This is an
|
||||
error.</p>
|
||||
|
||||
<p>Tapio Markula reports that Tidy is incorrectly replacing
|
||||
accented characters in script elements by entities. The script
|
||||
element (in HTML but not XHTML) is CDATA and as such entities
|
||||
won't be expanded. This bug needs to be fixed along with the
|
||||
support for CDATA sections.</p>
|
||||
|
||||
<p>Terrill Bennett reports tidy crashing when producing slides,
|
||||
and when the -i option has been set. He later added the crash
|
||||
occurs when the page doesn't include an h1 element. See
|
||||
Terrill-Bennett-11mar00.txt.</p>
|
||||
|
||||
<p>Stephen Lewis notes that if an <hr> element is present
|
||||
in the head before the title element, then Tidy gets confused and
|
||||
adds in a spurious extra empty title element. This would be
|
||||
avoided if Tidy could move the hr into the body before the body
|
||||
element is encountered. This raises a number of problems for
|
||||
instance working out when to copy in attributes from an explicit
|
||||
body element.</p>
|
||||
|
||||
<p>Carl Osterly would like Tidy to avoid breaking lines before or
|
||||
after the = sign in attribute values when this is practical.
|
||||
Perhaps a simple rule of thumb could be used to decide this?</p>
|
||||
|
||||
<p>Rick H Wesson notes that Tidy crashes on CDATA marked sections
|
||||
when parsing XML.</p>
|
||||
|
||||
<p>Luigi Federici would like an option to set the DTD URI for XML
|
||||
or XHTML.</p>
|
||||
|
||||
<p>Mat Sander notes: If I have php code the indentation behaves
|
||||
strange. Repeated tidying php content and end tag indented one
|
||||
level extra for each time. The result ends up something like
|
||||
this:</p>
|
||||
|
||||
<pre>
|
||||
...
|
||||
<?php
|
||||
$r=0;
|
||||
?>
|
||||
...
|
||||
|
||||
I have the following config file for Tidy:
|
||||
---
|
||||
tidy-mark: no
|
||||
markup: yes
|
||||
wrap: 0
|
||||
indent: auto
|
||||
output-xml: no
|
||||
output-xhtml: yes
|
||||
doctype: loose
|
||||
char-encoding: latin1
|
||||
quote-marks: yes
|
||||
assume-xml-procins: yes
|
||||
word-2000: yes
|
||||
clean: yes
|
||||
logical-emphasis: yes
|
||||
drop-empty-paras: yes
|
||||
enclose-text: yes
|
||||
fix-bad-comments: yes
|
||||
alt-text: .
|
||||
write-back: bool
|
||||
keep-time: yes
|
||||
show-warnings: no
|
||||
quiet: yes
|
||||
split: no
|
||||
---
|
||||
|
||||
Best Regards,
|
||||
Mats-Olof Sander
|
||||
|
||||
</pre>
|
||||
|
||||
<p>Don Hasson notes that if you make a mistake and leave off the
|
||||
ending "/" in the <title> tag, tidy will generate an extra
|
||||
set of <title>s.</p>
|
||||
|
||||
<p>Example:</p>
|
||||
|
||||
<pre>
|
||||
<html>
|
||||
<head><title>No end here<title></head>
|
||||
<body>
|
||||
Empty
|
||||
</body>
|
||||
</html>
|
||||
|
||||
</pre>
|
||||
|
||||
<p>produces this:</p>
|
||||
|
||||
<pre>
|
||||
<html>
|
||||
<head>
|
||||
<title>No end here</title>
|
||||
<title></title>
|
||||
</head>
|
||||
<body>
|
||||
Empty
|
||||
</body>
|
||||
</html>
|
||||
|
||||
</pre>
|
||||
|
||||
<p>Jeff Wilkinson would like the HTML Tidy page to include
|
||||
internal anchors so that he can link directly to the appropriate
|
||||
sections.</p>
|
||||
|
||||
<p>Peter Vince would like to be able to clean presentation
|
||||
attributes on the body element, as well as translating b and i to
|
||||
span.</p>
|
||||
|
||||
<p>Dave Bryan and Matthew Brealey would like there to be a way to
|
||||
suppress the default handling of inline elements in favor of
|
||||
simply inserting the appropriate end tag when encountering an
|
||||
element that isn't allowed in an inline context. The default
|
||||
behavior replicates the rendering on existing browsers but can
|
||||
cause problems for hand editors.</p>
|
||||
|
||||
<p>Dave Bryan notes that tidy isn't updating the column position
|
||||
when parsing attributes.</p>
|
||||
|
||||
<p>Can Tidy track when a line break occurs after a PI or comment
|
||||
and reproduce this in the output? This idea occurred to me after
|
||||
reading a comment from Brad Stowers.</p>
|
||||
|
||||
<p>One interesting suggestion is to make some of Tidy's rules of
|
||||
thumb sensitive to the program that generated the markup as
|
||||
indicated by the meta element. This would allow for greater
|
||||
robustness in how the rules operate.</p>
|
||||
|
||||
<p>Dave Bryan would like the quiet mode to be tweaked to suppress
|
||||
the general info at the end of the report. See
|
||||
Dave-Bryan-24mar00.txt.</p>
|
||||
|
||||
<p>Erik Rossen would like an option to suppress line wrap within
|
||||
tags, so that the tag is always on the same line regardless of
|
||||
the number and length of the attributes.</p>
|
||||
|
||||
<p>Dan Satria suggests that the clean mechanism check to see if
|
||||
there are any existing matching style rules before adding new
|
||||
ones.</p>
|
||||
|
||||
<p>Zoltan Hawryluk suggests mapping the Netscape layer tag into
|
||||
the equivalent CSS positioning syntax.</p>
|
||||
|
||||
<p>Jim Walker says Tidy doesn't correctly report errors such as
|
||||
<tt></</head></tt>.</p>
|
||||
|
||||
<p>Tidy's slide feature: see Johannes-Poutre-12jul00.txt</p>
|
||||
|
||||
<p>Carole Mah suggests Tidy should recover from multiple class
|
||||
attributes on the same element.</p>
|
||||
|
||||
<h2>Other ideas</h2>
|
||||
|
||||
<ul>
|
||||
<li>Recursion through subdirectories, so you can fix up your
|
||||
entire web site at one go. This assumes I can find a way that is
|
||||
portable across a wide range of platforms!</li>
|
||||
|
||||
<li>Support for W3C's <a
|
||||
href="http://www.w3.org/TR/REC-DOM-Level-1/">Document Object
|
||||
Model</a> (DOM) level one.</li>
|
||||
|
||||
<li>Full validation of all attribute values.</li>
|
||||
|
||||
<li>Mapping Unicode bidi control characters to HTML tags.</li>
|
||||
|
||||
<li>Full support for parsing XML (still somewhat limited).</li>
|
||||
|
||||
<li>How to say which XML elements should be printed
|
||||
"inline".</li>
|
||||
|
||||
<li>Acting on the XML encoding attribute, e.g.
|
||||
<?xml encoding="iso-8859-1"></li>
|
||||
|
||||
<li>Improved mapping from HTML presentation attributes/elements
|
||||
to CSS.</li>
|
||||
|
||||
<li>Improved support for <a
|
||||
href="http://java.sun.com/products/jsp/">JSP</a> (JavaServer
|
||||
Pages)</li>
|
||||
|
||||
<li>Ugly print option which removes all optional whitespace</li>
|
||||
</ul>
|
||||
</body>
|
||||
</html>
|
||||
|
File diff suppressed because it is too large
Load diff
BIN
htmldoc/tidy.gif
Binary file not shown.
Before Width: | Height: | Size: 244 B |
562
index.html
Normal file
|
@ -0,0 +1,562 @@
|
|||
<!doctype html>
|
||||
<meta charset=utf-8>
|
||||
<title>HTML Tidy for HTML5 (experimental)</title>
|
||||
<style type="text/css">
|
||||
html {
|
||||
background: #DDE5D9 url() repeat 0 0;
|
||||
font-family: "Lucida Sans Unicode", "Lucida Sans", verdana, arial, helvetica;
|
||||
}
|
||||
body {
|
||||
border: solid 1px #CED4CA;
|
||||
background-color: #FFF;
|
||||
padding: 4px 40px 40px 40px;
|
||||
margin: 20px 20px 20px 20px;
|
||||
padding-right: 20%;
|
||||
}
|
||||
h1, h2 {
|
||||
color: #0B5B9D;
|
||||
}
|
||||
h1 {
|
||||
font-size: 39px;
|
||||
font-weight: normal;
|
||||
vertical-align: top;
|
||||
margin-bottom: 0px;
|
||||
}
|
||||
a {
|
||||
text-decoration: none;
|
||||
color: #0B5B9D;
|
||||
padding: 2px;
|
||||
}
|
||||
|
||||
a:hover {
|
||||
text-decoration: none;
|
||||
background-color: #0B5B9D;
|
||||
color: white;
|
||||
}
|
||||
a:active {
|
||||
text-decoration: none;
|
||||
background-color: white;
|
||||
color: black;
|
||||
}
|
||||
#toc {
|
||||
position: fixed;
|
||||
top: 10px;
|
||||
right: 10px;
|
||||
border: 2px solid #0B5B9D;
|
||||
background: rgba(255,255,255,.9);
|
||||
padding: 15px;
|
||||
z-index: 999;
|
||||
max-height: 400px;
|
||||
overflow: auto;
|
||||
font-size: 11px;
|
||||
font-family: Verdana, sans-serif;
|
||||
}
|
||||
#toc-button {
|
||||
position:fixed;
|
||||
top:10px;
|
||||
right:10px;
|
||||
background:transparent;
|
||||
padding:15px;
|
||||
z-index:999;
|
||||
max-height:400px;
|
||||
overflow:auto;
|
||||
font-size:11px;
|
||||
font-family:Verdana, sans-serif;
|
||||
}
|
||||
#toc .button,
|
||||
#toc-button .button {
|
||||
float: right;
|
||||
margin: 0 0 5px 5px;
|
||||
padding: 5px;
|
||||
border: 1px #008 solid;
|
||||
color:#00f;
|
||||
background-color:#ccf;
|
||||
}
|
||||
#toc ol {
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
font-size: 11px;
|
||||
font-family: Verdana, sans-serif;
|
||||
}
|
||||
#toc li {
|
||||
list-style: decimal outside;
|
||||
margin-left: 20px;
|
||||
font-size: 11px;
|
||||
font-family: Verdana, sans-serif;
|
||||
}
|
||||
#toc li a {
|
||||
font-size: 11px;
|
||||
font-family: Verdana, sans-serif;
|
||||
}
|
||||
.hide {
|
||||
display: none;
|
||||
}
|
||||
.show {
|
||||
display: block;
|
||||
}
|
||||
code { color: green; font-weight: bold; }
|
||||
pre { color: green; font-weight: bold; font-family: monospace}
|
||||
em { font-style: italic; color: rgb(0, 0, 153) }
|
||||
:link { color: rgb(0, 0, 153) }
|
||||
:visited { color: rgb(153, 0, 153) }
|
||||
</style>
|
||||
|
||||
<h1 id=intro>HTML Tidy for HTML5 (experimental)</h1>
|
||||
<p>This page documents the experimental HTML5 fork of HTML Tidy available
|
||||
at
|
||||
<a href="https://github.com/w3c/tidy-html5">https://github.com/w3c/tidy-html5</a>.
|
||||
|
||||
<p>File bug reports and enhancement requests at
|
||||
<a href="https://github.com/w3c/tidy-html5/issues">https://github.com/w3c/tidy-html5/issues</a>.</p>
|
||||
|
||||
<p>The W3C public mailing list for HTML Tidy discussion is
|
||||
<b>html-tidy@w3.org</b> (<a href= "http://lists.w3.org/Archives/Public/html-tidy/">list archive</a>).
|
||||
|
||||
<p>For more information on HTML5:</p>
|
||||
<ul>
|
||||
<li>
|
||||
<a href="http://dev.w3.org/html5/spec-author-view">HTML: Edition for Web Authors</a> (the latest HTML specification)
|
||||
<li>
|
||||
<a href="http://dev.w3.org/html5/markup/">HTML: The Markup Language</a> (an HTML language reference)
|
||||
</ul>
|
||||
<p>
|
||||
Validate your HTML documents using the
|
||||
<a href="http://validator.w3.org/nu/">W3C Nu Markup Validator</a>.
|
||||
|
||||
<h2 id=what-tidy-does>What Tidy does</h2>
|
||||
<p>Tidy corrects and cleans up HTML content by fixing markup errors.
|
||||
Here are a few examples:
|
||||
<ul>
|
||||
<li><b>Mismatched end tags:</b>
|
||||
<pre>
|
||||
<h2>subheading</h3>
|
||||
</pre>
|
||||
<p>…is converted to:</p>
|
||||
<pre>
|
||||
<h2>subheading</h2>
|
||||
</pre></li>
|
||||
<li><b>Misnested tags:</b>
|
||||
<pre>
|
||||
<p>here is a para <b>bold <i>bold italic</b> bold?</i> normal?
|
||||
</pre>
|
||||
<p>…is converted to:</p>
|
||||
<pre>
|
||||
<p>here is a para <b>bold <i>bold italic</i> bold?</b> normal?
|
||||
</pre></li>
|
||||
<li><b>Missing end tags:</b>
|
||||
<pre>
|
||||
<h1>heading
|
||||
<h2>subheading</h2>
|
||||
</pre>
|
||||
<p>…is converted to:</p>
|
||||
<pre>
|
||||
<h1>heading</h1>
|
||||
<h2>subheading</h2>
|
||||
</pre>
|
||||
…and
|
||||
<pre>
|
||||
<h1><i>italic heading</h1>
|
||||
</pre>
|
||||
<p>…is converted to:</p>
|
||||
<pre>
|
||||
<h1><i>italic heading</i></h1>
|
||||
</pre></li>
|
||||
<li><b>Mixed-up tags:</b>
|
||||
<pre>
|
||||
<i><h1>heading</h1></i>
|
||||
<p>new paragraph <b>bold text
|
||||
<p>some more bold text
|
||||
</pre>
|
||||
<p>…is converted to:</p>
|
||||
<pre>
|
||||
<h1><i>heading</i></h1>
|
||||
<p>new paragraph <b>bold text</b>
|
||||
<p><b>some more bold text</b>
|
||||
</pre></li>
|
||||
<li><b>Tag in the wrong place:</b>
|
||||
<pre>
|
||||
<h1><hr>heading</h1>
|
||||
<h2>sub<hr>heading</h2>
|
||||
</pre>
|
||||
<p>…is converted to:</p>
|
||||
<pre>
|
||||
<hr>
|
||||
<h1>heading</h1>
|
||||
<h2>sub</h2>
|
||||
<hr>
|
||||
<h2>heading</h2>
|
||||
</pre></li>
|
||||
<li><b>Missing "/" in end tags:</b>
|
||||
<pre>
|
||||
<a href="#refs">References<a>
|
||||
</pre>
|
||||
<p>…is converted to:</p>
|
||||
<pre>
|
||||
<a href="#refs">References</a>
|
||||
</pre></li>
|
||||
<li><b>List markup with missing tags:</b>
|
||||
<pre>
|
||||
<body>
|
||||
<li>1st list item
|
||||
<li>2nd list item
|
||||
</pre>
|
||||
<p>…is converted to:</p>
|
||||
<pre>
|
||||
<body>
|
||||
<ul>
|
||||
<li>1st list item</li>
|
||||
<li>2nd list item</li>
|
||||
</ul>
|
||||
</pre></li>
|
||||
<li><b>Missing quotation marks around attribute values</b>
|
||||
<p>Tidy inserts quotation marks around all attribute values for you. It
|
||||
can also detect when you have forgotten the closing quotation mark,
|
||||
although this is something you will have to fix yourself.</p>
|
||||
</li>
|
||||
<li><b>Unknown/proprietary attributes</b>
|
||||
<p>Tidy has a comprehensive knowledge of the attributes defined in HTML5.
|
||||
That often allows you to spot where you have mis-typed an attribute.
|
||||
</li>
|
||||
<li><b>Tags lacking a terminating ">"</b>
|
||||
<p>This is something you then have to fix yourself as Tidy cannot
|
||||
determine where the ">" was meant to be inserted.</p>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
<h2 id="help">How to run Tidy from the command line</h2>
|
||||
<p>This is the syntax for invoking Tidy from the command line:
|
||||
<pre>
|
||||
<code>tidy <em>[[options] filename]*</em></code>
|
||||
</pre>
|
||||
<p>
|
||||
Tidy defaults to reading from standard input, so if you run Tidy without
|
||||
specifying the <code><em>filename</em></code> argument, it will just sit
|
||||
there waiting for input to read.
|
||||
And Tidy defaults to writing to standard output. So you can pipe output
|
||||
from Tidy to other programs, as well as pipe output from other programs to
|
||||
Tidy. You can page through the output from Tidy by piping it to a pager:</p>
|
||||
<pre>
|
||||
tidy file.html | less
|
||||
</pre>
|
||||
<p>
|
||||
To have Tidy write its output to a file instead, either use the
|
||||
<code>-o <em>filename</em></code> or <code>-output <em>filename</em></code>
|
||||
option, or redirect standard output to the file; for example:
|
||||
<pre>
|
||||
tidy -o output.html index.html
|
||||
tidy index.html > output.html
|
||||
</pre>
|
||||
<p>Both of those run tidy on the file <b>index.html</b> and write the
|
||||
output to the file <b>output.html</b>, while writing any error messages to
|
||||
standard error.
|
||||
<p>
|
||||
Tidy defaults to writing its error messages to standard error (that is, to
|
||||
the console where you’re running Tidy). To page through the error messages,
|
||||
along with the output, redirect standard error to standard output, and pipe
|
||||
it to your pager:
|
||||
<pre>
|
||||
tidy index.html 2>&1 | less
|
||||
</pre>
|
||||
<p>
|
||||
To have Tidy write the errors to a file instead, either use the
|
||||
<code>-f <em>filename</em></code> or <code>-file <em>filename</em></code>
|
||||
option, or redirect standard error to a file:</p>
|
||||
<pre>
|
||||
tidy -o output.html -f errs.txt index.html
|
||||
tidy index.html > output.html 2> errs.txt
|
||||
</pre>
|
||||
<p>Both of those run tidy on the file <b>index.html</b> and write the
|
||||
output to the file <b>output.html</b>, while writing any error messages to
|
||||
the file <b>errs.txt</b>.
|
||||
<p>
|
||||
Writing the error messages to a file is especially useful if the file you
|
||||
are checking has many errors; reading them from a file instead of the
|
||||
console or pager can make it easier to review them.
|
||||
<p>You can use the <code>-m</code> or <code>-modify</code> option to
|
||||
modify (in-place) the contents of the input file you are checking; that is,
|
||||
to overwrite those contents with the output from Tidy. Example:
|
||||
<pre>
|
||||
tidy -f errs.txt -m index.html
|
||||
</pre>
|
||||
<p>That runs tidy on the file <b>index.html</b>, modifying it in place
|
||||
and writing the error messages to the file <b>errs.txt</b>.
|
||||
<p>
|
||||
<b>Caution:</b> If you use the -m option, you should first save a copy of your file.
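<p>As a minimal sketch of that precaution, assuming a POSIX shell: the
helper name <code>backup_file</code> and the <code>.bak</code> suffix are
illustrative choices, not part of Tidy.</p>

```shell
# Hypothetical helper: copy a file to <name>.bak, preserving its
# timestamps and permissions, before it is modified in place.
backup_file() {
    cp -p "$1" "$1.bak"
}

# Usage (assumes tidy is on PATH; file names are placeholders):
#   backup_file index.html && tidy -f errs.txt -m index.html
```

<p>Chaining the two commands with <code>&&</code> means Tidy only runs
if the copy succeeded.</p>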
|
||||
<h2 id=options>Options and configuration settings</h2>
|
||||
<p>To get a list of available options, use:</p>
|
||||
<pre>
|
||||
tidy -help
|
||||
</pre>
|
||||
<p>To get a list of all configuration settings, use:</p>
|
||||
<pre>
|
||||
tidy -help-config
|
||||
</pre>
|
||||
<p>To read the help output a page at a time, pipe it to a pager:
|
||||
<pre>
|
||||
tidy -help | less
|
||||
tidy -help-config | less
|
||||
</pre>
|
||||
<p>Single-letter options other than -f may be combined; for example:
|
||||
<pre>
|
||||
tidy -f errs.txt -imu foo.html
|
||||
</pre>
|
||||
|
||||
<h2 id="config">Using a config file</h2>
|
||||
<p>The most convenient way to configure Tidy is by using a separate
|
||||
config file.
|
||||
Assuming you have created a
|
||||
Tidy config file named <b>config.txt</b> (the name doesn't matter), you can
|
||||
instruct Tidy to use it via the command line option
|
||||
<code>-config config.txt</code>; for example:
|
||||
<pre>
|
||||
tidy -config config.txt file1.html file2.html
|
||||
</pre>
|
||||
<p>Alternatively, you can name the default config file via the
|
||||
environment variable named <b>HTML_TIDY</b>, the value of which is
|
||||
the absolute path for the config file.
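<p>For instance, in a POSIX shell the variable could be set as sketched
below. The path <code>$HOME/tidy-config.txt</code> is a hypothetical
location chosen for illustration; substitute the absolute path of your own
config file.</p>

```shell
# Hypothetical location; any absolute path to a Tidy config file would do.
HTML_TIDY="$HOME/tidy-config.txt"
export HTML_TIDY

# Later runs started from this shell can then omit -config, for example:
#   tidy file1.html file2.html
```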
|
||||
<p>You can also set config options on the command line by preceding
|
||||
the name of the option immediately (no intervening space) with the string "<code>--</code>";
|
||||
for example:</p>
|
||||
<pre>
|
||||
tidy --break-before-br true --show-warnings false
|
||||
</pre>
|
||||
<p>You can find documentation for the full set of configuration options
|
||||
on the
|
||||
<a href= "quickref.html">Quick Reference</a>
|
||||
page.
|
||||
|
||||
<h2 id=sample-config>Sample config file</h2>
|
||||
<p>The following is an example of a Tidy config file.</p>
|
||||
<pre>
|
||||
// sample config file for HTML tidy
|
||||
indent: auto
|
||||
indent-spaces: 2
|
||||
wrap: 72
|
||||
markup: yes
|
||||
output-xml: no
|
||||
input-xml: no
|
||||
show-warnings: yes
|
||||
numeric-entities: yes
|
||||
quote-marks: yes
|
||||
quote-nbsp: yes
|
||||
quote-ampersand: no
|
||||
break-before-br: no
|
||||
uppercase-tags: no
|
||||
uppercase-attributes: no
|
||||
char-encoding: latin1
|
||||
new-inline-tags: cfif, cfelse, math, mroot,
|
||||
mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
|
||||
munder, mover, mmultiscripts, msup, msub, mtext,
|
||||
mprescripts, mtable, mtr, mtd, mth
|
||||
new-blocklevel-tags: cfoutput, cfquery
|
||||
new-empty-tags: cfelse
|
||||
</pre>
|
||||
|
||||
<h2 id=indenting>Indenting output for readability</h2>
|
||||
<p>Indenting the source markup of an HTML document makes the markup easier
|
||||
to read. Tidy can indent the markup for an HTML document while recognizing
|
||||
elements whose contents should not be indented. In the example below, Tidy
|
||||
indents the output while preserving the formatting of the <pre>
|
||||
element:</p>
|
||||
<p>Input:</p>
|
||||
<pre>
|
||||
<html>
|
||||
<head>
|
||||
<title>Test document</title>
|
||||
</head>
|
||||
<body>
|
||||
<p>This example shows how Tidy can indent output while preserving
|
||||
formatting of particular elements.</p>
|
||||
|
||||
<pre>This is
|
||||
<em>genuine
|
||||
preformatted</em>
|
||||
text
|
||||
</pre>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
</pre>
|
||||
<p>Output:</p>
|
||||
<pre>
|
||||
<html>
|
||||
<head>
|
||||
<title>Test document</title>
|
||||
</head>
|
||||
|
||||
<body>
|
||||
<p>This example shows how Tidy can indent output while preserving
|
||||
formatting of particular elements.</p>
|
||||
<pre>
|
||||
This is
|
||||
<em>genuine
|
||||
preformatted</em>
|
||||
text
|
||||
</pre>
|
||||
</body>
|
||||
</html>
|
||||
</pre>
|
||||
<p>Tidy’s indenting behavior is not perfect and can sometimes cause your
|
||||
output to be rendered by browsers in a different way than the input.
|
||||
You can avoid unexpected indenting-related rendering problems by setting
|
||||
<code>indent: no</code> or <code>indent: auto</code> in a config file.</p>
|
||||
|
||||
<h2 id=preserve-indenting>Preserving original indenting not possible</h2>
|
||||
<p>Tidy is not capable of preserving the original indenting of the markup
|
||||
from the input it receives. That’s because Tidy starts by building a clean
|
||||
parse tree from the input, and that parse tree doesn’t contain any
|
||||
information about the original indenting. Tidy then pretty-prints the parse
|
||||
tree using the current config settings. Trying to preserve the original
|
||||
indenting from the input would interact badly with the repair operations
|
||||
needed to build a clean parse tree, and would considerably complicate the
|
||||
code.</p>
|
||||
|
||||
<h2 id=encodings>Encodings and character references</h2>
|
||||
<p>
|
||||
Tidy defaults to assuming you want output to be encoded in UTF-8.
|
||||
But Tidy offers you a choice of other character encodings: US ASCII, ISO
|
||||
Latin-1, and the ISO 2022 family of 7-bit encodings.
|
||||
<p>
|
||||
Tidy doesn't yet recognize the use of the HTML <meta> element for
|
||||
specifying the character encoding.</p>
|
||||
<p>
|
||||
The full set of HTML character references is defined. Cleaned-up output
|
||||
uses named character references for characters when appropriate. Otherwise,
|
||||
characters outside the normal range are output as numeric character
|
||||
references.
|
||||
|
||||
<h2 id=accessibility>Accessibility</h2>
|
||||
<p>Tidy offers advice on potential accessibility problems for people using
|
||||
non-graphical browsers.
|
||||
|
||||
<h2 id=presentational-markup>Cleaning up presentational markup</h2>
|
||||
<p>Some tools generate HTML with presentational elements such as <font>,
|
||||
<nobr>, and <center>.
|
||||
Tidy's <code>-clean</code> option will replace those elements with CSS style
|
||||
properties.
|
||||
<p>Some HTML documents rely on the presentational effects of <p> start
|
||||
tags that are not followed by any content. Tidy deletes such <p> tags
|
||||
(as well as any headings that don’t have content). So do not use <p>
|
||||
tags simply for adding vertical whitespace; instead use CSS, or the
|
||||
<br> element. However, note that Tidy won’t discard <p> tags that
|
||||
are followed by any nonbreaking space (that is, the &nbsp; named
|
||||
character reference).
|
||||
|
||||
<h2 id=new-tags>Teaching Tidy about new tags</h2>
|
||||
<p>You can teach Tidy about new tags by declaring them in the
|
||||
configuration file. The syntax is:</p>
|
||||
<pre>
|
||||
new-inline-tags: <em>tag1, tag2, tag3</em>
|
||||
new-empty-tags: <em>tag1, tag2, tag3</em>
|
||||
new-blocklevel-tags: <em>tag1, tag2, tag3</em>
|
||||
new-pre-tags: <em>tag1, tag2, tag3</em>
|
||||
</pre>
|
||||
<p>The same tag can be defined as empty and as inline or as empty
|
||||
and as block.</p>
|
||||
<p>These declarations can be combined to define a new empty
|
||||
inline or empty block element. But you are not advised to declare
|
||||
tags as being both inline and block.</p>
|
||||
<p>Note that the new tags can only appear where Tidy expects inline
|
||||
or block-level tags respectively. That means you can’t place
|
||||
new tags within the document head or other contexts with restricted
|
||||
content models.
|
||||
|
||||
<h2 id=php-asp-jste>Ignoring PHP, ASP, and JSTE instructions</h2>
|
||||
<p>Tidy will gracefully ignore many cases of PHP, ASP, and JSTE
|
||||
instructions within element content and as replacements for attributes,
|
||||
and preserve them as-is in output; for example:</p>
|
||||
<pre>
|
||||
<option <% if rsSchool.Fields("ID").Value
|
||||
= session("sessSchoolID")
|
||||
then Response.Write("selected") %>
|
||||
value='<%=rsSchool.Fields("ID").Value%>'>
|
||||
<%=rsSchool.Fields("Name").Value%>
|
||||
(<%=rsSchool.Fields("ID").Value%>)
|
||||
</option>
|
||||
</pre>
|
||||
<p>But note that Tidy may report missing attributes when those are “hidden”
|
||||
within the PHP, ASP, or JSTE code. If you use PHP, ASP, or JSTE code to
|
||||
create a start tag, but place the end tag explicitly in the HTML markup, Tidy
|
||||
won’t be able to match them up, and will delete the end tag. So in that
|
||||
case you are advised to make the start tag explicit and to use PHP, ASP, or
|
||||
JSTE code for just the attributes; for example:</p>
|
||||
<pre>
|
||||
<a href="<%=random.site()%>">do you feel lucky?</a>
|
||||
</pre>
|
||||
<p>
|
||||
Tidy can also get things wrong if the PHP, ASP, or JSTE code includes
|
||||
quotation marks; for example:
|
||||
</p>
|
||||
<pre>
|
||||
value="<%=rsSchool.Fields("ID").Value%>"
|
||||
</pre>
|
||||
<p>Tidy will see the quotation mark preceding <i>ID</i> as ending the
|
||||
attribute value, and proceed to complain about what follows.
|
||||
<p>Tidy allows you to control whether line wrapping on spaces within
|
||||
PHP, ASP, and JSTE
|
||||
instructions is enabled; see the <b>wrap-php</b>, <b>wrap-asp</b>,
|
||||
and <b>wrap-jste</b> config options.</p>
|
||||
|
||||
<h2 id=xml>Correcting well-formedness errors in XML markup</h2>
|
||||
<p>Tidy can help you to correct well-formedness errors in XML markup. Tidy
|
||||
doesn't yet recognize all XML features, though; for example, it doesn't
|
||||
understand CDATA sections or DTD subsets.</p>
|
||||
|
||||
<h2 id="scripts">Using Tidy from scripts</h2>
|
||||
<p>If you want to run Tidy from Perl or another scripting language,
|
||||
you may find it of value to inspect the result returned by Tidy
|
||||
when it exits: 0 if everything is fine, 1 if there were warnings
|
||||
and 2 if there were errors. This is an example using Perl:</p>
|
||||
<pre>
|
||||
if (close(TIDY) == 0) {
|
||||
my $exitcode = $? >> 8;
|
||||
if ($exitcode == 1) {
|
||||
printf STDERR "tidy issued warning messages\n";
|
||||
} elsif ($exitcode == 2) {
|
||||
printf STDERR "tidy issued error messages\n";
|
||||
} else {
|
||||
die "tidy exited with code: $exitcode\n";
|
||||
}
|
||||
} else {
|
||||
printf STDERR "tidy detected no errors\n";
|
||||
}
|
||||
</pre>
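<p>The same check can be sketched in a POSIX shell. The function name
<code>describe_tidy_status</code> is an assumption made for this example;
only the exit codes 0, 1, and 2 come from the description above.</p>

```shell
# Hypothetical helper: map a Tidy exit status to a short message,
# following the codes described above (0 = fine, 1 = warnings, 2 = errors).
describe_tidy_status() {
    case "$1" in
        0) echo "no warnings or errors" ;;
        1) echo "warnings" ;;
        2) echo "errors" ;;
        *) echo "unexpected exit code: $1" ;;
    esac
}

# Usage (assumes tidy is on PATH; file names are placeholders):
#   tidy -o output.html -f errs.txt index.html
#   describe_tidy_status "$?"
```

<p>Passing <code>$?</code> immediately after the Tidy command matters,
because any later command overwrites it.</p>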
|
||||
|
||||
<h2 id="implementation">Source code</h2>
|
||||
<p>The source code for the experimental HTML5 fork of Tidy can be found at
|
||||
<a href="https://github.com/w3c/tidy-html5">https://github.com/w3c/tidy-html5</a>.
|
||||
|
||||
<h2 id=acks>Acknowledgements</h2>
|
||||
<p>Dave Raggett has a list of
|
||||
<a href="http://www.w3.org/People/Raggett/tidy/#acks">Acknowledgements</a>
|
||||
for people who made suggestions or reported bugs for the
|
||||
original version of Tidy.
|
||||
|
||||
<div id=toc-button style="">
|
||||
<a class=button href="
|
||||
javascript:document.getElementById('toc').className = 'show';
|
||||
document.getElementById('toc-button').className = 'hide';">Show TOC</a>
|
||||
</div>
|
||||
<div id=toc class=hide>
|
||||
<a class=button href="
|
||||
javascript:document.getElementById('toc').className = 'hide';
|
||||
document.getElementById('toc-button').className = 'show';">Close</a>
|
||||
<ol>
|
||||
<li><a href="#what-tidy-does">What Tidy does</a>
|
||||
<li><a href="#help">How to run Tidy from the command line</a>
|
||||
<li><a href="#options">Options and configuration settings</a>
|
||||
<li><a href="#config">Using a config file</a>
|
||||
<li><a href="#sample-config">Sample config file</a>
|
||||
<li><a href="#indenting">Indenting output for readability</a>
|
||||
<li><a href="#preserve-indenting">Preserving original indenting not possible</a>
|
||||
<li><a href="#encodings">Encodings and character references</a>
|
||||
<li><a href="#accessibility">Accessibility</a>
|
||||
<li><a href="#presentational-markup">Cleaning up presentational markup</a>
|
||||
<li><a href="#new-tags">Teaching Tidy about new tags</a>
|
||||
<li><a href="#php-asp-jste">Ignoring PHP, ASP, and JSTE instructions</a>
|
||||
<li><a href="#xml">Correcting well-formedness errors in XML markup</a>
|
||||
<li><a href="#scripts">Using Tidy from scripts</a>
|
||||
<li><a href="#implementation">Source code</a>
|
||||
<li><a href="#acks">Acknowledgements</a>
|
||||
</ol>
|
||||
</div>
|