Updated the docs.
parent 701a17400a
commit db464df7d9
File diff suppressed because it is too large
Binary file not shown.
Before: 1.3 KiB
300 htmldoc/faq.html
@@ -1,300 +0,0 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content=
"HTML Tidy for Mac OS X (vers 1st June 2003), see www.w3.org" />
<link type="text/css" rel="stylesheet" href="tidy.css" />
<title>HTML Tidy - Frequently Asked Questions</title>
<style type="text/css">
code { font-weight: bold; }
</style>
</head>
<body>
<h1>HTML Tidy - Frequently Asked Questions</h1>

<h2>Overview</h2>

<p class="abstract">Certain questions about Tidy come up on a
regular basis. These are some that have been culled from postings
to the html-tidy@w3.org and tidy-develop@lists.sourceforge.net
mailing lists. If you don't see your question addressed here, see
<a href="#support">How To Get Support</a> below.</p>

<ul>
<li><a href="#what-now">What Now?</a></li>

<li><a href="#support">How to Get Support</a></li>

<li><a href="#bug">How to Submit A Bug Report</a></li>

<li><a href="#feature">How to Submit A Feature Request</a></li>

<li><a href="#layout">How Do I Control the Output Layout?</a></li>

<li><a href="#version">What Version of Tidy Should I Use?</a></li>

<li><a href="#regression">How Do I Run A Regression Test?</a></li>
</ul>

<hr />
<dl>
<dt><a name="what-now" id="what-now"></a>What Now?</dt>

<dd><p>If you have a popup screen that reads as follows:</p>
<pre>
HTML Tidy for Windows <vers 1st August 2002; built on Aug 8 2002, at 15:41:13>
Parsing Console input <stdin>
</pre>

<p>and do not know what to do next, read on.</p>

<p>Tidy is waiting for your HTML to come in, so it can parse it.
Tidy is fundamentally a tool that reads in HTML, cleans it up and
writes it out again. It was developed as a program you run from the
console prompt, but there are GUI front ends available, e.g.
HTML-Kit, which you might prefer.</p>

<p>If you are using Windows, the first step is to unzip the zip file
and place the tidy.exe file in a folder on your executable search
path (PATH). You may also want to set up a config file to save having to type
lots of options each time you run Tidy. From the console prompt you can
run Tidy like this:</p>

<pre>
C> tidy -m mywebpage.html
</pre>

<p>In this case, the <code>-m</code> option requests Tidy to write
the tidied file back to the same filename as it read from
(mywebpage.html). Tidy will give you a breakdown of the problems it
found and the version of HTML the file appears to be using.</p>

<p>To get a listing of Tidy command line options, just type
<code>tidy -?</code>. To see a listing of configuration options,
try <code>tidy -help-config</code>. To get more info on the
config options, see the <a
href="http://tidy.sourceforge.net/docs/quickref.html">Quick Reference</a>.</p>

<p>See also Dave Raggett's <a href="http://tidy.sourceforge.net/docs/Overview.html#help">User Guide</a>.</p>

<p>If you're not comfortable with the DOS command line, you should
try one of the <a href="http://tidy.sourceforge.net/#tidylibapps">GUI
Applications</a>.</p>
</dd>

<dt><a name="support" id="support"></a>How To Get Support</dt>

<dd>
<p>For general HTML Tidy support, the original mailing list
html-tidy@w3.org is best. Sometimes developers are the last to
know... Also, this list covers both Java and C versions, not to
mention various value-added products such as GUI front ends, Perl
and Python integration, etc. If you don't get a response after a
couple of tries or if you have a bug fix, bump it over to the
developer list at tidy-develop@lists.sourceforge.net. It's not a
hard line, but that is the general arrangement.</p>
</dd>

<dt><a name="bug" id="bug"></a>How to Submit A Bug Report</dt>

<dd>
<p>You are encouraged to report bugs you find to the Tidy
developer team. Tidy's quality depends on your feedback. You can
either file your bug report in the SourceForge <a
href="http://sourceforge.net/tracker/?func=add&group_id=27659&atid=390963">
bug tracker</a> for HTML Tidy (<em>recommended</em>) or send a mail
to the mailing list at html-tidy@w3.org. Note you do <em>not</em>
have to have a SourceForge account in order to file bug reports, or
be subscribed to html-tidy@w3.org in order to post messages to the
list.</p>

<p>Prior to submitting a bug report, please check that the bug is
not already known. Many are. If you are not sure, just ask. If it
is a new bug, make sure to include at least the following information
in your report:</p>

<ul>
<li>A description of what you think went wrong.</li>

<li>The HTML Tidy version (find it out by running <code>tidy
-v</code>) and operating system you are running.</li>

<li>The input that exposes the bug.<br />
A small HTML document that reproduces the problem is best.</li>

<li>The configuration options you've used. Command line options
like<br />
<code>-asxml</code>, configuration files, etc. You may use
<code>tidy -show-config</code> to get an overview of the active
Tidy settings.</li>

<li>Your e-mail address for further questions and comments.</li>
</ul>

<p>This information is necessary to reproduce whatever is
failing; without it we cannot help you. Additional information
(and patches) is very welcome!</p>

<p><em>Please include only one bug per report.</em> Reports with
multiple bugs are harder to track and some bugs may get
missed.</p>
</dd>

<dt><a name="feature" id="feature"></a>How to Submit A Feature
Request</dt>

<dd>
<p>If you want Tidy to do something new that it doesn't do today
(or stop doing something), then it is probably a feature
request.</p>

<p>The process for submitting a feature request is very similar to
that for bug reports. A different <a
href="http://sourceforge.net/tracker/?atid=390966&group_id=27659">
tracker</a> is used on SourceForge to denote the difference in
subject matter.</p>

<p>As with bugs, please be sure that the feature has not already
been requested. If the feature has already been requested, you can add
your comments to the feature request tracker, or send mail to the
<a href="mailto:html-tidy@w3.org">mailing list</a> indicating your
wish to also have the feature implemented. If the feature has not
already been requested, send the same information as for a bug
report, but place special emphasis on the desired output for a
given input, desired options, etc. Please be as specific as
possible about what you want Tidy to <em>do</em>.</p>
</dd>

<dt><a name="layout" id="layout"></a>How Do I Control the Output Layout?</dt>

<dd>
<p>There are three primary options that control how Tidy
formats your markup:</p>
<ul>
<li><a class="code"
href="quickref.html#indent">indent</a></li>
<li><a class="code"
href="quickref.html#indent-attributes">indent-attributes</a></li>
<li><a class="code"
href="quickref.html#vertical-space">vertical-space</a></li>
</ul>

<p>Briefly, <code>indent</code> sets the level of left-to-right indenting
and, to some extent, how often elements are put onto a new line. Its values
are <code>yes</code>, <code>no</code>, and <code>auto</code>.
<code>indent-attributes</code> is a flag that, when set, tells Tidy to
put each attribute on a new line. <code>vertical-space</code> is a flag
that, when set, tells Tidy to add some empty lines for readability. The
default for all three is <code>no</code>. These options may be used in
any combination to control how you want your markup to look. The best
thing is to experiment a bit to see what you like. Be aware that
<code>indent yes</code> is not recommended for production use, as it can
cause visual changes in most browsers.</p>

<p>To get Tidy <em>Classic</em> <code>--indent auto</code> layout, use the following options:</p>

<pre>
indent: auto
indent-attributes: no
vertical-space: yes
</pre>

<p>You can read about more <em>Pretty Print</em> options
<a href="quickref.html#PrettyPrintHeader">here</a>.</p>
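<p>The options above are written one <code>name: value</code> pair per line. Purely as an illustration of that format, here is a minimal C sketch that splits such a line and interprets <code>indent</code> (three values) and the yes/no flags. The function and enum names are hypothetical, not TidyLib's API, and the sketch accepts only the literal words shown; Tidy's own parser may accept other spellings.</p>

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical representation of the three values of "indent". */
typedef enum { INDENT_NO, INDENT_YES, INDENT_AUTO, INDENT_INVALID } IndentMode;

/* Remove leading and trailing white space from s in place. */
static char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/* Split a "name: value" line in place (hypothetical helper). Returns 1
 * and sets *name and *value to the trimmed parts, or 0 if no colon. */
int split_config_line(char *line, char **name, char **value)
{
    char *colon = strchr(line, ':');

    if (colon == NULL)
        return 0;
    *colon = '\0';
    *name = trim(line);
    *value = trim(colon + 1);
    return 1;
}

/* "indent" takes yes, no or auto. */
IndentMode parse_indent(const char *value)
{
    if (strcmp(value, "yes") == 0)  return INDENT_YES;
    if (strcmp(value, "no") == 0)   return INDENT_NO;
    if (strcmp(value, "auto") == 0) return INDENT_AUTO;
    return INDENT_INVALID;
}

/* Flags such as indent-attributes and vertical-space take yes or no.
 * Returns 1 for yes, 0 for no and -1 for anything else. */
int parse_flag(const char *value)
{
    if (strcmp(value, "yes") == 0) return 1;
    if (strcmp(value, "no") == 0)  return 0;
    return -1;
}
```

<p>For example, splitting <code>indent: auto</code> yields the name <code>indent</code> and the value <code>auto</code>, which <code>parse_indent()</code> maps to <code>INDENT_AUTO</code>.</p>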
</dd>

<dt><a name="version" id="version"></a>What Version of Tidy Should
I Use?</dt>

<dd>
<p>The current SourceForge builds are recommended. You can find these at
<a href="http://tidy.sourceforge.net">http://tidy.sourceforge.net</a>.
People continue to report examples where Tidy does not catch some
ill-formed HTML or, worse, generates ill-formed HTML. These cases have
been significantly reduced. That said, be sure to test Tidy with some
representative files from your environment.</p>

<p>For development work, use CVS directly on your development
system, so that you can keep abreast of changes to Tidy and quickly
resolve conflicts. For information on how to pull Tidy sources, see the <a
href="http://sourceforge.net/cvs/?group_id=27659">CVS</a> page.</p>

<p>For building a front end (e.g. GUI or language binding), the
simplest approach is to use TidyLib. For more information
about building and coding with TidyLib, see the <a
href="http://tidy.sourceforge.net/libintro.html">Introduction To TidyLib</a>.</p>
</dd>

<dt><a name="regression" id="regression">How Do I Run A
Regression Test?</a></dt>
<dd>
<p>You might ask, "Why should I run a regression test?" If you
are a Tidy user, you might want to compare a new version of Tidy
to the version you are currently running. This is a good idea
if you are using Tidy in production applications such as web
publishing. If you are a Tidy developer, it is a good idea to
run the regression test suite to make sure your fix or enhancement
doesn't add new bugs.</p>

<p>Detecting new bugs is easier said than done, because sometimes
they are subtle and can only be seen in browsers (or one particular
browser you don't even have). But you can catch most crashes and
many layout problems by running the test suite as described here.</p>

<p>The basic process is simple: run the test suite <strong>before</strong>
and <strong>after</strong> making changes to TidyLib and compare the output
markup and messages. Be aware that the test scripts for WinNT/2K/XP
(alltest.cmd) and Linux/Unix (testall.sh) place the output files in
<code>tidy/test/tmp</code>. If you forget to run the <strong>before</strong>
test, you can always download a binary from the <a
href="http://tidy.sourceforge.net/#binaries">Project Page</a> and use it
to produce the baseline. If you
are not a TidyLib developer, you can download the <a
href="http://tidy.sourceforge.net/test/tidy_test.tgz">Test Suite</a>
directly. Here are the steps to evaluate the impact of a TidyLib change.</p>

<h3>For Windows</h3>
<p><strong>Before</strong> making changes:</p>
<pre>
C:\tidy\test> alltest.cmd
C:\tidy\test> ren tmp baseline
</pre>

<p><strong>After</strong> making changes and building Tidy:</p>
<pre>
C:\tidy\test> alltest.cmd
C:\tidy\test> windiff tmp baseline
</pre>

<h3>For Linux/Unix</h3>
<p><strong>Before</strong> making changes:</p>
<pre>
~/tidy/test$ ./testall.sh
~/tidy/test$ mv tmp baseline
</pre>

<p><strong>After</strong> making changes and building Tidy:</p>
<pre>
~/tidy/test$ ./testall.sh
~/tidy/test$ diff -u baseline tmp > diff.txt
</pre>
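<p>The before/after comparison is what <code>windiff</code> and <code>diff -u</code> do for you. As an illustration only, here is a minimal C sketch that reports the first line at which a baseline output file and a new output file diverge; the name <code>first_difference()</code> is hypothetical and not part of the test scripts.</p>

```c
#include <stdio.h>
#include <string.h>

#define CMP_BUF_SIZE 1024

/* Hypothetical helper: compare a baseline output stream with a new one,
 * line by line. Returns 0 if they are identical, otherwise the 1-based
 * number of the first line that differs (or where one stream ends early).
 * Lines longer than the buffer are read and compared in pieces. */
long first_difference(FILE *baseline, FILE *current)
{
    char a[CMP_BUF_SIZE], b[CMP_BUF_SIZE];
    long line = 1;

    for (;;) {
        char *ra = fgets(a, sizeof a, baseline);
        char *rb = fgets(b, sizeof b, current);

        if (ra == NULL && rb == NULL)
            return 0;          /* both ended at the same point */
        if (ra == NULL || rb == NULL || strcmp(a, b) != 0)
            return line;       /* one stream ended early, or the text changed */
        if (strchr(a, '\n') != NULL)
            line++;            /* count only complete lines */
    }
}
```

<p>A real comparison would loop over every file in <code>baseline</code> and <code>tmp</code>; <code>diff -u</code> remains the more useful tool because it shows the changed text, not just its position.</p>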
</dd>

<!--
<dt><a name="" id=""></a></dt>
<dd>
</dd>

<dt><a name="" id=""></a></dt>
<dd>
</dd>
-->
<!-- Save for future questions
<dt><a name="" id=""></a></dt>
<dd>
</dd>
-->
</dl>
</body>
</html>
@@ -1,554 +0,0 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<title>HTML TIDY - Notes on pending work</title>
<meta name="keywords"
content="HTML, validation, error correction, pretty-printing" />
<meta name="author" content="Dave Raggett <dsr@w3.org>" />
<style type="text/css">
body {
margin-left: 10%;
margin-right: 10%;
font-family: sans-serif
}
h1 { margin-left: -8% }
h2,h3,h4,h5,h6 { margin-left: -4% }
pre { color: green; font-weight: bold;
font-size: 80%; font-family: monospace}
em { font-style: italic; font-weight: bold }
strong { text-transform: uppercase; font-weight: bold }
.note {font-style: italic; color: rgb(192, 101, 101) }
/* hr {text-align: center; width: 60% } */
blockquote {
color: navy;
margin-left: 1%;
margin-right: 1%;
text-align: center;
font-family: "Comic Sans MS", "Times New Roman", serif
}
table {
font-family: sans-serif;
font-size: 80%;
background: rgb(255,255,153)
}
td {
font-size: 80%
}
.people {font-family: "Lucida Calligraphy", serif}
:link { color: rgb(0, 0, 153) }
:visited { color: rgb(153, 0, 153) }
:active { color: rgb(255, 0, 102) }
a:hover { color: rgb(0, 0, 255) }
</style>

<style type="text/css">
p.c1 {font-style: italic}
</style>
</head>
<body bgcolor="#FFFFFF" background="grid.gif" text="black"
link="navy" vlink="black" alink="red">
<h1>HTML TIDY - Notes on Pending Work</h1>

<p><a href="http://www.w3.org/People/Raggett">Dave Raggett</a> <a
href="mailto:dsr@w3.org">dsr@w3.org</a></p>

<p>This is a page where I am keeping the suggestions for
improvements or bug fixes. My current workload means that I
don't get much time to work on HTML Tidy, so I am interested in
offers of help!</p>

<h4>Public Email List for Tidy: <<a
href="mailto:html-tidy@w3.org">html-tidy@w3.org</a>></h4>

<p>I have set up an archived mailing list devoted to Tidy. To
subscribe, send an email to html-tidy-request@w3.org with the word
subscribe in the subject line (include the word unsubscribe if
you want to unsubscribe). The <a
href="http://lists.w3.org/Archives/Public/html-tidy/">archive</a>
for this list is accessible online. Please use this list to
report errors or enhancement requests.</p>

<h2>Things awaiting further attention</h2>

<ul>
<li>Support for BIG5 and ShiftJIS (Rick Jelliffe)</li>

<li>Stronger checking on which attributes appear on what
elements</li>

<li>Sorting attributes in a canonical order</li>

<li>Version checking for HTML 4.01 vs 4.0 (Tidy currently will
set the document type to 4.01 in preference to 4.0)</li>

<li>Noticing that the document isn't really XHTML if it isn't
well-formed, e.g. if it lacks end tags or quotes on attribute
values</li>

<li>Converting <font face="Symbol">a</font> etc. to
the corresponding Unicode characters, when cleaning HTML.</li>

<li>Link checking: this would involve some platform-dependent
code, as the network interface varies significantly from one
platform to the next.</li>

<li>When exporting a Word2000 document as a Web page, there is a need for
smarter rules of thumb for working out whether the paragraph is a
bulleted or numbered list item, and determining the level of
nesting. Perhaps the style attribute holds the key? This tends to
include substrings like: "mso-list:l0 level1 lfo2;" and
"mso-list:l1 level1 lfo1;". Unfortunately, these aren't always
present, and I have yet to figure out a foolproof heuristic.</li>
</ul>

<p>I need to set up an index of precisely what attributes are
supported on each element. Right now, some elements check their
own attributes, whilst others are checked via default checks
defined for each attribute independently of the element. Until
this is done, you will sometimes find that validation services
discover errors unnoticed by Tidy itself.</p>

<p>Jelks Cabaniss asks: <i>Could Tidy be made to automatically
"clean" (FONTs to CSS) if the Strict DOCTYPE is requested? An
HTML or XHTML Strict document can't have FONT tags according to
the DTDs</i>. Jelks has a bunch of other good ideas such as
converting the bgcolor attribute over to CSS.</p>

<p>I would like to add an option to select slide transition effects. I would
also like to provide an optional feature for sorting attribute
values.</p>

<p>I am having problems with form elements as direct children of
tr or table. It is dangerous to create an implicit table cell,
and what is needed is a way to move the form element into the
next cell. If this can't be done, an error needs to be raised,
since Tidy will be stuck. On a separate note, Tidy is still
breaking lines between <img> and </a>, which
Netscape shows as an underlined space. It's fine in IE.</p>

<p>Benjamin Holzman <bah@orientation.com> writes: I'm
wrapping tidy (release-date 2000.01.13) in some perl objects
(using SWIG), and CharEncoding being a global is a bit of a pain.
I was wondering what your thoughts would be on how to fix that.
The character encoding is already a property of struct Out; is
there any reason why making it part of struct StreamIn as well,
and perhaps setting that property in OpenInput, based on the
existing CharEncoding variable, wouldn't allow us to move
CharEncoding to be local to main?</p>

<p>Oh, in case you're curious about the API, here's a short
script using my wrappers to be an html to xhtml filter:</p>

<pre>
#!/usr/bin/perl

require tidy;

my $tidy = Tidy->new(*STDIN);
my $document = $tidy->parse;
$tidy->as_xhtml(*STDOUT);
</pre>

<p>Rick Parsons would like there to be a new wrap-attributes
option that can be used to suppress line wrapping within
attributes. There is already a similar option for JavaScript
literals.</p>

<p>Vijay Patil would like tidy -h to display options sorted
alphabetically.</p>
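<p>Vijay Patil's request amounts to sorting the option names before printing them. A minimal C sketch, assuming the names are available as an array of strings; <code>sort_option_names()</code> is a hypothetical helper, not part of Tidy's help code:</p>

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: each element is a const char * option name. */
static int compare_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sort option names alphabetically, in place. */
void sort_option_names(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], compare_names);
}
```

<p>Sorting a copy of the names, rather than the table they come from, would leave the original order intact for any code that depends on it.</p>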

<p>Julian Reschke would like there to be an option to add the
xml:space="preserve" attribute to pre elements when outputting
XML.</p>

<p>Armando Asantos would like to use Tidy to produce a list of
URLs for images or hypertext links according to a config option.
This would be straightforward, but is a lower priority than bug
fixes etc.</p>

<p>Omri Traub would like an option to wrap the contents of style
and script elements in CDATA marked sections when converting to
XHTML. He is also interested in direct support for 16-bit
character file I/O.</p>
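<p>As a rough sketch of what Omri Traub's option might do, the C function below wraps element content in a CDATA marked section. <code>wrap_cdata()</code> is hypothetical, not Tidy code; the one subtlety it handles is a literal CDATA end marker inside the content, which would otherwise close the section early.</p>

```c
#include <stddef.h>
#include <string.h>

/* Append n bytes of s to out (capacity outsize), keeping it
 * NUL-terminated. Returns 0 if there is not enough room. */
static int cdata_append(char *out, size_t outsize, size_t *len,
                        const char *s, size_t n)
{
    if (*len + n + 1 > outsize)
        return 0;
    memcpy(out + *len, s, n);
    *len += n;
    out[*len] = '\0';
    return 1;
}

/* Hypothetical helper: wrap text in a CDATA marked section. Each "]]>"
 * inside the text is written as "]]]]><![CDATA[>", which ends the
 * section after "]]" and starts a new one before ">".
 * Returns the output length, or (size_t)-1 if out is too small. */
size_t wrap_cdata(const char *text, char *out, size_t outsize)
{
    static const char open_mark[] = "<![CDATA[";
    static const char close_mark[] = "]]>";
    static const char split_mark[] = "]]]]><![CDATA[>";
    size_t len = 0;
    const char *hit;

    if (outsize == 0)
        return (size_t)-1;
    out[0] = '\0';
    if (!cdata_append(out, outsize, &len, open_mark, strlen(open_mark)))
        return (size_t)-1;
    while ((hit = strstr(text, close_mark)) != NULL) {
        if (!cdata_append(out, outsize, &len, text, (size_t)(hit - text)) ||
            !cdata_append(out, outsize, &len, split_mark, strlen(split_mark)))
            return (size_t)-1;
        text = hit + strlen(close_mark);
    }
    if (!cdata_append(out, outsize, &len, text, strlen(text)) ||
        !cdata_append(out, outsize, &len, close_mark, strlen(close_mark)))
        return (size_t)-1;
    return len;
}
```

<p>For script elements that may also be served as HTML, the markers are often hidden behind script comments; the sketch leaves that choice to the caller.</p>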

<p>Bertilo Wennergren notes:</p>

<blockquote>If I configure Tidy to "upgrade to style sheets", it
does so for a few things in my main document, but the code thus
created get error reports if I feed it back to Tidy. It turns out
that Tidy creates extra "class" attributes on tags that already
have "class" attributes set. This happens with this page:
<http://www.concinnity.se/bertilow/index.htm>.</blockquote>

<p>Randi Waki notes:</p>

<blockquote>
<p>If a quoted URL attribute value (e.g., href in <a>
elements) contains a line break, 13-Jan-2000 Tidy changes the
line break to a space while IE and Netscape discard the line
break. This can result in a broken link in the tidied
document.</p>

<p>I believe the following change fixes the problem. In lexer.c,
insert the following lines before line 2502:</p>

<pre>
/* discard line breaks in quoted URLs */
if (c == '\n' && IsUrl(name))
    continue;

/* existing line 2502 */ c = ' ';
</pre>
</blockquote>
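<p>The idea behind Randi Waki's patch can be sketched outside lexer.c as a standalone C function. This is a minimal sketch, not the actual lexer code: the list of URL attributes is a hypothetical stand-in for Tidy's IsUrl() check, and a CR LF pair is treated as a single line break.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of attributes treated as URLs; Tidy's IsUrl()
 * may use a different list. */
static const char *const url_attributes[] = {
    "href", "src", "background", "action", "cite", "longdesc", "usemap",
    NULL
};

/* ASCII case-insensitive equality test. */
static int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

int is_url_attribute(const char *name)
{
    size_t i;

    for (i = 0; url_attributes[i] != NULL; i++)
        if (same_name(name, url_attributes[i]))
            return 1;
    return 0;
}

/* Copy an attribute value to out (room for strlen(value) + 1 bytes),
 * dropping line breaks in URL attributes and turning each one into a
 * space elsewhere. */
void normalize_attr_value(const char *name, const char *value, char *out)
{
    int url = is_url_attribute(name);

    for (; *value != '\0'; value++) {
        char c = *value;

        if (c == '\r' && value[1] == '\n')
            continue;           /* let the following '\n' stand for the pair */
        if (c == '\n' || c == '\r') {
            if (url)
                continue;       /* discard line breaks in quoted URLs */
            c = ' ';            /* elsewhere a line break becomes a space */
        }
        *out++ = c;
    }
    *out = '\0';
}
```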

<p>Stephen Reynolds would like Tidy to keep track of whether a
comment started on a new line and preserve this in the
output.</p>

<p>Terry Teague says:</p>

<blockquote>
<p>Sorry, I should have been more clear. Part of the problem is
the current HelpText() function in localize.c doesn't actually
reflect current reality.</p>

<p>You need to at least add the following line to HelpText():</p>

<pre>
tidy_out(out, " -version or -v show version\n");
</pre>

<p>And I suppose it should mention the use of the new
"--<config options>" type syntax.</p>

<p>Regards, Terry</p>
</blockquote>

<p>John Russel notes:</p>

<pre>
what i wonder is
1] does the specification indicate these are WRONG
2] if so why do they pass thru tidy ....
is url syntax such a can of worms that it is left to user
to check .......

CASE 1: misuse of slash for folders
site had background="pics\fancy.jpg"
instead of "pics/fancy.jpg"

CASE 2: spaces in filename
site had href="coin album.html"
instead of "coin%20album.html"
</pre>
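<p>Both of John Russel's cases can be repaired mechanically. The C sketch below (the name <code>fix_url()</code> is hypothetical) assumes a backslash in a URL was meant as a path separator and percent-encodes spaces as %20, which is how a space has to be written in a URL; it does not attempt any other URL validation.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite a URL for the two cases above. Every
 * backslash becomes '/' (assumed to be a mistaken path separator) and
 * every space becomes "%20". Returns 0 on success, or -1 if out
 * (outsize bytes, including the terminator) is too small. */
int fix_url(const char *url, char *out, size_t outsize)
{
    size_t len = 0;

    if (outsize == 0)
        return -1;
    for (; *url != '\0'; url++) {
        if (*url == ' ') {
            if (len + 3 >= outsize)
                return -1;
            memcpy(out + len, "%20", 3);
            len += 3;
        } else {
            if (len + 1 >= outsize)
                return -1;
            out[len++] = (*url == '\\') ? '/' : *url;
        }
    }
    out[len] = '\0';
    return 0;
}
```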

<p>Andre Stechert would like a way to prevent Tidy from
"cleaning" newly declared elements which don't have any content
but do have end tags; see his mail of 17th January 2000.</p>

<p>Todd Clark would like to use Tidy with Microsoft's WebClass
tags. Unfortunately these include unusual characters in the tag
names, such as @, which Tidy objects to, for instance:</p>

<pre>
<WC@DOMAINNAME>test.com</WC@DOMAINNAME>
</pre>

<p>Perhaps it makes sense to offer an option to make Tidy less
picky about what characters it accepts in tag names, or perhaps
a dedicated "WebClass: yes" option.</p>
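<p>One way to express the suggested option is a tag-name check with a leniency switch. In this minimal C sketch, <code>is_valid_tag_name()</code> and its <code>lenient</code> parameter are hypothetical; strict mode accepts a letter followed by letters, digits, '-', '_', ':' or '.' (the characters HTML 4 allows in ID and NAME tokens), and lenient mode also accepts '@'.</p>

```c
#include <ctype.h>

/* Hypothetical tag-name check. Strict: a letter, then letters, digits,
 * '-', '_', ':' or '.'. Lenient: additionally accept '@', as used by
 * WebClass-style tags. Returns 1 if the name is accepted, else 0. */
int is_valid_tag_name(const char *name, int lenient)
{
    if (!isalpha((unsigned char)name[0]))
        return 0;
    for (name++; *name != '\0'; name++) {
        unsigned char c = (unsigned char)*name;

        if (isalnum(c) || c == '-' || c == '_' || c == ':' || c == '.')
            continue;
        if (lenient && c == '@')
            continue;
        return 0;
    }
    return 1;
}
```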

<p>Jelks Cabaniss suggests an option to control dropping of empty
elements, e.g. according to what attributes they have.</p>

<p>Paavo Hartikainen writes:</p>

<blockquote>
<p>Tidy always expands '&' to '&amp;' even if I have
'quote-ampersand: no' defined in configuration file. This is not
a good thing to do for URLs that have '&' characters in them.
OS is Debian GNU/Linux 2.1 SPARC. Same thing happens on Alpha.
Other architectures I have not tried.</p>

<p>My configuration looks like this:</p>

<pre>
char-encoding: latin1
error-file: ./errors
indent-spaces: 2
logical-emphasis: yes
output-xhtml: yes
quiet: no
quote-ampersand: no
show-warnings: yes
tidy-mark: yes
wrap: 78
wrap-attributes: no
write-back: yes
keep-time: yes
</pre>
</blockquote>
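<p>For reference, the behaviour the report expects from quote-ampersand can be sketched in a few lines of C: with the flag set, each ampersand is written as an entity reference; with it cleared, the ampersand is copied unchanged. <code>write_ampersands()</code> is a hypothetical helper, not Tidy's output code.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy text to out, writing each '&' as "&amp;"
 * only when quote_ampersand is non-zero. out must hold at least
 * 5 * strlen(text) + 1 bytes in the worst case. */
void write_ampersands(const char *text, int quote_ampersand, char *out)
{
    for (; *text != '\0'; text++) {
        if (*text == '&' && quote_ampersand) {
            memcpy(out, "&amp;", 5);
            out += 5;
        } else {
            *out++ = *text;
        }
    }
    *out = '\0';
}
```

<p>Note that inside an attribute value the escaped form is what HTML expects: a browser decodes it back to a plain ampersand, so a link written with the escaped form still points at the original URL.</p>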

<p>Paul White reports that Tidy isn't recognizing HTML 3.2 when
the doctype is "-//W3C//DTD HTML 3.2 Final//EN" (as per the REC),
and similarly for HTML 4.01. This would appear to call for a
change to the table of names in lexer.c.</p>
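<p>A minimal sketch of the kind of table change Paul White describes, assuming a simple array of public identifiers; <code>fpi_table</code> and <code>version_for_fpi()</code> are hypothetical and not the actual table in lexer.c. It lists both the plain and the "Final" HTML 3.2 identifiers alongside the HTML 4.0 and 4.01 ones, and compares case-insensitively as a lenient choice.</p>

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct fpi_entry {
    const char *fpi;      /* public identifier from the DOCTYPE */
    const char *version;  /* version name reported to the user */
};

/* Hypothetical lookup table, not the one in lexer.c. */
static const struct fpi_entry fpi_table[] = {
    { "-//W3C//DTD HTML 3.2//EN",               "HTML 3.2" },
    { "-//W3C//DTD HTML 3.2 Final//EN",         "HTML 3.2" },
    { "-//W3C//DTD HTML 4.0//EN",               "HTML 4.0 Strict" },
    { "-//W3C//DTD HTML 4.0 Transitional//EN",  "HTML 4.0 Transitional" },
    { "-//W3C//DTD HTML 4.0 Frameset//EN",      "HTML 4.0 Frameset" },
    { "-//W3C//DTD HTML 4.01//EN",              "HTML 4.01 Strict" },
    { "-//W3C//DTD HTML 4.01 Transitional//EN", "HTML 4.01 Transitional" },
    { "-//W3C//DTD HTML 4.01 Frameset//EN",     "HTML 4.01 Frameset" },
    { NULL, NULL }
};

/* ASCII case-insensitive equality test. */
static int fpi_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the HTML version for a public identifier, or NULL if unknown. */
const char *version_for_fpi(const char *fpi)
{
    size_t i;

    for (i = 0; fpi_table[i].fpi != NULL; i++)
        if (fpi_equal(fpi, fpi_table[i].fpi))
            return fpi_table[i].version;
    return NULL;
}
```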

<p>Stuart Hungerford would like Tidy to detect and fix duplicate
attributes, e.g. multiple class attributes. Celeste Suliin Burris
would like Tidy to replace spaces in URLs with %20, as some versions
of Netscape "croak big time" on this. Denis Kokarev also wants
Tidy to remove duplicate attributes when the values are the same,
as the duplicates apparently stop XSLT from working. Brian Schweitzer notes
that Tidy adds a second class attribute rather than merging the
classes into a space-separated list.</p>
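<p>Brian Schweitzer's suggestion, which also covers the identical-value duplicates mentioned above, can be sketched as merging two class lists. In this minimal C sketch, <code>merge_classes()</code> is hypothetical; it keeps the first occurrence of each class name and compares names case-sensitively.</p>

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if the space-separated list already contains the class
 * name tok of length n. */
static int has_class(const char *list, const char *tok, size_t n)
{
    while (*list != '\0') {
        size_t len = 0;

        while (isspace((unsigned char)*list))
            list++;
        while (list[len] != '\0' && !isspace((unsigned char)list[len]))
            len++;
        if (len == n && len > 0 && strncmp(list, tok, n) == 0)
            return 1;
        list += len;
    }
    return 0;
}

/* Hypothetical helper: merge two class attribute values into out as a
 * single space-separated list without duplicates. out must hold at
 * least strlen(a) + strlen(b) + 2 bytes. */
void merge_classes(const char *a, const char *b, char *out)
{
    const char *sources[2];
    int i;

    sources[0] = a;
    sources[1] = b;
    out[0] = '\0';
    for (i = 0; i < 2; i++) {
        const char *p = sources[i];

        while (*p != '\0') {
            size_t n = 0;

            while (isspace((unsigned char)*p))
                p++;
            while (p[n] != '\0' && !isspace((unsigned char)p[n]))
                n++;
            if (n > 0 && !has_class(out, p, n)) {
                size_t len = strlen(out);

                if (len > 0)
                    out[len++] = ' ';   /* separator before the new name */
                memcpy(out + len, p, n);
                out[len + n] = '\0';
            }
            p += n;
        }
    }
}
```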
|
|
||||||
|
|
||||||
<p>Bertilo Wennergren writes: Tidy seems not to recognize frame
|
|
||||||
elements with a closing "/". It actually removes them. Try his <a
|
|
||||||
href="http://www.concinnity.se/bertilow/pmeg/pmeg9/k_bazo.htm">example</a>.
|
|
||||||
Tidy can produce XHTML Frameset docs, but when fed them back</p>
|
|
||||||
|
|
||||||
<p>again it cries foul.</p>
|
|
||||||
|
|
||||||
<p>Jose Manuel Cerqueira Esteves notes:</p>
|
|
||||||
|
|
||||||
<pre>
|
|
||||||
I've used `tidy' to convert a few HTML 4.0 files to XHTML 1.0 and noticed
|
|
||||||
a problem when dealing with constructs like
|
|
||||||
|
|
||||||
<small><small>some text</small></small>
|
|
||||||
|
|
||||||
First, `tidy' acts as if the second "<small>" was meant as a closing tag:
|
|
||||||
|
|
||||||
Warning: "<small> is probably intended as </small>"
|
|
||||||
|
|
||||||
Then it trims the resulting empty <small></small>:
|
|
||||||
|
|
||||||
Warning: trimming empty <small>
|
|
||||||
|
|
||||||
And finally both remaining closing tags ("</small>"), now spurious,
|
|
||||||
are removed:
|
|
||||||
|
|
||||||
Warning: discarding unexpected </small>
|
|
||||||
Warning: discarding unexpected </small>
|
|
||||||
|
|
||||||
It would be convenient to have at least some `tidy' option to prevent this
|
|
||||||
from happening (or perhaps some different heuristics?).
|
|
||||||
</pre>
|
|
||||||
|
|
||||||
<p>Robbert Hans Baron would like to see Tidy warning about
|
|
||||||
duplicate attributes and fixing these when the values are
|
|
||||||
identical.</p>
|
|
||||||
|
|
||||||
<p>Jutta Wrage notes that: When parsing HTML 3.2 Pages, tidy
|
|
||||||
doesn't accept textareas in forms correctly. The HTML Reference
|
|
||||||
specification (HTML 3.2 Final) allows: name, rows and cols, but
|
|
||||||
upon seeing these Tidy thinks the document is 4.0.</p>

<p>Matthew Brealey notes that a heading start tag is coerced to a
heading end tag when the end tag is missing. This is deliberate, but
perhaps not the best heuristic.</p>

<p>HIYAMA Masayuki notes that Tidy should set the encoding attribute in
the XML declaration to match the character encoding in use, e.g.
<?xml version="1.0" encoding="iso-2022-jp"?>.</p>

<p>Mark Modrall has extended Tidy to support selectively stripping out
listed tags and attributes; see his email of March 14th.</p>

<p>Yong Taek Bae notes that with the omit end tags option, Tidy omits
the body tag even if it has attributes. This is an error.</p>

<p>Tapio Markula reports that Tidy incorrectly replaces accented
characters in script elements with entities. The script element (in
HTML but not XHTML) is CDATA, and as such entities won't be expanded.
This bug needs to be fixed along with the support for CDATA
sections.</p>

<p>Terrill Bennett reports Tidy crashing when producing slides, and
when the -i option has been set. He later added that the crash occurs
when the page doesn't include an h1 element. See
Terrill-Bennett-11mar00.txt.</p>

<p>Stephen Lewis notes that if an <hr> element is present in the head
before the title element, Tidy gets confused and adds a spurious extra
empty title element. This would be avoided if Tidy could move the hr
into the body before the body element is encountered. This raises a
number of problems, for instance working out when to copy in attributes
from an explicit body element.</p>

<p>Carl Osterly would like Tidy to avoid breaking lines before or after
the = sign in attributes when this is practical. Perhaps a simple rule
of thumb could be used to decide this?</p>

<p>Rick H Wesson notes that Tidy crashes on CDATA marked sections when
parsing XML.</p>

<p>Luigi Federici would like an option to set the DTD URI for XML
or XHTML.</p>

<p>Mats-Olof Sander notes that when a document contains PHP code, the
indentation behaves strangely: each time the document is tidied again,
the PHP content and its end tag are indented one level further. The
result ends up something like this:</p>

<pre>
...
<?php
$r=0;
?>
...

I have the following config file for Tidy:
---
tidy-mark: no
markup: yes
wrap: 0
indent: auto
output-xml: no
output-xhtml: yes
doctype: loose
char-encoding: latin1
quote-marks: yes
assume-xml-procins: yes
word-2000: yes
clean: yes
logical-emphasis: yes
drop-empty-paras: yes
enclose-text: yes
fix-bad-comments: yes
alt-text: .
write-back: bool
keep-time: yes
show-warnings: no
quiet: yes
split: no
---

Best Regards,
Mats-Olof Sander
</pre>

<p>Don Hasson notes that if you make a mistake and leave off the "/" in
the </title> end tag, Tidy will generate an extra set of
<title>s.</p>

<p>Example:</p>

<pre>
<html>
<head><title>No end here<title></head>
<body>
Empty
</body>
</html>
</pre>

<p>produces this:</p>

<pre>
<html>
<head>
<title>No end here</title>
<title></title>
</head>
<body>
Empty
</body>
</html>
</pre>

<p>Jeff Wilkinson would like the HTML Tidy page to include internal
anchors so that he can link directly to the appropriate sections.</p>

<p>Peter Vince would like to be able to clean presentation attributes
on the body element, as well as translating b and i to span.</p>

<p>Dave Bryan and Matthew Brealey would like there to be a way to
suppress the default handling of inline elements in favor of simply
inserting the appropriate end tag when encountering an element that
isn't allowed in an inline context. The default behavior replicates
the rendering in existing browsers but can cause problems for hand
editors.</p>

<p>Dave Bryan notes that Tidy isn't updating the column position when
parsing attributes.</p>

<p>Can Tidy track when a line break occurs after a PI or comment and
reproduce this in the output? This idea occurred to me after reading a
comment from Brad Stowers.</p>

<p>One interesting suggestion is to make some of Tidy's rules of thumb
sensitive to the program that generated the markup, as indicated by
the meta element. This would allow for greater robustness in how the
rules operate.</p>

<p>Dave Bryan would like quiet mode to be tweaked to suppress the
general info at the end of the report. See
Dave-Bryan-24mar00.txt.</p>

<p>Erik Rossen would like an option to suppress line wrapping within
tags, so that a tag is always on a single line regardless of the number
and length of its attributes.</p>

<p>Dan Satria suggests that the clean mechanism check whether there
are any existing matching style rules before adding new ones.</p>

<p>Zoltan Hawryluk suggests mapping the Netscape layer tag into the
equivalent CSS positioning syntax.</p>

<p>Jim Walker says Tidy doesn't correctly report errors such as
<tt></</head></tt>.</p>

<p>Tidy's slide feature: see Johannes-Poutre-12jul00.txt.</p>

<p>Carole Mah suggests Tidy should recover from multiple class
attributes on the same element.</p>

<h2>Other ideas</h2>

<ul>
<li>Recursion through subdirectories, so you can fix up your entire
web site in one go. This assumes I can find a way that is portable
across a wide range of platforms!</li>

<li>Support for W3C's <a
href="http://www.w3.org/TR/REC-DOM-Level-1/">Document Object
Model</a> (DOM) level one.</li>

<li>Full validation of all attribute values.</li>

<li>Mapping Unicode bidi control characters to HTML tags.</li>

<li>Full support for parsing XML (still somewhat limited).</li>

<li>How to say which XML elements should be printed
"inline".</li>

<li>Acting on the XML encoding attribute, e.g.
<?xml version="1.0" encoding="iso-8859-1"?></li>

<li>Improved mapping from HTML presentation attributes/elements
to CSS.</li>

<li>Improved support for <a
href="http://java.sun.com/products/jsp/">JSP</a> (JavaServer
Pages)</li>

<li>Ugly print option which removes all optional whitespace</li>
</ul>
</body>
</html>
<!doctype html>
<meta charset=utf-8>
<title>HTML Tidy for HTML5 (experimental)</title>
<style type="text/css">
html {
  background: #DDE5D9 url() repeat 0 0;
  font-family: "Lucida Sans Unicode", "Lucida Sans", verdana, arial, helvetica;
}
body {
  border: solid 1px #CED4CA;
  background-color: #FFF;
  padding: 4px 40px 40px 40px;
  margin: 20px 20px 20px 20px;
  padding-right: 20%;
}
h1, h2 {
  color: #0B5B9D;
}
h1 {
  font-size: 39px;
  font-weight: normal;
  vertical-align: top;
  margin-bottom: 0px;
}
a {
  text-decoration: none;
  color: #0B5B9D;
  padding: 2px;
}

a:hover {
  text-decoration: none;
  background-color: #0B5B9D;
  color: white;
}
a:active {
  text-decoration: none;
  background-color: white;
  color: black;
}
#toc {
  position: fixed;
  top: 10px;
  right: 10px;
  border: 2px solid #0B5B9D;
  background: rgba(255,255,255,.9);
  padding: 15px;
  z-index: 999;
  max-height: 400px;
  overflow: auto;
  font-size: 11px;
  font-family: Verdana, sans-serif;
}
#toc-button {
  position:fixed;
  top:10px;
  right:10px;
  background:transparent;
  padding:15px;
  z-index:999;
  max-height:400px;
  overflow:auto;
  font-size:11px;
  font-family:Verdana, sans-serif;
}
#toc .button,
#toc-button .button {
  float: right;
  margin: 0 0 5px 5px;
  padding: 5px;
  border: 1px #008 solid;
  color:#00f;
  background-color:#ccf;
}
#toc ol {
  margin: 0;
  padding: 0;
  font-size: 11px;
  font-family: Verdana, sans-serif;
}
#toc li {
  list-style: decimal outside;
  margin-left: 20px;
  font-size: 11px;
  font-family: Verdana, sans-serif;
}
#toc li a {
  font-size: 11px;
  font-family: Verdana, sans-serif;
}
.hide {
  display: none;
}
.show {
  display: block;
}
code { color: green; font-weight: bold; }
pre { color: green; font-weight: bold; font-family: monospace}
em { font-style: italic; color: rgb(0, 0, 153) }
:link { color: rgb(0, 0, 153) }
:visited { color: rgb(153, 0, 153) }
</style>

<h1 id=intro>HTML Tidy for HTML5 (experimental)</h1>
<p>This page documents the experimental HTML5 fork of HTML Tidy available
at
<a href="https://github.com/w3c/tidy-html5">https://github.com/w3c/tidy-html5</a>.

<p>File bug reports and enhancement requests at
<a href="https://github.com/w3c/tidy-html5/issues">https://github.com/w3c/tidy-html5/issues</a>.</p>

<p>The W3C public mailing list for HTML Tidy discussion is
<b>html-tidy@w3.org</b> (<a href= "http://lists.w3.org/Archives/Public/html-tidy/">list archive</a>).

<p>For more information on HTML5:</p>
<ul>
<li>
<a href="http://dev.w3.org/html5/spec-author-view">HTML: Edition for Web Authors</a> (the latest HTML specification)
<li>
<a href="http://dev.w3.org/html5/markup/">HTML: The Markup Language</a> (an HTML language reference)
</ul>
<p>
Validate your HTML documents using the
<a href="http://validator.w3.org/nu/">W3C Nu Markup Validator</a>.

<h2 id=what-tidy-does>What Tidy does</h2>
<p>Tidy corrects and cleans up HTML content by fixing markup errors.
Here are a few examples:
<ul>
<li><b>Mismatched end tags:</b>
<pre>
<h2>subheading</h3>
</pre>
<p>…is converted to:</p>
<pre>
<h2>subheading</h2>
</pre></li>
<li><b>Misnested tags:</b>
<pre>
<p>here is a para <b>bold <i>bold italic</b> bold?</i> normal?
</pre>
<p>…is converted to:</p>
<pre>
<p>here is a para <b>bold <i>bold italic</i> bold?</b> normal?
</pre></li>
<li><b>Missing end tags:</b>
<pre>
<h1>heading
<h2>subheading</h2>
</pre>
<p>…is converted to:</p>
<pre>
<h1>heading</h1>
<h2>subheading</h2>
</pre>
…and
<pre>
<h1><i>italic heading</h1>
</pre>
<p>…is converted to:</p>
<pre>
<h1><i>italic heading</i></h1>
</pre></li>
<li><b>Mixed-up tags:</b>
<pre>
<i><h1>heading</h1></i>
<p>new paragraph <b>bold text
<p>some more bold text
</pre>
<p>…is converted to:</p>
<pre>
<h1><i>heading</i></h1>
<p>new paragraph <b>bold text</b>
<p><b>some more bold text</b>
</pre></li>
<li><b>Tag in the wrong place:</b>
<pre>
<h1><hr>heading</h1>
<h2>sub<hr>heading</h2>
</pre>
<p>…is converted to:</p>
<pre>
<hr>
<h1>heading</h1>
<h2>sub</h2>
<hr>
<h2>heading</h2>
</pre></li>
<li><b>Missing "/" in end tags:</b>
<pre>
<a href="#refs">References<a>
</pre>
<p>…is converted to:</p>
<pre>
<a href="#refs">References</a>
</pre></li>
<li><b>List markup with missing tags:</b>
<pre>
<body>
<li>1st list item
<li>2nd list item
</pre>
<p>…is converted to:</p>
<pre>
<body>
<ul>
<li>1st list item</li>
<li>2nd list item</li>
</ul>
</pre></li>
<li><b>Missing quotation marks around attribute values</b>
<p>Tidy inserts quotation marks around all attribute values for you. It
can also detect when you have forgotten the closing quotation mark,
although this is something you will have to fix yourself.</p>
</li>
<li><b>Unknown/proprietary attributes</b>
<p>Tidy has comprehensive knowledge of the attributes defined in HTML5,
so its reports often let you spot where you have mistyped an attribute.</p>
</li>
<li><b>Tags lacking a terminating ">"</b>
<p>You have to fix these yourself, because Tidy cannot
determine where the ">" was meant to be inserted.</p>
</li>
</ul>

<h2 id="help">How to run Tidy from the command line</h2>
<p>This is the syntax for invoking Tidy from the command line:
<pre>
<code>tidy <em>[[options] filename]*</em></code>
</pre>
<p>
Tidy defaults to reading from standard input, so if you run Tidy without
specifying the <code><em>filename</em></code> argument, it will just sit
there waiting for input to read.
Tidy also defaults to writing to standard output, so you can pipe output
from Tidy to other programs, as well as pipe output from other programs to
Tidy. You can page through the output from Tidy by piping it to a pager:</p>
<pre>
tidy file.html | less
</pre>
<p>
To have Tidy write its output to a file instead, either use the
<code>-o <em>filename</em></code> or <code>-output <em>filename</em></code>
option, or redirect standard output to the file; for example:
<pre>
tidy -o output.html index.html
tidy index.html > output.html
</pre>
<p>Both of those run Tidy on the file <b>index.html</b> and write the
output to the file <b>output.html</b>, while writing any error messages to
standard error.
<p>
Tidy defaults to writing its error messages to standard error (that is, to
the console where you’re running Tidy). To page through the error messages
along with the output, redirect standard error to standard output and pipe
the result to your pager:
<pre>
tidy index.html 2>&1 | less
</pre>
<p>
To have Tidy write the errors to a file instead, either use the
<code>-f <em>filename</em></code> or <code>-file <em>filename</em></code>
option, or redirect standard error to a file:</p>
<pre>
tidy -o output.html -f errs.txt index.html
tidy index.html > output.html 2> errs.txt
</pre>
<p>Both of those run Tidy on the file <b>index.html</b> and write the
output to the file <b>output.html</b>, while writing any error messages to
the file <b>errs.txt</b>.
<p>
Writing the error messages to a file is especially useful if the file you
are checking has many errors; reading them from a file instead of the
console or pager can make it easier to review them.
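
<p>Once the messages are in a file, ordinary text tools can summarize them. The following is a minimal sketch and is not part of Tidy. It assumes a POSIX shell and assumes that Tidy labels its message lines with "Warning:" or "Error:". The function name <code>summarize_tidy_errors</code> is hypothetical.</p>

```shell
# Hypothetical helper (not part of Tidy): count the warnings and errors in a
# Tidy error file, for example errs.txt written with -f.
# Assumes each message line contains "Warning:" or "Error:"; adjust the
# patterns if your build labels its messages differently.
summarize_tidy_errors() {
  printf 'warnings: %s\n' "$(grep -c 'Warning:' "$1")"
  printf 'errors: %s\n' "$(grep -c 'Error:' "$1")"
}

# Usage:
# tidy -o output.html -f errs.txt index.html
# summarize_tidy_errors errs.txt
```
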

<p>You can use the <code>-m</code> or <code>-modify</code> option to
modify (in place) the contents of the input file you are checking; that is,
to overwrite those contents with the output from Tidy. Example:
<pre>
tidy -f errs.txt -m index.html
</pre>
<p>That runs Tidy on the file <b>index.html</b>, modifying it in place
and writing the error messages to the file <b>errs.txt</b>.
<p>
<b>Caution:</b> If you use the <code>-m</code> option, you should first save a copy of your file.
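
<p>When modifying several files, you can make that copy automatically. This is a minimal sketch that assumes a POSIX shell. The function name and the <code>.bak</code> suffix are arbitrary choices for illustration and are not Tidy features.</p>

```shell
# Hypothetical helper: back up each file as FILE.bak, then let Tidy
# modify the original in place with -m. Stops if a backup cannot be made.
tidy_modify_with_backup() {
  for f in "$@"; do
    cp -p -- "$f" "$f.bak" || return
    tidy -m "$f"
  done
}

# Usage:
# tidy_modify_with_backup index.html about.html
```

The function's return status is that of the last Tidy run, so a warning (exit status 1) from the final file is not treated as a failure to make backups.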

<h2 id=options>Options and configuration settings</h2>
<p>To get a list of available options, use:</p>
<pre>
tidy -help
</pre>
<p>To get a list of all configuration settings, use:</p>
<pre>
tidy -help-config
</pre>
<p>To read the help output a page at a time, pipe it to a pager:
<pre>
tidy -help | less
tidy -help-config | less
</pre>
<p>Single-letter options other than <code>-f</code> may be combined; for example:
<pre>
tidy -f errs.txt -imu foo.html
</pre>

<h2 id="config">Using a config file</h2>
<p>The most convenient way to configure Tidy is by using a separate
config file.
Assuming you have created a
Tidy config file named <b>config.txt</b> (the name doesn't matter), you can
instruct Tidy to use it via the command line option
<code>-config config.txt</code>; for example:
<pre>
tidy -config config.txt file1.html file2.html
</pre>
<p>Alternatively, you can name the default config file via the
environment variable <b>HTML_TIDY</b>, the value of which is
the absolute path of the config file.
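
<p>For example, the following sketch writes a small config file and points <b>HTML_TIDY</b> at it, so that later runs in the same shell can pick it up without <code>-config</code>. It assumes a POSIX shell. The file name <b>tidy-config.txt</b> and the three settings, which are borrowed from the sample config file on this page, are illustrative only.</p>

```shell
# Write a small config file (name and settings are illustrative only).
config="$PWD/tidy-config.txt"
printf '%s\n' 'indent: auto' 'indent-spaces: 2' 'wrap: 72' > "$config"

# HTML_TIDY must hold the absolute path of the config file.
export HTML_TIDY="$config"

# Later runs in this shell use those settings, e.g.:
# tidy -o output.html index.html
```
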

<p>You can also set config options on the command line by preceding
the name of the option immediately (no intervening space) with the string "<code>--</code>";
for example:</p>
<pre>
tidy --break-before-br true --show-warnings false
</pre>
<p>You can find documentation for the full set of configuration options
on the
<a href= "quickref.html">Quick Reference</a>
page.

<h2 id=sample-config>Sample config file</h2>
<p>The following is an example of a Tidy config file.</p>
<pre>
// sample config file for HTML tidy
indent: auto
indent-spaces: 2
wrap: 72
markup: yes
output-xml: no
input-xml: no
show-warnings: yes
numeric-entities: yes
quote-marks: yes
quote-nbsp: yes
quote-ampersand: no
break-before-br: no
uppercase-tags: no
uppercase-attributes: no
char-encoding: latin1
new-inline-tags: cfif, cfelse, math, mroot,
  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
  munder, mover, mmultiscripts, msup, msub, mtext,
  mprescripts, mtable, mtr, mtd, mth
new-blocklevel-tags: cfoutput, cfquery
new-empty-tags: cfelse
</pre>

<h2 id=indenting>Indenting output for readability</h2>
<p>Indenting the source markup of an HTML document makes the markup easier
to read. Tidy can indent the markup for an HTML document while recognizing
elements whose contents should not be indented. In the example below, Tidy
indents the output while preserving the formatting of the <pre>
element:</p>
<p>Input:</p>
<pre>
<html>
<head>
<title>Test document</title>
</head>
<body>
<p>This example shows how Tidy can indent output while preserving
formatting of particular elements.</p>

<pre>This is
<em>genuine
preformatted</em>
text
</pre>
</body>
</html>
</pre>
<p>Output:</p>
<pre>
<html>
  <head>
    <title>Test document</title>
  </head>

  <body>
    <p>This example shows how Tidy can indent output while preserving
    formatting of particular elements.</p>
<pre>
This is
<em>genuine
preformatted</em>
text
</pre>
  </body>
</html>
</pre>
<p>Tidy’s indenting behavior is not perfect and can sometimes cause browsers
to render your output differently from the input.
You can avoid unexpected indenting-related rendering problems by setting
<code>indent: no</code> or <code>indent: auto</code> in a config file.</p>

<h2 id=preserve-indenting>Preserving original indenting not possible</h2>
<p>Tidy is not capable of preserving the original indenting of the markup
from the input it receives. That’s because Tidy starts by building a clean
parse tree from the input, and that parse tree doesn’t contain any
information about the original indenting. Tidy then pretty-prints the parse
tree using the current config settings. Trying to preserve the original
indenting from the input would interact badly with the repair operations
needed to build a clean parse tree, and would considerably complicate the
code.</p>

<h2 id=encodings>Encodings and character references</h2>
<p>
Tidy defaults to assuming you want output to be encoded in UTF-8.
But Tidy offers you a choice of other character encodings: US ASCII, ISO
Latin-1, and the ISO 2022 family of 7-bit encodings.
<p>
Tidy doesn't yet recognize the use of the HTML <meta> element for
specifying the character encoding.</p>
<p>
The full set of HTML character references is defined. Cleaned-up output
uses named character references for characters when appropriate. Otherwise,
characters outside the normal range are output as numeric character
references.

<h2 id=accessibility>Accessibility</h2>
<p>Tidy offers advice on potential accessibility problems for people using
non-graphical browsers.

<h2 id=presentational-markup>Cleaning up presentational markup</h2>
<p>Some tools generate HTML with presentational elements such as <font>,
<nobr>, and <center>.
Tidy's <code>-clean</code> option will replace those elements with CSS style
properties.
<p>Some HTML documents rely on the presentational effects of <p> start
tags that are not followed by any content. Tidy deletes such <p> tags
(as well as any headings that don’t have content). So do not use <p>
tags simply for adding vertical whitespace; instead use CSS, or the
<br> element. However, note that Tidy won’t discard <p> tags that
are followed by a nonbreaking space (that is, the &nbsp; named
character reference).

<h2 id=new-tags>Teaching Tidy about new tags</h2>
<p>You can teach Tidy about new tags by declaring them in the
configuration file; the syntax is:</p>
<pre>
new-inline-tags: <em>tag1, tag2, tag3</em>
new-empty-tags: <em>tag1, tag2, tag3</em>
new-blocklevel-tags: <em>tag1, tag2, tag3</em>
new-pre-tags: <em>tag1, tag2, tag3</em>
</pre>
<p>The same tag can be declared both empty and inline, or both empty and
block-level; combining the declarations this way defines a new empty inline
or empty block-level element. However, you are not advised to declare a tag
as both inline and block-level.</p>
<p>Note that the new tags can only appear where Tidy expects inline
or block-level tags respectively. That means you can’t place
new tags within the document head or other contexts with restricted
content models.

<h2 id=php-asp-jste>Ignoring PHP, ASP, and JSTE instructions</h2>
<p>Tidy will gracefully ignore many cases of PHP, ASP, and JSTE
instructions within element content and as replacements for attributes,
and preserve them as-is in output; for example:</p>
<pre>
<option <% if rsSchool.Fields("ID").Value
= session("sessSchoolID")
then Response.Write("selected") %>
value='<%=rsSchool.Fields("ID").Value%>'>
<%=rsSchool.Fields("Name").Value%>
(<%=rsSchool.Fields("ID").Value%>)
</option>
</pre>
<p>But note that Tidy may report missing attributes when those are “hidden”
within the PHP, ASP, or JSTE code. If you use PHP, ASP, or JSTE code to
create a start tag, but place the end tag explicitly in the HTML markup, Tidy
won’t be able to match them up, and will delete the end tag. So in that
case you are advised to make the start tag explicit and to use PHP, ASP, or
JSTE code for just the attributes; for example:</p>
<pre>
<a href="<%=random.site()%>">do you feel lucky?</a>
</pre>
<p>
Tidy can also get things wrong if the PHP, ASP, or JSTE code includes
quotation marks; for example:
</p>
<pre>
value="<%=rsSchool.Fields("ID").Value%>"
</pre>
<p>Tidy will see the quotation mark preceding <i>ID</i> as ending the
attribute value, and proceed to complain about what follows.
<p>Tidy allows you to control whether line wrapping on spaces within
PHP, ASP, and JSTE
instructions is enabled; see the <b>wrap-php</b>, <b>wrap-asp</b>,
and <b>wrap-jste</b> config options.</p>

<h2 id=xml>Correcting well-formedness errors in XML markup</h2>
<p>Tidy can help you to correct well-formedness errors in XML markup. Tidy
doesn't yet recognize all XML features, though; for example, it doesn't
understand CDATA sections or DTD subsets.</p>

<h2 id="scripts">Using Tidy from scripts</h2>
<p>If you want to run Tidy from Perl or another scripting language,
you may find it useful to inspect the exit status Tidy returns:
0 if everything is fine, 1 if there were warnings,
and 2 if there were errors. This is an example using Perl:</p>
<pre>
use strict;
use warnings;

# Run Tidy through a pipe (the file names are placeholders). close() on a
# piped handle returns false when the command exits with a non-zero status.
open(my $tidy, '-|', 'tidy', '-o', 'output.html', 'index.html')
    or die "cannot run tidy: $!\n";

if (!close($tidy)) {
    my $exitcode = $? >> 8;    # the high byte of $? holds the exit status
    if ($exitcode == 1) {
        print STDERR "tidy issued warning messages\n";
    } elsif ($exitcode == 2) {
        print STDERR "tidy issued error messages\n";
    } else {
        die "tidy exited with code: $exitcode\n";
    }
} else {
    print STDERR "tidy detected no errors\n";
}
</pre>
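
<p>The same exit-status check can be made from a shell script. The sketch below is illustrative only. It assumes a POSIX shell, the function name <code>tidy_status_message</code> is hypothetical, and its messages simply mirror those in the Perl example above.</p>

```shell
# Hypothetical helper: describe a Tidy exit status
# (0 = no problems, 1 = warnings, 2 = errors).
tidy_status_message() {
  case "$1" in
    0) echo "tidy detected no errors" ;;
    1) echo "tidy issued warning messages" ;;
    2) echo "tidy issued error messages" ;;
    *) echo "tidy exited with code: $1" ;;
  esac
}

# Usage: run Tidy, then pass its exit status to the helper.
# tidy -o output.html index.html
# tidy_status_message "$?"
```
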

<h2 id="implementation">Source code</h2>
<p>The source code for the experimental HTML5 fork of Tidy can be found at
<a href="https://github.com/w3c/tidy-html5">https://github.com/w3c/tidy-html5</a>.

<h2 id=acks>Acknowledgements</h2>
<p>Dave Raggett has a list of
<a href="http://www.w3.org/People/Raggett/tidy/#acks">Acknowledgements</a>
for people who made suggestions or reported bugs for the
original version of Tidy.

<div id=toc-button style="">
<a class=button href="
javascript:document.getElementById('toc').className = 'show';
document.getElementById('toc-button').className = 'hide';">Show TOC</a>
</div>
<div id=toc class=hide>
<a class=button href="
javascript:document.getElementById('toc').className = 'hide';
document.getElementById('toc-button').className = 'show';">Close</a>
<ol>
<li><a href="#what-tidy-does">What Tidy does</a>
<li><a href="#help">How to run Tidy from the command line</a>
<li><a href="#options">Options and configuration settings</a>
<li><a href="#config">Using a config file</a>
<li><a href="#sample-config">Sample config file</a>
<li><a href="#indenting">Indenting output for readability</a>
<li><a href="#preserve-indenting">Preserving original indenting not possible</a>
<li><a href="#encodings">Encodings and character references</a>
<li><a href="#accessibility">Accessibility</a>
<li><a href="#presentational-markup">Cleaning up presentational markup</a>
<li><a href="#new-tags">Teaching Tidy about new tags</a>
<li><a href="#php-asp-jste">Ignoring PHP, ASP, and JSTE instructions</a>
<li><a href="#xml">Correcting well-formedness errors in XML markup</a>
<li><a href="#scripts">Using Tidy from scripts</a>
<li><a href="#implementation">Source code</a>
<li><a href="#acks">Acknowledgements</a>
</ol>
</div>