Changed license to markdown.
Moved license to top level. Purged outdated documentation files.
This commit is contained in:
parent
7bf364624f
commit
ca9eb839aa
|
@ -1,14 +1,6 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
||||
<html>
|
||||
<head>
|
||||
<title>HTML Tidy License</title>
|
||||
</head>
|
||||
# HTML Tidy
|
||||
|
||||
<body>
|
||||
<pre>
|
||||
HTML Tidy
|
||||
|
||||
HTML parser and pretty printer
|
||||
## HTML parser and pretty printer
|
||||
|
||||
Copyright (c) 1998-2003 World Wide Web Consortium
|
||||
(Massachusetts Institute of Technology, European Research
|
||||
|
@ -34,17 +26,12 @@ for any purpose, without fee, subject to the following restrictions:
|
|||
|
||||
1. The origin of this source code must not be misrepresented.
|
||||
2. Altered versions must be plainly marked as such and must
|
||||
not be misrepresented as being the original source.
|
||||
not be misrepresented as being the original source.
|
||||
3. This Copyright notice may not be removed or altered from any
|
||||
source or altered source distribution.
|
||||
source or altered source distribution.
|
||||
|
||||
The copyright holders and contributing author(s) specifically
|
||||
permit, without fee, and encourage the use of this source code
|
||||
as a component for supporting the Hypertext Markup Language in
|
||||
commercial products. If you use this source code in a product,
|
||||
acknowledgment is not required but would be appreciated.
|
||||
</pre>
|
||||
|
||||
|
||||
</body>
|
||||
</html>
|
||||
acknowledgement is not required but would be appreciated.
|
BIN
htmldoc/.DS_Store
vendored
Normal file
Binary file not shown.
File diff suppressed because it is too large
Binary file not shown.
Before Width: | Height: | Size: 1.3 KiB |
300
htmldoc/faq.html
|
@ -1,300 +0,0 @@
|
|||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
||||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
<head>
|
||||
<meta name="generator" content=
|
||||
"HTML Tidy for Mac OS X (vers 1st June 2003), see www.w3.org" />
|
||||
<link type="text/css" rel="stylesheet" href="tidy.css" />
|
||||
<title>HTML Tidy - Frequently Asked Questions</title>
|
||||
<style type="text/css">
|
||||
code { font-weight: bold; }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>HTML Tidy - Frequently Asked Questions</h1>
|
||||
|
||||
<h2>Overview</h2>
|
||||
|
||||
<p class="abstract">Certain questions about Tidy come up on a
|
||||
regular basis. These are some that have been culled from postings
|
||||
to the html-tidy@w3.org and tidy-develop@lists.sourceforge.net
|
||||
mailing lists. If you don't see your question addressed here, see
|
||||
<a href="#support">How To Get Support</a> below.</p>
|
||||
|
||||
<ul>
|
||||
<li><a href="#what-now">What Now?</a></li>
|
||||
|
||||
<li><a href="#support">How To Get Support</a></li>
|
||||
|
||||
<li><a href="#bug">How to Submit A Bug Report</a></li>
|
||||
|
||||
<li><a href="#feature">How to Submit A Feature Request</a></li>
|
||||
|
||||
<li><a href="#layout">How Do I Control the Output Layout?</a></li>
|
||||
|
||||
<li><a href="#version">What Version of Tidy Should I Use?</a></li>
|
||||
|
||||
<li><a href="#regression">How Do I Run A Regression Test?</a></li>
|
||||
</ul>
|
||||
|
||||
<hr />
|
||||
<dl>
|
||||
<dt><a name="what-now" id="what-now"></a>What Now?</dt>
|
||||
|
||||
<dd><p>If you have a popup screen that reads as follows:
|
||||
<pre>
|
||||
HTML Tidy for Windows <vers 1st August 2002; built on Aug 8 2002, at 15:41:13>
|
||||
Parsing Console input <stdin>
|
||||
</pre>
|
||||
|
||||
<p>and do not know what to do next, read on.</p>
|
||||
|
||||
<p>Tidy is waiting for your HTML to come in, so it can parse it.
|
||||
Tidy is fundamentally a tool that reads in HTML, cleans it up, and
|
||||
writes it out again. It was developed as a program you run from the
|
||||
console prompt, but there are GUI encapsulations available, e.g.
|
||||
HTML-Kit, which you might prefer.</p>
|
||||
|
||||
<p>If you are using Windows, the first step is to unzip the zip file
|
||||
and place the tidy.exe file in a folder somewhere on your executables
|
||||
path. You may also want to set up a config file to save having to type
|
||||
lots of options each time you run Tidy. From the console prompt you can
|
||||
run Tidy like this:</p>
|
||||
|
||||
<pre>
|
||||
C> tidy -m mywebpage.html
|
||||
</pre>
|
||||
|
||||
<p>In this case, the <code>-m</code> option requests Tidy to write
|
||||
the tidied file back to the same filename as it read from
|
||||
(mywebpage.html). Tidy will give you a breakdown of the problems it
|
||||
found and the version of HTML the file appears to be using.</p>
|
||||
|
||||
<p>To get a listing of Tidy command line options, just type
|
||||
<code>tidy -?</code>. To see a listing of configuration options,
|
||||
try <code>tidy -help-config</code>. To get more info on the
|
||||
config options, see the <a
|
||||
href="http://tidy.sourceforge.net/docs/quickref.html">Quick Reference</a>.</p>
|
||||
|
||||
<p>See also Dave Raggett's <a href="http://tidy.sourceforge.net/docs/Overview.html#help">User Guide</a>.</p>
|
||||
|
||||
<p>If you're not comfortable with the DOS command line, you should
|
||||
try one of the <a href="http://tidy.sourceforge.net/#tidylibapps">GUI
|
||||
Applications</a>.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="support" id="support"></a>How To Get Support</dt>
|
||||
|
||||
<dd>
|
||||
<p>For general HTML Tidy support, the original mailing list
|
||||
html-tidy@w3.org is best. Sometimes developers are the last to
|
||||
know... Also, this list covers both Java and C versions, not to
|
||||
mention various value-added products such as GUI front ends, Perl
|
||||
and Python integration, etc. If you don't get a response after a
|
||||
couple of tries, or if you have a bug fix, bump it over to the
|
||||
developer list at tidy-develop@lists.sourceforge.net. It's not a
|
||||
hard line, but that is the general arrangement.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="bug" id="bug"></a>How to Submit A Bug Report</dt>
|
||||
|
||||
<dd>
|
||||
<p>You are encouraged to report bugs you find to the Tidy
|
||||
developer team. Tidy's quality depends on your feedback. You can
|
||||
either file your bug report in the Sourceforge <a
|
||||
href="http://sourceforge.net/tracker/?func=add&group_id=27659&atid=390963">
|
||||
bug tracker</a> for HTML Tidy (<em>recommended</em>) or send a mail
|
||||
to the mailing list at html-tidy@w3.org. Note you do <em>not</em>
|
||||
have to have a Sourceforge account in order to file bug reports, or
|
||||
be subscribed to html-tidy@w3.org in order to post messages to the
|
||||
list.</p>
|
||||
|
||||
<p>Prior to submitting a bug report, please check that the bug is
|
||||
not already known. Many are. If you are not sure, just ask. If it
|
||||
is a new bug, make sure to include at least the following information
|
||||
in your report:</p>
|
||||
|
||||
<ul>
|
||||
<li>A description of what you think went wrong.</li>
|
||||
|
||||
<li>The HTML Tidy version (find it out by running <code>tidy
|
||||
-v</code>) and operating system you are running.</li>
|
||||
|
||||
<li>The input that exposes the bug.<br />
|
||||
A small HTML document that reproduces the problem is best.</li>
|
||||
|
||||
<li>The configuration options you've used. Command line options
|
||||
like<br />
|
||||
<code>-asxml</code>, configuration files, etc. You may use
|
||||
<code>tidy -show-config</code> to get an overview of the active
|
||||
Tidy settings.</li>
|
||||
|
||||
<li>Your e-mail address for further questions and comments.</li>
|
||||
</ul>
|
||||
|
||||
<p>This information is necessary to reproduce whatever is
|
||||
failing; without it we cannot help you. Additional information -
|
||||
and patches - are very welcome!</p>
|
||||
|
||||
<p><em>Please include only one bug per report.</em> Reports with
|
||||
multiple bugs are less easy to track and some bugs may get
|
||||
missed.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="feature" id="feature"></a>How to Submit A Feature
|
||||
Request</dt>
|
||||
|
||||
<dd>
|
||||
<p>If you want Tidy to do something new that it doesn't do today
|
||||
(or stop doing something), then it is probably a feature
|
||||
request.</p>
|
||||
|
||||
<p>The process for submitting a feature request is very similar to
|
||||
bug requests. A different <a
|
||||
href="http://sourceforge.net/tracker/?atid=390966&group_id=27659">
|
||||
tracker</a> is used on SourceForge to denote the difference in
|
||||
subject matter.</p>
|
||||
|
||||
<p>As with bugs, please be sure that the feature has not already
|
||||
been requested. If the feature has already been requested, you can add
|
||||
your comments to the feature request tracker, or send mail to the
|
||||
<a href="mailto:html-tidy@w3.org">mailing list</a> indicating your
|
||||
wish to also have the feature implemented. If the feature has not
|
||||
already been requested, send the same information as for a bug
|
||||
report, but place special emphasis on the desired output for a
|
||||
given input, desired options, etc. - please be as specific as
|
||||
possible about what you want Tidy to <em>do</em>.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="layout" id="layout"></a>How Do I Control the Output Layout?</dt>
|
||||
|
||||
<dd>
|
||||
<p>There are three primary options that control how Tidy
|
||||
formats your markup:</p>
|
||||
<ul>
|
||||
<li><a class="code"
|
||||
href="quickref.html#indent">indent</a></li>
|
||||
<li><a class="code"
|
||||
href="quickref.html#indent-attributes">indent-attributes</a></li>
|
||||
<li><a class="code"
|
||||
href="quickref.html#vertical-space">vertical-space</a></li>
|
||||
</ul>
|
||||
|
||||
<p>Briefly, <code>indent</code> sets the level of left-to-right indenting
|
||||
and, somewhat, how often elements are put onto a new line. The options
|
||||
are <code>yes</code>, <code>no</code>, and <code>auto</code>.
|
||||
<code>indent-attributes</code> is a flag that, when set, tells Tidy to
|
||||
put each attribute on a new line. <code>vertical-space</code> is a flag
|
||||
that, when set, tells Tidy to add some empty lines for readability. The
|
||||
default for all three is <code>no</code>. These options may be used in
|
||||
any combination to control how you want your markup to look. The best
|
||||
thing is to experiment a bit to see what you like. Be aware that
|
||||
<code>indent yes</code> is deprecated for production use as it will
|
||||
cause visual changes in most browsers.</p>
|
||||
|
||||
<p>To get Tidy <em>Classic</em> <code>--indent auto</code> layout, use the following options:</p>
|
||||
|
||||
<pre>
|
||||
indent: auto
|
||||
indent-attributes: no
|
||||
vertical-space: yes
|
||||
</pre>
|
||||
|
||||
<p>You can read about more <em>Pretty Print</em> options
|
||||
<a href="quickref.html#PrettyPrintHeader">here</a>.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="version" id="version"></a>What Version of Tidy Should
|
||||
I Use?</dt>
|
||||
|
||||
<dd>
|
||||
<p>The current SourceForge builds are recommended. You can find these at
|
||||
<a href="http://tidy.sourceforge.net">http://tidy.sourceforge.net</a>.
|
||||
People continue to report examples where Tidy does not catch some
|
||||
ill-formed HTML or, worse, generates ill-formed HTML. These cases have
|
||||
been significantly reduced. That said, be sure to test Tidy with some
|
||||
representative files from your environment.</p>
|
||||
|
||||
<p>For development work, use CVS directly on your development
|
||||
system. See the instructions for pulling Tidy sources from <a
|
||||
href="http://sourceforge.net/cvs/?group_id=27659">CVS</a>. This way
|
||||
you can keep abreast of changes to Tidy and quickly resolve
|
||||
conflicts.</p>
|
||||
|
||||
<p>For building a front end (e.g. GUI or language binding), the
|
||||
simplest approach is to use TidyLib. For more information
|
||||
about building and coding with TidyLib, see the <a
|
||||
href="http://tidy.sourceforge.net/libintro.html">Introduction To TidyLib</a>.</p>
|
||||
</dd>
|
||||
|
||||
<dt><a name="regression" id="regression">How Do I Run A
|
||||
Regression Test?</a></dt>
|
||||
<dd>
|
||||
<p>You might ask, "Why should I run a regression test?". If you
|
||||
are a Tidy user, you might want to compare a new version of Tidy
|
||||
to the version you are currently running. This is a good idea
|
||||
if you are using Tidy in production applications such as web
|
||||
publishing. If you are a Tidy developer, it is a good idea to
|
||||
run the regression test suite to make sure your fix or enhancement
|
||||
doesn't add new bugs.</p>
|
||||
|
||||
<p>Detecting new bugs is easier said than done, because sometimes
|
||||
they are subtle and can only be seen in browsers (or one particular
|
||||
browser you don't even have). But you can catch most crashes and
|
||||
many layout problems by running the test suite as described here.</p>
|
||||
|
||||
<p>The basic process is simple: run the test suite <strong>before</strong>
|
||||
and <strong>after</strong> making changes to TidyLib and compare the output
|
||||
markup and messages. Be aware that the test scripts for WinNT/2K/XP
|
||||
(alltest.cmd) and Linux/Unix (testall.sh) place the output files in
|
||||
<code>tidy/test/tmp</code>. If you forget to run the <strong>before</strong>
|
||||
test, you can always download a binary from the <a
|
||||
href="http://tidy.sourceforge.net/#binaries">Project Page</a>. If you
|
||||
are not a TidyLib developer, you can download the <a
|
||||
href="http://tidy.sourceforge.net/test/tidy_test.tgz">Test Suite</a>
|
||||
directly. Here are the steps to evaluate the impact of a TidyLib change.</p>
|
||||
|
||||
<h3>For Windows</h3>
|
||||
<p><strong>Before</strong> making changes:</p>
|
||||
<pre>
|
||||
C:\tidy\test> alltest.cmd
|
||||
C:\tidy\test> ren tmp baseline
|
||||
</pre>
|
||||
|
||||
<p><strong>After</strong> making changes and building Tidy:</p>
|
||||
<pre>
|
||||
C:\tidy\test> alltest.cmd
|
||||
C:\tidy\test> windiff tmp baseline
|
||||
</pre>
|
||||
|
||||
<h3>For Linux/Unix</h3>
|
||||
<p><strong>Before</strong> making changes:</p>
|
||||
<pre>
|
||||
~/tidy/test$ ./testall.sh
|
||||
~/tidy/test$ mv tmp baseline
|
||||
</pre>
|
||||
|
||||
<p><strong>After</strong> making changes and building Tidy:</p>
|
||||
<pre>
|
||||
~/tidy/test$ ./testall.sh
|
||||
~/tidy/test$ diff -u tmp baseline > diff.txt
|
||||
</pre>
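<p>Where neither windiff nor diff is at hand, the same before/after
comparison can be sketched in Python. This is an illustration only and
not part of the Tidy test suite: the function name
<code>compare_runs</code> is hypothetical, and it assumes both runs left
plain text files in two flat directories such as <code>baseline</code>
and <code>tmp</code>.</p>

```python
import difflib
from pathlib import Path


def compare_runs(baseline: Path, current: Path) -> list[str]:
    """Return unified-diff lines for files that differ between two test runs."""
    old_names = {p.name for p in baseline.iterdir() if p.is_file()}
    new_names = {p.name for p in current.iterdir() if p.is_file()}
    report = []
    for name in sorted(old_names | new_names):
        if name not in old_names or name not in new_names:
            # A file produced by only one run is itself worth a look.
            report.append(f"only in one run: {name}\n")
            continue
        old, new = baseline / name, current / name
        old_lines = old.read_text(errors="replace").splitlines(keepends=True)
        new_lines = new.read_text(errors="replace").splitlines(keepends=True)
        # unified_diff yields nothing when the two files are identical.
        report.extend(difflib.unified_diff(old_lines, new_lines, str(old), str(new)))
    return report
```

<p>An empty result suggests the change did not alter any output in the
suite; anything else should be read through by hand, as with
<code>diff.txt</code>.</p>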
|
||||
</dd>
|
||||
|
||||
<!--
|
||||
<dt><a name="" id=""></a></dt>
|
||||
<dd>
|
||||
</dd>
|
||||
|
||||
<dt><a name="" id=""></a></dt>
|
||||
<dd>
|
||||
</dd>
|
||||
-->
|
||||
<!-- Save for future questions
|
||||
<dt><a name="" id=""></a></dt>
|
||||
<dd>
|
||||
</dd>
|
||||
-->
|
||||
</dl>
|
||||
</body>
|
||||
</html>
|
BIN
htmldoc/grid.gif
Binary file not shown.
Before Width: | Height: | Size: 1.5 KiB |
|
@ -1,554 +0,0 @@
|
|||
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
||||
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
||||
<html xmlns="http://www.w3.org/1999/xhtml">
|
||||
<head>
|
||||
<meta name="generator" content="HTML Tidy, see www.w3.org" />
|
||||
<title>HTML TIDY - Notes on pending work</title>
|
||||
<meta name="keywords"
|
||||
content="HTML, validation, error correction, pretty-printing" />
|
||||
<meta name="author" content="Dave Raggett <dsr@w3.org>" />
|
||||
<style type="text/css">
|
||||
body {
|
||||
margin-left: 10%;
|
||||
margin-right: 10%;
|
||||
font-family: sans-serif
|
||||
}
|
||||
h1 { margin-left: -8% }
|
||||
h2,h3,h4,h5,h6 { margin-left: -4% }
|
||||
pre { color: green; font-weight: bold;
|
||||
font-size: 80%; font-family: monospace}
|
||||
em { font-style: italic; font-weight: bold }
|
||||
strong { text-transform: uppercase; font-weight: bold }
|
||||
.note {font-style: italic; color: rgb(192, 101, 101) }
|
||||
/* hr {text-align: center; width: 60% } */
|
||||
blockquote {
|
||||
color: navy;
|
||||
margin-left: 1%;
|
||||
margin-right: 1%;
|
||||
text-align: center;
|
||||
font-family: "Comic Sans MS", "Times New Roman", serif
|
||||
}
|
||||
table {
|
||||
font-family: sans-serif;
|
||||
font-size: 80%;
|
||||
background: rgb(255,255,153)
|
||||
}
|
||||
td {
|
||||
font-size: 80%
|
||||
}
|
||||
.people {font-family: "Lucida Calligraphy", serif}
|
||||
:link { color: rgb(0, 0, 153) }
|
||||
:visited { color: rgb(153, 0, 153) }
|
||||
:active { color: rgb(255, 0, 102) }
|
||||
a:hover { color: rgb(0, 0, 255) }
|
||||
</style>
|
||||
|
||||
<style type="text/css">
|
||||
p.c1 {font-style: italic}
|
||||
</style>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" background="grid.gif" text="black"
|
||||
link="navy" vlink="black" alink="red">
|
||||
<h1>HTML TIDY - Notes on Pending Work</h1>
|
||||
|
||||
<p><a href="http://www.w3.org/People/Raggett">Dave Raggett</a> <a
|
||||
href="mailto:dsr@w3.org">dsr@w3.org</a></p>
|
||||
|
||||
<p>This is a page where I am keeping the suggestions for
|
||||
improvements or bug fixes. My current work load means that I
|
||||
don't get much time to work on HTML Tidy, so I am interested in
|
||||
offers of help!</p>
|
||||
|
||||
<h4>Public Email List for Tidy: <<a
|
||||
href="mailto:html-tidy@w3.org">html-tidy@w3.org</a>></h4>
|
||||
|
||||
<p>I have set up an archived mailing list devoted to Tidy. To
|
||||
subscribe send an email to html-tidy-request@w3.org with the word
|
||||
subscribe in the subject line (include the word unsubscribe if
|
||||
you want to unsubscribe). The <a
|
||||
href="http://lists.w3.org/Archives/Public/html-tidy/">archive</a>
|
||||
for this list is accessible online. Please use this list to
|
||||
report errors or enhancement requests.</p>
|
||||
|
||||
<h2>Things awaiting further attention</h2>
|
||||
|
||||
<ul>
|
||||
<li>Support for BIG5 and ShiftJIS (Rick Jelliffe)</li>
|
||||
|
||||
<li>Stronger checking on which attributes appear on what
|
||||
elements</li>
|
||||
|
||||
<li>Sorting attributes in a canonical order</li>
|
||||
|
||||
<li>Version checking for HTML 4.01 vs 4.0 (Tidy currently will
|
||||
set the document type to 4.01 in preference to 4.0)</li>
|
||||
|
||||
<li>Noticing that the document isn't really XHTML if it isn't
|
||||
wellformed, i.e. it lacks end tags and quotes on attribute
|
||||
values</li>
|
||||
|
||||
<li>Converting <font face="Symbol">a</font> etc. to
|
||||
the corresponding Unicode characters, when cleaning HTML.</li>
|
||||
|
||||
<li>link checking - this would involve some platform dependent
|
||||
code as the network interface varies significantly from one
|
||||
platform to the next.</li>
|
||||
|
||||
<li>When exporting Word2000 to Web page, there is a need for
|
||||
smarter rules of thumb for working out whether the paragraph is a
|
||||
bulleted or numbered list item, and determining the level of
|
||||
nesting. Perhaps the style attribute holds the key? This tends to
|
||||
include substrings like: "mso-list:l0 level1 lfo2;" and
|
||||
"mso-list:l1 level1 lfo1;". Unfortunately, these aren't always
|
||||
present, and I have yet to figure out a foolproof heuristic.</li>
|
||||
</ul>
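<p>For the Word2000 item above, a minimal sketch of reading the
<code>mso-list</code> hint from a style attribute might look like this.
It assumes that the number after <code>l</code> identifies the list and
the number after <code>level</code> gives the nesting depth; that
reading is a guess from the two quoted substrings, not something Tidy
documents. The name <code>list_info</code> is hypothetical.</p>

```python
import re

# Matches the style fragments quoted above, e.g. "mso-list:l0 level1 lfo2".
MSO_LIST = re.compile(r"mso-list:\s*l(\d+)\s+level(\d+)\s+lfo(\d+)")


def list_info(style: str):
    """Return (list_id, nesting_level) from a style attribute, or None."""
    match = MSO_LIST.search(style)
    if match is None:
        # The hint is not always present, so a fallback heuristic is still needed.
        return None
    return int(match.group(1)), int(match.group(2))
```

<p>Because the substring is sometimes missing, a <code>None</code>
result only means "no hint", not "not a list item".</p>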
|
||||
|
||||
<p>I need to set up an index of precisely what attributes are
|
||||
supported on each element. Right now, some elements check their
|
||||
own attributes, whilst others are checked via default checks
|
||||
defined for each attribute independently of the element. Until
|
||||
this is done, you sometimes find that validation services
|
||||
discover errors unnoticed by Tidy itself.</p>
|
||||
|
||||
<p>Jelks Cabaniss asks: <i>Could Tidy be made to automatically
|
||||
"clean" (FONTs to CSS) if the Strict DOCTYPE is requested? An
|
||||
HTML or XHTML Strict document can't have FONT tags according to
|
||||
the DTDs</i>. Jelks has a bunch of other good ideas such as
|
||||
converting the bgcolor attribute over to CSS.</p>
|
||||
|
||||
<p>Adding an option to select slide transition effects. I would
|
||||
also like to provide an optional feature for sorting attribute
|
||||
values.</p>
|
||||
|
||||
<p>I am having problems with form elements as direct children of
|
||||
tr or table. It is dangerous to create an implicit table cell,
|
||||
and what is needed is a way to move the form element into the
|
||||
next cell. If this can't be done an error needs to be raised
|
||||
since Tidy will be stuck. On a separate note, Tidy is still
|
||||
breaking lines between <img> and </a> which in
|
||||
Netscape shows as an underlined space. It's fine in IE.</p>
|
||||
|
||||
<p>Benjamin Holzman <bah@orientation.com> writes: I'm
|
||||
wrapping tidy (release-date 2000.01.13) in some perl objects
|
||||
(using SWIG), and CharEncoding being a global is a bit of a pain.
|
||||
I was wondering what your thoughts would be on how to fix that.
|
||||
The character encoding is already a property of struct Out; is
|
||||
there any reason why making it part of struct StreamIn as well,
|
||||
and perhaps setting that property in OpenInput, based on the
|
||||
existing CharEncoding variable, wouldn't allow us to move
|
||||
CharEncoding to be local to main?</p>
|
||||
|
||||
<p>Oh, in case you're curious about the API, here's a short
|
||||
script using my wrappers to be an html to xhtml filter:</p>
|
||||
|
||||
<pre>
|
||||
#!/usr/bin/perl
|
||||
|
||||
require tidy;
|
||||
|
||||
my $tidy = Tidy->new(*STDIN);
|
||||
my $document = $tidy->parse;
|
||||
$tidy->as_xhtml(*STDOUT);
|
||||
</pre>
|
||||
|
||||
<p>Rick Parsons would like there to be a new wrap-attributes
|
||||
option that can be used to suppress line wrapping within
|
||||
attributes. There is already a similar option for JavaScript
|
||||
literals.</p>
|
||||
|
||||
<p>Vijay Patil would like tidy -h to display options sorted
|
||||
alphabetically.</p>
|
||||
|
||||
<p>Julian Reschke would like there to be an option to add the
|
||||
xml:space="preserve" attribute to pre elements when outputting
|
||||
xml.</p>
|
||||
|
||||
<p>Armando Asantos would like to use Tidy to produce a list of
|
||||
URLs for images or hypertext links according to a config option.
|
||||
This would be straightforward, but is a lower priority than bug
|
||||
fixes etc.</p>
|
||||
|
||||
<p>Omri Traub would like an option to wrap the contents of style
|
||||
and script elements in CDATA marked sections when converting to
|
||||
XHTML. He is also interested in direct support for 16 bit
|
||||
character file I/O.</p>
|
||||
|
||||
<p>Bertilo Wennergren notes:</p>
|
||||
|
||||
<blockquote>If I configure Tidy to "upgrade to style sheets", it
|
||||
does so for a few things in my main document, but the code thus
|
||||
created get error reports if I feed it back to Tidy. It turns out
|
||||
that Tidy creates extra "class" attributes on tags that already
|
||||
have "class" attributes set. This happens with this page:
|
||||
<http://www.concinnity.se/bertilow/index.htm>.</blockquote>
|
||||
|
||||
<p>Randi Waki notes:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>If a quoted URL attribute value (e.g., href in <a>
|
||||
elements) contains a line break, 13-Jan-2000 Tidy changes the
|
||||
line break to a space while IE and Netscape discard the line
|
||||
break. This can result in a broken link in the tidied
|
||||
document.</p>
|
||||
|
||||
<p>I believe the following change fixes the problem. In lexer.c,
|
||||
insert the following lines before line 2502:</p>
|
||||
|
||||
<pre>
|
||||
/* discard line breaks in quoted URLs */
|
||||
if (c == '\n' && IsUrl(name))
|
||||
continue;
|
||||
|
||||
/* existing line 2502 */ c = ' ';
|
||||
</pre>
|
||||
</blockquote>
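<p>The behaviour Randi Waki proposes can be sketched outside the lexer
as follows. This is an illustration in Python, not the C change itself:
the set <code>URL_ATTRIBUTES</code> is a small assumed subset standing
in for whatever <code>IsUrl</code> recognises, and
<code>normalize_attr_value</code> is a hypothetical name.</p>

```python
# Illustrative subset only; Tidy's own IsUrl() decides this in the real lexer.
URL_ATTRIBUTES = {"href", "src", "action", "background"}


def normalize_attr_value(name: str, value: str) -> str:
    """Drop line breaks inside URL attributes; turn them into spaces elsewhere."""
    if name.lower() in URL_ATTRIBUTES:
        # Browsers discard the break, so keeping a space would break the link.
        return value.replace("\r", "").replace("\n", "")
    # Handle CRLF first so one line break becomes exactly one space.
    return value.replace("\r\n", " ").replace("\r", " ").replace("\n", " ")
```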
|
||||
|
||||
<p>Stephen Reynolds would like Tidy to keep track of whether a
|
||||
comment started on a new line and preserve this in the
|
||||
output.</p>
|
||||
|
||||
<p>Terry Teague says:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>Sorry, I should have been more clear. Part of the problem is
|
||||
the current HelpText() function in localize.c doesn't actually
|
||||
reflect current reality.</p>
|
||||
|
||||
<p>You need to at least add the following line to HelpText()
|
||||
:</p>
|
||||
|
||||
<pre>
|
||||
tidy_out(out, " -version or -v show version\n");
|
||||
</pre>
|
||||
|
||||
<p>And I suppose it should mention the use of the new
|
||||
"--<config options>" type syntax.</p>
|
||||
|
||||
<p>Regards, Terry</p>
|
||||
</blockquote>
|
||||
|
||||
<p>John Russel notes:</p>
|
||||
|
||||
<pre>
|
||||
what i wonder is
|
||||
1] does the specification indicate these are WRONG
|
||||
2] if so why do they pass thru tidy ....
|
||||
is url syntax such a can of worms that it is left to user
|
||||
to check .......
|
||||
|
||||
CASE 1: misuse of slash for folders
|
||||
site had background="pics\fancy.jpg"
|
||||
instead of "pics/fancy.jpg"
|
||||
|
||||
CASE 2: spaces in filename
|
||||
site had href="coin album.html"
|
||||
instead of "coin%20album.html"
|
||||
</pre>
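<p>The two cases John Russel describes could be repaired along these
lines. This is a sketch of the idea rather than anything Tidy does
today; <code>fix_url</code> is a hypothetical helper, and it assumes the
value is a simple relative path in which a backslash can only have been
meant as a folder separator.</p>

```python
def fix_url(url: str) -> str:
    """Apply the two repairs from the cases above to a relative URL."""
    # CASE 1: backslashes used as folder separators become forward slashes.
    # CASE 2: literal spaces are percent-encoded as %20.
    return url.replace("\\", "/").replace(" ", "%20")
```

<p>A real implementation would need to be more careful, for example with
backslashes that appear in a query string.</p>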
|
||||
|
||||
<p>Andre Stechert would like a way to prevent Tidy from
|
||||
"cleaning" newly declared elements which don't have any content
|
||||
but do have end tags, see his mail of 17th January 2000.</p>
|
||||
|
||||
<p>Todd Clark would like to use Tidy with Microsoft's WebClass
|
||||
tags. Unfortunately these include unusual characters in the tag
|
||||
names such as @ which Tidy objects to, for instance:</p>
|
||||
|
||||
<pre>
|
||||
<WC@DOMAINNAME>test.com</WC@DOMAINNAME>
|
||||
</pre>
|
||||
|
||||
<p>Perhaps it makes sense to offer an option to make Tidy less
|
||||
picky about what characters it accepts in tag names. Or perhaps
|
||||
"WebClass: yes".</p>
|
||||
|
||||
<p>Jelks Cabaniss suggests an option to control dropping of empty
|
||||
elements, e.g. according to what attributes they have.</p>
|
||||
|
||||
<p>Paavo Hartikainen writes:</p>
|
||||
|
||||
<blockquote>
|
||||
<p>Tidy always expands '&' to '&amp;' even if I have
|
||||
'quote-ampersand: no' defined in configuration file. This is not
|
||||
a good thing to do for URLs that have '&' characters in them.
|
||||
OS is Debian GNU/Linux 2.1 SPARC. Same thing happens on Alpha.
|
||||
Other architectures I have not tried.</p>
|
||||
|
||||
<p>My configuration looks like this:</p>
|
||||
|
||||
<pre>
|
||||
char-encoding: latin1
|
||||
error-file: ./errors
|
||||
indent-spaces: 2
|
||||
logical-emphasis: yes
|
||||
output-xhtml: yes
|
||||
quiet: no
|
||||
quote-ampersand: no
|
||||
show-warnings: yes
|
||||
tidy-mark: yes
|
||||
wrap: 78
|
||||
wrap-attributes: no
|
||||
write-back: yes
|
||||
keep-time: yes
|
||||
</pre>
|
||||
</blockquote>
|
||||
|
||||
<p>Paul White reports that Tidy isn't recognizing HTML 3.2 when
|
||||
the doctype is "-//W3C//DTD HTML 3.2 Final//EN" (as per the REC),
|
||||
and similarly for HTML 4.01. This would appear to call for a
|
||||
change to the table of names in lexer.c.</p>
|
||||
|
||||
<p>Stuart Hungerford would like Tidy to detect and fix duplicate
|
||||
attributes e.g. multiple class attributes. Celeste Suliin Burris
|
||||
would like Tidy to replace spaces in URLs by %20 as some versions
|
||||
of Netscape "croak big time" on this. Denis Kokarev also wants
|
||||
Tidy to remove duplicate attributes when the values are the same.
|
||||
This apparently stops XSLT from working. Brian Schweitzer notes
|
||||
that Tidy adds a 2nd class attribute rather than merging the
|
||||
classes into a space separated list.</p>
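<p>One way to read these requests together is sketched below: repeated
<code>class</code> attributes are merged into a single space separated
list, and for every other attribute the first value is kept. The
"first value wins" rule and the name <code>merge_attributes</code> are
assumptions made for the example; the notes above only ask for identical
duplicates to be removed.</p>

```python
def merge_attributes(attrs):
    """Collapse duplicate attributes from a list of (name, value) pairs."""
    merged = {}
    for name, value in attrs:
        key = name.lower()
        if key not in merged:
            merged[key] = value
        elif key == "class":
            # Merge class names into one list, skipping names already present.
            known = merged[key].split()
            merged[key] = " ".join(known + [c for c in value.split() if c not in known])
        # For any other attribute the first value wins, so identical repeats vanish.
    return list(merged.items())
```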
|
||||
|
||||
<p>Bertilo Wennergren writes: Tidy seems not to recognize frame
|
||||
elements with a closing "/". It actually removes them. Try his <a
|
||||
href="http://www.concinnity.se/bertilow/pmeg/pmeg9/k_bazo.htm">example</a>.
|
||||
Tidy can produce XHTML Frameset docs, but when fed them back
|
||||
again it cries foul.</p>
|
||||
|
||||
<p>Jose Manuel Cerqueira Esteves notes:</p>
|
||||
|
||||
<pre>
|
||||
I've used `tidy' to convert a few HTML 4.0 files to XHTML 1.0 and noticed
|
||||
a problem when dealing with constructs like
|
||||
|
||||
<small><small>some text</small></small>
|
||||
|
||||
First, `tidy' acts as if the second "<small>" was meant as a closing tag:
|
||||
|
||||
Warning: "<small> is probably intended as </small>"
|
||||
|
||||
Then it trims the resulting empty <small></small>:
|
||||
|
||||
Warning: trimming empty <small>
|
||||
|
||||
And finally both remaining closing tags ("</small>"), now spurious,
|
||||
are removed:
|
||||
|
||||
Warning: discarding unexpected </small>
|
||||
Warning: discarding unexpected </small>
|
||||
|
||||
It would be convenient to have at least some `tidy' option to prevent this
|
||||
from happening (or perhaps some different heuristics?).
|
||||
</pre>
|
||||
|
||||
<p>Robbert Hans Baron would like to see Tidy warning about
|
||||
duplicate attributes and fixing these when the values are
|
||||
identical.</p>
|
||||
|
||||
<p>Jutta Wrage notes that: When parsing HTML 3.2 Pages, tidy
|
||||
doesn't accept textareas in forms correctly. The HTML Reference
|
||||
specification (HTML 3.2 Final) allows: name, rows and cols, but
|
||||
upon seeing these Tidy thinks the document is 4.0.</p>
|
||||
|
||||
<p>Matthew Brealey notes that a heading start tag is coerced to
|
||||
an end heading tag when the end tag is missing. This is
|
||||
deliberate, but perhaps not the best heuristic.</p>
|
||||
|
||||
<p>HIYAMA Masayuki notes that Tidy should set the encoding
|
||||
attribute to match the language encoding, e.g. <?xml version="1.0"
|
||||
encoding="iso-2022-jp"?>.</p>
|
||||
|
||||
<p>Mark Modrall has extended Tidy to support selectively
|
||||
stripping out listed tags and attributes, see his email of March
|
||||
14th.</p>
|
||||
|
||||
<p>Yong Taek Bae notes that with the omit end tags option Tidy
|
||||
omits the body tag even if it has attributes. This is an
|
||||
error.</p>
|
||||
|
||||
<p>Tapio Markula reports that Tidy is incorrectly replacing
|
||||
accented characters in script elements by entities. The script
|
||||
element (in HTML but not XHTML) is CDATA and as such entities
|
||||
won't be expanded. This bug needs to be fixed along with the
|
||||
support for CDATA sections.</p>
|
||||
|
||||
<p>Terrill Bennett reports tidy crashing when producing slides,
|
||||
and when the -i option has been set. He later added the crash
|
||||
occurs when the page doesn't include an h1 element. See
|
||||
Terrill-Bennett-11mar00.txt.</p>
|
||||
|
||||
<p>Stephen Lewis notes that if an <hr> element is present
|
||||
in the head before the title element, then Tidy gets confused and
|
||||
adds in a spurious extra empty title element. This would be
|
||||
avoided if Tidy could move the hr into the body before the body
|
||||
element is encountered. This raises a number of problems for
|
||||
instance working out when to copy in attributes from an explicit
|
||||
body element.</p>
|
||||
|
||||
<p>Carl Osterly would like Tidy to avoid breaking lines before or
|
||||
after the = sign in attribute values when this is practical.
|
||||
Perhaps a simple rule of thumb could be used to decide this?</p>
|
||||
|
||||
<p>Rick H Wesson notes that Tidy crashes on CDATA marked sections
|
||||
when parsing XML.</p>
|
||||
|
||||
<p>Luigi Federici would like an option to set the DTD URI for XML
|
||||
or XHTML.</p>
|
||||
|
||||
<p>Mat Sander notes: If I have php code the indentation behaves
|
||||
strange. Repeated tidying php content and end tag indented one
|
||||
level extra for each time. The result ends up something like
|
||||
this:</p>
|
||||
|
||||
<pre>
|
||||
...
|
||||
<?php
|
||||
$r=0;
|
||||
?>
|
||||
...
|
||||
|
||||
I have the following config file for Tidy:
|
||||
---
|
||||
tidy-mark: no
|
||||
markup: yes
|
||||
wrap: 0
|
||||
indent: auto
|
||||
output-xml: no
|
||||
output-xhtml: yes
|
||||
doctype: loose
|
||||
char-encoding: latin1
|
||||
quote-marks: yes
|
||||
assume-xml-procins: yes
|
||||
word-2000: yes
|
||||
clean: yes
|
||||
logical-emphasis: yes
|
||||
drop-empty-paras: yes
|
||||
enclose-text: yes
|
||||
fix-bad-comments: yes
|
||||
alt-text: .
|
||||
write-back: bool
|
||||
keep-time: yes
|
||||
show-warnings: no
|
||||
quiet: yes
|
||||
split: no
|
||||
---
|
||||
|
||||
Best Regards,
|
||||
Mats-Olof Sander
|
||||
|
||||
</pre>
|
||||
|
||||
<p>Don Hasson notes that if you make a mistake and leave off the
|
||||
ending "/" in the <title> tag, tidy will generate an extra
|
||||
set of <title>s.</p>
|
||||
|
||||
<p>Example:</p>
|
||||
|
||||
<pre>
|
||||
<html>
|
||||
<head><title>No end here<title></head>
|
||||
<body>
|
||||
Empty
|
||||
</body>
|
||||
</html>
|
||||
|
||||
</pre>
|
||||
|
||||
<p>produces this:</p>
|
||||
|
||||
<pre>
|
||||
<html>
|
||||
<head>
|
||||
<title>No end here</title>
|
||||
<title></title>
|
||||
</head>
|
||||
<body>
|
||||
Empty
|
||||
</body>
|
||||
</html>
|
||||
|
||||
</pre>
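<p>A minimal sketch of how the input above could be flagged before it is repaired, assuming it is enough to compare the number of title start tags with the number of title end tags. The helper name title_tag_balance is hypothetical and is not part of Tidy; the regular expressions are a rough approximation, not a real HTML parser.</p>

```python
import re


def title_tag_balance(html: str) -> tuple[int, int]:
    """Hypothetical helper (not part of Tidy): count title start and end tags."""
    # "<title" followed by a word boundary never matches "</title",
    # because the end tag has a "/" between "<" and "title".
    opens = len(re.findall(r"<title\b[^>]*>", html, flags=re.IGNORECASE))
    closes = len(re.findall(r"</title\s*>", html, flags=re.IGNORECASE))
    return opens, closes


# The example input from the report: the second tag lacks its "/".
sample = (
    "<html>\n"
    "<head><title>No end here<title></head>\n"
    "<body>\n"
    "Empty\n"
    "</body>\n"
    "</html>\n"
)

opens, closes = title_tag_balance(sample)
if opens != closes:
    print(f"warning: {opens} title start tag(s), {closes} end tag(s)")
```

<p>For the sample this finds two start tags and no end tag, which matches the two title elements Tidy emits. A check of this kind only hints that the second start tag was probably meant as an end tag; it does not decide how the markup should be repaired.</p>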
|
||||
|
||||
<p>Jeff Wilkinson would like the HTML Tidy page to include
|
||||
internal anchors so that he can link directly to the appropriate
|
||||
sections.</p>
|
||||
|
||||
<p>Peter Vince would like to be able to clean presentation
|
||||
attributes on the body element, as well as translating b and i to
|
||||
span.</p>
|
||||
|
||||
<p>Dave Bryan and Mathew Brealey would like there to be a way to
|
||||
suppress the default handling of inline elements in favor of
|
||||
simply inserting the appropriate end tag when encountering an
|
||||
element that isn't allowed in an inline context. The default
|
||||
behavior replicates the rendering on existing browsers but can
|
||||
cause problems for hand editors.</p>
|
||||
|
||||
<p>Dave Bryan notes that tidy isn't updating the column position
|
||||
when parsing attributes.</p>
|
||||
|
||||
<p>Can Tidy track when a line break occurs after a PI or comment
|
||||
and reproduce this in the output? This idea occurred to me after
|
||||
reading a comment from Brad Stowers.</p>
|
||||
|
||||
<p>One interesting suggestion is to make some of Tidy's rules of
|
||||
thumb sensitive to the program that generated the markup as
|
||||
indicated by the meta element. This would allow for greater
|
||||
robustness in how the rules operate.</p>
|
||||
|
||||
<p>Dave Bryan would like the quiet mode to be tweaked to suppress
|
||||
the general info at the end of the report. See
|
||||
Dave-Bryan-24mar00.txt.</p>
|
||||
|
||||
<p>Erik Rossen would like an option to suppress line wrap within
|
||||
tags, so that the tag is always on the same line regardless of
|
||||
the number and length of the attributes.</p>
|
||||
|
||||
<p>Dan Satria suggests that the clean mechanism check to see if
|
||||
there are any existing matching style rules before adding new
|
||||
ones.</p>
|
||||
|
||||
<p>Zoltan Hawryluk suggests mapping the Netscape layer tag into
|
||||
the equivalent CSS positioning syntax.</p>
|
||||
|
||||
<p>Jim Walker says Tidy doesn't correctly report errors such as
|
||||
<tt></</head></tt>.</p>
|
||||
|
||||
<p>Tidy's slide feature: see Johannes-Poutre-12jul00.txt</p>
|
||||
|
||||
<p>Carole Mah suggests Tidy should recover from multiple class
|
||||
attributes on the same element.</p>
|
||||
|
||||
<h2>Other ideas</h2>
|
||||
|
||||
<ul>
|
||||
<li>Recursion through subdirectories, so you can fix up your
|
||||
entire web site at one go. This assumes I can find a way that is
|
||||
portable across a wide range of platforms!</li>
|
||||
|
||||
<li>Support for W3C's <a
|
||||
href="http://www.w3.org/TR/REC-DOM-Level-1/">Document Object
|
||||
Model</a> (DOM) level one.</li>
|
||||
|
||||
<li>Full validation of all attribute values.</li>
|
||||
|
||||
<li>Mapping Unicode bidi control characters to HTML tags.</li>
|
||||
|
||||
<li>Full support for parsing XML (still somewhat limited).</li>
|
||||
|
||||
<li>How to say which XML elements should be printed
|
||||
"inline".</li>
|
||||
|
||||
<li>Acting on the XML encoding attribute, e.g.
|
||||
<?xml version="1.0" encoding="iso-8859-1"?></li>
|
||||
|
||||
<li>Improved mapping from HTML presentation attributes/elements
|
||||
to CSS.</li>
|
||||
|
||||
<li>Improved support for <a
|
||||
href="http://java.sun.com/products/jsp/">JSP</a> (JavaServer
Pages)</li>
|
||||
|
||||
<li>Ugly print option which removes all optional whitespace</li>
|
||||
</ul>
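<p>The first idea in the list, recursing through subdirectories in a portable way, could be sketched outside Tidy itself as a small wrapper script. The following is only an illustration under stated assumptions: the function name find_html_files and the set of file extensions are made up for this example, and the script merely collects paths rather than running Tidy on them.</p>

```python
import os

# Assumed set of extensions to treat as HTML; adjust as needed.
HTML_SUFFIXES = (".html", ".htm")


def find_html_files(root):
    """Hypothetical helper: return the sorted paths of HTML files under root."""
    found = []
    # os.walk visits root and every subdirectory on all platforms Python supports.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare case-insensitively so that "INDEX.HTM" is also picked up.
            if name.lower().endswith(HTML_SUFFIXES):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

<p>Each returned path could then be handed to Tidy one file at a time, for example with its option for modifying files in place. Leaving the directory walk to a wrapper script keeps the platform-specific part out of Tidy, at the cost of starting one process per file.</p>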
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -24,7 +24,7 @@
|
|||
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
|
||||
<head>
|
||||
<title>HTML Tidy Configuration Options Quick Reference</title>
|
||||
<link type="text/css" rel="stylesheet" href="tidy.css" />
|
||||
<link type="text/css" rel="stylesheet" href="quickref-html.css" />
|
||||
</head>
|
||||
|
||||
<body>
|
||||
|
|
File diff suppressed because it is too large
Load diff
BIN
htmldoc/tidy.gif
Binary file not shown.
Before Width: | Height: | Size: 244 B |