diff --git a/htmldoc/Overview.html b/htmldoc/Overview.html
deleted file mode 100644
index 5c80529..0000000
--- a/htmldoc/Overview.html
+++ /dev/null
@@ -1,1546 +0,0 @@
-Clean up your Web pages with HTML TIDY

Clean up your Web pages
- with HTML TIDY

- -

This version 4th August 2000

- -

Copyright © 1998-2000 W3C, see tidy.c for copyright notice.

- -
With many thanks to Hewlett Packard for financial -support during the development of this software!
- -
-

How to use Tidy | Downloading Tidy | Release Notes
- Integration with other Software | Acknowledgements

- -
-

To get the latest version of Tidy please visit the original -version of this page at: http://www.w3.org/People/Raggett/tidy/. -Courtesy of Netmind, you can register for email reminders when -new versions of tidy become available.

- -
-
-
- -

The public email list devoted to HTML Tidy is: <html-tidy@w3.org>. To -subscribe send an email to html-tidy-request@w3.org with the word -subscribe in the subject line (include the word unsubscribe if -you want to unsubscribe). The archive -for this list is accessible online. Please use this list to -report errors or enhancement requests. See the release notes for -information on recent changes. Your feedback is welcome!

- -

If you find HTML Tidy useful and you would like to say thanks, -then please send me a (paper) postcard or other souvenir from the -area in which you live along with a few words on what you are -using Tidy for. It will be fun to map out where Tidy users are to -be found! My postal address is given at -the end of this file.

- -

Tutorials for HTML and CSS

- -

If you are just starting off and would like to know more about -how to author Web pages, you may find my guide to HTML and CSS -helpful. Please send me feedback on this, and I will do my best -to further improve it.

- -

Support for Word2000

- -

Tidy can now perform wonders on HTML saved from Microsoft Word -2000! Word bulks out HTML files with stuff for round-tripping -presentation between HTML and Word. If you are more concerned -about using HTML on the Web, check out Tidy's "Word-2000" config option! Of course Tidy -does a good job on Word'97 files as well!

- -

Introduction to TIDY

- -

When editing HTML it's easy to make mistakes. Wouldn't it be -nice if there was a simple way to fix these mistakes -automatically and tidy up sloppy editing into nicely laid out -markup? Well now there is! Dave Raggett's HTML TIDY is a free -utility for doing just that. It also works great on the -atrociously hard to read markup generated by specialized HTML -editors and conversion tools, and can help you identify where you -need to pay further attention to making your pages more -accessible to people with disabilities.

- -

Tidy is able to fix up a wide range of problems and to bring -to your attention things that you need to work on yourself. Each -item found is listed with the line number and column so that you -can see where the problem lies in your markup. Tidy won't -generate a cleaned up version when there are problems that it -can't be sure of how to handle. These are logged as "errors" -rather than "warnings".

- -

Tidy features in a recent -article on XHTML by webreview.com.

- - -

Examples of TIDY at work

- -

Tidy corrects the markup in a way that matches where possible -the observed rendering in popular browsers from Netscape and -Microsoft. Here are just a few examples of how TIDY perfects your -HTML for you:

- - - -

Layout style

- -

You can choose which style you want Tidy to use when it -generates the cleaned up markup: for instance whether you like -elements to indent their contents or not. Several people have -asked if Tidy could preserve the original layout. I am sorry to -say that this would be very hard to support due to the way Tidy -is implemented. Tidy starts by building a clean parse tree from -the source file. The parse tree doesn't contain any information -about the original layout. Tidy then pretty prints the parse tree -using the current layout options. Trying to preserve the original -layout would interact badly with the repair operations needed to -build a clean parse tree and considerably complicate the -code.

- -

Some browsers can screw up the right alignment of text -depending on how you lay out headings. As an example, -consider:

- -
-<h1 align="right">
-  Heading
-</h1>
-
-<h1 align="right">Heading</h1>
-
- -

Both of these should be rendered the same. Sadly a common -browser bug fails to trim trailing whitespace and misaligns the -first heading. HTML Tidy will protect you from this bug, except -when you set the indent option to "yes".

- -

Setting the indent option to yes can also cause problems with -table layout for some browsers:

- -
-<td><img src="foo.gif"></td>
-<td><img src="foo.gif"></td>
-
- -

will look slightly different from:

- -
-<td>
-  <img src="foo.gif">
-</td>
-<td>
-  <img src="foo.gif">
-</td>
-
- -

You can avoid such quirks by using indent: no or -indent: auto in the config file.

- -

Internationalization issues

- -

Tidy offers you a choice of character encodings: US ASCII, ISO -Latin-1, UTF-8 and the ISO 2022 family of 7 bit encodings. The -full set of HTML 4.0 entities are defined. Cleaned up output uses -HTML entity names for characters when appropriate. Otherwise -characters outside the normal range are output as numeric -character entities. Tidy defaults to assuming you want the output -to be in US ASCII. Tidy doesn't yet recognize the use of the HTML -meta element for specifying the character encoding.
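
The entity behaviour described above for US ASCII output can be illustrated with a short Python sketch. This is not Tidy's own code: the function name `to_ascii_entities` is hypothetical, and the HTML 4 entity table from Python's standard `html.entities` module stands in for Tidy's internal table. It does not escape markup characters such as `<` or `&`.

```python
from html.entities import codepoint2name


def to_ascii_entities(text):
    """Return text with every character above code point 127 written as an entity."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            parts.append(ch)  # plain US ASCII passes through unchanged
        elif code in codepoint2name:
            parts.append("&" + codepoint2name[code] + ";")  # named HTML 4 entity
        else:
            parts.append("&#" + str(code) + ";")  # numeric character entity
    return "".join(parts)
```

For example, an e with an acute accent (code point 233) comes out as a named entity, while a character with no HTML 4 name falls back to the numeric form.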

- -

Accessibility

- -

Tidy offers advice on accessibility problems for people using -non-graphical browsers. The most common thing you will see is the -suggestion you add a summary attribute to table elements. The -idea is to provide a summary of the table's role and structure -suitable for use with aural browsers.

- -

Cleaning up presentational markup

- -

Many tools generate HTML with an excess of FONT, NOBR and -CENTER tags. Tidy's -clean option will replace them by -style properties and rules using CSS. This makes the markup -easier to read and maintain as well as reducing the file size! -Tidy is expected to get smarter at this in the future.

- -

Some pages rely on the presentation effects of isolated -<p> or </p> tags. Tidy deletes empty paragraph and -heading elements etc. The use of empty paragraph elements is not -recommended for adding vertical whitespace. Instead use style -sheets, or the <br> element. Tidy won't discard paragraphs -only containing a nonbreaking space &nbsp;.

- -

Teaching Tidy about new tags!

- -

You can teach Tidy about new tags by declaring them in the -configuration file. The syntax is:

- -
-  new-inline-tags: tag1, tag2, tag3
-  new-empty-tags: tag1, tag2, tag3
-  new-blocklevel-tags: tag1, tag2, tag3
-  new-pre-tags: tag1, tag2, tag3
-
- -

The same tag can be defined as empty and as inline or as empty -and as block.

- -

These declarations can be combined to define a new empty -inline or empty block element, but you are advised not to declare -tags as being both inline and block!

- -

Note that the new tags can only appear where Tidy expects -inline or block-level tags respectively. This means you can't -(yet) place new tags within the document head or other contexts -with restricted content models. So far the most popular use of -this feature is to allow Tidy to be applied to Cold Fusion -files.

- -

I am working on ways to make it easy to customize -the permitted document syntax using assertion -grammars, and hope to apply this to a much smarter version of -Tidy for release later this year or early next year.

- -

Limited support for ASP, JSTE and PHP

- -

Tidy is somewhat aware of the preprocessing language called -ASP which uses a pseudo element syntax <% ... %> -to include preprocessor directives. ASP is normally interpreted -by the web server before delivery to the browser. JSTE shares the -same syntax, but sometimes also uses <# ... #>. -Tidy can also cope with another such language called PHP, which -uses the syntax <?php ... ?>

- -

Tidy will cope with ASP, JSTE and PHP pseudo elements within -element content and as replacements for attributes, for -example:

- -
-  <option <% if rsSchool.Fields("ID").Value
-    = session("sessSchoolID")
-    then Response.Write("selected") %>
-    value='<%=rsSchool.Fields("ID").Value%>'>
-    <%=rsSchool.Fields("Name").Value%>
-    (<%=rsSchool.Fields("ID").Value%>)
-  </option>
-
- -

Note that Tidy doesn't understand the scripting language used -within pseudo elements and attributes, and can easily get -confused. Tidy may report missing attributes when these are -hidden within preprocessor code. Tidy can also get things wrong -if the code includes quote marks, e.g. if the example above is -changed to:

- -
-    value="<%=rsSchool.Fields("ID").Value%>"
-
- -

Tidy will now see the quote mark preceding ID as ending the -attribute value, and proceed to complain about what follows. Note -you can choose whether to allow line wrapping on spaces within -pseudo elements or not using the wrap-asp option. If you -used ASP, JSTE or PHP to create a start tag, but placed the end -tag explicitly in the markup, Tidy won't be able to match them -up, and will delete the end tag for you. So in this case you are -advised to make the start tag explicit and to use ASP, JSTE or PHP -for just the attributes, e.g.

- -
-   <a href="<%=random.site()%>">do you feel lucky?</a>
-
- -

Tidy allows you to control whether line wrapping is enabled -for ASP, JSTE and PHP instructions, see the wrap-asp, wrap-jste -and wrap-php config options, respectively.

- -

I regret that Tidy does not support Tango preprocessing -instructions which look like:

- -
-<@if variable_1='a'>
-    do something
-<@else>
-    do nothing
-</@if>
-
-<@include <@cgi><@appfilepath>includes/message.html>
-
- -

Tidy supports another preprocessing syntax called "Tango", but -only for attribute values. Adding support for pseudo elements -written in Tango looks as if it would be quite tough, so I would -like to gauge the level of interest before committing to this -work.

- -

Limited support for XML

- -

XML processors compliant with W3C's XML 1.0 recommendation are -very picky about which files they will accept. Tidy can help you -to fix errors that cause your XML files to be rejected. Tidy -doesn't yet recognize all XML features though, e.g. it doesn't -understand CDATA sections or DTD subsets.

- -

Creating Slides

- -

The -slides option allows you to burst a single HTML -file into a number of linked slides. Each H2 element in the input -file is treated as delimiting the start of the next slide. The -slides are named slide1.html, slide2.html, slide3.html etc. This -is a relatively new feature and ideas are welcomed as to how to -improve it. In particular, I plan to add support to the -configuration file for setting the style sheet for slides and for -customizing the slides via a template.
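
To make the splitting rule concrete, here is a minimal Python sketch of bursting markup at each h2 start tag. It is an illustration only and not how Tidy implements the option: it uses a regular expression rather than a parse tree, the function name `split_into_slides` is hypothetical, and keeping any material before the first h2 as its own slide is an assumption.

```python
import re


def split_into_slides(body_html):
    """Split markup just before each <h2> and map slide filenames to fragments."""
    # The lookahead splits at the start of each h2 tag without consuming it.
    pieces = re.split(r"(?=<h2\b)", body_html, flags=re.IGNORECASE)
    # Drop blank fragments, such as the empty string before a leading <h2>.
    pieces = [piece for piece in pieces if piece.strip()]
    return {
        "slide%d.html" % number: piece
        for number, piece in enumerate(pieces, start=1)
    }
```

The filenames follow the slide1.html, slide2.html pattern described above. Writing the fragments to disk and linking them together is left out.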

- -

I would be interested in hearing from anyone who can offer -help with using JavaScript for adding dynamic effects to slides, -for instance similar to those available in Microsoft -PowerPoint.

- -

Indenting text for a better layout

- -

Indenting the content of elements makes the markup easier to -read. Tidy can do this for all elements or just for those where -it's needed. The auto-indent mode has been used below to avoid -indenting the content of title, p and li elements:

- -
-<html>
-  <head>
-    <title>Test document</title>
-  </head>
-
-  <body>
-    <p>para which has enough text to cause a line break,
-    and so test the wrapping mechanism for long lines.</p>
-<pre>
-This is
-<em>genuine
-       preformatted</em>
-   text
-</pre>
-
-    <ul>
-      <li>1st list item</li>
-
-      <li>2nd list item</li>
-    </ul>
-    <!-- end comment -->
-  </body>
-</html>
-
- -

Indenting the content does increase the size of the file, so -you may prefer Tidy's default style:

- -
- <html>
- <head>
- <title>Test document</title>
- </head>
- <body>
- <p>para which has enough text to cause a line break,
- and so test the wrapping mechanism for long lines.</p>
- 
- <pre>This is
- <em>genuine
-       preformatted</em>
-    text
- </pre>
- 
- <ul>
- <li>1st list item </li>
- 
- <li>2nd list item</li>
- </ul>
- 
- <!-- end comment -->
- </body>
- </html>
- 
-
- -

How to run tidy

- -
-   tidy [[options] filename]*
-
- -

HTML tidy is not (yet) a Windows program. If you run tidy -without any arguments, it will just sit there waiting to read -markup on the stdin stream. Tidy's input and output default to -stdin and stdout respectively. Errors are written to stderr but -can be redirected to a file with the -f filename -option.

- -

I generally use the -m option to get tidy to update the -original file, and if the file is particularly bad I also use the --f option to write the errors to a file to make it easier to -review them. Tidy supports a small set of character encoding -options. The default is ASCII, which makes it easy to edit markup -in regular text editors.

- -

For instance:

- -
-   tidy -f errs.txt -m index.html
-
- -

which runs tidy on the file "index.html" updating it in place -and writing the error messages to the file "errs.txt". It's a good -idea to save your work before tidying it, as with all complex -software, tidy may have bugs. If you find any please let me -know!
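
One way to follow the advice about saving your work is to copy the file before tidy modifies it. The Python sketch below makes that copy and builds the command line shown above. The function name `backup_and_build_command` and the ".bak" suffix are arbitrary choices for illustration, and actually running the command assumes a `tidy` executable on your PATH.

```python
import shutil
from pathlib import Path


def backup_and_build_command(html_path, error_log="errs.txt"):
    """Copy html_path to html_path + '.bak' and return the tidy argument list."""
    source = Path(html_path)
    backup = source.with_name(source.name + ".bak")  # hypothetical naming scheme
    shutil.copy2(source, backup)  # copy2 also preserves the modification time
    # Same options as the example above: -f writes errors to a file, -m updates in place.
    return ["tidy", "-f", error_log, "-m", str(source)]
```

The returned list can be passed to `subprocess.run` once the backup exists.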

- -

Thanks to Jacek Niedziela, the Win32 executable for tidy is -now able to expand wild cards in filenames. This utilizes the -setargv library supplied with VC++.

- -

Tidy writes errors to stderr, so they won't be paused by the more -command. A workaround is to redirect stderr to stdout as -follows. This works on Unix and Windows NT, but not on other -platforms. My thanks to Markus Wolf for this tip!

- -
-   tidy file.html 2>&1 | more
-
- -
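
On platforms where the shell redirection is not available, the same merging of stderr into stdout can be done from a script. The following is a sketch in Python using the standard `subprocess` module. The helper name `run_with_merged_streams` is hypothetical, and the usage comment assumes a `tidy` executable on your PATH.

```python
import subprocess


def run_with_merged_streams(command):
    """Run command and return its stdout and stderr combined into one string."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # plays the role of the shell's 2>&1
        text=True,
        check=False,  # a nonzero exit status is not treated as a failure here
    )
    return completed.stdout

# Possible usage, assuming tidy is installed and file.html exists:
#   print(run_with_merged_streams(["tidy", "file.html"]))
```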

Tidy's Options

- -

To get a list of available options use:

- -
-   tidy -help
-
- -

You may want to run it through more to view the help a page at -a time.

- -
-   tidy -help | more
-
- -

Input and Output default to stdin/stdout respectively. Single -letter options apart from -f may be combined as in: tidy -f -errs.txt -imu foo.html

- -

Matej Vela <vela@debian.org> has written -a Unix man page for Tidy, but for the -latest details on config options and for the release notes please -visit this page: http://www.w3.org/People/Raggett/tidy.

- -

Using a Configuration -File

- -

Tidy now supports a configuration file, and this is now by far -the most convenient way to configure Tidy. Assuming you have -created a config file named "config.txt" (the name doesn't -matter), you can instruct Tidy to use it via the command line -option -config config.txt, e.g.

- -
-   tidy -config config.txt file1.html file2.html
-
- -

Alternatively, you can name the default config file via the -environment variable named "HTML_TIDY". Note this should be the -absolute path since you are likely to want to run Tidy in -different directories. You can also set a config file at compile -time by defining TIDY_CONFIG_FILE as the path string, see -platform.h.
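
When Tidy is launched from a script, the absolute-path advice can be applied automatically. This Python sketch builds an environment with HTML_TIDY set to the absolute form of a config path. The function name `environment_with_config` is hypothetical; only the HTML_TIDY variable name comes from the text above.

```python
import os


def environment_with_config(config_path, base_env=None):
    """Return a copy of the environment with HTML_TIDY set to an absolute path."""
    env = dict(os.environ if base_env is None else base_env)
    # abspath resolves a relative name against the current directory, so the
    # setting stays valid if tidy is later run from a different directory.
    env["HTML_TIDY"] = os.path.abspath(config_path)
    return env

# Possible usage, assuming tidy is installed:
#   subprocess.run(["tidy", "file1.html"], env=environment_with_config("config.txt"))
```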

- -

You can now set config options on the command line by -preceding the name of the option immediately (no intervening -space) by "--", for example:

- -
-  tidy --break-before-br true --show-warnings false
-
- -
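
If you generate such command lines from a program, the "--" convention is easy to apply mechanically. Here is a small Python sketch that turns a dictionary of option names and values into an argument list. The function name `build_tidy_command` is hypothetical, and writing booleans as true/false simply mirrors the example above.

```python
def build_tidy_command(options, files=()):
    """Build a tidy argument list from a dict of config options."""
    command = ["tidy"]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        # "--" is attached directly to the option name, with no intervening space.
        command += ["--" + name, str(value)]
    return command + list(files)
```

Calling it with `{"break-before-br": True, "show-warnings": False}` reproduces the arguments of the example above.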

The following options are supported:

- -
-
tidy-mark: bool
- -
If set to yes (the default) Tidy will add a meta -element to the document head to indicate that the document has -been tidied. To suppress this, set tidy-mark to no. Tidy -won't add a meta element if one is already present.
- -
markup: bool
- -
Determines whether Tidy generates a pretty printed version of -the markup. Bool values are either yes or no. -Note that Tidy won't generate a pretty printed version if it -finds unknown tags, or missing trailing quotes on attribute -values, or missing trailing '>' on tags. The default is -yes.
- -
wrap: number
- -
Sets the right margin for line wrapping. Tidy tries to wrap -lines so that they do not exceed this length. The default is 66. -Set wrap to zero if you want to disable line wrapping.
- -
wrap-attributes: bool
- -
If set to yes, attribute values may be wrapped -across lines for easier editing. The default is no. This option -can be set independently of wrap-script-literals.
- -
wrap-script-literals: bool
- -
If set to yes, this allows lines to be wrapped -within string literals that appear in script attributes. The -default is no. The example shows how Tidy wraps a really -really long script string literal inserting a backslash character -before the linebreak: - -
-<a href="somewhere.html" onmouseover="document.status = '...some \
-really, really, really, really, really, really, really, really, \
-really, really long string..';">test</a>
-
-
- -
wrap-asp: bool
- -
If set to no, this prevents lines from being wrapped -within ASP pseudo elements, which look like: -<% ... %>. The default is yes.
- -
wrap-jste: bool
- -
If set to no, this prevents lines from being wrapped -within JSTE pseudo elements, which look like: -<# ... #>. The default is yes.
- -
wrap-php: bool
- -
If set to no, this prevents lines from being wrapped -within PHP pseudo elements. The default is yes.
- -
literal-attributes: bool
- -
If set to yes, this ensures that whitespace -characters within attribute values are passed through unchanged. -The default is no.
- -
tab-size: number
- -
Sets the number of columns between successive tab stops. The -default is 4. It is used to map tabs to spaces when reading -files. Tidy never outputs files with tabs.
- -
indent: no, yes or auto
- -
If set to yes, Tidy will indent block-level tags. -The default is no. If set to auto Tidy will -decide whether or not to indent the content of tags such as -title, h1-h6, li, td, th, or p depending on whether or not the -content includes a block-level element. You are advised to avoid -setting indent to yes as this can expose layout bugs in some -browsers.
- -
indent-spaces: number
- -
Sets the number of spaces to indent content when indentation -is enabled. The default is 2 spaces.
- -
indent-attributes: bool
- -
If set to yes, each attribute will begin on a new -line. The default is no.
- -
hide-endtags: bool
- -
If set to yes, optional end-tags will be omitted -when generating the pretty printed markup. This option is ignored -if you are outputting to XML. The default is no.
- -
input-xml: bool
- -
If set to yes, Tidy will use the XML parser rather -than the error correcting HTML parser. The default is -no.
- -
output-xml: bool
- -
If set to yes, Tidy will generate the pretty -printed output, writing it as well-formed XML. Any entities not -defined in XML 1.0 will be written as numeric entities to allow -them to be parsed by an XML parser. The tags and attributes will -be in the case used in the input document, regardless of other -options. The default is no.
- -
add-xml-pi: bool
- -
add-xml-decl: bool
- -
If set to yes, Tidy will add the XML declaration -when outputting XML or XHTML. The default is no. Note -that if the input document includes an <?xml?> declaration -then it will appear in the output independent of the value of -this option.
- -
output-xhtml: bool
- -
If set to yes, Tidy will generate the pretty printed -output writing it as extensible HTML. The default is no. -This option causes Tidy to set the doctype and default namespace -as appropriate to XHTML. If a doctype or namespace is given they -will be checked for consistency with the content of the document. In -the case of an inconsistency, the corrected values will appear in -the output. For XHTML, entities can be written as named or -numeric entities according to the value of the "numeric-entities" -property. The tags and attributes will be output in the case used -in the input document, regardless of other options.
- -
doctype: omit, auto, strict, loose or -<fpi>
- -
This property controls the doctype declaration generated by -Tidy. If set to omit the output file won't contain a -doctype declaration. If set to auto (the default) Tidy -will use an educated guess based upon the contents of the -document. If set to strict, Tidy will set the doctype to -the strict DTD. If set to loose, the doctype is set to -the loose (transitional) DTD. Alternatively, you can supply a -string for the formal public identifier (fpi) for example:
- -
-
-    doctype: "-//ACME//DTD HTML 3.14159//EN"
-
-
- -
If you specify the fpi for an XHTML document, Tidy will set -the system identifier to the empty string. Tidy leaves the -document type for generic XML documents unchanged.
- -
char-encoding: raw, ascii, latin1, utf8 or -iso2022
- -
Determines how Tidy interprets character streams. For -ascii, Tidy will accept Latin-1 character values, but -will use entities for all characters whose value > 127. For -raw, Tidy will output values above 127 without -translating them into entities. For latin1 characters -above 255 will be written as entities. For utf8, Tidy -assumes that both input and output is encoded as UTF-8. You can -use iso2022 for files encoded using the ISO2022 family -of encodings e.g. ISO 2022-JP. The default is -ascii.
- -
numeric-entities: bool
- -
Causes entities other than the basic XML 1.0 named entities -to be written in the numeric rather than the named entity form. -The default is no.
- -
quote-marks: bool
- -
If set to yes, this causes " characters to be -written out as &quot; as is preferred by some editing -environments. The apostrophe character ' is written out as -&#39; since many web browsers don't yet support &apos;. -The default is no.
- -
quote-nbsp: bool
- -
If set to yes, this causes non-breaking space -characters to be written out as entities, rather than as the -Unicode character value 160 (decimal). The default is -yes.
- -
quote-ampersand: bool
- -
If set to yes, this causes unadorned & -characters to be written out as &amp;. The default is -yes.
- -
assume-xml-procins: bool
- -
If set to yes, this changes the parsing of -processing instructions to require ?> as the terminator rather -than >. The default is no. This option is -automatically set if the input is in XML.
- -
fix-backslash: bool
- -
If set to yes, this causes backslash characters "\" -in URLs to be replaced by forward slashes "/". The default is -yes.
- -
break-before-br: bool
- -
If set to yes, Tidy will output a line break before -each <br> element. The default is no.
- -
uppercase-tags: bool
- -
Causes tag names to be output in upper case. The default is -no resulting in lowercase, except for XML input where -the original case is preserved.
- -
uppercase-attributes: bool
- -
If set to yes attribute names are output in upper -case. The default is no resulting in lowercase, except -for XML where the original case is preserved.
- -
word-2000: -bool
- -
If set to yes, Tidy will go to great pains to strip -out all the surplus stuff Microsoft Word 2000 inserts when you -save Word documents as "Web pages". The default is no. -Note that Tidy doesn't yet know what to do with VML markup from -Word, but in future I hope to be able to map VML to SVG.
-
- Microsoft has developed its own optional filter for exporting to -HTML, and the 2.0 version is much improved. You can download the -filter free from the -Microsoft Office Update site.
- -
clean: bool
- -
If set to yes, causes Tidy to strip out surplus -presentational tags and attributes replacing them by style rules -and structural markup as appropriate. It works well on the html -saved from Microsoft Office'97. The default is no.
- -
logical-emphasis: bool
- -
If set to yes, causes Tidy to replace any occurrence -of i by em and any occurrence of b by strong. In both cases, the -attributes are preserved unchanged. The default is no. -This option can now be set independently of the clean and -drop-font-tags options.
- -
drop-empty-paras: bool
- -
If set to yes, empty paragraphs will be discarded. -If set to no, empty paragraphs are replaced by a pair of -br elements as HTML4 precludes empty paragraphs. The -default is yes.
- -
drop-font-tags: bool
- -
If set to yes together with the clean option (see -above), Tidy will discard font and center tags rather than -creating the corresponding style rules. The default is -no.
- -
enclose-text: bool
- -
If set to yes, this causes Tidy to enclose any text -it finds in the body element within a p element. This is useful -when you want to take an existing html file and use it with a -style sheet. Any text at the body level will screw up the -margins, but wrap the text within a p element and all is well! -The default is no.
- -
enclose-block-text: bool
- -
If set to yes, this causes Tidy to insert a p -element to enclose any text it finds in any element that allows -mixed content for HTML transitional but not HTML strict. The -default is no.
- -
fix-bad-comments: bool
- -
If set to yes, this causes Tidy to replace -unexpected hyphens with "=" characters when it comes across -adjacent hyphens. The default is yes. This option is -provided for users of Cold Fusion which uses the comment syntax: -<!--- --->
- -
add-xml-space: bool
- -
If set to yes, this causes Tidy to add -xml:space="preserve" to elements such as pre, style and script -when generating XML. This is needed if the whitespace in such -elements is to be parsed appropriately without having access to -the DTD. The default is no.
- -
alt-text: string
- -
This allows you to set the default alt text for img -elements. This feature is dangerous as it suppresses further -accessibility warnings. YOU ARE RESPONSIBLE FOR MAKING YOUR -DOCUMENTS ACCESSIBLE TO PEOPLE WHO CAN'T SEE THE -IMAGES!!!
- -
write-back: bool
- -
If set to yes, Tidy will write back the tidied -markup to the same file it read from. The default is no. -You are advised to keep copies of important files before tidying -them as on rare occasions the result may not be what you -expect.
- -
keep-time: bool
- -
If set to yes, Tidy won't alter the last modified -time for files it writes back to. The default is yes. -This allows you to tidy files without affecting which ones will -be uploaded to the Web server when using a tool such as -'SiteCopy'. Note that this feature may not work on some -platforms.
- -
error-file: filename
- -
Writes errors and warnings to the named file rather than to -stderr.
- -
show-warnings: bool
- -
If set to no, warnings are suppressed. This can be -useful when a few errors are hidden in a flurry of warnings. The -default is yes.
- -
quiet: bool
- -
If set to yes, Tidy won't output the welcome message -or the summary of the numbers of errors and warnings. The default -is no.
- -
gnu-emacs: bool
- -
If set to yes, Tidy changes the format for reporting -errors and warnings to a format that is more easily parsed by GNU -Emacs. The default is no.
- -
split: bool
- -
If set to yes Tidy will use the input file to create -a sequence of slides, splitting the markup prior to each -successive <h2>. You can see an example of the results in a -recent -talk I made on XHTML. The slides are written to -"slide1.html", "slide2.html" etc. The default is -no.
- -
new-empty-tags: tag1, tag2, tag3
- -
Use this to declare new empty inline tags. The option takes a -space or comma separated list of tag names. Unless you declare -new tags, Tidy will refuse to generate a tidied file if the input -includes previously unknown tags. Remember to also declare empty -tags as either inline or blocklevel, see below.
- -
new-inline-tags: tag1, tag2, tag3
- -
Use this to declare new non-empty inline tags. The option -takes a space or comma separated list of tag names. Unless you -declare new tags, Tidy will refuse to generate a tidied file if -the input includes previously unknown tags.
- -
new-blocklevel-tags: tag1, tag2, tag3
- -
Use this to declare new block-level tags. The option takes a -space or comma separated list of tag names. Unless you declare -new tags, Tidy will refuse to generate a tidied file if the input -includes previously unknown tags. Note you can't change the -content model for elements such as table, ul, ol and dl. This is -explained in more detail in the release notes.
- -
new-pre-tags: tag1, tag2, tag3
- -
Use this to declare new tags that are to be processed in -exactly the same way as HTML's pre element. The option takes a -space or comma separated list of tag names. Unless you declare -new tags, Tidy will refuse to generate a tidied file if the input -includes previously unknown tags. Note you can't as yet add new -CDATA elements (similar to script).
-
- -
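
As a rough illustration of what the logical-emphasis option in the list above does, the Python sketch below renames i to em and b to strong while leaving attributes untouched. It is a regular-expression approximation and not Tidy's implementation, which works on a parse tree. The function name `logical_emphasis` is hypothetical, and attribute values containing ">" would confuse it.

```python
import re

# Match <i ...>, </i>, <b ...> and </b>, but not longer names such as <br> or <body>.
_TAG = re.compile(r"<(/?)(i|b)(?=[\s>])([^>]*)>", re.IGNORECASE)
_REPLACEMENT = {"i": "em", "b": "strong"}


def logical_emphasis(markup):
    """Replace i with em and b with strong, preserving any attributes."""
    def swap(match):
        slash, name, rest = match.groups()
        return "<" + slash + _REPLACEMENT[name.lower()] + rest + ">"
    return _TAG.sub(swap, markup)
```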

Sample Config File

- -

This is just an example to get you started.

- -
-// sample config file for HTML tidy
-indent: auto
-indent-spaces: 2
-wrap: 72
-markup: yes
-output-xml: no
-input-xml: no
-show-warnings: yes
-numeric-entities: yes
-quote-marks: yes
-quote-nbsp: yes
-quote-ampersand: no
-break-before-br: no
-uppercase-tags: no
-uppercase-attributes: no
-char-encoding: latin1
-new-inline-tags: cfif, cfelse, math, mroot, 
-  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
-  munder, mover, mmultiscripts, msup, msub, mtext,
-  mprescripts, mtable, mtr, mtd, mth
-new-blocklevel-tags: cfoutput, cfquery
-new-empty-tags: cfelse
-
- -
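
If you want to inspect such a file from a script, a few lines of Python are enough for the simple cases. The sketch below is based only on what the sample above shows: lines starting with "//" are treated as comments, each option is a "name: value" pair, and indented lines continue the previous value. Those rules are assumptions drawn from the sample rather than a statement of Tidy's actual config grammar, and the function name `parse_config` is hypothetical.

```python
def parse_config(text):
    """Parse 'name: value' lines into a dict, joining indented continuation lines."""
    options = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("//"):
            continue  # skip blank lines and // comments
        if raw[0].isspace() and current is not None:
            # An indented line extends the value of the preceding option.
            options[current] += " " + raw.strip()
            continue
        name, _, value = raw.partition(":")
        current = name.strip()
        options[current] = value.strip()
    return options
```

Values are returned as strings, so a caller would still need to interpret yes/no and numbers.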

Using Tidy from -scripts

- -

If you want to run Tidy from Perl or another scripting -language you may find it of value to inspect the result returned -by Tidy when it exits: 0 if everything is fine, 1 if there were -warnings and 2 if there were errors. This is an example using -Perl:

- -
-if (close(TIDY) == 0) {
-  my $exitcode = $? >> 8;
-  if ($exitcode == 1) {
-    printf STDERR "tidy issued warning messages\n";
-  } elsif ($exitcode == 2) {
-    printf STDERR "tidy issued error messages\n";
-  } else {
-    die "tidy exited with code: $exitcode\n";
-  }
-} else {
-  printf STDERR "tidy detected no errors\n";
-}
-
- -
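
The same exit-code check can be written in other scripting languages. Here is a sketch in Python that mirrors the Perl example. The function names `describe_exit_code` and `run_tidy` are hypothetical, and `run_tidy` assumes a `tidy` executable on your PATH.

```python
import subprocess


def describe_exit_code(code):
    """Translate tidy's exit status into the messages used in the Perl example."""
    if code == 0:
        return "tidy detected no errors"
    if code == 1:
        return "tidy issued warning messages"
    if code == 2:
        return "tidy issued error messages"
    return "tidy exited with code: %d" % code


def run_tidy(path):
    """Run tidy on one file and return (exit code, stdout, stderr)."""
    completed = subprocess.run(
        ["tidy", path], capture_output=True, text=True, check=False
    )
    return completed.returncode, completed.stdout, completed.stderr
```

A caller would typically pass the first element of the `run_tidy` result to `describe_exit_code` and log the returned message.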

Downloadable -Binaries

- -

If you are prepared to maintain a public URL for -HTML Tidy compiled for a specific platform, please let me know so -that I can add a link to your page. This will avoid the need for -me to update this page whenever you recompile.

- -
-

Windows 95/98/NT/2000

- -

tidy.exe. -Windows 95/98/NT/2000 executable (32-bit Windows console-mode -program). This is the executable that I maintain as part of the -HTML Tidy distribution. The command line parameters are described -above, along with the extensive configuration file options.

- -

HTML-Kit - a -free HTML editor for Windows 95/98/NT/2000 with integrated -support for Tidy.

- -

TidyGUI. -Windows front end for running Tidy, written by André -Blavier. André has also written a Windows COM -wrapper for Tidy. He describes how to use this from -Visual Basic.

- -

Evrsoft's 1st Page -2000 - a free HTML editor for Windows 95/98/NT/2000 with -integrated support for Tidy. 1st Page 2000 is a high-end -authoring tool that makes it easy to add effects based upon -scripting.

- -

NoteTab - an -award winning text and html editor for Windows with built-in -support for running HTML Tidy. NoteTab is written by Eric -Fookes.

- -

Mac OS

- -Several versions of HTML -Tidy for Mac OS are available, including a standalone -Macintosh application with a graphical user interface, a BBEdit -plugin, an MPW tool, and a FilterTop filter ( -Screenshot). My thanks to Terry Teague for this -port.
-
- - -

Atari

- -

Arnaud Bercegeay's site for the Atari binary for Tidy.

- -

Amiga

- -

Keith Blakemore-Noble maintains a page for Tidy -on Amiga.

- -

BeOS

- -

Peter Enzerink is maintaining HTML -Tidy for BeOS. Link points to download for HTML Tidy as well -as HTML Tidy editor addons for BeOS.

- -

AIX

- -

Ciaran Deignan maintains an AIX -binary for Tidy. The link is to a general download page. The -executable is available for AIX 4.3.2 and later.

- -

Linux

- -

Dimitri Papadopoulos maintains a Tidy RPM package -for Redhat Linux. You may also be able to find Tidy on other -Linux distribution sites, e.g. http://rpmfind.net/.

- - -

UnixWare

- -

Simon Trimmer <simon@ocston.org> maintains -a Tidy binary for -Unixware.

- -

HP-UX

- -

You can get precompiled versions of Tidy for HPUX, from -Olaf Hopp, and from Ian -Springer.

- -

MSDOS

- -

Nick B. maintains Tidy386 for -DOS. This exploits the DPMI mechanism for the memory -management.

- -

Solaris

- -

Stephen Fuqua maintains a page for Tidy on -Solaris.

- -

OS/2

- -

Kaz SHiMZ <kshimz@sfc.co.jp> maintains -an OS/2 -binary for Tidy.

- -

FreeBSD

- -

Martin Fouts maintains Tidy on -FreeBSD.

- -

RISC OS

- -

Alex Macfarlane -Smith maintains a port -of Tidy to the RISC OS.

- -

MiNT (Atari) OS

- -

Edgar Aichinger -maintains a -port of Tidy to the MiNT OS. MiNT is a UNIX for m68k Atari -computers and is nearly FHS compliant (we don't use bootable OS -images nor have any mounting capabilities, so neither /boot nor -/mnt are used). The binary also runs on ordinary TOS, since the -MiNT libraries cover all GEMDOS/GEM functions.

-
- -

Integrating Tidy as part of -other Software

- -

You can also incorporate Tidy as part of a larger program, for -instance in HTML editors or HTML transformation tools used for -import filters, or for when you want to customize Web content to -get the best out of different kinds of browsers. Imagine -authoring clean HTML with CSS and at a touch of a button -producing variants that look great and work reliably on a large -variety of different browsers, taking into account the quirks of -each. For instance, providing the ability to tune content for -different versions of Netscape and Internet Explorer, and for -browsers running on set-top boxes for televisions, handheld and -palmtop devices, cell phones, and voice browsers. I am happy to -quote for software development for such tools.

- -

Sebastian Lange has contributed a perl wrapper for calling -Tidy from your perl scripts, see sl-tidy.pl.

- -

Using Tidy from emacs

- -

Pete Gelbman emailed this tip for using Tidy with the Unix version of emacs. It lets you highlight a region of text and run Tidy on it. Tidy's "fixed" output will replace your highlighted region right in place. The error/warnings output will be directed into a separate mini-buffer below in your main screen.

- -

Java port of HTML Tidy

- -

Andy Quick <ac.quick@sympatico.ca> -maintains a Java port of Tidy, so you can now integrate Tidy into -your Java applications. Andy is tracking the releases of Tidy in -C (this page). More information is available on Andy's home -page.

- -

Source -Code

- -

The code is in ANSI C and uses the C standard library for i/o. The parser works top down, building a complete parse tree in memory. Document text is held as Unicode represented as UTF-8 in a character buffer that expands as needed. The code has so far been tested on Windows'95, Windows'98, Windows NT, Windows 2000, Linux, FreeBSD, NetBSD, Ultrix, OSF, OS/MP, IRIX, NeXtStep, MacOS, BeOS, OS/2, AIX, Amiga, Atari, SunOS, Solaris and HP-UX, amongst others.

- -

Here is a link to the Open Source copyright -notice and license.

- -
-
tidy4aug00.tgz
- -
gzipped tar file for source code (Unix line ends)
- -
tidy4aug00.zip
- -
zipped source code (Windows line ends)
- -
platform.h, html.h
- -
the include files with common definitions
- -
config.c
- -
support for customizing Tidy via config files
- -
lexer.c
- -
lexical analysis and buffer management
- -
parser.c
- -
HTML and XML parsers
- -
tags.c
- -
dictionary of tags and their properties
- -
attrs.c
- -
dictionary of attributes and their properties
- -
istack.c
- -
stack of active inline elements
- -
entities.c
- -
dictionary of entities
- -
clean.c
- -
smarts for cleaning up presentational markup
- -
pprint.c
- -
pretty printing for HTML and XML
- -
localize.c
- -
Change this file to localize tidy's messages
- -
tidy.c
- -
main() and error reporting routines
- -
Makefile
- -
Makefile for gcc
- -
Unix Man page
- -
Maintained by Matej Vela <vela@debian.org>
-
- -

Conventions for whether lines end with CRLF, LF or CR vary -from one system to another. I have included the C source for a -utility tab2space which can be used to ensure that files -use the line end convention of your choice, and to expand tabs to -spaces.

- -
-   tab2space -t4 -unix *.h *.c
-   tab2space -tabs -unix Makefile
-
- -

Note use of "-tabs" to ensure that tabs are preserved in the -Makefile (it won't work without them!).

- -

For those of you on Unix, here is a script you can use to -strip carriage returns:

- -
-#!/bin/sh
-echo Stripping Carriage Returns from files...
-for i
-do
-        # Only process regular files
-        if [ -f "$i" ]
-        then
-                # Only process files we are allowed to overwrite
-                if [ -w "$i" ]
-                then
-                        echo "$i"
-                        # strip CRs from input and output to temp file
-                        tr -d '\015' < "$i" > toix.tmp
-                        mv toix.tmp "$i"
-                else
-                        echo "$i: write-protected"
-                fi
-        else
-                echo "$i: not a file"
-        fi
-done
-
- -

Save this script to a file, e.g. "stripcr", and use "chmod +x stripcr" to make it executable. You can then run it as "stripcr *.c *.h Overview.html Makefile".
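To check that the carriage returns really are gone, something like the following sketch may help. It assumes a POSIX shell; the file names sample.txt and sample.tmp are hypothetical, and the tr command is the same one the script above uses.

```shell
#!/bin/sh
# Create a small file with CRLF (DOS-style) line ends; sample.txt is a hypothetical name.
printf 'first line\r\nsecond line\r\n' > sample.txt

# Count carriage returns before stripping (keep only CR characters, then count bytes).
before=$(tr -cd '\015' < sample.txt | wc -c | tr -d ' ')

# Same technique as stripcr: delete every CR into a temp file, then replace the original.
tr -d '\015' < sample.txt > sample.tmp && mv sample.tmp sample.txt

# Count carriage returns after stripping.
after=$(tr -cd '\015' < sample.txt | wc -c | tr -d ' ')

echo "CRs before: $before, after: $after"
```

The count is taken with tr and wc rather than by eye, since most editors hide carriage returns.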

- -

Acknowledgements

- -

I would like to thank the many people who have written to me -with suggestions for improvements or reporting bugs. Your help -has been invaluable.

- -
Jonathan Adair, Drew Adams, Osma -Ahvenlampi, Carsten Allefeld, Richard Allsebrook, Jacob Sparre -Andersen, Joe D'Andrea, Jerry Andrews, Bruce Aron, Takuya Asada, -Edward Avis, Carlos Piqueres Ayela, Nick B, Chang Hyun Baek, Nick -B, Denis Barbier, Chuck Baslock, Christer Bernerus, David J. -Biesack, John Bigby, Yu Jian Bin, Alexander Biron, Keith -Blakemore-Noble, Eric Blossom, Berend de Boer, Ochen M. Braun, -Dave Bryan, David Brooke, Andy Brown, Keith B. Brown, Andreas -Buchholz, Maurice Buxton, Jelks Cabaniss, John Cappelletti, -Trevor Carden, Terry Cassidy, Mathew Cepl, Kendall Clark, Rob -Clark, Jeremy Clulow, Dan Connolly, Larry Cousin, Ken Cox, Luis -M. Cruz, John Cumming, Ian Davey, Keith Davies, Ciaran Deignan, -David Duffy, Emma Duke-Williams, Tamminen Eero, Bodo Eing, Peter -Enzerink, Baruch Even, David Fallon, Claus André -Färber, Stephanie Foott, Darren Forcier, Martin Fouts, -Frederik Fouvry, Rene Fritz, Stephen Fuqua, Martin Gallwey, Pete -Gelbman, Francisco Guardiola, David Getchell, Michael Giroux, -Davor Golek, Guus Goos, Léa Gris, Rainer Gutsche, Kai -Hackemesser, Juha Häikiö, David Halliday, -Johann-Christian Hanke, Vlad Harchev, Shane Harrelson, Andre -Hinrichs, Bjoern Hoehrmann, G. Ken Holman, Bill Homer, Olaf Hopp, -Craig Horman, Jack Horsfield, Nigel Horspool, Pao-Hsi Huang, -Stuart Hungerford, Marc Jauvin, Rick Jelliffe, Peter Jeremy, -Craig Johnson, Charles LaFountain, Steven Lobo, Zdenek Kabelac, -Michael Kay, Jeffery Kendall, Axel Kielhorn, Konstantinos -Kleisouris, Johannes Koch, Daniel Kohn, Rudy Kohut, Allan -Kuchinsky, Volker Kuhlmann, Michael LaStella, Johnny Lee, Steve -Lee, Tony Leneis, Nick Leverton, Todd Lewis, Dietmar Lippold, -Gert-Jan C. 
Lokhorst, Murray Longmore, John Love-Jensen, -Satwinder Mangat, Carole Mah, Anton Marsden, Bede McCall, Shane -McCarron, Thomas McGuigan, Ian McKellar, Al Medeiros, Chris -Nappin, Ann Navarro, Jacek Niedziela, Morten Blinksbjerg Nielsen, -Kenichi Numata, Allan Odgaard, Matt Oshry, Gerald Oskoboiny, Paul -Ossenbruggen, Ernst Paalvast, Christian Pantel, Dimitri -Papadopoulos, Rick Parsons, Steven Pemberton, Daniel Persson, Lee -Anne Phillips, Xavier Plantefeve, Karl Prinz, Andy Quick, Jany -Quintard, Julian Reschke, Stephen Reynolds, Thomas Ribbrock, Ross -L. Richardson, Philip Riebold, Erik Rossen, Dan Rudman, Peter -Ruevski, Christian Ruetgers, Klaus Johannes Rusch, John Russell, -Eric Schindler, J. Schlauch, Christian Schüler, Klaus -Alexander Seistrup, Jim Seymour, Kazuyoshi Shimizu, Geoff -Sinclair, Jo Smith, Paul Smith, Steve Spilker, Rafi Stern, -Jacques Steyn, Michael J. Suzio, Zac Thompson, Eric Thorbjornsen, -Oren Tirosh, John Tobler, Omri Traub, Loïc Trégan, -Jason Tribbeck, Simon Trimmer, Steffen Ullrich, Stuart Updegrave, -Charles A. Upsdell, Jussi Vestman, Larry W. Virden, Daniel -Vogelheim, Nigel Wadsworth, Jez Wain, Randy Waki, Paul Ward, Neil -Weber, Bertilo Wennergren, Yudong Yang, Jeff Young, Edward Zalta, -Johannes Zellner, Christian Zuckschwerdt
- -

Dave's Address

- -
-    73b Ground Corner
-    Holt
-    Wiltshire
-    BA14 6RT
-    United Kingdom
-
- -

Dave Raggett <dsr@w3.org> is an engineer from Hewlett Packard's UK Laboratories, and works on assignment to the World Wide Web Consortium, where he is the W3C lead for HTML, XForms, Voice Browsers and Math.

- - - diff --git a/htmldoc/checked_by_tidy.gif b/htmldoc/checked_by_tidy.gif deleted file mode 100644 index 47c2c48..0000000 Binary files a/htmldoc/checked_by_tidy.gif and /dev/null differ diff --git a/htmldoc/faq.html b/htmldoc/faq.html deleted file mode 100644 index fade8ea..0000000 --- a/htmldoc/faq.html +++ /dev/null @@ -1,300 +0,0 @@ - - - - - -HTML Tidy - Frequently Asked Questions - - - -

HTML Tidy - Frequently Asked Questions

- -

Overview

- -

Certain questions about Tidy come up on a -regular basis. These are some that have been culled from postings -to the html-tidy@w3.org and tidy-develop@lists.sourceforge.net -mailing lists. If you don't see your question addressed here, see -How To Get Support below.

- - - -
-
-
What Now?
- -

If you have a popup screen that reads as follows: -

-HTML Tidy for Windows <vers 1st August 2002; built on Aug 8 2002, at 15:41:13>
-Parsing Console input <stdin>
-
- -

and do not know what to do next, read on.

- -

Tidy is waiting for your HTML to come in, so it can parse it. Tidy is fundamentally a tool that reads in HTML, cleans it up and writes it out again. It was developed as a program you run from the console prompt, but there are GUI encapsulations available, e.g. HTML-Kit, which you might prefer.

- -

If you are using Windows, the first step is to unzip the zip file -and place the tidy.exe file in a folder somewhere on your executables -path. You may also want to set up a config file to save having to type -lots of options each time you run Tidy. From the console prompt you can -run Tidy like this:

- -
-C> tidy -m mywebpage.html
-
- -

In this case, the -m option requests Tidy to write -the tidied file back to the same filename as it read from -(mywebpage.html). Tidy will give you a breakdown of the problems it -found and the version of HTML the file appears to be using.

- -

To get a listing of Tidy command line options, just type tidy -?. To see a listing of configuration options, try tidy -help-config. To get more info on the config options, see the Quick Reference.
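As a rough sketch of the config-file approach mentioned above: the file name mytidy.cfg and the option values are illustrative assumptions, and the option names are ones that appear elsewhere in these notes. Tidy is only invoked if it is installed and the page exists.

```shell
#!/bin/sh
# Write an illustrative config file; mytidy.cfg and the chosen values are assumptions.
cat > mytidy.cfg <<'EOF'
indent: auto
wrap: 78
tidy-mark: no
EOF

# Run Tidy with the config file only if tidy is installed and the page exists.
if command -v tidy >/dev/null 2>&1 && [ -f mywebpage.html ]
then
    tidy -config mytidy.cfg -m mywebpage.html
else
    echo "tidy or mywebpage.html not found; config written to mytidy.cfg"
fi
```

Keeping the options in a file means the command line stays short, and the same settings can be reused for every page.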

- -

See also Dave Raggett's User Guide.

- -

If you're not comfortable with the DOS command line, you should -try one of the GUI -Applications.

-
- -
How To Get Support
- -
-

For general HTML Tidy support, the original mailing list -html-tidy@w3.org is best. Sometimes developers are the last to -know... Also, this list covers both Java and C versions, not to -mention various value-added products such as GUI front ends, Perl -and Python integration, etc. If you don't get a response after a -couple tries or if you have a bug fix, bump it over to the -developer list at tidy-develop@lists.sourceforge.net. It's not a -hard line, but that is the general arrangement.

-
- -
How to Submit A Bug Report
- -
-

You are encouraged to report bugs you found to the Tidy -developer team. Tidy's quality depends on your feedback. You can -either file your bug report in the Sourceforge -bug tracker for HTML Tidy (recommended) or send a mail -to the mailing list at html-tidy@w3.org. Note you do not -have to have a Sourceforge account in order to file bug reports, or -be subscribed to html-tidy@w3.org in order to post messages to the -list.

- -

Prior to submitting a bug report, please check that the bug is not already known. Many are. If you are not sure, just ask. If it is a new bug, make sure to include at least the following information in your report:

- - - -

This information is necessary to reproduce whatever is failing; without it we cannot help you. Additional information and patches are very welcome!

- -

Please include only one bug per report. Reports with -multiple bugs are less easy to track and some bugs may get -missed.

-
- -
How to Submit A Feature -Request
- -
-

If you want Tidy to do something new that it doesn't do today -(or stop doing something), then it is probably a feature -request.

- -

The process for submitting a feature request is very similar to that for bug reports. A different tracker is used on SourceForge to denote the difference in subject matter.

- -

As with bugs, please be sure that the feature has not already been requested. If the feature has already been requested, you can add your comments to the feature request tracker, or send mail to the mailing list indicating your wish to also have the feature implemented. If the feature has not already been requested, send the same information as for a bug report, but place special emphasis on the desired output for a given input, desired options, etc. Please be as specific as possible about what you want Tidy to do.

-
- -
How Do I Control the Output Layout?
- -
-

There are three primary options that control how Tidy -formats your markup:

- - -

Briefly, indent sets the level of left-to-right indenting and, somewhat, how often elements are put onto a new line. The options are yes, no, and auto. indent-attributes is a flag that, when set, tells Tidy to put each attribute on a new line. vertical-space is a flag that, when set, tells Tidy to add some empty lines for readability. The default for all three is no. These options may be used in any combination to control how you want your markup to look. The best thing is to experiment a bit to see what you like. Be aware that indent yes is deprecated for production use as it will cause visual changes in most browsers.

- -

To get Tidy Classic --indent auto layout, use the following options:

- -
-indent: auto
-indent-attributes: no
-vertical-space: yes
-
- -

You can read about more Pretty Print options -here.

-
- -
What Version of Tidy Should -I Use?
- -
-

The current Source Forge builds are recommended. You can find these at -http://tidy.sourceforge.net. -People continue to report examples where Tidy does not catch some -ill-formed HTML or, worse, generates ill-formed HTML. These cases have -been significantly reduced. That said, be sure to test Tidy with some -representative files from your environment.

- -

For development work, pull the Tidy sources from CVS directly onto your development system. This way you can keep abreast of changes to Tidy and quickly resolve conflicts.

- -

For building a front end (e.g. GUI or language binding), the -simplest approach is to use TidyLib. For more information -about building and coding with TidyLib, see the Introduction To TidyLib.

-
- -
How Do I Run A -Regression Test?
-
-

You might ask, "Why should I run a regression test?". If you -are a Tidy user, you might want to compare a new version of Tidy -to the version you are currently running. This is a good idea -if you are using Tidy in production applications such as web -publishing. If you are a Tidy developer, it is a good idea to -run the regression test suite to make sure your fix or enhancement -doesn't add new bugs.

- -

Detecting new bugs is easier said than done, because sometimes -they are subtle and can only be seen in browsers (or one particular -browser you don't even have). But you can catch most crashes and -many layout problems by running the test suite as described here.

- -

The basic process is simple: run the test suite before -and after making changes to TidyLib and compare the output -markup and messages. Be aware that the test scripts for WinNT/2K/XP -(alltest.cmd) and Linux/Unix (testall.sh) place the output files in -tidy/test/tmp. If you forget to run the before -test, you can always download a binary from the Project Page. If you -are not a TidyLib developer, you can download the Test Suite -directly. Here are the steps to evaluate the impact of a TidyLib change.

- -

For Windows

-

Before making changes:

-
-C:\tidy\test> alltest.cmd
-C:\tidy\test> ren tmp baseline
-
- -

After making changes and building Tidy:

-
-C:\tidy\test> alltest.cmd
-C:\tidy\test> windiff tmp baseline
-
- -

For Linux/Unix

-

Before making changes:

-
-~/tidy/test$ ./testall.sh
-~/tidy/test$ mv tmp baseline
-
- -

After making changes and building Tidy:

-
-~/tidy/test$ ./testall.sh
-~/tidy/test$ diff -u tmp baseline > diff.txt
-
-
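The before/after comparison can be wrapped in a small helper. This is only a sketch: compare_runs is a hypothetical function, not part of the test suite, and the demo directories stand in for the baseline and tmp directories described above.

```shell
#!/bin/sh
# compare_runs is a hypothetical helper: it diffs two output directories
# recursively and reports whether the new run matches the baseline.
# Note that any non-zero diff status (including errors) is reported as a difference.
compare_runs() {
    baseline_dir=$1
    new_dir=$2
    if diff -r -u "$baseline_dir" "$new_dir" > diff.txt
    then
        echo "no differences"
    else
        echo "differences found, see diff.txt"
    fi
}

# Demonstration with stand-in directories instead of real test suite output.
mkdir -p demo_baseline demo_tmp
echo '<p>hello</p>' > demo_baseline/case1.html
echo '<p>hello</p>' > demo_tmp/case1.html
same=$(compare_runs demo_baseline demo_tmp)

# Simulate a change in Tidy's output for one test case.
echo '<p>hello world</p>' > demo_tmp/case1.html
changed=$(compare_runs demo_baseline demo_tmp)

echo "$same"
echo "$changed"
```

Relying on the exit status of diff makes the result easy to use from other scripts, while diff.txt still holds the details for reading by hand.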
- - - -
- - diff --git a/htmldoc/pending.html b/htmldoc/pending.html deleted file mode 100644 index 70c530a..0000000 --- a/htmldoc/pending.html +++ /dev/null @@ -1,554 +0,0 @@ - - - - -HTML TIDY - Notes on pending work - - - - - - - -

HTML TIDY - Notes on Pending Work

- -

Dave Raggett dsr@w3.org

- -

This is a page where I am keeping the suggestions for -improvements or bug fixes. My current work load means that I -don't get much time to work on HTML Tidy, so I am interested in -offers of help!

- -

Public Email List for Tidy: <html-tidy@w3.org>

- -

I have set up an archived mailing list devoted to Tidy. To -subscribe send an email to html-tidy-request@w3.org with the word -subscribe in the subject line (include the word unsubscribe if -you want to unsubscribe). The archive -for this list is accessible online. Please use this list to -report errors or enhancement requests.

- -

Things awaiting further attention

- - - -

I need to set up an index of precisely what attributes are supported on each element. Right now, some elements check their own attributes, whilst others are checked via default checks defined for each attribute independently of the element. Until this is done, you may sometimes find validation services discovering errors unnoticed by Tidy itself.

- -

Jelks Cabaniss asks: Could Tidy be made to automatically -"clean" (FONTs to CSS) if the Strict DOCTYPE is requested? An -HTML or XHTML Strict document can't have FONT tags according to -the DTDs. Jelks has a bunch of other good ideas such as -converting the bgcolor attribute over to CSS.

- -

Adding an option to select slide transition effects. I would -also like to provide an optional feature for sorting attribute -values.

- -

I am having problems with form elements as direct children of -tr or table. It is dangerous to create an implicit table cell, -and what is needed is a way to move the form element into the -next cell. If this can't be done an error needs to be raised -since Tidy will be stuck. On a separate note, Tidy is still -breaking lines between <img> and </a> which in -Netscape shows as an underlined space. It's fine in IE.

- -

Benjamin Holzman <bah@orientation.com> writes: I'm -wrapping tidy (release-date 2000.01.13) in some perl objects -(using SWIG), and CharEncoding being a global is a bit of a pain. -I was wondering what your thoughts would be on how to fix that. -The character encoding is already a property of struct Out; is -there any reason why making it part of struct StreamIn as well, -and perhaps setting that property in OpenInput, based on the -existing CharEncoding variable, wouldn't allow us to move -CharEncoding to be local to main?

- -

Oh, in case you're curious about the API, here's a short -script using my wrappers to be an html to xhtml filter:

- -
-      #!/usr/bin/perl
-
-      require tidy;
-
-      my $tidy     = Tidy->new(*STDIN);
-      my $document = $tidy->parse;
-      $tidy->as_xhtml(*STDOUT);
-
- -

Rick Parsons would like there to be a new wrap-attributes -option that can be used to suppress line wrapping within -attributes. There is already a similar option for JavaScript -literals.

- -

Vijay Patil would like tidy -h to display options sorted -alphabetically.

- -

Julian Reschke would like there to be an option to add the -xml:space="preserve" attribute to pre elements when outputting -xml.

- -

Armando Asantos would like to use Tidy to produce a list of -URLs for images or hypertext links according to a config option. -This would be straightforward, but is a lower priority than bug -fixes etc.

- -

Omri Traub would like an option to wrap the contents of style -and script elements in CDATA marked sections when converting to -XHTML. He is also interested in direct support for 16 bit -character file I/O.

- -

Bertilo Wennergren notes:

- -
If I configure Tidy to "upgrade to style sheets", it -does so for a few things in my main document, but the code thus -created get error reports if I feed it back to Tidy. It turns out -that Tidy creates extra "class" attributes on tags that already -have "class" attributes set. This happens with this page: -<http://www.concinnity.se/bertilow/index.htm>.
- -

Randy Waki notes:

- -
-

If a quoted URL attribute value (e.g., href in <a> -elements) contains a line break, 13-Jan-2000 Tidy changes the -line break to a space while IE and Netscape discard the line -break. This can result in a broken link in the tidied -document.

- -

I believe the following change fixes the problem. In lexer.c, -insert the following lines before line 2502:

- -
-                            /* discard line breaks in quoted URLs */
-                            if (c == '\n' && IsUrl(name))
-                                continue;
-
-/* existing line 2502 */    c = ' ';
-
-
- -

Stephen Reynolds would like Tidy to keep track of whether a -comment started on a new line and preserve this in the -output.

- -

Terry Teague says:

- -
-

Sorry, I should have been more clear. Part of the problem is -the current HelpText() function in localize.c doesn't actually -reflect current reality.

- -

You need to at least add the following line to HelpText() -:

- -
-    tidy_out(out, "  -version or -v  show version\n");
-
- -

And I suppose it should mention the use of the new -"--<config options>" type syntax.

- -

Regards, Terry

-
- -

John Russell notes:

- -
- what i wonder is
-1] does the specification indicate these are WRONG
-2] if so why do they pass thru tidy ....
-is url syntax such a can of worms that it is left to user
-   to check .......
-
-CASE 1: misuse of slash for folders
-site had  background="pics\fancy.jpg"
-  instead of   "pics/fancy.jpg"
-
-CASE 2: spaces in filename
-site had href="coin album.html"
-instead of "coin%20album.html"
-
- -
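Both cases can be repaired mechanically outside Tidy. The following is a minimal sketch, assuming a POSIX shell with sed; fix_url is a hypothetical helper and does not reflect anything Tidy itself does.

```shell
#!/bin/sh
# fix_url is a hypothetical helper, not part of Tidy: it turns backslashes
# into forward slashes and encodes spaces as %20.
fix_url() {
    printf '%s\n' "$1" | sed -e 's|\\|/|g' -e 's| |%20|g'
}

# CASE 1: backslash used as a folder separator.
case1=$(fix_url 'pics\fancy.jpg')

# CASE 2: space in a file name.
case2=$(fix_url 'coin album.html')

echo "$case1"
echo "$case2"
```

This handles only the two cases listed above; other characters that need escaping in URLs are left untouched.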

Andre Stechert would like a way to prevent Tidy from -"cleaning" newly declared elements which don't have any content -but do have end tags, see his mail of 17th January 2000

- -

Todd Clark would like to use Tidy with Microsoft's WebClass -tags. Unfortunately these include unusual characters in the tag -names such as @ which Tidy objects to, for instance:

- -
-<WC@DOMAINNAME>test.com</WC@DOMAINNAME>
-
- -

Perhaps it makes sense to offer an option to make Tidy less -picky about what characters it accepts in tag names. Or perhaps -"WebClass: yes".

- -

Jelks Cabaniss suggests an option to control dropping of empty -elements, e.g. according to what attributes they have.

- -

Paavo Hartikainen writes:

- -
-

Tidy always expands '&' to '&amp;' even if I have 'quote-ampersand: no' defined in configuration file. This is not a good thing to do for URLs that have '&' characters in them. OS is Debian GNU/Linux 2.1 SPARC. Same thing happens on Alpha. Other architectures I have not tried.

- -

My configuration looks like this:

- -
-char-encoding: latin1
-error-file: ./errors
-indent-spaces: 2
-logical-emphasis: yes
-output-xhtml: yes
-quiet: no
-quote-ampersand: no
-show-warnings: yes
-tidy-mark: yes
-wrap: 78
-wrap-attributes: no
-write-back: yes
-keep-time: yes
-
-
- -

Paul White reports that Tidy isn't recognizing HTML 3.2 when -the doctype is "-//W3C//DTD HTML 3.2 Final//EN" (as per the REC), -and similarly for HTML 4.01. This would appear to call for a -change to the table of names in lexer.c.

- -

Stuart Hungerford would like Tidy to detect and fix duplicate -attributes e.g. multiple class attributes. Celeste Suliin Burris -would like Tidy to replace spaces in URLs by %20 as some versions -of Netscape "croak big time" on this. Denis Kokarev also wants -Tidy to remove duplicate attributes when the values are the same. -This apparently stops XSLT from working. Brian Schweitzer notes -that Tidy adds a 2nd class attribute rather than merging the -classes into a space separated list.

- -

Bertilo Wennergren writes: Tidy seems not to recognize frame elements with a closing "/". It actually removes them. Try his example. Tidy can produce XHTML Frameset docs, but when fed them back again it cries foul.

- -

Jose Manuel Cerqueira Esteves notes:

- -
-I've used `tidy' to convert a few HTML 4.0 files to XHTML 1.0 and noticed
-a problem when dealing with constructs like
-
- <small><small>some text</small></small>
-
-First, `tidy' acts as if the second "<small>" was meant as a closing tag:
-
- Warning: "<small> is probably intended as </small>"
-
-Then it trims the resulting empty <small></small>:
-
- Warning: trimming empty <small>
-
-And finally both remaining closing tags ("</small>"), now spurious,
-are removed:
-
- Warning: discarding unexpected </small>
- Warning: discarding unexpected </small>
-
-It would be convenient to have at least some `tidy' option to prevent this
-from happening (or perhaps some different heuristics?).
-
- -

Robbert Hans Baron would like to see Tidy warning about -duplicate attributes and fixing these when the values are -identical.

- -

Jutta Wrage notes that: When parsing HTML 3.2 Pages, tidy -doesn't accept textareas in forms correctly. The HTML Reference -specification (HTML 3.2 Final) allows: name, rows and cols, but -upon seeing these Tidy thinks the document is 4.0.

- -

Matthew Brealey notes that a heading start tag is coerced to -an end heading tag when the end tag is missing. This is -deliberate, but perhaps not the best heuristic.

- -

HIYAMA Masayuki notes that Tidy should set the encoding attribute to match the language encoding, e.g. <?xml version="1.0" encoding="iso-2022-jp"?>.

- -

Mark Modrall has extended Tidy to support selectively -stripping out listed tags and attributes, see his email of March -14th.

- -

Yong Taek Bae notes that with the omit end tags option Tidy -omits the body tag even if it has attributes. This is an -error.

- -

Tapio Markula reports that Tidy is incorrectly replacing -accented characters in script elements by entities. The script -element (in HTML but not XHTML) is CDATA and as such entities -won't be expanded. This bug needs to be fixed along with the -support for CDATA sections.

- -

Terrill Bennett reports tidy crashing when producing slides, -and when the -i option has been set. He later added the crash -occurs when the page doesn't include an h1 element. See -Terrill-Bennett-11mar00.txt.

- -

Stephen Lewis notes that if an <hr> element is present -in the head before the title element, then Tidy gets confused and -adds in a spurious extra empty title element. This would be -avoided if Tidy could move the hr into the body before the body -element is encountered. This raises a number of problems for -instance working out when to copy in attributes from an explicit -body element.

- -

Carl Osterly would like Tidy to avoid breaking lines before or -after the = sign in attribute values when this is practical. -Perhaps a simple rule of thumb could be used to decide this?

- -

Rick H Wesson notes that Tidy crashes on CDATA marked sections -when parsing XML.

- -

Luigi Federici would like an option to set the DTD URI for XML -or XHTML.

- -

Mat Sander notes: If I have php code the indentation behaves -strange. Repeated tidying php content and end tag indented one -level extra for each time. The result ends up something like -this:

- -
-...
-    <?php
-                        $r=0;
-                        ?>
-...
-
-I have the following config file for Tidy:
----
-tidy-mark: no
-markup: yes
-wrap: 0
-indent: auto
-output-xml: no
-output-xhtml: yes
-doctype: loose
-char-encoding: latin1
-quote-marks: yes
-assume-xml-procins: yes
-word-2000: yes
-clean: yes
-logical-emphasis: yes
-drop-empty-paras: yes
-enclose-text: yes
-fix-bad-comments: yes
-alt-text: .
-write-back: bool
-keep-time: yes
-show-warnings: no
-quiet: yes
-split: no
----
-
-Best Regards,
-Mats-Olof Sander
-
-
- -

Don Hasson notes that if you make a mistake and leave off the -ending "/" in the <title> tag, tidy will generate an extra -set of <title>s.

- -

Example:

- -
-<html>
-<head><title>No end here<title></head>
-<body>
-Empty
-</body>
-</html>
-
-
- -

produces this:

- -
-<html>
-<head>
-<title>No end here</title>
-<title></title>
-</head>
-<body>
-Empty
-</body>
-</html>
-
-
- -

Jeff Wilkinson would like the HTML Tidy page to include -internal anchors so that he can link directly to the appropriate -sections.

- -

Peter Vince would like to be able to clean presentation -attributes on the body element, as well as translating b and i to -span.

- -

Dave Bryan and Mathew Brealey would like there to be a way to -suppress the default handling of inline elements in favor of -simply inserting the appropriate end tag when encountering an -element that isn't allowed in an inline context. The default -behavior replicates the rendering on existing browsers but can -cause problems for hand editors.

- -

Dave Bryan notes that tidy isn't updating the column position -when parsing attributes.

- -

Can Tidy track when a line break occurs after a PI or comment -and reproduce this in the output? This idea occurred to me after -reading a comment from Brad Stowers.

- -

One interesting suggestion is to make some of Tidy's rules of -thumb sensitive to the program that generated the markup as -indicated by the meta element. This would allow for greater -robustness in how the rules operate.

- -

Dave Bryan would like the quiet mode to be tweaked to suppress the general info at the end of the report. See Dave-Bryan-24mar00.txt.

- -

Erik Rossen would like an option to suppress line wrap within -tags, so that the tag is always on the same line regardless of -the number and length of the attributes.

- -

Dan Satria suggests that the clean mechanism check to see if there are any existing matching style rules before adding new ones.

- -

Zoltan Hawryluk suggests mapping the Netscape layer tag into -the equivalent CSS positioning syntax.

- -

Jim Walker says Tidy doesn't correctly report errors such as -</</head>.

- -

Tidy's slide feature: see Johannes-Poutre-12jul00.txt

- -

Carole Mah suggests Tidy should recover from multiple class -attributes on the same element.

- -

Other ideas

- - - - - diff --git a/htmldoc/release-notes.html b/htmldoc/release-notes.html deleted file mode 100644 index c06f448..0000000 --- a/htmldoc/release-notes.html +++ /dev/null @@ -1,1771 +0,0 @@ - - - - -HTML TIDY - Release Notes - - - - - - - -

HTML TIDY - Release Notes

- -

Dave Raggett dsr@w3.org

- -

Public Email List for Tidy: <html-tidy@w3.org>

- -

I have set up an archived mailing list devoted to Tidy. To -subscribe send an email to html-tidy-request@w3.org with the word -subscribe in the subject line (include the word unsubscribe if -you want to unsubscribe). The archive -for this list is accessible online. Please use this list to -report errors or enhancement requests.

- -

Things awaiting further attention

- -

These have been moved to the pending -page, which includes all the suggestions for improvements and -bug fixes. I am looking for volunteers to help with these as my -current workload means that I don't get much time left to work on -HTML Tidy.

- -

August 2000

- -

Ann Navarro comments that the "appears to" message is -confusing when it differs from the doctype declaration. Perhaps -it would make sense to also report the doctype? Tidy will now -report the FPI when present, and then the apparent version as -deduced from the elements and attributes present in the rest of -the document.

- -

John Russell sent in an example which featured a script -element in a frameset document where the script element appears -after the head and before the frameset. This is I believe -illegal, but Tidy proceeds to do the dumb thing discarding the -frameset element! I think it should move the script element into -the head and continue. This is now implemented.

- -

Jacques Steyn says that Tidy doesn't know about the HTML4 char -attribute for col elements. Now fixed.

- -

Carlos Piqueres Ayela would like Tidy to detect all cases of -repeated attributes, e.g. repeated valign in table cells. This -was introduced a few releases back, but I forgot to apply this -check for the elements with special purpose attribute checking -methods. Now fixed. Tidy will issue a warning for each repeated -attribute. In principle Tidy could merge repeated class -attributes, but this will require more work. My apologies to -Carole Mah for not having the time to do this now.

- -

Henry Zrepa would like an option to suppress whitespace -munging on selected attributes used for legacy scripts passed as -parameters to plugins. I have added a new boolean option -"literal-attributes" which can be set to yes to preserve -whitespace within attribute values. A better solution would be to -make this selectable on a per element basis, but I don't have -time to explore this now.

- -

Edward Zalta spotted that Tidy always removed newlines -immediately after start tags even for empty elements such as img. -An exception to this rule is the br element. Now fixed.

- -

July 2000

- -

Edward Zalta sent me an example, where Tidy was inadvertently -wrapping lines after an image element. The problem was a -conditional in pprint.c, now fixed.

- -

Andy Quick offered a bug fix for the AddClass() function in -clean.c. My thanks to Terry Teague for bringing this to my -attention. Davor Golek reported a problem with the -f option. I -discovered a bug in line 898 in tidy.c, now fixed.

- -

June 2000

- -

Fixed bug in NormalizeSpaces (== in place of =) on line -1699.

- -

I have added a new config option "gnu-emacs" following a -suggestion by David Biesack. The option changes the way errors -and warnings are reported to make them easier for Emacs to -parse.

- -

Tony Leneis noticed that Tidy didn't know that width and -height attributes on the img element aren't allowed in HTML 2.0. -He also noted that Tidy didn't know that HTML 2.0 allows img as a -direct child of body. Both of these bugs are now fixed.

- -

I have refined CanPrune() to block pruning empty elements -if they have id or name attributes. Previously any attribute -would prevent an empty element from being pruned. The rationale -is that such empty elements are placed there to be filled -dynamically by a script. This is unlikely to occur unless the -element can be referenced via id or name.
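
The refined rule can be sketched roughly as follows. This is a Python illustration only, not Tidy's C code: the `can_prune` name, its parameters and the dictionary of attributes are assumptions made for the example, and any other conditions the real CanPrune() applies are ignored.

```python
def can_prune(is_empty, attributes):
    """Return True if an empty element may be dropped.

    Hypothetical stand-in for the check described above: only an id or
    name attribute protects an empty element from being pruned.
    """
    if not is_empty:
        return False
    # An element that a script can reach via id or name is kept.
    return not any(name.lower() in ("id", "name") for name in attributes)


print(can_prune(True, {"class": "spacer"}))
print(can_prune(True, {"id": "placeholder"}))
```

Under these assumptions an empty element carrying only a class attribute is pruned, while one with an id is kept.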

- -

Denis Barbier sent in detailed patches that suppress numerous -warnings when compiling tidy, especially:

- - - -

Fixed memory leak in CoerceNode. My thanks to Daniel Persson -for spotting this. Tapio Markula asked if Tidy could give -improved detection of spurious </ in script elements. Now -done.

- -

My thanks to John Russell who pointed out that Tidy wasn't -complaining about src attributes on hr elements. My thanks to -Johann-Christian Hanke who spotted that Tidy didn't know about -the Netscape wrap attribute for the text area element.

- -

Sebastian Lange has contributed a perl wrapper for calling -Tidy from your perl scripts, see sl-tidy.pl.

- -

Stephen Reynolds would like comments that end with a line -break to retain this property when tidied. I have added a new -boolean property to the node structure which is set by the end -comment parser in lexer.c and acted on by the comment formatting -code in pprint.c.

- -

Henry Zrepa (sp?) reported that XHTML <param\> elements -were being discarded. This was due to an error in ParseBlock, now -fixed.

- -

Carole E. Mah noted that Tidy doesn't complain if there are -two or more title elements. Tidy will now complain if there is -more than one title element or more than one base element.

- -

May 2000

- -

Following a suggestion by Julian Reschke, I have added an -option to add xml:space="preserve" to elements such as pre, style -and script when generating XML. This is needed if these elements -are to be correctly parsed without access to the DTD.

- -

April 2000

- -

Randy Waki notes that IsValidAttribute() wasn't checking that -the first character in an attribute name is a letter. Now -fixed.

- -

Jelks Cabaniss wants the naked li style hack made into an -option or at least tweaked to work in IE and Opera as well as -Navigator. Sadly, even Navigator 6 preview 1 replicates the buggy -CSS support for lists found in Navigator 4. Neither Navigator 6 -nor IE5 (win32) supports the CSS marker-offset property, and so -far I have been unable to find a safe way to replicate the visual -rendering of naked li elements (ones without an enclosing ul or -ol element). As a result I have opted for the safer approach of -adding a class value to the generated ul element -(class="noindent") to keep track of which li's weren't properly -enclosed.

- -

Rick Parsons would like to be able to use quote marks around -file names which include spaces, when specifying files in the -config file. Currently, this only affects the "error-file" -option. I have changed that to use ParseString. You can now specify -error files with spaces in their names.

- -

Karen Schlesinger would like tidy to avoid pruning empty span -elements when these have id attributes, e.g. for use in setting -the content later via the DOM. Done.

- -

I have modified GetToken() to switch mode from -IgnoreWhitespace to MixedContent when encountering non-white -textual content. This solves a problem noticed by Murray -Longmore, where Tidy was swallowing white space before an end -tag, when the text is the first child of the body element.

- -

Tidy needs to check for text as direct child of blockquote -etc. which isn't allowed in HTML 4 strict. This could be -implemented as a special check which or's in transitional into -the version vector when appropriate.

- -

ParseBlock now recognizes that text isn't allowed directly in -the block content model for HTML strict. Furthermore, following a -suggestion by Berend de Boer, a new option enclose-block-text has -the same effect as enclose-text but also applies to any block -element that allows mixed content for HTML transitional but not -HTML strict.

- -

Jany Quintard noted that Tidy didn't realise the width and -height attributes aren't allowed on table cells in HTML strict -(it's fine on HTML transitional). This is now fixed. Nigel -Wadsworth wanted border on table without a value to be mapped -into border="1". Tidy already does this but only if the output is -XHTML.

- -

Jelks Cabaniss wanted Tidy to check that a link to an external -style sheet includes a type attribute. This is now done. He also -suggested extending the clean operation to migrate presentation -attributes on body to style rules. Done.

- -

March 2000

- -

I have been working on improving the Word2000 cleanup, but -have yet to figure out foolproof rules of thumb for recognizing -when paragraphs should be included as part of ul or ol lists. -Tidy recognizes the class "MsoListBullet" which Word seems to -derive from the Word style named "List Bullet". I have yet to -deal with nested lists in Word2000. This is something I was able -to deal with for html exported from Word97, but it looks like -being significantly harder to deal with for Word2000.

- -

Tidy is now able to create a pre element for paragraphs with -the style "Code". So try to use this style in your Word documents -for preformatted text. Tidy strips out the p tags and coerces -non-breaking spaces to regular spaces when assembling the pre -element's content.

- -

I would very much welcome any suggestions on how to make the -Word2000 clean up work better!

- -

Changed Style2Rule() in clean.c to check for an existing class -attribute, and to append the new class after a space. Previously -you got two class attributes which is an error.
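
A minimal sketch of the append-after-a-space behaviour, written in Python rather than Tidy's C. The `add_class` helper and the dictionary of attributes are hypothetical and only illustrate the idea; they are not taken from clean.c.

```python
def add_class(attributes, new_class):
    """Append new_class to an existing class value, or create the attribute.

    Hypothetical helper: it never produces a second class attribute.
    """
    existing = attributes.get("class")
    if existing:
        # Existing value is kept; the new class follows after one space.
        attributes["class"] = existing + " " + new_class
    else:
        attributes["class"] = new_class
    return attributes


print(add_class({"class": "note"}, "c1"))
```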

- -

Changed default for add-xml-pi to no since this was causing -serious problems for several browsers.

- -

Joakim Holm notes that tidy crashes on ASP when used for -attributes. The problem turned out to be caused by -CheckUniqueAttribute() which was being inappropriately applied to ASP -nodes.

- -

John Bigby noted that Tidy didn't know about Microsoft's data -binding feature. I have added the corresponding attributes to the -table in attr.c and tweaked CanPrune() so that empty elements -aren't deleted if they have attributes.

- -

Tidy is now more sophisticated about how it treats nested -<b>'s etc. It will prune redundant tags as needed. One -difficulty is in knowing whether a start tag is a typo and should -have been an end-tag or whether it starts a nested element. I -can't think of a hard and fast rule for this. Tidy will coerce a -<b> to </b> except when it is directly after a -preceding <b>.

- -

Bertilo Wennergren noted that Tidy lost <frame/> -elements. This has now been fixed with a patch to -ParseFrameSet.

- -

February 2000

- -

Dave Bryan spotted an error in pprint.c which allowed some -attributes to be wrapped even when wrap-attributes was set to no. -On a separate point, I have now added a check to issue a warning -if SYSTEM, PUBLIC, //W3C, //DTD or //EN are not in upper -case.

- -

Tidy now realises that inline content and text is not allowed -as a direct child of body in HTML strict.

- -

Dave Bryan also noticed that Tidy was preferring HTML 4.0 to -4.01 when doctype is set to strict or transitional, since the -entries for 4.0 appeared earlier than those for 4.01 in the table -named W3C_Version in lexer.c. I have reversed the order of the -entries to correct this. Dave also spotted that ParseString() in -config.c is erroneously calling NextProperty() even though it has -already reached the end of the line.

- -

January 2000

- -

I have added a new function ApparentVersion() which takes the -doctype into account as well as other clues. This is now used to -report the apparent version of the html in use.

- -

Thanks to the encouragement of Denis Barbier, I finally got -around to dealing with the extra bracketing needed to quiet gcc --Wall. This involved the initialization of the tag, attribute and -entity tables, and miscellaneous side-effecting while and for -loops.

- -

PPrintXMLTree has been updated so that it only inserts line -breaks after start tags and before end tags for elements without -mixed content. This brings Tidy into line with current wisdom for -XML editors. My thanks to Eric Thorbjornsen for suggesting a fix -to FindTag that ensures that Tidy doesn't mistreat elements -looking like html.

- -

<table border> is now converted to -<table border="1"> when converting to XHTML.

- -

I have added support for CDATA marked sections which are -passed through without change, e.g.

- -
-<![CDATA[ .. markup here has no effect .. ]]>
-
- -

A number of people were interested in Tidied documents being -marked as such using a meta element. Tidy will now add the -following to the head if not already present:

- -
-<meta name="generator" content="HTML Tidy, see www.w3.org">
-
- -

If you don't want this added, set the option tidy-mark to -no.

- -

In the January 12th release, ParseXMLElement screwed up on -doctypes and toplevel comments, causing a memory exception. This -has now been fixed. PPrintXMLTree now uses zero indent for -comments to avoid progressive indentation as an XML document is -repeatedly tidied. I have added a blank line after elements -unless they are the last in the parent's content.

- -

Johnny Lee reports that Tidy didn't realise that HTML4 allows -the object element in the document head. Now fixed. Rainer -Gutsche noticed that Tidy wasn't moving an initial space after an -anchor start tag to just before the element. I have streamlined -the trimming of spaces.

- -

Johannes Zellner spotted that newly declared preformatted tags -weren't being treated as such for XML documents. Now fixed.

- -

December 1999

- -

Tidy now generates the XHTML namespace and system identifier -as specified by the current XHTML Proposed -Recommendation. In addition it now assumes the latest version -of HTML4 - HTML 4.01. This fixes an omission in 4.0 by adding the -name attribute to the img and form elements. This means that -documents with rollovers and smart forms will now validate!

- -

James Pickering noticed that Tidy was missing off the xhtml- -prefix for the XHTML DTD file names in the system identifier on -the doctype. This was a recent change to XHTML. I have fixed -lexer.c to deal with this.

- -

This release adds support for -JSTE pseudo elements looking like: <# #>. Note -that Tidy can't distinguish between ASP and JSTE for pseudo -elements looking like: <% %>. Line wrapping of this -syntax is inhibited by setting either the wrap-asp or wrap-jste -options to no.

- -

Thanks to Jacek Niedziela, the Win32 executable for tidy is -now able to expand wild cards in filenames. This utilizes the -setargv library supplied with VC++.

- -

Jonathan Adair asked for the hashtables to be cleared when -emptied to avoid problems when running Tidy a second time, when -Tidy is embedded in other code. I have applied this to -FreeEntities(), FreeAttrTable(), FreeConfig(), and -FreeTags().

- -

Ian Davey spotted that Tidy wasn't deleting inline emphasis -elements when these only contained whitespace (other than -non-breaking spaces). This was due to an oversight in the -CanPrune() function, now fixed.

- -

Michel Lemay spotted some bugs in if statements and provided -some sample html files that caused Tidy to crash. On further -study, I found a bug in the code that moves font elements inside -anchors. I have fixed this and added a new method to test the -tree for internal consistency in its bidirectional links: -CheckNodeIntegrity().

- -

I have also refined the code for handling noframes to make it -more robust. It will now handle noframes within a body within a -noframes etc. (something permitted by HTML4). It will also -recover if the noframes end tag is missing or is in the wrong -place.

- -

I have fleshed out the table for mapping characters in the -Windows Western character set into Unicode, see Win2Unicode[]. -Yahoo was, for example, using the Windows Western character for -bullet, which in Unicode is U+2022.
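
The bullet example can be reproduced with a short Python sketch. It uses Python's built-in cp1252 codec as a stand-in for the Win2Unicode[] table, on the assumption that "Windows Western" corresponds to Windows-1252; the `windows_to_unicode` helper is hypothetical and not part of Tidy.

```python
def windows_to_unicode(byte_value):
    """Map one Windows Western (assumed Windows-1252) byte to a Unicode character."""
    return bytes([byte_value]).decode("cp1252")


bullet = windows_to_unicode(0x95)
print("U+%04X" % ord(bullet))  # prints U+2022
```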

- -

David Halliday noticed that applets without any content -between the start and end tags were being pruned by Tidy. This is -a bug and has now been fixed.

- -

I have changed the way Tidy handles empty paragraphs when the -drop-empty-paras option is set to no. HTML4 doesn't allow empty -paragraphs so I am now replacing them by a pair of br elements, -so that the formatting is preserved. When drop-empty-paras is set -to yes, empty paragraphs are simply removed.

- -

Darren Forcier asked for a way to suppress fixing up of -comments when these include adjacent hyphens since this was -screwing up Cold Fusion's special comment syntax. The new option -is called: fix-bad-comments and defaults to yes.

- -

Using Michel's examples I have improved the way the table -parser deals with unexpected content. This is now consistently -moved before the table, or to the head element as appropriate. -Microsoft and Netscape differ in how an unclosed blockquote -renders when found at the table or tr level. Netscape indents the -table but Microsoft does not. This is getting too tricky for me -to deal with!

- -

Using a sample page from Yahoo, I discovered that Netscape -Navigator doesn't implement the text-align style property on tr -or table elements. As a result I have added a special check for -this in BlockStyle() to avoid translating the align attribute on -tr or table into a style rule.

- -

Richard Allsebrook would like to be able to map b/i to -strong/em without the full clean process being invoked. I have -therefore decoupled these two options. Note that setting -logical-emphasis is also decoupled from drop-font-tags.

- -

30th November 1999

- -

This is an interim release to provide a bug fix for a bug -introduced earlier in the month. I have fixed a bug in the -emphasis code which looks for start tags which are most likely -intended as end tags. This bug only appeared in the November -release and could cause a crash or indefinite looping. My thanks -to a respondent calling himself "Michael" who provided a -collection of files that allowed me to track this down.

- -

I have also added page transition effects for the slide maker -feature. The effects are currently only visible on IE4 and above, -and take advantage of the meta element. I will provide an option -to select between a range of transition effects in the next -release.

- -

November 1999

- -

David Duffy found a case causing Tidy to loop indefinitely. -The problem occurred when a blocklevel element is found within a -list item that isn't enclosed in a ul or ol element. I have added -a check to ParseList to prevent this.

- -

Takuya Asada tells me that in Raw mode Tidy is incorrectly -mapping 0xA0 to the entity &nbsp; causing problems for Shift_JIS -etc. Now fixed. Larry Virden reported a problem with ParseConfig -when one of the arguments was null. I have added a check for -this.

- -

Thomas McGuigan notes that Tidy issues a warning for noframes -elements without a body element. HTML4 is defined so that the -content of the noframes element is restricted to a single body -element. However, it also allows you to omit the start and end -tags for body, something that isn't allowed for XHTML. I have -changed the code to only issue the warning when generating -XML.

- -

Added new --version or -v option that reports the release date -to the error stream. ParseConfig() now returns false if it -doesn't use the parameter. This avoids the next argument on the -command line from being swallowed inadvertently, e.g. for unknown -options. Tidy now warns about unrecognized options.

- -

I have revised the way Tidy deals with comments to avoid -problems with repeated hyphens. First "--" is illegal in XML, and -second, the comment syntax for SGML is very error prone when it -comes to when and where you can use hyphens. As a result, Tidy -will now replace repeated hyphens with "=" characters. My thanks -to Yudong Yang and Randy Waki for their input on this.
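
One way to picture the replacement is the Python sketch below. It assumes that every hyphen in a run of two or more is turned into "=", which the note above does not spell out, and the `fix_comment_text` helper is hypothetical rather than a function in Tidy.

```python
import re


def fix_comment_text(text):
    """Replace each run of two or more hyphens with '=' characters.

    Assumption: the run keeps its length; single hyphens are left alone.
    """
    return re.sub(r"-{2,}", lambda match: "=" * len(match.group(0)), text)


print(fix_comment_text(" section -- notes --- end - ok "))
```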

- -

Emphasis start tags will now be coerced to end tags when the -corresponding element is already open. For instance -<u>...<u>. This behavior doesn't apply to font tags -or start tags with attributes. My thanks to Luis M. Cruz for -suggesting this idea.

- -

Jonathan Adair would like Tidy to warn when the same attribute -appears more than once in the same element. This is an error for -both SGML and XML. The best way to make this check would be to -sort the attributes and look for duplicate entries. Other people -have asked for the attributes to be sorted, but I need further -input on the appropriate sort order. As an interim solution, Tidy -uses a simple test which generates n+1 warnings if an attribute -is repeated n times.
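
The two approaches mentioned above can be contrasted in a small Python sketch. Both functions are hypothetical illustrations rather than Tidy's code. The first is one reading of the "simple test": each attribute is compared against the others, so a name that occurs n+1 times (that is, repeated n times) yields n+1 warnings. The second is the sort-based check, which reports each repeated name once.

```python
def repeated_attribute_warnings(attribute_names):
    """Simple test: one warning per attribute whose name occurs elsewhere in the tag."""
    warnings = []
    for i, name in enumerate(attribute_names):
        for j, other in enumerate(attribute_names):
            if i != j and name.lower() == other.lower():
                warnings.append("repeated attribute: " + name)
                break
    return warnings


def repeated_attributes_sorted(attribute_names):
    """Sort-based check: adjacent equal names in sorted order are duplicates."""
    ordered = sorted(name.lower() for name in attribute_names)
    return sorted({a for a, b in zip(ordered, ordered[1:]) if a == b})


names = ["valign", "width", "valign"]
print(repeated_attribute_warnings(names))
print(repeated_attributes_sorted(names))
```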

- -

October 1999

- -

On Unix systems you can get Tidy to look for a config file in -~/.tidyrc or ~your/.tidyrc etc. when the HTML_TIDY environment -variable isn't set. To enable this feature don't forget to -uncomment SUPPORT_GETPWNAM in the platform.h file. This feature -won't work on Windows. My thanks to Todd Lewis who contributed -the code.
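
The lookup order can be sketched in Python as follows. This is a simplified illustration, not the contributed C code: the `find_config_file` helper is hypothetical, and it assumes that HTML_TIDY, when set, names the config file and that ~/.tidyrc is the only fallback.

```python
import os


def find_config_file(environ=None):
    """Return the config file path: HTML_TIDY if set, otherwise ~/.tidyrc."""
    env = os.environ if environ is None else environ
    path = env.get("HTML_TIDY")
    if path:
        return path
    # Fall back to the per-user file; "~" is expanded to the home directory.
    return os.path.expanduser("~/.tidyrc")


print(find_config_file({"HTML_TIDY": "/etc/tidy.conf"}))
```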

- -

Darren Forcier reports that Cold Fusion uses the following -syntax:

- -
-<CFIF True IS True>
-   This should always be output 
-<CFELSE>
-   This will never output 
-</CFIF>
-
- -

After declaring the CFIF tag in the config file, Tidy was -screwing up the Cold Fusion expression syntax, mapping 'True' to -'True=""' etc. My fix was to leave such pseudo attributes -untouched if they occur on user defined elements.

- -

Jelks Cabaniss noticed that Tidy wasn't adding an id attribute -to the map element when converting to XHTML. I have added -routines to do this for both 'a' and 'map'. The value of the id -attribute is taken from the name attribute.
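
As a rough Python sketch of this fix-up (the `add_id_from_name` helper and the dictionary of attributes are assumptions for illustration, not the routines added to Tidy):

```python
def add_id_from_name(tag, attributes):
    """Copy the name attribute into id for 'a' and 'map' when id is missing."""
    if tag in ("a", "map") and "name" in attributes and "id" not in attributes:
        attributes["id"] = attributes["name"]
    return attributes


print(add_id_from_name("map", {"name": "navbar"}))
```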

- -

Larry Cousin noted that Tidy is now screwing up on option -elements. This proved to be a recently introduced error, which I -have now fixed. Peter Ruevski forwarded an example that caused -Tidy to loop endlessly. The problem was caused by an ol start tag -followed by a b start tag and then an li element. I have solved -the problem with a fix to ParseBlock.

- -

I have revised the way Tidy deals with unexpected content in -lists. Tidy now wraps such content in list items with the style -attribute set to "list-style: none" to suppress list bullets. If -an li element is found unexpectedly in the body or block-level -content, it is wrapped into a ul element with the style attribute -set to "margin-left: -2em". This provides a closer match to the -observed rendering on current browsers. I use a couple of -postprocessing steps (List2BQ and BQ2Div) to further clean this -up to use div elements. My thanks to Thomas Ribbrock for sending -me a challenging example that led me to this solution.

- -

A number of people have asked for a config option to set the -alt attribute for images when missing. The alt-text property can -now be used for this purpose. Please note that YOU are -responsible for making your documents accessible to people who -can't view the images!

- -

Terry Teague spotted a bug in ParseConfigFile() that prevented -Tidy from parsing more than one file. This has been fixed by -setting the char buffer to zero in the call to InitConfig() -before parsing. Terry also noted a few places where I had slipped -back into using malloc and free rather than MemAlloc and MemFree, -now fixed.

- -

Bjoern Hoehrmann notes that the September 27th release mapped -empty paragraphs to br elements, which introduces extra -whitespace in IE and Navigator. The former behavior to strip -empty paragraphs is as per HTML4 and works fine on most browsers -with the exception of Lynx. I have reverted to stripping empty -P's, but have added an option to leave them alone.

- -

Bjoern also drew my attention to a bug in the September -release where table content is lacking a preceding td or th start -tag. Tidy moves such content to before the table element to match -the observed rendering. This is now working as planned. I have -tweaked the printing behavior when the omit end tags option is -set. It now omits the </html> as well as the optional start -tags for html, head and body.

- -

Pao-Hsi Huang had problems with the contents of the option -element being discarded. I was unable to reproduce this problem, -but did notice that I was unintentionally preserving newlines within -option text. This is now fixed. Shane Harrelson spotted that -table cells containing a single font element, when cleaned -dropped the font element without getting the corresponding style. -Now fixed via a tweak to InlineStyle().

- -

Andre Hinrichs wanted Tidy to do a better job on font elements -with relative size changes. This is in fact rather tricky. -Currently, Tidy uses percentage scaling values for fonts rather -than the enumeration defined by CSS [xx-small | x-small | small | -medium | large | x-large | xx-large]. The first problem is to -match these 7 values onto the 6 defined by the font element. The -next problem is caused by the fact that CSS doesn't provide -matching relative font size values that you could match to the -ones defined for the font element. I have done my best using -percentage values, based on tests with IE and Navigator. If anyone -can come up with a better approach, please let me know.

- -

Tom Berger reported a problem when quote-marks was set to yes. -Using his test file everything is now working fine. Several -people asked for a way to turn off line wrapping. Tidy will now -interpret zero as meaning disable wrapping. Johannes Zellner -wants to include some tcl code in his XML markup and asks for a -way to define new tags that behave in the same way as HTML's pre -element. The new option is new-pre-tags.

- -

September 1999

- -

Tidy will now add a type attribute to the style and script -elements when this is missing. Tidy examines the language -attribute to determine what media type to use. I have also added -code to create an id attribute for anchors when a name attribute -is present, and to report a warning if id and name don't -match.

- -

Added support for cleaning up HTML generated by Microsoft Word -2000 when you save as "Web Page". When you set "word-2000: yes" -Tidy makes a Herculean effort to clean up the mess created when -Word 2000 exports to HTML. Word bulks out HTML with presentation -information that allows it to round-trip documents between HTML -and Word without loss of information. This makes the HTML hard to -edit and can cause some very popular browsers to crash! I haven't -dealt with the VML markup Word uses for line drawings.

- -

Applied fix to InsertNodeAfterElement() to set -node->next->prev. My thanks to "Advocate" for this. This -was only encountered when dealing with PRE tags containing -content illegal for PRE. (Called twice by ParsePre to move -illegal PRE content to be a later sibling of PRE, then open PRE -again afterward)

- -

Change to table row parser so that when Tidy comes across an -empty row, it inserts an empty cell rather than deleting it. This -is consistent with browser behavior and avoids problems with -cells that span rows.

- -

Baruch Even sent extensive patches for improved support for -the PHP preprocessing pseudo tags. You can now use 'wrap-php: -no' to suppress line wrapping within PHP instructions. In the -process of this work, I have created a new function InsertMisc() -for dealing with comments, processing instructions, ASP and -PHP.

- -

I have updated the table of tags to include additional -proprietary tags such as server, ilayer, layer, nolayer and -multicol. Using patches sent in by Edward Avis, Tidy now offers a -quiet mode which suppresses the initial welcome message and the -summary report on the number of errors or warnings. Jason -Tribbeck sent in patches to allow config options normally set in -the config file to be set on the command line, by preceding them -with a "--" (no intervening space), for example:

- -
-  tidy --break-before-br true --show-warnings false
-
- -

Kenichi Numata discovered that Tidy looped indefinitely for -examples similar to the following:

- -
-<font size=+2>Title
-<ol>
-</font>Text
-</ol>
-
- -

I have now cured this problem which used to occur when a -</font> tag was placed at the beginning of a list element. -If the example included a list item before the </ol> Tidy -will now create the following markup:

- -
-<font size=+2>Title</font>
-<blockquote>Text </blockquote>
-<ol>
-<li>list item</li>
-</ol>
-
- -

This uses blockquote to indent the text without the -bullet/number and switches back to the ol list for the first true -list item.

- -

I have worked hard to improve support for server side -preprocessing instructions such as ASP, PHP and Tango. Tidy now -allows you to replace attribute values by such instructions and -is able to fix up the case where the instruction appears without -delimiting quote marks. Tidy supports ASP and PHP in element -content and also in place of attribute value pairs. Support for -Tango is limited to attribute values only.

- -

John Love-Jensen contributed a table for mapping the MacRoman -character set into Unicode. I have added a new charset option -"mac" to support this. Note the translation is one way and -doesn't convert back to the Mac codes on output.

- -

Some people place <p> at the end of their list items to -introduce whitespace before the next item. I have modified -TrimEmptyElement to coerce empty p elements to br elements to -reproduce this rendering. If a p start tag is found in dt -elements, I now coerce the p to a br. Satwinder Mangat has -alerted me to several such problems. First, text as a direct -child of dl should be wrapped in a dt and not a dd element. -Second, unlike other inline tags, browsers only close anchors on an -anchor start or end tag. Actually Navigator and IE differ in how -they handle this. Try the following example:

- -
-<p><b><a href=foo>some text</i> which should be in the label</a></p>
-
-<p>next para and guess what the emphasis will be?</p>
-
- -

Navigator 4 renders the second paragraph in normal text while -IE renders it in bold. If you substitute <a> for the -</i>, once again the browsers differ. IE stops underlining -at the <a> text while Navigator continues until the -</a>, although it realizes that you can't click there.

- -

Satwinder continues: browsers happily interpret center within -a heading. Tidy now moves the center element to be the parent of -the rest of the heading, splitting it as needed, rather than -prematurely ending the heading. The same applies to a div element -within a heading. Satwinder notes that Tidy inserts a ul when an -li is encountered as a direct child of body.

- -

This is a case where you can't produce a legal HTML file that -renders the same way as browsers handle this. The same applies to -a dt or dd element without an enclosing dl element. I can report -that W3C's HTML working group was unwilling to bless naked li's -etc. A similar problem arises for dt elements when they contain -hr, center or div. The specs say this is illegal, but browsers -render it fine!

- -

I have done my best for hr, splitting the dt as needed and -enclosing the hr within a dd. The hr doesn't look the same, -sadly, as it now starts at the left margin for the dd's rather -than the left margin for dt's. I wasn't sure how to deal with -center and div within dt, and chose to discard them.

- -

</br> is now mapped to <br> to match observed -browser rendering. On the same basis, an unmatched </p> is -mapped to <br><br>. This should improve fidelity of -tidied files to the original rendering, subject to the -limitations in the HTML standards described above.

- -

Vlad Harchev spotted that Tidy was swallowing the first and -last spaces within inline elements when in a pre element. Now -fixed. Zac Thompson spotted that Tidy didn't know that the tags -s, strike and u weren't allowed in HTML4 strict. I have now fixed -this.

- -

Tidy now preserves the last modified time for the files it -writes back to. This was introduced on the suggestion of -René Fritz, who uses the SiteCopy utility to upload recently -modified files to his Web server. By preserving file timestamps -Tidy can be used on all files in a directory without impacting -which ones will be uploaded, the next time SiteCopy runs. This is -implemented using the fstat and futime system calls. If your -platform doesn't support these calls, set PRESERVEFILETIMES to 0 -in platform.h.
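
The same idea can be sketched in Python, with os.stat and os.utime standing in for the fstat and futime calls used by Tidy. The `rewrite_preserving_mtime` helper is hypothetical and only shows the save-then-restore pattern.

```python
import os


def rewrite_preserving_mtime(path, new_text):
    """Overwrite a file but restore its original access and modification times."""
    before = os.stat(path)  # remember the timestamps before writing
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(new_text)
    # Put the saved timestamps back so upload tools see the file as unchanged.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

A tool that selects files by modification time should then treat the rewritten file as it did before the rewrite.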

- -

I have fixed a bug in lexer.c which screwed up the removal of -doctype elements. This bug was associated with the symptom of -printing an indefinite number of doctype elements.

- -

August 1999

- -

Added lowsrc and bgproperties attributes to attribute table. -Rob Clark tells me that bgproperties="fixed" on the body element -causes NS and IE to fix the background relative to the window -rather than the document's content.

- -

Terry Teague kindly drew my attention to several bugs -discovered by other people: My thanks to Randy Waki for -discovering a bug when an unexpected inline end-tag is found in a -ul or ol element. I have added new code to ParseList in parser.c -to pop the inline stack and discard the end tag. I am checking to -see whether a similar problem occurs elsewhere. Randy also -discovered a bug (now fixed) in TrimInitialSpace() in parser.c -which caused it to fail when the element was the first in the -content. John Cumming found that comments cause problems in table -row group elements such as tbody. I have fixed this oversight in -this release.

- -

Bjoern Hoehrmann tells me that bgsound is only allowed in the -head and not in the body, according to the Microsoft -documentation. I have therefore updated the entry in tags.c. The -slide generation feature caused an exception when the original -document didn't include a document type declaration. The fix -involves setting the link to the parent node when creating the -doctype node.

- -

26th July 1999

- -

Jussi Vestman reported a bug in FixDocType in lexer.c which -caused tidy to corrupt the parse tree, leading to an infinite -loop. I independently spotted this and fixed it. Justin -Farnsworth spotted that Tidy wasn't handling XML processing -instructions which end in ?> rather than just > as -specified by SGML. I have added a new option: -assume-xml-procins: yes which when set to yes expects the -XML style of processing instruction. It defaults to no, but is -automatically set to yes for XML input. Justin notes that the XML -PIs are used for a server preprocessor format called PHP, which -will now be easy to handle with Tidy. Richard Allsebrook's mail -prompted me to make sure that the contents of processing -instructions are treated as CDATA so that < and > etc. are -passed through unescaped.
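
The difference between the two processing-instruction styles can be sketched as below. This is an illustration in Python, not Tidy's lexer; the function find_pi_end and its assume_xml_procins parameter are hypothetical names chosen to echo the option described above.

```python
def find_pi_end(text, start, assume_xml_procins=False):
    """Return the index just past the processing instruction at `start`.

    `text[start:]` is expected to begin with "<?". With the XML style the
    instruction ends at "?>"; with the SGML style it ends at the first ">".
    Returns -1 when no terminator is found.
    """
    terminator = "?>" if assume_xml_procins else ">"
    end = text.find(terminator, start + 2)
    return -1 if end == -1 else end + len(terminator)
```

With PHP-like content such as `<?php if ($a > 1) echo "x"; ?>`, the SGML rule stops at the `>` inside the expression, which is why the XML style is needed for that kind of input.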

- -

Bill Sowers asks for Tidy to support another server -preprocessor format called Tango which features syntax such -as:

- -
-<b><@include <@cgi><appfilepath>includes/message.html></b>
-
- -

I don't have time to add support for Tango in this release, -but would be happy if someone else were to mail in appropriate -changes. Darrell Bircsak reports problems when using DOS on -Win98. I am using Win95 and have been unable to reproduce the -problem. Jelks Cabaniss notes that Tidy doesn't support XML -document type subset declarations. This is a documented -shortcoming and needs to be fixed in the not too distant future. -Tidy focuses on HTML, so this hasn't been a priority to date.

- -

Jussi Vestman asks for an optional feature for mapping IP -addresses to DNS hostnames and back again in URLs. Sadly, I don't -expect to be able to do this for quite a while. Adding network -support to Tidy would also allow it to check for bad URLs.

- -

Ryan Youck reports that Tidy's behavior when finding a ul -element when it expects an li start tag doesn't match Netscape or -IE. I have confirmed this and have changed the code for parsing -lists to append misplaced lists to the end of the previous list -item. If a new list is found in place of the first list item, I -now place it into a blockquote and move it before the start of -the current list, so as to preserve the intended rendering.

- -

I have added a new option - enclose-text which encloses any -text it finds at the body level within p elements. This is very -useful for curing problems with the margins when applying style -sheets.

- -

9th July 1999

- -

Added bgsound to tags.c. Added '_' to definition of namechars -to match html4.decl. My thanks to Craig Horman for spotting -this.

- -

Jelks Cabaniss asked for the clean option to be automatically -set when the drop-font-tags option is set. Jelks also notes that -a lot of the authoring tools automatically generate, for example, -<I> and <B> in place of <em> and <strong> -(MS FrontPage 98 generated the latter, but FP2000 has reverted to -the former - with no option to change or set it). Jelks suggested -adding a general tag substitution mechanism. As a simpler measure -for now, I have added a new property called logical-emphasis to -the config file for replacing i by em and b by strong.

- -

7th July 1999

- -

Fixed recent bug with escaping ampersands and plugged memory -leaks following Terry Teague's suggestions. Changed -IsValidAttrName() in lexer.c to test for namechars to allow - and -: in names.

- -

2nd July 1999

- -

Chami noticed that the definition for the marquee tag was -wrong. I have fixed the entry in tags.c and Tidy now works fine -on the example he sent. To support mixing MathML with HTML I have -added a new config option for declaring empty inline tags -"new-empty-tags". Philip Riebold noted that single quote marks -were being silently dropped unless quote marks was set to yes. -This is an unfortunate bug recently introduced and now fixed.

- -

Paul Smith sent in an example of badly formed tables, where -paragraph elements occurred in table rows without enclosing table -cells. Tidy was handling this by inserting a table cell. After -comparison with Netscape and IE, I have revised the code for -parsing table rows to move unexpected content to just before the -table.

- -

26th June 1999

- -

Tony Leneis reports that Tidy incorrectly thinks the table -frame attribute is a transitional feature. Now fixed. Chami -reported a bug in ParseIndent in config.c and that onsubmit is -missing from the table of attributes. Both now fixed. Carsten -Allefeld reports that Tidy doesn't know that the valign attribute -was introduced in HTML 3.2 and is ok in HTML 4.0 strict, -necessitating a trivial change to attrs.c.

- -

Axel Kielhorn notes that Tidy wasn't checking that the preamble for -the DOCTYPE tag matches either "html PUBLIC" or "html SYSTEM". -Bill Homer spotted changes needed for Tidy to compile with SGI -MIPSpro C++. All of Bill's changes have been incorporated, except -for the include file "unistd.h" (for the unlink call) which isn't -available on win32. To include this, define NEEDS_UNISTD_H.

- -

Bjoern Hoehrmann asked for information on how to use the -result returned by Tidy when it exits. I have included an example -using Perl that Bjoern sent in. Bodo Eing reported that Tidy gave -a misleading warning when title text is emphasized. It now reports -a missing </title> before any unexpected markup.

- -

Bruce Aron says that many WYSIWYG HTML editors place a font -element around a hypertext link enclosing the anchor element -rather than its contents. Unfortunately, the anchor element then -overrides the color change specified by the font element! I have -added an extra rule to ParseInline to move the font element -inside an anchor when the anchor is the only child of the font -element. Note CSS is a better long term solution, and Tidy can be -used to replace font elements by style rules using the clean -option.

- -

Carsten Allefeld reported that valign on table cells caused -Tidy to mislabel content as HTML 4.0 transitional rather than -strict. Now fixed. A number of people said they expected the -quote-mark option to apply to all text and not just to attribute -values. I have obliged and changed the option accordingly.

- -

Some people have wondered why "</" causes an error when -present within scripts. The reason is that this substring is not -permitted by the SGML and XML standards. Tidy now fixes this by -inserting a backslash, changing the substring to "<\/". Note -this is only done for JavaScript and not for other scripting -languages.
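
The substitution described above amounts to a plain string replacement. The sketch below is in Python and the function name escape_end_tag_open is hypothetical; it only illustrates the rewrite for JavaScript text and is not Tidy's code.

```python
def escape_end_tag_open(script_text):
    """Rewrite every "</" in JavaScript source as "<\\/".

    Inside a JavaScript string literal "\\/" is just an escaped "/", so the
    script behaves the same while the markup no longer contains "</".
    """
    return script_text.replace("</", "<\\/")
```

For example, `document.write("</b>");` becomes `document.write("<\/b>");`, which still writes the same end tag when the script runs.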

- -

Chami reported that onsubmit wasn't recognized by Tidy - now -fixed. Chris Nappin drew my attention to the fact that script -string literals in attributes weren't being wrapped correctly -when QuoteMarks was set to no. Now fixed. Christian Zuckschwerdt -asked for support for the POSIX long options format e.g. --help. -I have modified tidy.c to support this for all the long options. -I have kept support for -help and -clean etc.

- -

Craig Horman sent in a routine for checking attribute names -don't contain invalid characters, such as commas. I have used -this to avoid spurious attribute/value pairs when a quotemark is -misplaced. Darren Forcier is interested in wrapping Tidy up as a -Win32 DLL. Darren asked for Tidy to release its memory resources -for the various tables on exit. Now done, see DeInitTidy() in -tidy.c

- -

Darren also asks about the config file mechanism for declaring -additional tags, e.g. new-blocklevel-tags: cfoutput, -cfquery for use with Cold Fusion. You can add inline and -blocklevel elements but as yet you can't add empty elements -(similar to br or hr) or to change the content model for the -table, ul, ol and dl elements. Note that the indent option -applies to new elements in the same way as it does for built-in -elements. Tidy will accept the following:

- -
-<cfquery name="MyQuery" datasource="Customer">
- select CustomerName from foo where x > 1
-</cfquery>
-
-<cfoutput query="MyQuery">
-  <table>
-    <tr>
-    <td>#CustomerName#</TD>
-    </tr>
-  </table>
-</cfoutput>
-
- -

but the next example won't since you can't as yet -modify the content model for the table element:

- -
-<cfquery name="MyQuery" datasource="Customer">
- select CustomerName from foo where x > 1
-</cfquery>
-
-<table>
-  <cfoutput query="MyQuery">
-    <tr>
-    <td>#CustomerName#</TD>
-    </tr>
-  </cfoutput>
-</table>
-
- -

I have been studying richer ways to support modular extensions -to html using assertions and a generalization of regular -expressions to trees. This work has led to a tool for generating -DTDs named dtdgen and I am in the process of creating a -further tool for verification. More information is available in -my note on Assertion -Grammars. Please contact me if you are interested in helping -with this work.

- -

David Fallon is interested in using Tidy to dynamically repair -markup in an HTML editor as people type. My recommendation is to -take advantage of the tables in tags.c and attrs.c for this, and -to defer the application of the full range of heuristics until -saving to disk or when explicitly requested. The CM_OPT -property in the tags table indicates that the end tag is -optional, while CM_EMPTY indicates that an element is -empty, i.e. has no content.

- -

Betsy Miller reports: I tried printing the HTML Tidy page -for a class I am teaching tomorrow on HTML, and everything in the -"green" style (all of the examples) print in the smallest font I -have ever seen (in fact they look like tiny little horizontal -lines). Any explanation?.

- -

Yes. This is a problem with Internet Explorer and Style -Sheets. The Tidy page includes a CSS style sheet that tries to -make the size of the font used for the examples 80% smaller than -for normal text. Internet Explorer gets this wrong, picking a -very much smaller font. I am hoping this bug is fixed in the IE -5.0 release. I have changed the style sheet to work around -this.

- -

Francisco Guardiola writes that Tidy wasn't fixing frameset -documents with body elements unenclosed in noframes elements. Now -fixed. Frederik Fouvry found that comments after the html end tag -generated a warning for content after body. I can't reproduce -this symptom and assume it was fixed in an earlier release.

- -

Indrek Toom wants to know how to format tables so that tr -elements indent their content, but td tags do not. The solution -is to use indent: auto. Jelks Cabaniss noted that the -clean option created style rules with tag names in uppercase, -which would cause problems for Extensible HTML (xhtml). This -prompted me to overhaul Tidy to switch to lower case for the tag -tables and literals. I have adopted Jelks' suggestion for adding -support for a doctype property in config files. This supports -omit, auto, strict, loose or a string specifying the fpi -(formal public identifier).

- -

Johannes Koch notes that Tidy doesn't fix up the doctype -correctly when bursting to slides. He says that if a document -contains the HTML 4.0 strict DT declaration, then the slides also -include the same strict DT declaration, but also contain the -center tag which does not appear in the strict DTD. I have -applied a simple work around, which is to remove the original -doctype when bursting to slides.

- -

I have extended the support for the ASP preprocessing syntax -to cope with the use of ASP within tags for attributes. I have -also added a new option wrap-asp to the config file -support to allow you to turn off wrapping within ASP code. Thanks -to Ken Cox for this idea.

- -

Larry Virden asked for a compile-time option for setting the -config file, he says "The reason it would be useful is to be able -to define a set of commonly used additional tags. For instance, -our site is starting to use a lot of ColdFusion. I would love to -be able to put the CF tags into a site wide file so that users of -tidy automatically get them defined". You can now do this by -defining CONFIG_FILE in platform.h

- -

Loïc Trégan asks: Is there a way to generate a -"light" xml, with no "<!DOCTYPE...>" and "xlmns=..."? I -have tweaked the code to allow the doctype property to apply when -outputting XML, and added a new property "add-xml-pi" to control -whether an <?xml?> processing instruction is added or not. -To generate a minimal XML document, you can set the xml-out -property to yes, the doctype and add-xml-pi property to no.

- -

Marc Jauvin has been using Windows Application to generate Web -pages and found that some of them generate very "non-portable" -HTML. One of the problems that is often introduced is the use of -"\" in URLs instead of "/" which confuses Unix Web servers. To -deal with this I have introduced the "fix-backslash" property. -This has been set by default to yes, but can be set to no if that -causes problems.
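
The effect of the fix-backslash property on a single URL can be illustrated as below. This is a Python sketch with a hypothetical function name; it says nothing about which attributes Tidy applies the fix to.

```python
def fix_backslash(url):
    """Replace every backslash in a URL with a forward slash."""
    return url.replace("\\", "/")
```

So a reference such as `images\logo.gif` would be written out as `images/logo.gif`.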

- -

The new property indent-attributes when set to yes -places each attribute on a new line. Note that the attributes are -only indented one space. Paul Ossenbruggen asked for something -slightly different, where the second and subsequent attributes -start on a new line and are indented to line up under the first -attribute. That proved to involve rather more work to implement -than I have time for right now. I plan to work some more on this -for a future release.

- -

Peter Jeremy reported that when an error file is specified to -tidy (-f file), the error file is opened for every HTML file -specified on the command line, but not closed until all HTML -files have been processed. If a large number of files are -specified on the command line (e.g. processing the FreeBSD -handbook), this can overflow the process or system file -descriptor table. I have now fixed this so that the error file is -only opened once.

- -

Rafi Stern notes: I have entered output-xml: yes in my config -file, not output-xhtml. Tidy second guesses me and adds the xmlns -attribute for XHTML at the head of my file, which I then have to -remove as this interferes with my XSLT parser. Fixed along with -the other bugs reported by Rafi.

- -

Steffen Ullrich and Andy Quick both spotted a problem with -attribute values consisting of an empty string, e.g. -alt="". This was caused by bugs in tidy.c and in -lexer.c, both now fixed. Jussi Vestman noted Tidy had problems -with hr elements within headings. This appears to be an old bug -that came back to life! Now fixed. Jussi also asked for a config -file option for fixing URLs where non-conforming tools have used -backslash instead of forward slash.

- -

An example from Thomas Wolff led me to the idea of -inserting the appropriate container elements for naked list items -when these appear in block level elements. At the same time I -have fixed a bug in the table code to infer implicit table rows -for text occurring within row group elements such as thead and -tbody. An example sent in by Steve Lee allowed me to pinpoint an -endless loop when a head or body element is unexpectedly found in -a table cell.

- -

15th April 1999

- -

Another minor release. Jacob Sparre Andersen reports a bug -with &quot; in attribute values. Now fixed. Francisco -Guardiola reports problems when a body element follows the -frameset end tag. I have fixed this with a patch to ParseHTML, -ParseNoFrames and ParseFrameset in parser.c. Chris Nappin wrote in -with the suggestion for a config file option for enabling -wrapping script attributes within embedded string literals. You -can now do this using "wrap-script-strings: yes".

- -

14th April 1999

- -

Added check for Asp tags on line 2674 in parser.c so that Asp -tags are not forcibly moved inside an HTML element. My thanks to -Stuart Updegrave for this. Fixed problem with & entities. -Bede McCall spotted that &amp; was being written out as -&amp;amp;. The fix alters ParseEntity() in lexer.c

- -

12th April 1999

- -

Added a missing "else" on line 241 in config.c (thanks to -Keith Blakemore-Noble for spotting this). Added config.c and .o -to the Makefile (an oversight in the release on the 8th -April).

- -

8th April 1999

- -

Localization:

- -

All the message text is now defined in localize.c which should -make it a tad easier to localize Tidy for different -languages.

- -

Config file support:

- -

I have added support for configuring tidy via a configuration -file. The new code is in config.h which provides a table driven -parser for RFC822 style headers. The new command line option --config <filename> can be used to identify the config file. -The environment variable "HTML_TIDY" may be used to name the -config file. If defined, it is parsed before scanning the command -line. You are advised to use an absolute path for the variable to -avoid problems when running tidy in different directories.
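
The "name: value" format and the order of precedence can be sketched as follows. This is a simplified Python illustration, not Tidy's table driven parser: it assumes one option per line, ignores RFC822 continuation lines, and the function names parse_config and load_config are hypothetical.

```python
import os


def parse_config(text):
    """Parse 'name: value' lines into a dict; lines without a name are skipped."""
    options = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            options[name.strip().lower()] = value.strip()
    return options


def load_config(cli_config_path=None, environ=os.environ):
    """Read the HTML_TIDY file first, then the file named on the command line.

    Later files override earlier ones, matching the statement above that the
    environment variable is handled before the command line is scanned.
    """
    options = {}
    for path in (environ.get("HTML_TIDY"), cli_config_path):
        if path:
            with open(path, encoding="utf-8") as handle:
                options.update(parse_config(handle.read()))
    return options
```

A file containing `indent: auto` and `wrap-asp: no` on separate lines would yield a dictionary with those two entries.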

- -

Allan Kuchinsky:

- -

Reports that the XML DOM parser by Eduard Derksen screws up on -&nbsp;, naked & and % in URLs as well as having problems with -newlines after the '=' before attribute values.

- -

I have tweaked PrintChar when generating XML to output &#160; -in place of &nbsp; and &amp; in place of &. In -general XHTML when parsed as well-formed XML shouldn't use named -entities other than those defined in XML 1.0. Note that this -isn't a problem if the parser uses the XHTML DTDs which import -the entity definitions.

- -

Allan Odgaard:

- -

When tidy encounters entities without a terminating semi-colon -(e.g. "&copy") then it correctly outputs "&copy;", but it -doesn't report an error.

- -

I have added a ReportEntityError procedure to localize.c and -updated ParseEntity to call this for missing semicolons and -unknown entities.
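
The two cases reported here, a missing semicolon and an unknown entity, can be illustrated with a small scanner. This is a Python sketch, not the ParseEntity or ReportEntityError code; the KNOWN set is a tiny illustrative subset of the HTML entity table and check_entities is a hypothetical name.

```python
import re

# A name after "&", optionally followed by the terminating semicolon.
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")
KNOWN = {"amp", "lt", "gt", "quot", "nbsp", "copy"}  # illustrative subset only


def check_entities(text):
    """Return (repaired_text, warnings) for named entities in `text`."""
    warnings = []

    def repair(match):
        name, semicolon = match.group(1), match.group(2)
        if name not in KNOWN:
            warnings.append(f"unknown entity &{name}")
            return match.group(0)  # leave unknown entities untouched
        if not semicolon:
            warnings.append(f"entity &{name} is missing its semicolon")
        return f"&{name};"

    return ENTITY.sub(repair, text), warnings
```

The repaired text always carries the semicolon for known entities, while the warnings list records what was wrong in the input.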

- -

Andreas Buchholz:

- -

Tidy warns if the table element's summary attribute is missing. This is incorrect for -HTML 3.2 which doesn't define this attribute.

- -

The summary attribute was introduced in HTML 4.0 as an aid for -accessibility. I have modified CheckTABLE to suppress the warning -when the document type explicitly designates the document as -being HTML 2.0 or HTML 3.2.

- -

Andy Brown:

- -

I have renamed the field from class to tag_class as "class" is -a reserved word in C++ with the goal of allowing tidy to be -compiled as C++ e.g. when part of a larger program.

- -

I have switched to Bool and the values yes and no to avoid -problems with detecting which compilers define bool and those -that don't.

- -

Andy would prefer a return code or C++ exception rather than -an exit. I have removed the calls to exit from pprint.c and used -a long jump from FatalError() back to main() followed by -returning 2. It should be easy to adapt this to generate a C++ -exception.

- -

Sometimes the prev links are inconsistent with next links. I -have fixed some tree operations which might have caused this. Let -me know if any inconsistencies remain.

- -

Ann Navarro:

- -

Would like to be able to use:

- -
-   tidy file.html | more
-
- -

to pause the screen output, and/or full output passing to file -as with

- -
-   tidy file.html > output.txt
-
- -

Tidy writes markup to stdout and errors to stderr. 'More' only -works for stdout so that the errors fly by. My compromise is to -write errors to stdout when the markup is suppressed using the -command line option -e or "markup: no" in the config file.

- -

html-kit@chamisplace.com

- -

Writes asking for a single output routine for Tidy. Acting on -his suggestion, I have added a new routine tidy_out() which -should make it easier to embed HTML Tidy in a GUI application -such as HTML-Kit. The new routine is in localize.c. All input -takes place via ReadCharFromStream() in tidy.c, excepting command -line arguments and the new config file mechanism.

- -

Chami also asks for single routines for initializing and -de-initializing Tidy, something that happens often from the GUI -environment of HTML-Kit. I have added InitTidy() and DeInitTidy() -in tidy.c to try to satisfy this need. Chami now supports an -online interface for Tidy at the URL:

- -
-   http://www.chamisplace.com/asp/hk.asp
-
- -

He further asks for Tidy to optionally output a length -parameter whenever possible. This could represent the length of -the element, attribute or code block related to the error. An -online validator could then highlight the starting and ending -columns which may be easier for beginners to understand, rather -than pointing to a single character column. I will investigate -this for a future release.

- -

Chang Hyun Baek:

- -

Reports a problem when generating XML using -iso2022. Tidy -inserts ?/p< rather than </p>. I tried Chang's test file -but it worked fine with </p> in all the right places. Please let me -know if this problem persists.

- -

Christian Ruetgers:

- -

When using -indent option Tidy emits a newline before which -alters the layout of some tables.

- -

I note that browsers aren't conforming to the SGML spec on -generally ignoring a newline immediately after start tags and -immediately before end tags. Netscape does this for pre elements -but not for other tags! My work around is to avoid additional -newlines for the content of th and td elements, except where -their content starts with a block level element. This kind of -thing is getting really hairy!

- -

Christian Pantel:

- -

Would like the servlet tag added to tidy. This looks very -similar to applet and used for preprocessing document content -before delivery. Servlet acts as a container for param elements -and fallback content to be shown if the server doesn't support -servlet. I have added it as a proprietary tag and parse it in the -same way as applet.

- -

Christian also reports that <td><hr/></td> -caused Tidy to discard the <hr/> element. I have fixed the -associated bug in ParseBlock.

- -

Chuck Baslock:

- -

Points out that an isolated & is converted to &amp; in -element content and in attribute values. This is in fact correct -and in agreement with the recommendations for HTML 2.0 -onwards.

- -

Craig Horman:

- -

Reports that Tidy loops indefinitely if a naked LI is found in -a table cell. I have patched ParseBlock to fix this, and now -successfully deal with naked list items appearing in table cells, -clothing them in a ul.

- -

Craig Johnson:

- -

Reports that Tidy gets confused by </comment> before the -doctype. This is apparently inserted by some authoring tool or -other. I have patched Tidy to safely recover from the -unrecognized and unexpected end tag without moving the parse -state into the head or body.

- -

Daniel Vogelheim:

- -

Asks for Tidy to recognize obsolete elements such as LISTING -and to replace them by more modern equivalents, in this case pre. -I have added code to issue a warning and replace such elements as -xmp, listing, plaintext by pre, and dir and menu by ul. Daniel -also asks for a means to suppressing warnings, i.e. to only -report errors. I have added the boolean "show-warnings" to the -config file support to deal with this and split off warnings to -ReportWarnings().
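
The replacement table described above can be sketched as a simple mapping. This Python illustration works on tag names in a string, whereas Tidy operates on its parse tree; it also ignores the special content rules of elements such as xmp and plaintext. The names REPLACEMENTS and replace_obsolete are hypothetical.

```python
import re

# Obsolete element -> modern equivalent, as listed in the note above.
REPLACEMENTS = {"xmp": "pre", "listing": "pre", "plaintext": "pre",
                "dir": "ul", "menu": "ul"}
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def replace_obsolete(html, show_warnings=True):
    """Return (new_html, warnings) with obsolete tag names swapped."""
    warnings = []

    def swap(match):
        slash, name = match.group(1), match.group(2).lower()
        new_name = REPLACEMENTS.get(name)
        if new_name is None:
            return match.group(0)
        if show_warnings and not slash:  # warn once per start tag
            warnings.append(f"replacing obsolete element <{name}> by <{new_name}>")
        return f"<{slash}{new_name}"

    return TAG.sub(swap, html), warnings
```

Passing show_warnings=False mimics the effect of the new boolean: the markup is still replaced but nothing is reported.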

- -

Dan Rudman:

- -

Would love a version of Tidy written in Java. This is a big -job. I am working on a completely new implementation of Tidy, -this time using an object-oriented approach but I don't expect to -have this done until later this year. DEFERRED

- -

David Brooke:

- -

Reports that when tidying an XML file with characters above 127 -Tidy is outputting the numeric entity followed by the character. -I have fixed this by a patch to PPrintChar() for XmlTags.

- -

David Getchell:

- -

Reports that Tidy thinks an ol list is HTML 4.0 when you use -the type attribute. I have fixed an error in attrs.c to correct -this feature to first appearing in HTML 3.2.

- -

Drew Adams:

- -

Reported problems when using comments to hide the contents of -script elements from ancient browsers. I wasn't able to reproduce -the problem, and guess I fixed it earlier.

- -

Drew also reported a problem which on further investigation is -caused by the very weird syntax for comments in SGML and XML. The -syntax for comments is really error prone:

- -
- <!--[text excluding --]--[[whitespace]*--[text excluding --]--]*>
-
- -

This means that <!----> is a complete comment but -<!------> is not since the parser is expecting a matching -terminating -- and as it doesn't find the -- it ploughs on and on -treating the rest of the markup as a comment unless it finds -another end comment. I have added a rule of thumb (a heuristic) -for detecting this situation. Basically I count the number of -comment groups without other characters and if the count is > -2 and a '>' is seen, a warning is generated.

- -

Drew goes on to comment on the -clean option. This made me -take another look at the relative font sizes I am using for the -absolute font sizes for 0 through 6. I have tweaked them to get a -reasonable match before/after applying -clean as viewed on NS4 -and IE4. Font size=3 is taken as the normal body font size and as -such the font element is silently dropped unless it also defines -a color.

- -

I have also added InlineStyle to deal with the cases where an -inline element has as its only child a font element. A further -possibility would be to promote style properties common to all -children of an element to the element. I will have to leave this -for future work.

- -

Drew asks why </ is not allowed in script content. The -answer is that SGML treats </ as delimiting the end of CDATA -element content, so that it ends prematurely before the -</script> end tag. Browsers tend not to follow the SGML -standard in this respect, but Tidy is designed to help you do -so.

- -

Guus Goos:

- -

Notes that tidy *.html doesn't work under DOS. This is because -DOS unlike Unix doesn't expand names with wildcards to the list -of matching file names. This is a right nuisance and one more -reason why Linux is gaining popularity. I plan to provide a work -around in a future release of Tidy. Are there any free drop-in -replacements for the DOS shell that fix this problem?

- -

Jack Horsfield:

- -

Like a number of others would like list items and table cells -to be output compactly where possible. I have added a flag to -avoid indentation of content to tags.c that avoids further -indentation when the content is inline, e.g.

- -
- <ul>
-   <li>some text</li>
-   <li>
-     <p>
-        a new paragraph
-     </p>
-   </li>
- </ul>
-
- -

This behavior is enabled via "smart-indent: yes" and overrides -"indent: no". Use "indent-spaces: 5" to set the number of spaces -used for each level of indentation.

- -

Jeff Young:

- -

Has a few suggestions that will make Tidy work with XSL. -Thanks, I have incorporated all of them into the new release.

- -

Jelks Cabaniss:

- -

Reports that Tidy thinks the end tag is missing if the -script element has no content. I have patched ParseScript to fix -this. Jelks also asks for a way to ask Tidy to hide the contents -of script and style elements; a way to avoid promoting inline -styles with -clean to style rules as a work around for a bug in -IE for URLs with relative URLs; finally, a way to avoid empty -elements being discarded, especially if they define an ID for -scripting. Very reasonable, but I would prefer to leave these to a -future release. (This release is big enough right now!).

- -

One thing I can satisfy right away is a mailing list for Tidy. -html-tidy@w3.org has been created for discussing Tidy and I have -placed the details for subscribing and accessing the Web archive -on the Tidy overview page.

- -

Johannes Koch:

- -

Reports that Tidy isn't quite right about when it reports the -doctype as inconsistent or not. I have tweaked HTMLVersion() to -fix this. Let me know if any further problems arise.

- -

John Tobler:

- -

Wants to know how to get Tidy to preserve his explicit -entities e.g. &quot; and &nbsp;. Currently Tidy interprets all -entities as character values and as such has no way to -distinguish whether these were derived from entities or not. To -help John with this release you can use "quote-marks: yes" in the -config file if you want all " marks to appear as &quot; and -"quote-nbsp: yes" if you want non-breaking spaces to be shown as -entities. Note that for XML in general &nbsp; is not predeclared, -so you should also use "numeric-entities: yes". This doesn't -apply to XHTML though.

- -

John also reports that the weirdly complex URLs using the -javascript: scheme as used by www.bookmarklets.com can cause Tidy -indigestion. I have made Tidy aware of which attributes are using -Javascript and disabled the missing quote mark heuristic for -these. I have also tweaked the way unknown entities are reported -to say that the markup may contain unescaped ampersands.

- -

Mathew Cepl:

- -

Notes that dir and menu are deprecated and not allowed in -HTML4 strict. I have updated the entry in the tags table for -these two. I also now coerce them automatically to ul when -clean -is set.

- -

Maurice Buxton:

- -

Reports that some implementations of gcc don't work with the -current compiler directive Tidy uses to avoid duplicate typedefs -for uint and ulong. I don't have a truly platform independent -solution for this, so you may need to edit platform.h if the code -doesn't compile out of the box on your platform.

- -

Osma Ahvenlampi:

- -

Found that Tidy is confused by map elements in the head. Tidy -knows that map is only allowed in the body and thinks the author -has left out the <body> start tag. Thereafter elements which it knows only belong in -the head are moved to the head, so things should work out ok. -Osma also reports having difficulties with non-breaking spaces, -but I was unable to reproduce these with the new release of Tidy, -so perhaps the problems have been fixed.

- -

Paul Ward:

- -

Reports that Tidy caused JavaScript errors when it introduced -linebreaks in JavaScript attributes. Tidy goes to some efforts to -avoid this and I am interested in any reports of further problems -with the new release.

- -

Rafi Stern:

- -

Would like Tidy to warn when a tag has an extra quote mark, as -in <a href="xxxxxx"">. I have patched ParseAttribute to do -this.

- -

Rene Fritz:

- -

Reported a space being inserted at the end of lines when the -text is wrapped at the start of hypertext links. This isn't -occurring with this release, so I guess the problem was solved a -while back. Rene also suggests that Tidy could be used to add and -remove metadata and attributes etc. for a group of files, e.g. to -add a link to a style sheet or to assert attribution. This sounds -like a good idea for work in the future.

- -

Shane McCarron:

- -

Reports that Tidy sometimes wraps text within markup that -occurs in the context of a pre element. I am only able to repeat -this when the markup wraps within start tags, e.g. between -attribute values. This is perfectly legitimate and doesn't affect -rendering.

- -

Steven Lobo:

- -

Notes that Tidy doesn't remove entities such as &nbsp; or -&copy; which aren't defined by XML 1.0. That is true - these -entities are fine if you are using XHTML. If you want to -generate generic XML then you need to use the -n option or to set -"numeric-entities: yes" in the config file. This will then output -all such entities in their numeric form or as direct character -values according to the character encoding flags.

- -

Steven Pemberton:

- -

Comments that he would like Tidy to replace naked & in -URLs by &amp;. You can now use "quote-ampersands: yes" in the -config file to ensure this. Note that this is always done when -outputting to XML where naked '&' characters are illegal.
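
What counts as a naked ampersand can be made concrete with a regular expression. This is a Python sketch with hypothetical names, not Tidy's implementation: it treats an ampersand as naked unless it already starts a named or numeric character reference that ends in a semicolon.

```python
import re

# "&" not followed by a complete decimal, hexadecimal, or named reference.
NAKED_AMP = re.compile(
    r"&(?!#[0-9]+;|#[xX][0-9A-Fa-f]+;|[A-Za-z][A-Za-z0-9]*;)"
)


def quote_ampersands(value):
    """Escape naked ampersands, leaving existing references alone."""
    return NAKED_AMP.sub("&amp;", value)
```

Because existing references are left alone, running the function twice gives the same result as running it once.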

- -

Steven also asks for a way to allow Tidy to proceed after -finding unknown elements. The issue is how to parse them, e.g. to -treat them as inline or block level elements? The latter would -terminate the current paragraph whereas the former would not.

- -

If treated as inline, presumably, unknown tags should be -treated specially, for instance, normal inline end tags close the -currently open inline element, but this doesn't feel right for -unknown tags. What should the content model for unknown tags be - -flow? Again it's far from obvious. One way to avoid these -difficulties would be to provide a means for authors to declare -unknown tags in the config file.

- -

You can now declare new inline and block-level tags in the -config file, e.g.:

- -
-define-inline-tags: foo, bar
-define-blocklevel-tags: blob
-
- -

The content model for new tags allows for block or inline -content. Steven further comments that some authors use ul without -an li to indent content. Tidy currently coerces these to wrap the -content within an li which alters the rendering. He suggests -using blockquote instead. I have done this, and if you use the --clean option at the same time, it gets replaced by a div element -with a class and style rule for indenting the content.

- -

Stuart Updegrave:

- -

Would like to be able to coerce attributes to uppercase. I -have added support for "uppercase-attributes: yes" for this. -Stuart also asks for Tidy to support Microsoft's ASP tags. These -are part of Microsoft's server-side scripting model (similar to -CGI). I have treated ASP tags in the same way as processing -instructions, and they don't affect the version of HTML as they -are assumed to have been interpreted before delivery to the -client.

- -

Stuart is also interested in having Tidy reading from and -writing back to the Windows clipboard. This sounds interesting -but I have to leave this to a future release.

- -

Terry Cassidy:

- -

Points out that Tidy doesn't like "top" or "bottom" for the -align attribute on the caption element. I have added a new -routine to check the align attribute for the caption element and -cleaned up the code for checking the document type.

- -

Xavier Plantefeve:

- -

Suggests that I should ensure that the options are self -consistent, e.g. if -asxml is set, then this should imply lower -case and override any instruction to omit optional end tags. -Accordingly, I have introduced a new routine AdjustConfig() that -is applied after reading the command line and config files and -before tidying any files.

- -

Xavier wonders whether name attributes should be replaced or -supplemented by id attributes when translating HTML anchors to -XHTML. This is something I am thinking about for a future release -along with supplementing lang attributes by xml:lang -attributes.

- -

Zdenek Kabelac:

- -

Asks for headings and paragraphs to be treated specially when -other tags are indented. I have dealt with this via the new -smart-indent mechanism.

- -

22nd February 1999

- -

Tidy can now fix up XML empty tags for which the attribute -values are unquoted, e.g. <br clear=all/>. Care is taken to -avoid this being applied to tags with URLs, e.g. <a -href=http://acme.com/> where the / is part of the attribute -value and doesn't signify an empty tag. Authors are advised to -always quote attribute values to avoid such problems!

- -

22nd January 1999

- -

Tidy no longer complains about a missing </tr> before a -<tbody>. Added link to a free win32 GUI for -tidy.

- -

11th January 1999

- -

Added a link to the OS/2 distribution of Tidy made available -by Kaz SHiMZ. No changes to Tidy's source code.

- -

7th January 1999

- -

Fixed bug in ParseBlock that resulted in nested table -cells.

- -

Fixed clean.c to add the style property "text-align:" rather -than "align:".

- -

Disabled line wrapping within HTML alt, content and value -attribute values. Wrapping will still occur when output as -XML.

- -

16th December 1998

- -

This release fixes a problem with missing quotemarks in -attribute values introduced in the December 14th release. It also -fixes problems with parsing tables when the table cells include -naked list items and when unexpected end tags are encountered for -td and tr cells. Warnings are now generated for unknown entities -(those not defined by HTML 4.0). It may be worth thinking about a -new option to determine how to handle these, especially for -XML.

- -

14th December 1998

- -

Rewrote parser for elements with CDATA content to fix problems -with tags in script content.

- -

New pretty printer for XML mode. I have also modified the XML -parser to recognize xml:space attributes appropriately. I have -yet to add support for CDATA marked sections though.

- -

script and noscript are now allowed in inline content.

- -

To make it easier to drive tidy from scripts, it now returns 2 -if any errors are found, 1 if any warnings are found, otherwise -it returns 0. Note tidy doesn't generate the cleaned up markup if -it finds errors other than warnings.

- -

Fixed bug causing the column to be reported incorrectly when -there are inline tags early on the same line.

- -

Added -numeric option to force character entities to be -written as numeric rather than as named character entities. -Hexadecimal character entities are never generated since Netscape -4 doesn't support them.

- -

Entities which aren't part of HTML 4.0 are now passed through -unchanged, e.g. &precompiler-entity; This means that an -isolated & will be passed through unchanged since there is no -way to distinguish this from an unknown entity.

- -

Tidy now detects malformed comments, where something other -than whitespace or '--' is found when '>' is expected at the -end of a comment.

- -

The <br> tags are now positioned at the start of a blank -line to make their presence easier to spot.

- -

The -asxml mode now inserts the appropriate Voyager html -namespace on the html element and strips the doctype. The html -namespace will be usable for rigorous validation as soon as W3C -finishes work on formalizing the definition of document profiles, -see: WD-html-in-xml.

- -

13th November 1998 and earlier releases

- -

Fixed bug wherein <style type=text/css> was written -out as <style type="text/ss">.

- -

Tidy now handles wrapping of attributes containing JavaScript -text strings, inserting the line continuation marker as needed, -for instance:

- -
-onmouseover="window.status='Mission Statement, \
-Our goals and why they matter.'; return true"
-
- -

You can now set the wrap margin with the -wrap option.

- -

When the output is XML, tidy now ensures the content starts -with <?xml version="1.0"?>.

- -

The Document type for HTML 2.0 is now "-//IETF//DTD HTML -2.0//". In previous versions of tidy, it was incorrectly set to -"-//W3C//DTD HTML 2.0//".

- -

When using the -clean option isolated FONT elements are now -mapped to SPAN elements. Previously these FONT elements were -simply dropped.

- -

NOFRAMES now works fine with BODY element in frameset -documents.

- - - diff --git a/htmldoc/tidy.gif b/htmldoc/tidy.gif deleted file mode 100644 index a5edeb2..0000000 Binary files a/htmldoc/tidy.gif and /dev/null differ diff --git a/index.html b/index.html new file mode 100644 index 0000000..62d688b --- /dev/null +++ b/index.html @@ -0,0 +1,562 @@ + + +HTML Tidy for HTML5 (experimental) + + +

HTML Tidy for HTML5 (experimental)

+

This page documents the experimental HTML5 fork of HTML Tidy available +at +https://github.com/w3c/tidy-html5. + +

File bug reports and enhancement requests at +https://github.com/w3c/tidy-html5/issues.

+ +

The W3C public mailing list for HTML Tidy discussion is +html-tidy@w3.org (list archive). + +

For more information on HTML5:

+ +

+Validate your HTML documents using the +W3C Nu Markup Validator. + +

What Tidy does

+

Tidy corrects and cleans up HTML content by fixing markup errors. +Here are a few examples: +

+ +

How to run Tidy from the command line

+

This is the syntax for invoking Tidy from the command line: +

+   tidy [[options] filename]*
+
+

+Tidy defaults to reading from standard input, so if you run Tidy without +specifying the filename argument, it will just sit +there waiting for input to read. +Tidy also defaults to writing to standard output, so you can pipe output +from Tidy to other programs, as well as pipe output from other programs to +Tidy. You can page through the output from Tidy by piping it to a pager:

+
+   tidy file.html | less
+
+

+To have Tidy write its output to a file instead, either use the +-o filename or -output filename +option, or redirect standard output to the file; for example: +

+   tidy -o output.html index.html
+   tidy index.html > output.html
+
+

Both of those run tidy on the file index.html and write the +output to the file output.html, while writing any error messages to +standard error. +

+Tidy defaults to writing its error messages to standard error (that is, to +the console where you’re running Tidy). To page through the error messages, +along with the output, redirect standard error to standard output, and pipe +it to your pager: +

+   tidy index.html 2>&1 | less
+
+

+To have Tidy write the errors to a file instead, either use the +-f filename or -file filename +option, or redirect standard error to a file:

+
+   tidy -o output.html -f errs.txt index.html
+   tidy index.html > output.html 2> errs.txt 
+
+

Both of those run tidy on the file index.html and write the +output to the file output.html, while writing any error messages to +the file errs.txt. +

+Writing the error messages to a file is especially useful if the file you +are checking has many errors; reading them from a file instead of the +console or pager can make it easier to review them. +

You can use the -m or -modify option to +modify (in-place) the contents of the input file you are checking; that is, +to overwrite those contents with the output from Tidy. Example: +

+   tidy -f errs.txt -m index.html
+
+

That runs tidy on the file index.html, modifying it in place +and writing the error messages to the file errs.txt. +

+Caution: If you use the -m option, you should first save a copy of your file. +
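As a rough sketch of driving these options from a script, the Python below assembles an argument list from the -o, -f, and -m options described above. The function names build_tidy_command and run_tidy are made up for illustration, and actually running the command assumes a tidy executable is available on your PATH.

```python
import subprocess
from typing import List, Optional


def build_tidy_command(
    input_file: str,
    output_file: Optional[str] = None,
    error_file: Optional[str] = None,
    modify: bool = False,
) -> List[str]:
    """Assemble a tidy argument list (hypothetical helper, not part of Tidy)."""
    command = ["tidy"]
    if output_file is not None:
        # -o filename: write the cleaned-up markup to a file
        command += ["-o", output_file]
    if error_file is not None:
        # -f filename: write error messages to a file
        command += ["-f", error_file]
    if modify:
        # -m: overwrite the input file in place (keep a backup first)
        command.append("-m")
    command.append(input_file)
    return command


def run_tidy(command: List[str]) -> int:
    """Run the command and return Tidy's exit status; needs tidy on PATH."""
    return subprocess.run(command, check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.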

Options and configuration settings

+

To get a list of available options, use:

+
+   tidy -help
+
+

To get a list of all configuration settings, use:

+
+   tidy -help-config
+
+

To read the help output a page at a time, pipe it to a pager: +

+   tidy -help | less
+   tidy -help-config | less
+
+

Single-letter options other than -f may be combined; for example: +

+  tidy -f errs.txt -imu foo.html
+
+ +

Using a config file

+

The most convenient way to configure Tidy is by using a separate +config file. +Assuming you have created a +Tidy config file named config.txt (the name doesn't matter), you can +instruct Tidy to use it via the command line option +-config config.txt; for example: +

+   tidy -config config.txt file1.html file2.html
+
+

Alternatively, you can name the default config file via the +environment variable named HTML_TIDY, the value of which is +the absolute path for the config file. +

You can also set config options on the command line by preceding +the name of the option immediately (no intervening space) with the string "--"; +for example:

+
+  tidy --break-before-br true --show-warnings false
+
+

You can find documentation for the full set of configuration options +on the +Quick Reference +page. + +
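If you generate Tidy invocations from a script, the "--" form for setting config options on the command line is easy to produce mechanically. The sketch below is only an illustration: the helper name config_args is hypothetical, and rendering booleans as true/false simply follows the command-line example in this section.

```python
from typing import Dict, List, Union

Setting = Union[bool, int, str]


def config_args(settings: Dict[str, Setting]) -> List[str]:
    """Turn a dict of config settings into "--name value" arguments.

    Hypothetical helper; booleans are written as true/false, matching
    the command-line example in this section.
    """
    args: List[str] = []
    for name, value in settings.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        # The option name follows "--" immediately, with no intervening space.
        args += ["--" + name, text]
    return args
```

The resulting list can be placed between the program name and the input file names when building the full command.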

Sample config file

+

The following is an example of a Tidy config file.

+
+// sample config file for HTML tidy
+indent: auto
+indent-spaces: 2
+wrap: 72
+markup: yes
+output-xml: no
+input-xml: no
+show-warnings: yes
+numeric-entities: yes
+quote-marks: yes
+quote-nbsp: yes
+quote-ampersand: no
+break-before-br: no
+uppercase-tags: no
+uppercase-attributes: no
+char-encoding: latin1
+new-inline-tags: cfif, cfelse, math, mroot, 
+  mrow, mi, mn, mo, msqrt, mfrac, msubsup, munderover,
+  munder, mover, mmultiscripts, msup, msub, mtext,
+  mprescripts, mtable, mtr, mtd, mth
+new-blocklevel-tags: cfoutput, cfquery
+new-empty-tags: cfelse
+
+ +
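To inspect a config file like the sample above from a script, a small reader is enough. The Python sketch below is not Tidy's own parser: it assumes only what the sample shows, namely "name: value" lines, comment lines starting with "//", and indented lines that continue the previous value. The function name parse_tidy_config is hypothetical.

```python
from typing import Dict, Optional


def parse_tidy_config(text: str) -> Dict[str, str]:
    """Read "name: value" settings from config text (illustrative only)."""
    settings: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        # Skip blank lines and // comment lines.
        if not stripped or stripped.startswith("//"):
            continue
        # Assumption: an indented line continues the previous setting.
        if raw[0].isspace() and current is not None:
            settings[current] += " " + stripped
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError("expected 'name: value', got: " + line)
        current = name.strip()
        settings[current] = value.strip()
    return settings
```

All values are kept as strings; interpreting yes/no or numeric values is left to the caller.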

Indenting output for readability

+

Indenting the source markup of an HTML document makes the markup easier +to read. Tidy can indent the markup for an HTML document while recognizing +elements whose contents should not be indented. In the example below, Tidy +indents the output while preserving the formatting of the <pre> +element:

+

Input:

+
+ <html>
+ <head>
+ <title>Test document</title>
+ </head>
+ <body>
+ <p>This example shows how Tidy can indent output while preserving
+ formatting of particular elements.</p>
+ 
+ <pre>This is
+ <em>genuine
+       preformatted</em>
+    text
+ </pre>
+ </body>
+ </html>
+ 
+
+

Output:

+
+<html>
+  <head>
+    <title>Test document</title>
+  </head>
+
+  <body>
+    <p>This example shows how Tidy can indent output while preserving
+    formatting of particular elements.</p>
+<pre>
+This is
+<em>genuine
+       preformatted</em>
+   text
+</pre>
+  </body>
+</html>
+
+

Tidy’s indenting behavior is not perfect and can sometimes cause your +output to be rendered by browsers in a different way than the input. +You can avoid unexpected indenting-related rendering problems by setting +indent: no or indent: auto in a config file.

+ +

Preserving original indenting not possible

+

Tidy is not capable of preserving the original indenting of the markup +from the input it receives. That’s because Tidy starts by building a clean +parse tree from the input, and that parse tree doesn’t contain any +information about the original indenting. Tidy then pretty-prints the parse +tree using the current config settings. Trying to preserve the original +indenting from the input would interact badly with the repair operations +needed to build a clean parse tree, and would considerably complicate the +code.

+ +

Encodings and character references

+

+Tidy defaults to assuming you want output to be encoded in UTF-8. +But Tidy offers you a choice of other character encodings: US ASCII, ISO +Latin-1, and the ISO 2022 family of 7 bit encodings. +

+Tidy doesn't yet recognize the use of the HTML <meta> element for +specifying the character encoding.

+

+The full set of HTML character references is defined. Cleaned-up output +uses named character references for characters when appropriate. Otherwise, +characters outside the normal range are output as numeric character +references. + +

Accessibility

+

Tidy offers advice on potential accessibility problems for people using +non-graphical browsers. + +

Cleaning up presentational markup

+

Some tools generate HTML with presentational elements such as <font>, +<nobr>, and <center>. +Tidy's -clean option will replace those elements with CSS style +properties. +

Some HTML documents rely on the presentational effects of <p> start +tags that are not followed by any content. Tidy deletes such <p> tags +(as well as any headings that don’t have content). So do not use <p> +tags simply for adding vertical whitespace; instead use CSS, or the +<br> element. However, note that Tidy won’t discard <p> tags that +are followed by any nonbreaking space (that is, the &nbsp; named +character reference). + +

Teaching Tidy about new tags

+

You can teach Tidy about new tags by declaring them in the +configuration file; the syntax is:

+
+  new-inline-tags: tag1, tag2, tag3
+  new-empty-tags: tag1, tag2, tag3
+  new-blocklevel-tags: tag1, tag2, tag3
+  new-pre-tags: tag1, tag2, tag3
+
+

The same tag can be defined as empty and as inline or as empty +and as block.

+

These declarations can be combined to define a new empty +inline or empty block element. However, declaring a tag +as both inline and block is not advised.

+

Note that the new tags can only appear where Tidy expects inline +or block-level tags respectively. That means you can’t place +new tags within the document head or other contexts with restricted +content models. + +

Ignoring PHP, ASP, and JSTE instructions

+

Tidy will gracefully ignore many cases of PHP, ASP, and JSTE +instructions within element content and as replacements for attributes, +and preserve them as-is in output; for example:

+
+  <option <% if rsSchool.Fields("ID").Value
+    = session("sessSchoolID")
+    then Response.Write("selected") %>
+    value='<%=rsSchool.Fields("ID").Value%>'>
+    <%=rsSchool.Fields("Name").Value%>
+    (<%=rsSchool.Fields("ID").Value%>)
+  </option>
+
+

But note that Tidy may report missing attributes when those are “hidden” +within the PHP, ASP, or JSTE code. If you use PHP, ASP, or JSTE code to +create a start tag, but place the end tag explicitly in the HTML markup, Tidy +won’t be able to match them up, and will delete the end tag. So in that +case you are advised to make the start tag explicit and to use PHP, ASP, or +JSTE code for just the attributes; for example:

+
+   <a href="<%=random.site()%>">do you feel lucky?</a>
+
+

+Tidy can also get things wrong if the PHP, ASP, or JSTE code includes +quotation marks; for example: +

+
+    value="<%=rsSchool.Fields("ID").Value%>"
+
+

Tidy will see the quotation mark preceding ID as ending the +attribute value, and proceed to complain about what follows. +

Tidy allows you to control whether line wrapping on spaces within +PHP, ASP, and JSTE +instructions is enabled; see the wrap-php, wrap-asp, +and wrap-jste config options.

+ +

Correcting well-formedness errors in XML markup

+

Tidy can help you to correct well-formedness errors in XML markup. Tidy +doesn't yet recognize all XML features, though; for example, it doesn't +understand CDATA sections or DTD subsets.

+ +

Using Tidy from scripts

+

If you want to run Tidy from Perl or another scripting language +you may find it of value to inspect the result returned by Tidy +when it exits: 0 if everything is fine, 1 if there were warnings +and 2 if there were errors. This is an example using Perl:

+
+if (close(TIDY) == 0) {
+  my $exitcode = $? >> 8;
+  if ($exitcode == 1) {
+    printf STDERR "tidy issued warning messages\n";
+  } elsif ($exitcode == 2) {
+    printf STDERR "tidy issued error messages\n";
+  } else {
+    die "tidy exited with code: $exitcode\n";
+  }
+} else {
+  printf STDERR "tidy detected no errors\n";
+}
+
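The same exit-status check can be sketched in Python. The function name describe_tidy_status is hypothetical; the mapping of 0, 1, and 2 is taken from the description above, and any other code is treated as unexpected, as in the Perl example.

```python
def describe_tidy_status(exit_code: int) -> str:
    """Map Tidy's documented exit status to a message (illustrative helper)."""
    if exit_code == 0:
        return "tidy detected no errors"
    if exit_code == 1:
        return "tidy issued warning messages"
    if exit_code == 2:
        return "tidy issued error messages"
    # Anything else is outside the documented range.
    raise ValueError("tidy exited with code: %d" % exit_code)


# Example use, assuming a tidy executable is on PATH:
#   import subprocess
#   status = subprocess.run(["tidy", "-o", "output.html", "index.html"]).returncode
#   print(describe_tidy_status(status))
```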
+ +

Source code

+

The source code for the experimental HTML5 fork of Tidy can be found at +https://github.com/w3c/tidy-html5. + +

Acknowledgements

+

Dave Raggett has a list of +Acknowledgements +for people who made suggestions or reported bugs for the +original version of Tidy. + +

    +
  1. What Tidy does +
  2. How to run Tidy from the command line +
  3. Options and configuration settings +
  4. Using a config file +
  5. Sample config file +
  6. Indenting output for readability +
  7. Preserving original indenting not possible +
  8. Encodings and character references +
  9. Accessibility +
  10. Cleaning up presentational markup +
  11. Teaching Tidy about new tags +
  12. Ignoring PHP, ASP, and JSTE instructions +
  13. Correcting well-formedness errors in XML markup +
  14. Using Tidy from scripts +
  15. Source code +
  16. Acknowledgements +
+