tidy-html5/src/tidylib.c

/* tidylib.c -- internal library definitions
(c) 1998-2008 (W3C) MIT, ERCIM, Keio University
See tidy.h for the copyright notice.
Defines HTML Tidy API implemented by tidy library.
Very rough initial cut for discussion purposes.
Public interface is const-correct and doesn't explicitly depend
on any globals. Thus, thread-safety may be introduced w/out
changing the interface.
Looking ahead to a C++ wrapper, C functions always pass
this-equivalent as 1st arg.
Created 2001-05-20 by Charles Reitzel
*/
#include <errno.h>
#include "tidy-int.h"
#include "parser.h"
#include "clean.h"
#include "gdoc.h"
#include "config.h"
#include "message.h"
#include "messageobj.h"
#include "pprint.h"
#include "entities.h"
#include "tmbstr.h"
#include "utf8.h"
#include "mappedio.h"
#include "language.h"
#ifdef TIDY_WIN32_MLANG_SUPPORT
#include "win32tc.h"
#endif
#if !defined(NDEBUG) && defined(_MSC_VER)
#include "sprtf.h"
#endif
/* Create/Destroy a Tidy "document" object */
static TidyDocImpl* tidyDocCreate( TidyAllocator *allocator );
static void tidyDocRelease( TidyDocImpl* impl );
static int tidyDocStatus( TidyDocImpl* impl );
/* Parse Markup */
static int tidyDocParseFile( TidyDocImpl* impl, ctmbstr htmlfil );
static int tidyDocParseStdin( TidyDocImpl* impl );
static int tidyDocParseString( TidyDocImpl* impl, ctmbstr content );
static int tidyDocParseBuffer( TidyDocImpl* impl, TidyBuffer* inbuf );
static int tidyDocParseSource( TidyDocImpl* impl, TidyInputSource* docIn );
/* Execute post-parse diagnostics and cleanup.
** Note, the order is important. You will get different
** results from the diagnostics depending on if they are run
** pre-or-post repair.
*/
static int tidyDocRunDiagnostics( TidyDocImpl* doc );
static void tidyDocReportDoctype( TidyDocImpl* doc );
static int tidyDocCleanAndRepair( TidyDocImpl* doc );
/* Save cleaned up file to file/buffer/sink */
static int tidyDocSaveFile( TidyDocImpl* impl, ctmbstr htmlfil );
static int tidyDocSaveStdout( TidyDocImpl* impl );
static int tidyDocSaveString( TidyDocImpl* impl, tmbstr buffer, uint* buflen );
static int tidyDocSaveBuffer( TidyDocImpl* impl, TidyBuffer* outbuf );
static int tidyDocSaveSink( TidyDocImpl* impl, TidyOutputSink* docOut );
static int tidyDocSaveStream( TidyDocImpl* impl, StreamOut* out );
#ifdef NEVER
TidyDocImpl* tidyDocToImpl( TidyDoc tdoc )
{
return (TidyDocImpl*) tdoc;
}
TidyDoc tidyImplToDoc( TidyDocImpl* impl )
{
return (TidyDoc) impl;
}
Node* tidyNodeToImpl( TidyNode tnod )
{
return (Node*) tnod;
}
TidyNode tidyImplToNode( Node* node )
{
return (TidyNode) node;
}
AttVal* tidyAttrToImpl( TidyAttr tattr )
{
return (AttVal*) tattr;
}
TidyAttr tidyImplToAttr( AttVal* attval )
{
return (TidyAttr) attval;
}
const TidyOptionImpl* tidyOptionToImpl( TidyOption topt )
{
return (const TidyOptionImpl*) topt;
}
TidyOption tidyImplToOption( const TidyOptionImpl* option )
{
return (TidyOption) option;
}
#endif
/* Tidy public interface
**
** Most functions return an integer:
**
** 0 -> SUCCESS
** >0 -> WARNING
** <0 -> ERROR
**
*/
TidyDoc TIDY_CALL tidyCreate(void)
{
TidyDocImpl* impl = tidyDocCreate( &TY_(g_default_allocator) );
return tidyImplToDoc( impl );
}
TidyDoc TIDY_CALL tidyCreateWithAllocator( TidyAllocator *allocator )
{
TidyDocImpl* impl = tidyDocCreate( allocator );
return tidyImplToDoc( impl );
}
void TIDY_CALL tidyRelease( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
tidyDocRelease( impl );
}
TidyDocImpl* tidyDocCreate( TidyAllocator *allocator )
{
TidyDocImpl* doc = (TidyDocImpl*)TidyAlloc( allocator, sizeof(TidyDocImpl) );
TidyClearMemory( doc, sizeof(*doc) );
doc->allocator = allocator;
TY_(InitMap)();
TY_(InitTags)( doc );
TY_(InitAttrs)( doc );
TY_(InitConfig)( doc );
TY_(InitPrintBuf)( doc );
/* By default, wire tidy messages to standard error.
** Document input will be set by parsing routines.
** Document output will be set by pretty print routines.
** Config input will be set by config parsing routines.
** But we need to start off with a way to report errors.
*/
doc->errout = TY_(StdErrOutput)();
return doc;
}
void tidyDocRelease( TidyDocImpl* doc )
{
/* doc in/out opened and closed by parse/print routines */
if ( doc )
{
assert( doc->docIn == NULL );
assert( doc->docOut == NULL );
TY_(ReleaseStreamOut)( doc, doc->errout );
doc->errout = NULL;
TY_(FreePrintBuf)( doc );
TY_(FreeNode)(doc, &doc->root);
TidyClearMemory(&doc->root, sizeof(Node));
if (doc->givenDoctype)
TidyDocFree(doc, doc->givenDoctype);
TY_(FreeConfig)( doc );
TY_(FreeAttrTable)( doc );
TY_(FreeAttrPriorityList)( doc );
TY_(FreeTags)( doc );
/*\
* Issue #186 - FreeNode now depends on the doctype, so the lexer is needed
* to determine which hash is to be used; therefore free the lexer last.
\*/
TY_(FreeLexer)( doc );
TidyDocFree( doc, doc );
}
}
/* Let application store a chunk of data w/ each Tidy instance.
** Useful for callbacks.
*/
void TIDY_CALL tidySetAppData( TidyDoc tdoc, void* appData )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
impl->appData = appData;
}
void* TIDY_CALL tidyGetAppData( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return impl->appData;
return NULL;
}
ctmbstr TIDY_CALL tidyReleaseDate(void)
{
return TY_(ReleaseDate)();
}
ctmbstr TIDY_CALL tidyLibraryVersion(void)
{
return TY_(tidyLibraryVersion)();
}
/* Get/set configuration options
*/
Bool TIDY_CALL tidySetOptionCallback( TidyDoc tdoc, TidyOptCallback pOptCallback )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
impl->pOptCallback = pOptCallback;
return yes;
}
return no;
}
Bool TIDY_CALL tidySetConfigCallback(TidyDoc tdoc, TidyConfigCallback pConfigCallback)
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
impl->pConfigCallback = pConfigCallback;
return yes;
}
return no;
}
int TIDY_CALL tidyLoadConfig( TidyDoc tdoc, ctmbstr cfgfil )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(ParseConfigFile)( impl, cfgfil );
return -EINVAL;
}
int TIDY_CALL tidyLoadConfigEnc( TidyDoc tdoc, ctmbstr cfgfil, ctmbstr charenc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(ParseConfigFileEnc)( impl, cfgfil, charenc );
return -EINVAL;
}
int TIDY_CALL tidySetCharEncoding( TidyDoc tdoc, ctmbstr encnam )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
int enc = TY_(CharEncodingId)( impl, encnam );
if ( enc >= 0 && TY_(AdjustCharEncoding)(impl, enc) )
return 0;
TY_(ReportBadArgument)( impl, "char-encoding" );
}
return -EINVAL;
}
int TIDY_CALL tidySetInCharEncoding( TidyDoc tdoc, ctmbstr encnam )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
int enc = TY_(CharEncodingId)( impl, encnam );
if ( enc >= 0 && TY_(SetOptionInt)( impl, TidyInCharEncoding, enc ) )
return 0;
TY_(ReportBadArgument)( impl, "in-char-encoding" );
}
return -EINVAL;
}
int TIDY_CALL tidySetOutCharEncoding( TidyDoc tdoc, ctmbstr encnam )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
int enc = TY_(CharEncodingId)( impl, encnam );
if ( enc >= 0 && TY_(SetOptionInt)( impl, TidyOutCharEncoding, enc ) )
return 0;
TY_(ReportBadArgument)( impl, "out-char-encoding" );
}
return -EINVAL;
}
TidyOptionId TIDY_CALL tidyOptGetIdForName( ctmbstr optnam )
{
const TidyOptionImpl* option = TY_(lookupOption)( optnam );
if ( option )
return option->id;
return N_TIDY_OPTIONS; /* Error */
}
TidyIterator TIDY_CALL tidyGetOptionList( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(getOptionList)( impl );
return (TidyIterator) -1;
}
TidyOption TIDY_CALL tidyGetNextOption( TidyDoc tdoc, TidyIterator* pos )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
const TidyOptionImpl* option = NULL;
if ( impl )
option = TY_(getNextOption)( impl, pos );
else if ( pos )
*pos = 0;
return tidyImplToOption( option );
}
TidyOption TIDY_CALL tidyGetOption( TidyDoc ARG_UNUSED(tdoc), TidyOptionId optId )
{
const TidyOptionImpl* option = TY_(getOption)( optId );
return tidyImplToOption( option );
}
TidyOption TIDY_CALL tidyGetOptionByName( TidyDoc ARG_UNUSED(doc), ctmbstr optnam )
{
const TidyOptionImpl* option = TY_(lookupOption)( optnam );
return tidyImplToOption( option );
}
TidyOptionId TIDY_CALL tidyOptGetId( TidyOption topt )
{
const TidyOptionImpl* option = tidyOptionToImpl( topt );
if ( option )
return option->id;
return N_TIDY_OPTIONS;
}
ctmbstr TIDY_CALL tidyOptGetName( TidyOption topt )
{
const TidyOptionImpl* option = tidyOptionToImpl( topt );
if ( option )
return option->name;
return NULL;
}
TidyOptionType TIDY_CALL tidyOptGetType( TidyOption topt )
{
const TidyOptionImpl* option = tidyOptionToImpl( topt );
if ( option )
return option->type;
return (TidyOptionType) -1;
}
TidyConfigCategory TIDY_CALL tidyOptGetCategory( TidyOption topt )
{
const TidyOptionImpl* option = tidyOptionToImpl( topt );
if ( option )
return option->category;
return (TidyConfigCategory) -1;
}
ctmbstr TIDY_CALL tidyOptGetDefault( TidyOption topt )
{
const TidyOptionImpl* option = tidyOptionToImpl( topt );
if ( option && option->type == TidyString )
return option->pdflt; /* Issue #306 - fix an old typo hidden by a cast! */
return NULL;
}
ulong TIDY_CALL tidyOptGetDefaultInt( TidyOption topt )
{
const TidyOptionImpl* option = tidyOptionToImpl( topt );
if ( option && option->type != TidyString )
return option->dflt;
return ~0U;
}
Bool TIDY_CALL tidyOptGetDefaultBool( TidyOption topt )
{
const TidyOptionImpl* option = tidyOptionToImpl( topt );
if ( option && option->type != TidyString )
return ( option->dflt ? yes : no );
return no;
}
Bool TIDY_CALL tidyOptIsReadOnly( TidyOption topt )
{
const TidyOptionImpl* option = tidyOptionToImpl( topt );
if ( option )
return ( option->parser == NULL );
return yes;
}
TidyIterator TIDY_CALL tidyOptGetPickList( TidyOption topt )
{
const TidyOptionImpl* option = tidyOptionToImpl( topt );
if ( option )
return TY_(getOptionPickList)( option );
return (TidyIterator) -1;
}
ctmbstr TIDY_CALL tidyOptGetNextPick( TidyOption topt, TidyIterator* pos )
{
const TidyOptionImpl* option = tidyOptionToImpl( topt );
if ( option )
return TY_(getNextOptionPick)( option, pos );
return NULL;
}
ctmbstr TIDY_CALL tidyOptGetValue( TidyDoc tdoc, TidyOptionId optId )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
ctmbstr optval = NULL;
if ( impl )
optval = cfgStr( impl, optId );
return optval;
}
Bool TIDY_CALL tidyOptSetValue( TidyDoc tdoc, TidyOptionId optId, ctmbstr val )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(ParseConfigValue)( impl, optId, val );
return no;
}
Bool TIDY_CALL tidyOptParseValue( TidyDoc tdoc, ctmbstr optnam, ctmbstr val )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(ParseConfigOption)( impl, optnam, val );
return no;
}
ulong TIDY_CALL tidyOptGetInt( TidyDoc tdoc, TidyOptionId optId )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
ulong opti = 0;
if ( impl )
opti = cfg( impl, optId );
return opti;
}
Bool TIDY_CALL tidyOptSetInt( TidyDoc tdoc, TidyOptionId optId, ulong val )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(SetOptionInt)( impl, optId, val );
return no;
}
Bool TIDY_CALL tidyOptGetBool( TidyDoc tdoc, TidyOptionId optId )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
Bool optb = no;
if ( impl )
{
const TidyOptionImpl* option = TY_(getOption)( optId );
if ( option )
{
optb = cfgBool( impl, optId );
}
}
return optb;
}
Bool TIDY_CALL tidyOptSetBool( TidyDoc tdoc, TidyOptionId optId, Bool val )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(SetOptionBool)( impl, optId, val );
return no;
}
ctmbstr TIDY_CALL tidyOptGetEncName( TidyDoc tdoc, TidyOptionId optId )
{
uint enc = tidyOptGetInt( tdoc, optId );
return TY_(CharEncodingOptName)( enc );
}
ctmbstr TIDY_CALL tidyOptGetCurrPick( TidyDoc tdoc, TidyOptionId optId )
{
const TidyOptionImpl* option = TY_(getOption)( optId );
if ( option && option->pickList )
{
uint ix = 0;
uint pick = tidyOptGetInt( tdoc, optId );
const PickListItem *item = NULL;
/* Loop through the picklist until the index matches the value. */
while ( (item = &(*option->pickList)[ ix ]) && item->label && ix<pick )
{
++ix;
}
if ( ix==pick && item->label )
return item->label;
}
return NULL;
}
TidyIterator TIDY_CALL tidyOptGetDeclTagList( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
TidyIterator declIter = 0;
if ( impl )
declIter = TY_(GetDeclaredTagList)( impl );
return declIter;
}
ctmbstr TIDY_CALL tidyOptGetNextDeclTag( TidyDoc tdoc, TidyOptionId optId,
TidyIterator* iter )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
ctmbstr tagnam = NULL;
if ( impl )
{
UserTagType tagtyp = tagtype_null;
if ( optId == TidyInlineTags )
tagtyp = tagtype_inline;
else if ( optId == TidyBlockTags )
tagtyp = tagtype_block;
else if ( optId == TidyEmptyTags )
tagtyp = tagtype_empty;
else if ( optId == TidyPreTags )
tagtyp = tagtype_pre;
if ( tagtyp != tagtype_null )
tagnam = TY_(GetNextDeclaredTag)( impl, tagtyp, iter );
}
return tagnam;
}
ctmbstr TIDY_CALL tidyOptGetDoc( TidyDoc ARG_UNUSED(tdoc), TidyOption opt )
{
const TidyOptionId optId = tidyOptGetId( opt );
return tidyLocalizedString(optId);
}
#if SUPPORT_CONSOLE_APP
/* TODO - GROUP ALL CONSOLE-ONLY FUNCTIONS */
TidyIterator TIDY_CALL tidyOptGetDocLinksList( TidyDoc ARG_UNUSED(tdoc), TidyOption opt )
{
const TidyOptionId optId = tidyOptGetId( opt );
const TidyOptionDoc* docDesc = TY_(OptGetDocDesc)( optId );
if (docDesc && docDesc->links)
return (TidyIterator)docDesc->links;
return (TidyIterator)NULL;
}
#endif /* SUPPORT_CONSOLE_APP */
TidyOption TIDY_CALL tidyOptGetNextDocLinks( TidyDoc tdoc, TidyIterator* pos )
{
const TidyOptionId* curr = (const TidyOptionId *)*pos;
TidyOption opt;
if (*curr == TidyUnknownOption)
{
*pos = (TidyIterator)NULL;
return (TidyOption)0;
}
opt = tidyGetOption(tdoc, *curr);
curr++;
*pos = (*curr == TidyUnknownOption ) ?
(TidyIterator)NULL:(TidyIterator)curr;
return opt;
}
int TIDY_CALL tidyOptSaveFile( TidyDoc tdoc, ctmbstr cfgfil )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(SaveConfigFile)( impl, cfgfil );
return -EINVAL;
}
int TIDY_CALL tidyOptSaveSink( TidyDoc tdoc, TidyOutputSink* sink )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(SaveConfigSink)( impl, sink );
return -EINVAL;
}
Bool TIDY_CALL tidyOptSnapshot( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
TY_(TakeConfigSnapshot)( impl );
return yes;
}
return no;
}
Bool TIDY_CALL tidyOptResetToSnapshot( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
TY_(ResetConfigToSnapshot)( impl );
return yes;
}
return no;
}
Bool TIDY_CALL tidyOptResetAllToDefault( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
TY_(ResetConfigToDefault)( impl );
return yes;
}
return no;
}
Bool TIDY_CALL tidyOptResetToDefault( TidyDoc tdoc, TidyOptionId optId )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(ResetOptionToDefault)( impl, optId );
return no;
}
Bool TIDY_CALL tidyOptDiffThanDefault( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(ConfigDiffThanDefault)( impl );
return no;
}
Bool TIDY_CALL tidyOptDiffThanSnapshot( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return TY_(ConfigDiffThanSnapshot)( impl );
return no;
}
Bool TIDY_CALL tidyOptCopyConfig( TidyDoc to, TidyDoc from )
{
TidyDocImpl* docTo = tidyDocToImpl( to );
TidyDocImpl* docFrom = tidyDocToImpl( from );
if ( docTo && docFrom )
{
TY_(CopyConfig)( docTo, docFrom );
return yes;
}
return no;
}
/* I/O and Message handling interface
**
** By default, Tidy will define, create and use instances of input and output
** handlers for standard C buffered I/O (i.e. FILE* stdin, FILE* stdout and
** FILE* stderr for content input, content output and diagnostic output,
** respectively). A FILE* cfgFile input handler will be used for config files.
** Command line options will just be set directly.
*/
void TIDY_CALL tidySetEmacsFile( TidyDoc tdoc, ctmbstr filePath )
{
tidyOptSetValue( tdoc, TidyEmacsFile, filePath );
}
ctmbstr TIDY_CALL tidyGetEmacsFile( TidyDoc tdoc )
{
return tidyOptGetValue( tdoc, TidyEmacsFile );
}
/* Use TidyReportFilter to filter messages by diagnostic level:
** info, warning, etc. Just set diagnostic output
** handler to redirect all diagnostics output. Return true
** to proceed with output, false to cancel.
*/
Bool TIDY_CALL tidySetReportFilter( TidyDoc tdoc, TidyReportFilter filt )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
impl->reportFilter = filt;
return yes;
}
return no;
}
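/* Usage sketch for tidySetReportFilter (illustrative only, not part of
** the library). It assumes the TidyReportFilter signature declared in
** tidy.h; the name quietFilter and the choice to drop TidyInfo reports
** are hypothetical.
**
**   static Bool TIDY_CALL quietFilter( TidyDoc tdoc, TidyReportLevel lvl,
**                                      uint line, uint col, ctmbstr mssg )
**   {
**       // yes lets the report proceed to the error sink, no cancels it
**       return lvl == TidyInfo ? no : yes;
**   }
**
**   if ( !tidySetReportFilter( tdoc, quietFilter ) )
**       fprintf( stderr, "invalid TidyDoc\n" );
*/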
/* tidySetReportCallback works like tidySetReportFilter, but provides the
** string version of the internal enum name so that LibTidy users can use
** the string as a lookup key for providing their own error localizations.
** See the string key definitions in tidyenum.h.
*/
Bool TIDY_CALL tidySetReportCallback( TidyDoc tdoc, TidyReportCallback filt )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
impl->reportCallback = filt;
return yes;
}
return no;
}
Bool TIDY_CALL tidySetMessageCallback( TidyDoc tdoc, TidyMessageCallback filt )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
impl->messageCallback = filt;
return yes;
}
return no;
}
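/* Usage sketch for tidySetMessageCallback (illustrative only). It assumes
** the TidyMessageCallback signature declared in tidy.h; the name
** logMessage is hypothetical. The callback queries the opaque TidyMessage
** through the accessors defined below.
**
**   static Bool TIDY_CALL logMessage( TidyMessage tmessage )
**   {
**       fprintf( stderr, "%d:%d [%s] %s\n",
**                tidyGetMessageLine( tmessage ),
**                tidyGetMessageColumn( tmessage ),
**                tidyGetMessageKey( tmessage ),
**                tidyGetMessageOutput( tmessage ) );
**       return no;   // suppress Tidy's own output for this message
**   }
**
**   tidySetMessageCallback( tdoc, logMessage );
*/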
TidyDoc TIDY_CALL tidyGetMessageDoc( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
TidyDocImpl* doc = TY_(getMessageDoc)(*message);
return tidyImplToDoc(doc);
}
uint TIDY_CALL tidyGetMessageCode( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageCode)(*message);
}
ctmbstr TIDY_CALL tidyGetMessageKey( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageKey)(*message);
}
int TIDY_CALL tidyGetMessageLine( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageLine)(*message);
}
int TIDY_CALL tidyGetMessageColumn( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageColumn)(*message);
}
TidyReportLevel TIDY_CALL tidyGetMessageLevel( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageLevel)(*message);
}
ctmbstr TIDY_CALL tidyGetMessageFormatDefault( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageFormatDefault)(*message);
}
ctmbstr TIDY_CALL tidyGetMessageFormat( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageFormat)(*message);
}
ctmbstr TIDY_CALL tidyGetMessageDefault( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageDefault)(*message);
}
ctmbstr TIDY_CALL tidyGetMessage( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessage)(*message);
}
ctmbstr TIDY_CALL tidyGetMessagePosDefault( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessagePosDefault)(*message);
}
ctmbstr TIDY_CALL tidyGetMessagePos( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessagePos)(*message);
}
ctmbstr TIDY_CALL tidyGetMessagePrefixDefault( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessagePrefixDefault)(*message);
}
ctmbstr TIDY_CALL tidyGetMessagePrefix( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessagePrefix)(*message);
}
ctmbstr TIDY_CALL tidyGetMessageOutputDefault( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageOutputDefault)(*message);
}
ctmbstr TIDY_CALL tidyGetMessageOutput( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageOutput)(*message);
}
TidyIterator TIDY_CALL tidyGetMessageArguments( TidyMessage tmessage )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getMessageArguments)(*message);
}
TidyMessageArgument TIDY_CALL tidyGetNextMessageArgument( TidyMessage tmessage, TidyIterator* iter )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getNextMessageArgument)(*message, iter);
}
TidyFormatParameterType TIDY_CALL tidyGetArgType( TidyMessage tmessage, TidyMessageArgument* arg )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getArgType)(*message, arg);
}
ctmbstr TIDY_CALL tidyGetArgFormat( TidyMessage tmessage, TidyMessageArgument* arg )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getArgFormat)(*message, arg);
}
ctmbstr TIDY_CALL tidyGetArgValueString( TidyMessage tmessage, TidyMessageArgument* arg )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getArgValueString)(*message, arg);
}
uint TIDY_CALL tidyGetArgValueUInt( TidyMessage tmessage, TidyMessageArgument* arg )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getArgValueUInt)(*message, arg);
}
int TIDY_CALL tidyGetArgValueInt( TidyMessage tmessage, TidyMessageArgument* arg )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getArgValueInt)(*message, arg);
}
double TIDY_CALL tidyGetArgValueDouble( TidyMessage tmessage, TidyMessageArgument* arg )
{
TidyMessageImpl *message = tidyMessageToImpl(tmessage);
return TY_(getArgValueDouble)(*message, arg);
}
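/* Usage sketch for the message argument iterator (illustrative only).
** It is meant to run inside a TidyMessageCallback, where tmessage is
** valid. The enum value tidyFormatType_STRING is assumed to be the one
** defined in tidyenum.h; only string arguments are printed here.
**
**   TidyIterator pos = tidyGetMessageArguments( tmessage );
**   while ( pos )
**   {
**       TidyMessageArgument arg = tidyGetNextMessageArgument( tmessage, &pos );
**       if ( tidyGetArgType( tmessage, &arg ) == tidyFormatType_STRING )
**           printf( "%s = %s\n",
**                   tidyGetArgFormat( tmessage, &arg ),
**                   tidyGetArgValueString( tmessage, &arg ) );
**   }
*/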
#if 0 /* Not yet */
int tidySetContentOutputSink( TidyDoc tdoc, TidyOutputSink* outp )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
impl->docOut = outp;
return 0;
}
return -EINVAL;
}
int tidySetDiagnosticOutputSink( TidyDoc tdoc, TidyOutputSink* outp )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
impl->msgOut = outp;
return 0;
}
return -EINVAL;
}
/* Library helpers
*/
cmbstr tidyLookupMessage( TidyDoc tdoc, int errorNo )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
cmbstr mssg = NULL;
if ( impl )
mssg = tidyMessage_Lookup( impl->messages, errorNo );
return mssg;
}
#endif
FILE* TIDY_CALL tidySetErrorFile( TidyDoc tdoc, ctmbstr errfilnam )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
FILE* errout = fopen( errfilnam, "wb" );
if ( errout )
{
uint outenc = cfg( impl, TidyOutCharEncoding );
uint nl = cfg( impl, TidyNewline );
TY_(ReleaseStreamOut)( impl, impl->errout );
impl->errout = TY_(FileOutput)( impl, errout, outenc, nl );
return errout;
}
else /* Emit message to current error sink */
TY_(ReportFileError)( impl, errfilnam, FILE_CANT_OPEN );
}
return NULL;
}
int TIDY_CALL tidySetErrorBuffer( TidyDoc tdoc, TidyBuffer* errbuf )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
uint outenc = cfg( impl, TidyOutCharEncoding );
uint nl = cfg( impl, TidyNewline );
TY_(ReleaseStreamOut)( impl, impl->errout );
impl->errout = TY_(BufferOutput)( impl, errbuf, outenc, nl );
return ( impl->errout ? 0 : -ENOMEM );
}
return -EINVAL;
}
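/* Usage sketch for tidySetErrorBuffer (illustrative only). It assumes the
** TidyBuffer helpers from tidybuffio.h; the input string is hypothetical.
** The buffer must outlive the document's use of it, so it is freed only
** after parsing is done.
**
**   TidyBuffer errbuf;
**   tidyBufInit( &errbuf );
**   if ( tidySetErrorBuffer( tdoc, &errbuf ) == 0 )
**   {
**       tidyParseString( tdoc, "<title>x</title><p>unclosed" );
**       if ( errbuf.bp )
**           fputs( (const char*)errbuf.bp, stderr );
**   }
**   tidyBufFree( &errbuf );
*/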
int TIDY_CALL tidySetErrorSink( TidyDoc tdoc, TidyOutputSink* sink )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
uint outenc = cfg( impl, TidyOutCharEncoding );
uint nl = cfg( impl, TidyNewline );
TY_(ReleaseStreamOut)( impl, impl->errout );
impl->errout = TY_(UserOutput)( impl, sink, outenc, nl );
return ( impl->errout ? 0 : -ENOMEM );
}
return -EINVAL;
}
/* Use TidyPPProgress to monitor the progress of the pretty printer.
*/
Bool TIDY_CALL tidySetPrettyPrinterCallback(TidyDoc tdoc, TidyPPProgress callback)
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
impl->progressCallback = callback;
return yes;
}
return no;
}
/* Document info */
int TIDY_CALL tidyStatus( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
int tidyStat = -EINVAL;
if ( impl )
tidyStat = tidyDocStatus( impl );
return tidyStat;
}
int TIDY_CALL tidyDetectedHtmlVersion( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
/* Guard against a NULL document or one that has no lexer yet. */
if ( !impl || !impl->lexer )
return 0;
return TY_(HTMLVersionNumberFromCode)( impl->lexer->versionEmitted );
}
Bool TIDY_CALL tidyDetectedXhtml( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
/* Guard against a NULL document or one that has no lexer yet. */
if ( !impl || !impl->lexer )
return no;
return impl->lexer->isvoyager;
}
Bool TIDY_CALL tidyDetectedGenericXml( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
/* Guard against a NULL document. */
if ( !impl )
return no;
return impl->xmlDetected;
}
uint TIDY_CALL tidyErrorCount( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
uint count = 0xFFFFFFFF;
if ( impl )
count = impl->errors;
return count;
}
uint TIDY_CALL tidyWarningCount( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
uint count = 0xFFFFFFFF;
if ( impl )
count = impl->warnings;
return count;
}
uint TIDY_CALL tidyAccessWarningCount( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
uint count = 0xFFFFFFFF;
if ( impl )
count = impl->accessErrors;
return count;
}
uint TIDY_CALL tidyConfigErrorCount( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
uint count = 0xFFFFFFFF;
if ( impl )
count = impl->optionErrors;
return count;
}
/* Error reporting functions
*/
void TIDY_CALL tidyErrorSummary( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
TY_(ErrorSummary)( impl );
}
void TIDY_CALL tidyGeneralInfo( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
{
TY_(Dialogue)( impl, TEXT_GENERAL_INFO );
TY_(Dialogue)( impl, TEXT_GENERAL_INFO_PLEA );
}
}
/* I/O Functions
**
** Initial version supports only whole-file operations.
** Do not expose Tidy StreamIn or Out data structures - yet.
*/
/* Parse/load Functions
**
** HTML/XHTML version determined from input.
*/
int TIDY_CALL tidyParseFile( TidyDoc tdoc, ctmbstr filnam )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
return tidyDocParseFile( doc, filnam );
}
int TIDY_CALL tidyParseStdin( TidyDoc tdoc )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
return tidyDocParseStdin( doc );
}
int TIDY_CALL tidyParseString( TidyDoc tdoc, ctmbstr content )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
return tidyDocParseString( doc, content );
}
int TIDY_CALL tidyParseBuffer( TidyDoc tdoc, TidyBuffer* inbuf )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
return tidyDocParseBuffer( doc, inbuf );
}
int TIDY_CALL tidyParseSource( TidyDoc tdoc, TidyInputSource* source )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
return tidyDocParseSource( doc, source );
}
int tidyDocParseFile( TidyDocImpl* doc, ctmbstr filnam )
{
int status = -ENOENT;
FILE* fin = fopen( filnam, "r+" );
if ( !fin )
{
TY_(ReportFileError)( doc, filnam, FILE_NOT_FILE );
return status;
}
fclose( fin );
#ifdef _WIN32
return TY_(DocParseFileWithMappedFile)( doc, filnam );
#else
fin = fopen( filnam, "rb" );
#if PRESERVE_FILE_TIMES
struct stat sbuf = {0};
/* get last modified time */
TidyClearMemory( &doc->filetimes, sizeof(doc->filetimes) );
if ( fin && cfgBool(doc,TidyKeepFileTimes) &&
fstat(fileno(fin), &sbuf) != -1 )
{
doc->filetimes.actime = sbuf.st_atime;
doc->filetimes.modtime = sbuf.st_mtime;
}
#endif
if ( fin )
{
StreamIn* in = TY_(FileInput)( doc, fin, cfg( doc, TidyInCharEncoding ));
if ( !in )
{
fclose( fin );
return status;
}
status = TY_(DocParseStream)( doc, in );
TY_(freeFileSource)(&in->source, yes);
TY_(freeStreamIn)(in);
}
else /* Error message! */
TY_(ReportFileError)( doc, filnam, FILE_CANT_OPEN );
return status;
#endif
}
int tidyDocParseStdin( TidyDocImpl* doc )
{
StreamIn* in = TY_(FileInput)( doc, stdin, cfg( doc, TidyInCharEncoding ));
int status = TY_(DocParseStream)( doc, in );
TY_(freeStreamIn)(in);
return status;
}
int tidyDocParseBuffer( TidyDocImpl* doc, TidyBuffer* inbuf )
{
int status = -EINVAL;
if ( inbuf )
{
StreamIn* in = TY_(BufferInput)( doc, inbuf, cfg( doc, TidyInCharEncoding ));
status = TY_(DocParseStream)( doc, in );
TY_(freeStreamIn)(in);
}
return status;
}
int tidyDocParseString( TidyDocImpl* doc, ctmbstr content )
{
int status = -EINVAL;
TidyBuffer inbuf;
StreamIn* in = NULL;
if ( content )
{
tidyBufInitWithAllocator( &inbuf, doc->allocator );
tidyBufAttach( &inbuf, (byte*)content, TY_(tmbstrlen)(content)+1 );
in = TY_(BufferInput)( doc, &inbuf, cfg( doc, TidyInCharEncoding ));
status = TY_(DocParseStream)( doc, in );
tidyBufDetach( &inbuf );
TY_(freeStreamIn)(in);
}
return status;
}
int tidyDocParseSource( TidyDocImpl* doc, TidyInputSource* source )
{
StreamIn* in = TY_(UserInput)( doc, source, cfg( doc, TidyInCharEncoding ));
int status = TY_(DocParseStream)( doc, in );
TY_(freeStreamIn)(in);
return status;
}
/* Print/save Functions
**
*/
int TIDY_CALL tidySaveFile( TidyDoc tdoc, ctmbstr filnam )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
return tidyDocSaveFile( doc, filnam );
}
int TIDY_CALL tidySaveStdout( TidyDoc tdoc )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
return tidyDocSaveStdout( doc );
}
int TIDY_CALL tidySaveString( TidyDoc tdoc, tmbstr buffer, uint* buflen )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
return tidyDocSaveString( doc, buffer, buflen );
}
int TIDY_CALL tidySaveBuffer( TidyDoc tdoc, TidyBuffer* outbuf )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
return tidyDocSaveBuffer( doc, outbuf );
}
int TIDY_CALL tidySaveSink( TidyDoc tdoc, TidyOutputSink* sink )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
return tidyDocSaveSink( doc, sink );
}
int tidyDocSaveFile( TidyDocImpl* doc, ctmbstr filnam )
{
int status = -ENOENT;
FILE* fout = NULL;
/* Don't zap input file if no output */
if ( doc->errors > 0 &&
cfgBool(doc, TidyWriteBack) && !cfgBool(doc, TidyForceOutput) )
status = tidyDocStatus( doc );
else
fout = fopen( filnam, "wb" );
if ( fout )
{
uint outenc = cfg( doc, TidyOutCharEncoding );
uint nl = cfg( doc, TidyNewline );
StreamOut* out = TY_(FileOutput)( doc, fout, outenc, nl );
status = tidyDocSaveStream( doc, out );
fclose( fout );
TidyDocFree( doc, out );
#if PRESERVE_FILE_TIMES
if ( doc->filetimes.actime )
{
/* set file last accessed/modified times to original values */
utime( filnam, &doc->filetimes );
TidyClearMemory( &doc->filetimes, sizeof(doc->filetimes) );
}
#endif /* PRESERVE_FILE_TIMES */
}
if ( status < 0 ) /* Error message! */
TY_(ReportFileError)( doc, filnam, FILE_CANT_OPEN );
return status;
}
/* Note, _setmode() does NOT work on Win2K Pro w/ VC++ 6.0 SP3.
** The code has been left in in case it works w/ other compilers
** or operating systems. If stdout is in Text mode, be aware that
** it will garble UTF16 documents. In text mode, when it encounters
** a single byte of value 10 (0xA), it will insert a single byte
** value 13 (0xD) just before it. This has the effect of garbling
** the entire document.
*/
#if !defined(NO_SETMODE_SUPPORT)
#if defined(_WIN32) || defined(OS2_OS)
#include <fcntl.h>
#include <io.h>
#endif
#endif
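/* Illustration only, not part of Tidy: a minimal sketch of the garbling
** described in the note above. sketch_text_mode_write is a hypothetical
** helper that imitates a text-mode stream by writing a 0x0D byte before
** every 0x0A byte. It assumes nothing about any real C runtime beyond the
** behaviour that the note itself describes.
*/

```c
#include <stddef.h>

/* Hypothetical helper: copies src to dst, inserting 0x0D before each 0x0A
   byte, the way a text-mode stream would on output. Returns the number of
   bytes written, or 0 if dst is too small. */
size_t sketch_text_mode_write( const unsigned char* src, size_t len,
                               unsigned char* dst, size_t cap )
{
    size_t i, n = 0;
    for ( i = 0; i < len; ++i )
    {
        if ( src[i] == 0x0A )
        {
            if ( n == cap )
                return 0;
            dst[n++] = 0x0D;    /* the extra byte added by text mode */
        }
        if ( n == cap )
            return 0;
        dst[n++] = src[i];
    }
    return n;
}
```

/* For the UTF-16LE text "A\n" (bytes 41 00 0A 00) the sketch produces
** 41 00 0D 0A 00. The output has an odd number of bytes, so every 16-bit
** code unit after the inserted byte is misaligned, which is why binary
** mode is requested for stdout below.
*/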
int tidyDocSaveStdout( TidyDocImpl* doc )
{
#if !defined(NO_SETMODE_SUPPORT)
#if defined(_WIN32) || defined(OS2_OS)
int oldstdoutmode = -1, oldstderrmode = -1;
#endif
#endif
int status = 0;
uint outenc = cfg( doc, TidyOutCharEncoding );
uint nl = cfg( doc, TidyNewline );
StreamOut* out = TY_(FileOutput)( doc, stdout, outenc, nl );
#if !defined(NO_SETMODE_SUPPORT)
#if defined(_WIN32) || defined(OS2_OS)
oldstdoutmode = setmode( fileno(stdout), _O_BINARY );
oldstderrmode = setmode( fileno(stderr), _O_BINARY );
#endif
#endif
if ( 0 == status )
status = tidyDocSaveStream( doc, out );
fflush(stdout);
fflush(stderr);
#if !defined(NO_SETMODE_SUPPORT)
#if defined(_WIN32) || defined(OS2_OS)
if ( oldstdoutmode != -1 )
oldstdoutmode = setmode( fileno(stdout), oldstdoutmode );
if ( oldstderrmode != -1 )
oldstderrmode = setmode( fileno(stderr), oldstderrmode );
#endif
#endif
TidyDocFree( doc, out );
return status;
}
int tidyDocSaveString( TidyDocImpl* doc, tmbstr buffer, uint* buflen )
{
uint outenc = cfg( doc, TidyOutCharEncoding );
uint nl = cfg( doc, TidyNewline );
TidyBuffer outbuf;
StreamOut* out;
int status;
tidyBufInitWithAllocator( &outbuf, doc->allocator );
out = TY_(BufferOutput)( doc, &outbuf, outenc, nl );
status = tidyDocSaveStream( doc, out );
if ( outbuf.size > *buflen )
status = -ENOMEM;
else
memcpy( buffer, outbuf.bp, outbuf.size );
*buflen = outbuf.size;
tidyBufFree( &outbuf );
TidyDocFree( doc, out );
return status;
}
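/* Illustration only, not part of Tidy: tidyDocSaveString above copies the
** output only when it fits, always stores the size it needed in *buflen,
** and does not NUL-terminate the result. A caller could therefore retry
** with a buffer of the reported size. The sketch below shows that pattern
** against sketch_save_string, a hypothetical stand-in that follows the
** same size contract; both function names are assumptions made for this
** example.
*/

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in with the same size contract as tidyDocSaveString:
   copies `text` only if it fits and always stores the needed size. */
static int sketch_save_string( const char* text, char* buffer, unsigned* buflen )
{
    unsigned need = (unsigned) strlen( text );
    int status = 0;
    if ( need > *buflen )
        status = -ENOMEM;
    else
        memcpy( buffer, text, need );
    *buflen = need;
    return status;
}

/* Try a small buffer first; on -ENOMEM grow to the reported size and retry.
   Returns a malloc'd, NUL-terminated copy (caller frees) or NULL. */
char* sketch_save_with_retry( const char* text )
{
    unsigned len = 4;
    char* buf = (char*) malloc( len + 1 );
    int status;
    if ( !buf )
        return NULL;
    status = sketch_save_string( text, buf, &len );
    if ( status == -ENOMEM )
    {
        /* len now holds the size that was actually required */
        char* bigger = (char*) realloc( buf, len + 1 );
        if ( !bigger )
        {
            free( buf );
            return NULL;
        }
        buf = bigger;
        status = sketch_save_string( text, buf, &len );
    }
    if ( status < 0 )
    {
        free( buf );
        return NULL;
    }
    buf[len] = '\0';    /* the save call itself does not terminate */
    return buf;
}
```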
int tidyDocSaveBuffer( TidyDocImpl* doc, TidyBuffer* outbuf )
{
int status = -EINVAL;
if ( outbuf )
{
uint outenc = cfg( doc, TidyOutCharEncoding );
uint nl = cfg( doc, TidyNewline );
StreamOut* out = TY_(BufferOutput)( doc, outbuf, outenc, nl );
status = tidyDocSaveStream( doc, out );
TidyDocFree( doc, out );
}
return status;
}
int tidyDocSaveSink( TidyDocImpl* doc, TidyOutputSink* sink )
{
uint outenc = cfg( doc, TidyOutCharEncoding );
uint nl = cfg( doc, TidyNewline );
StreamOut* out = TY_(UserOutput)( doc, sink, outenc, nl );
int status = tidyDocSaveStream( doc, out );
TidyDocFree( doc, out );
return status;
}
int tidyDocStatus( TidyDocImpl* doc )
{
if ( doc->errors > 0 )
return 2;
if ( doc->warnings > 0 || doc->accessErrors > 0 )
return 1;
return 0;
}
int TIDY_CALL tidyCleanAndRepair( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return tidyDocCleanAndRepair( impl );
return -EINVAL;
}
int TIDY_CALL tidyRunDiagnostics( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl )
return tidyDocRunDiagnostics( impl );
return -EINVAL;
}
int TIDY_CALL tidyReportDoctype( TidyDoc tdoc )
{
int iret = -EINVAL;
TidyDocImpl* impl = tidyDocToImpl( tdoc );
if ( impl ) {
tidyDocReportDoctype( impl );
iret = 0;
}
return iret;
}
/* Workhorse functions.
**
** Parse requires input source, all input config items
** and diagnostic sink to have all been set before calling.
**
** Emit likewise requires that document sink and all
** pretty printing options have been set.
*/
static ctmbstr integrity = "\nPanic - tree has lost its integrity\n";
int TY_(DocParseStream)( TidyDocImpl* doc, StreamIn* in )
{
Bool xmlIn = cfgBool( doc, TidyXmlTags );
int bomEnc;
assert( doc != NULL && in != NULL );
assert( doc->docIn == NULL );
doc->docIn = in;
TY_(ResetTags)(doc); /* reset table to html5 mode */
TY_(TakeConfigSnapshot)( doc ); /* Save config state */
TY_(FreeAnchors)( doc );
TY_(FreeNode)(doc, &doc->root);
TidyClearMemory(&doc->root, sizeof(Node));
if (doc->givenDoctype)
TidyDocFree(doc, doc->givenDoctype);
/*\
* Issue #186 - FreeNode now depends on the doctype, so the lexer is needed
* to determine which hash is to be used; free the lexer last.
\*/
TY_(FreeLexer)( doc );
doc->givenDoctype = NULL;
doc->lexer = TY_(NewLexer)( doc );
/* doc->lexer->root = &doc->root; */
doc->root.line = doc->lexer->lines;
doc->root.column = doc->lexer->columns;
doc->inputHadBOM = no;
doc->xmlDetected = no;
bomEnc = TY_(ReadBOMEncoding)(in);
if (bomEnc != -1)
{
in->encoding = bomEnc;
TY_(SetOptionInt)(doc, TidyInCharEncoding, bomEnc);
}
#ifdef TIDY_WIN32_MLANG_SUPPORT
if (in->encoding > WIN32MLANG)
TY_(Win32MLangInitInputTranscoder)(in, in->encoding);
#endif /* TIDY_WIN32_MLANG_SUPPORT */
/* Tidy doesn't alter the doctype for generic XML docs */
if ( xmlIn )
{
TY_(ParseXMLDocument)( doc );
if ( !TY_(CheckNodeIntegrity)( &doc->root ) )
TidyPanic( doc->allocator, integrity );
}
else
{
doc->warnings = 0;
TY_(ParseDocument)( doc );
if ( !TY_(CheckNodeIntegrity)( &doc->root ) )
TidyPanic( doc->allocator, integrity );
}
#ifdef TIDY_WIN32_MLANG_SUPPORT
TY_(Win32MLangUninitInputTranscoder)(in);
#endif /* TIDY_WIN32_MLANG_SUPPORT */
doc->docIn = NULL;
return tidyDocStatus( doc );
}
int tidyDocRunDiagnostics( TidyDocImpl* doc )
{
Bool quiet = cfgBool( doc, TidyQuiet );
Bool force = cfgBool( doc, TidyForceOutput );
if ( !quiet )
{
TY_(ReportMarkupVersion)( doc );
TY_(ReportNumWarnings)( doc );
}
if ( doc->errors > 0 && !force )
TY_(Dialogue)(doc, STRING_NEEDS_INTERVENTION );
return tidyDocStatus( doc );
}
void tidyDocReportDoctype( TidyDocImpl* doc )
{
TY_(ReportMarkupVersion)( doc );
}
/*****************************************************************************
* HTML5 STUFF
*****************************************************************************/
#if !defined(NDEBUG) && defined(_MSC_VER)
extern void show_not_html5(void);
/* -----------------------------
List tags that do not have version HTML5 (HT50|XH50)
acronym applet basefont big center dir font frame frameset isindex
listing noframes plaintext rb rbc rtc strike tt xmp nextid
align bgsound blink comment ilayer layer marquee multicol nobr noembed
nolayer nosave server servlet spacer
Listed total 35 tags that do not have version 393216
------------------------------ */
static void list_not_html5(void)
{
static Bool done_list = no;
if (done_list == no) {
done_list = yes;
show_not_html5();
}
}
#endif
/* What about <blink>, <s> strike-through, <u> underline */
static struct _html5Info
{
const char *tag;
uint id;
} const html5Info[] = {
{"acronym", TidyTag_ACRONYM},
{"applet", TidyTag_APPLET },
{"basefont",TidyTag_BASEFONT },
{ "big", TidyTag_BIG },
{ "center", TidyTag_CENTER },
{ "dir", TidyTag_DIR },
{ "font", TidyTag_FONT },
{ "frame", TidyTag_FRAME},
{ "frameset", TidyTag_FRAMESET},
{ "noframes", TidyTag_NOFRAMES },
{ "strike", TidyTag_STRIKE },
{ "tt", TidyTag_TT },
{ 0, 0 }
};
Bool inRemovedInfo( uint tid )
{
int i;
for (i = 0; ; i++) {
if (html5Info[i].tag == 0)
break;
if (html5Info[i].id == tid)
return yes;
}
return no;
}
/* Things that should not be in an HTML5 body. This is special for CheckHTML5(),
and we might just want to remove CheckHTML5()'s output altogether and count
on the default --strict-tags-attributes.
*/
static int BadBody5Attribs[] = {
TidyAttr_BACKGROUND,
TidyAttr_BGCOLOR,
TidyAttr_TEXT,
TidyAttr_LINK,
TidyAttr_VLINK,
TidyAttr_ALINK,
TidyAttr_UNKNOWN /* Must be last! */
};
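/* Illustration only, not part of Tidy: BadBody5Attribs above is a
** sentinel-terminated array, which is why TidyAttr_UNKNOWN must stay last.
** The sketch below shows how such a list is scanned without a separate
** length. The SKETCH_* ids and both sketch_* names are hypothetical; the
** real ids are the TidyAttr_* values.
*/

```c
#include <stddef.h>

/* Hypothetical attribute ids used only by this sketch. */
enum
{
    SKETCH_ATTR_UNKNOWN,    /* doubles as the end-of-list sentinel */
    SKETCH_ATTR_BGCOLOR,
    SKETCH_ATTR_TEXT,
    SKETCH_ATTR_HREF
};

static const int sketch_bad_attrs[] = {
    SKETCH_ATTR_BGCOLOR,
    SKETCH_ATTR_TEXT,
    SKETCH_ATTR_UNKNOWN     /* Must be last! */
};

/* Returns 1 if id is in the list, 0 otherwise. The loop stops at the
   sentinel, so the sentinel itself is never reported as a match. */
int sketch_is_bad_attr( int id )
{
    size_t i;
    for ( i = 0; sketch_bad_attrs[i] != SKETCH_ATTR_UNKNOWN; ++i )
    {
        if ( sketch_bad_attrs[i] == id )
            return 1;
    }
    return 0;
}
```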
static Bool nodeHasAlignAttr( Node *node )
{
/* #define attrIsALIGN(av) AttrIsId( av, TidyAttr_ALIGN ) */
AttVal* av;
for ( av = node->attributes; av != NULL; av = av->next ) {
if (attrIsALIGN(av))
return yes;
}
return no;
}
/*
* Perform special checks for HTML, even when we're not using the default
* option `--strict-tags-attributes yes`. This will ensure that HTML5 warning
* and error output is given regardless of the new option, and ensure that
* cleanup takes place. This provides mostly consistent Tidy behavior even with
* the introduction of this new option. Note that strings have changed, though,
* in order to maintain consistency with the `--strict-tags-attributes`
* messages.
*
* See also: http://www.whatwg.org/specs/web-apps/current-work/multipage/obsolete.html#obsolete
*/
void TY_(CheckHTML5)( TidyDocImpl* doc, Node* node )
{
Bool clean = cfgBool( doc, TidyMakeClean );
Bool already_strict = cfgBool( doc, TidyStrictTagsAttr );
Node* body = TY_(FindBody)( doc );
Bool warn = yes; /* should this be a warning, error, or report??? */
AttVal* attr = NULL;
int i = 0;
#if !defined(NDEBUG) && defined(_MSC_VER)
// list_not_html5();
#endif
while (node)
{
if ( nodeHasAlignAttr( node ) ) {
/* @todo: Is this for ALL elements that accept an 'align' attribute,
* or should this be a sub-set test?
*/
/* We will only emit this message if `--strict-tags-attributes==no`;
* otherwise if yes this message will be output during later
* checking.
*/
if ( !already_strict )
TY_(ReportAttrError)(doc, node, TY_(AttrGetById)(node, TidyAttr_ALIGN), MISMATCHED_ATTRIBUTE_WARN);
}
if ( node == body ) {
i = 0;
/* We will only emit these messages if `--strict-tags-attributes==no`;
* otherwise if yes these messages will be output during later
* checking.
*/
if ( !already_strict ) {
while ( BadBody5Attribs[i] != TidyAttr_UNKNOWN ) {
attr = TY_(AttrGetById)(node, BadBody5Attribs[i]);
if ( attr )
TY_(ReportAttrError)(doc, node, attr , MISMATCHED_ATTRIBUTE_WARN);
i++;
}
}
} else
if ( nodeIsACRONYM(node) ) {
if (clean) {
/* Replace with 'abbr' with warning to that effect.
* Maybe should use static void RenameElem( TidyDocImpl* doc, Node* node, TidyTagId tid )
*/
TY_(CoerceNode)(doc, node, TidyTag_ABBR, warn, no);
} else {
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
}
} else
if ( nodeIsAPPLET(node) ) {
if (clean) {
/* replace with 'object' with warning to that effect
* maybe should use static void RenameElem( TidyDocImpl* doc, Node* node, TidyTagId tid )
*/
TY_(CoerceNode)(doc, node, TidyTag_OBJECT, warn, no);
} else {
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
}
} else
if ( nodeIsBASEFONT(node) ) {
/* basefont: CSS equivalent 'font-size', 'font-family' and 'color'
* on body or class on each subsequent element.
* Difficult - If it is the first body element, then could consider
* adding that to the <body> as a whole, else could perhaps apply it
* to all subsequent elements. But also in consideration is the fact
* that it was NOT supported in many browsers.
* - For now just report a warning
*/
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
} else
if ( nodeIsBIG(node) ) {
/* big: CSS equivalent 'font-size:larger'
* so could replace the <big> ... </big> with
* <span style="font-size: larger"> ... </span>
* then replace <big> with <span>
* Need to think about that...
* Could use -
* TY_(AddStyleProperty)( doc, node, "font-size: larger" );
* TY_(CoerceNode)(doc, node, TidyTag_SPAN, no, no);
* Alternatively generate a <style>, but how to get the style name?
* TY_(AddAttribute)( doc, node, "class", "????" );
* Also maybe need a specific message like
* Element '%s' replaced with 'span' with a 'font-size: larger' style attribute
* maybe should use static void RenameElem( TidyDocImpl* doc, Node* node, TidyTagId tid )
*/
if (clean) {
TY_(AddStyleProperty)( doc, node, "font-size: larger" );
TY_(CoerceNode)(doc, node, TidyTag_SPAN, warn, no);
} else {
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
}
} else
if ( nodeIsCENTER(node) ) {
/* center: CSS equivalent 'text-align:center'
* and 'margin-left:auto; margin-right:auto' on descendant blocks
* Tidy already handles this if 'clean' by SILENTLY generating the
* <style> and adding a <div class="c1"> around the elements.
* see: static Bool Center2Div( TidyDocImpl* doc, Node *node, Node **pnode)
*/
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
} else
if ( nodeIsDIR(node) ) {
/* dir: replace by <ul>
* Tidy already actions this and issues a warning
* Should this be CHANGED???
*/
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
} else
if ( nodeIsFONT(node) ) {
/* Tidy already handles this -
* If 'clean' replaced by CSS, else
* if is NOT clean, and doctype html5 then warnings issued
* done in Bool Font2Span( TidyDocImpl* doc, Node *node, Node **pnode ) (I think?)
*/
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
} else
if (( nodesIsFRAME(node) ) || ( nodeIsFRAMESET(node) ) || ( nodeIsNOFRAMES(node) )) {
/* YOW: What to do here?????? Maybe <iframe>????
*/
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
} else
if ( nodeIsSTRIKE(node) ) {
/* strike: CSS equivalent 'text-decoration:line-through'
* maybe should use static void RenameElem( TidyDocImpl* doc, Node* node, TidyTagId tid )
*/
if (clean) {
TY_(AddStyleProperty)( doc, node, "text-decoration: line-through" );
TY_(CoerceNode)(doc, node, TidyTag_SPAN, warn, no);
} else {
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
}
} else
if ( nodeIsTT(node) ) {
/* tt: CSS equivalent 'font-family:monospace'
* Tidy presently does nothing. Tidy5 issues a warning
* But like the 'clean' <font> replacement this could also be replaced with CSS
* maybe should use static void RenameElem( TidyDocImpl* doc, Node* node, TidyTagId tid )
*/
if (clean) {
TY_(AddStyleProperty)( doc, node, "font-family: monospace" );
TY_(CoerceNode)(doc, node, TidyTag_SPAN, warn, no);
} else {
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
}
} else
if (TY_(nodeIsElement)(node)) {
if (node->tag) {
if ( (!(node->tag->versions & VERS_HTML5) && !(node->tag->versions & VERS_PROPRIETARY)) || (inRemovedInfo(node->tag->id)) ) {
if ( !already_strict )
TY_(Report)(doc, node, node, REMOVED_HTML5);
}
}
}
if (node->content)
TY_(CheckHTML5)( doc, node->content );
node = node->next;
}
}
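/* Illustration only, not part of Tidy: TY_(CheckHTML5) above walks the
** tree by looping over siblings through `next` and recursing into
** children through `content`. The sketch below isolates that traversal
** on a hypothetical SketchNode type, counting flagged nodes instead of
** reporting them. SketchNode, its `obsolete` flag and the function name
** are assumptions made for this example.
*/

```c
#include <stddef.h>

/* Hypothetical stand-in for Tidy's Node, reduced to the links used here. */
typedef struct SketchNode
{
    struct SketchNode* next;      /* next sibling, or NULL */
    struct SketchNode* content;   /* first child, or NULL */
    int obsolete;                 /* non-zero if this node would be reported */
} SketchNode;

/* Visit node, all of its following siblings, and all of their descendants. */
unsigned sketch_count_obsolete( const SketchNode* node )
{
    unsigned count = 0;
    while ( node )
    {
        if ( node->obsolete )
            ++count;
        if ( node->content )
            count += sketch_count_obsolete( node->content );
        node = node->next;
    }
    return count;
}
```

/* Recursion depth follows the nesting depth of the document, while long
** runs of siblings are handled by the loop and add no stack frames.
*/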
/*****************************************************************************
* END HTML5 STUFF
*****************************************************************************/
/*
* Check and report HTML tags and attributes that are:
* - Proprietary, and/or
* - Not supported in the current version of HTML, defined as the version
* of HTML that we are emitting.
* Proprietary items are reported as WARNINGS, and version mismatches will
* be reported as WARNING or ERROR in the following conditions:
* - ERROR if the emitted doctype is a strict doctype.
* - WARNING if the emitted doctype is a non-strict doctype.
* The proprietary checks are *always* run as they have always been an integral
* part of Tidy. The version checks are controlled by `strict-tags-attributes`.
*/
void TY_(CheckHTMLTagsAttribsVersions)( TidyDocImpl* doc, Node* node )
{
uint versionEmitted = doc->lexer->versionEmitted;
uint declared = doc->lexer->doctype;
uint version = versionEmitted == 0 ? declared : versionEmitted;
int tagReportType = VERS_STRICT & version ? ELEMENT_VERS_MISMATCH_ERROR : ELEMENT_VERS_MISMATCH_WARN;
int attrReportType = VERS_STRICT & version ? MISMATCHED_ATTRIBUTE_ERROR : MISMATCHED_ATTRIBUTE_WARN;
Bool check_versions = cfgBool( doc, TidyStrictTagsAttr );
AttVal *next_attr, *attval;
Bool attrIsProprietary = no;
Bool attrIsMismatched = yes;
Bool tagLooksCustom = no;
Bool htmlIs5 = (doc->lexer->doctype & VERS_HTML5) > 0;
while (node)
{
/* This bit here handles our HTML tags */
if ( TY_(nodeIsElement)(node) && node->tag ) {
/* Leave XML stuff alone. */
if ( !cfgBool(doc, TidyXmlTags) )
{
/* Version mismatches take priority. */
if ( check_versions && !(node->tag->versions & version) )
{
TY_(Report)(doc, NULL, node, tagReportType );
}
/* If it's not mismatched, it could still be proprietary. */
else if ( node->tag->versions & VERS_PROPRIETARY )
{
if ( !cfgBool(doc, TidyMakeClean) ||
( !nodeIsNOBR(node) && !nodeIsWBR(node) ) )
{
/* It may look custom regardless of whether it's a known tag. */
tagLooksCustom = TY_(nodeIsAutonomousCustomFormat)( node );
/* If we're in HTML5 mode and the tag does not look
like a valid custom tag, then issue a warning.
Appearance is good enough because invalid tags have
been dropped. Also, if we're not in HTML5 mode, then
everything that reaches here gets the warning.
Everything else can be ignored. */
if ( (htmlIs5 && !tagLooksCustom) || !htmlIs5 )
{
TY_(Report)(doc, NULL, node, PROPRIETARY_ELEMENT );
}
if ( nodeIsLAYER(node) )
doc->badLayout |= USING_LAYER;
else if ( nodeIsSPACER(node) )
doc->badLayout |= USING_SPACER;
else if ( nodeIsNOBR(node) )
doc->badLayout |= USING_NOBR;
}
}
}
}
/* And this bit here handles our attributes */
if (TY_(nodeIsElement)(node))
{
attval = node->attributes;
while (attval)
{
next_attr = attval->next;
attrIsProprietary = TY_(AttributeIsProprietary)(node, attval);
attrIsMismatched = check_versions ? TY_(AttributeIsMismatched)(node, attval, doc) : no;
/* Let the PROPRIETARY_ATTRIBUTE warning have precedence. */
if ( attrIsProprietary )
{
if ( cfgBool(doc, TidyWarnPropAttrs) )
TY_(ReportAttrError)(doc, node, attval, PROPRIETARY_ATTRIBUTE);
}
else if ( attrIsMismatched )
{
TY_(ReportAttrError)(doc, node, attval, attrReportType);
}
/* @todo: do we need a new option to drop mismatches? Or should we
simply drop them? */
if ( ( attrIsProprietary || attrIsMismatched ) && cfgBool(doc, TidyDropPropAttrs) )
TY_(RemoveAttribute)( doc, node, attval );
attval = next_attr;
}
}
if (node->content)
TY_(CheckHTMLTagsAttribsVersions)( doc, node->content );
node = node->next;
}
}
#if !defined(NDEBUG) && defined(_MSC_VER)
/* *** FOR DEBUG ONLY *** */
const char *dbg_get_lexer_type( void *vp )
{
Node *node = (Node *)vp;
switch ( node->type )
{
case RootNode: return "Root";
case DocTypeTag: return "DocType";
case CommentTag: return "Comment";
case ProcInsTag: return "ProcIns";
case TextNode: return "Text";
case StartTag: return "StartTag";
case EndTag: return "EndTag";
case StartEndTag: return "StartEnd";
case CDATATag: return "CDATA";
case SectionTag: return "Section";
case AspTag: return "Asp";
case JsteTag: return "Jste";
case PhpTag: return "Php";
case XmlDecl: return "XmlDecl";
}
return "Uncased";
}
/* NOTE: This matches the above lexer type, except when the element has a name */
const char *dbg_get_element_name( void *vp )
{
Node *node = (Node *)vp;
switch ( node->type )
{
case TidyNode_Root: return "Root";
case TidyNode_DocType: return "DocType";
case TidyNode_Comment: return "Comment";
case TidyNode_ProcIns: return "ProcIns";
case TidyNode_Text: return "Text";
case TidyNode_CDATA: return "CDATA";
case TidyNode_Section: return "Section";
case TidyNode_Asp: return "Asp";
case TidyNode_Jste: return "Jste";
case TidyNode_Php: return "Php";
case TidyNode_XmlDecl: return "XmlDecl";
case TidyNode_Start:
case TidyNode_End:
case TidyNode_StartEnd:
default:
if (node->element)
return node->element;
}
return "Unknown";
}
void dbg_show_node( TidyDocImpl* doc, Node *node, int caller, int indent )
{
AttVal* av;
Lexer* lexer = doc->lexer;
ctmbstr call = "";
ctmbstr name = dbg_get_element_name(node);
ctmbstr type = dbg_get_lexer_type(node);
ctmbstr impl = node->implicit ? "implicit" : "";
switch ( caller )
{
case 1: call = "discard"; break;
case 2: call = "trim"; break;
case 3: call = "test"; break;
}
while (indent--)
SPRTF(" ");
if (strcmp(type,name))
SPRTF("%s %s %s %s", type, name, impl, call );
else
SPRTF("%s %s %s", name, impl, call );
if (lexer && (strcmp("Text",name) == 0)) {
uint len = node->end - node->start;
uint i;
SPRTF(" (%u) '", len);
if (len < 40) {
/* show it all */
for (i = node->start; i < node->end; i++) {
SPRTF("%c", lexer->lexbuf[i]);
}
} else {
/* partial display */
/* first 19 characters, counted from the start of the node */
uint max = node->start + 19;
for (i = node->start; i < max; i++) {
SPRTF("%c", lexer->lexbuf[i]);
}
SPRTF("...");
i = node->end - 19;
for (; i < node->end; i++) {
SPRTF("%c", lexer->lexbuf[i]);
}
}
SPRTF("'");
}
for (av = node->attributes; av; av = av->next) {
name = av->attribute;
if (name) {
SPRTF(" %s",name);
if (av->value) {
SPRTF("=\"%s\"", av->value);
}
}
}
SPRTF("\n");
}
void dbg_show_all_nodes( TidyDocImpl* doc, Node *node, int indent )
{
while (node)
{
dbg_show_node( doc, node, 0, indent );
dbg_show_all_nodes( doc, node->content, indent + 1 );
node = node->next;
}
}
#endif
int tidyDocCleanAndRepair( TidyDocImpl* doc )
{
Bool word2K = cfgBool( doc, TidyWord2000 );
Bool logical = cfgBool( doc, TidyLogicalEmphasis );
Bool clean = cfgBool( doc, TidyMakeClean );
Bool gdoc = cfgBool( doc, TidyGDocClean );
Bool htmlOut = cfgBool( doc, TidyHtmlOut );
Bool xmlOut = cfgBool( doc, TidyXmlOut );
Bool xhtmlOut = cfgBool( doc, TidyXhtmlOut );
Bool xmlDecl = cfgBool( doc, TidyXmlDecl );
Bool tidyMark = cfgBool( doc, TidyMark );
Bool tidyXmlTags = cfgBool( doc, TidyXmlTags );
Bool wantNameAttr = cfgBool( doc, TidyAnchorAsName );
Bool mergeEmphasis = cfgBool( doc, TidyMergeEmphasis );
Node* node;
#if !defined(NDEBUG) && defined(_MSC_VER)
SPRTF("All nodes BEFORE clean and repair\n");
dbg_show_all_nodes( doc, &doc->root, 0 );
#endif
if (tidyXmlTags)
return tidyDocStatus( doc );
/* Issue #567 - move style elements from body to head */
TY_(CleanStyle)(doc, &doc->root);
/* simplifies <b><b> ... </b> ...</b> etc. */
if ( mergeEmphasis )
TY_(NestedEmphasis)( doc, &doc->root );
/* cleans up <dir>indented text</dir> etc. */
TY_(List2BQ)( doc, &doc->root );
TY_(BQ2Div)( doc, &doc->root );
/* replaces i by em and b by strong */
if ( logical )
TY_(EmFromI)( doc, &doc->root );
if ( word2K && TY_(IsWord2000)(doc) )
{
/* prune Word2000's <![if ...]> ... <![endif]> */
TY_(DropSections)( doc, &doc->root );
/* drop style & class attributes and empty p, span elements */
TY_(CleanWord2000)( doc, &doc->root );
TY_(DropEmptyElements)(doc, &doc->root);
}
/* replaces presentational markup by style rules */
if ( clean )
TY_(CleanDocument)( doc );
/* clean up html exported by Google Docs */
if ( gdoc )
TY_(CleanGoogleDocument)( doc );
/* Move terminating <br /> tags from out of paragraphs */
/*! Do we want to do this for all block-level elements? */
/* This is disabled due to http://tidy.sf.net/bug/681116 */
#if 0
FixBrakes( doc, TY_(FindBody)( doc ));
#endif
/* Reconcile http-equiv meta element with output encoding */
TY_(TidyMetaCharset)(doc);
if ( !TY_(CheckNodeIntegrity)( &doc->root ) )
TidyPanic( doc->allocator, integrity );
/* remember given doctype for reporting */
node = TY_(FindDocType)(doc);
if (node)
{
AttVal* fpi = TY_(GetAttrByName)(node, "PUBLIC");
if (AttrHasValue(fpi))
{
if (doc->givenDoctype)
TidyDocFree(doc, doc->givenDoctype);
doc->givenDoctype = TY_(tmbstrdup)(doc->allocator,fpi->value);
}
}
if ( doc->root.content )
{
/* If we had XHTML input but want HTML output */
if ( htmlOut && doc->lexer->isvoyager )
{
/* reuse the outer node variable rather than shadowing it */
node = TY_(FindDocType)(doc);
/* Remove reference, but do not free */
if (node)
TY_(RemoveNode)(node);
}
if (xhtmlOut && !htmlOut)
{
TY_(SetXHTMLDocType)(doc);
TY_(FixAnchors)(doc, &doc->root, wantNameAttr, yes);
TY_(FixXhtmlNamespace)(doc, yes);
TY_(FixLanguageInformation)(doc, &doc->root, yes, yes);
}
else
{
TY_(FixDocType)(doc);
TY_(FixAnchors)(doc, &doc->root, wantNameAttr, yes);
TY_(FixXhtmlNamespace)(doc, no);
TY_(FixLanguageInformation)(doc, &doc->root, no, yes);
}
if (tidyMark )
TY_(AddGenerator)(doc);
}
/* ensure presence of initial <?xml version="1.0"?> */
if ( xmlOut && xmlDecl )
TY_(FixXmlDecl)( doc );
/* At this point the apparent doctype is going to be as stable as
it can ever be, so we can start detecting things that shouldn't
be in this version of HTML
*/
if (doc->lexer)
{
/*\
* Issue #429 #426 - These services can only be used
* when there is a document loaded, ie a lexer created.
* But really should not be calling a Clean and Repair
* service with no doc!
\*/
if (doc->lexer->versionEmitted & VERS_HTML5)
TY_(CheckHTML5)( doc, &doc->root );
TY_(CheckHTMLTagsAttribsVersions)( doc, &doc->root );
if ( !doc->lexer->isvoyager && doc->xmlDetected )
{
TY_(Report)(doc, NULL, TY_(FindXmlDecl)(doc), XML_DECLARATION_DETECTED );
}
}
#if !defined(NDEBUG) && defined(_MSC_VER)
SPRTF("All nodes AFTER clean and repair\n");
dbg_show_all_nodes( doc, &doc->root, 0 );
#endif
return tidyDocStatus( doc );
}
static
Bool showBodyOnly( TidyDocImpl* doc, TidyTriState bodyOnly )
{
Node* node;
switch( bodyOnly )
{
case TidyNoState:
return no;
case TidyYesState:
return yes;
default:
node = TY_(FindBody)( doc );
if (node && node->implicit )
return yes;
}
return no;
}
int tidyDocSaveStream( TidyDocImpl* doc, StreamOut* out )
{
Bool showMarkup = cfgBool( doc, TidyShowMarkup );
Bool forceOutput = cfgBool( doc, TidyForceOutput );
#if SUPPORT_UTF16_ENCODINGS
Bool outputBOM = ( cfgAutoBool(doc, TidyOutputBOM) == TidyYesState );
Bool smartBOM = ( cfgAutoBool(doc, TidyOutputBOM) == TidyAutoState );
#endif
Bool xmlOut = cfgBool( doc, TidyXmlOut );
Bool xhtmlOut = cfgBool( doc, TidyXhtmlOut );
TidyTriState bodyOnly = cfgAutoBool( doc, TidyBodyOnly );
Bool dropComments = cfgBool(doc, TidyHideComments);
Bool makeClean = cfgBool(doc, TidyMakeClean);
Bool asciiChars = cfgBool(doc, TidyAsciiChars);
Bool makeBare = cfgBool(doc, TidyMakeBare);
Bool escapeCDATA = cfgBool(doc, TidyEscapeCdata);
Bool ppWithTabs = cfgBool(doc, TidyPPrintTabs);
TidyAttrSortStrategy sortAttrStrat = cfg(doc, TidySortAttributes);
if (ppWithTabs)
TY_(PPrintTabs)();
else
TY_(PPrintSpaces)();
if (escapeCDATA)
TY_(ConvertCDATANodes)(doc, &doc->root);
if (dropComments)
TY_(DropComments)(doc, &doc->root);
if (makeClean)
{
/* noop */
TY_(DropFontElements)(doc, &doc->root, NULL);
}
if ((makeClean && asciiChars) || makeBare)
TY_(DowngradeTypography)(doc, &doc->root);
if (makeBare)
/* Note: no longer replaces &nbsp; in */
/* attribute values / non-text tokens */
TY_(NormalizeSpaces)(doc->lexer, &doc->root);
else
TY_(ReplacePreformattedSpaces)(doc, &doc->root);
TY_(SortAttributes)(doc, &doc->root, sortAttrStrat);
if ( showMarkup && (doc->errors == 0 || forceOutput) )
{
#if SUPPORT_UTF16_ENCODINGS
/* Output a Byte Order Mark if required */
if ( outputBOM || (doc->inputHadBOM && smartBOM) )
TY_(outBOM)( out );
#endif
/* No longer necessary. No DOCTYPE == HTML 3.2,
** which gives you only the basic character entities,
** which are safe in any browser.
** if ( !TY_(FindDocType)(doc) )
** TY_(SetOptionBool)( doc, TidyNumEntities, yes );
*/
doc->docOut = out;
if ( xmlOut && !xhtmlOut )
TY_(PPrintXMLTree)( doc, NORMAL, 0, &doc->root );
else if ( showBodyOnly( doc, bodyOnly ) )
TY_(PrintBody)( doc );
else
TY_(PPrintTree)( doc, NORMAL, 0, &doc->root );
TY_(PFlushLine)( doc, 0 );
doc->docOut = NULL;
}
TY_(ResetConfigToSnapshot)( doc );
return tidyDocStatus( doc );
}
/* Tree traversal functions
**
** The big issue here is the degree to which we should mimic
** DOM and/or SAX nodes.
**
** Is it 100% possible (and, if so, how difficult is it) to
** emit SAX events from this API? If SAX events are possible,
** is that 100% of data needed to build a DOM?
*/
TidyNode TIDY_CALL tidyGetRoot( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
Node* node = NULL;
if ( impl )
node = &impl->root;
return tidyImplToNode( node );
}
TidyNode TIDY_CALL tidyGetHtml( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
Node* node = NULL;
if ( impl )
node = TY_(FindHTML)( impl );
return tidyImplToNode( node );
}
TidyNode TIDY_CALL tidyGetHead( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
Node* node = NULL;
if ( impl )
node = TY_(FindHEAD)( impl );
return tidyImplToNode( node );
}
TidyNode TIDY_CALL tidyGetBody( TidyDoc tdoc )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
Node* node = NULL;
if ( impl )
node = TY_(FindBody)( impl );
return tidyImplToNode( node );
}
/* parent / child */
TidyNode TIDY_CALL tidyGetParent( TidyNode tnod )
{
Node* nimp = tidyNodeToImpl( tnod );
return tidyImplToNode( nimp->parent );
}
TidyNode TIDY_CALL tidyGetChild( TidyNode tnod )
{
Node* nimp = tidyNodeToImpl( tnod );
return tidyImplToNode( nimp->content );
}
/* remove a node */
TidyNode TIDY_CALL tidyDiscardElement( TidyDoc tdoc, TidyNode tnod )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
Node* nimp = tidyNodeToImpl( tnod );
Node* next = TY_(DiscardElement)( doc, nimp );
return tidyImplToNode( next );
}
/* siblings */
TidyNode TIDY_CALL tidyGetNext( TidyNode tnod )
{
Node* nimp = tidyNodeToImpl( tnod );
return tidyImplToNode( nimp->next );
}
TidyNode TIDY_CALL tidyGetPrev( TidyNode tnod )
{
Node* nimp = tidyNodeToImpl( tnod );
return tidyImplToNode( nimp->prev );
}
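/* Illustrative sketch only, not part of the library: one way the
** parent/child and sibling accessors above might be combined into a
** depth-first walk. dbgWalkTree is a hypothetical name, printf assumes
** <stdio.h>, and tnod is assumed to be a non-null node such as the
** result of tidyGetRoot() on a valid document.
*/
#if 0
static void dbgWalkTree( TidyNode tnod, int depth )
{
    TidyNode child;
    /* visit each child in document order; the loop ends at the null sibling */
    for ( child = tidyGetChild(tnod); child; child = tidyGetNext(child) )
    {
        ctmbstr name = tidyNodeGetName( child );
        /* text and comment nodes have no element name */
        printf( "%*s%s\n", depth * 2, "", name ? name : "(unnamed)" );
        dbgWalkTree( child, depth + 1 );
    }
}
/* possible usage: dbgWalkTree( tidyGetRoot(tdoc), 0 ); */
#endif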
/* Node info */
TidyNodeType TIDY_CALL tidyNodeGetType( TidyNode tnod )
{
Node* nimp = tidyNodeToImpl( tnod );
TidyNodeType ntyp = TidyNode_Root;
if ( nimp )
ntyp = (TidyNodeType) nimp->type;
return ntyp;
}
uint TIDY_CALL tidyNodeLine( TidyNode tnod )
{
Node* nimp = tidyNodeToImpl( tnod );
uint line = 0;
if ( nimp )
line = nimp->line;
return line;
}
uint TIDY_CALL tidyNodeColumn( TidyNode tnod )
{
Node* nimp = tidyNodeToImpl( tnod );
uint col = 0;
if ( nimp )
col = nimp->column;
return col;
}
ctmbstr TIDY_CALL tidyNodeGetName( TidyNode tnod )
{
Node* nimp = tidyNodeToImpl( tnod );
ctmbstr nnam = NULL;
if ( nimp )
nnam = nimp->element;
return nnam;
}
Bool TIDY_CALL tidyNodeHasText( TidyDoc tdoc, TidyNode tnod )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
if ( doc )
return TY_(nodeHasText)( doc, tidyNodeToImpl(tnod) );
return no;
}
Bool TIDY_CALL tidyNodeGetText( TidyDoc tdoc, TidyNode tnod, TidyBuffer* outbuf )
{
TidyDocImpl* doc = tidyDocToImpl( tdoc );
Node* nimp = tidyNodeToImpl( tnod );
if ( doc && nimp && outbuf )
{
uint outenc = cfg( doc, TidyOutCharEncoding );
uint nl = cfg( doc, TidyNewline );
StreamOut* out = TY_(BufferOutput)( doc, outbuf, outenc, nl );
Bool xmlOut = cfgBool( doc, TidyXmlOut );
Bool xhtmlOut = cfgBool( doc, TidyXhtmlOut );
doc->docOut = out;
if ( xmlOut && !xhtmlOut )
TY_(PPrintXMLTree)( doc, NORMAL, 0, nimp );
else
TY_(PPrintTree)( doc, NORMAL, 0, nimp );
TY_(PFlushLine)( doc, 0 );
doc->docOut = NULL;
TidyDocFree( doc, out );
return yes;
}
return no;
}
Bool TIDY_CALL tidyNodeGetValue( TidyDoc tdoc, TidyNode tnod, TidyBuffer* buf )
{
TidyDocImpl *doc = tidyDocToImpl( tdoc );
Node *node = tidyNodeToImpl( tnod );
if ( doc == NULL || node == NULL || buf == NULL )
return no;
switch( node->type ) {
case TextNode:
case CDATATag:
case CommentTag:
case ProcInsTag:
case SectionTag:
case AspTag:
case JsteTag:
case PhpTag:
{
tidyBufClear( buf );
tidyBufAppend( buf, doc->lexer->lexbuf + node->start,
node->end - node->start );
break;
}
default:
/* The node doesn't have a value */
return no;
}
return yes;
}
Bool TIDY_CALL tidyNodeIsProp( TidyDoc ARG_UNUSED(tdoc), TidyNode tnod )
{
Node* nimp = tidyNodeToImpl( tnod );
Bool isProprietary = yes;
if ( nimp )
{
switch ( nimp->type )
{
case RootNode:
case DocTypeTag:
case CommentTag:
case XmlDecl:
case ProcInsTag:
case TextNode:
case CDATATag:
isProprietary = no;
break;
case SectionTag:
case AspTag:
case JsteTag:
case PhpTag:
isProprietary = yes;
break;
case StartTag:
case EndTag:
case StartEndTag:
isProprietary = ( nimp->tag
? (nimp->tag->versions&VERS_PROPRIETARY)!=0
: yes );
break;
}
}
return isProprietary;
}
TidyTagId TIDY_CALL tidyNodeGetId(TidyNode tnod)
{
Node* nimp = tidyNodeToImpl(tnod);
TidyTagId tagId = TidyTag_UNKNOWN;
if (nimp && nimp->tag)
tagId = nimp->tag->id;
return tagId;
}
/* Iterate over attribute values */
TidyAttr TIDY_CALL tidyAttrFirst( TidyNode tnod )
{
Node* nimp = tidyNodeToImpl( tnod );
AttVal* attval = NULL;
if ( nimp )
attval = nimp->attributes;
return tidyImplToAttr( attval );
}
TidyAttr TIDY_CALL tidyAttrNext( TidyAttr tattr )
{
AttVal* attval = tidyAttrToImpl( tattr );
AttVal* nxtval = NULL;
if ( attval )
nxtval = attval->next;
return tidyImplToAttr( nxtval );
}
ctmbstr TIDY_CALL tidyAttrName( TidyAttr tattr )
{
AttVal* attval = tidyAttrToImpl( tattr );
ctmbstr anam = NULL;
if ( attval )
anam = attval->attribute;
return anam;
}
ctmbstr TIDY_CALL tidyAttrValue( TidyAttr tattr )
{
AttVal* attval = tidyAttrToImpl( tattr );
ctmbstr aval = NULL;
if ( attval )
aval = attval->value;
return aval;
}
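/* Illustrative sketch only, not part of the library: iterating a node's
** attributes with the accessors above. dbgShowAttrs is a hypothetical
** name and printf assumes <stdio.h>. Both the name and the value may be
** null, so each is checked before printing.
*/
#if 0
static void dbgShowAttrs( TidyNode tnod )
{
    TidyAttr attr;
    /* tidyAttrFirst returns null for a node without attributes */
    for ( attr = tidyAttrFirst(tnod); attr; attr = tidyAttrNext(attr) )
    {
        ctmbstr name  = tidyAttrName( attr );
        ctmbstr value = tidyAttrValue( attr );
        printf( "%s=\"%s\"\n", name ? name : "", value ? value : "" );
    }
}
#endif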
void TIDY_CALL tidyAttrDiscard( TidyDoc tdoc, TidyNode tnod, TidyAttr tattr )
{
TidyDocImpl* impl = tidyDocToImpl( tdoc );
Node* nimp = tidyNodeToImpl( tnod );
AttVal* attval = tidyAttrToImpl( tattr );
TY_(RemoveAttribute)( impl, nimp, attval );
}
TidyAttrId TIDY_CALL tidyAttrGetId( TidyAttr tattr )
{
AttVal* attval = tidyAttrToImpl( tattr );
TidyAttrId attrId = TidyAttr_UNKNOWN;
if ( attval && attval->dict )
attrId = attval->dict->id;
return attrId;
}
/*******************************************************************
** Message Key Management
*******************************************************************/
ctmbstr TIDY_CALL tidyErrorCodeAsKey(uint code)
{
return TY_(tidyErrorCodeAsKey)( code );
}
uint TIDY_CALL tidyErrorCodeFromKey(ctmbstr code)
{
return TY_(tidyErrorCodeFromKey)( code );
}
TidyIterator TIDY_CALL getErrorCodeList()
{
return TY_(getErrorCodeList)();
}
uint TIDY_CALL getNextErrorCode( TidyIterator* iter )
{
return TY_(getNextErrorCode)(iter);
}
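/* Illustrative sketch only, not part of the library: listing every
** message code with its key through the iterator pair above.
** dbgListErrorKeys is a hypothetical name and printf assumes <stdio.h>.
** The loop assumes getNextErrorCode() sets the iterator to null after the
** last item, as is the convention for Tidy's other iterators.
*/
#if 0
static void dbgListErrorKeys( void )
{
    TidyIterator it = getErrorCodeList();
    while ( it )
    {
        uint code   = getNextErrorCode( &it );
        ctmbstr key = tidyErrorCodeAsKey( code );
        printf( "%u\t%s\n", code, key ? key : "(no key)" );
    }
}
#endif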
/*******************************************************************
** Localization Support
*******************************************************************/
tmbstr TIDY_CALL tidySystemLocale(tmbstr result)
{
return TY_(tidySystemLocale)( result );
}
Bool TIDY_CALL tidySetLanguage( ctmbstr languageCode )
{
return TY_(tidySetLanguage)( languageCode );
}
ctmbstr TIDY_CALL tidyGetLanguage()
{
return TY_(tidyGetLanguage)();
}
ctmbstr TIDY_CALL tidyLocalizedStringN( uint messageType, uint quantity )
{
return TY_(tidyLocalizedStringN)( messageType, quantity );
}
ctmbstr TIDY_CALL tidyLocalizedString( uint messageType )
{
return TY_(tidyLocalizedString)( messageType );
}
ctmbstr TIDY_CALL tidyDefaultString( uint messageType )
{
return TY_(tidyDefaultString)( messageType );
}
TidyIterator TIDY_CALL getStringKeyList()
{
return TY_(getStringKeyList)();
}
uint TIDY_CALL getNextStringKey( TidyIterator* iter )
{
return TY_(getNextStringKey)( iter );
}
TidyIterator TIDY_CALL getWindowsLanguageList()
{
return TY_(getWindowsLanguageList)();
}
//#define tidyOptionToImpl( topt ) ((const TidyOptionImpl*)(topt))
//#define tidyImplToOption( option ) ((TidyOption)(option))
const tidyLocaleMapItem* TIDY_CALL getNextWindowsLanguage( TidyIterator* iter )
{
/* Get a real structure */
const tidyLocaleMapItemImpl *item = TY_(getNextWindowsLanguage)( iter );
/* Return it as the opaque version, preserving const-ness */
return ((const tidyLocaleMapItem*)(item));
}
const ctmbstr TIDY_CALL TidyLangWindowsName( const tidyLocaleMapItem *item )
{
return TY_(TidyLangWindowsName)( (tidyLocaleMapItemImpl*)(item) );
}
const ctmbstr TIDY_CALL TidyLangPosixName( const tidyLocaleMapItem *item )
{
return TY_(TidyLangPosixName)( (tidyLocaleMapItemImpl*)(item) );
}
TidyIterator TIDY_CALL getInstalledLanguageList()
{
return TY_(getInstalledLanguageList)();
}
ctmbstr TIDY_CALL getNextInstalledLanguage( TidyIterator* iter )
{
return TY_(getNextInstalledLanguage)( iter );
}
/*
* local variables:
* mode: c
* indent-tabs-mode: nil
* c-basic-offset: 4
* eval: (c-set-offset 'substatement-open 0)
* end:
*/