tidy-html5/src/tidy-int.h

2011-11-17 02:44:16 +00:00
#ifndef __TIDY_INT_H__
#define __TIDY_INT_H__
/* tidy-int.h -- internal library declarations
(c) 1998-2007 (W3C) MIT, ERCIM, Keio University
See tidy.h for the copyright notice.
*/
#include "tidy.h"
#include "config.h"
#include "lexer.h"
#include "tags.h"
#include "attrs.h"
#include "pprint.h"
#include "access.h"
#include "message.h"
#ifndef MAX
#define MAX(a,b) (((a) > (b))?(a):(b))
#endif
#ifndef MIN
#define MIN(a,b) (((a) < (b))?(a):(b))
#endif
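/* A minimal sketch, not part of Tidy: MAX and MIN are plain macros, so each
   argument expression may be evaluated twice. The helpers clamp_int and
   count_evaluations are hypothetical names used only to illustrate this. */

```c
#include <assert.h>

#ifndef MAX
#define MAX(a,b) (((a) > (b))?(a):(b))
#endif
#ifndef MIN
#define MIN(a,b) (((a) < (b))?(a):(b))
#endif

/* Hypothetical helper: clamp value into the range [lo, hi]. */
static int clamp_int(int value, int lo, int hi)
{
    return MIN(MAX(value, lo), hi);
}

/* Hypothetical helper: shows that an argument with a side effect is
   evaluated once in the comparison and once more when it is selected. */
static int count_evaluations(void)
{
    int calls = 0;
    int result = MAX(++calls, 0);
    (void)result;
    return calls;
}
```

/* Passing side-effect-free expressions (plain variables or constants)
   avoids the double evaluation shown by count_evaluations. */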
/*\
* Issue #166 - repeated <main> element
* Change the previous on/off uint flag badForm
* to a bit flag so that it can record errors other
* than <form> errors. More flags can be added as needed.
\*/
#define flg_BadForm 0x00000001
#define flg_BadMain 0x00000002
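/* A minimal sketch of how such bit flags can be set and tested. DemoDoc,
   note_problem and has_problem are hypothetical stand-ins; only the two
   flag values come from this header. */

```c
#include <assert.h>

#define flg_BadForm 0x00000001
#define flg_BadMain 0x00000002

/* Hypothetical stand-in for the badForm member of the document struct. */
struct DemoDoc
{
    unsigned int badForm;   /* bit field, one bit per kind of problem */
};

/* Record a problem by OR-ing its flag into the bit field. */
static void note_problem(struct DemoDoc *doc, unsigned int flag)
{
    doc->badForm |= flag;
}

/* Test for one problem without disturbing the other bits. */
static int has_problem(const struct DemoDoc *doc, unsigned int flag)
{
    return (doc->badForm & flag) != 0;
}
```

/* Because each flag occupies its own bit, a repeated <main> and a badly
   placed <form> can both be recorded in the same field. */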
struct _TidyDocImpl
{
    /* The Document Tree (and backing store buffer) */
    Node                     root;    /* This MUST remain the first declared
                                         variable in this structure */
    Lexer*                   lexer;

    /* Config + Markup Declarations */
    TidyConfigImpl           config;
    TidyTagImpl              tags;
    TidyAttribImpl           attribs;
    TidyAccessImpl           access;
    TidyMutedMessages        muted;

    /* The Pretty Print buffer */
    TidyPrintImpl            pprint;

    /* I/O */
    StreamIn*                docIn;
    StreamOut*               docOut;
    StreamOut*               errout;
    TidyReportFilter         reportFilter;
    TidyReportCallback       reportCallback;
    TidyMessageCallback      messageCallback;
    TidyOptCallback          pOptCallback;
    TidyConfigCallback       pConfigCallback;
    TidyConfigChangeCallback pConfigChangeCallback;
    TidyPPProgress           progressCallback;

    /* Parse + Repair Results */
    uint                     optionErrors;
    uint                     errors;
    uint                     warnings;
    uint                     accessErrors;
    uint                     infoMessages;
    uint                     docErrors;
    int                      parseStatus;

    uint                     badAccess;   /* for accessibility errors */
    uint                     badLayout;   /* for bad style errors */
    uint                     badChars;    /* for bad char encodings */
    uint                     badForm;     /* bit field, for badly placed form tags, or other format errors */
    uint                     footnotes;   /* bit field, for other footnotes, until formalized */

    Bool                     HTML5Mode;   /* current mode is html5 */
    Bool                     xmlDetected; /* true if XML was used/detected */

    /* Memory allocator */
    TidyAllocator*           allocator;

    /* Miscellaneous */
    void*                    appData;
    uint                     nClassId;
    Bool                     inputHadBOM;

#if PRESERVE_FILE_TIMES
    struct utimbuf           filetimes;
#endif
    tmbstr                   givenDoctype;
};
/* Commit note (2017-03-13 17:28:57 +00:00): Massive Revamp of the Messaging System
 *
 * This is a rather large refactoring of Tidy's messaging system. This was
 * done mostly to allow non-C libraries that cannot adequately take advantage
 * of arg_lists a chance to query report filter information for information
 * related to arguments used in constructing an error message.
 *
 * Three main goals were in mind for this project:
 *
 * - Don't change the contents of Tidy's existing output sinks. This will
 *   ensure that changes do not affect console Tidy users, or LibTidy users
 *   who use the output sinks directly. This was accomplished 100% other than
 *   some improved cosmetics in the output. See tidy-html5-tests repository,
 *   the `refactor` and `more_messages_changes` branches for these minor diffs.
 * - Provide an API that is simple and also extensible without having to
 *   write new error filters all the time. This was accomplished by adding
 *   the new message callback `TidyMessageCallback` that provides callback
 *   functions an opaque object representing the message, and an API to query
 *   the message for wanted details. With this, we should never have to add a
 *   new callback routine again, as additional API can simply be written
 *   against the opaque object.
 * - The API should work the same as the rest of LibTidy's API in that it's
 *   consistent and only uses simple types with wide interoperability with
 *   other languages. Thanks to @gagern who suggested the model for the API
 *   in #409. Although the API uses the "Tidy" way of accessing data via an
 *   iterator rather than an index, this can be easily abstracted in the
 *   target language.
 *
 * There are two *major* API breaking changes:
 *
 * - Removed TidyReportFilter2. This was only used by one application in the
 *   entire world, and was a hacky kludge that served its purpose.
 *   TidyReportCallback (né TidyReportFilter3) is much better. If, for some
 *   reason, this affects you, I recommend using TidyReportCallback instead.
 *   It's a minor change for your application.
 * - Renamed TidyReportFilter3 to TidyReportCallback. This name is much more
 *   semantic, and much more sensible in light of the improved callback
 *   system. As the name implies, it remains capable of *only* receiving
 *   callbacks for Tidy "reports."
 *
 * Introducing TidyMessageCallback, and a new message interrogation API:
 *
 * - As its name implies, it is able to capture (and optionally suppress)
 *   *all* of Tidy's output, including the dialogue messages that never make
 *   it to the existing report filters.
 * - Provides an opaque `TidyMessage` and an API that can be used to query
 *   against it to find the juicy goodness inside.
 * - For example, `tidyGetMessageOutput( tmessage )` will return the
 *   complete, localized message.
 * - Another example, `tidyGetMessageLine( tmessage )` will return the line
 *   the message applies to.
 * - You can also get information about the individual arguments that make
 *   up a message. By using the `tidyGetMessageArguments( tmessage )`
 *   iterator and `tidyGetNextMessageArgument` you will obtain an opaque
 *   `TidyMessageArgument` which has its own interrogation API. For example:
 *     - tidyGetArgType( tmessage, &iterator );
 *     - tidyGetArgFormat( tmessage, &iterator );
 *     - tidyGetArgValueString( tmessage, &iterator );
 *     - …and so on.
 *
 * Other major changes include refactoring `messages.c` to use the new
 * message "object" directly when emitting messages to the console or output
 * sinks. This allowed replacement of a lot of specialized functions with
 * generalized ones. Some of this generalizing involved modifications to the
 * `language_xx.h` header files, and these are all positive improvements even
 * without the above changes.
 */
/** The basic struct for communicating a message within LibTidy. All of the
** relevant information pertaining to a message can be retrieved with the
** accessor functions and one of these records.
*/
struct _TidyMessageImpl
{
    TidyDocImpl         *tidyDoc;              /* document instance this message is attributed to */
    Node                *tidyNode;             /* the node reporting the message, if applicable */
    uint                code;                  /* the message code */
    int                 line;                  /* the line message applies to */
    int                 column;                /* the column the message applies to */
    TidyReportLevel     level;                 /* the severity level of the message */
    Bool                allowMessage;          /* indicates whether or not a filter rejected a message */
    Bool                muted;                 /* indicates whether or not a configuration mutes this message */

    int                 argcount;              /* the number of arguments */
    struct printfArg*   arguments;             /* the arguments' values and types */

    ctmbstr             messageKey;            /* the message code as a key string */

    ctmbstr             messageFormatDefault;  /* the built-in format string */
    ctmbstr             messageFormat;         /* the localized format string */

    tmbstr              messageDefault;        /* the message, formatted, default language */
    tmbstr              message;               /* the message, formatted, localized */

    tmbstr              messagePosDefault;     /* the position part, default language */
    tmbstr              messagePos;            /* the position part, localized */

    ctmbstr             messagePrefixDefault;  /* the prefix part, default language */
    ctmbstr             messagePrefix;         /* the prefix part, localized */

    tmbstr              messageOutputDefault;  /* the complete string Tidy would output */
    tmbstr              messageOutput;         /* the complete string, localized */
};
#define tidyDocToImpl( tdoc ) ((TidyDocImpl*)(tdoc))
#define tidyImplToDoc( doc ) ((TidyDoc)(doc))
#define tidyMessageToImpl( tmessage ) ((TidyMessageImpl*)(tmessage))
#define tidyImplToMessage( message ) ((TidyMessage)(message))
#define tidyNodeToImpl( tnod ) ((Node*)(tnod))
#define tidyImplToNode( node ) ((TidyNode)(node))
#define tidyAttrToImpl( tattr ) ((AttVal*)(tattr))
#define tidyImplToAttr( attval ) ((TidyAttr)(attval))
#define tidyOptionToImpl( topt ) ((const TidyOptionImpl*)(topt))
#define tidyImplToOption( option ) ((TidyOption)(option))
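/* A minimal sketch of the opaque-handle pattern behind cast macros like the
   ones above: callers hold a handle they cannot look inside, and the library
   casts it back to its private struct. Every Demo* name is hypothetical and
   only loosely modelled on this header; it is not Tidy's actual definition. */

```c
#include <assert.h>
#include <stddef.h>

/* Public side: a handle type whose contents callers never touch. */
struct DemoHandle_ { int opaque_; };
typedef const struct DemoHandle_ *DemoHandle;

/* Private side: the real state behind the handle. */
typedef struct
{
    int errors;
    int warnings;
} DemoImpl;

#define demoHandleToImpl( h )  ((DemoImpl*)(h))
#define demoImplToHandle( p )  ((DemoHandle)(p))

/* A public-style accessor: accept the handle, work on the implementation. */
static int demo_error_count(DemoHandle h)
{
    DemoImpl *impl = demoHandleToImpl(h);
    return impl ? impl->errors : -1;
}
```

/* The handle is only ever cast back to the type it was created from; it is
   never dereferenced as a DemoHandle_. */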
/** Wrappers for easy memory allocation using the document's allocator */
#define TidyDocAlloc(doc, size) TidyAlloc((doc)->allocator, size)
#define TidyDocRealloc(doc, block, size) TidyRealloc((doc)->allocator, block, size)
#define TidyDocFree(doc, block) TidyFree((doc)->allocator, block)
#define TidyDocPanic(doc, msg) TidyPanic((doc)->allocator, msg)
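/* A minimal sketch of how per-document wrappers like these can forward to a
   pluggable allocator. The Demo* types, the vtable layout and the counting
   allocator are assumptions made for illustration, not LibTidy's own
   definitions. */

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoAllocator_ DemoAllocator;

/* Hypothetical table of allocator functions. */
typedef struct
{
    void *(*do_alloc)(DemoAllocator *self, size_t nBytes);
    void  (*do_free)(DemoAllocator *self, void *block);
} DemoAllocatorVtbl;

struct DemoAllocator_
{
    const DemoAllocatorVtbl *vtbl;
    int live;                       /* blocks currently allocated */
};

#define DemoAlloc(allocator, size)  ((allocator)->vtbl->do_alloc((allocator), (size)))
#define DemoFree(allocator, block)  ((allocator)->vtbl->do_free((allocator), (block)))

/* Document-level wrappers: callers pass the document, not the allocator. */
typedef struct { DemoAllocator *allocator; } DemoDoc;
#define DemoDocAlloc(doc, size)   DemoAlloc((doc)->allocator, size)
#define DemoDocFree(doc, block)   DemoFree((doc)->allocator, block)

/* A counting allocator built on malloc/free. */
static void *counting_alloc(DemoAllocator *self, size_t nBytes)
{
    void *block = malloc(nBytes);
    if (block)
        self->live++;
    return block;
}

static void counting_free(DemoAllocator *self, void *block)
{
    if (block)
    {
        self->live--;
        free(block);
    }
}

static const DemoAllocatorVtbl counting_vtbl = { counting_alloc, counting_free };
```

/* Routing every allocation through the document means one document can use
   a different allocator from another without any change at the call sites. */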
TY_PRIVATE int TY_(DocParseStream)( TidyDocImpl* impl, StreamIn* in );
/*
[i_a] generic node tree traversal code; used in several spots.
Define your own callback, which returns one of the NodeTraversalSignal values
to instruct the tree traversal routine TraverseNodeTree() what to do.
Pass custom data to/from the callback using the 'propagate' reference.
*/
typedef enum
{
    ContinueTraversal,       /* visit siblings and children */
    SkipChildren,            /* visit siblings of this node; ignore its children */
    SkipSiblings,            /* ignore subsequent siblings of this node (and their children); this node's children are still visited */
    SkipChildrenAndSiblings, /* ignore this node's children and its subsequent siblings */
    VisitParent,             /* REVERSE traversal: visit the parent of the current node */
    ExitTraversal            /* terminate traversal on the spot */
} NodeTraversalSignal;
typedef NodeTraversalSignal NodeTraversalCallBack(TidyDocImpl* doc, Node* node, void *propagate);
TY_PRIVATE NodeTraversalSignal TY_(TraverseNodeTree)(TidyDocImpl* doc, Node* node, NodeTraversalCallBack *cb, void *propagate);
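/* A minimal sketch of a callback-driven tree walk in the style described
   above. It is not Tidy's TraverseNodeTree: DemoNode, DemoSignal and the
   functions below are hypothetical, and only three of the six signals
   (continue, skip children, exit) are modelled. */

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoNode_
{
    struct DemoNode_ *next;      /* next sibling */
    struct DemoNode_ *content;   /* first child */
    int id;
} DemoNode;

typedef enum { DemoContinue, DemoSkipChildren, DemoExit } DemoSignal;

typedef DemoSignal DemoCallback(DemoNode *node, void *propagate);

/* Visit node and its following siblings, descending into children unless
   the callback asks otherwise. Returns DemoExit if the walk was aborted. */
static DemoSignal demo_traverse(DemoNode *node, DemoCallback *cb, void *propagate)
{
    for (; node; node = node->next)
    {
        DemoSignal s = cb(node, propagate);

        if (s == DemoExit)
            return DemoExit;

        if (s == DemoContinue && node->content
            && demo_traverse(node->content, cb, propagate) == DemoExit)
            return DemoExit;
    }
    return DemoContinue;
}

/* Example callback: counts visited nodes through 'propagate' and refuses
   to descend into the node whose id is 2. */
static DemoSignal count_nodes(DemoNode *node, void *propagate)
{
    int *count = (int *)propagate;
    ++*count;
    return node->id == 2 ? DemoSkipChildren : DemoContinue;
}
```

/* The 'propagate' pointer is the only channel between caller and callback,
   so the same walker can count, search or collect without modification. */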
#endif /* __TIDY_INT_H__ */