Tidy Regression Testing Specification
Background
HTML Tidy uses regression testing as its main means of quality control when implementing new features and fixing bugs. HTML Tidy has been in constant development since before unit testing and automated testing were in wide use, and regression testing has proven effective in guiding its development.
This repository is the regression testing tool used by Tidy for both continuous integration and development work. It consists of test cases split into multiple test sets, as well as tools for automating testing.
Testing consists of automatically running some version (of your choice) of HTML Tidy on various operating systems and architectures against the suite of test cases, and comparing Tidy's output and its report output against known “good” versions.
This testing process ensures that:
- No regressions occur as a result of the changes you make to HTML Tidy. Everything that has passed in the past should continue to pass, regardless of your changes. Changing test expectations for existing test cases must result in a discussion on the pull request thread; otherwise, regressions are prima facie cause for rejecting your pull request.
- Changes are demonstrated by tests. Although this is touted as “regression testing,” code changes should also be furnished with a test case that demonstrates the issue being corrected or the feature being added. Logically you are already informally using one or more test cases during your development of the patch; this simply formalizes the requirement for HTML Tidy, and makes it much easier for the maintainers to understand the impact of your proposed change.
In other words, every new feature or bug fix should arrive with new test cases demonstrating that the change works.
About the Test Tool (test.rb)
The test.rb tool replaces the previous Windows shell and Bash testing
scripts. This start-from-scratch approach is intended to provide a single
script that’s platform agnostic, for the primary purpose of enabling
automated testing, but with strong support for use as a manual tool during
HTML Tidy development.
Ruby was chosen as the scripting language because it is available on all major platforms, is easy to read (even if you're not a Ruby programmer), and is supported by the major continuous integration testing providers, such as GitHub.
We recognize that some developers have scripting environment preferences,
so please feel free to write wrappers around test.rb as needed to suit your
preferences. If additional CLI API is needed to enable your scripting
environment wrapper, please feel free to request it.
Building Tidy, and Tidy Versions
The testing tool works by executing tidy (or tidy.exe; referred to only as
tidy from here on) on your platform. Naturally, you don’t want to conduct
testing using the normal, installed version of tidy, but rather version(s)
that you’ve built for testing.
By default, the tidy used will be in the standard build folder of the
tidy-html5 directory that is a sibling to this tidy-html5-tests directory.
The complete relative path from test.rb, then, is:
../tidy-html5/build/cmake/tidy[.exe]
This is convenient for testing when both repositories are checked out side by side. However, you can also specify another build of HTML Tidy as an optional argument.
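As a rough illustration of the default lookup described above, the sketch below resolves the documented relative path against the directory containing test.rb and lets an explicit path take precedence. The method name `resolve_tidy_path` and its parameters are assumptions made for this example; they are not taken from test.rb itself.

```ruby
# Hypothetical helper (not part of test.rb): pick the tidy binary to test.
# `script_dir` is the directory containing test.rb; `override` is an optional
# user-supplied path. Both names are assumptions made for illustration.
def resolve_tidy_path(script_dir, override = nil,
                      windows: RUBY_PLATFORM.match?(/mswin|mingw|cygwin/))
  # An explicitly specified build always wins over the default location.
  return File.expand_path(override) if override

  exe = windows ? 'tidy.exe' : 'tidy'
  # Documented default: ../tidy-html5/build/cmake/tidy[.exe]
  File.expand_path(File.join('..', 'tidy-html5', 'build', 'cmake', exe), script_dir)
end

puts resolve_tidy_path(__dir__ || Dir.pwd)
```

Expanding the path against the script's directory, instead of the current working directory, keeps the lookup stable no matter where the tool is launched from.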
Static Build Considerations
By default, HTML Tidy is built as a console application statically linked to LibTidy. Although the option to link against a dylib or DLL exists when building, it’s suggested that you no longer do so, because you might put yourself into a situation where you’re testing multiple command line executables that are all linked to the same dynamic library!
Although dynamic linking is not formally deprecated, you should treat it as though it were. In a world where entire Java Runtime Environments are shipped per program, the benefits of dynamic linking no longer exist on any modern computer or operating system. In some cases, modern security hardening even prevents dynamic linking, and we’re likely to see such restrictions become more common in the future.
Running Tests
Preparing the Environment
Assuming that you have a working Ruby interpreter (version 2.7 or so), cd
into the tidy-html5-tests directory and execute bundle install, which ensures
that any dependencies that your environment doesn’t already have will be
downloaded.
Executing the Program
In the Windows shell and PowerShell, simply typing test will run the tool.
Usually. Probably. If not, try ruby test.rb in case your environment is not
configured to run Ruby scripts directly.
Unix and Unix-like operating systems (including WSL and other Unix-like environments for Windows) can run the program like so:
./test.rb
Testing
When used without any arguments, help will be provided. In general, though, you can do the following:
Command | Effect |
---|---|
./test.rb test | Tests all cases in all test sets. |
./test.rb only <setname> | Tests only in the given test set. |
./test.rb case <case_number> | Tests only on a single case. |
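For anyone writing a wrapper around test.rb, the following sketch builds the argument list for each of the commands in the table. It uses only the three documented subcommands and the `ruby test.rb` invocation mentioned earlier; the helper name `test_rb_command` and its action symbols are hypothetical.

```ruby
# Hypothetical wrapper helper: build the argument list for one test.rb run.
# Only the subcommands documented in the table above (test, only, case) are
# used; nothing here assumes additional test.rb options.
def test_rb_command(action, argument = nil)
  base = %w[ruby test.rb]
  case action
  when :all
    base + ['test']
  when :only, :case
    raise ArgumentError, "#{action} requires an argument" if argument.nil?

    base + [action.to_s, argument.to_s]
  else
    raise ArgumentError, "unknown action: #{action.inspect}"
  end
end

p test_rb_command(:only, 'testbase')
# A wrapper could then run the tool with: system(*test_rb_command(:case, 427821))
```

Passing the command to `system` as separate arguments avoids shell quoting issues with set names or case numbers.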
Input Specification
Test Sets
“Test sets” are groups of individual tests that are thematically related,
such as accessibility checks, XML-specific tests, historical tests, etc.
Each set of cases consists of directories and a text file within the cases/
directory. Each test set shall consist of the following directories/files,
where setname indicates the name of the testing set, e.g., testbase (our
default set of case files).
- setname/, which contains the HTML files to tidy, and an optional configuration file for each case.
  - Test files shall have the format case-basename@n<.html|.xml|.xhtml>, where basename (written as nnn elsewhere in this document) represents the test case name, and the @n metadata represents the required shell exit status code that HTML Tidy should produce after running the test case. The case name cannot contain hyphens or the @ symbol, and should represent something meaningful such as a GitHub issue number.
  - Optional Tidy configuration files shall be named case-basename.conf.
  - In the absence of a configuration file, the file config_default.conf in each directory will be used instead.
  - README<.txt|.md>, which describes the test set.
- setname-expects/, which contains the expected output from HTML Tidy.
  - Files in the format case-nnn<.html|.xml|.xhtml> represent the expected HTML file as generated by Tidy.
  - Files in the format case-nnn.txt represent the expected warning/error output from Tidy.
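The test file naming rule above can be read as a small grammar. The sketch below is one possible way to parse it, assuming the @n part is mandatory and is a non-negative integer; the method name `parse_case_file` and the `@1` status in the usage line are made up for illustration and do not come from test.rb or from a real case file.

```ruby
# Hypothetical parser for the case-basename@n<.html|.xml|.xhtml> format.
# Returns nil for names that do not follow the format.
def parse_case_file(filename)
  # basename: one or more characters, excluding hyphens and '@'
  # n:        required exit status (assumed to be a non-negative integer)
  pattern = /\Acase-(?<name>[^-@]+)@(?<status>\d+)\.(?<ext>html|xml|xhtml)\z/
  m = pattern.match(File.basename(filename))
  return nil unless m

  { name: m[:name], expected_status: m[:status].to_i, ext: m[:ext] }
end

# The "@1" here is an invented example status, not a documented one.
p parse_case_file('cases/testbase/case-427821@1.html')
```

Anchoring the pattern with `\A` and `\z` rejects names with stray prefixes or suffixes, and excluding hyphens from the name keeps the split between `case-` and the basename unambiguous.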
Example
cases/
  testbase/
    config_default.conf
    case-427821.html
    case-427821.conf
  testbase-expects/
    case-427821.html
    case-427821.txt
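The configuration rule from the input specification (use the per-case file when it exists, otherwise fall back to config_default.conf in the same directory) could be sketched as follows. The method name `config_for` is an assumption for this example and is not taken from test.rb.

```ruby
# Hypothetical helper: choose the Tidy configuration file for one case.
# `set_dir` is a test set directory such as cases/testbase, and `basename`
# is the case name (for example "427821").
def config_for(set_dir, basename)
  specific = File.join(set_dir, "case-#{basename}.conf")
  # Prefer the per-case file; otherwise use the set-wide default.
  File.file?(specific) ? specific : File.join(set_dir, 'config_default.conf')
end

puts config_for('cases/testbase', '427821')
```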
Output Specification
The output specification is written so that it is trivial to diff a
setname-expects directory against the output of a test in order to check
for differences.
Test results consist of Tidy's HTML output and Tidy's warning/error output.
Each set of results consists of directories within the cases/ directory.
- setname-results/ contains Tidy's HTML and warning/error output.
  - Files in the format case-nnn.html are the HTML file generated by Tidy.
  - Files in the format case-nnn.txt are the warning/error output from Tidy.
Example
cases/
  testbase-results/
    case-427821.html
    case-427821.txt
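Because the expects and results directories share the same file naming, comparing them can be a simple file-by-file check. The sketch below is one way to do it, assuming result files carry exactly the same names as their expected counterparts; `compare_results` and its status symbols are hypothetical and do not describe how test.rb performs its comparison.

```ruby
# Hypothetical comparison of cases/<setname>-expects with
# cases/<setname>-results. Returns a hash that maps each expected file name
# to :same, :different, or :missing.
def compare_results(cases_dir, setname)
  expects = File.join(cases_dir, "#{setname}-expects")
  results = File.join(cases_dir, "#{setname}-results")

  Dir.children(expects).sort.each_with_object({}) do |name, report|
    expected = File.join(expects, name)
    next unless File.file?(expected) # skip any subdirectories

    actual = File.join(results, name)
    report[name] =
      if !File.file?(actual)
        :missing
      elsif File.binread(expected) == File.binread(actual)
        :same
      else
        :different
      end
  end
end
```

Calling `compare_results('cases', 'testbase')` after a test run would report which expected files have matching, differing, or absent results. The byte-for-byte comparison is deliberately strict, so a real tool might need to normalize line endings first.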