
Tidy Regression Testing Specification

Background

HTML Tidy uses regression testing as its main means of quality control when implementing new features and fixing bugs. HTML Tidy has been in constant development since before unit testing and automated testing were in wide use, and regression testing has proven effective in guiding its development.

This repository is the regression testing tool used by Tidy for both continuous integration and development work. It consists of test cases split into multiple test sets, as well as tools for automating testing.

Testing consists of automatically running some version (of your choice) of HTML Tidy on various operating systems and architectures against the suite of test cases, and comparing Tidy's output and its report output against known “good” versions.

This testing process ensures that:

  • No regressions occur as a result of the changes you make to HTML Tidy. Everything that has passed in the past should continue to pass, regardless of your changes. Changing test expectations for existing test cases must be discussed on the pull request thread; otherwise, regressions are prima facie cause for rejecting your pull request.

  • Although touted as a “regression test,” code changes should also be furnished with a test case that demonstrates the issue being corrected or the feature being added. Logically, you are already informally using one or more test cases during your development of the patch; this simply formalizes the requirement for HTML Tidy, and makes it much easier for the maintainers to understand the impact of your proposed change.

Additionally, when introducing new features or fixing bugs, new test cases should be written to demonstrate that the feature or fix works.

About the Test Tool (test.rb)

The test.rb tool replaces the previous Windows shell and Bash testing scripts. This start-from-scratch approach is intended to provide a single script that's platform-agnostic, for the primary purpose of enabling automated testing, but with strong support for use as a manual tool during HTML Tidy development.

Ruby was chosen as the scripting language because it is available on every major platform, is easy to read (even if you're not a Ruby programmer), and is supported by the major continuous integration testing providers, such as GitHub.

We recognize that some developers have scripting environment preferences, so please feel free to write wrappers around test.rb as needed to suit your preferences. If additional CLI API is needed to enable your scripting environment wrapper, please feel free to request it.

Building Tidy, and Tidy Versions

The testing tool works by executing tidy (or tidy.exe, referred to simply as tidy from here on) on your platform. Naturally, you don't want to conduct testing using the normal, installed version of tidy, but rather version(s) that you've built for testing.

By default, the tidy used will be in the standard build folder of the tidy-html5 directory that is a sibling to this tidy-html5-tests directory. The complete relative path from test.rb, then, is:

../tidy-html5/build/cmake/tidy[.exe]

This makes it convenient to perform testing when both repositories are checked out. However, you can also specify another build of HTML Tidy as an optional argument.
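The default lookup described above can be sketched in a few lines of Ruby. This is only an illustration, not the code in test.rb: the method names default_tidy_path and resolve_tidy, and the override parameter, are hypothetical.

```ruby
# Illustrative sketch only; test.rb's real implementation may differ.
# default_tidy_path and resolve_tidy are hypothetical helper names.

# Builds the default path to the tidy executable, relative to the
# directory that contains test.rb.
def default_tidy_path(script_dir, windows: Gem.win_platform?)
  exe = windows ? 'tidy.exe' : 'tidy'
  File.expand_path(File.join('..', 'tidy-html5', 'build', 'cmake', exe), script_dir)
end

# An explicitly supplied build (hypothetical parameter) wins over the default.
def resolve_tidy(script_dir, override = nil, windows: Gem.win_platform?)
  override ? File.expand_path(override) : default_tidy_path(script_dir, windows: windows)
end
```

For example, resolve_tidy(__dir__) called from a script located next to test.rb would point at the sibling checkout's build folder, while resolve_tidy(__dir__, '/some/other/tidy') would use the given build instead.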

Static Build Considerations

By default, HTML Tidy is built as a console application statically linked to LibTidy. Although the option to link against a dylib or dll exists when building, it's suggested that you no longer do so, because you might put yourself into a situation where you're testing multiple command line executables that are all linked to the same dynamic library!

Although not formally deprecated, you should consider dynamic linking deprecated and treat it that way. In a world where entire Java Runtime Environments are shipped per program, the benefits of dynamic linking no longer exist on any modern computer or operating system. In some cases, modern security hardening even prevents dynamic linking, and we're likely to see such restrictions become more common in the future.

Running Tests

Preparing the Environment

Assuming that you have a working Ruby interpreter (version 2.7 or so), change into the tidy-html5-tests directory and execute bundle install, which downloads any dependencies that your environment doesn't already have.

Executing the Program

In Windows shell and PowerShell, simply typing

test 

will run the tool. Usually. Probably. If not, try ruby test.rb in case your environment is not configured to work directly.

Unix and Unix-like operating systems (including WSL and other Unix-like environments for Windows) can run the program like so:

./test.rb

Testing

When used without any arguments, help will be provided. In general, though, you can do the following:

Command                        Effect
./test.rb test                 Tests all cases in all test sets.
./test.rb only <setname>       Tests only the cases in the given test set.
./test.rb case <case_number>   Tests only a single case.
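If you write a wrapper around test.rb, one approach is to build the command line from the three commands above. The following is a minimal sketch, assuming a hypothetical helper named test_rb_command; it only assembles arguments and does not depend on anything inside test.rb.

```ruby
# Hypothetical wrapper helper: builds the argument vector for a test.rb run
# using only the commands listed in the table above.
def test_rb_command(mode, target = nil, script: './test.rb')
  args =
    case mode
    when :all  then ['test']            # all cases in all test sets
    when :set  then ['only', target]    # a single test set
    when :case then ['case', target]    # a single case
    else raise ArgumentError, "unknown mode: #{mode.inspect}"
    end
  raise ArgumentError, "#{mode} needs a target" if args.include?(nil)
  ['ruby', script, *args.map(&:to_s)]
end
```

A wrapper could then run, say, system(*test_rb_command(:set, 'testbase')) and inspect the return value.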

Input Specification

Test Sets

“Test sets” are groups of individual tests that are thematically related, such as accessibility checks, XML-specific tests, historical tests, etc. Each set of cases consists of directories and a text file within the cases/ directory. Each test set shall consist of the following directories/files, where setname indicates the name of the testing set, e.g., testbase (our default set of case files).

  • setname/, which contains the HTML files to tidy, and an optional configuration file for each case.

    • Test files shall have the format case-basename@n<.html|.xml|.xhtml>, where basename represents the test case name, and the @n metadata represents the required shell exit status code that HTML Tidy should produce after running the test case. The case name cannot contain hyphens or the @ symbol, and should represent something meaningful, such as a GitHub issue number.

    • Optional Tidy configuration files shall be named case-basename.conf.

    • In the absence of a configuration file, the file config_default.conf in each directory will be used instead.

    • README<.txt|.md>, which describes the test set.

  • setname-expects/, which contains the expected output from HTML Tidy.

    • Files in the format case-basename<.html|.xml|.xhtml> represent the expected HTML file as generated by Tidy.
    • Files in the format case-basename.txt represent the expected warning/error output from Tidy.
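The test file naming rule can be expressed as a regular expression. The sketch below is one reading of the rule, not the parser used by test.rb; the constant CASE_FILE, the method parse_case_filename, and the sample file names in the usage note are all hypothetical.

```ruby
# One possible reading of the naming rule; not taken from test.rb.
# The case name may not contain hyphens or "@"; "@n" is the expected exit status.
CASE_FILE = /\Acase-(?<name>[^-@]+)@(?<status>\d+)\.(?<ext>html|xml|xhtml)\z/

# Returns a hash describing the test file, or nil if the name does not
# follow the format.
def parse_case_filename(filename)
  m = CASE_FILE.match(File.basename(filename))
  return nil unless m

  { name: m[:name], expected_status: m[:status].to_i, ext: m[:ext] }
end
```

With this sketch, a made-up name such as case-12345@1.html yields the name 12345 and an expected exit status of 1, while a name containing an extra hyphen is rejected.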

Example

cases/
   testbase/
      config_default.conf
      case-427821.html
      case-427821.conf
   testbase-expects/
      case-427821.html
      case-427821.txt
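The configuration fallback described above (a per-case file if present, otherwise config_default.conf) could be sketched as follows. The method name config_for is hypothetical and this is not necessarily how test.rb resolves the file.

```ruby
# Hypothetical helper: picks the Tidy configuration file for a case.
# Prefers case-<name>.conf and falls back to config_default.conf
# in the same test set directory.
def config_for(set_dir, case_name)
  specific = File.join(set_dir, "case-#{case_name}.conf")
  File.exist?(specific) ? specific : File.join(set_dir, 'config_default.conf')
end
```

In the example tree above, config_for('cases/testbase', '427821') would select case-427821.conf, and any case without its own file would use config_default.conf.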

Output Specification

The output specification is written so that it is trivial to diff a setname-expects directory against the output of a test in order to check for differences.

Test results consist of Tidy's HTML output and Tidy's warning/error output.

Each set of results consists of directories within the cases/ directory.

  • setname-results/, which contains Tidy's HTML and warning/error output.
    • Files in the format case-basename.html are the HTML file generated by Tidy.
    • Files in the format case-basename.txt are the warning/error output from Tidy.

Example

cases/
   testbase-results/
      case-427821.html
      case-427821.txt
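Because the expects and results directories share file names, comparing them is straightforward. The following is a minimal sketch of such a comparison, assuming a hypothetical method named compare_set; it does a byte-for-byte check and is not the comparison logic in test.rb.

```ruby
# Hypothetical helper: compares every file in <setname>-expects with the
# file of the same name in <setname>-results.
# Returns a hash of file name => :same, :different, or :missing.
def compare_set(cases_dir, setname)
  expects = File.join(cases_dir, "#{setname}-expects")
  results = File.join(cases_dir, "#{setname}-results")

  Dir.children(expects).sort.each_with_object({}) do |name, report|
    expected = File.join(expects, name)
    next unless File.file?(expected) # skip subdirectories

    actual = File.join(results, name)
    report[name] =
      if !File.file?(actual)
        :missing
      elsif File.binread(expected) == File.binread(actual)
        :same
      else
        :different
      end
  end
end
```

Calling compare_set('cases', 'testbase') after a test run would then list which expected files were reproduced exactly, which differ, and which were never generated.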