Tidy Regression Testing Specification
=====================================

Background
----------
HTML Tidy uses regression testing as its main means of quality control when 
implementing new features and fixing bugs. HTML Tidy has been in constant 
development since before unit testing and automated testing were in wide use,
and regression testing has proven effective in guiding that development.

This repository is the regression testing tool used by Tidy for both 
continuous integration and development work. It consists of test cases 
split into multiple test sets, as well as tools for automating testing.

Testing consists of automatically running some version (of your choice) of HTML 
Tidy on various operating systems and architectures against the suite of 
test cases, and comparing Tidy's HTML output and warning/error report against 
known, “good” versions thereof.

This testing process ensures that:

- No regressions occur as a result of the changes you make to HTML Tidy. 
  Everything that has passed in the past should continue to pass, regardless 
  of your changes. Changes to the expectations of existing test cases must 
  be discussed on the pull request thread; otherwise, regressions are 
  _prima facie_ cause for rejecting your pull request.

- Although this is touted as “regression testing,” code changes should also 
  be furnished with a test case that demonstrates the issue being corrected or 
  the feature being added. Logically you are already informally using one or 
  more test cases during your development of the patch; this simply 
  formalizes the requirement for HTML Tidy, and makes it much easier for the 
  maintainers to understand the impact of your proposed change.

Additionally, when introducing new features or fixing bugs, write new test 
cases that demonstrate that the feature or fix works.


About the Test Tool (test.rb)
-----------------------------
The `test.rb` tool replaces the previous Windows shell and Bash testing 
scripts. This start-from-scratch approach is intended to provide a single 
script that’s platform agnostic, for the primary purpose of enabling 
automated testing, but with strong support for use as a manual tool during 
HTML Tidy development.

Ruby was chosen as the scripting language because it is available 
on every platform, is easy to read (even if you're not a Ruby programmer), 
and is supported by the major continuous integration testing providers, such 
as GitHub.

We recognize that some developers have scripting environment preferences, 
and as such, please feel free to write wrappers around `test.rb` as needed in 
order to suit your preferences. If additional CLI API is needed to enable your 
scripting environment wrapper, please feel free to request it.


Building Tidy, and Tidy Versions
--------------------------------
The testing tool works by executing `tidy` (or `tidy.exe`, referred to simply 
as `tidy` from here on) on your platform. Naturally, you don’t want to conduct 
testing using the normal, installed version of `tidy`, but rather the 
version(s) that you’ve built for testing.

By default, the `tidy` used will be in the standard build folder of the 
`tidy-html5` directory that is a sibling to this `tidy-html5-tests` 
directory. The complete relative path from `test.rb`, then, is:

```
../tidy-html5/build/cmake/tidy[.exe]
```

This default is convenient when both repositories are checked out side by 
side. However, you can also specify another build of HTML Tidy as an 
optional argument.
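
As a rough illustration of the default lookup described above, the following 
Ruby sketch resolves the path relative to a script directory and lets a caller 
override it. The method name `resolve_tidy_path` and its parameters are 
hypothetical and are not taken from `test.rb`; using `Gem.win_platform?` to 
decide on the `.exe` suffix is likewise an assumption.

```ruby
# Hypothetical helper; not part of test.rb.
# Returns the absolute path of the tidy executable to test.
def resolve_tidy_path(script_dir, override = nil, windows: Gem.win_platform?)
  # An explicitly supplied build always wins over the default location.
  return File.expand_path(override) if override

  exe = windows ? 'tidy.exe' : 'tidy'
  # Default: ../tidy-html5/build/cmake/tidy[.exe] relative to the script.
  File.expand_path(File.join('..', 'tidy-html5', 'build', 'cmake', exe), script_dir)
end

# A script would normally pass its own directory; the working directory
# stands in for it here.
puts resolve_tidy_path(Dir.pwd)
```

Keeping the override as a plain optional parameter mirrors the optional 
argument mentioned above without assuming how `test.rb` actually names it.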


Static Build Considerations
---------------------------
By default, HTML Tidy is built as a console application statically linked to 
LibTidy. Although the option to link against a dylib or dll exists when 
building, it’s suggested that you no longer do so, because you might put 
yourself into a situation where you’re testing multiple command line 
executables that are all linked to the same dynamic library!

Although not formally deprecated, you should consider dynamic linking 
deprecated and treat it that way. In a world where entire Java Runtime 
Environments are shipped _per program_, the benefits of dynamic linking no 
longer exist on any modern computer or operating system. In some cases, 
modern security hardening even prevents dynamic linking, and we’re likely to 
see such restrictions become more common in the future.


Running Tests
-------------

### Preparing the Environment

Assuming that you have a working Ruby interpreter, version 2.7 or so, upon 
`cd`-ing into the `tidy-html5-tests` directory, you should execute 
`bundle install`, which ensures that any dependencies that your environment 
doesn’t already have will be downloaded.

### Executing the Program

In Windows shell and PowerShell, simply typing

~~~
test 
~~~

will run the tool. Usually. Probably. If not, try `ruby test.rb` in case 
your environment is not configured to run Ruby scripts directly.

Unix and Unix-like operating systems (including WSL and other Unix-like 
environments for Windows) can run the program like so:

~~~
./test.rb
~~~

### Testing
When used without any arguments, help will be provided. In general, though, 
you can do the following:

| Command                        | Effect                            |
|--------------------------------|-----------------------------------|
| `./test.rb test`               | Tests all cases in all test sets. |
| `./test.rb only <setname>`     | Tests only in the given test set. |
| `./test.rb case <case_number>` | Tests only on a single case.      |


Input Specification
-------------------

### Test Sets

“Test sets” are groups of individual tests that are thematically related,
such as accessibility checks, XML-specific tests, historical tests, etc.
Each set of cases consists of directories and a text file within the `cases/`
directory. Each test set shall consist of the following directories/files, where
`setname` indicates the name of the testing set, e.g., `testbase` (our default
set of case files).

- `setname/`, which contains the HTML files to tidy, and an optional
  configuration file for each case.

  - Test files shall have the format `case-basename@n<.html|.xml|.xhtml>`, 
    where `basename` represents the test case name (written as `nnn` in the 
    other file name formats in this document), and the `@n` metadata 
    represents the required shell exit status code that HTML Tidy should 
    produce after running the test case. The case name cannot contain 
    hyphens or the `@` symbol, and should represent something meaningful 
    such as a GitHub issue number.

  - Optional Tidy configuration files shall be named `case-basename.conf`.

  - In the absence of a configuration file, the file `config_default.conf` in
    each directory will be used instead.

  - `README<.txt|.md>`, which describes the test set.

- `setname-expects/`, which contains the expected output from HTML Tidy.
  - Files in the format `case-nnn<.html|.xml|.xhtml>` represent the expected 
    HTML file as generated by Tidy.
  - Files in the format `case-nnn.txt` represent the expected warning/error
    output from Tidy.
      
#### Example

```
cases/
   testbase/
      config_default.conf
      case-427821.html
      case-427821.conf
   testbase-expects/
      case-427821.html
      case-427821.txt
```
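
The naming rules above lend themselves to a small parser. The following Ruby 
sketch splits a test file name into its case name, expected exit status, and 
extension, and picks the configuration file using the fallback described 
earlier. Both method names and the regular expression are hypothetical and are 
not taken from `test.rb`; the file name `case-427821@1.html` in the usage line 
is made up purely for illustration.

```ruby
# Hypothetical parser for `case-basename@n<.html|.xml|.xhtml>`.
# Returns a hash describing the test file, or nil if the name does not match.
def parse_case_file(filename)
  # The case name may not contain hyphens or `@`; `@n` is the exit status.
  pattern = /\Acase-(?<name>[^-@]+)@(?<status>\d+)\.(?<ext>html|xml|xhtml)\z/
  match = pattern.match(File.basename(filename))
  return nil unless match

  { name: match[:name], expected_status: match[:status].to_i, ext: match[:ext] }
end

# Returns the per-case configuration file if it exists,
# otherwise the test set's config_default.conf.
def config_for(set_dir, case_name)
  specific = File.join(set_dir, "case-#{case_name}.conf")
  File.exist?(specific) ? specific : File.join(set_dir, 'config_default.conf')
end

p parse_case_file('cases/testbase/case-427821@1.html')
```

Returning `nil` for names that do not match lets a caller skip files such as 
the README or the configuration files when scanning a test set directory.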


Output Specification
--------------------

The output specification is written so that it is trivial to `diff` a 
`setname-expects` directory against the output of a test in order to check 
for differences.

Test results consist of Tidy's HTML output and Tidy's warning/error output.

Each set of results consists of directories within the `cases/` directory.

- `setname-results/` contains Tidy's HTML and warning/error output.
  - Files in the format `case-nnn.html` are the HTML file generated by Tidy.
  - Files in the format `case-nnn.txt` are the warning/error output from Tidy.

### Example

~~~
cases/
   testbase-results/
      case-427821.html
      case-427821.txt
~~~
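
Because the results mirror the layout of the expectations, a comparison can be 
as simple as walking one directory and checking each file against its 
counterpart. The Ruby sketch below is one hypothetical way to do that. The 
`compare_results` name is illustrative, byte-for-byte comparison is an 
assumption (the real tool may normalize line endings or other details), and 
the sketch assumes that result files carry the same names as the expected 
files.

```ruby
# Hypothetical comparison helper; not taken from test.rb.
# Returns the sorted names of expected files that are missing from,
# or differ in, the results directory.
def compare_results(expects_dir, results_dir)
  Dir.children(expects_dir).sort.reject do |name|
    expected = File.join(expects_dir, name)
    actual = File.join(results_dir, name)
    next true unless File.file?(expected) # ignore subdirectories

    # A file passes only if its counterpart exists and is byte-identical.
    File.file?(actual) && File.binread(expected) == File.binread(actual)
  end
end

# Only run the demonstration when the example directories are present.
if Dir.exist?('cases/testbase-expects') && Dir.exist?('cases/testbase-results')
  failures = compare_results('cases/testbase-expects', 'cases/testbase-results')
  puts failures.empty? ? 'all cases match' : "differences: #{failures.join(', ')}"
end
```

For a human-readable view of what changed, running `diff` on the two 
directories remains the more informative option; a helper like this is mainly 
useful for a quick pass/fail summary.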