Reference Tests

The referencetest module provides support for unit tests, allowing them to easily compare test results against saved “known to be correct” reference results.

This is typically useful for testing software that produces any of the following types of output:

  • a CSV file
  • a text file (for example: HTML, JSON, logfiles, graphs, tables, etc.)
  • a string
  • a Pandas DataFrame.

The main features are:

  • If the comparison between a string and a file fails, the actual string is written to a file and a diff command is suggested for seeing the differences between the actual output and the expected output.

  • There is support for CSV files, allowing fine control over how the comparison is to be performed. This includes:

    • the ability to select which columns to compare (and which to exclude from the comparison).
    • the ability to compare metadata (types of fields) as well as values.
    • the ability to specify the precision (as number of decimal places) for the comparison of floating-point values.
    • clear reporting of where the differences are, if the comparison fails.
  • There is support for ignoring lines within the strings/files that contain particular patterns or regular expressions. This is typically useful for filtering out things like version numbers and timestamps that vary in the output from run to run, but which do not indicate a problem.

  • There is support for re-writing the reference output with the actual output. This, obviously, should be used only after careful checking that the new output is correct, either because the previous output was in fact wrong, or because the intended behaviour has changed.

  • It allows you to group your reference results into different kinds. This means you can keep different kinds of reference result files in different locations. It also means that you can selectively choose to only regenerate particular kinds of reference results, if they need to be updated because they turned out to have been wrong or if the intended behaviour has changed. Kinds are strings.
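
As an illustration of the first feature above (saving the actual string and suggesting a diff command when a comparison fails), here is a minimal sketch using only the standard library. The function name check_string_against_file and the 'actual-' filename prefix are hypothetical, chosen for this example; this is not the module's own implementation.

```python
import os
import tempfile


def check_string_against_file(actual, ref_path, tmp_dir=None):
    """Return (ok, message); on mismatch, save `actual` and suggest a diff.

    Hypothetical helper for illustration only.
    """
    with open(ref_path, encoding='utf-8') as f:
        expected = f.read()
    if actual == expected:
        return True, ''
    # Write the actual output to a temporary area so it can be inspected.
    tmp_dir = tmp_dir or tempfile.gettempdir()
    actual_path = os.path.join(tmp_dir, 'actual-' + os.path.basename(ref_path))
    with open(actual_path, 'w', encoding='utf-8') as f:
        f.write(actual)
    # Suggest a command the user can run to see the differences.
    return False, 'Compare with:\n    diff %s %s' % (actual_path, ref_path)
```

The returned message is only a suggestion for the user to run; the sketch itself does not invoke diff.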

Prerequisites

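The module uses pandas (needed for CSV file and DataFrame support) and pytest (needed only for tests based on the pytest framework). These can be installed with: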

pip install pandas
pip install pytest

The module provides interfaces for this to be called from unit-tests based on either the standard Python unittest framework, or on pytest.

Simple Examples

Simple unittest example:

For use with unittest, the ReferenceTest API is provided through the ReferenceTestCase class. This is an extension to the standard unittest.TestCase class, so that the ReferenceTest methods can be called directly from unittest tests.

This example shows how to write a test for a function that generates a CSV file:

from tdda.referencetest import ReferenceTestCase, tag
import my_module

class MyTest(ReferenceTestCase):
    @tag
    def test_my_csv_file(self):
        result = my_module.produce_a_csv_file(self.tmp_dir)
        self.assertCSVFileCorrect(result, 'result.csv')

MyTest.set_default_data_location('testdata')

if __name__ == '__main__':
    ReferenceTestCase.main()

To run the test:

python mytest.py

The test is tagged with @tag, meaning that it will be included if you run the tests with the --tagged option flag to specify that only tagged tests should be run:

python mytest.py --tagged

The first time you run the test, it will produce an error unless you have already created the expected (“reference”) results. You can create the reference results automatically:

python mytest.py --write-all

Having generated the reference results, you should carefully examine the files it has produced in the data output location, to check that they are as expected.

Simple pytest example:

For use with pytest, the ReferenceTest API is provided through the referencepytest module. This is a module that can be imported directly from pytest tests, allowing them to access ReferenceTest methods and properties.

This example shows how to write a test for a function that generates a CSV file:

from tdda.referencetest import referencepytest, tag
import my_module

@tag
def test_my_csv_function(ref):
    resultfile = my_module.produce_a_csv_file(ref.tmp_dir)
    ref.assertCSVFileCorrect(resultfile, 'result.csv')

referencepytest.set_default_data_location('testdata')

You also need a conftest.py file, to define the fixtures and defaults:

import pytest
from tdda.referencetest import referencepytest

def pytest_addoption(parser):
    referencepytest.addoption(parser)

def pytest_collection_modifyitems(session, config, items):
    referencepytest.tagged(config, items)

@pytest.fixture(scope='module')
def ref(request):
    return referencepytest.ref(request)

referencepytest.set_default_data_location('testdata')

To run the test:

pytest

The test is tagged with @tag, meaning that it will be included if you run the tests with the --tagged option flag to specify that only tagged tests should be run:

pytest --tagged

The first time you run the test, it will produce an error unless you have already created the expected (“reference”) results. You can create the reference results automatically:

pytest --write-all -s

Having generated the reference results, you should examine the files it has produced in the data output location, to check that they are as expected.

Methods and Functions

class tdda.referencetest.referencetest.ReferenceTest(assert_fn)

The ReferenceTest class provides support for comparing results against a set of reference “known to be correct” results.

The functionality provided by this class can be used with:

  • the standard Python unittest framework, using the ReferenceTestCase class. This is a subclass of, and therefore a drop-in replacement for, unittest.TestCase. It extends that class with all of the methods from the ReferenceTest class.
  • the pytest framework, using the referencepytest module. This module provides all of the methods from the ReferenceTest class, exposed as functions that can be called directly from tests in a pytest suite.

In addition to the various test-assertion methods, the module also provides some useful instance variables. All of these can be set explicitly in test setup code, using the set_defaults() class method:

tmp_dir
The location where temporary files can be written to. It defaults to a system-specific temporary area.
print_fn
The function to use to display information while running tests, which should have the same signature as Python3’s standard print function (the __future__ print function in Python2).
verbose
Boolean verbose flag, to control reporting of errors while running tests. Reference tests tend to take longer to run than traditional unit tests, so it is often useful to be able to see information from failing tests as they happen, rather than waiting for the full report at the end.
all_fields_except(exclusions)

Helper function, for use with the check_data, check_types and check_order parameters to assertion functions for Pandas DataFrames. It returns the names of all of the fields in the DataFrame being checked, apart from the ones given.

exclusions is a list of field names.
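
The idea behind this helper can be sketched as a function that builds a field selector. The name fields_except is hypothetical and the code below is an assumption about the general approach, not the library's implementation. It works with any object exposing a columns attribute, as a Pandas DataFrame does; a SimpleNamespace stands in for a DataFrame here.

```python
from types import SimpleNamespace


def fields_except(exclusions):
    """Build a selector listing every column of `df` not in `exclusions`.

    Hypothetical helper for illustration only.
    """
    excluded = set(exclusions)

    def selector(df):
        # Preserve the original column order.
        return [name for name in df.columns if name not in excluded]

    return selector


# Stand-in for a DataFrame: anything with a `columns` attribute will do here.
frame = SimpleNamespace(columns=['id', 'name', 'created_at', 'score'])
select = fields_except(['created_at'])
print(select(frame))  # ['id', 'name', 'score']
```

A selector of this shape matches the "function taking a DataFrame as its single parameter, and returning a list of field names" form accepted by the check_data family of parameters.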

assertBinaryFileCorrect(actual_path, ref_path, kind=None)

Check that a binary file matches the contents from a reference binary file.

actual_path:
A path for a binary file.
ref_path:
The name of the reference binary file. The location of the reference file is determined by the configuration via set_data_location().
kind:
The reference kind, used to locate the reference file.
assertCSVFileCorrect(actual_path, ref_csv, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, **kwargs)

Check that a CSV file matches a reference one.

actual_path:
Actual CSV file.
ref_csv:
Name of reference CSV file. The location of the reference file is determined by the configuration via set_data_location().
kind:
(Optional) reference kind (a string; see above), used to locate the reference CSV file.
csv_read_fn:

(Optional) function to read a CSV file to obtain a pandas DataFrame. If None, then a default CSV loader is used.

The default CSV loader function is a wrapper around Pandas pd.read_csv(), with default options as follows:

  • index_col is None
  • infer_datetime_format is True
  • quotechar is "
  • quoting is csv.QUOTE_MINIMAL
  • escapechar is \ (backslash)
  • na_values are the empty string, "NaN", and "NULL"
  • keep_default_na is False
check_data:

(Optional) restriction of fields whose values should be compared. Possible values are:

  • None or True (to apply the comparison to all fields; this is the default).
  • False (to skip the comparison completely)
  • a list of field names (to check only these fields)
  • a function taking a DataFrame as its single parameter, and returning a list of field names to check.
check_types:
(Optional) restriction of fields whose types should be compared. See check_data (above) for possible values.
check_order:
(Optional) restriction of fields whose (relative) order should be compared. See check_data (above) for possible values.
check_extra_cols:
(Optional) restriction of extra fields in the actual dataset which, if found, will cause the check to fail. See check_data (above) for possible values.
sortby:

(Optional) specification of fields to sort by before comparing.

  • None or False (do not sort; this is the default)
  • True (to sort on all fields based on their order in the reference datasets; you probably don’t want to use this option)
  • a list of field names (to sort on these fields, in order)
  • a function taking a DataFrame (which will be the reference data frame) as its single parameter, and returning a list of field names to sort on.
condition:
(Optional) filter to be applied to datasets before comparing. It can be None, or can be a function that takes a DataFrame as its single parameter and returns a vector of booleans (to specify which rows should be compared).
precision:
(Optional) number of decimal places to use for floating-point comparisons. Default is not to perform rounding.
**kwargs:
Any additional named parameters are passed straight through to the csv_read_fn function.

Raises NotImplementedError if Pandas is not available.
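
To make the precision parameter concrete, here is a minimal sketch of comparing two floating-point values to a given number of decimal places by rounding both sides. The function floats_match is hypothetical, and the exact comparison rule used by the module may differ.

```python
def floats_match(actual, expected, precision=None):
    """Compare two floats, optionally rounding both to `precision` places.

    Hypothetical helper for illustration only.
    """
    if precision is None:
        # No rounding: require exact equality.
        return actual == expected
    return round(actual, precision) == round(expected, precision)


print(floats_match(0.1 + 0.2, 0.3))               # False
print(floats_match(0.1 + 0.2, 0.3, precision=6))  # True
```

One consequence of a rounding-based rule is that two values very close together can still round to different results if they fall either side of a rounding boundary.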

assertCSVFilesCorrect(actual_paths, ref_csvs, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, **kwargs)

Check that a set of CSV files match corresponding reference ones.

actual_paths:
List of actual CSV files.
ref_csvs:
List of names of matching reference CSV files. The location of the reference files is determined by the configuration via set_data_location().
kind:
(Optional) reference kind (a string; see above), used to locate the reference CSV file.
csv_read_fn:

(Optional) function to read a CSV file to obtain a pandas DataFrame. If None, then a default CSV loader is used.

The default CSV loader function is a wrapper around Pandas pd.read_csv(), with default options as follows:

  • index_col is None
  • infer_datetime_format is True
  • quotechar is "
  • quoting is csv.QUOTE_MINIMAL
  • escapechar is \ (backslash)
  • na_values are the empty string, "NaN", and "NULL"
  • keep_default_na is False
check_data:

(Optional) restriction of fields whose values should be compared. Possible values are:

  • None or True (to apply the comparison to all fields; this is the default).
  • False (to skip the comparison completely)
  • a list of field names (to check only these fields)
  • a function taking a DataFrame as its single parameter, and returning a list of field names to check.
check_types:
(Optional) restriction of fields whose types should be compared. See check_data (above) for possible values.
check_order:
(Optional) restriction of fields whose (relative) order should be compared. See check_data (above) for possible values.
check_extra_cols:
(Optional) restriction of extra fields in the actual dataset which, if found, will cause the check to fail. See check_data (above) for possible values.
sortby:

(Optional) specification of fields to sort by before comparing.

  • None or False (do not sort; this is the default)
  • True (to sort on all fields based on their order in the reference datasets; you probably don’t want to use this option)
  • a list of field names (to sort on these fields, in order)
  • a function taking a DataFrame (which will be the reference data frame) as its single parameter, and returning a list of field names to sort on.
condition:
(Optional) filter to be applied to datasets before comparing. It can be None, or can be a function that takes a DataFrame as its single parameter and returns a vector of booleans (to specify which rows should be compared).
precision:
(Optional) number of decimal places to use for floating-point comparisons. Default is not to perform rounding.
**kwargs:
Any additional named parameters are passed straight through to the csv_read_fn function.

Raises NotImplementedError if Pandas is not available.

assertDataFrameCorrect(df, ref_csv, actual_path=None, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, **kwargs)

Check that an in-memory Pandas DataFrame matches a reference one from a saved reference CSV file.

df:
Actual DataFrame.
ref_csv:
Name of reference CSV file. The location of the reference file is determined by the configuration via set_data_location().
actual_path:
Optional parameter, giving path for file where actual DataFrame originated, used for error messages.
kind:
(Optional) reference kind (a string; see above), used to locate the reference CSV file.
csv_read_fn:

(Optional) function to read a CSV file to obtain a pandas DataFrame. If None, then a default CSV loader is used.

The default CSV loader function is a wrapper around Pandas pd.read_csv(), with default options as follows:

  • index_col is None
  • infer_datetime_format is True
  • quotechar is "
  • quoting is csv.QUOTE_MINIMAL
  • escapechar is \ (backslash)
  • na_values are the empty string, "NaN", and "NULL"
  • keep_default_na is False
check_data:

(Optional) restriction of fields whose values should be compared. Possible values are:

  • None or True (to apply the comparison to all fields; this is the default).
  • False (to skip the comparison completely)
  • a list of field names (to check only these fields)
  • a function taking a DataFrame as its single parameter, and returning a list of field names to check.
check_types:
(Optional) restriction of fields whose types should be compared. See check_data (above) for possible values.
check_order:
(Optional) restriction of fields whose (relative) order should be compared. See check_data (above) for possible values.
check_extra_cols:
(Optional) restriction of extra fields in the actual dataset which, if found, will cause the check to fail. See check_data (above) for possible values.
sortby:

(Optional) specification of fields to sort by before comparing.

  • None or False (do not sort; this is the default)
  • True (to sort on all fields based on their order in the reference datasets; you probably don’t want to use this option)
  • a list of field names (to sort on these fields, in order)
  • a function taking a DataFrame (which will be the reference data frame) as its single parameter, and returning a list of field names to sort on.
condition:
(Optional) filter to be applied to datasets before comparing. It can be None, or can be a function that takes a DataFrame as its single parameter and returns a vector of booleans (to specify which rows should be compared).
precision:
(Optional) number of decimal places to use for floating-point comparisons. Default is not to perform rounding.

Raises NotImplementedError if Pandas is not available.

assertDataFramesEqual(df, ref_df, actual_path=None, expected_path=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None)

Check that an in-memory Pandas DataFrame matches an in-memory reference one.

df:
Actual DataFrame.
ref_df:
Expected DataFrame.
actual_path:
(Optional) path for file where actual DataFrame originated, used for error messages.
expected_path:
(Optional) path for file where expected DataFrame originated, used for error messages.
check_data:

(Optional) restriction of fields whose values should be compared. Possible values are:

  • None or True (to apply the comparison to all fields; this is the default).
  • False (to skip the comparison completely)
  • a list of field names (to check only these fields)
  • a function taking a DataFrame as its single parameter, and returning a list of field names to check.
check_types:
(Optional) restriction of fields whose types should be compared. See check_data (above) for possible values.
check_order:
(Optional) restriction of fields whose (relative) order should be compared. See check_data (above) for possible values.
check_extra_cols:
(Optional) restriction of extra fields in the actual dataset which, if found, will cause the check to fail. See check_data (above) for possible values.
sortby:

(Optional) specification of fields to sort by before comparing.

  • None or False (do not sort; this is the default)
  • True (to sort on all fields based on their order in the reference datasets; you probably don’t want to use this option)
  • a list of field names (to sort on these fields, in order)
  • a function taking a DataFrame (which will be the reference data frame) as its single parameter, and returning a list of field names to sort on.
condition:
(Optional) filter to be applied to datasets before comparing. It can be None, or can be a function that takes a DataFrame as its single parameter and returns a vector of booleans (to specify which rows should be compared).
precision:
(Optional) number of decimal places to use for floating-point comparisons. Default is not to perform rounding.

Raises NotImplementedError if Pandas is not available.
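
The values accepted by check_data, check_types and check_order can be thought of as being resolved to an explicit list of field names before the comparison. The sketch below shows one way to do that; resolve_fields is a hypothetical name, and a SimpleNamespace with a columns attribute stands in for a DataFrame.

```python
from types import SimpleNamespace


def resolve_fields(spec, df):
    """Turn a check_data-style value into an explicit list of field names.

    Hypothetical helper for illustration only.
    """
    if spec is None or spec is True:
        return list(df.columns)   # compare every field (the default)
    if spec is False:
        return []                 # skip the comparison completely
    if callable(spec):
        return list(spec(df))     # let the function choose the fields
    return list(spec)             # an explicit list of field names


frame = SimpleNamespace(columns=['id', 'name', 'updated'])
print(resolve_fields(None, frame))                       # ['id', 'name', 'updated']
print(resolve_fields(['id'], frame))                     # ['id']
print(resolve_fields(lambda df: df.columns[:2], frame))  # ['id', 'name']
```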

assertFileCorrect(actual_path, ref_path, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0)

Check that a text file matches the contents from a reference text file.

actual_path:
A path for a text file.
ref_path:
The name of the reference file. The location of the reference file is determined by the configuration via set_data_location().
kind:
The reference kind, used to locate the reference file.
lstrip:
If set to True, lines are left-stripped before the comparison is carried out.
rstrip:
If set to True, lines are right-stripped before the comparison is carried out.
ignore_substrings:
An optional list of substrings; lines containing any of these substrings will be ignored in the comparison.
ignore_patterns:
An optional list of regular expressions; lines will be considered to be the same if they only differ in substrings that match one of these regular expressions. The expressions must not contain parenthesised groups, and should only include explicit anchors if they need to refer to the whole line.
remove_lines:
An optional list of substrings; lines containing any of these substrings will be completely removed before carrying out the comparison. This is the means by which you would exclude ‘optional’ content.
preprocess:
An optional function that takes a list of strings and preprocesses it in some way; this function will be applied to both the actual and expected.
max_permutation_cases:
An optional number specifying the maximum number of permutations allowed; if the actual and expected lists differ only in that their lines are permutations of each other, and the number of such permutations does not exceed this limit, then the two are considered to be identical.

This should be used for unstructured data such as logfiles, etc. For CSV files, use assertCSVFileCorrect() instead.

The ignore_lines parameter exists for backwards compatibility as an alias for remove_lines.
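
The effect described for ignore_patterns (lines count as the same if they differ only in substrings matching one of the regular expressions) can be approximated by masking every match in both lines and then comparing what is left. This is a sketch under that assumption; lines_equivalent and the '<IGNORED>' marker are hypothetical and the module's own algorithm may differ.

```python
import re


def lines_equivalent(actual, expected, ignore_patterns):
    """True if the two lines differ only inside text matching a pattern.

    Hypothetical helper for illustration only.
    """
    for pattern in ignore_patterns:
        # Mask each match with a fixed marker in both lines.
        actual = re.sub(pattern, '<IGNORED>', actual)
        expected = re.sub(pattern, '<IGNORED>', expected)
    return actual == expected


version = r'\d+\.\d+\.\d+'
print(lines_equivalent('Version 1.2.3 built', 'Version 1.4.0 built', [version]))     # True
print(lines_equivalent('Version 1.2.3 built', 'Version 1.4.0 released', [version]))  # False
```

Note that the pattern here has no parenthesised groups and no anchors, in line with the restrictions stated for ignore_patterns.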

assertFilesCorrect(actual_paths, ref_paths, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0)

Check that a collection of text files matches the contents from a matching collection of reference text files.

actual_paths:
A list of paths for text files.
ref_paths:
A list of names of the matching reference files. The location of the reference files is determined by the configuration via set_data_location().
kind:
The reference kind, used to locate the reference files. All the files must be of the same kind.
lstrip:
If set to True, lines are left-stripped before the comparison is carried out.
rstrip:
If set to True, lines are right-stripped before the comparison is carried out.
ignore_substrings:
An optional list of substrings; lines containing any of these substrings will be ignored in the comparison.
ignore_patterns:
An optional list of regular expressions; lines will be considered to be the same if they only differ in substrings that match one of these regular expressions. The expressions must not contain parenthesised groups, and should only include explicit anchors if they need to refer to the whole line.
remove_lines:
An optional list of substrings; lines containing any of these substrings will be completely removed before carrying out the comparison. This is the means by which you would exclude ‘optional’ content.
preprocess:
An optional function that takes a list of strings and preprocesses it in some way; this function will be applied to both the actual and expected.
max_permutation_cases:
An optional number specifying the maximum number of permutations allowed; if the actual and expected lists differ only in that their lines are permutations of each other, and the number of such permutations does not exceed this limit, then the two are considered to be identical.

This should be used for unstructured data such as logfiles, etc. For CSV files, use assertCSVFileCorrect() instead.

The ignore_lines parameter exists for backwards compatibility as an alias for remove_lines.

assertStringCorrect(string, ref_path, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0)

Check that an in-memory string matches the contents from a reference text file.

string:
The actual string.
ref_path:
The name of the reference file. The location of the reference file is determined by the configuration via set_data_location().
kind:
The reference kind, used to locate the reference file.
lstrip:
If set to True, both strings are left-stripped before the comparison is carried out. Note: the stripping is on a per-line basis.
rstrip:
If set to True, both strings are right-stripped before the comparison is carried out. Note: the stripping is on a per-line basis.
ignore_substrings:
An optional list of substrings; lines containing any of these substrings will be ignored in the comparison.
ignore_patterns:
An optional list of regular expressions; lines will be considered to be the same if they only differ in substrings that match one of these regular expressions. The expressions must not contain parenthesised groups, and should only include explicit anchors if they need to refer to the whole line.
remove_lines:
An optional list of substrings; lines containing any of these substrings will be completely removed before carrying out the comparison. This is the means by which you would exclude ‘optional’ content.
preprocess:
An optional function that takes a list of strings and preprocesses it in some way; this function will be applied to both the actual and expected.
max_permutation_cases:
An optional number specifying the maximum number of permutations allowed; if the actual and expected lists differ only in that their lines are permutations of each other, and the number of such permutations does not exceed this limit, then the two are considered to be identical.

The ignore_lines parameter exists for backwards compatibility as an alias for remove_lines.
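
The per-line behaviour of lstrip, rstrip and remove_lines can be pictured as a preparation step applied to both the actual string and the reference text before they are compared. The following is a sketch of that idea only; prepare_lines is a hypothetical name and the order of operations inside the module is an assumption.

```python
def prepare_lines(text, lstrip=False, rstrip=False, remove_lines=None):
    """Split `text` into lines, dropping and stripping lines as requested.

    Hypothetical helper for illustration only.
    """
    remove_lines = remove_lines or []
    prepared = []
    for line in text.splitlines():
        # Drop any line containing one of the given substrings.
        if any(substring in line for substring in remove_lines):
            continue
        # Stripping is applied to each line, not to the string as a whole.
        if lstrip:
            line = line.lstrip()
        if rstrip:
            line = line.rstrip()
        prepared.append(line)
    return prepared


text = '  total: 3  \nDEBUG timing 0.2s\n  done'
print(prepare_lines(text, lstrip=True, rstrip=True, remove_lines=['DEBUG']))
# ['total: 3', 'done']
```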

assertTextFileCorrect(actual_path, ref_path, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0)

Check that a text file matches the contents from a reference text file.

actual_path:
A path for a text file.
ref_path:
The name of the reference file. The location of the reference file is determined by the configuration via set_data_location().
kind:
The reference kind, used to locate the reference file.
lstrip:
If set to True, lines are left-stripped before the comparison is carried out.
rstrip:
If set to True, lines are right-stripped before the comparison is carried out.
ignore_substrings:
An optional list of substrings; lines containing any of these substrings will be ignored in the comparison.
ignore_patterns:
An optional list of regular expressions; lines will be considered to be the same if they only differ in substrings that match one of these regular expressions. The expressions must not contain parenthesised groups, and should only include explicit anchors if they need to refer to the whole line.
remove_lines:
An optional list of substrings; lines containing any of these substrings will be completely removed before carrying out the comparison. This is the means by which you would exclude ‘optional’ content.
preprocess:
An optional function that takes a list of strings and preprocesses it in some way; this function will be applied to both the actual and expected.
max_permutation_cases:
An optional number specifying the maximum number of permutations allowed; if the actual and expected lists differ only in that their lines are permutations of each other, and the number of such permutations does not exceed this limit, then the two are considered to be identical.

This should be used for unstructured data such as logfiles, etc. For CSV files, use assertCSVFileCorrect() instead.

The ignore_lines parameter exists for backwards compatibility as an alias for remove_lines.

assertTextFilesCorrect(actual_paths, ref_paths, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0)

Check that a collection of text files matches the contents from a matching collection of reference text files.

actual_paths:
A list of paths for text files.
ref_paths:
A list of names of the matching reference files. The location of the reference files is determined by the configuration via set_data_location().
kind:
The reference kind, used to locate the reference files. All the files must be of the same kind.
lstrip:
If set to True, lines are left-stripped before the comparison is carried out.
rstrip:
If set to True, lines are right-stripped before the comparison is carried out.
ignore_substrings:
An optional list of substrings; lines containing any of these substrings will be ignored in the comparison.
ignore_patterns:
An optional list of regular expressions; lines will be considered to be the same if they only differ in substrings that match one of these regular expressions. The expressions must not contain parenthesised groups, and should only include explicit anchors if they need to refer to the whole line.
remove_lines:
An optional list of substrings; lines containing any of these substrings will be completely removed before carrying out the comparison. This is the means by which you would exclude ‘optional’ content.
preprocess:
An optional function that takes a list of strings and preprocesses it in some way; this function will be applied to both the actual and expected.
max_permutation_cases:
An optional number specifying the maximum number of permutations allowed; if the actual and expected lists differ only in that their lines are permutations of each other, and the number of such permutations does not exceed this limit, then the two are considered to be identical.

This should be used for unstructured data such as logfiles, etc. For CSV files, use assertCSVFileCorrect() instead.

The ignore_lines parameter exists for backwards compatibility as an alias for remove_lines.

set_data_location(location, kind=None)

Declare the filesystem location for reference files of a particular kind. Typically you would subclass ReferenceTestCase and pass in these locations through its __init__ method when constructing an instance of ReferenceTestCase as a superclass.

If calls to assertTextFileCorrect() (etc) are made for kinds of reference data that haven’t had their location defined explicitly, then the default location is used. This is the location declared for the None kind and this default must be specified.

This method overrides any global defaults set from calls to the ReferenceTest.set_default_data_location() class-method.

If you haven’t even defined the None default, and you make calls to assertTextFileCorrect() (etc) using relative pathnames for the reference data files, then it can’t check correctness, so it will raise an exception.

classmethod set_default_data_location(location, kind=None)

Declare the default filesystem location for reference files of a particular kind. This sets the location for all instances of the class it is called on. Subclasses will inherit this default (unless they explicitly override it).

To set the location globally for all tests in all classes within an application, call this method on the ReferenceTest class.

The instance method set_data_location() can be used to set the per-kind data locations for an individual instance of a class.

If calls to assertTextFileCorrect() (etc) are made for kinds of reference data that haven’t had their location defined explicitly, then the default location is used. This is the location declared for the None kind and this default must be specified.

If you haven’t even defined the None default, and you make calls to assertTextFileCorrect() (etc) using relative pathnames for the reference data files, then it can’t check correctness, so it will raise an exception.
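
The lookup rules described above (a per-kind location, falling back to the location for the None kind, and an exception when a relative path cannot be resolved) can be sketched with a small registry. The class DataLocations and the choice of LookupError are assumptions made for this example; the module's own classes and exception type may differ.

```python
import os


class DataLocations:
    """Map reference kinds to directories, with None as the default kind.

    Hypothetical class for illustration only.
    """

    def __init__(self):
        self._by_kind = {}

    def set_location(self, location, kind=None):
        self._by_kind[kind] = location

    def resolve(self, ref_path, kind=None):
        # Absolute paths need no configured location.
        if os.path.isabs(ref_path):
            return ref_path
        # Use the kind's own location, else fall back to the None default.
        location = self._by_kind.get(kind, self._by_kind.get(None))
        if location is None:
            raise LookupError('No data location defined for kind %r' % (kind,))
        return os.path.join(location, ref_path)


locations = DataLocations()
locations.set_location('testdata')              # default (None kind)
locations.set_location('graphs', kind='graph')  # specific kind
print(locations.resolve('result.csv', kind='csv'))
print(locations.resolve('graph.txt', kind='graph'))
```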

classmethod set_defaults(**kwargs)

Set default parameters, at the class level. These defaults will apply to all instances of the class.

The following parameters can be set:

verbose:
Sets the boolean verbose flag globally, to control reporting of errors while running tests. Reference tests tend to take longer to run than traditional unit tests, so it is often useful to be able to see information from failing tests as they happen, rather than waiting for the full report at the end. Verbose is set to True by default.
print_fn:
Sets the print function globally, to specify the function to use to display information while running tests. The function should have the same signature as Python3’s standard print function (the __future__ print function in Python2). By default, a print function is used that writes unbuffered to sys.stdout.
tmp_dir:
Sets the tmp_dir property globally, to specify the directory where temporary files are written. Temporary files are created whenever a text file check fails and a ‘preprocess’ function has been specified. It’s useful to be able to see the contents of the files after preprocessing has taken place, so preprocessed versions of the files are written to this directory, and their pathnames are included in the failure messages. If not explicitly set by set_defaults(), the environment variable TDDA_FAIL_DIR is used, or, if that is not defined, it defaults to /tmp, c:\temp or whatever tempfile.gettempdir() returns, as appropriate.
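
The order of precedence described for tmp_dir (an explicit setting, then the TDDA_FAIL_DIR environment variable, then the system temporary directory) can be sketched as follows. The function default_tmp_dir and its environ parameter are hypothetical, introduced only so that the example can be run without changing the real environment.

```python
import os
import tempfile


def default_tmp_dir(explicit=None, environ=None):
    """Pick the temporary directory using the precedence described above.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    # Explicit setting first, then TDDA_FAIL_DIR, then the system default.
    return explicit or environ.get('TDDA_FAIL_DIR') or tempfile.gettempdir()


print(default_tmp_dir('/my/tmp', {}))                         # /my/tmp
print(default_tmp_dir(None, {'TDDA_FAIL_DIR': '/failures'}))  # /failures
```
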
classmethod set_regeneration(kind=None, regenerate=True)

Set the regeneration flag for a particular kind of reference file, globally, for all instances of the class.

If the regenerate flag is set to True, then the framework will regenerate reference data of that kind, rather than comparing.

All of the regeneration flags are set to False by default.
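
The effect of a regeneration flag can be sketched as a switch between two code paths: rewrite the reference file with the actual output, or compare against it. The names RegenerationFlags and check_or_regenerate are hypothetical, and the simple per-kind dictionary is an assumption rather than a description of the module's internals.

```python
import os
import tempfile


class RegenerationFlags:
    """Per-kind regeneration flags, all False unless set.

    Hypothetical class for illustration only.
    """

    def __init__(self):
        self._flags = {}

    def set_regeneration(self, kind=None, regenerate=True):
        self._flags[kind] = regenerate

    def should_regenerate(self, kind=None):
        return self._flags.get(kind, False)


def check_or_regenerate(actual, ref_path, flags, kind=None):
    """Rewrite the reference file if flagged; otherwise compare with it."""
    if flags.should_regenerate(kind):
        with open(ref_path, 'w', encoding='utf-8') as f:
            f.write(actual)
        return 'regenerated'
    with open(ref_path, encoding='utf-8') as f:
        expected = f.read()
    return 'match' if actual == expected else 'mismatch'
```

Because regeneration replaces the comparison entirely, a regenerated reference file should still be inspected by hand before it is trusted.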

tdda.referencetest.referencetest.tag(test)

Decorator for tests, so that you can specify you only want to run a tagged subset of tests, with the -1 or --tagged option.
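
One common way to implement this kind of tagging is for the decorator to set a marker attribute on the test, which a runner can then filter on. The sketch below illustrates that general technique; mark_tagged, select_tests and the _tagged attribute are hypothetical names, and this is not a description of how the library's tag decorator works internally.

```python
def mark_tagged(test):
    """Decorator that marks a test function (or class) as tagged.

    Hypothetical helper for illustration only.
    """
    test._tagged = True
    return test


def select_tests(tests, tagged_only=False):
    """Return all tests, or only the tagged ones if `tagged_only` is set."""
    if not tagged_only:
        return list(tests)
    return [t for t in tests if getattr(t, '_tagged', False)]


@mark_tagged
def test_being_developed():
    pass


def test_stable():
    pass


chosen = select_tests([test_being_developed, test_stable], tagged_only=True)
print([t.__name__ for t in chosen])  # ['test_being_developed']
```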

unittest Framework Support

This module provides the ReferenceTestCase class, which extends the standard unittest.TestCase test-case class, augmenting it with methods for checking correctness of files against reference data.

It also provides a main() function, which can be used to run (and regenerate) reference tests which have been implemented using subclasses of ReferenceTestCase.

For example:

from tdda.referencetest import ReferenceTestCase
import my_module

class TestMyClass(ReferenceTestCase):
    def test_my_csv_function(self):
        result = my_module.my_csv_function(self.tmp_dir)
        self.assertCSVFileCorrect(result, 'result.csv')

    def test_my_pandas_dataframe_function(self):
        result = my_module.my_dataframe_function()
        self.assertDataFrameCorrect(result, 'result.csv')

    def test_my_table_function(self):
        result = my_module.my_table_function()
        self.assertStringCorrect(result, 'table.txt', kind='table')

    def test_my_graph_function(self):
        result = my_module.my_graph_function()
        self.assertStringCorrect(result, 'graph.txt', kind='graph')

TestMyClass.set_default_data_location('testdata')

if __name__ == '__main__':
    ReferenceTestCase.main()

Tagged Tests

If the tests are run with the --tagged or -1 (the digit one) command-line option, then only tests that have been decorated with referencetest.tag are run. This is a mechanism for allowing only a chosen subset of tests to be run, which is useful during development. The @tag decorator can be applied to either test classes or test methods.

If the tests are run with the --istagged or -0 (the digit zero) command-line option, then no tests are run; instead, the framework reports the full module names of any test classes that have been decorated with @tag, or which contain any tests that have been decorated with @tag.

For example:

from tdda.referencetest import ReferenceTestCase, tag
import my_module

class TestMyClass1(ReferenceTestCase):
    @tag
    def test_a(self):
        ...

    def test_b(self):
        ...

@tag
class TestMyClass2(ReferenceTestCase):
    def test_x(self):
        ...

    def test_y(self):
        ...

If run with python mytests.py --tagged, only the tagged tests are run (TestMyClass1.test_a, TestMyClass2.test_x and TestMyClass2.test_y).
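
One way such filtering can work with unittest is sketched below: a decorator sets a marker attribute, and a TestLoader subclass keeps only marked methods (or every method of a marked class). This is an illustrative sketch, not the library's implementation; the _is_tagged attribute and the TaggedOnlyLoader name are hypothetical.

```python
import unittest


def tag(obj):
    """Mark a test method or test class (hypothetical marker attribute)."""
    obj._is_tagged = True
    return obj


class TaggedOnlyLoader(unittest.TestLoader):
    """Loader that drops test methods that are not tagged."""

    def getTestCaseNames(self, testCaseClass):
        names = super().getTestCaseNames(testCaseClass)
        if getattr(testCaseClass, '_is_tagged', False):
            return list(names)               # whole class tagged: keep everything
        return [name for name in names
                if getattr(getattr(testCaseClass, name), '_is_tagged', False)]


class TestMyClass1(unittest.TestCase):
    @tag
    def test_a(self):
        self.assertEqual('a' + 'a', 'aa')

    def test_b(self):
        self.assertEqual('b' * 2, 'bb')


@tag
class TestMyClass2(unittest.TestCase):
    def test_x(self):
        pass

    def test_y(self):
        pass


loader = TaggedOnlyLoader()
selected = {cls.__name__: loader.getTestCaseNames(cls)
            for cls in (TestMyClass1, TestMyClass2)}
```

Overriding getTestCaseNames() is enough for the other loadTestsFrom...() methods to see only the tagged subset, since they all discover methods through it.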

Regeneration of Results

When main() is run with --write-all or --write (or -W or -w respectively), the framework regenerates reference data files instead of comparing against them. Different kinds of reference results can be regenerated by passing in a comma-separated list of kind names immediately after the --write option. If no list of kind names is provided, then all test results will be regenerated.

To regenerate all reference results (or generate them for the first time):

python my_tests.py --write-all

To regenerate just a particular kind of reference (e.g. table results):

python my_tests.py --write table

To regenerate a number of different kinds of reference (e.g. both table and graph results):

python my_tests.py --write table graph
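
As a rough illustration of how such options might be separated from the arguments that unittest itself understands, the sketch below pulls the write flags and any kind names out of an argument list. The function split_write_options() is hypothetical, and accepting kind names separated by either spaces or commas is an assumption made for the sketch, not a statement about the library's parser.

```python
def split_write_options(argv):
    """Separate --write/--write-all (and -w/-W) from the remaining arguments.

    Returns (remaining_argv, kinds), where kinds is None (no regeneration),
    'all', or a set of kind names.
    """
    remaining, kinds = [], None
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ('--write-all', '-W'):
            kinds = 'all'
        elif arg in ('--write', '-w'):
            names = set()
            # Consume the following non-option arguments as kind names.
            while i + 1 < len(argv) and not argv[i + 1].startswith('-'):
                i += 1
                names.update(n for n in argv[i].split(',') if n)
            kinds = names or 'all'   # no names given: regenerate everything
        else:
            remaining.append(arg)
        i += 1
    return remaining, kinds


remaining, kinds = split_write_options(['my_tests.py', '--write', 'table', 'graph'])
```

The remaining arguments could then be handed on to unittest.main(), while the kinds would drive calls such as set_regeneration().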

unittest Integration Details

class tdda.referencetest.referencetestcase.ReferenceTestCase(*args, **kwargs)

Wrapper around the ReferenceTest class to allow it to operate as a test-case class using the unittest testing framework.

The ReferenceTestCase class is a mix-in of unittest.TestCase and ReferenceTest, so it can be used as the base class for unit tests, allowing the tests to use any of the standard unittest assert methods, and also use any of the referencetest assert extensions.
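
The mix-in arrangement can be illustrated with standard-library pieces alone. In this sketch, ReferenceAssertions stands in for ReferenceTest and supplies a single hypothetical assert method; the point is only that one test can call both kinds of assertion.

```python
import io
import unittest


class ReferenceAssertions:
    """Hypothetical mix-in supplying one extra assert method."""

    def assertStringMatches(self, actual, expected):
        if actual != expected:
            raise AssertionError('%r != reference %r' % (actual, expected))


class MixedTestCase(unittest.TestCase, ReferenceAssertions):
    def test_both_styles(self):
        self.assertEqual(1 + 1, 2)                    # standard unittest assert
        self.assertStringMatches('a,b\n', 'a,b\n')    # assert from the mix-in


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MixedTestCase)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```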

static main()

Wrapper around the unittest.main() entry point.

This is the same as the main() function, and is provided just as a convenience, as it means that tests using the ReferenceTestCase class only need to import that single class on its own.

tag()

Decorator for tests, so that you can specify you only want to run a tagged subset of tests, with the -1 or --tagged option.

class tdda.referencetest.referencetestcase.TaggedTestLoader(check, printer=None)

Subclass of TestLoader, which strips out any non-tagged tests.

getTestCaseNames(testCaseClass)

Return a sorted sequence of method names found within testCaseClass.

loadTestsFromModule(*args, **kwargs)

Return a suite of all test cases contained in the given module.

loadTestsFromName(*args, **kwargs)

Return a suite of all test cases given a string specifier.

The name may resolve either to a module, a test case class, a test method within a test case class, or a callable object which returns a TestCase or TestSuite instance.

The method optionally resolves the names relative to a given module.

loadTestsFromNames(*args, **kwargs)

Return a suite of all test cases found using the given sequence of string specifiers. See ‘loadTestsFromName()’.

loadTestsFromTestCase(*args, **kwargs)

Return a suite of all test cases contained in testCaseClass.

tdda.referencetest.referencetestcase.main()

Wrapper around the unittest.main() entry point.

pytest Framework Support

This provides all of the methods in the ReferenceTest class, in a way that allows them to be used as pytest fixtures.

This allows these functions to be called from tests running from the pytest framework.

For example:

import my_module

def test_my_csv_function(ref):
    resultfile = my_module.my_csv_function(ref.tmp_dir)
    ref.assertCSVFileCorrect(resultfile, 'result.csv')

def test_my_pandas_dataframe_function(ref):
    resultframe = my_module.my_dataframe_function()
    ref.assertDataFrameCorrect(resultframe, 'result.csv')

def test_my_table_function(ref):
    result = my_module.my_table_function()
    ref.assertStringCorrect(result, 'table.txt', kind='table')

def test_my_graph_function(ref):
    result = my_module.my_graph_function()
    ref.assertStringCorrect(result, 'graph.txt', kind='graph')

class TestMyClass:
    def test_my_other_table_function(self, ref):
        result = my_module.my_other_table_function()
        ref.assertStringCorrect(result, 'table.txt', kind='table')

with a conftest.py containing:

from tdda.referencetest.pytestconfig import (pytest_addoption,
                                             pytest_collection_modifyitems,
                                             set_default_data_location,
                                             ref)

set_default_data_location('testdata')

This configuration enables the additional command-line options, and also provides a ref fixture, as an instance of the ReferenceTest class. Of course, for brevity, if you prefer, you can use

from tdda.referencetest.pytestconfig import *

rather than importing the four individual items if you are not customising anything yourself, but that is less flexible.

This example also sets a default data location which will apply to all reference fixtures. This means that any tests that use ref will automatically be able to locate their “expected results” reference data files.

Reference Fixtures

The default configuration provides a single fixture, ref.

To configure a large suite of tests so that tests do not all have to share a single common reference-data location, you can set up additional reference fixtures, configured differently. For example, to set up a fixture ref_special, whose reference data is stored in ../specialdata, you could include:

import pytest
from tdda.referencetest import referencepytest

@pytest.fixture(scope='module')
def ref_special(request):
    r = referencepytest.ref(request)
    r.set_data_location('../specialdata')
    return r

Tests can use this additional fixture:

import my_special_module

def test_something(ref_special):
    result = my_special_module.something()
    ref_special.assertStringCorrect(result, 'something.csv')

Tagged Tests

If the tests are run with the --tagged command-line option, then only tests that have been decorated with referencetest.tag are run. This is a mechanism for allowing only a chosen subset of tests to be run, which is useful during development. The @tag decorator can be applied to test functions, test classes and test methods.

If the tests are run with the --istagged command-line option, then no tests are run; instead, the framework reports the full module names of any test classes or functions that have been decorated with @tag, or classes which contain any test methods that have been decorated with @tag.

For example:

from tdda.referencetest import tag

@tag
def test_a(ref):
    assert 'a' + 'a' == 'aa'

def test_b(ref):
    assert 'b' * 2 == 'bb'

@tag
class TestMyClass:
    def test_x(self):
        assert list('xxx') == ['x', 'x', 'x']

    def test_y(self):
        assert 'y'.upper() == 'Y'

If run with pytest --tagged, only the tagged tests are run (test_a, TestMyClass.test_x and TestMyClass.test_y).
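
The reporting rule for --istagged (tagged functions and classes, plus classes containing a tagged method) can be sketched without pytest. Everything here is illustrative: the _is_tagged marker and the tagged_names() helper are hypothetical and are not how the plugin is implemented.

```python
import inspect


def tag(obj):
    obj._is_tagged = True      # hypothetical marker attribute
    return obj


def tagged_names(candidates):
    """Return sorted 'module.name' strings for tagged functions and classes,
    and for classes that contain at least one tagged method."""
    names = set()
    for obj in candidates:
        tagged = getattr(obj, '_is_tagged', False)
        if not tagged and inspect.isclass(obj):
            tagged = any(getattr(member, '_is_tagged', False)
                         for member in vars(obj).values())
        if tagged:
            names.add('%s.%s' % (obj.__module__, obj.__name__))
    return sorted(names)


@tag
def test_a():
    assert 'a' + 'a' == 'aa'


def test_b():
    assert 'b' * 2 == 'bb'


class TestMyClass:
    @tag
    def test_x(self):
        assert list('xxx') == ['x', 'x', 'x']


class TestOther:
    def test_y(self):
        assert 'y'.upper() == 'Y'


report = tagged_names([test_a, test_b, TestMyClass, TestOther])
```

Here TestMyClass is reported because one of its methods is tagged, even though the class itself is not decorated.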

Regeneration of Results

When pytest is run with --write-all or --write, it causes the framework to regenerate reference data files. Different kinds of reference results can be regenerated by passing in a comma-separated list of kind names immediately after the --write option. If no list of kind names is provided, then all test results will be regenerated.

If the -s option is also provided (to disable pytest output capturing), it will report the names of all the files it has regenerated.

To regenerate all reference results (or generate them for the first time):

pytest -s --write-all

To regenerate just a particular kind of reference (e.g. table results):

pytest -s --write table

To regenerate a number of different kinds of reference (e.g. both table and graph results):

pytest -s --write table graph

pytest Integration Details

In addition to all of the methods from ReferenceTest, the following functions are provided, to allow easier integration with the pytest framework.

Typically your test code would not need to call any of these methods directly (apart from set_default_data_location()), as they are all enabled automatically if you import the default ReferenceTest configuration into your conftest.py file:

from tdda.referencetest.pytestconfig import *

tdda.referencetest.referencepytest.addoption(parser)

Support for the --write and --write-all command-line options.

A test’s conftest.py file should declare extra options by defining a pytest_addoption function which should just call this.

It extends pytest to include --write and --write-all option flags which can be used to control regeneration of reference results.

tdda.referencetest.referencepytest.ref(request)

Support for dependency injection via a pytest fixture.

A test’s conftest.py should define a fixture function for injecting a ReferenceTest instance, which should just call this function.

This allows tests to get access to a private instance of that class.

tdda.referencetest.referencepytest.set_default_data_location(location, kind=None)

This provides a mechanism for setting the default reference data location in the ReferenceTest class.

It takes the same parameters as tdda.referencetest.referencetest.ReferenceTest.set_default_data_location().

If you want the same data locations for all your tests, it can be easier to set them with calls to this function, rather than having to set them explicitly in each test (or using set_data_location() in your @pytest.fixture ref definition in your conftest.py file).

tdda.referencetest.referencepytest.set_defaults(**kwargs)

This provides a mechanism for setting default attributes in the ReferenceTest class.

It takes the same parameters as tdda.referencetest.referencetest.ReferenceTest.set_defaults(), and can be used for setting parameters such as the tmp_dir property.

If you want the same defaults for all your tests, it can be easier to set them with a call to this function, rather than having to set them explicitly in each test (or in your @pytest.fixture ref definition in your conftest.py file).
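
When tmp_dir is not set through set_defaults(), the fallback order described for ReferenceTest.set_defaults() applies: the TDDA_FAIL_DIR environment variable first, then the system temporary directory. A minimal sketch of that lookup order, using a hypothetical helper resolve_tmp_dir() rather than anything from the library:

```python
import os
import tempfile


def resolve_tmp_dir(explicit=None, environ=None):
    """Pick the directory for failure output: an explicit setting first,
    then the TDDA_FAIL_DIR environment variable, then the system default."""
    environ = os.environ if environ is None else environ
    return explicit or environ.get('TDDA_FAIL_DIR') or tempfile.gettempdir()


from_setting = resolve_tmp_dir('/data/failures', {'TDDA_FAIL_DIR': '/var/fail'})
from_environment = resolve_tmp_dir(None, {'TDDA_FAIL_DIR': '/var/fail'})
from_system = resolve_tmp_dir(None, {})
```

Passing the environment as a parameter keeps the sketch easy to test; real code would normally read os.environ directly.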

tdda.referencetest.referencepytest.tagged(config, items)

Support for @tag to mark tests to be run with --tagged or reported with --istagged.

It extends pytest to recognize the --tagged and --istagged command-line flags, to restrict testing to tagged tests only.

Examples

The tdda.referencetest module includes a set of examples, for both unittest and pytest.

To copy these examples to your own referencetest-examples subdirectory (or to a location of your choice), run the command:

tdda examples referencetest [mydirectory] 

Alternatively, you can copy all examples using the following command:

tdda examples

which will create three separate sub-directories.