Reference Tests

The referencetest module provides support for unit tests, allowing them to easily compare test results against saved "known to be correct" reference results.

This is typically useful for testing software that produces any of the following types of output:

  • a CSV file

  • a text file (for example: HTML, JSON, logfiles, graphs, tables, etc.)

  • a string

  • a Pandas or Polars DataFrame.

The main features are:

  • If the comparison between a string and a file fails, the actual string is written to a file and a diff command is suggested for seeing the differences between the actual output and the expected output.

  • There is support for CSV files, allowing fine control over how the comparison is to be performed. This includes:

    • the ability to select which columns to compare (and which to exclude from the comparison).

    • the ability to compare metadata (types of fields) as well as values.

    • the ability to specify the precision (as number of decimal places) for the comparison of floating-point values.

    • clear reporting of where the differences are, if the comparison fails.

  • There is support for ignoring lines within the strings/files that contain particular patterns or regular expressions. This is typically useful for filtering out things like version numbers and timestamps that vary in the output from run to run, but which do not indicate a problem.

  • There is support for re-writing the reference output with the actual output. This, obviously, should be used only after careful checking that the new output is correct, either because the previous output was in fact wrong, or because the intended behaviour has changed.

  • It allows you to group your reference results into different kinds, each identified by a string. This means you can keep different kinds of reference result files in different locations. It also means that you can regenerate only particular kinds of reference results when they need updating, either because they turned out to be wrong or because the intended behaviour has changed.
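The first feature in the list above (saving the actual output and suggesting a diff command on failure) can be pictured with the following minimal sketch. It is not the library's implementation: the helper name check_string_against_file, the "actual-" filename prefix and the wording of the suggested command are assumptions made purely for illustration.

```python
import os
import tempfile


def check_string_against_file(actual, ref_path, tmp_dir=None):
    """Compare a string with a reference file (illustrative sketch only).

    Returns (ok, message). On failure the actual string is saved to a
    file so that the two versions can be compared with a diff tool.
    """
    with open(ref_path, encoding='utf-8') as f:
        expected = f.read()
    if actual == expected:
        return True, ''
    # Hypothetical naming convention for the saved copy of the actual output.
    tmp_dir = tmp_dir or tempfile.gettempdir()
    actual_path = os.path.join(tmp_dir, 'actual-' + os.path.basename(ref_path))
    with open(actual_path, 'w', encoding='utf-8') as f:
        f.write(actual)
    return False, 'Compare with:\n    diff %s %s' % (actual_path, ref_path)
```

The point of the design is that a failing test leaves behind everything needed to inspect the difference, without rerunning the code under test.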

The module provides interfaces that can be called from unit tests based on either the standard Python unittest framework or pytest.

Examples

Example Using unittest:

For use with unittest, the ReferenceTest API is provided through the ReferenceTestCase class. This is an extension to the standard unittest.TestCase class, so that the ReferenceTest methods can be called directly from unittest tests.

This example shows how to write a test for a function that generates a CSV file:

from tdda.referencetest import ReferenceTestCase, tag
import my_module

class MyTest(ReferenceTestCase):
    @tag
    def test_my_csv_file(self):
        result = my_module.produce_a_csv_file(self.tmp_dir)
        self.assertCSVFileCorrect(result, 'result.csv')

MyTest.set_default_data_location('testdata')

if __name__ == '__main__':
    ReferenceTestCase.main()

To run the test:

python mytest.py

The test is decorated with @tag, so it is included when you use the --tagged option to run only tagged tests:

python mytest.py --tagged

The first time you run the test, it will produce an error unless you have already created the expected ("reference") results. You can create the reference results automatically:

python mytest.py --write-all

Having generated the reference results, you should carefully examine the files it has produced in the data output location, to check that they are as expected.

Example Using pytest:

For use with pytest, the ReferenceTest API is provided through the referencepytest module. This is a module that can be imported directly from pytest tests, allowing them to access ReferenceTest methods and properties.

This example shows how to write a test for a function that generates a CSV file:

from tdda.referencetest import referencepytest, tag
import my_module

@tag
def test_my_csv_function(ref):
    resultfile = my_module.produce_a_csv_file(ref.tmp_dir)
    ref.assertCSVFileCorrect(resultfile, 'result.csv')

referencepytest.set_default_data_location('testdata')

You also need a conftest.py file, to define the fixtures and defaults:

import pytest
from tdda.referencetest import referencepytest

def pytest_addoption(parser):
    referencepytest.addoption(parser)

def pytest_collection_modifyitems(session, config, items):
    referencepytest.tagged(config, items)

@pytest.fixture(scope='module')
def ref(request):
    return referencepytest.ref(request)

referencepytest.set_default_data_location('testdata')

To run the test:

pytest

The test is decorated with @tag, so it is included when you use the --tagged option to run only tagged tests:

pytest --tagged

The first time you run the test, it will produce an error unless you have already created the expected ("reference") results. You can create the reference results automatically:

pytest --write-all -s

Having generated the reference results, you should examine the files it has produced in the data output location, to check that they are as expected.

Methods and Functions

class tdda.referencetest.referencetest.ReferenceTest(assert_fn)

Provides support for comparing results against reference “known to be correct” results.

Can be used with:

  • the standard Python unittest framework, via the ReferenceTestCase class, which is a drop-in replacement for unittest.TestCase extended with all ReferenceTest methods.

  • the pytest framework, via the referencepytest module, which exposes all ReferenceTest methods as functions callable directly from pytest tests.

In addition to the assertion methods, the class provides useful instance variables that can be set via the set_defaults class method.

all_fields_except(exclusions)

Return all field names in the DataFrame except those specified.

Helper for use with the check_data, check_types and check_order parameters of the DataFrame assertion methods.

Parameters:

exclusions – A list of field names to exclude.
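Because check_data, check_types and check_order accept a function that takes a DataFrame and returns a list of field names, one way to picture this helper is as a closure over the exclusions. The sketch below is an assumption about the general shape, not the library's code; all_fields_except_sketch and FakeFrame are hypothetical names, and FakeFrame merely stands in for a real DataFrame.

```python
def all_fields_except_sketch(exclusions):
    """Build a field selector that omits the given field names (sketch)."""
    excluded = set(exclusions)

    def select(df):
        # Works with anything exposing a `columns` sequence of field names.
        return [name for name in df.columns if name not in excluded]

    return select


class FakeFrame:
    """Hypothetical stand-in for a DataFrame, used only in this sketch."""

    def __init__(self, columns):
        self.columns = list(columns)


selector = all_fields_except_sketch(['run_id', 'created_at'])
fields = selector(FakeFrame(['id', 'run_id', 'amount', 'created_at']))
```

In a real test the helper would typically be passed straight to an assertion, for example as check_data=self.all_fields_except(['run_id']).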

assertBinaryFileCorrect(actual_path, ref_path, kind=None)

Check that a binary file matches the contents from a reference binary file.

Parameters:
  • actual_path – Path to the actual binary file.

  • ref_path – The name of the reference binary file. The location of the reference file is determined by the configuration via set_data_location().

  • kind – The reference kind, used to locate the reference file.

assertCSVFileCorrect(actual_path, ref_csv, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)

Legacy convenience method whose second parameter is called ref_csv. It simply calls assertStoredDataFrameCorrect.

assertCSVFilesCorrect(actual_paths, ref_csvs, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)

Legacy method that just calls assertStoredDataFramesCorrect.

assertDataFrameCorrect(df, ref_path, actual_path=None, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, backend=None, **kwargs)

Check that an in-memory DataFrame matches a saved reference DataFrame on disk (parquet or CSV).

The actual DataFrame may be Pandas or Polars; the engine is inferred from it unless overridden by the engine parameter.

Parameters:
  • df – Actual DataFrame (Pandas or Polars).

  • ref_path – Name of the reference file, which can be a .parquet file or a CSV file. The location of the reference file is determined by the configuration via set_data_location(). Renamed from csv_path in version 2.2.

  • actual_path – Optional path for the file where the actual DataFrame originated, used for error messages.

  • kind – Optional reference kind (a string), used to locate the reference file.

  • csv_read_fn

    Optional function to read a CSV file to obtain a DataFrame. If None, a default CSV loader is used.

    The default CSV loader is a wrapper around pd.read_csv() with the following options:

    • index_col is None

    • infer_datetime_format is True

    • quotechar is "

    • quoting is csv.QUOTE_MINIMAL

    • escapechar is \ (backslash)

    • na_values are the empty string, "NaN" and "NULL"

    • keep_default_na is False

  • check_data – See assertDataFramesEquivalent for details.

  • check_types – See assertDataFramesEquivalent for details.

  • check_order – See assertDataFramesEquivalent for details.

  • check_extra_cols – See assertDataFramesEquivalent for details.

  • sortby – See assertDataFramesEquivalent for details.

  • condition – See assertDataFramesEquivalent for details.

  • precision – See assertDataFramesEquivalent for details.

  • type_matching – See assertDataFramesEquivalent for details.

  • fuzzy_nulls – See assertDataFramesEquivalent for details.

  • engine – See assertDataFramesEquivalent for details.

  • backend – See assertDataFramesEquivalent for details.

  • **kwargs – Additional keyword arguments passed to csv_read_fn.

assertDataFramesEqual(df, ref_df, actual_path=None, expected_path=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, backend=None)

Check that an in-memory DataFrame matches an in-memory reference one.

Both df and ref_df may be Pandas or Polars DataFrames. If they are of different types, both are converted to the engine specified by the engine parameter, or to the default engine from configuration if engine is not supplied.

Parameters:
  • df – Actual DataFrame (Pandas or Polars).

  • ref_df – Expected DataFrame (Pandas or Polars).

  • actual_path – Optional path for the file where the actual DataFrame originated, used for error messages.

  • expected_path – Optional path for the file where the expected DataFrame originated, used for error messages.

  • check_data

    Optional restriction of fields whose values should be compared. Possible values are:

    • None or True to apply the comparison to all fields (this is the default).

    • False to skip the comparison completely.

    • a list of field names to check only those fields.

    • a function taking a DataFrame as its single parameter and returning a list of field names to check.

  • check_types – Optional restriction of fields whose types should be compared. See check_data for possible values.

  • check_order – Optional restriction of fields whose (relative) order should be compared. See check_data for possible values.

  • check_extra_cols – Optional restriction of extra fields in the actual dataset which, if found, will cause the check to fail. See check_data for possible values.

  • sortby

    Optional specification of fields to sort by before comparing. Possible values are:

    • None or False to not sort (this is the default).

    • True to sort on all fields based on their order in the reference dataset (rarely useful).

    • a list of field names to sort on, in order.

    • a function taking the reference DataFrame as its single parameter and returning a list of field names to sort on.

  • condition – Optional filter to apply to datasets before comparing. Can be None, or a function that takes a DataFrame as its single parameter and returns a vector of booleans specifying which rows to compare.

  • precision – Optional number of decimal places to use for floating-point comparisons. Default is 7.

  • type_matching – How to match field types: 'strict', 'medium', or 'loose' (also 'permissive'). Default is 'strict'.

  • engine – DataFrame engine to use for comparison: 'pandas' or 'polars'. Required when df and ref_df are of different types; otherwise inferred from the DataFrames.

  • fuzzy_nulls – If True, treat different null types (such as pd.NaN and None) as equivalent when comparing. Default is False.

  • backend – Pandas backend: 'numpy_nullable', 'pyarrow', or 'original'.

Note

assertDataFramesEqual and assertDataFramesEquivalent are identical; two names are provided for flexibility and as legacy support.
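The four forms accepted by check_data (and by check_types, check_order and check_extra_cols) can be summarised in a short sketch. This is only an illustration of the documented rules; resolve_fields is a hypothetical helper, and a SimpleNamespace with a columns attribute stands in for a DataFrame.

```python
from types import SimpleNamespace


def resolve_fields(spec, df):
    """Return the field names selected by a check_data-style value (sketch)."""
    if spec is None or spec is True:
        return list(df.columns)   # compare every field (the default)
    if spec is False:
        return []                 # skip this comparison completely
    if callable(spec):
        return list(spec(df))     # function of the DataFrame returning names
    return list(spec)             # explicit list of field names


# Stand-in for a DataFrame with three fields.
frame = SimpleNamespace(columns=['id', 'amount', 'updated'])
```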

assertDataFramesEquivalent(df, ref_df, actual_path=None, expected_path=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, backend=None)

Check that an in-memory DataFrame matches an in-memory reference one.

Both df and ref_df may be Pandas or Polars DataFrames. If they are of different types, both are converted to the engine specified by the engine parameter, or to the default engine from configuration if engine is not supplied.

Parameters:
  • df – Actual DataFrame (Pandas or Polars).

  • ref_df – Expected DataFrame (Pandas or Polars).

  • actual_path – Optional path for the file where the actual DataFrame originated, used for error messages.

  • expected_path – Optional path for the file where the expected DataFrame originated, used for error messages.

  • check_data

    Optional restriction of fields whose values should be compared. Possible values are:

    • None or True to apply the comparison to all fields (this is the default).

    • False to skip the comparison completely.

    • a list of field names to check only those fields.

    • a function taking a DataFrame as its single parameter and returning a list of field names to check.

  • check_types – Optional restriction of fields whose types should be compared. See check_data for possible values.

  • check_order – Optional restriction of fields whose (relative) order should be compared. See check_data for possible values.

  • check_extra_cols – Optional restriction of extra fields in the actual dataset which, if found, will cause the check to fail. See check_data for possible values.

  • sortby

    Optional specification of fields to sort by before comparing. Possible values are:

    • None or False to not sort (this is the default).

    • True to sort on all fields based on their order in the reference dataset (rarely useful).

    • a list of field names to sort on, in order.

    • a function taking the reference DataFrame as its single parameter and returning a list of field names to sort on.

  • condition – Optional filter to apply to datasets before comparing. Can be None, or a function that takes a DataFrame as its single parameter and returns a vector of booleans specifying which rows to compare.

  • precision – Optional number of decimal places to use for floating-point comparisons. Default is 7.

  • type_matching – How to match field types: 'strict', 'medium', or 'loose' (also 'permissive'). Default is 'strict'.

  • engine – DataFrame engine to use for comparison: 'pandas' or 'polars'. Required when df and ref_df are of different types; otherwise inferred from the DataFrames.

  • fuzzy_nulls – If True, treat different null types (such as pd.NaN and None) as equivalent when comparing. Default is False.

  • backend – Pandas backend: 'numpy_nullable', 'pyarrow', or 'original'.

Note

assertDataFramesEqual and assertDataFramesEquivalent are identical; two names are provided for flexibility and as legacy support.
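The precision parameter is described only as a number of decimal places. One plausible reading, shown here as an assumption rather than as the library's actual rule, is that two values match when they agree after rounding to that many places. The function name floats_match is hypothetical.

```python
def floats_match(actual, expected, precision=7):
    """Compare two floats after rounding to `precision` decimal places.

    Sketch only: the real comparison may use a different rule, such as
    an absolute tolerance.
    """
    return round(actual, precision) == round(expected, precision)
```

Under this reading, floating-point noise such as 0.1 + 0.2 versus 0.3 is absorbed at the default precision, while lowering precision makes the comparison progressively more forgiving.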

assertFileCorrect(actual_path, ref_path, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0, encoding=None)

Check that a text file matches the contents from a reference text file.

For CSV files, use assertStoredDataFrameCorrect instead.

Parameters:
  • actual_path – Path to the actual text file.

  • ref_path – The name of the reference file. The location of the reference file is determined by the configuration via set_data_location().

  • kind – See assertStringCorrect for details.

  • lstrip – See assertStringCorrect for details.

  • rstrip – See assertStringCorrect for details.

  • ignore_substrings – See assertStringCorrect for details.

  • ignore_patterns – See assertStringCorrect for details.

  • remove_lines – See assertStringCorrect for details.

  • preprocess – See assertStringCorrect for details.

  • max_permutation_cases – See assertStringCorrect for details.

  • encoding – Optional character encoding for reading the file.

Note

ignore_lines is a legacy alias for remove_lines. assertFileCorrect is a legacy alias for assertTextFileCorrect.

assertFilesCorrect(actual_paths, ref_paths, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0, encodings=None)

Check that a collection of text files match the contents from a matching collection of reference text files.

For CSV files, use assertStoredDataFramesCorrect instead.

Parameters:
  • actual_paths – A list of paths for text files.

  • ref_paths – A list of names of the matching reference files. The location of the reference files is determined by the configuration via set_data_location().

  • kind – See assertStringCorrect for details.

  • lstrip – See assertStringCorrect for details.

  • rstrip – See assertStringCorrect for details.

  • ignore_substrings – See assertStringCorrect for details.

  • ignore_patterns – See assertStringCorrect for details.

  • remove_lines – See assertStringCorrect for details.

  • preprocess – See assertStringCorrect for details.

  • max_permutation_cases – See assertStringCorrect for details.

  • encodings – Optional list of character encodings, one per file.

Note

ignore_lines is a legacy alias for remove_lines. assertFilesCorrect is a legacy alias for assertTextFilesCorrect.

assertOnDiskDataFrameCorrect(actual_path, ref_path, kind='parquet', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)

Check that a DataFrame stored on disk (as a parquet or CSV file) matches a reference DataFrame, also stored on disk.

Parameters:
  • actual_path – Path to the actual serialized DataFrame.

  • ref_path – Path to the reference serialized DataFrame. The location of the reference file is determined by the configuration via set_data_location().

  • kind – Optional reference kind (a string), used to locate the reference file.

  • csv_read_fn

    Optional function to read a CSV file to obtain a DataFrame. If None, a default CSV loader is used.

    The default CSV loader is a wrapper around pd.read_csv() with the following options:

    • index_col is None

    • infer_datetime_format is True

    • quotechar is "

    • quoting is csv.QUOTE_MINIMAL

    • escapechar is \ (backslash)

    • na_values are the empty string, "NaN" and "NULL"

    • keep_default_na is False

  • check_data – See assertDataFramesEquivalent for details.

  • check_types – See assertDataFramesEquivalent for details.

  • check_order – See assertDataFramesEquivalent for details.

  • check_extra_cols – See assertDataFramesEquivalent for details.

  • sortby – See assertDataFramesEquivalent for details.

  • condition – See assertDataFramesEquivalent for details.

  • precision – See assertDataFramesEquivalent for details.

  • type_matching – See assertDataFramesEquivalent for details.

  • fuzzy_nulls – See assertDataFramesEquivalent for details.

  • engine – See assertDataFramesEquivalent for details.

  • **kwargs – Additional keyword arguments passed to csv_read_fn.

Note

assertOnDiskDataFrameCorrect is a legacy alias for assertStoredDataFrameCorrect.

Note

If the format is CSV, the CSV file is loaded as a DataFrame using the default engine, or whichever is supplied (pandas or polars), with tdda.serial.csv_to_polars or tdda.serial.csv_to_pandas.

assertOnDiskDataFramesCorrect(actual_paths, ref_paths, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)

Check that a set of serialized DataFrames in files match corresponding reference ones.

Parameters:
  • actual_paths – List of paths to actual serialized DataFrames (parquet or CSV).

  • ref_paths – List of paths to matching reference serialized DataFrames (parquet or CSV). The location of the reference files is determined by the configuration via set_data_location().

  • kind – Optional reference kind (a string), used to locate the reference files.

  • csv_read_fn

    Optional function to read a CSV file to obtain a DataFrame. If None, a default CSV loader is used.

    The default CSV loader is a wrapper around pd.read_csv() with the following options:

    • index_col is None

    • infer_datetime_format is True

    • quotechar is "

    • quoting is csv.QUOTE_MINIMAL

    • escapechar is \ (backslash)

    • na_values are the empty string, "NaN" and "NULL"

    • keep_default_na is False

  • check_data – See assertDataFramesEquivalent for details.

  • check_types – See assertDataFramesEquivalent for details.

  • check_order – See assertDataFramesEquivalent for details.

  • check_extra_cols – See assertDataFramesEquivalent for details.

  • sortby – See assertDataFramesEquivalent for details.

  • condition – See assertDataFramesEquivalent for details.

  • precision – See assertDataFramesEquivalent for details.

  • type_matching – See assertDataFramesEquivalent for details.

  • fuzzy_nulls – See assertDataFramesEquivalent for details.

  • engine – See assertDataFramesEquivalent for details.

  • **kwargs – Additional keyword arguments passed to csv_read_fn.

Note

assertOnDiskDataFramesCorrect is a legacy alias for assertStoredDataFramesCorrect.

Note

If the format is CSV, the CSV file is loaded as a DataFrame using the default engine, or whichever is supplied (pandas or polars), with tdda.serial.csv_to_polars or tdda.serial.csv_to_pandas.

assertStoredDataFrameCorrect(actual_path, ref_path, kind='parquet', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)

Check that a DataFrame stored on disk (as a parquet or CSV file) matches a reference DataFrame, also stored on disk.

Parameters:
  • actual_path – Path to the actual serialized DataFrame.

  • ref_path – Path to the reference serialized DataFrame. The location of the reference file is determined by the configuration via set_data_location().

  • kind – Optional reference kind (a string), used to locate the reference file.

  • csv_read_fn

    Optional function to read a CSV file to obtain a DataFrame. If None, a default CSV loader is used.

    The default CSV loader is a wrapper around pd.read_csv() with the following options:

    • index_col is None

    • infer_datetime_format is True

    • quotechar is "

    • quoting is csv.QUOTE_MINIMAL

    • escapechar is \ (backslash)

    • na_values are the empty string, "NaN" and "NULL"

    • keep_default_na is False

  • check_data – See assertDataFramesEquivalent for details.

  • check_types – See assertDataFramesEquivalent for details.

  • check_order – See assertDataFramesEquivalent for details.

  • check_extra_cols – See assertDataFramesEquivalent for details.

  • sortby – See assertDataFramesEquivalent for details.

  • condition – See assertDataFramesEquivalent for details.

  • precision – See assertDataFramesEquivalent for details.

  • type_matching – See assertDataFramesEquivalent for details.

  • fuzzy_nulls – See assertDataFramesEquivalent for details.

  • engine – See assertDataFramesEquivalent for details.

  • **kwargs – Additional keyword arguments passed to csv_read_fn.

Note

assertOnDiskDataFrameCorrect is a legacy alias for assertStoredDataFrameCorrect.

Note

If the format is CSV, the CSV file is loaded as a DataFrame using the default engine, or whichever is supplied (pandas or polars), with tdda.serial.csv_to_polars or tdda.serial.csv_to_pandas.

assertStoredDataFramesCorrect(actual_paths, ref_paths, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)

Check that a set of serialized DataFrames in files match corresponding reference ones.

Parameters:
  • actual_paths – List of paths to actual serialized DataFrames (parquet or CSV).

  • ref_paths – List of paths to matching reference serialized DataFrames (parquet or CSV). The location of the reference files is determined by the configuration via set_data_location().

  • kind – Optional reference kind (a string), used to locate the reference files.

  • csv_read_fn

    Optional function to read a CSV file to obtain a DataFrame. If None, a default CSV loader is used.

    The default CSV loader is a wrapper around pd.read_csv() with the following options:

    • index_col is None

    • infer_datetime_format is True

    • quotechar is "

    • quoting is csv.QUOTE_MINIMAL

    • escapechar is \ (backslash)

    • na_values are the empty string, "NaN" and "NULL"

    • keep_default_na is False

  • check_data – See assertDataFramesEquivalent for details.

  • check_types – See assertDataFramesEquivalent for details.

  • check_order – See assertDataFramesEquivalent for details.

  • check_extra_cols – See assertDataFramesEquivalent for details.

  • sortby – See assertDataFramesEquivalent for details.

  • condition – See assertDataFramesEquivalent for details.

  • precision – See assertDataFramesEquivalent for details.

  • type_matching – See assertDataFramesEquivalent for details.

  • fuzzy_nulls – See assertDataFramesEquivalent for details.

  • engine – See assertDataFramesEquivalent for details.

  • **kwargs – Additional keyword arguments passed to csv_read_fn.

Note

assertOnDiskDataFramesCorrect is a legacy alias for assertStoredDataFramesCorrect.

Note

If the format is CSV, the CSV file is loaded as a DataFrame using the default engine, or whichever is supplied (pandas or polars), with tdda.serial.csv_to_polars or tdda.serial.csv_to_pandas.

assertStringCorrect(string, ref_path, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0)

Check that an in-memory string matches the contents from a reference text file.

Parameters:
  • string – The actual string.

  • ref_path – The name of the reference file. The location of the reference file is determined by the configuration via set_data_location().

  • kind – The reference kind, used to locate the reference file.

  • lstrip – If True, whitespace is stripped from the start of each line before comparison.

  • rstrip – If True, whitespace is stripped from the end of each line before comparison.

  • ignore_substrings – An optional list of substrings; lines containing any of these substrings will be ignored in the comparison.

  • ignore_patterns – An optional list of regular expressions; lines will be considered the same if they differ only in substrings that match one of these expressions. Expressions should only include explicit anchors if they need to refer to the whole line. Only the matched portion is ignored; any text to the left or right must be identical in both strings.

  • remove_lines – An optional list of substrings; lines containing any of these substrings will be removed before comparison.

  • preprocess – An optional function that takes a list of strings and preprocesses it; applied to both the actual and expected strings before comparison.

  • max_permutation_cases – An optional number specifying the maximum number of permutations to allow; if the actual and expected lists differ only in line order, and the number of such permutations does not exceed this limit, the two are considered identical.

Note

The ignore_lines parameter is a backwards-compatible alias for remove_lines.
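The effect of lstrip, rstrip, remove_lines and ignore_patterns can be approximated with the standard library. The sketch below is a simplification under stated assumptions, not the library's algorithm: lines_equivalent is a hypothetical name, and ignore_patterns is modelled by blanking out the matched portion in both strings, so that the text to the left and right of a match must still be identical.

```python
import re


def lines_equivalent(actual, expected, lstrip=False, rstrip=False,
                     remove_lines=None, ignore_patterns=None):
    """Compare two strings line by line after normalisation (sketch)."""

    def prepare(text):
        lines = text.splitlines()
        # remove_lines: drop any line containing one of the substrings.
        lines = [line for line in lines
                 if not any(s in line for s in remove_lines or [])]
        if lstrip:
            lines = [line.lstrip() for line in lines]
        if rstrip:
            lines = [line.rstrip() for line in lines]
        # ignore_patterns: blank out only the matched portion of each line.
        for pattern in ignore_patterns or []:
            lines = [re.sub(pattern, '', line) for line in lines]
        return lines

    return prepare(actual) == prepare(expected)


actual = 'version 1.2.3\n  total: 10  \nrun at 12:00\n'
expected = 'version 1.4.0\ntotal: 10\nrun at 13:30\n'
same = lines_equivalent(actual, expected, lstrip=True, rstrip=True,
                        remove_lines=['run at'],
                        ignore_patterns=[r'\d+\.\d+\.\d+'])
```

Here the version number is ignored, the line containing the run time is removed, and the surrounding whitespace is stripped, so the two strings are treated as the same.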

assertTextFileCorrect(actual_path, ref_path, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0, encoding=None)

Check that a text file matches the contents from a reference text file.

For CSV files, use assertStoredDataFrameCorrect instead.

Parameters:
  • actual_path – Path to the actual text file.

  • ref_path – The name of the reference file. The location of the reference file is determined by the configuration via set_data_location().

  • kind – See assertStringCorrect for details.

  • lstrip – See assertStringCorrect for details.

  • rstrip – See assertStringCorrect for details.

  • ignore_substrings – See assertStringCorrect for details.

  • ignore_patterns – See assertStringCorrect for details.

  • remove_lines – See assertStringCorrect for details.

  • preprocess – See assertStringCorrect for details.

  • max_permutation_cases – See assertStringCorrect for details.

  • encoding – Optional character encoding for reading the file.

Note

ignore_lines is a legacy alias for remove_lines. assertFileCorrect is a legacy alias for this method.
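A preprocess function receives a list of strings and returns the processed list, and it is applied to both the actual and the expected content. As a minimal sketch, the function below masks timestamps so that they do not cause spurious failures; the name mask_timestamps and the timestamp format are illustrative choices, not part of the library.

```python
import re

# Matches timestamps such as 2024-01-31 12:34:56 (illustrative pattern).
TIMESTAMP = re.compile(r'\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}')


def mask_timestamps(lines):
    """Replace every timestamp with a fixed placeholder."""
    return [TIMESTAMP.sub('<TIMESTAMP>', line) for line in lines]


actual_lines = mask_timestamps(['started 2024-01-31 12:34:56', 'rows: 10'])
expected_lines = mask_timestamps(['started 2023-06-01 08:00:00', 'rows: 10'])
```

Such a function could then be supplied as preprocess=mask_timestamps when calling assertTextFileCorrect or assertStringCorrect.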

assertTextFilesCorrect(actual_paths, ref_paths, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0, encodings=None)

Check that a collection of text files match the contents from a matching collection of reference text files.

For CSV files, use assertStoredDataFramesCorrect instead.

Parameters:
  • actual_paths – A list of paths for text files.

  • ref_paths – A list of names of the matching reference files. The location of the reference files is determined by the configuration via set_data_location().

  • kind – See assertStringCorrect for details.

  • lstrip – See assertStringCorrect for details.

  • rstrip – See assertStringCorrect for details.

  • ignore_substrings – See assertStringCorrect for details.

  • ignore_patterns – See assertStringCorrect for details.

  • remove_lines – See assertStringCorrect for details.

  • preprocess – See assertStringCorrect for details.

  • max_permutation_cases – See assertStringCorrect for details.

  • encodings – Optional list of character encodings, one per file.

Note

ignore_lines is a legacy alias for remove_lines. assertFilesCorrect is a legacy alias for this method.

set_data_location(location, kind=None)

Declare the filesystem location for reference files of a particular kind for this instance.

Overrides any global defaults set via ReferenceTest.set_default_data_location().

If an assertion is made for a kind whose location has not been defined explicitly, the default location (declared for kind None) is used. This default must be specified. If it is not set and relative pathnames are used, an exception is raised.

Parameters:
  • location – Filesystem path to the directory containing reference files of this kind.

  • kind – The reference kind this location applies to. None sets the default location used when no specific kind is matched.
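The lookup rule described above (a per-kind location, falling back to the default declared for kind None, with an error if no default exists and the path is relative) can be sketched as follows. LocationRegistry and its resolve method are hypothetical, and the exception type is an assumption, since the text does not say which exception is raised.

```python
import os


class LocationRegistry:
    """Sketch of per-kind reference locations with a default fallback."""

    def __init__(self):
        self._locations = {}

    def set_data_location(self, location, kind=None):
        self._locations[kind] = location

    def resolve(self, ref_path, kind=None):
        if os.path.isabs(ref_path):
            return ref_path  # absolute paths need no configured location
        # Fall back to the default (kind None) if this kind is not registered.
        location = self._locations.get(kind, self._locations.get(None))
        if location is None:
            # Exception type chosen for illustration only.
            raise ValueError('no default data location has been set')
        return os.path.join(location, ref_path)
```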

classmethod set_default_data_location(location, kind=None)

Declare the default filesystem location for reference files of a particular kind, applying to all instances of the class.

Subclasses inherit this default unless they explicitly override it. To set the location globally for all test classes in an application, call this on the ReferenceTest class directly.

Use the instance method set_data_location() to set per-kind locations for an individual instance.

If an assertion is made for a kind whose location has not been defined explicitly, the default location (declared for kind None) is used. This default must be specified. If it is not set and relative pathnames are used, an exception is raised.

Parameters:
  • location – Filesystem path to the directory containing reference files of this kind.

  • kind – The reference kind this location applies to. None sets the default location used when no specific kind is matched.

classmethod set_defaults(**kwargs)

Set default parameters at the class level, applying to all instances.

Parameters:

**kwargs

Keyword arguments. Supported keys are:

  • verbose: Boolean flag controlling reporting of errors while running tests. Reference tests tend to take longer than traditional unit tests, so seeing failures as they happen is often useful. Default is True.

  • print_fn: Function to use to display information while running tests. Must have the same signature as Python’s built-in print. Defaults to unbuffered output to sys.stdout.

  • tmp_dir: Directory where temporary files are written. Temporary files are created when a text file check fails and a preprocess function has been specified, so the preprocessed versions can be inspected. If not set, the TDDA_FAIL_DIR environment variable is used, or tempfile.gettempdir() as a fallback.
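
As a minimal sketch of values that could be supplied for these keys, the following defines a print-compatible function and a temporary-directory fallback. The names recording_print, captured and fallback_tmp_dir are hypothetical, and the fallback function only mirrors the order described above rather than reproducing tdda's own code.

```python
import os
import sys
import tempfile

captured = []  # messages collected by the custom print function

def recording_print(*args, sep=' ', end='\n', file=None, flush=False):
    """Hypothetical print_fn: same signature as print, but also records output."""
    captured.append(sep.join(str(a) for a in args))
    # Always flush, so that messages appear while the tests are still running.
    print(*args, sep=sep, end=end, file=file or sys.stdout, flush=True)

def fallback_tmp_dir():
    """Mirror the documented order: TDDA_FAIL_DIR, else the system temp dir."""
    return os.environ.get('TDDA_FAIL_DIR') or tempfile.gettempdir()

recording_print('checked', 3, 'files')

# With tdda installed, these values would be passed as:
#   ReferenceTest.set_defaults(print_fn=recording_print,
#                              tmp_dir=fallback_tmp_dir())
```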

classmethod set_regeneration(kind=None, regenerate=True)

Set the regeneration flag for a particular kind of reference file, globally, for all instances of the class.

If the regenerate flag is set to True, then the framework will regenerate reference data of that kind, rather than comparing.

All of the regeneration flags are set to False by default.
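
The effect of the flag can be pictured with a small stand-alone sketch. The helper check_or_regenerate is hypothetical and handles only plain strings; it assumes that regenerating simply means overwriting the reference file with the actual output.

```python
import os
import tempfile

def check_or_regenerate(actual, ref_path, regenerate=False):
    """Hypothetical helper: rewrite the reference if regenerating, else compare."""
    if regenerate:
        # Regeneration: the actual output becomes the new reference.
        with open(ref_path, 'w', encoding='utf-8') as f:
            f.write(actual)
        return True
    # Normal run: the test passes only if the output matches the reference.
    with open(ref_path, encoding='utf-8') as f:
        return f.read() == actual

with tempfile.TemporaryDirectory() as ref_dir:
    ref_path = os.path.join(ref_dir, 'table.txt')
    first_write = check_or_regenerate('a | b\n', ref_path, regenerate=True)
    same = check_or_regenerate('a | b\n', ref_path)
    changed = check_or_regenerate('a | c\n', ref_path)
```

Because a regenerating run always succeeds, the new output should be checked carefully before the regenerated files are accepted.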

tdda.referencetest.referencetest.tag(test)

Decorator for tests, so that you can specify you only want to run a tagged subset of tests, with the -1 or --tagged option.

unittest Framework Support

This module provides the ReferenceTestCase class, which extends the standard unittest.TestCase test-case class, augmenting it with methods for checking correctness of files against reference data.

It also provides a main() function, which can be used to run (and regenerate) reference tests which have been implemented using subclasses of ReferenceTestCase.

For example:

from tdda.referencetest import ReferenceTestCase
import my_module

class TestMyClass(ReferenceTestCase):
    def test_my_csv_function(self):
        result = my_module.my_csv_function(self.tmp_dir)
        self.assertCSVFileCorrect(result, 'result.csv')

    def test_my_pandas_dataframe_function(self):
        result = my_module.my_dataframe_function()
        self.assertDataFrameCorrect(result, 'result.csv')

    def test_my_table_function(self):
        result = my_module.my_table_function()
        self.assertStringCorrect(result, 'table.txt', kind='table')

    def test_my_graph_function(self):
        result = my_module.my_graph_function()
        self.assertStringCorrect(result, 'graph.txt', kind='graph')

TestMyClass.set_default_data_location('testdata')

if __name__ == '__main__':
    ReferenceTestCase.main()

Tagged Tests

If the tests are run with the --tagged or -1 (the digit one) command-line option, then only tests that have been decorated with referencetest.tag are run. This is a mechanism for allowing only a chosen subset of tests to be run, which is useful during development. The @tag decorator can be applied to either test classes or test methods.

If the tests are run with the --istagged or -0 (the digit zero) command-line option, then no tests are run; instead, the framework reports the full module names of any test classes that have been decorated with @tag, or which contain any tests that have been decorated with @tag.

For example:

from tdda.referencetest import ReferenceTestCase, tag
import my_module

class TestMyClass1(ReferenceTestCase):
    @tag
    def test_a(self):
        ...

    def test_b(self):
        ...

@tag
class TestMyClass2(ReferenceTestCase):
    def test_x(self):
        ...

    def test_y(self):
        ...

If run with python mytests.py --tagged, only the tagged tests are run (TestMyClass1.test_a, TestMyClass2.test_x and TestMyClass2.test_y).
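
The selection rule can be reproduced with a stand-alone sketch that uses only unittest. The tag function below is an illustrative stand-in for the real decorator, and the _tagged attribute name and the tagged_tests helper are assumptions made for the example.

```python
import unittest

def tag(test):
    """Illustrative stand-in for the tag decorator: mark a test or a class."""
    test._tagged = True  # assumed attribute name, for this sketch only
    return test

class TestMyClass1(unittest.TestCase):
    @tag
    def test_a(self):
        pass

    def test_b(self):
        pass

@tag
class TestMyClass2(unittest.TestCase):
    def test_x(self):
        pass

    def test_y(self):
        pass

def tagged_tests(*classes):
    """List tests that are tagged directly or that belong to a tagged class."""
    loader = unittest.TestLoader()
    selected = []
    for cls in classes:
        for name in loader.getTestCaseNames(cls):
            method = getattr(cls, name)
            if getattr(cls, '_tagged', False) or getattr(method, '_tagged', False):
                selected.append(f'{cls.__name__}.{name}')
    return selected

selected = tagged_tests(TestMyClass1, TestMyClass2)
```

In this sketch, selected contains TestMyClass1.test_a, TestMyClass2.test_x and TestMyClass2.test_y, matching the tests listed above.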

The -9 (or --untag) option removes all @tag decorators from test source files, allowing a clean slate.

Failure Logging and Auto-tagging

Running with -F (or --log-failures) causes the names of any failing or erroring tests to be logged to a file. After the run, the tdda tag command reads that log and adds @tag decorators to the failing tests automatically.

A typical unittest-style workflow for focusing on failures is:

python tests.py -9      # Remove any existing @tag decorators
python tests.py -F      # Run tests, logging failures
tdda tag                # Add @tag to failing tests
python tests.py -1      # Run only tagged (failing) tests

When all tests pass, clean up the tags:

python tests.py -9

The equivalent workflow with pytest:

pytest --untag          # Remove any existing @tag decorators
pytest --log-failures   # Run tests, logging failures
tdda tag                # Add @tag to failing tests
pytest --tagged         # Run only tagged (failing) tests

When all tests pass, clean up the tags:

pytest --untag

Regeneration of Results

When main() is run with --write-all or --write (or -W or -w respectively), the framework regenerates reference data files. Particular kinds of reference results can be regenerated by passing a list of kind names, separated by spaces, immediately after the --write option. If no list of kind names is provided, then all reference results are regenerated.

To regenerate all reference results (or generate them for the first time)

python my_tests.py --write-all

To regenerate just a particular kind of reference (e.g. table results)

python my_tests.py --write table

To regenerate a number of different kinds of reference (e.g. both table and graph results)

python my_tests.py --write table graph

class tdda.referencetest.referencetestcase.ReferenceTestCase(*args, **kwargs)

Wrapper around the ReferenceTest class to allow it to operate as a test-case class using the unittest testing framework.

The ReferenceTestCase class is a mix-in of unittest.TestCase and ReferenceTest, so it can be used as the base class for unit tests, allowing the tests to use any of the standard unittest assert methods, and also use any of the referencetest assert extensions.

static main(module=None, argv=None, testtdda=False, report=None, **kw)

Wrapper around the unittest.main() entry point.

This is the same as the main() function, and is provided just as a convenience, as it means that tests using the ReferenceTestCase class only need to import that single class on its own.

tag()

Decorator for tests, so that you can specify you only want to run a tagged subset of tests, with the -1 or --tagged option.

tdda.referencetest.referencetestcase.main()

Wrapper around the unittest.main() entry point.

pytest Framework Support

This provides all of the methods in the ReferenceTest class, in a way that allows them to be used as pytest fixtures.

This allows these functions to be called from tests running from the pytest framework.

For example:

import my_module

def test_my_csv_function(ref):
    resultfile = my_module.my_csv_function(ref.tmp_dir)
    ref.assertCSVFileCorrect(resultfile, 'result.csv')

def test_my_pandas_dataframe_function(ref):
    resultframe = my_module.my_dataframe_function()
    ref.assertDataFrameCorrect(resultframe, 'result.csv')

def test_my_table_function(ref):
    result = my_module.my_table_function()
    ref.assertStringCorrect(result, 'table.txt', kind='table')

def test_my_graph_function(ref):
    result = my_module.my_graph_function()
    ref.assertStringCorrect(result, 'graph.txt', kind='graph')

class TestMyClass:
    def test_my_other_table_function(self, ref):
        result = my_module.my_other_table_function()
        ref.assertStringCorrect(result, 'table.txt', kind='table')

with a conftest.py containing:

from tdda.referencetest.pytestconfig import (pytest_addoption,
                                             pytest_collection_modifyitems,
                                             set_default_data_location,
                                             ref)

set_default_data_location('testdata')

This configuration enables the additional command-line options, and also provides a ref fixture, as an instance of the ReferenceTest class. Of course, for brevity, if you prefer, you can use:

from tdda.referencetest.pytestconfig import *

rather than importing the four individual items if you are not customising anything yourself, but that is less flexible.

This example also sets a default data location which will apply to all reference fixtures. This means that any tests that use ref will automatically be able to locate their "expected results" reference data files.

Reference Fixtures

The default configuration provides a single fixture, ref.

To configure a large suite of tests so that tests do not all have to share a single common reference-data location, you can set up additional reference fixtures, configured differently. For example, to set up a fixture ref_special, whose reference data is stored in ../specialdata, you could include:

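import pytest
from tdda.referencetest import referencepytest

@pytest.fixture(scope='module')
def ref_special(request):
    # Create a separate ReferenceTest instance with its own data location
    r = referencepytest.ref(request)
    r.set_data_location('../specialdata')
    return r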

Tests can use this additional fixture:

import my_special_module

def test_something(ref_special):
    result = my_special_module.something()
    ref_special.assertStringCorrect(result, 'something.csv')

Tagged Tests (pytest)

If the tests are run with the --tagged command-line option, then only tests that have been decorated with referencetest.tag are run. This is a mechanism for allowing only a chosen subset of tests to be run, which is useful during development. The @tag decorator can be applied to test functions, test classes and test methods.

If the tests are run with the --istagged command-line option, then no tests are run; instead, the framework reports the full module names of any test classes or functions that have been decorated with @tag, or classes which contain any test methods that have been decorated with @tag.

For example:

from tdda.referencetest import tag

@tag
def test_a(ref):
    assert 'a' + 'a' == 'aa'

def test_b(ref):
    assert 'b' * 2 == 'bb'

@tag
class TestMyClass:
    def test_x(self):
        assert list('xxx') == ['x', 'x', 'x']

    def test_y(self):
        assert 'y'.upper() == 'Y'

If run with pytest --tagged, only the tagged tests are run (test_a, TestMyClass.test_x and TestMyClass.test_y).

Regeneration of Results (pytest)

When pytest is run with --write-all or --write, the framework regenerates reference data files. Particular kinds of reference results can be regenerated by passing a list of kind names, separated by spaces, immediately after the --write option. If no list of kind names is provided, then all reference results are regenerated.

If the -s option is also provided (to disable pytest output capturing), it will report the names of all the files it has regenerated.

To regenerate all reference results (or generate them for the first time)

pytest -s --write-all

To regenerate just a particular kind of reference (e.g. table results)

pytest -s --write table

To regenerate a number of different kinds of reference (e.g. both table and graph results)

pytest -s --write table graph
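
The shape of these command lines can be illustrated with a short argparse sketch. This is not tdda's or pytest's actual option handling: the functions parse_write_options and should_regenerate are hypothetical, and the sketch assumes that kind names follow --write as separate arguments, as in the examples above.

```python
import argparse

def parse_write_options(argv):
    """Hypothetical parser for the two regeneration options."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--write-all', action='store_true',
                        help='regenerate every kind of reference result')
    parser.add_argument('--write', nargs='+', default=[], metavar='KIND',
                        help='regenerate only the named kinds')
    return parser.parse_args(argv)

def should_regenerate(options, kind):
    """Return True if reference results of this kind should be rewritten."""
    return options.write_all or kind in options.write

options = parse_write_options(['--write', 'table', 'graph'])
everything = parse_write_options(['--write-all'])
```

With these options, should_regenerate(options, 'table') is true and should_regenerate(options, 'csv') is false, while every kind is regenerated for the everything options.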

pytest Integration Details

In addition to all of the methods from ReferenceTest, the following functions are provided, to allow easier integration with the pytest framework.

Typically your test code would not need to call any of these functions directly (apart from set_default_data_location()), as they are all enabled automatically if you import the default ReferenceTest configuration into your conftest.py file:

from tdda.referencetest.pytestconfig import *

tdda.referencetest.referencepytest.ref(request)

Support for dependency injection via a pytest fixture.

A test's conftest.py should define a fixture function for injecting a ReferenceTest instance, which should just call this function.

This allows tests to get access to a private instance of that class.

tdda.referencetest.referencepytest.set_default_data_location(location, kind=None)

This provides a mechanism for setting the default reference data location in the ReferenceTest class.

It takes the same parameters as tdda.referencetest.referencetest.ReferenceTest.set_default_data_location().

If you want the same data locations for all your tests, it can be easier to set them with calls to this function, rather than having to set them explicitly in each test (or using set_data_location() in your @pytest.fixture ref definition in your conftest.py file).

tdda.referencetest.referencepytest.set_defaults(**kwargs)

This provides a mechanism for setting default attributes in the ReferenceTest class.

It takes the same parameters as tdda.referencetest.referencetest.ReferenceTest.set_defaults(), and can be used for setting parameters such as the tmp_dir property.

If you want the same defaults for all your tests, it can be easier to set them with a call to this function, rather than having to set them explicitly in each test (or in your @pytest.fixture ref definition in your conftest.py file).

Reference Test Examples

The tdda.referencetest module includes a set of examples, for both unittest and pytest.

To copy these examples, run the command:

tdda examples referencetest [directory]

If directory is not supplied, referencetest-examples will be used.

Alternatively, you can copy all examples using the following command:

tdda examples

which will create a number of separate subdirectories.