Reference Tests
The referencetest module provides support for unit tests,
allowing them to easily compare test results against saved
"known to be correct" reference results.
This is typically useful for testing software that produces any of the following types of output:
a CSV file
a text file (for example: HTML, JSON, logfiles, graphs, tables, etc.)
a string
a Pandas or Polars DataFrame.
The main features are:
If the comparison between a string and a file fails, the actual string is written to a file and a diff command is suggested for seeing the differences between the actual output and the expected output.
There is support for CSV files, allowing fine control over how the comparison is to be performed. This includes:
the ability to select which columns to compare (and which to exclude from the comparison).
the ability to compare metadata (types of fields) as well as values.
the ability to specify the precision (as number of decimal places) for the comparison of floating-point values.
clear reporting of where the differences are, if the comparison fails.
There is support for ignoring lines within the strings/files that contain particular patterns or regular expressions. This is typically useful for filtering out things like version numbers and timestamps that vary in the output from run to run, but which do not indicate a problem.
There is support for re-writing the reference output with the actual output. This, obviously, should be used only after careful checking that the new output is correct, either because the previous output was in fact wrong, or because the intended behaviour has changed.
It allows you to group your reference results into different kinds. This means you can keep different kinds of reference result files in different locations. It also means that you can selectively choose to only regenerate particular kinds of reference results, if they need to be updated because they turned out to have been wrong or if the intended behaviour has changed. Kinds are strings.
The module provides interfaces for this to be called from unit-tests
based on either the standard Python unittest framework,
or on pytest.
Examples
Example Using unittest:
For use with unittest, the
ReferenceTest API is provided
through the ReferenceTestCase
class. This is an extension to the standard unittest.TestCase
class, so that the ReferenceTest methods can be called directly from
unittest tests.
This example shows how to write a test for a function that generates a CSV file:
from tdda.referencetest import ReferenceTestCase, tag
import my_module

class MyTest(ReferenceTestCase):
    @tag
    def test_my_csv_file(self):
        result = my_module.produce_a_csv_file(self.tmp_dir)
        self.assertCSVFileCorrect(result, 'result.csv')

MyTest.set_default_data_location('testdata')

if __name__ == '__main__':
    ReferenceTestCase.main()
To run the test:
python mytest.py
The test is tagged with @tag, meaning that it will be included if
you run the tests with the --tagged option flag to specify that only
tagged tests should be run:
python mytest.py --tagged
The first time you run the test, it will produce an error unless you have already created the expected ("reference") results. You can create the reference results automatically:
python mytest.py --write-all
Having generated the reference results, you should carefully examine the files it has produced in the data output location, to check that they are as expected.
Example Using pytest:
For use with pytest, the
ReferenceTest API is provided
through the referencepytest module. This is
a module that can be imported directly from pytest tests, allowing them
to access ReferenceTest
methods and properties.
This example shows how to write a test for a function that generates a CSV file:
from tdda.referencetest import referencepytest, tag
import my_module

@tag
def test_my_csv_function(ref):
    resultfile = my_module.produce_a_csv_file(ref.tmp_dir)
    ref.assertCSVFileCorrect(resultfile, 'result.csv')

referencepytest.set_default_data_location('testdata')
You also need a conftest.py file, to define the fixtures and defaults:
import pytest
from tdda.referencetest import referencepytest

def pytest_addoption(parser):
    referencepytest.addoption(parser)

def pytest_collection_modifyitems(session, config, items):
    referencepytest.tagged(config, items)

@pytest.fixture(scope='module')
def ref(request):
    return referencepytest.ref(request)

referencepytest.set_default_data_location('testdata')
To run the test:
pytest
The test is tagged with @tag, meaning that it will be included if
you run the tests with the --tagged option flag to specify that only
tagged tests should be run:
pytest --tagged
The first time you run the test, it will produce an error unless you have already created the expected ("reference") results. You can create the reference results automatically:
pytest --write-all -s
Having generated the reference results, you should examine the files it has produced in the data output location, to check that they are as expected.
Methods and Functions
- class tdda.referencetest.referencetest.ReferenceTest(assert_fn)
Provides support for comparing results against reference “known to be correct” results.
Can be used with:
the standard Python unittest framework, via the ReferenceTestCase class, which is a drop-in replacement for unittest.TestCase extended with all ReferenceTest methods.
the pytest framework, via the referencepytest module, which exposes all ReferenceTest methods as functions callable directly from pytest tests.
In addition to the assertion methods, the class provides useful instance variables that can be set via the set_defaults class method.
- all_fields_except(exclusions)
Return all field names in the DataFrame except those specified.
Helper for use with the check_data, check_types and check_order parameters of the DataFrame assertion methods.
- Parameters:
exclusions – A list of field names to exclude.
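The role of this helper can be pictured with a small stand-in. The sketch below is not the library's implementation: fields_except and FakeFrame are hypothetical names, and the only assumption is that a DataFrame exposes its field names through a columns attribute, as Pandas and Polars frames do. It returns a function of the DataFrame, which is one of the forms the check_data, check_types and check_order parameters are documented to accept.

```python
class FakeFrame:
    """Hypothetical stand-in for a DataFrame; only `columns` matters here."""

    def __init__(self, columns):
        self.columns = list(columns)


def fields_except(exclusions):
    """Return a function mapping a frame to its field names minus exclusions."""
    excluded = set(exclusions)

    def select(df):
        # Keep the frame's own column order.
        return [name for name in df.columns if name not in excluded]

    return select


selector = fields_except(['updated_at', 'run_id'])
frame = FakeFrame(['id', 'amount', 'updated_at', 'run_id'])
print(selector(frame))
```

A selector like this is useful when a few volatile columns (timestamps, run identifiers) should be left out of a comparison without listing every other column by hand.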
- assertBinaryFileCorrect(actual_path, ref_path, kind=None)
Check that a binary file matches the contents from a reference binary file.
- Parameters:
actual_path – Path to the actual binary file.
ref_path – The name of the reference binary file. The location of the reference file is determined by the configuration via set_data_location().
kind – The reference kind, used to locate the reference file.
- assertCSVFileCorrect(actual_path, ref_csv, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)
Legacy convenience method with second parameter called ref_csv. Just calls
assertStoredDataFrameCorrect.
- assertCSVFilesCorrect(actual_paths, ref_csvs, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)
Legacy method that just calls
assertStoredDataFramesCorrect.
- assertDataFrameCorrect(df, ref_path, actual_path=None, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, backend=None, **kwargs)
Check that an in-memory DataFrame matches a saved reference DataFrame on disk (parquet or CSV).
The actual DataFrame may be Pandas or Polars; the engine is inferred from it unless overridden by the engine parameter.
- Parameters:
df – Actual DataFrame (Pandas or Polars).
ref_path – Name of the reference file, which can be a .parquet file or a CSV file. The location of the reference file is determined by the configuration via set_data_location(). Renamed from csv_path in version 2.2.
actual_path – Optional path for the file where the actual DataFrame originated, used for error messages.
kind – Optional reference kind (a string), used to locate the reference file.
csv_read_fn – Optional function to read a CSV file to obtain a DataFrame. If None, a default CSV loader is used.
The default CSV loader is a wrapper around pd.read_csv() with the following options:
index_col is None
infer_datetime_format is True
quotechar is "
quoting is csv.QUOTE_MINIMAL
escapechar is \ (backslash)
na_values are the empty string, "NaN" and "NULL"
keep_default_na is False
check_data – See assertDataFramesEquivalent for details.
check_types – See assertDataFramesEquivalent for details.
check_order – See assertDataFramesEquivalent for details.
check_extra_cols – See assertDataFramesEquivalent for details.
sortby – See assertDataFramesEquivalent for details.
condition – See assertDataFramesEquivalent for details.
precision – See assertDataFramesEquivalent for details.
type_matching – See assertDataFramesEquivalent for details.
fuzzy_nulls – See assertDataFramesEquivalent for details.
engine – See assertDataFramesEquivalent for details.
backend – See assertDataFramesEquivalent for details.
**kwargs – Additional keyword arguments passed to csv_read_fn.
- assertDataFramesEqual(df, ref_df, actual_path=None, expected_path=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, backend=None)
Check that an in-memory DataFrame matches an in-memory reference one.
Both df and ref_df may be Pandas or Polars DataFrames. If they are of different types, both are converted to the engine specified by the engine parameter, or to the default engine from configuration if engine is not supplied.
- Parameters:
df – Actual DataFrame (Pandas or Polars).
ref_df – Expected DataFrame (Pandas or Polars).
actual_path – Optional path for the file where the actual DataFrame originated, used for error messages.
expected_path – Optional path for the file where the expected DataFrame originated, used for error messages.
check_data – Optional restriction of fields whose values should be compared. Possible values are:
None or True to apply the comparison to all fields (this is the default).
False to skip the comparison completely.
a list of field names to check only those fields.
a function taking a DataFrame as its single parameter and returning a list of field names to check.
check_types – Optional restriction of fields whose types should be compared. See check_data for possible values.
check_order – Optional restriction of fields whose (relative) order should be compared. See check_data for possible values.
check_extra_cols – Optional restriction of extra fields in the actual dataset which, if found, will cause the check to fail. See check_data for possible values.
sortby – Optional specification of fields to sort by before comparing. Possible values are:
None or False to not sort (this is the default).
True to sort on all fields based on their order in the reference dataset (rarely useful).
a list of field names to sort on, in order.
a function taking the reference DataFrame as its single parameter and returning a list of field names to sort on.
condition – Optional filter to apply to datasets before comparing. Can be None, or a function that takes a DataFrame as its single parameter and returns a vector of booleans specifying which rows to compare.
precision – Optional number of decimal places to use for floating-point comparisons. Default is 7.
type_matching – How to match field types: 'strict', 'medium', or 'loose' (also 'permissive'). Default is 'strict'.
engine – DataFrame engine to use for comparison: 'pandas' or 'polars'. Required when df and ref_df are of different types; otherwise inferred from the DataFrames.
fuzzy_nulls – If True, treat different null types (such as pd.NaN and None) as equivalent when comparing. Default is False.
backend – Pandas backend: 'numpy_nullable', 'pyarrow', or 'original'.
Note
assertDataFramesEqual and assertDataFramesEquivalent are identical; two names are provided for flexibility and as legacy support.
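The precision parameter is expressed as a number of decimal places. One way to read that, assumed here rather than taken from the library's source, is that two floating-point values match when they agree after rounding to that many places. The function name floats_match is hypothetical, and the library's exact rule may differ (for example, it might use an absolute tolerance instead).

```python
def floats_match(actual, expected, precision=7):
    """Compare two floats after rounding both to `precision` decimal places."""
    return round(actual, precision) == round(expected, precision)


# Binary representation error is absorbed at the default precision.
print(floats_match(0.1 + 0.2, 0.3))

# At 3 decimal places, differences beyond the third place are ignored.
print(floats_match(1.23412, 1.23449, precision=3))
print(floats_match(1.234, 1.236, precision=3))
```

Lowering the precision is a reasonable choice when results come from numerical code whose last few digits vary across platforms or library versions.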
- assertDataFramesEquivalent(df, ref_df, actual_path=None, expected_path=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, backend=None)
Check that an in-memory DataFrame matches an in-memory reference one.
Both df and ref_df may be Pandas or Polars DataFrames. If they are of different types, both are converted to the engine specified by the engine parameter, or to the default engine from configuration if engine is not supplied.
- Parameters:
df – Actual DataFrame (Pandas or Polars).
ref_df – Expected DataFrame (Pandas or Polars).
actual_path – Optional path for the file where the actual DataFrame originated, used for error messages.
expected_path – Optional path for the file where the expected DataFrame originated, used for error messages.
check_data – Optional restriction of fields whose values should be compared. Possible values are:
None or True to apply the comparison to all fields (this is the default).
False to skip the comparison completely.
a list of field names to check only those fields.
a function taking a DataFrame as its single parameter and returning a list of field names to check.
check_types – Optional restriction of fields whose types should be compared. See check_data for possible values.
check_order – Optional restriction of fields whose (relative) order should be compared. See check_data for possible values.
check_extra_cols – Optional restriction of extra fields in the actual dataset which, if found, will cause the check to fail. See check_data for possible values.
sortby – Optional specification of fields to sort by before comparing. Possible values are:
None or False to not sort (this is the default).
True to sort on all fields based on their order in the reference dataset (rarely useful).
a list of field names to sort on, in order.
a function taking the reference DataFrame as its single parameter and returning a list of field names to sort on.
condition – Optional filter to apply to datasets before comparing. Can be None, or a function that takes a DataFrame as its single parameter and returns a vector of booleans specifying which rows to compare.
precision – Optional number of decimal places to use for floating-point comparisons. Default is 7.
type_matching – How to match field types: 'strict', 'medium', or 'loose' (also 'permissive'). Default is 'strict'.
engine – DataFrame engine to use for comparison: 'pandas' or 'polars'. Required when df and ref_df are of different types; otherwise inferred from the DataFrames.
fuzzy_nulls – If True, treat different null types (such as pd.NaN and None) as equivalent when comparing. Default is False.
backend – Pandas backend: 'numpy_nullable', 'pyarrow', or 'original'.
Note
assertDataFramesEqual and assertDataFramesEquivalent are identical; two names are provided for flexibility and as legacy support.
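The check_data, check_types, check_order and check_extra_cols parameters all accept the same four kinds of value. The sketch below shows how such a value could be turned into a concrete list of field names. It is only an illustration: resolve_fields and Frame are hypothetical names, and Frame stands in for a real DataFrame by carrying nothing but its column names.

```python
from collections import namedtuple

# Hypothetical stand-in for a DataFrame: only the column names are needed.
Frame = namedtuple('Frame', ['columns'])


def resolve_fields(spec, df):
    """Turn a check_data-style value into a concrete list of field names."""
    if spec is None or spec is True:
        return list(df.columns)       # compare every field (the default)
    if spec is False:
        return []                     # skip this comparison entirely
    if callable(spec):
        return list(spec(df))         # a function of the DataFrame
    return list(spec)                 # an explicit list of field names


df = Frame(columns=['id', 'name', 'score'])
print(resolve_fields(None, df))
print(resolve_fields(False, df))
print(resolve_fields(['id'], df))
print(resolve_fields(lambda d: [c for c in d.columns if c != 'score'], df))
```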
- assertFileCorrect(actual_path, ref_path, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0, encoding=None)
Check that a text file matches the contents from a reference text file.
For CSV files, use assertStoredDataFrameCorrect instead.
- Parameters:
actual_path – Path to the actual text file.
ref_path – The name of the reference file. The location of the reference file is determined by the configuration via set_data_location().
kind – See assertStringCorrect for details.
lstrip – See assertStringCorrect for details.
rstrip – See assertStringCorrect for details.
ignore_substrings – See assertStringCorrect for details.
ignore_patterns – See assertStringCorrect for details.
remove_lines – See assertStringCorrect for details.
preprocess – See assertStringCorrect for details.
max_permutation_cases – See assertStringCorrect for details.
encoding – Optional character encoding for reading the file.
Note
ignore_lines is a legacy alias for remove_lines. assertFileCorrect is a legacy alias for assertTextFileCorrect.
- assertFilesCorrect(actual_paths, ref_paths, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0, encodings=None)
Check that a collection of text files match the contents from a matching collection of reference text files.
For CSV files, use assertStoredDataFramesCorrect instead.
- Parameters:
actual_paths – A list of paths for text files.
ref_paths – A list of names of the matching reference files. The location of the reference files is determined by the configuration via set_data_location().
kind – See assertStringCorrect for details.
lstrip – See assertStringCorrect for details.
rstrip – See assertStringCorrect for details.
ignore_substrings – See assertStringCorrect for details.
ignore_patterns – See assertStringCorrect for details.
remove_lines – See assertStringCorrect for details.
preprocess – See assertStringCorrect for details.
max_permutation_cases – See assertStringCorrect for details.
encodings – Optional list of character encodings, one per file.
Note
ignore_lines is a legacy alias for remove_lines. assertFilesCorrect is a legacy alias for assertTextFilesCorrect.
- assertOnDiskDataFrameCorrect(actual_path, ref_path, kind='parquet', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)
Check that a DataFrame stored on disk (as a parquet or CSV file) matches a reference DataFrame, also stored on disk.
- Parameters:
actual_path – Path to the actual serialized DataFrame.
ref_path – Path to the reference serialized DataFrame. The location of the reference file is determined by the configuration via set_data_location().
kind – Optional reference kind (a string), used to locate the reference file.
csv_read_fn – Optional function to read a CSV file to obtain a DataFrame. If None, a default CSV loader is used.
The default CSV loader is a wrapper around pd.read_csv() with the following options:
index_col is None
infer_datetime_format is True
quotechar is "
quoting is csv.QUOTE_MINIMAL
escapechar is \ (backslash)
na_values are the empty string, "NaN" and "NULL"
keep_default_na is False
check_data – See assertDataFramesEquivalent for details.
check_types – See assertDataFramesEquivalent for details.
check_order – See assertDataFramesEquivalent for details.
check_extra_cols – See assertDataFramesEquivalent for details.
sortby – See assertDataFramesEquivalent for details.
condition – See assertDataFramesEquivalent for details.
precision – See assertDataFramesEquivalent for details.
type_matching – See assertDataFramesEquivalent for details.
fuzzy_nulls – See assertDataFramesEquivalent for details.
engine – See assertDataFramesEquivalent for details.
**kwargs – Additional keyword arguments passed to csv_read_fn.
Note
assertOnDiskDataFrameCorrect is a legacy alias for assertStoredDataFrameCorrect.
Note
If the format is CSV, the CSV file is loaded as a DataFrame using the default engine, or whichever is supplied (pandas or polars), with tdda.serial.csv_to_polars or tdda.serial.csv_to_pandas.
- assertOnDiskDataFramesCorrect(actual_paths, ref_paths, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)
Check that a set of serialized DataFrames in files match corresponding reference ones.
- Parameters:
actual_paths – List of paths to actual serialized DataFrames (parquet or CSV).
ref_paths – List of paths to matching reference serialized DataFrames (parquet or CSV). The location of the reference files is determined by the configuration via set_data_location().
kind – Optional reference kind (a string), used to locate the reference files.
csv_read_fn – Optional function to read a CSV file to obtain a DataFrame. If None, a default CSV loader is used.
The default CSV loader is a wrapper around pd.read_csv() with the following options:
index_col is None
infer_datetime_format is True
quotechar is "
quoting is csv.QUOTE_MINIMAL
escapechar is \ (backslash)
na_values are the empty string, "NaN" and "NULL"
keep_default_na is False
check_data – See assertDataFramesEquivalent for details.
check_types – See assertDataFramesEquivalent for details.
check_order – See assertDataFramesEquivalent for details.
check_extra_cols – See assertDataFramesEquivalent for details.
sortby – See assertDataFramesEquivalent for details.
condition – See assertDataFramesEquivalent for details.
precision – See assertDataFramesEquivalent for details.
type_matching – See assertDataFramesEquivalent for details.
fuzzy_nulls – See assertDataFramesEquivalent for details.
engine – See assertDataFramesEquivalent for details.
**kwargs – Additional keyword arguments passed to csv_read_fn.
Note
assertOnDiskDataFramesCorrect is a legacy alias for assertStoredDataFramesCorrect.
Note
If the format is CSV, the CSV file is loaded as a DataFrame using the default engine, or whichever is supplied (pandas or polars), with tdda.serial.csv_to_polars or tdda.serial.csv_to_pandas.
- assertStoredDataFrameCorrect(actual_path, ref_path, kind='parquet', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)
Check that a DataFrame stored on disk (as a parquet or CSV file) matches a reference DataFrame, also stored on disk.
- Parameters:
actual_path – Path to the actual serialized DataFrame.
ref_path – Path to the reference serialized DataFrame. The location of the reference file is determined by the configuration via set_data_location().
kind – Optional reference kind (a string), used to locate the reference file.
csv_read_fn – Optional function to read a CSV file to obtain a DataFrame. If None, a default CSV loader is used.
The default CSV loader is a wrapper around pd.read_csv() with the following options:
index_col is None
infer_datetime_format is True
quotechar is "
quoting is csv.QUOTE_MINIMAL
escapechar is \ (backslash)
na_values are the empty string, "NaN" and "NULL"
keep_default_na is False
check_data – See assertDataFramesEquivalent for details.
check_types – See assertDataFramesEquivalent for details.
check_order – See assertDataFramesEquivalent for details.
check_extra_cols – See assertDataFramesEquivalent for details.
sortby – See assertDataFramesEquivalent for details.
condition – See assertDataFramesEquivalent for details.
precision – See assertDataFramesEquivalent for details.
type_matching – See assertDataFramesEquivalent for details.
fuzzy_nulls – See assertDataFramesEquivalent for details.
engine – See assertDataFramesEquivalent for details.
**kwargs – Additional keyword arguments passed to csv_read_fn.
Note
assertOnDiskDataFrameCorrect is a legacy alias for assertStoredDataFrameCorrect.
Note
If the format is CSV, the CSV file is loaded as a DataFrame using the default engine, or whichever is supplied (pandas or polars), with tdda.serial.csv_to_polars or tdda.serial.csv_to_pandas.
- assertStoredDataFramesCorrect(actual_paths, ref_paths, kind='csv', csv_read_fn=None, check_data=None, check_types=None, check_order=None, condition=None, sortby=None, precision=None, type_matching=None, fuzzy_nulls=False, engine=None, **kwargs)
Check that a set of serialized DataFrames in files match corresponding reference ones.
- Parameters:
actual_paths – List of paths to actual serialized DataFrames (parquet or CSV).
ref_paths – List of paths to matching reference serialized DataFrames (parquet or CSV). The location of the reference files is determined by the configuration via set_data_location().
kind – Optional reference kind (a string), used to locate the reference files.
csv_read_fn – Optional function to read a CSV file to obtain a DataFrame. If None, a default CSV loader is used.
The default CSV loader is a wrapper around pd.read_csv() with the following options:
index_col is None
infer_datetime_format is True
quotechar is "
quoting is csv.QUOTE_MINIMAL
escapechar is \ (backslash)
na_values are the empty string, "NaN" and "NULL"
keep_default_na is False
check_data – See assertDataFramesEquivalent for details.
check_types – See assertDataFramesEquivalent for details.
check_order – See assertDataFramesEquivalent for details.
check_extra_cols – See assertDataFramesEquivalent for details.
sortby – See assertDataFramesEquivalent for details.
condition – See assertDataFramesEquivalent for details.
precision – See assertDataFramesEquivalent for details.
type_matching – See assertDataFramesEquivalent for details.
fuzzy_nulls – See assertDataFramesEquivalent for details.
engine – See assertDataFramesEquivalent for details.
**kwargs – Additional keyword arguments passed to csv_read_fn.
Note
assertOnDiskDataFramesCorrect is a legacy alias for assertStoredDataFramesCorrect.
Note
If the format is CSV, the CSV file is loaded as a DataFrame using the default engine, or whichever is supplied (pandas or polars), with tdda.serial.csv_to_polars or tdda.serial.csv_to_pandas.
- assertStringCorrect(string, ref_path, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0)
Check that an in-memory string matches the contents from a reference text file.
- Parameters:
string – The actual string.
ref_path – The name of the reference file. The location of the reference file is determined by the configuration via set_data_location().
kind – The reference kind, used to locate the reference file.
lstrip – If True, whitespace is stripped from the start of each line before comparison.
rstrip – If True, whitespace is stripped from the end of each line before comparison.
ignore_substrings – An optional list of substrings; lines containing any of these substrings will be ignored in the comparison.
ignore_patterns – An optional list of regular expressions; lines will be considered the same if they differ only in substrings that match one of these expressions. Expressions should only include explicit anchors if they need to refer to the whole line. Only the matched portion is ignored; any text to the left or right must be identical in both strings.
remove_lines – An optional list of substrings; lines containing any of these substrings will be removed before comparison.
preprocess – An optional function that takes a list of strings and preprocesses it; applied to both the actual and expected strings before comparison.
max_permutation_cases – An optional number specifying the maximum number of permutations to allow; if the actual and expected lists differ only in line order, and the number of such permutations does not exceed this limit, the two are considered identical.
Note
The ignore_lines parameter is a backwards-compatible alias for remove_lines.
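The ignore_patterns rule (only the matched portion is ignored, and the text on either side must still agree) can be illustrated with the standard re module. This is a sketch of the described behaviour, not the library's code: lines_equivalent and PLACEHOLDER are hypothetical names, and the approach of substituting a placeholder for each match is an assumption.

```python
import re

PLACEHOLDER = '<ignored>'


def lines_equivalent(actual, expected, ignore_patterns):
    """True if two lines differ only inside regions matched by the patterns."""
    for pattern in ignore_patterns:
        # Blank out every match so that only the surrounding text is compared.
        actual = re.sub(pattern, lambda m: PLACEHOLDER, actual)
        expected = re.sub(pattern, lambda m: PLACEHOLDER, expected)
    return actual == expected


version = [r'v\d+\.\d+\.\d+']

# Differs only in the version number: treated as the same line.
print(lines_equivalent('tool v1.2.3 ok', 'tool v2.0.0 ok', version))

# Also differs outside the matched region: still a mismatch.
print(lines_equivalent('tool v1.2.3 ok', 'tool v2.0.0 failed', version))
```

This is stricter than ignore_substrings or remove_lines, which discard whole lines; a pattern keeps the rest of the line under test.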
- assertTextFileCorrect(actual_path, ref_path, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0, encoding=None)
Check that a text file matches the contents from a reference text file.
For CSV files, use assertStoredDataFrameCorrect instead.
- Parameters:
actual_path – Path to the actual text file.
ref_path – The name of the reference file. The location of the reference file is determined by the configuration via set_data_location().
kind – See assertStringCorrect for details.
lstrip – See assertStringCorrect for details.
rstrip – See assertStringCorrect for details.
ignore_substrings – See assertStringCorrect for details.
ignore_patterns – See assertStringCorrect for details.
remove_lines – See assertStringCorrect for details.
preprocess – See assertStringCorrect for details.
max_permutation_cases – See assertStringCorrect for details.
encoding – Optional character encoding for reading the file.
Note
ignore_lines is a legacy alias for remove_lines. assertFileCorrect is a legacy alias for this method.
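A preprocess function receives a list of strings and is applied to both the actual and the expected content. The sketch below shows one such function, under the assumption that it returns the processed list; mask_timestamps and the timestamp format are illustrative choices, not part of the library.

```python
import re

# Matches timestamps such as 2024-01-02 03:04:05 (an assumed log format).
TIMESTAMP = re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}')


def mask_timestamps(lines):
    """Replace timestamps with a fixed token and drop blank lines."""
    return [TIMESTAMP.sub('<timestamp>', line) for line in lines if line.strip()]


actual = ['started 2024-01-02 03:04:05', '', 'rows: 10']
expected = ['started 2023-12-31 23:59:59', 'rows: 10']

# After preprocessing, the two versions compare equal.
print(mask_timestamps(actual) == mask_timestamps(expected))
```

A function like this would be passed as the preprocess argument when the built-in options (ignore_patterns, remove_lines) are not expressive enough.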
- assertTextFilesCorrect(actual_paths, ref_paths, kind=None, lstrip=False, rstrip=False, ignore_substrings=None, ignore_patterns=None, remove_lines=None, ignore_lines=None, preprocess=None, max_permutation_cases=0, encodings=None)
Check that a collection of text files match the contents from a matching collection of reference text files.
For CSV files, use assertStoredDataFramesCorrect instead.
- Parameters:
actual_paths – A list of paths for text files.
ref_paths – A list of names of the matching reference files. The location of the reference files is determined by the configuration via set_data_location().
kind – See assertStringCorrect for details.
lstrip – See assertStringCorrect for details.
rstrip – See assertStringCorrect for details.
ignore_substrings – See assertStringCorrect for details.
ignore_patterns – See assertStringCorrect for details.
remove_lines – See assertStringCorrect for details.
preprocess – See assertStringCorrect for details.
max_permutation_cases – See assertStringCorrect for details.
encodings – Optional list of character encodings, one per file.
Note
ignore_lines is a legacy alias for remove_lines. assertFilesCorrect is a legacy alias for this method.
- set_data_location(location, kind=None)
Declare the filesystem location for reference files of a particular kind for this instance.
Overrides any global defaults set via ReferenceTest.set_default_data_location().
If an assertion is made for a kind whose location has not been defined explicitly, the default location (declared for kind None) is used. This default must be specified. If it is not set and relative pathnames are used, an exception is raised.
- Parameters:
location – Filesystem path to the directory containing reference files of this kind.
kind – The reference kind this location applies to. None sets the default location used when no specific kind is matched.
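The lookup rule described here (a per-kind location, falling back to the default stored under kind None, with an error for relative names when no default exists) can be sketched with a plain dictionary. The function resolve_reference_path and the exception type are assumptions made for illustration; the library's internals and the exception it raises may differ.

```python
import os


def resolve_reference_path(name, kind, locations):
    """Find a reference file's path from a {kind: directory} mapping."""
    if os.path.isabs(name):
        return name                   # absolute paths need no location
    # Fall back to the default location, stored under the key None.
    directory = locations.get(kind, locations.get(None))
    if directory is None:
        raise ValueError('no data location set for relative path %r' % name)
    return os.path.join(directory, name)


locations = {None: 'testdata', 'graph': 'testdata/graphs'}

# 'graph' has its own location; 'table' falls back to the default.
print(resolve_reference_path('plot.txt', 'graph', locations))
print(resolve_reference_path('table.txt', 'table', locations))
```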
- classmethod set_default_data_location(location, kind=None)
Declare the default filesystem location for reference files of a particular kind, applying to all instances of the class.
Subclasses inherit this default unless they explicitly override it. To set the location globally for all test classes in an application, call this on the ReferenceTest class directly.
Use the instance method set_data_location() to set per-kind locations for an individual instance.
If an assertion is made for a kind whose location has not been defined explicitly, the default location (declared for kind None) is used. This default must be specified. If it is not set and relative pathnames are used, an exception is raised.
- Parameters:
location – Filesystem path to the directory containing reference files of this kind.
kind – The reference kind this location applies to. None sets the default location used when no specific kind is matched.
- classmethod set_defaults(**kwargs)
Set default parameters at the class level, applying to all instances.
- Parameters:
**kwargs – Keyword arguments. Supported keys are:
verbose: Boolean flag controlling reporting of errors while running tests. Reference tests tend to take longer than traditional unit tests, so seeing failures as they happen is often useful. Default is True.
print_fn: Function to use to display information while running tests. Must have the same signature as Python’s built-in print. Defaults to unbuffered output to sys.stdout.
tmp_dir: Directory where temporary files are written. Temporary files are created when a text file check fails and a preprocess function has been specified, so the preprocessed versions can be inspected. If not set, the TDDA_FAIL_DIR environment variable is used, or tempfile.gettempdir() as a fallback.
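The fallback order for tmp_dir can be written out in a few lines. This is a sketch of the order stated above, not the library's code; default_tmp_dir is a hypothetical helper name.

```python
import os
import tempfile


def default_tmp_dir(configured=None):
    """Pick the directory for temporary files, following the stated order."""
    if configured is not None:
        return configured             # an explicit tmp_dir setting wins
    # Otherwise use TDDA_FAIL_DIR if set, else the system temporary directory.
    return os.environ.get('TDDA_FAIL_DIR', tempfile.gettempdir())


print(default_tmp_dir())
print(default_tmp_dir('/tmp/my-failures'))
```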
- classmethod set_regeneration(kind=None, regenerate=True)
Set the regeneration flag for a particular kind of reference file, globally, for all instances of the class.
If the regenerate flag is set to True, then the framework will regenerate reference data of that kind, rather than comparing.
All of the regeneration flags are set to False by default.
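The effect of a regeneration flag can be pictured as a switch between two behaviours: overwrite the reference with the actual output, or compare against it. The sketch below illustrates that idea with plain files; check_or_regenerate is a hypothetical function, not part of the library.

```python
import os
import tempfile


def check_or_regenerate(actual, ref_path, regenerate=False):
    """Compare `actual` with the reference file, or rewrite the file instead."""
    if regenerate:
        with open(ref_path, 'w') as f:
            f.write(actual)           # actual output becomes the new reference
        return True
    with open(ref_path) as f:
        return f.read() == actual


ref = os.path.join(tempfile.mkdtemp(), 'table.txt')

# With the flag set, the reference is written rather than checked.
check_or_regenerate('a,b\n1,2\n', ref, regenerate=True)

# With the flag unset (the default), the output is compared as usual.
print(check_or_regenerate('a,b\n1,2\n', ref))
print(check_or_regenerate('a,b\n1,3\n', ref))
```

Because regeneration makes every check of that kind pass, the rewritten reference files should be inspected before they are trusted.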
- tdda.referencetest.referencetest.tag(test)
Decorator for tests, so that you can specify you only want to run a tagged subset of tests, with the -1 or --tagged option.
unittest Framework Support
This module provides the
ReferenceTestCase class,
which extends the
standard unittest.TestCase test-case class, augmenting it
with methods for checking correctness of files against reference data.
It also provides a main() function, which can be used to run (and
regenerate) reference tests which have been implemented using subclasses
of ReferenceTestCase.
For example:
    from tdda.referencetest import ReferenceTestCase
    import my_module

    class TestMyClass(ReferenceTestCase):
        def test_my_csv_function(self):
            result = my_module.my_csv_function(self.tmp_dir)
            self.assertCSVFileCorrect(result, 'result.csv')

        def test_my_pandas_dataframe_function(self):
            result = my_module.my_dataframe_function()
            self.assertDataFrameCorrect(result, 'result.csv')

        def test_my_table_function(self):
            result = my_module.my_table_function()
            self.assertStringCorrect(result, 'table.txt', kind='table')

        def test_my_graph_function(self):
            result = my_module.my_graph_function()
            self.assertStringCorrect(result, 'graph.txt', kind='graph')

    TestMyClass.set_default_data_location('testdata')

    if __name__ == '__main__':
        ReferenceTestCase.main()
Tagged Tests
If the tests are run with the --tagged or -1 (the digit one)
command-line option, then only tests that have been decorated with
referencetest.tag are run. This is a mechanism for allowing
only a chosen subset of tests to be run, which is useful during
development. The @tag decorator can be applied to either test
classes or test methods.
If the tests are run with the --istagged or -0 (the digit
zero) command-line option, then no tests are run; instead, the
framework reports the full module names of any test classes that have
been decorated with @tag, or which contain any tests that have been
decorated with @tag.
For example:
    from tdda.referencetest import ReferenceTestCase, tag
    import my_module

    class TestMyClass1(ReferenceTestCase):
        @tag
        def test_a(self):
            ...

        def test_b(self):
            ...

    @tag
    class TestMyClass2(ReferenceTestCase):
        def test_x(self):
            ...

        def test_y(self):
            ...
If run with python mytests.py --tagged, only the tagged tests are
run (TestMyClass1.test_a, TestMyClass2.test_x and
TestMyClass2.test_y).
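The selection rule in this example (a test runs if it is tagged itself or belongs to a tagged class) can be sketched with the standard library alone. The tag function and the _tagged attribute below are hypothetical stand-ins, used only to show the idea. They are not tdda's decorator and do not reflect how the library records tags.

```python
import unittest


def tag(obj):
    """Hypothetical stand-in for @tag: mark a test method or test class."""
    obj._tagged = True
    return obj


class TestMyClass1(unittest.TestCase):
    @tag
    def test_a(self):
        self.assertEqual(1 + 1, 2)

    def test_b(self):
        self.assertEqual(2 * 2, 4)


@tag
class TestMyClass2(unittest.TestCase):
    def test_x(self):
        self.assertTrue(True)

    def test_y(self):
        self.assertTrue(True)


def tagged_test_names(*classes):
    """Names of tests that are tagged directly or through their class."""
    loader = unittest.TestLoader()
    selected = []
    for cls in classes:
        class_tagged = getattr(cls, '_tagged', False)
        for name in loader.getTestCaseNames(cls):  # sorted method names
            if class_tagged or getattr(getattr(cls, name), '_tagged', False):
                selected.append('%s.%s' % (cls.__name__, name))
    return selected


selected = tagged_test_names(TestMyClass1, TestMyClass2)
```

With these classes, selected contains TestMyClass1.test_a, TestMyClass2.test_x and TestMyClass2.test_y, matching the subset listed above.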
The -9 (or --untag) option removes all @tag decorators from
test source files, allowing a clean slate.
Failure Logging and Auto-tagging
Running with -F (or --log-failures) causes the names of any
failing or erroring tests to be logged to a file. After the run,
the tdda tag command reads that log and adds @tag decorators
to the failing tests automatically.
A typical unittest-style workflow for focusing on failures is:
python tests.py -9 # Remove any existing @tag decorators
python tests.py -F # Run tests, logging failures
tdda tag # Add @tag to failing tests
python tests.py -1 # Run only tagged (failing) tests
When all tests pass, clean up the tags:
python tests.py -9
The equivalent workflow with pytest:
pytest --untag # Remove any existing @tag decorators
pytest --log-failures # Run tests, logging failures
tdda tag # Add @tag to failing tests
pytest --tagged # Run only tagged (failing) tests
When all tests pass, clean up the tags:
pytest --untag
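The failure-logging step in these workflows can be pictured as follows. This is a sketch under loud assumptions: the log file name, its one-name-per-line format and the helper functions are all invented for illustration, and nothing here reflects the file that -F actually writes. It uses only unittest to run a small suite and record the names of tests that failed or raised an error.

```python
import io
import os
import tempfile
import unittest


def build_demo_case():
    """Build a throwaway test class with one pass, one failure and one error."""
    class Demo(unittest.TestCase):
        def test_ok(self):
            self.assertEqual(1, 1)

        def test_fails(self):
            self.assertEqual(1, 2)

        def test_errors(self):
            raise RuntimeError('boom')

    return Demo


def log_failures(test_class, log_path):
    """Run the tests and write failing or erroring test names to log_path."""
    suite = unittest.TestLoader().loadTestsFromTestCase(test_class)
    # Send the runner's own report to a buffer so it does not clutter output.
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    names = sorted('.'.join(test.id().rsplit('.', 2)[-2:])
                   for test, _traceback in result.failures + result.errors)
    with open(log_path, 'w') as f:
        f.write('\n'.join(names) + '\n')  # hypothetical format: one per line
    return names


log_path = os.path.join(tempfile.mkdtemp(), 'failures.txt')
logged = log_failures(build_demo_case(), log_path)
```

A later step, playing the role of tdda tag, would read such a log and decide which tests to mark.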
Regeneration of Results
When main() is run with --write-all or --write (or -W or -w
respectively), it causes the framework to regenerate reference data
files. Different kinds of reference results can be regenerated by
passing in a comma-separated list of kind names immediately after
the --write option. If no list of kind names is provided, then all
test results will be regenerated.
To regenerate all reference results (or generate them for the first time)
python my_tests.py --write-all
To regenerate just a particular kind of reference (e.g. table results)
python my_tests.py --write table
To regenerate a number of different kinds of reference (e.g. both table and graph results)
python my_tests.py --write table graph
- class tdda.referencetest.referencetestcase.ReferenceTestCase(*args, **kwargs)
Wrapper around the ReferenceTest class to allow it to operate as a test-case class using the unittest testing framework.
The ReferenceTestCase class is a mix-in of unittest.TestCase and ReferenceTest, so it can be used as the base class for unit tests, allowing the tests to use any of the standard unittest assert methods, and also use any of the referencetest assert extensions.
- static main(module=None, argv=None, testtdda=False, report=None, **kw)
Wrapper around the unittest.main() entry point.
This is the same as the main() function, and is provided just as a convenience, as it means that tests using the ReferenceTestCase class only need to import that single class on its own.
- tag()
Decorator for tests, so that you can specify you only want to run a tagged subset of tests, with the -1 or --tagged option.
- tdda.referencetest.referencetestcase.main()
Wrapper around the unittest.main() entry point.
pytest Framework Support
This provides all of the methods in the
ReferenceTest class,
in a way that allows them to be used as pytest fixtures.
This allows these functions to be called from tests running from the
pytest framework.
For example:
    import my_module

    def test_my_csv_function(ref):
        resultfile = my_module.my_csv_function(ref.tmp_dir)
        ref.assertCSVFileCorrect(resultfile, 'result.csv')

    def test_my_pandas_dataframe_function(ref):
        resultframe = my_module.my_dataframe_function()
        ref.assertDataFrameCorrect(resultframe, 'result.csv')

    def test_my_table_function(ref):
        result = my_module.my_table_function()
        ref.assertStringCorrect(result, 'table.txt', kind='table')

    def test_my_graph_function(ref):
        result = my_module.my_graph_function()
        ref.assertStringCorrect(result, 'graph.txt', kind='graph')

    class TestMyClass:
        # Methods take self first; pytest injects the ref fixture after it.
        def test_my_other_table_function(self, ref):
            result = my_module.my_other_table_function()
            ref.assertStringCorrect(result, 'table.txt', kind='table')
with a conftest.py containing:
    from tdda.referencetest.pytestconfig import (pytest_addoption,
                                                 pytest_collection_modifyitems,
                                                 set_default_data_location,
                                                 ref)

    set_default_data_location('testdata')
This configuration enables the additional command-line options,
and also provides a ref fixture, as an instance of the
ReferenceTest class.
Of course, for brevity, if you prefer, you can use:
from tdda.referencetest.pytestconfig import *
rather than importing the four individual items if you are not customising anything yourself, but that is less flexible.
This example also sets a default data location which will apply to
all reference fixtures. This means that any tests that use ref will
automatically be able to locate their "expected results" reference data
files.
Reference Fixtures
The default configuration provides a single fixture, ref.
To configure a large suite of tests so that tests do not all have to
share a single common reference-data location, you can set up additional
reference fixtures, configured differently. For example, to set up a fixture
ref_special, whose reference data is stored in ../specialdata, you
could include:
    import pytest
    from tdda.referencetest import referencepytest

    @pytest.fixture(scope='module')
    def ref_special(request):
        r = referencepytest.ref(request)
        r.set_data_location('../specialdata')
        return r
Tests can use this additional fixture:
    import my_special_module

    def test_something(ref_special):
        result = my_special_module.something()
        ref_special.assertStringCorrect(result, 'something.csv')
Tagged Tests (pytest)
If the tests are run with the --tagged
command-line option, then only tests that have been decorated with
referencetest.tag are run. This is a mechanism for allowing
only a chosen subset of tests to be run, which is useful during
development. The @tag decorator can be applied to test functions,
test classes and test methods.
If the tests are run with the --istagged command-line option,
then no tests are run; instead, the
framework reports the full module names of any test classes or functions
that have been decorated with @tag, or classes which contain any
test methods that have been decorated with @tag.
For example:
    from tdda.referencetest import tag

    @tag
    def test_a(ref):
        assert 'a' + 'a' == 'aa'

    def test_b(ref):
        assert 'b' * 2 == 'bb'

    @tag
    class TestMyClass:
        def test_x(self):
            assert list('xxx') == ['x', 'x', 'x']

        def test_y(self):
            assert 'y'.upper() == 'Y'
If run with pytest --tagged, only the tagged tests are
run (test_a, TestMyClass.test_x and TestMyClass.test_y).
Regeneration of Results (pytest)
When pytest is run with --write-all or --write, it causes
the framework to regenerate reference data files. Different kinds of
reference results can be regenerated by passing in a comma-separated list
of kind names immediately after the --write option. If no list
of kind names is provided, then all test results will be regenerated.
If the -s option is also provided (to disable pytest
output capturing), it will report the names of all the files it has
regenerated.
To regenerate all reference results (or generate them for the first time)
pytest -s --write-all
To regenerate just a particular kind of reference (e.g. table results)
pytest -s --write table
To regenerate a number of different kinds of reference (e.g. both table and graph results)
pytest -s --write table graph
pytest Integration Details
In addition to all of the methods from
ReferenceTest,
the following functions are provided, to allow easier integration
with the pytest framework.
Typically your test code would not need to call any of these methods
directly (apart from set_default_data_location()), as they are
all enabled automatically if you import the default ReferenceTest
configuration into your conftest.py file:
from tdda.referencetest.pytestconfig import *
- tdda.referencetest.referencepytest.ref(request)
Support for dependency injection via a pytest fixture.
A test's conftest.py should define a fixture function for injecting a ReferenceTest instance, which should just call this function.
This allows tests to get access to a private instance of that class.
- tdda.referencetest.referencepytest.set_default_data_location(location, kind=None)
This provides a mechanism for setting the default reference data location in the ReferenceTest class.
It takes the same parameters as tdda.referencetest.referencetest.ReferenceTest.set_default_data_location().
If you want the same data locations for all your tests, it can be easier to set them with calls to this function, rather than having to set them explicitly in each test (or using set_data_location() in your @pytest.fixture ref definition in your conftest.py file).
- tdda.referencetest.referencepytest.set_defaults(**kwargs)
This provides a mechanism for setting default attributes in the ReferenceTest class.
It takes the same parameters as tdda.referencetest.referencetest.ReferenceTest.set_defaults(), and can be used for setting parameters such as the tmp_dir property.
If you want the same defaults for all your tests, it can be easier to set them with a call to this function, rather than having to set them explicitly in each test (or in your @pytest.fixture ref definition in your conftest.py file).
Reference Test Examples
The tdda.referencetest module includes a set of examples,
for both unittest and pytest.
To copy these examples, run the command:
tdda examples referencetest [directory]
If directory is not supplied, referencetest-examples will be used.
Alternatively, you can copy all examples using the following command:
tdda examples
which will create a number of separate subdirectories.