The tdda package provides Python support for test-driven data analysis (see the 1-page summary with references, or the blog). It comprises three libraries:

  • The tdda.referencetest library is used to support the creation of reference tests, based on either unittest or pytest.

  • The tdda.constraints library is used to discover constraints from a (Pandas) DataFrame, write them out as JSON, and verify that datasets meet the constraints in the constraints file. It also supports tables in a variety of relational databases. A command-line utility is available for discovering and verifying constraints and for detecting failing records.

  • The tdda.rexpy library is a tool for automatically inferring regular expressions from a column in a Pandas DataFrame or from a (Python) list of examples. There is also a command-line utility for Rexpy.
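
To make the discover/verify cycle described for tdda.constraints more concrete, here is a minimal standard-library sketch of the idea. It is not tdda's API or its constraint file format: the functions `discover` and `verify`, the constraint keys (`type`, `min`, `max`, `max_nulls`), and the sample records are all hypothetical and chosen only for illustration.

```python
import json


def discover(rows):
    """Infer simple per-field constraints from a list of dict records.

    Hypothetical helper for illustration; not part of tdda.
    """
    constraints = {}
    fields = {name for row in rows for name in row}
    for name in sorted(fields):
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        # Only forbid nulls if none were seen in the example data.
        field = {"max_nulls": 0} if len(present) == len(values) else {}
        types = {type(v).__name__ for v in present}
        if len(types) == 1:
            field["type"] = types.pop()
            field["min"] = min(present)
            field["max"] = max(present)
        constraints[name] = field
    return constraints


def verify(rows, constraints):
    """Return (row_index, field, constraint) for every failing value."""
    failures = []
    for i, row in enumerate(rows):
        for name, field in constraints.items():
            value = row.get(name)
            if value is None:
                if field.get("max_nulls") == 0:
                    failures.append((i, name, "max_nulls"))
                continue
            if "type" in field and type(value).__name__ != field["type"]:
                failures.append((i, name, "type"))
                continue
            if "min" in field and value < field["min"]:
                failures.append((i, name, "min"))
            if "max" in field and value > field["max"]:
                failures.append((i, name, "max"))
    return failures


# Discover constraints from known-good data and write them out as JSON.
training = [{"quantity": 1, "price": 9.5}, {"quantity": 3, "price": 12.0}]
constraints = discover(training)
serialized = json.dumps(constraints, indent=2)

# Later, verify new data against the stored constraints.
incoming = [
    {"quantity": 2, "price": 11.0},
    {"quantity": 1, "price": None},   # null where none were seen before
    {"quantity": 3, "price": 40.0},   # above the discovered maximum
]
failures = verify(incoming, json.loads(serialized))
print(failures)
```

The sketch reports failures per record, which loosely mirrors the distinction drawn above between verifying a dataset and detecting failing records. The real library works on DataFrames and database tables and discovers a richer set of constraints.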

Although tdda is provided as a Python package and can be called through its Python API, it also provides command-line tools.