Test-Driven Data Analysis (Python TDDA library)
Version 2.0.00. (Installation)
The TDDA module helps with the testing of data and of code that manipulates data. It serves as a concrete implementation of the ideas discussed on the test-driven data analysis blog. When installed, the module offers a suite of command-line tools that can be used with data from any source, not just Python. It also provides enhanced test methods for Python code, and the new Gentest functionality enables automatic generation of test programs for arbitrary code (not just Python code). There is also a full Python API for all functionality.
Test-driven data analysis is closely related to reproducible research, but with more of a focus on automated testing. It is best seen as overlapping with, and partly complementary to, reproducible research.
The major components of the TDDA module are:
- Automatic Constraint Generation and Verification: The package includes command-line tools and API calls for
  - discovery of constraints that are satisfied by (example) data;
  - verification that a dataset satisfies a set of constraints. The constraints can have been generated automatically, constructed manually, or (most commonly) consist of generated constraints that have been subsequently refined by hand;
  - detection of records, fields and values that fail to satisfy constraints (anomaly detection).
- Reference Testing: The TDDA library offers extensions to pytest for managing the testing of data analysis pipelines, where the results are typically much larger, more complex, and more variable than for many other sorts of programs.
- Automatic Generation of Regular Expressions from Examples: There is a command-line tool (and API), rexpy, for automatically inferring regular expressions from (structured) textual data. This was developed as part of constraint generation, but has broader utility.
- Automatic Test Generation (Experimental): From version 2.0 on, the TDDA library also includes experimental features for automatically generating tests for almost any command-line based program or script. The code to be tested can take the form of a shell script or any other command-line code, and can be written in any language or mix of languages.
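To make the discover / verify / detect cycle described under Automatic Constraint Generation and Verification more concrete, here is a minimal standard-library sketch. It does not use the TDDA API and is not how the library is implemented: the function names (`discover`, `detect`, `verify`), the constraint names and the sample records are all hypothetical, chosen only to illustrate the idea of inferring constraints from example data and checking other data against them.

```python
# Toy illustration only: NOT the TDDA implementation or its API.
# All names here (discover, detect, verify, constraint keys) are hypothetical.

def discover(records):
    """Infer simple per-field constraints from a list of example records (dicts)."""
    constraints = {}
    fields = sorted({key for record in records for key in record})
    for field in fields:
        values = [record.get(field) for record in records]
        present = [v for v in values if v is not None]
        c = {"allow_null": len(present) < len(values)}
        types = {type(v).__name__ for v in present}
        if len(types) == 1:
            c["type"] = types.pop()
            if c["type"] in ("int", "float"):
                c["min"], c["max"] = min(present), max(present)
            elif c["type"] == "str":
                c["max_length"] = max(len(v) for v in present)
        constraints[field] = c
    return constraints


def detect(records, constraints):
    """Return (record index, field, value, failed constraint) for every violation."""
    failures = []
    for i, record in enumerate(records):
        for field, c in constraints.items():
            v = record.get(field)
            if v is None:
                if not c["allow_null"]:
                    failures.append((i, field, v, "allow_null"))
                continue
            if "type" in c and type(v).__name__ != c["type"]:
                failures.append((i, field, v, "type"))
                continue
            if "min" in c and v < c["min"]:
                failures.append((i, field, v, "min"))
            if "max" in c and v > c["max"]:
                failures.append((i, field, v, "max"))
            if "max_length" in c and len(v) > c["max_length"]:
                failures.append((i, field, v, "max_length"))
    return failures


def verify(records, constraints):
    """A dataset passes verification when no constraint is violated."""
    return not detect(records, constraints)


example = [{"age": 31, "name": "ann"}, {"age": 45, "name": "bob"}]
constraints = discover(example)

new_data = [{"age": 38, "name": "al"}, {"age": 97, "name": None}]
print(verify(example, constraints))   # the example data satisfies its own constraints
print(detect(new_data, constraints))  # out-of-range age and unexpected null in record 1
```

In practice, automatically discovered constraints like these tend to be too tight (for instance, a maximum age of 45), which is why refining generated constraints by hand is the most common workflow.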
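The idea behind reference testing, comparing pipeline output with a stored known-good result while tolerating parts that legitimately vary, can be sketched with the standard library alone. This is an illustrative assumption about the general technique, not the TDDA library's API: the helper `matches_reference`, its `ignore_patterns` parameter and the sample report text are hypothetical.

```python
# Toy illustration only: NOT the TDDA reference-testing API.
# matches_reference and its parameters are hypothetical names.
import re
import tempfile
from pathlib import Path


def matches_reference(actual, reference_path, ignore_patterns=()):
    """Compare text with a stored reference file, skipping lines that match
    any of the given regular expressions (e.g. timestamps, version banners)."""
    def normalise(text):
        return [line for line in text.splitlines()
                if not any(re.search(p, line) for p in ignore_patterns)]

    expected = Path(reference_path).read_text()
    return normalise(actual) == normalise(expected)


with tempfile.TemporaryDirectory() as tmp:
    # A reference result saved from an earlier, trusted run.
    ref = Path(tmp) / "report.txt"
    ref.write_text("Generated: 2020-01-01\nrows: 3\nmean: 2.0\n")

    # Output from a later run: only the date line differs.
    actual = "Generated: 2024-06-30\nrows: 3\nmean: 2.0\n"

    same_ignoring_date = matches_reference(actual, ref, [r"^Generated:"])
    same_strict = matches_reference(actual, ref)

print(same_ignoring_date, same_strict)
```

Excluding known-variable lines keeps the comparison strict everywhere else, so a genuine change in the results still fails the test.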
- Automatic Constraint Generation, Data Verification & Anomaly Detection
- Gentest: Automatic Test Generation for Unix & Linux Commands/Scripts
- Reference Tests
- TDDA’s Constraints API
- TDDA’s API for Rexpy
- Microsoft Windows Configuration
- Recent Changes
- Talks & Filmed Tutorials about TDDA etc (Nick Radcliffe)
- TDDA Library (PyCon DE, Eberhard Hansis, 2019)
- Tutorial Video Screencasts on Exercises
- Tutorials YouTube Channel
- Paper: Automatic Constraint Generation and Verification
- 1-page summary of ideas
- Quick-reference Guide / Cheat Sheet
- TDDA Blog
- Twitter tdda0
- Slack (mail/DM on twitter for invitation)
- Source Repository (GitHub)