.. tdda documentation master file, created by
   sphinx-quickstart on Tue Feb 14 15:12:30 2017.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Test-Driven Data Analysis (Python TDDA library)
===============================================

Version |version|. (:doc:`Installation <installation>`)

The TDDA module helps with the testing of data and of code that manipulates
data.

The major components of the TDDA module are:

.. image:: image/tdda-six-features.png
   :alt: Top Line: Machines illustrating the constraint-discovery
         functionality, which takes data in and produces constraints as
         output; rexpy, which takes strings in and produces regular
         expressions as output; and gentest, which takes code in and
         produces tests as output. Bottom Line: tdda diff (data-frame
         comparison), tdda.serial (metadata for CSV and other flat files),
         and tdda.ReferenceTest, for semantic testing of complex results.

* **Data Validation and Automatic Constraint Generation:**
  The package includes command-line tools and API calls for

  - *discovery* of constraints that are satisfied by (example) data
    --- ``tdda discover``;
  - *verification* that a dataset satisfies a set of constraints.
    The constraints can have been generated automatically,
    constructed manually, or (most commonly) consist of generated
    constraints that have been subsequently refined by hand
    --- ``tdda verify``;
  - *detection* of records, fields and values that fail to satisfy
    constraints (anomaly detection) --- ``tdda detect``.

  Supported data sources include parquet files, database tables, and
  flat (CSV) files.

* **Reference Testing:**
  The TDDA library offers extensions to ``unittest`` and ``pytest``
  for managing the testing of data analysis pipelines, where the results
  are typically much larger, more complex, and more variable than for
  many other sorts of programs.

* **Inference of Regular Expressions from Examples:**
  There is a command-line tool (and API) for automatically inferring
  regular expressions from (structured) textual data --- ``rexpy``.
  This was developed as part of constraint generation, but has broader
  utility.

* **Automatic Test Generation:**
  The TDDA library includes the ability to generate tests for almost any
  command-line based program or script. The code to be tested can take
  the form of a shell script or any other command-line code, and can be
  written in any language or mix of languages.

* **Metadata tools for Flat Files:**
  The ``tdda.serial`` modules and ``tdda serial`` command assist
  with more reliable reading and writing of flat files (CSV etc.)
  using metadata, both in ``tdda``'s own format, and with the
  CSVW (``.csvw``) and Frictionless formats.

* **DataFrame Diff utilities:**
  A new ``tdda diff`` utility allows parameterized difference detection
  between data frames stored in parquet files and flat files.

The ``tdda`` library serves as a concrete implementation of the ideas
discussed in:

* *Test-Driven Data Analysis*, by Nicholas J. Radcliffe, CRC Press
  (book, available from all good booksellers and all sellers of good books).
* the Test-Driven Data Analysis blog.

When installed, the module offers a suite of command-line tools that
can be used with data from any source, not just Python.
It also provides enhanced test methods for Python code, and
the *Gentest* functionality enables automatic generation of
test programs for arbitrary code (not just Python code).
There is also a full Python API for all functionality.

*Test-driven data analysis* is closely related to
reproducible research, but with more of a focus on automated testing.
It is best seen as overlapping with, and partly complementary to,
reproducible research.

Contents
========

.. toctree::
   :maxdepth: 2

   overview.md
   installation.md
   constraints.md
   gentest.md
   referencetest.md
   rexpy-regex.md
   tddadiff.md
   serialformat.md
   constraints-api.md
   rexpy-api.md
   serial-api.md
   utils-api.md
   cli.md
   configuration.md
   windows.md
   tests.md
   examples.md
   changes.md

Resources
=========

* Talks & Filmed Tutorials about TDDA etc (Nick Radcliffe)
* TDDA Library (PyCon DE, Eberhard Hansis, 2019)
* Tutorial Video Screencasts on Exercises
* Tutorials YouTube Channel
* Paper: Automatic Constraint Generation and Verification
* 1-page summary of ideas
* Quick-reference Guide / Cheat Sheet
* TDDA Blog
* Mastodon @tdda@mathstodon.xyz
* Source Repository (GitHub)

Indexes and Search
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`