# Command Line Reference

## `tdda`

### NAME

`tdda` — test-driven data analysis

### SYNOPSIS

```
tdda discover       Generate constraints for data validation
tdda verify         Verify (validate) data against constraints
tdda detect         Detect data that fails constraints
tdda examples       Copy the tdda example data and code
tdda gentest        Auto-generate Python tests for code in any language
tdda diff           Find difference in datasets in parquet or CSV files
tdda ls             List fields in a dataset
tdda cat            Display rows from a dataset as a rich table
tdda head           Display the first N rows of a dataset
tdda tail           Display the last N rows of a dataset
tdda sample         Display N random rows from a dataset
tdda serial         Convert or infer flat-file metadata in
                    tdda.serial, CSVW, or Frictionless formats
tdda tag            Tag tests that failed in the last reference test run
tdda config         Show TDDA configuration
tdda version        Print the TDDA version number
tdda help           Print this help
tdda help COMMAND   Print help on COMMAND (e.g. discover, verify)
tdda installman     Install tdda man pages
tdda test           Run the tdda library's self-tests.
```

### OPTIONS

`-v`, `--version`
Print version number (same as tdda version)

`-h`, `-?`, `--help`
Print this help

### SEE ALSO

`rexpy(1)`, `tdda-installman(1)`

[TDDA Book](https://book.tdda.info)

---

## `tdda discover`

### NAME

`tdda discover` — automatically generate constraints for data

### SYNOPSIS

```
tdda discover [-h] [-?] [-7] [--no-config] [--colour] [--no-colour]
              [-x] [-X] [-g] [-G] [-r REPORT ...] [-o REPORT_PATH]
              [--no-md] [--allowed] [--no-allowed] [--required]
              [--no-required] [--no-ar] [--pandas] [--polars]
              [--backend BACKEND]
              INPUT [CONSTRAINTS]
```

### POSITIONAL ARGUMENTS

*INPUT* is one of:

- a CSV file or other flat file (e.g. `.csv`, `.txt`, `.psv`), optionally using `:` format to specify flat-file metadata (see the help for `tdda serial`)
- a data frame in a Parquet file (`.parquet`) e.g. from pandas, polars, R
- a table from PostgreSQL databases (e.g.
`postgres:tablename`)
- a table from MySQL databases (e.g. `mysql:tablename`)
- a table from SQLite databases (e.g. `sqlite:tablename`)
- Standard input (stdin): Use `-` to read from stdin

(Use `tdda help serial`, `tdda serial --help`, or `man tdda-serial` for more information.)

*CONSTRAINTS*
Name of the (JSON) constraints file to create.

- Will use `.tdda` extension if no extension is specified.
- Can be missing or `-` to write to standard output.

### DESCRIPTION

The `tdda discover` command is used to find constraints that are satisfied (in most cases) by the input ("training") data provided.

### OPTIONS

The following options are available. `*` indicates options that are the default behaviours

`-h`, `--help`
Show this help message and exit

`-?`, `--?`
Same as `-h` or `--help`

`-7`, `--ascii`
Report without using special characters

`-N`, `--no-config`
Skip loading `~/.tdda.toml`

`--colour`
Use colour in terminal output *

`--no-colour`
Do not use colour in terminal output

`-x`, `--rex`
Include regular expression generation

`-X`, `--no-rex`
Exclude regular expression generation *

`-g`, `--group-rex`
Group regular expression generation

`-G`, `--no-group-rex`
Do not group regular expression generation *

`-r`, `--report` [*REPORT* ...]
Report formats to write, space-separated. Formats: `html`, `md` (`markdown`), `txt` (`text`), `json`, `yaml`, `toml`. The stem of the output file is taken from *REPORT_PATH* if `-o` is given, otherwise from *CONSTRAINTS*.

`-o`, `--report-path` *REPORT_PATH*
Stem path for report files (extension is replaced by the format).


`--no-md`
Do not create metadata in constraints file

`--allowed`
Create allowed-fields constraint (default)

`--no-allowed`
Do not create allowed-fields constraint

`--required`
Create required-fields constraint (default)

`--no-required`
Do not create required-fields constraint

`--no-allowed-required`
Same as `--no-allowed --no-required`

`--no-ar`
Same as `--no-allowed --no-required`

`--pandas`, `--pd`
Use Pandas as DataFrame engine. *

`--polars`, `--pl`
Use Polars as DataFrame engine.

`--backend`, `-B` *BACKEND*
Backend choice for Pandas (when dataframe engine is Pandas)

- `n` for numpy_nullable *
- `a` for pyarrow
- `o` for original.

### EXAMPLES

The example data can be obtained by running `tdda examples`, which will create various directories, including `constraints_examples`, containing the source data for these examples.

1) `tdda discover elements.parquet elements.tdda`

   This command will read data from `elements.parquet` and (attempt to) find constraints satisfied by every record, and by the data collectively. By default this can include minimum and maximum constraints on field values or lengths, nullability constraints, uniqueness constraints, sign constraints, and allowed-values constraints.

   The results will be written to `elements.tdda` in a JSON format, including metadata.

   The output constraints file, `elements.tdda`, can be used with `tdda verify` to verify that another dataset with the same structure satisfies the constraints, or with `tdda detect` to find which records and/or values fail to satisfy the constraints.

   The `.tdda` file can be edited (carefully) by hand, or programmatically, to add, remove, tighten, or loosen constraints.

2) `tdda discover elements.csv`

   This command is almost the same as the first except that it reads data from the CSV file specified, and writes the constraints to the screen (standard output). The CSV structure and field types will normally be inferred (possibly incorrectly) by TDDA, and if the inference is bad, the command may fail.


   If you use `tdda discover elements.csv:format.serial`, metadata in `format.serial` will be used to guide the DataFrame creation.

   If you use `tdda discover elements.csv:`, it will look for any associated metadata for `elements.csv` using naming conventions described in the help for `tdda serial`.

3) `tdda discover --rex elements.csv:md.serial`

   This is similar to the last two except that:

   - regular expression inference is requested (`--rex`) for text fields. Rexpy will be used to attempt to infer one or a few regular expressions that characterize each field in the input data.
   - a metadata file to be used to interpret the `.csv` file is provided explicitly.

4) `tdda discover elements.parquet elements.tdda -r html -o elements`

   This discovers constraints as in example 1, and also writes an HTML report to `elements.html`.

5) `tdda discover elements.parquet elements.tdda -r md json txt -o elements`

   This discovers constraints as in example 1, and also writes reports to `elements.md`, `elements.json`, and `elements.txt`.

6) `tdda discover --rex postgres:elements`

   This is similar again, except that now the `postgres:` specifier will be interpreted as a database connection file in the user's home directory, with the name `~/.dbCredential.postgres`. This file should contain connection information for a supported database. The extension `.postgres` does not itself mean that this is a PostgreSQL database, though that is a common convention.

   Use `tdda help db` or `tdda help database` to get help with the database connection file format.

### SEE ALSO

`tdda-verify(1)`, `tdda-detect(1)`, `tdda-serial(1)`

[Test Driven Data Analysis](https://book.tdda.info), book by Nicholas J. Radcliffe, chapters 2-7.

---

## `tdda verify`

### NAME

`tdda verify` — Verify that constraints are satisfied by data

### SYNOPSIS

```
tdda verify [-h] [-?] [-7] [--no-config] [--colour] [--no-colour]
            [--epsilon EPSILON] [-a] [-f] [--dense] [-t {strict,loose}]
            [--verify-required-fields] [--verify-allowed-fields]
            [--no-verify-required-fields] [--no-verify-allowed-fields]
            [--varf] [--no-varf] [--pandas] [--polars]
            [--backend BACKEND]
            INPUT [CONSTRAINTS]
```

### POSITIONAL ARGUMENTS

*INPUT* is one of:

- a CSV file or other flat file (e.g. `.csv`, `.txt`, `.psv`), optionally using `:` format to specify flat-file metadata (see the help for `tdda serial`)
- a data frame in a Parquet file (`.parquet`) e.g. from pandas, polars, R
- a table from PostgreSQL databases (e.g. `postgres:tablename`)
- a table from MySQL databases (e.g. `mysql:tablename`)
- a table from SQLite databases (e.g. `sqlite:tablename`)
- Standard input (stdin): Use `-` to read from stdin

*CONSTRAINTS*, if provided, is a JSON `.tdda` file containing constraints. If no constraints file is provided, a file with the same path as the input file, with a `.tdda` extension, will be tried.

### DESCRIPTION

The `tdda verify` command is used to check that data conforms to the constraints specified. Any constraints not satisfied by the data are reported, together with summary statistics.

The `tdda verify` command does *not* report which records and values cause constraints to be violated: the companion command `tdda detect` performs this function.
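
The fallback described for *CONSTRAINTS* above (try a file with the same path as the input, but with a `.tdda` extension) can be sketched in a few lines of Python. This only illustrates the naming rule and is not tdda's own code: the function name `default_constraints_path` is hypothetical, and the sketch does not attempt to cover inputs that use the `:` metadata syntax, database tables, or standard input.

```python
from pathlib import Path


def default_constraints_path(input_path: str) -> Path:
    """Return the input path with its extension replaced by .tdda.

    Hypothetical helper: it illustrates the documented fallback only
    and makes no claim about how tdda resolves the file internally.
    """
    return Path(input_path).with_suffix(".tdda")


# Example: the constraints file that would be tried for a Parquet input
print(default_constraints_path("data/elements.parquet"))
```

Under this rule, running `tdda verify elements.parquet` with no second argument would look for `elements.tdda` next to the data file.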
### OPTIONS

`-h`, `--help`
Show this help message and exit

`-?`, `--?`
Same as `-h` or `--help`

`-7`, `--ascii`
Report without using special characters

`-N`, `--no-config`
Skip loading `~/.tdda.toml`

`--colour`
Use colour in terminal output

`--no-colour`
Do not use colour in terminal output

`--epsilon` *EPSILON*
Epsilon fuzziness (tolerance for comparisons)

`-a`, `--all`
Report all fields, even if there are no failures

`-f`, `--fields`
Report only fields with failures

`--dense`
Compact output: less vertical space used

`-t`, `--type_checking` {*strict*,*loose*}
"loose" means consider all numeric types equivalent

`--verify-required-fields`, `--vrf`
Force verification of required fields

`--verify-allowed-fields`, `--vaf`
Force verification of allowed fields

`--no-verify-required-fields`, `--no-vrf`
Force no verification of required fields

`--no-verify-allowed-fields`, `--no-vaf`
Force no verification of allowed fields

`--varf`, `--vraf`
Force verification of allowed and required fields

`--no-varf`, `--no-vraf`
Force no verification of allowed and required fields

`--pandas`, `--pd`
Use Pandas as DataFrame engine.

`--polars`, `--pl`
Use Polars as DataFrame engine.

`--backend`, `-B` *BACKEND*
Backend choice for Pandas (when dataframe engine is Pandas)

- `n` for numpy_nullable *
- `a` for pyarrow
- `o` for original.

### EXAMPLES

The example data can be obtained by running `tdda examples`, which will create various directories, including `constraints_examples`, containing source data for these examples.

1) `tdda verify elements.parquet elements.tdda`

   This command reads data from `elements.parquet` and checks it against the constraints in `elements.tdda`, reporting any constraints that are not satisfied.

### SEE ALSO

`tdda-detect(1)`, `tdda-discover(1)`, `tdda-serial(1)`

[Test Driven Data Analysis](https://book.tdda.info), book by Nicholas J. Radcliffe, chapters 2-7.


---

## `tdda detect`

### NAME

`tdda detect` — Detect data that does not obey supplied constraints

### SYNOPSIS

```
tdda detect [-h] [-?] [-7] [--no-config] [--colour] [--no-colour]
            [--epsilon EPSILON] [-o REPORT_PATH] [-a] [-f]
            [-t {strict,loose}] [--write-all-records]
            [--per-constraint] [--no-per-constraint]
            [--no-original-fields] [--original-fields]
            [--no-output-fields] [--output-fields [OUTPUT_FIELDS ...]]
            [-r [REPORT ...]] [--interleave] [--no-interleave]
            [--index] [--int] [--key [KEY ...]] [--dense]
            [--verify-required-fields] [--verify-allowed-fields]
            [--no-verify-required-fields] [--no-verify-allowed-fields]
            [--varf] [--no-varf] [--pandas] [--polars]
            [--backend BACKEND]
            INPUT [CONSTRAINTS [OUTPUT]]
```

### POSITIONAL ARGUMENTS

*INPUT* is one of:

- a CSV file or other flat file (e.g. `.csv`, `.txt`, `.psv`), optionally using `:` format to specify flat-file metadata (see the help for `tdda serial`)
- a data frame in a Parquet file (`.parquet`) e.g. from pandas, polars, R
- a table from PostgreSQL databases (e.g. `postgres:tablename`)
- a table from MySQL databases (e.g. `mysql:tablename`)
- a table from SQLite databases (e.g. `sqlite:tablename`)
- Standard input (stdin): Use `-` to read from stdin

*CONSTRAINTS*, if provided, is a JSON `.tdda` file containing constraints. If no constraints file is provided, a file with the same path as the input file, with a `.tdda` extension, will be tried.

*OUTPUT* specifies the destination for detected records. This is usually a file if the input was a file (e.g. a `.csv` file or a `.parquet` file), but does not have to be the same type. If the input is a database table, the output is always a database table in the same database.

### DESCRIPTION

The `tdda detect` command finds and reports data that fails to satisfy the constraints in the *CONSTRAINTS* file specified. It also performs all the same functions as `tdda verify`.
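
To make the distinction concrete (`tdda verify` reports which constraints fail, while `tdda detect` reports which records fail), here is a minimal, self-contained Python sketch of record-level detection. It does not use the tdda library and is not how tdda implements detection. The constraint dictionary, the field names, the `detect_failures` function, and the `_ok` flag-column names are all hypothetical simplifications; only the `n_failures` column name is taken from the description of the `--per-constraint` option.

```python
# Hypothetical, simplified constraints: field -> {"min": ..., "max": ...}
CONSTRAINTS = {
    "Z": {"min": 1, "max": 92},
    "MeltingPoint": {"max": 4000.0},
}


def detect_failures(records, constraints):
    """Return the records that fail at least one constraint.

    Each returned record carries one boolean flag per checked
    constraint plus an n_failures count.
    """
    failing = []
    for record in records:
        flags = {}
        for field, rules in constraints.items():
            value = record.get(field)
            if value is None:
                continue  # nulls are simply skipped in this sketch
            if "min" in rules:
                flags[f"{field}_min_ok"] = value >= rules["min"]
            if "max" in rules:
                flags[f"{field}_max_ok"] = value <= rules["max"]
        n_failures = sum(1 for ok in flags.values() if not ok)
        if n_failures:
            failing.append({**record, **flags, "n_failures": n_failures})
    return failing


records = [
    {"Z": 1, "MeltingPoint": 14.0},      # satisfies every constraint
    {"Z": 0, "MeltingPoint": 5000.0},    # fails Z min and MeltingPoint max
    {"Z": 93, "MeltingPoint": None},     # fails Z max
]
for row in detect_failures(records, CONSTRAINTS):
    print(row)
```

Only the second and third records are returned, which mirrors the default behaviour of writing just the failing records; the `--write-all-records` option corresponds to keeping the passing ones as well.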
### OPTIONS

`-h`, `--help`
Show this help message and exit

`-?`, `--?`
Same as `-h` or `--help`

`-7`, `--ascii`
Report without using special characters

`-N`, `--no-config`
Skip loading `~/.tdda.toml`

`--colour`
Use colour in terminal output

`--no-colour`
Do not use colour in terminal output

`--epsilon` *EPSILON*
Epsilon fuzziness (tolerance for comparisons)

`-a`, `--all`
Report all fields, even if there are no failures

`-f`, `--fields`
Report only fields with failures

`-r`, `--report` [*REPORT* ...]
Report formats to write, space-separated. Formats: `html`, `md` (`markdown`), `txt` (`text`), `json`, `yaml`, `toml`. The stem of the output file is taken from *REPORT_PATH* if `-o` is given, otherwise from *OUTPUT*.

`-t`, `--type_checking` {*strict*,*loose*}
"loose" means consider all numeric types equivalent

`-o`, `--report-path` *REPORT_PATH*
Stem path for report files (extension is replaced by the format).

`--write-all-records`
Include passing records

`--per-constraint`
Write one flag column per failing constraint in addition to n_failures. Set by default.

`--no-per-constraint`
Do not write out any per-constraint flag columns

`--no-original-fields`
Do not write out original fields columns

`--original-fields`
Write out original fields columns (default)

`--no-output-fields`
Do not write out any original fields in the output. By default, all original columns will be included.

`--output-fields` [*OUTPUT_FIELDS* ...]
Specify original columns to write out.

`--interleave`
Interleave ok columns with original fields.

`--no-interleave`
Do not interleave ok columns with original fields.

`--index`
Include a row-number index in the output file when detecting. Rows are usually numbered from 1, unless the input file already has an index.

`--int`
Write out boolean fields as integers, with 1 for true and 0 for false.


`--key [KEY ...]`
Key or key fields to use when reporting failures

`--dense`
Compact output: less vertical space used

`--verify-required-fields`, `--vrf`
Force verification of required fields

`--verify-allowed-fields`, `--vaf`
Force verification of allowed fields

`--no-verify-required-fields`, `--no-vrf`
Force no verification of required fields

`--no-verify-allowed-fields`, `--no-vaf`
Force no verification of allowed fields

`--varf`, `--vraf`
Force verification of allowed and required fields

`--no-varf`, `--no-vraf`
Force no verification of allowed and required fields

`--pandas`, `--pd`
Use Pandas as DataFrame engine.

`--polars`, `--pl`
Use Polars as DataFrame engine.

`--backend`, `-B` *BACKEND*
Backend choice for Pandas (when dataframe engine is Pandas)

- `n` for numpy_nullable *
- `a` for pyarrow
- `o` for original.

### EXAMPLES

The example data can be obtained by running `tdda examples`, which will create various directories, including `constraints_examples`, containing source data for these examples.

1) `tdda detect elements.parquet elements.tdda elements-failures.parquet`

   This command reads data from `elements.parquet`, checks it against the constraints in `elements.tdda`, and writes records with one or more constraint failures to `elements-failures.parquet`.

2) `tdda detect elements.parquet elements.tdda elements-failures.parquet -r html -o elements`

   As above, and also writes an HTML report to `elements.html`.

3) `tdda detect elements.parquet elements.tdda elements-failures.parquet -r md json txt -o elements`

   As above, and also writes reports to `elements.md`, `elements.json`, and `elements.txt`.

### SEE ALSO

`tdda-verify(1)`, `tdda-discover(1)`, `tdda-serial(1)`

[Test Driven Data Analysis](https://book.tdda.info), book by Nicholas J. Radcliffe, chapters 2-7.

---

## `tdda diff`

### NAME

`tdda diff` — compare CSV or Parquet files

### SYNOPSIS

```
tdda diff [--fields FIELD1,FIELD2,...] [--xfields FIELD1,FIELD2,...]
          [--horizontal] [-H] [--vertical] [-V]
          [--find-md] [--no-md] [--maxdiffs N] [--key FIELD]
          [--mono] [--bw] [--colours COLOURS] [-c COLOURS]
          [--dps N] [--precision N]
          [--AE] [--LR] [--angles] [--pm] [--prefixes PREFIXES]
          [-N] [--no-config]
          [--strict] [--medium] [--loose] [--permissive]
          LEFT RIGHT
```

### POSITIONAL ARGUMENTS

*LEFT*
The first dataset to be compared, as a parquet or flat file (e.g. CSV), optionally using `:` format to specify flat-file metadata (see the help for `tdda serial`). (Normally thought of as left or actual)

*RIGHT*
The second dataset to be compared, as a parquet or flat file (e.g. CSV), optionally using `:` format to specify flat-file metadata (see the help for `tdda serial`). (Normally thought of as right, expected, reference, etc.)

### DESCRIPTION

The `tdda diff` command compares two tabular datasets in CSV or Parquet files and shows some or all differences. It uses the same underlying functionality as the `tdda.referencetest` assertions such as `assertDataFramesEqual`, and provides similar control over what differences to consider, e.g. which fields, and strictness of type and numeric comparisons. It also provides a number of options for controlling the display of differences.

By default, comparisons are row-based and consider all fields (columns), as typed values after reading.

### OPTIONS

`*` indicates options that are the default behaviours

`--fields` *FIELD1,FIELD2*,...
Check only these fields (comma-separated list)

`--xfields` *FIELD1,FIELD2*,...
Check all fields except these (comma-separated list)

`--horizontal`, `-H`
Horizontal display (left and right, side by side)

`--vertical`, `-V`
Vertical display (left above right)

`--find-md`
Attempt to find associated metadata for flat files automatically, without requiring `:` colon syntax in the path.

`--no-md`, `--no-find-md`
Do not attempt to find associated metadata for flat files (default).

`--key` *FIELD*
Use this field as a join key when reporting differences.


`--maxdiffs` *N*
Maximum number of differences to show.

`--mono`
Show monochrome output with different values in bold and shared values dimmed.

`--bw`
Show black and white output with different values in bold and shared values in the terminal's default style.

`--colours` *COLOURS*, `-c` *COLOURS*
Use colours specified e.g. `-c red-blue`

`--dps` *N*
Number of decimal places to show for floating-point values. Also sets precision if not specified separately.

`--precision` *N*
Precision for floating point comparisons. Two floats `a` and `b` will be considered equal if `abs(a - b) < 1e-`*N*.

`--AE`
Use `A:` and `E:` as labels for the two datasets (actual/expected)

`--LR`
Use `L:` and `R:` as labels for the two datasets (left/right)

`--angles`
Use `<` and `>` as labels for the two datasets

`--pm`
Use `+` and `-` as labels for the two datasets

`--prefixes` *PREFIXES*
Use prefixes specified as labels for the two datasets e.g. `--prefixes "actual:-ref:"` or `"actual: -ref: "` to include spaces

`-N`, `--no-config`
Use default configuration (ignore `~/.tdda.toml`)

`--strict`
Use strict type comparisons

`--medium`
Use medium-strictness type comparisons

`--loose`
Use loose (permissive) type comparisons

`--permissive`
Use loose (permissive) type comparisons

`--pandas`, `--pd`
Use Pandas as DataFrame engine. *

`--polars`, `--pl`
Use Polars as DataFrame engine.

`--backend`, `-B` *BACKEND*
Backend choice for Pandas (when dataframe engine is Pandas)

- `n` for numpy_nullable *
- `a` for pyarrow
- `o` for original.

`--help`, `-?`, `--?`
Show help on `tdda diff`.

### EXAMPLES

Data suitable for all examples can be obtained with `tdda examples diff`

1) `tdda diff a.csv a.csv`

   This is the simplest form of the command. It will read `a.csv` and convert it to a data frame, using the default back end (Pandas).

2) `tdda diff a.csv b.csv --vertical`

   Compare two CSV files, stacking left and right values vertically rather than side by side. Useful when there are many columns or long values.


3) `tdda diff before.parquet after.parquet --key Income,Expenditure`

   Compare two Parquet files using a composite join key. The fields `Income` and `Expenditure` must form a primary key in both datasets. Rows are matched by key rather than by position.

4) `tdda diff actual.csv expected.csv --AE --bw`

   Compare two CSV files using `A:` and `E:` as markers for actual and expected, with monochrome bold highlighting instead of colour.

5) `tdda diff foo.csv: bar.csv:`

   Compare two CSV files, asking TDDA to find associated metadata files for each using naming conventions (e.g. `@.serial` or `foo-metadata.json` in the same directory).

6) `tdda diff foo.csv bar.txt:money.serial`

   Compare `foo.csv` (loaded with default settings) against `bar.txt`, using `money.serial` as the metadata file describing its format.

7) `tdda diff a.parquet b.csv --loose --dps 3`

   Compare a Parquet file against a CSV file with loose type matching and floating-point values compared to 3 decimal places.

---

## `tdda ls`

### NAME

`tdda ls` — List fields in a dataset

### SYNOPSIS

```
tdda ls [-h] [-1|--one-line] [-l] [--pandas] [--polars]
        [--backend BACKEND]
        INPUT
```

### POSITIONAL ARGUMENTS

*INPUT* is one of:

- a CSV file (or `.tsv`, `.psv`, `.txt`)
- a Parquet file (`.parquet`)
- a flat file with colon syntax to trigger metadata lookup (e.g. `foo.csv:`)
- a flat file with an explicit metadata path (e.g. `foo.csv:foo.serial`)

### DESCRIPTION

The `tdda ls` command lists the fields in a dataset.

Without `--long`, it prints a one-line summary followed by the field names, right-aligned.

With `--long`, it prints a one-line summary followed by a table showing each field's dtype, minimum value, maximum value, and null count.

For flat files, a second line reports how the file was read and which metadata file was used, if any.
### OPTIONS

`-h`, `-?`, `--help`
Show this help message and exit

`-1`, `--one-line`
List all field names on one line, space-separated

`-l`, `--long`
Show dtype, min, max, and null count per field

`--pandas`, `--pd`
Use Pandas as DataFrame engine (default)

`--polars`, `--pl`
Use Polars as DataFrame engine

`--backend`, `-B` *BACKEND*
Backend choice for Pandas

- `n` for numpy_nullable *
- `a` for pyarrow
- `o` for original

### EXAMPLES

The example data can be obtained by running `tdda examples`, which will create various directories, including `serial_examples`.

1) `tdda ls accounts1k.parquet`

   List the fields in `accounts1k.parquet`.

2) `tdda ls -l accounts1k.csv:`

   Show field details for `accounts1k.csv`, using any associated metadata file found automatically.

3) `tdda ls -l accounts1k.csv --polars`

   Show field details using Polars.

### SEE ALSO

`tdda-diff(1)`, `tdda-serial(1)`, `tdda-verify(1)`

---

## `tdda cat`

### NAME

`tdda cat` — Display rows from a dataset as a rich table

### SYNOPSIS

```
tdda cat [-h] [N | -N | +N] [-s | -S]
         [--fields FIELDS] [--xfields FIELDS]
         [-r N [--seed SEED]]
         [--pandas] [--polars] [--backend BACKEND]
         INPUT [FIELD ...]
```

### POSITIONAL ARGUMENTS

*INPUT* is one of:

- a CSV file (or `.tsv`, `.psv`, `.txt`)
- a Parquet file (`.parquet`)
- a flat file with colon syntax to trigger metadata lookup (e.g. `foo.csv:`)
- a flat file with an explicit metadata path (e.g. `foo.csv:foo.serial`)

*FIELD* ...
Field names (or `fnmatch` wildcard patterns) to display. Fields appear in the order given. Equivalent to `--fields`; both may be combined. Wildcards must be quoted in the shell.

### DESCRIPTION

The `tdda cat` command displays rows from a dataset as a rich table. Without a row count, all rows are shown.

`N` or `-N`
First N rows

`+N`
Last N rows

Null values are shown as `∅`.

### OPTIONS

`-h`, `-?`, `--help`
Show this help message and exit

`--fields` *FIELDS*
Show only these fields.
*FIELDS* is a comma- or space-separated list of field names or `fnmatch` wildcard patterns (e.g. `eu_*`, `[a-z]*`). Fields appear in the order specified. Requires quoting in the shell when using spaces or wildcards.

`--xfields` *FIELDS*
Exclude these fields. Same format as `--fields`. Fields appear in dataset order.

`-s`
Short headers: column width driven by data; headers split at word boundaries (punctuation and lowercase→uppercase transitions) and packed onto as few lines as possible.

`-S`
Short headers: as `-s` but split anywhere (mid-word) to fit the data width.

`-r` *N*, `--random` *N*
Show *N* random rows instead of a slice.

`--seed` *SEED*
Random seed for `-r`. If omitted, a seed is chosen automatically and printed.

`--pandas`, `--pd`
Use Pandas as DataFrame engine (default)

`--polars`, `--pl`
Use Polars as DataFrame engine

`--backend`, `-B` *BACKEND*
Backend choice for Pandas

- `n` for numpy_nullable *
- `a` for pyarrow
- `o` for original

### EXAMPLES

1) `tdda cat accounts1k.parquet`

   Display all rows from `accounts1k.parquet`.

2) `tdda cat -10 accounts1k.csv:`

   Display the first 10 rows, using any associated metadata file.

3) `tdda cat +10 accounts1k.csv:`

   Display the last 10 rows.

4) `tdda cat --fields 'name,balance' accounts1k.csv:`

   Display only the `name` and `balance` fields.

5) `tdda cat --fields 'amount*' --xfields '*_raw' accounts1k.csv:`

   Display fields matching `amount*`, excluding those ending in `_raw`.

6) `tdda cat -r 20 --seed 42 accounts1k.csv:`

   Display 20 random rows with a fixed seed.

7) `tdda cat -s accounts1k.csv:`

   Display all rows with compact multi-line headers, splitting at word boundaries (`open_date` → `open date`, `accountType` → `account Type`).
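
The way `--fields` and `--xfields` combine, as in example 5, can be approximated with Python's standard `fnmatch` module, which the option descriptions name as the pattern syntax. The sketch below is an assumption-laden illustration rather than tdda's implementation: `select_fields` is a hypothetical helper, the column names are made up, and case-sensitive matching (`fnmatchcase`) is a guess about behaviour the help text does not specify.

```python
import re
from fnmatch import fnmatchcase


def select_fields(all_fields, fields=None, xfields=None):
    """Pick field names by fnmatch pattern (hypothetical helper)."""
    def patterns(spec):
        # Accept comma- or space-separated lists, as described for --fields
        return [p for p in re.split(r"[,\s]+", spec or "") if p]

    if fields:
        chosen = []
        for pattern in patterns(fields):      # follow the order specified
            for name in all_fields:
                if fnmatchcase(name, pattern) and name not in chosen:
                    chosen.append(name)
    else:
        chosen = list(all_fields)             # keep dataset order

    excluded = patterns(xfields)
    return [name for name in chosen
            if not any(fnmatchcase(name, p) for p in excluded)]


columns = ["name", "amount_gbp", "amount_raw", "balance", "open_date"]
print(select_fields(columns, fields="balance,name"))
print(select_fields(columns, fields="amount*", xfields="*_raw"))
```

The first call returns the two named fields in the order requested; the second keeps only the `amount*` fields that do not end in `_raw`.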
### SEE ALSO

`tdda-head(1)`, `tdda-tail(1)`, `tdda-sample(1)`, `tdda-ls(1)`, `tdda-diff(1)`, `tdda-serial(1)`

---

## `tdda head`

### NAME

`tdda head` — Display the first N rows of a dataset

### SYNOPSIS

```
tdda head [-h] [N] [-s | -S]
          [--fields FIELDS] [--xfields FIELDS]
          [--pandas] [--polars] [--backend BACKEND]
          INPUT [FIELD ...]
```

### POSITIONAL ARGUMENTS

*INPUT*
Dataset path (CSV, Parquet, or colon syntax).

*FIELD* ...
Field names (or `fnmatch` wildcard patterns) to display. Fields appear in the order given. Equivalent to `--fields`; both may be combined. Wildcards must be quoted in the shell.

### DESCRIPTION

The `tdda head` command displays the first N rows of a dataset (default 10) as a rich table. Null values are shown as `∅`.

### OPTIONS

`-h`, `-?`, `--help`
Show this help message and exit

`N`
Number of rows to show (default 10)

`--fields` *FIELDS*
Show only these fields. *FIELDS* is a comma- or space-separated list of field names or `fnmatch` wildcard patterns (e.g. `eu_*`, `[a-z]*`). Fields appear in the order specified. Requires quoting in the shell when using spaces or wildcards.

`--xfields` *FIELDS*
Exclude these fields. Same format as `--fields`. Fields appear in dataset order.

`-s`
Short headers: column width driven by data; headers split at word boundaries and packed onto as few lines as possible. See `tdda-cat(1)` for details.

`-S`
Short headers: split anywhere to fit data width.

`--pandas`, `--pd`
Use Pandas as DataFrame engine (default)

`--polars`, `--pl`
Use Polars as DataFrame engine

`--backend`, `-B` *BACKEND*
Backend choice for Pandas

- `n` for numpy_nullable *
- `a` for pyarrow
- `o` for original

### EXAMPLES

1) `tdda head accounts1k.parquet`

   Display the first 10 rows of `accounts1k.parquet`.

2) `tdda head 20 accounts1k.csv:`

   Display the first 20 rows, using any associated metadata file.

3) `tdda head --fields 'name,balance' accounts1k.csv:`

   Display only `name` and `balance` for the first 10 rows.


4) `tdda head -s 20 accounts1k.csv:`

   Display the first 20 rows with compact multi-line headers.

### SEE ALSO

`tdda-cat(1)`, `tdda-tail(1)`, `tdda-sample(1)`, `tdda-ls(1)`, `tdda-diff(1)`, `tdda-serial(1)`

---

## `tdda tail`

### NAME

`tdda tail` — Display the last N rows of a dataset

### SYNOPSIS

```
tdda tail [-h] [N] [-s | -S]
          [--fields FIELDS] [--xfields FIELDS]
          [--pandas] [--polars] [--backend BACKEND]
          INPUT [FIELD ...]
```

### POSITIONAL ARGUMENTS

*INPUT*
Dataset path (CSV, Parquet, or colon syntax).

*FIELD* ...
Field names (or `fnmatch` wildcard patterns) to display. Fields appear in the order given. Equivalent to `--fields`; both may be combined. Wildcards must be quoted in the shell.

### DESCRIPTION

The `tdda tail` command displays the last N rows of a dataset (default 10) as a rich table. Null values are shown as `∅`.

### OPTIONS

`-h`, `-?`, `--help`
Show this help message and exit

`N`
Number of rows to show (default 10)

`--fields` *FIELDS*
Show only these fields. *FIELDS* is a comma- or space-separated list of field names or `fnmatch` wildcard patterns (e.g. `eu_*`, `[a-z]*`). Fields appear in the order specified. Requires quoting in the shell when using spaces or wildcards.

`--xfields` *FIELDS*
Exclude these fields. Same format as `--fields`. Fields appear in dataset order.

`-s`
Short headers: column width driven by data; headers split at word boundaries and packed onto as few lines as possible. See `tdda-cat(1)` for details.

`-S`
Short headers: split anywhere to fit data width.

`--pandas`, `--pd`
Use Pandas as DataFrame engine (default)

`--polars`, `--pl`
Use Polars as DataFrame engine

`--backend`, `-B` *BACKEND*
Backend choice for Pandas

- `n` for numpy_nullable *
- `a` for pyarrow
- `o` for original

### EXAMPLES

1) `tdda tail accounts1k.parquet`

   Display the last 10 rows of `accounts1k.parquet`.

2) `tdda tail 20 accounts1k.csv:`

   Display the last 20 rows, using any associated metadata file.


3) `tdda tail --fields 'name,balance' accounts1k.csv:`

   Display only `name` and `balance` for the last 10 rows.

4) `tdda tail -s 20 accounts1k.csv:`

   Display the last 20 rows with compact multi-line headers.

### SEE ALSO

`tdda-cat(1)`, `tdda-head(1)`, `tdda-sample(1)`, `tdda-ls(1)`, `tdda-diff(1)`, `tdda-serial(1)`

---

## `tdda sample`

### NAME

`tdda sample` — Display N random rows from a dataset

### SYNOPSIS

```
tdda sample [-h] [N] [--seed SEED] [-s | -S]
            [--fields FIELDS] [--xfields FIELDS]
            [--pandas] [--polars] [--backend BACKEND]
            INPUT [FIELD ...]
```

### POSITIONAL ARGUMENTS

*INPUT*
Dataset path (CSV, Parquet, or colon syntax).

*FIELD* ...
Field names (or `fnmatch` wildcard patterns) to display. Fields appear in the order given. Equivalent to `--fields`; both may be combined. Wildcards must be quoted in the shell.

### DESCRIPTION

The `tdda sample` command displays N randomly selected rows from a dataset (default 10) as a rich table. When no `--seed` is given, a random seed is chosen automatically and printed so the result can be reproduced. Null values are shown as `∅`.

### OPTIONS

`-h`, `-?`, `--help`
Show this help message and exit

`N`
Number of random rows to show (default 10)

`--seed` *SEED*
Random seed. If omitted, a seed is chosen automatically and printed.

`--fields` *FIELDS*
Show only these fields. *FIELDS* is a comma- or space-separated list of field names or `fnmatch` wildcard patterns (e.g. `eu_*`, `[a-z]*`). Fields appear in the order specified. Requires quoting in the shell when using spaces or wildcards.

`--xfields` *FIELDS*
Exclude these fields. Same format as `--fields`. Fields appear in dataset order.

`-s`
Short headers: column width driven by data; headers split at word boundaries and packed onto as few lines as possible. See `tdda-cat(1)` for details.

`-S`
Short headers: split anywhere to fit data width.

`--pandas`, `--pd` Use Pandas as DataFrame engine (default) `--polars`, `--pl` Use Polars as DataFrame engine `--backend`, `-B` *BACKEND* Backend choice for Pandas `n` for numpy_nullable * `a` for pyarrow `o` for original ### EXAMPLES 1) `tdda sample accounts1k.parquet` Display 10 random rows from `accounts1k.parquet`, printing the seed used. 2) `tdda sample 50 accounts1k.csv:` Display 50 random rows, using any associated metadata file. 3) `tdda sample 20 --seed 42 accounts1k.csv:` Display 20 random rows with a fixed seed (reproducible). 4) `tdda sample --fields 'name,balance' accounts1k.csv:` Display 10 random rows showing only `name` and `balance`. 5) `tdda sample -s 20 --seed 42 accounts1k.csv:` Display 20 random rows with compact multi-line headers. ### SEE ALSO `tdda-cat(1)`, `tdda-head(1)`, `tdda-tail(1)`, `tdda-ls(1)`, `tdda-diff(1)`, `tdda-serial(1)` --- ## `tdda serial` ### NAME `tdda serial` — Converts and generates serial metadata files. ### SYNOPSIS ``` tdda serial [FLAGS] inmetadata outmetadata tdda serial --to FMT [FLAGS] inmetadata outmetadata Converts metadata from one metadata format, in inpath, to another, in outpath. tdda serial [FLAGS] indata outmetadata Creates metadata for indata in outmetadata tdda serial [FLAGS] inmetadata script.py Creates Python code for reading a file in the format in inmetadata as Python. Often, a reading library would be specified, e.g. tdda serial a.serial a.py --to pd.r which specifies that the Python script should use pandas.read_csv. Supported formats FMT: SHORT FORM LONG FORM/Description . tdda.serial pd.r pandas.read_csv pd.w pandas.DataFrame.to_csv pl.r polars.read_csv pl.w polars.DataFrame.write_csv csv.r python.csv.reader csv.w python.csv.writer csvw CSVW fl frictionless fless frictionless fl.r frictionless.resource fl.p frictionless.package Multiple formats can be separated by commas. Format is usually inferred from filename if following common conventions for tdda.serial, CSVW, and frictionless. 
```

### OPTIONS

`--to FMT` Specify output metadata format (see list of formats above)

`-B BE, --backend BE` Specify backend for Pandas flavours:

- `n`: `numpy_nullable`
- `a`: `pyarrow`
- `o`: `original` Pandas backend.

`--for FILE` Filename for data to use when generating CSVW or Frictionless data. (Can also be used for `tdda.serial` and `.py` output.)

`-N, --no-config` Use default configuration (ignore `~/.tdda.toml`)

`-g, --gen, --generate` Generate (infer) metadata for flat file

`-q, --quiet` Quiet output

`-v, --verbose` Verbose output

`-V, --Verbose` More verbose output

### Options used primarily or exclusively with `--generate`/`--gen`/`-g`

`--sep D, --delimiter D` Specify `D` as the field separator.

`--quote-char Q, --quote Q` Specify `Q` as the quote character. (`Q` is always `"` or `'` in practice.)

`--nulls S` Specify null indicator, or comma-separated list of null indicators.

`--escape` Use backslash as escape character. **NOTE:** Always backslash: does not take argument.

`--no-escape` Do not support backslash escaping with `-g`. **NOTE:** This only affects quotes, separators, and backslashes. Standard escapes for control sequences (`\t`, `\n`, `\r`, `\f`) are always supported.

`--stutter` Specify quote stuttering. Usually an alternative to `--escape`.

`--no-stutter` Do not use quote stuttering. Usually used with `--escape`.

`--encoding ENC, -e ENC` Specify `ENC` as encoding.

`--date-format D` Specify `D` as the (file-wide default) date format.

`--datetime-format D` Specify `D` as the (file-wide default) format for `datetime` fields.

`--sample-lines N, -n N` Use (up to) `N` sample lines when inferring metadata.

`--single-field, -1` Inform the metadata inferrer that the file contains only a single field (column).

`--include-path` Include `path` in `.serial` output

`--exclude-path` Do not include `path` in `.serial` output

`--quoting Q` Set `quoting` to `Q`.
`Q` must be one of:

- `QUOTE_ALL`
- `QUOTE_MINIMAL`
- `QUOTE_NONNUMERIC`
- `QUOTE_NONE`
- `QUOTE_NOTNULL`
- `QUOTE_STRINGS`
- `QUOTE_STRINGS_ONLY`

`--use-literal-dates` Specifies that date formats should be written to `.serial` files with unambiguous literal examples such as `2000-12-31T12:34:56`.

`--use-yyyy-dates` Specifies that date formats should be written to `.serial` files in the form exemplified by `YYYY-MM-DD HH:MM:SS`.

`--use-pc-dates` Specifies that date formats should be written to `.serial` files in Python `strftime`-compatible % formats, exemplified by `%Y-%m-%dT%H:%M:%S`.

### EXAMPLES

1) `tdda serial a.csv a.serial` Generate `tdda.serial` metadata describing the format of `a.csv` in `a.serial`.

2) `tdda serial --to . a.csv a.serial` Same as previous, explicitly specifying the default, `tdda.serial`, output format (`.` is short for `tdda.serial` format).

3) `tdda serial a.csv a-metadata.json` Generate CSVW metadata describing the format of `a.csv` in `a-metadata.json`.

4) `tdda serial --to csvw a.csv a.json` Same as previous, explicitly specifying the format, with a non-standard output name.

5) `tdda serial a.serial a-metadata.json` Converts `tdda.serial` metadata to CSVW.

6) `tdda serial a-metadata.json a.serial` Converts CSVW metadata to `tdda.serial`.

### USING SERIAL METADATA WITH TDDA COMMANDS

For all tdda command-line commands, and in most places within API calls where a CSV or other flat file is specified, there is the option to specify the file format using `tdda.serial` files, CSVW files, or Frictionless files. This is based on the `:` (colon) specifier.

When specifying a path to a CSV (or other flat) file:

* If the path is used by itself, the `tdda` library will use either `tdda.serial.csv_to_pandas` or `tdda.serial.csv_to_polars` to read it into a DataFrame.
  The default is currently pandas (with the `numpy_nullable` back end), but this can be configured (see `tdda config`) or, in many cases, controlled with command-line flags (`--polars`, `--pandas`, `--backend BACKEND` (for Pandas only)).

* If the path ends in a colon (e.g. `foo.csv:`), TDDA will search for metadata in the same directory as the file and, if it finds one, pass that to the appropriate `csv_to_...` function for more accurate DataFrame generation.

* In doing this, it will look for the following in priority order, given a file `foo.csv`:

  - `foo.csv.serial` (`tdda.serial` metadata)
  - `foo.serial` (`tdda.serial` metadata). This is actually more common than the previous form, but if there are multiple files with different extensions, the former is more specific, so is checked first.
  - Anything that matches `foo` using `@` as a wildcard, e.g. `@.serial`, `f@.serial`, `f@o.serial`, `@oo.serial`. (`@` acts like `*` in the shell, while avoiding needing `*` in filenames, which can be awkward.)
  - `foo-metadata.json`, `foo-csvmetadata.json`, `foo-csv-metadata.json`, `foo.csvmetadata.json`, `foo.csv-metadata.json` (all of which are common conventions for CSVW metadata files).
  - The same CSVW patterns with `@` wildcards
  - `foo.serial.json`, `foo.serial.yaml`, `foo.resource.json`, `foo.resource.yaml`, `foo.package.json`, `foo.package.yaml`, all of which are common for Frictionless metadata files.
  - The same patterns for `serial` or `package` frictionless files with `@` wildcards. Wildcards are not searched in `resource` files, because in frictionless these always correspond to a single data file.

* If the path contains a colon, the part to the right of the colon will be interpreted as a metadata file. So `foo.csv:bar.serial` will use `bar.serial`.

### BUGS

The `tdda serial` functionality is fairly new, and there are probably still bugs and undesirable features in the implementation.

### SEE ALSO

[Test Driven Data Analysis](https://book.tdda.info), book by Nicholas J.
Radcliffe, chapter 8.

---

## `tdda gentest`

### NAME

`tdda gentest` — Gentest writes tests, so you don't have to.™

### SYNOPSIS

```
tdda gentest
    Runs the Gentest Wizard

tdda gentest 'SHELL COMMAND' [OPTIONS] [test_output.py] [REFERENCE_FILE ...]
```

### POSITIONAL ARGUMENTS

*SHELL COMMAND* is the command to be tested. It should normally be enclosed in single quotes. It can be any terminal command — a shell built-in, a shell script, an R program, a Python program, or anything else that can be run from the terminal.

*test_output.py* is the name of the Python test script to generate. If not specified, Gentest derives a name from the command.

*REFERENCE_FILE ...* are optional additional files or directories that Gentest should monitor for files created or modified during command execution.

### DESCRIPTION

Gentest will create Python tests, using tdda's reference-testing capabilities, for terminal-based programs written in any language. For example, the shell command can be a built-in shell command or can run a shell script, an R program, or of course a Python program.

It has a wizard, invoked just by typing `tdda gentest`, that prompts for the information it needs before generating the tests. Alternatively, the command to be tested and optionally other parameters can all be specified on the command line.

Gentest:

- Runs the provided command more than once (by default)
- Captures output to `stdout` and `stderr`
- Captures the exit code
- Notices any files created in the directory or subdirectories or other specified places
- Uses variations in output and other heuristics to identify parts of the output that appear variable and uses `rexpy` to write reference tests that only test things that appear to be fixed and not system dependent.
- Writes a Python test script, using `tdda.referencetest`, that contains a set of tests of the shell command specified. The test script can then, of course, be edited by hand.

The test script, when run, executes the command again and checks that its behaviour is as expected (i.e., is “the same” as when Gentest ran originally, except for the variations allowed in the reference test specifications).

### OPTIONS

`-h, --help` Show this help message and exit

`-?, --?` Same as `-h` or `--help`

`-m N, --max-files N` Max files to track

`-r, --relative-paths` Show relative paths wherever possible

`-n N, --iterations N` Number of times, `N`, to run the command (default 2)

`-O, --no-stdout` Do not generate a test checking output to STDOUT

`-E, --no-stderr` Do not generate a test checking output to STDERR

`-Z, --non-zero-exit` Do not require exit status to be 0

`-C, --no-clobber` Do not overwrite existing test script or reference directory

`-N, --no-config` Use default configuration (ignore `~/.tdda.toml`)

### EXAMPLES

1) `tdda gentest` Runs the Gentest wizard, which presents a dialogue something like this (where all suggested answers, in square brackets, are accepted by hitting `RETURN`). (Obviously, this is an improbably simple command to test; it's usually a command to run a script or program.)

```
$ tdda gentest
Enter shell command to be tested: echo "Hey, cats!"
Enter name for test script [test_echo__Hey__cats__]:
Check all files written under $(pwd)?: [y]:
Check all files written under (gentest's) $TMPDIR?: [y]:
Enter other files/directories to be checked, one per line, then a blank line:
Check stdout?: [y]:
Check stderr?: [y]:
Exit code should be zero?: [y]:
Clobber (overwrite) previous outputs (if they exist)?: [y]:
Number of times to run script?: [2]:
Running command 'echo "Hey, cats!"' to generate output (run 1 of 2).
Saved (non-empty) output to stdout to /home/tdda/ref/echo__Hey__cats__/STDOUT.
Saved (empty) output to stderr to /home/tdda/ref/echo__Hey__cats__/STDERR.
Running command 'echo "Hey, cats!"' to generate output (run 2 of 2).
Saved (non-empty) output to stdout to /home/tdda/ref/echo__Hey__cats__/2/STDOUT.
Saved (empty) output to stderr to /home/tdda/ref/echo__Hey__cats__/2/STDERR.
Test script written as /home/tdda/test_echo__Hey__cats__.py
Command execution took: 0.022s
SUMMARY:
Directory to run in: /home/tdda
Shell command: echo "Hey, cats!"
Test script generated: /home/tdda/test_echo__Hey__cats__.py
Reference files: (none)
Check stdout: yes (was 'Hey, cats!\n')
Check stderr: yes (was empty)
Expected exit code: 0
Clobbering permitted: yes
Number of times script ran: 2
Number of tests written: 4
```

2) `tdda gentest 'echo "Hey, cats!"' 'test_echo.py' -n 3` Same as above except that the command and a custom name for the test script have been supplied, so the wizard does not run, and the number of times to run the command has been increased to three. The test script produced is almost identical except for the number of times the command is run.

3) `tdda gentest 'diff verifier1.txt verifier2.txt' -Z` Gentest will normally fail if the program produces a non-zero exit code, generally indicating an error. Commands like `diff`, however, produce a non-zero exit code (1) when there are differences. The `-Z` option (or `--non-zero-exit`) allows the exit code to be non-zero, and Gentest generates a test that checks it is the expected value (1, in this case, if the two verifier files should be different).

### SEE ALSO

`rexpy(1)`, `tdda-diff(1)`

[Test Driven Data Analysis](https://book.tdda.info), book by Nicholas J. Radcliffe, chapter 9, and chapters 9-12 for reference testing more generally.

---

## `tdda tag`

### NAME

`tdda tag` — tag tests that failed in the last reference test run

### SYNOPSIS

```
tdda tag
```

### DESCRIPTION

The `tdda tag` command reads the log of failing tests written by the most recent logged `tdda.referencetest` run and adds `@tag` decorators to those tests in their source files. Tagged tests can then be run in isolation, allowing a rapid edit-test cycle focused on failing tests.
A logged run of `tdda.referencetest` uses `--log-failures` or (for unittest-style tests only) `-F`.

### WORKFLOW

A typical workflow with `unittest`-style tests (`ReferenceTestCase`) is:

`python tests.py -9 # Remove any existing @tag decorators`

`python tests.py -F # Run tests, logging failures`

`tdda tag # Add @tag to failing tests`

`python tests.py -1 # Run only tagged (failing) tests`

When all tests are passing:

`python tests.py -9 # Remove @tag decorators`

The equivalent workflow with `pytest` is:

`pytest --untag # Remove any existing @tag decorators`

`pytest --log-failures # Run tests, logging failures`

`tdda tag # Add @tag to failing tests`

`pytest --tagged # Run only tagged (failing) tests`

When all tests are passing:

`pytest --untag # Remove @tag decorators`

### SEE ALSO

`tdda(1)`

---

## `tdda examples`

### NAME

`tdda examples` — Creates example data for TDDA

### SYNOPSIS

```
tdda examples [OUTDIR]
tdda examples [MODULE...] [OUTDIR]
tdda examples all [OUTDIR]
```

### POSITIONAL ARGUMENTS

*MODULE* can be any of:

- `referencetest`
- `constraints`
- `rexpy`
- `gentest`
- `book`

If not specified, the first four will be created, without requiring internet access.

*OUTDIR* is an optional directory in which to write the example directories; by default this will be the current working directory (`.`).

If `all` is specified, or `book` is included, the `tdda-book-examples` will be downloaded from GitHub, which does require internet access.

### DESCRIPTION

Write out example code and data for all examples, by default, or for a particular module if specified. If no module is specified, examples for the first four modules (all except `book`) are written out.

Examples are created in subdirectories of *OUTDIR* (default: the current directory `.`).
### EXAMPLES

1) `tdda examples` Creates the referencetest, constraints, rexpy, and gentest examples in `.`

2) `tdda examples gentest` Creates `examples_gentest` in `.`

3) `tdda examples gentest book` Creates gentest and book examples in `.`

4) `tdda examples all` Creates all the examples, four from local files and the book examples from GitHub in `.`

---

## `tdda version`

### NAME

`tdda version` — Reports the (active) installed version of tdda

### SYNOPSIS

```
tdda version
```

### DESCRIPTION

Reports the version number of the (active) TDDA tools.

### EXAMPLES

`tdda version`

---

## `tdda config`

### NAME

`tdda config` — Shows config settings

### SYNOPSIS

```
tdda config [--annotated|-a] [--current|-c] [--default|-d] [--file|-f]
tdda config [--annotated|-a] current|default|file
```

### DESCRIPTION

Shows configuration information. Use:

- `-c`, `--current`, or `current` for the current configuration
- `-d`, `--default`, or `default` for the default configuration
- `-f`, `--file`, or `file` for the configuration file location and contents.

With no argument, it shows the current configuration.

Use `-a` or `--annotated` with any of the above to show allowed values alongside each parameter.

### EXAMPLES

`tdda config`

`tdda config -c`

`tdda config -d`

`tdda config -f`

### PARAMETERS

#### `null_rep`

Used to show nulls in some contexts.

**Default:** `"∅"`
**Allowed:** Any string

#### `colour`

Controls whether output is colourized.

**Default:** `true`
**Allowed:** `true`, `false`

#### `engine`

Controls whether pandas or polars is used for CSV files by default.

**Default:** `"pandas"`
**Allowed:** `"pandas"`, `"polars"`

#### `pandas_backend`

Controls default backend for CSV loading etc.

**Default:** `"numpy_nullable"`
**Allowed:** `"numpy_nullable"` (or `"n"`), `"pyarrow"` (or `"a"`), `"original"` (or `"o"`)

### PARAMETERS (referencetest)

#### `left_colour`

Colour for left (actual) side of diffs.

**Default:** `"red"`
**Allowed:** A named ANSI colour (red, bright_red etc.)
or an RGB hex colour with leading `#` such as `#FF0000` for pure red. Interpreted by the rich library.

#### `right_colour`

Colour for right (expected) side of diffs.

**Default:** `"green"`
**Allowed:** A named ANSI colour (red, bright_red etc.) or an RGB hex colour with leading `#` such as `#FF0000` for pure red. Interpreted by the rich library.

#### `failure_colour`

Colour used to highlight failures.

**Default:** `"red"`
**Allowed:** A named ANSI colour (red, bright_red etc.) or an RGB hex colour with leading `#` such as `#FF0000` for pure red. Interpreted by the rich library.

#### `mono`

Use bold instead of colour for diffs.

**Default:** `false`
**Allowed:** `true`, `false`

#### `bw`

Black and white mode: no colour or bold.

**Default:** `false`
**Allowed:** `true`, `false`

#### `left_prefix`

Prefix string for left (actual) diff lines.

**Default:** `"< "`
**Allowed:** Any string

#### `right_prefix`

Prefix string for right (expected) diff lines.

**Default:** `"> "`
**Allowed:** Any string

#### `vertical`

Show diffs vertically rather than side by side.

**Default:** `false`
**Allowed:** `true`, `false`

#### `force_val_prefixes`

Always show left/right prefixes on diff lines.

**Default:** `false`
**Allowed:** `true`, `false`

#### `type_checking`

How strictly to check types in reference test comparisons.

**Default:** `"strict"`
**Allowed:** `"strict"`, `"medium"`, `"loose"`

#### `log_failures`

Log failing test IDs to file for use with `tdda tag`.

**Default:** `false`
**Allowed:** `true`, `false`

### PARAMETERS (constraints)

#### `interleave`

Interleave pass and fail results in verify output.

**Default:** `true`
**Allowed:** `true`, `false`

#### `per_constraint`

Report results per constraint rather than per field.

**Default:** `true`
**Allowed:** `true`, `false`

#### `detect_passes`

Include passing fields in detect output.

**Default:** `true`
**Allowed:** `true`, `false`

#### `report_formats`

List of additional report formats to generate.

**Default:** `[]`
**Allowed:** Any subset of `"html"`, `"md"`, `"txt"`, `"json"`, `"yaml"`, `"toml"`

#### `write_all_records`

Write all records to detect output, not just failures.

**Default:** `false`
**Allowed:** `true`, `false`

#### `int_bools`

Use integers (0/1) rather than booleans in detect output.

**Default:** `false`
**Allowed:** `true`, `false`

#### `verify_required_fields`

Verify that all required fields are present.

**Default:** unset
**Allowed:** `true`, `false`

#### `verify_allowed_fields`

Verify that no fields are present outside the allowed set.

**Default:** unset
**Allowed:** `true`, `false`

#### `write_required_fields`

Discover should include the required-fields constraint.

**Default:** `false`
**Allowed:** `true`, `false`

#### `write_allowed_fields`

Discover should include an allowed-fields constraint.

**Default:** `false`
**Allowed:** `true`, `false`

### PARAMETERS (tddadiff)

#### `type_checking`

How strictly to check types when comparing dataframes.

**Default:** `"medium"`
**Allowed:** `"strict"`, `"medium"`, `"loose"`

#### `find_md`

Infer metadata when comparing dataframes with tdda diff.

**Default:** `true`
**Allowed:** `true`, `false`

### PARAMETERS (serial)

#### `md_inpath`

Path(s) to search for serial metadata files; relative paths are resolved relative to the CSV file.

**Default:** `"./_write.serial"`

---

## `tdda test`

### NAME

`tdda test` — Run the tdda library's self-tests

### SYNOPSIS

```
tdda test
```

### DESCRIPTION

Runs tdda's (internal) self-tests.

**NOTE:** It is hard to guarantee that all will pass on all systems given that dependencies are not tightly pinned. It is not necessarily a problem if some tests fail, but is a concern if a very large number fail.

### SEE ALSO

`tdda(1)`

---

## `tdda help`

### NAME

`tdda help` — Provides help on `tdda` and its sub-commands.

### SYNOPSIS

```
tdda help
tdda help COMMAND
```

### POSITIONAL ARGUMENTS

*COMMAND* can be any of: `discover`, `verify`, `detect`, `examples`, `gentest`, `diff`, `serial`, `tag`, `config`, `help`, `version`, `test`, `installman`

### DESCRIPTION

Shows help on a tdda subcommand or topic.

Taking inspiration from `git`, if the man pages are installed (see `tdda installman`), help on main commands can also be obtained with

`man tdda-COMMAND`

For example:

`man tdda-discover`

Help can also be obtained on each command with `--help`, `-h` or `-?`, e.g.

`tdda discover --help`

### EXAMPLES

`tdda help` Shows this help

`tdda help gentest` Shows help on gentest

### SEE ALSO

`tdda-installman(1)`

---

## `tdda installman`

### NAME

`tdda installman` — install tdda man pages

### SYNOPSIS

```
tdda installman [--system]
```

### DESCRIPTION

Installs the `tdda` man pages so they can be accessed with the `man` command.

Once installed, the main `tdda` man page is available as:

`man tdda`

Man pages for `tdda` subcommands are available as:

`man tdda-COMMAND`

For example:

`man tdda-discover`

`man tdda-gentest`

The `rexpy` man page is accessed as:

`man rexpy`

By default, man pages are installed to `~/.local/share/man/man1`. On macOS, this directory may not be in the default man search path; if so, `tdda installman` will print the line to add to your shell config file to make the man pages available in new shells.

With `--system`, man pages are installed to `/usr/local/share/man/man1`, which is in the default search path on most systems but may require running with `sudo`.

On Windows, man pages are not supported; consider running `tdda` under WSL (Windows Subsystem for Linux).

### OPTIONS

`--system`, `-s` Install system-wide to `/usr/local/share/man/man1` (may require sudo).

### EXAMPLES

1) `tdda installman` Install man pages to `~/.local/share/man/man1`.

2) `tdda installman --system` Install man pages system-wide (may require sudo).
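
Whether the default install location is visible to `man` depends on the local search path. The following Python sketch is one rough way to check a colon-separated search path for the user-level man directory. It is an illustration only and not part of `tdda`: the helper name `on_man_path` is hypothetical, and reading the path from the `MANPATH` environment variable is an assumption. Many systems leave `MANPATH` unset and compute the search path in other ways, so a `False` result is not conclusive.

```python
import os
from pathlib import Path


def on_man_path(man_root, search_path):
    """Return True if man_root is an entry in a colon-separated search path.

    on_man_path is a hypothetical helper for illustration, not a tdda function.
    """
    target = Path(man_root).expanduser()
    entries = [Path(p).expanduser() for p in search_path.split(":") if p]
    return target in entries


# Parent of the default install directory ~/.local/share/man/man1.
user_man_root = "~/.local/share/man"

# Assumption: MANPATH holds the search path; if it is unset, an empty
# path is checked and the result is False.
print(on_man_path(user_man_root, os.environ.get("MANPATH", "")))
```

If this prints `False` and `man tdda` cannot find the page, the line printed by `tdda installman` is the thing to add to the shell configuration.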

### SEE ALSO

`tdda-help(1)`

---

## `rexpy`

### NAME

`rexpy` — infer regular expressions from example strings

### SYNOPSIS

```
rexpy [FLAGS] [INPUTFILE [OUTPUTFILE]]
```

### DESCRIPTION

`rexpy` reads a list of strings (one per line) and infers one or more regular expressions that characterize them.

If *INPUTFILE* is provided it should contain one string per line; otherwise lines are read from standard input.

If *OUTPUTFILE* is provided, the regular expressions found will be written there (one per line); otherwise they will be printed to standard output.

### OPTIONS

`-h`, `--header` Discard the first line as a header.

`-?`, `--help` Print usage information and exit.

`-g`, `--group` Generate capture groups for each variable fragment of each regular expression, i.e. surround variable components with parentheses. e.g. `^[A-Z]+\-[0-9]+$` becomes `^([A-Z]+)\-([0-9]+)$`

`-q`, `--quote` Display regular expressions as double-quoted, escaped strings, suitable for use in Unix shells, JSON, and string literals in many programming languages. e.g. `^[A-Z]+\-[0-9]+$` becomes `"^[A-Z]+\-[0-9]+$"`

`--portable`, `--grep` Produce maximally portable regular expressions (e.g. `[0-9]` rather than `\d`). This is the default.

`--java` Produce Java-style regular expressions (e.g. `\p{Digit}`).

`--posix` Produce POSIX-compliant regular expressions (e.g. `[[:digit:]]` rather than `\d`).

`--perl` Produce Perl-style regular expressions (e.g. `\d`).

`-u`, `--underscore` Allow underscore to be treated as a letter. Mostly useful for matching identifiers. Also `-_`.

`-d`, `--dot`, `--period` Allow dot to be treated as a letter. Mostly useful for matching identifiers. Also `-.`.

`-m`, `--minus`, `--hyphen`, `--dash` Allow minus to be treated as a letter. Mostly useful for matching identifiers.

`-vlf`, `--variable` Use variable-length fragments.

`-flf`, `--fixed` Use fixed-length fragments.

`-v`, `--version` Print the version number.

`-V`, `--verbose` Set verbosity level to 1.

`-VV`, `--Verbose` Set verbosity level to 2.

### SEE ALSO

`tdda(1)`, `tdda-discover(1)`
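
### EXAMPLES

The effect of `--group` can be seen by applying both forms of the pattern quoted under OPTIONS with Python's standard `re` module. This is a small sketch rather than `rexpy` output: the sample strings are made up for illustration, and only the two regular expressions come from the option description.

```python
import re

# The pattern as shown for the default output, and the same pattern as
# shown with --group (capture groups around the variable fragments).
plain = re.compile(r"^[A-Z]+\-[0-9]+$")
grouped = re.compile(r"^([A-Z]+)\-([0-9]+)$")

# Hypothetical sample strings, not taken from any tdda data set.
samples = ["AB-123", "XYZ-7", "ab-123"]

for s in samples:
    matched = plain.match(s) is not None
    m = grouped.match(s)
    # The grouped form exposes each variable fragment separately.
    fragments = m.groups() if m else None
    print(s, matched, fragments)
```

Both forms accept the same strings; the grouped form additionally makes the letter and digit fragments available through `groups()`, which is convenient when the pattern is used to pull fields out of an identifier.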