CSV data summaries in Airlock
When viewing a CSV file in a workspace or release request, users can now see a table summarising each column in the file.
The summary table shows, for each column:
- Detected column type (text, numeric, mixed)
- Total number of rows
- Total number of rows with numeric values
- Number of missing/null values
- Number of redacted values
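The following minimal Python sketch shows roughly how the per-column counts above could be derived. It is not Airlock's code. The `MISSING_VALUES` and `REDACTED_VALUES` marker sets, the `ColumnSummary` and `summarise_csv` names, and the rule that only cells which are neither missing nor redacted decide the column type are all hypothetical choices made for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical markers, for illustration only: Airlock's real rules
# for recognising missing and redacted cells may differ.
MISSING_VALUES = {"", "NA", "NULL"}
REDACTED_VALUES = {"[REDACTED]", "REDACTED"}


@dataclass
class ColumnSummary:
    name: str
    column_type: str  # "text", "numeric" or "mixed"
    total_rows: int
    numeric_rows: int
    missing: int
    redacted: int


def parse_number(value: str) -> Optional[float]:
    """Return the cell as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def summarise_csv(text: str) -> List[ColumnSummary]:
    """Summarise each column of a CSV string whose first row is a header."""
    header, *data = list(csv.reader(io.StringIO(text)))
    summaries = []
    for i, name in enumerate(header):
        # Rows that are too short count as missing in this column.
        values = [row[i].strip() if i < len(row) else "" for row in data]
        missing = sum(v in MISSING_VALUES for v in values)
        redacted = sum(v in REDACTED_VALUES for v in values)
        # Only cells that are neither missing nor redacted decide the type.
        present = [
            v for v in values
            if v not in MISSING_VALUES and v not in REDACTED_VALUES
        ]
        numeric = sum(parse_number(v) is not None for v in present)
        if numeric == 0:
            column_type = "text"
        elif numeric == len(present):
            column_type = "numeric"
        else:
            column_type = "mixed"
        summaries.append(
            ColumnSummary(name, column_type, len(values), numeric, missing, redacted)
        )
    return summaries


if __name__ == "__main__":
    sample = "group,count\na,10\nb,[REDACTED]\nc,\nd,15\n"
    for summary in summarise_csv(sample):
        print(summary)
```

Leaving redacted cells out of type detection means that a numeric column with some redacted counts is still reported as numeric rather than mixed. This seems to be the behaviour an output checker would want, but it is an assumption of this sketch.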
For numeric columns, the summary also automates some common checks that output checkers typically perform to confirm that statistical disclosure control measures, such as small number redaction and rounding, have been applied:
- Minimum value
- Minimum non-zero value
- Maximum value
- Sum of all values (useful for checking that a column of percentages sums to 100%)
- Rounding, i.e. whether all values in the column are:
  - divisible by 5
  - divisible by 6 (equivalent to midpoint 6 derived rounding)
  - midpoint 6 rounded
- Link to documentation
- Pull request: opensafely-core/airlock#1136
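The numeric checks listed above can be sketched in Python as follows. This is an illustration, not the implementation in the pull request. It assumes that missing and redacted cells have already been dropped. It also assumes a common midpoint 6 convention in which 0 stays 0 and any other count is rounded up to a multiple of 6 and then reduced by 3, so that rounded values take the form 6k + 3. The names `NumericChecks`, `numeric_checks` and `is_midpoint_6` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NumericChecks:
    minimum: float
    minimum_non_zero: Optional[float]
    maximum: float
    total: float
    divisible_by_5: bool
    divisible_by_6: bool
    midpoint_6_rounded: bool


def is_midpoint_6(value: float) -> bool:
    """Assumed midpoint 6 rule: 0 stays 0, other counts become 6k + 3."""
    return value == 0 or value % 6 == 3


def numeric_checks(values: Sequence[float]) -> Optional[NumericChecks]:
    """Run output-checking style checks on a column's numeric values.

    Returns None for a column with no numeric values.
    """
    if not values:
        return None
    non_zero = [v for v in values if v != 0]
    return NumericChecks(
        minimum=min(values),
        # Smallest value that is not zero, useful for spotting small counts.
        minimum_non_zero=min(non_zero) if non_zero else None,
        maximum=max(values),
        # fsum reduces floating-point error when summing percentages.
        total=math.fsum(values),
        divisible_by_5=all(v % 5 == 0 for v in values),
        divisible_by_6=all(v % 6 == 0 for v in values),
        midpoint_6_rounded=all(is_midpoint_6(v) for v in values),
    )


if __name__ == "__main__":
    print(numeric_checks([0, 3, 9, 15]))
    print(numeric_checks([25.0, 25.0, 50.0]))
```

Under this convention, the difference between two midpoint 6 rounded values is a multiple of 6. This is consistent with the summary treating "divisible by 6" as the signature of values derived from midpoint 6 rounded counts.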