
Using compressed data files by default

Using compressed files is now the default recommendation in documentation and templates. On the backends, where datasets can be very large, using uncompressed files significantly slows execution and consumes more disk space.

The research-template has been updated to generate csv.gz files from cohortextractor by default, and the examples and Getting Started documentation have been updated to match.

In addition, the recommendations for using compressed formats for further data files in Python, R and Stata have been updated.
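As a rough illustration of what working with compressed files looks like in Python, here is a minimal sketch that writes and reads a gzip-compressed CSV using only the standard library. The helper names, the file name and the column names are hypothetical and not taken from the documentation; the documentation itself may recommend a different approach or format.

```python
import csv
import gzip


def write_compressed_csv(path, header, rows):
    """Write a header and rows to a gzip-compressed CSV file (hypothetical helper)."""
    # "wt" opens the gzip stream in text mode; newline="" lets the csv
    # module control line endings itself.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def read_compressed_csv(path):
    """Read a gzip-compressed CSV file into a list of dicts (hypothetical helper)."""
    # "rt" decompresses on the fly, so the file never has to be
    # unpacked to disk first.
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Illustrative file name and columns only.
    write_compressed_csv("example.csv.gz", ["patient_id", "age"], [[1, 34], [2, 57]])
    for row in read_compressed_csv("example.csv.gz"):
        print(row["patient_id"], row["age"])
```

If you use pandas, `read_csv` and `to_csv` infer gzip compression from a `.gz` file extension by default, so in many cases only the filename needs to change.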

Switching to compressed files changes the output filenames, so if you have a workspace with a large amount of data in uncompressed CSV files, ask tech-support about moving to compressed CSVs; we can help you do this efficiently.

Link to documentation