Using compressed data files by default

02 February 2023

Using compressed files is now the default recommendation in documentation and templates. On the backends, where datasets can be very large, using uncompressed files significantly slows execution and consumes more disk space.

The research-template has been updated to generate csv.gz files from cohortextractor by default, and the examples and Getting Started documentation have been updated to match.

In addition, the recommendations for using compressed formats for further data files in Python, R and Stata have been updated.
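As a rough illustration of what working with compressed files can look like in Python, here is a minimal sketch that writes and reads a gzip-compressed CSV using only the standard library. The helper names (`write_csv_gz`, `read_csv_gz`), the file name `output.csv.gz` and the column names are hypothetical and chosen for the example. They are not taken from the research-template or the documentation.

```python
import csv
import gzip
import os
import tempfile


def write_csv_gz(path, fieldnames, rows):
    """Write a list of dicts to a gzip-compressed CSV file (hypothetical helper)."""
    # "wt" opens the gzip stream in text mode so the csv module can write to it.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def read_csv_gz(path):
    """Read a gzip-compressed CSV file back into a list of dicts (hypothetical helper)."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return [dict(row) for row in csv.DictReader(f)]


# Illustrative data only; the csv module reads every value back as a string.
rows = [
    {"patient_id": "1", "age": "42"},
    {"patient_id": "2", "age": "67"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv.gz")
    write_csv_gz(path, ["patient_id", "age"], rows)
    loaded = read_csv_gz(path)
```

If you use pandas instead, `pandas.read_csv` and `DataFrame.to_csv` infer gzip compression from a `.gz` file extension by default, so a path ending in `.csv.gz` is usually all that is needed.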

Because switching to compressed files changes the output filenames, if you have a workspace with a large amount of data in uncompressed CSV files, ask tech-support about moving to compressed CSVs. We can help you do this efficiently.