
Log File Security Policy

1: Purpose

This policy sets out the requirements Users must follow to ensure that log files used to check code for projects within the OpenSAFELY service do not contain any disclosive data.

This policy contains the following sections:

  1. Purpose
  2. General
  3. Scope
  4. Policy requirements
  5. Further information
  6. Version history

2: General

Log files can be used to check that code is running as expected and, when errors are encountered, to debug code. They contain the output you would see when running code locally in a terminal or a development environment such as RStudio or VS Code. They are generated and saved automatically whenever an OpenSAFELY job runs, so there is no need to manually save log files as outputs of OpenSAFELY jobs.

When running code on dummy data, they can be found at `metadata/<action_name>.log` in a workspace. When running on real data in the secure environment, they can be found in Airlock, also in the metadata folder.
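As a small illustration of the dummy-data location described above, the sketch below builds the expected log path for an action. The function name `log_path` and the example workspace and action names are hypothetical; only the `metadata/<action_name>.log` pattern comes from this policy.

```python
from pathlib import Path


def log_path(workspace: str, action_name: str) -> Path:
    """Build the path metadata/<action_name>.log inside a workspace folder."""
    # Hypothetical helper: it only joins path components and does not
    # check whether the log file exists.
    return Path(workspace) / "metadata" / f"{action_name}.log"
```

For example, `log_path("my-study", "generate_dataset")` points at the `generate_dataset.log` file in the `metadata` folder of a workspace directory named `my-study`.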

Log files generated from code that is run on real patient data have the potential to contain small amounts of pseudonymised patient-level data. Various types of command might lead to data being saved in a log file. Examples of such commands include:

| Type of command | Examples |
|---|---|
| Print the first 10 rows of a table | R: `head(dataframe, 10)`; Python: `dataframe.head(10)`; Stata: `list in 1/10` |
| Print the last 10 rows of a table | R: `tail(dataframe, 10)`; Python: `dataframe.tail(10)` |
| Print part/whole table | R: `print(dataframe)`; Python: `dataframe`; Stata: `list` |
| Print the value of a cell | `dataframe[row,column]`, `summary(dataframe)` |
| Print list of values | `print()`, `dataframe[column]`, `dataframe$column` |

3: Scope

It is the responsibility of Users to ensure that log files generated from running code on real patient data do not contain disclosive data.

4: Policy requirements

Commands such as those listed above must only be used on dummy data. Users are strongly advised to minimise the volume of dummy data (number of columns and rows) as far as possible.

Such commands must not be used on real patient data. To prevent this, Users must not push code containing such commands to GitHub; that code must remain in the User’s codespace or on their local computer.
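One way a User might check their own code before pushing is to scan it for the kinds of commands listed in this policy. The sketch below is a hypothetical helper, not an OpenSAFELY tool. The patterns are assumptions drawn from the examples above and will miss many other ways of printing data, so it cannot replace a manual review.

```python
import re

# Hypothetical patterns based on the example commands in this policy.
# They are illustrative only and will miss many ways of printing data.
RISKY_PATTERNS = [
    re.compile(r"\bhead\s*\("),     # head(dataframe, 10) / dataframe.head(10)
    re.compile(r"\btail\s*\("),     # tail(dataframe, 10) / dataframe.tail(10)
    re.compile(r"\bprint\s*\("),    # print(dataframe)
    re.compile(r"\bsummary\s*\("),  # summary(dataframe)
    re.compile(r"^\s*list\b"),      # Stata: list, list in 1/10
]


def find_risky_lines(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that match any risky pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in RISKY_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

A check like this is deliberately broad: it flags every `print(` call, including harmless ones, so each hit still needs to be judged by the User.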

All code that is pushed to GitHub is auditable. If a project is found to contain commands that print patient-level data to a log, this will trigger a discussion with the Users of that project about how to prevent it happening again, and whether the inclusion of these commands constitutes a breach of OpenSAFELY Policies and/or the Data Access Agreement.

Log files are not considered for release from the secure environment, except in exceptional circumstances, because of the higher risk that they contain disclosive data and the difficulty of checking them. See guidance here.

5: Further information

OpenSAFELY automatically generates a log for each job that is run. Users only see the last 10 KB of these logs – typically a few hundred lines, rather than the full file.
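To make the idea of showing only the end of a log concrete, the sketch below reads at most the final bytes of a file. It is an illustration of the general technique, not OpenSAFELY's actual implementation. The function name `read_log_tail` is hypothetical, and treating the limit as 10 * 1024 bytes is an assumption.

```python
import os

TAIL_BYTES = 10 * 1024  # assumption: the limit is read as 10 * 1024 bytes


def read_log_tail(path: str, max_bytes: int = TAIL_BYTES) -> str:
    """Return at most the last max_bytes of the file at path, decoded as text."""
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        # Skip everything except the final max_bytes (or nothing, if the
        # file is shorter than that).
        handle.seek(max(0, size - max_bytes))
        data = handle.read()
    # The cut may land mid-character, so undecodable bytes are replaced.
    return data.decode("utf-8", errors="replace")
```

Because errors usually appear at the end of a job's output, keeping the tail rather than the head preserves the lines most useful for debugging.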

These logs are produced by Users’ own analysis code running inside the OpenSAFELY secure environment. They are used to diagnose errors. In rare cases a user might accidentally log a very small amount of refined, pseudonymised patient-level information, such as a few rows printed during debugging. As described in the national DPIA, this would only be a small subset of information, derived from complete EHR records, about a small number of arbitrarily sampled patients, with all identities fully pseudonymised, and only accessible inside the secure environment to users already approved by NHS England. No record-level data can be released from this environment.

Even in this rare scenario the multi-layered controls still apply. Access requires VPN and multi-factor authentication, and only approved users can view logs. As explained in our public walkthrough of the security model, these safeguards significantly limit any risk arising from accidental logging.

Truncating the displayed portion of logs reduces this already minimal possibility even further, while preserving the information that users rely on to understand and fix errors. Only the on-screen view is truncated. The full log is still stored securely, and OpenSAFELY support can provide access in the exceptional cases where this is required. Contact your co-pilot or tech-support if you would like to discuss this. 

Users must not take any steps to subvert the log file filtering process.

Technical information

  • Log files contain the outputs from `stdout` and `stderr`.
  • Log files for ehrQL actions contain details of how long each query took to run, which may help with determining what is taking time when running an ehrQL job.
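The sketch below illustrates why both ordinary output and warnings end up in a log: anything written to either `stdout` or `stderr` is collected. It captures the two streams in-process using Python's standard library and is not how OpenSAFELY itself records logs. The names `capture_like_a_log` and `example_job` are hypothetical.

```python
import contextlib
import io
import sys


def capture_like_a_log(func) -> str:
    """Run func and return everything it wrote to stdout or stderr."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        func()
    return buffer.getvalue()


def example_job() -> None:
    print("rows processed: 1000")                 # written to stdout
    print("warning: slow step", file=sys.stderr)  # written to stderr
```

Calling `capture_like_a_log(example_job)` returns a single string containing both messages, which is why a command that prints a table would appear in the log regardless of which stream it uses.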