Introducing Airlock | Bennett Institute for Applied Data Science

To conduct research in the OpenSAFELY platform, researchers submit requests to run jobs against sensitive health data on a secure server. These jobs produce outputs – tables and charts – which are also stored on a secure server.

We’ve recently built a tool, Airlock, which streamlines the process of viewing the outputs of a research job, and of releasing some of the outputs so that they can be shared with a wider audience.

In this blog post I’ll outline how Airlock works, and talk about how we make sure our documentation stays in sync with the code.

Full documentation about using Airlock is available here. You can also watch Tom give a live demo of Airlock in this video from the OpenSAFELY Symposium in November.

Airlock is a web application: users access it via a web browser running on a remote desktop on a secure server. When you log in, you see a list of the workspaces associated with your OpenSAFELY project.

A screenshot of the workspaces available to a hypothetical user called ‘Rachel Researcher’. She has one workspace called ‘my-workspace’.

A workspace contains the outputs of the jobs that you have run against data, organised in a familiar directory structure, and you can click on a filename to view its contents.

A screenshot showing the files available within the workspace ‘my-workspace’. A CSV file is highlighted, and the values from within that file are displayed in a nicely formatted panel.

Having viewed your outputs, you might decide to change your code and run more jobs.

Once you are happy with your outputs, you can request that some of them be released, so that you can share them with collaborators or prepare them for publication. You add output files to a request, either individually or in groups.

A screenshot showing the page within Airlock that allows users to select which files they would like to add to a release request.

Once you have added all the files you would like to release to a request, you need to explain why you need to release them, and describe the steps you have taken to minimise the risk that the outputs are disclosive. (We provide extensive documentation with guidance about applying statistical disclosure control.)

A screenshot of the release request page, showing where a user needs to add information supporting their release requests, including surrounding context and what controls they have applied.

When you submit your request, it will be reviewed by two independent members of the OpenSAFELY team. They will check that the outputs are appropriate to be released and that suitable disclosure control has been applied.

If they are both happy with all the outputs in the request, then they will release the files from the secure server to the jobs site, where they can be viewed by your collaborators. If you have approval from NHS England, the outputs can then be published.

If the output checkers have concerns, they will return the request to you. You will then need to provide more information about your request, remove a disclosive file from it, or run a new job to produce new outputs with different controls applied.
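The review rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Airlock's actual code: the names `Outcome`, `REQUIRED_REVIEWERS` and `decide` are made up for this example, and it assumes each reviewer records a simple approve/reject vote per file.

```python
from enum import Enum

# Assumption for illustration: a request needs exactly two completed reviews.
REQUIRED_REVIEWERS = 2


class Outcome(Enum):
    PENDING = "pending"    # still waiting for a second reviewer
    RELEASED = "released"  # every reviewer approved every file
    RETURNED = "returned"  # at least one file was not approved


def decide(files, reviews):
    """Decide what happens to a release request.

    files:   list of file names in the request
    reviews: mapping of reviewer name -> {file name: True if approved}
    """
    if len(reviews) < REQUIRED_REVIEWERS:
        return Outcome.PENDING
    for votes in reviews.values():
        # A file with no recorded vote is treated as not approved.
        if not all(votes.get(name, False) for name in files):
            return Outcome.RETURNED
    return Outcome.RELEASED
```

For example, `decide(["table.csv"], {"alice": {"table.csv": True}, "bob": {"table.csv": False}})` gives `Outcome.RETURNED`, because one of the two (hypothetical) reviewers was not happy with the file.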

Keeping our documentation up to date

It can be a challenge to keep documentation up to date when an application is under development. To help with this while developing Airlock, we use a tool called Playwright.

Playwright is a tool designed to help with automated testing of web applications, by letting us programmatically control a web browser. For instance, it lets us write code to visit a page in a browser, fill in some fields in a form, click on a button, and then check that the page updates as we expect.
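To give a flavour of what this looks like, here is a minimal sketch using Playwright's Python API. The form, the label text and the function names are invented for this example and are not taken from Airlock; the page is a small piece of inline HTML so that the sketch does not depend on a running application.

```python
# A made-up page standing in for a real application screen.
FORM_HTML = """
<label>Reason for release <input id="reason"></label>
<button id="submit">Submit</button>
<p id="status"></p>
<script>
  document.getElementById("submit").addEventListener("click", () => {
    const reason = document.getElementById("reason").value;
    document.getElementById("status").textContent = "Submitted: " + reason;
  });
</script>
"""


def submit_reason(page, reason):
    """Drive the form as a user would: fill in a field, then click a button."""
    page.get_by_label("Reason for release").fill(reason)
    page.get_by_role("button", name="Submit").click()


def run_demo():
    # Imported here so that the helper above can be read without
    # Playwright installed.
    from playwright.sync_api import expect, sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(FORM_HTML)
        submit_reason(page, "Figure for a paper")
        # Check that the page updated as we expect.
        expect(page.locator("#status")).to_have_text(
            "Submitted: Figure for a paper"
        )
        browser.close()
```

Calling `run_demo()` requires Playwright and a Chromium browser to be installed. In a real test suite the page would be loaded from the application under test rather than from an inline string.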

We make significant use of it in testing Airlock. This video shows Playwright whizzing through an end-to-end test that validates the whole process of creating, submitting, reviewing, and releasing a request.

We run this test, and many others like it, every time we add new code, and we do not allow ourselves to deploy code if any of our tests fail.

(At the time of writing, we have 750 separate automated tests with around 10,000 lines of test code, compared to around 5,000 lines of production code. Most of these 750 tests do not use Playwright, but instead test parts of the system in isolation.)

What does this have to do with keeping the documentation up to date?

Among our automated tests are a handful of special tests that instruct Playwright to capture screenshots of the screens being tested. These screenshots are written directly to our documentation directory, so that when we make changes to our code, the documentation can be kept in sync.
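A screenshot-capturing test of this kind might look roughly like the sketch below. The directory, URL and function names are hypothetical and chosen only for illustration; the one Playwright call used is `page.screenshot(path=...)`, which saves an image of the current page to the given path.

```python
from pathlib import Path

# Hypothetical locations, chosen for illustration only.
DOCS_SCREENSHOT_DIR = Path("docs/screenshots")
WORKSPACES_URL = "http://localhost:8000/workspaces/"


def capture_for_docs(page, name, docs_dir=DOCS_SCREENSHOT_DIR):
    """Save a screenshot of the current page into the documentation tree."""
    docs_dir = Path(docs_dir)
    docs_dir.mkdir(parents=True, exist_ok=True)
    target = docs_dir / f"{name}.png"
    page.screenshot(path=target)  # Playwright writes the image to this path
    return target


def document_workspace_list(page):
    """Visit a screen and refresh the image the documentation refers to."""
    page.goto(WORKSPACES_URL)
    return capture_for_docs(page, "workspace-list")
```

In a pytest suite, a function like `document_workspace_list` would typically be written as a test that receives `page` from the pytest-playwright plugin. Because the file name stays the same from run to run, the documentation keeps pointing at the same image while its contents follow the code.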

As always, you can find the full Airlock source code (along with the code for all other OpenSAFELY components) on GitHub.