How OpenSAFELY began | Bennett Institute for Applied Data Science

OpenSAFELY was born out of the COVID-19 pandemic. Within weeks of the first official confirmation of cases of “viral pneumonia” by Chinese authorities in late 2019, governments and health teams all over the world began to make preparations for pandemic response.

At the Bennett Institute – then still called “the DataLab” – we did the same. We immediately started asking ourselves: what would colleagues in healthcare need to tackle a full-blown pandemic? How could we make use of the National Health Service’s existing datasets? What could we do, on a practical level, to contribute?

Two days after the World Health Organization officially declared COVID-19 a pandemic, we published an article on the British Medical Journal website, in which we said:

“There are numerous sources of data that can be better exploited from primary and secondary care, each with their own attendant barriers […]. COVID-19 shows more clearly than ever that we can and must deliver clean, real-time, standardised data to support direct care and all aspects of system planning and response. This is not a ‘back-office expense’ to be minimised, but a core part of delivery.”

The team started brainstorming ideas straight away. The most interesting was a “data platform to give health researchers what they will need […] fast, secure access to large volumes of COVID-19 related data”. It was initially known as “The Open COVID Research Platform”.

Jessica Morley, who was our Policy Lead at the time, later wrote a detailed history of those early days, and reflected:

“We realised that there was an increasingly urgent need to answer questions such as: which demographic characteristics or medical conditions made people more vulnerable to COVID-19? Which drugs might help or hinder the treatment? And what happens to people after they have recovered from initial infection? We also realised that to answer these questions quickly, researchers would need rapid access to unprecedented volumes of clinical data, and a means of conducting high-quality analytics in a collaborative fashion. We concluded that rather than a single study or data source, the NHS needed a platform that would enable many data analysis studies to be conducted in a single secure environment.”

Within days we wrote a joint letter, with colleagues from the London School of Hygiene and Tropical Medicine, to the Secretary of State for Health. In it we proposed a platform of some kind: something open for research, but safe for patient privacy. Soon after that, we asked our contacts at health records company TPP if they might be interested in collaborating on something. They immediately said yes.

On the same day, the government issued a “Control of Patient Information” (COPI) notice to the Chief Executive of the NHS, which provided an in-principle legal basis for accessing electronic health records on behalf of NHS England – a prerequisite for making the platform work. But a legal basis is only part of the story: there would still need to be a trustworthy way to achieve that access. Things were moving extremely fast.

One week after the first UK lockdown was announced, we committed the first line of code to GitHub – work on the platform had begun. But it still didn’t have a finalised name: agreeing on one took a few more weeks of debate and friendly argument. Before the end of April 2020, we’d settled on “OpenSAFELY.”

On 7 May 2020, the first scientific paper written using OpenSAFELY was posted as a preprint: “OpenSAFELY: factors associated with COVID-19-related hospital death in the linked electronic health records of 17 million adult NHS patients.”

This was a big moment. We were showing that OpenSAFELY worked as a fully open-source, privacy-preserving software platform, capable of running open and reproducible analytics across electronic health records, all held securely in situ. Working with colleagues from the Electronic Health Records research group at the London School of Hygiene and Tropical Medicine, NHS England, and TPP, we’d got it up and running in just 42 days.

As 2020 rolled on, that paper was formally published in Nature, and we expanded OpenSAFELY’s reach to include data from the other major electronic health records company, EMIS, as well as TPP. Now the platform provided researchers with access to more than 55 million patient records – more than 95% of all patient data in England.

Towards the end of that year – partly in response to requests from privacy campaigners – we created the Jobs Dashboard at jobs.opensafely.org, so that anyone could see the work running on the platform.

In December, the first COVID-19 vaccine was approved for use in the UK. The NHS vaccination programme began just six days later. Within ten days of the first vaccine being administered, OpenSAFELY delivered its first live dashboards showing which kinds of patients were, and were not, receiving the vaccine.

The next big milestone came in 2021, when we published the first federated analysis using data from both TPP and EMIS: “Trends and clinical characteristics of COVID-19 vaccine recipients: a federated analysis of 57.9 million patients’ primary care records in situ using OpenSAFELY.”

Jessica Morley noted:

“This federated analysis was a truly massive technical achievement […] it was driven, as ever, by the combination of skills that no single individual, or even team, is ever likely to embody alone, across EHR data analysis, EHR system design, software development, data management, open science, and more.”

Since then, it’s been that collaborative team thinking that has pushed OpenSAFELY forward as a continuously evolving and improving digital tool for researchers. Our goal has always been to be a team that goes beyond research – we wanted to use data to build machines that act in the world, machines that make a practical, tangible difference. Tools that you can use to get things done. OpenSAFELY has got a great deal of practical, tangible work done: 86 peer-reviewed and published papers so far, and counting.

OpenSAFELY began with the pandemic, and has been largely funded to date on the understanding that research conducted through it relates to COVID-19 – but we think it has a bright future ahead. We’d like to expand its reach, to get more researchers using it in more organisations. We’re already looking at ways we could use the OpenSAFELY model in other aspects of health care, beyond electronic health records. Or beyond health care entirely.

The journey so far has not been easy. There have been downs, as well as ups. But the team is committed, hard-working and determined. We have a lot more work to do yet.