A rallying cry for data in the NHS has been “Data Saves Lives”; it does, and it can. In this article I want to provide you with a working knowledge of how your data moves around the NHS without resorting to three-letter acronyms or technical jargon. There are hundreds of millions of pounds being invested in NHS data infrastructure and the next five years provide a golden opportunity for the UK to deliver better, cheaper, faster healthcare by using data effectively.
How NHS Data is currently moved around
We expect organisations to collect our data and use it for their own purposes. Supermarkets understand shopping habits, social media recommends content and governments track individuals across continents.
We might assume that the NHS, as one national system that conducts 590 million patient contacts a year, would have a giant database of records, prescriptions and procedures. This is not the case. There are lots of sources of data in the NHS, but they are local, fragmented, and often incomplete with some hospitals still relying on paper to manage patient information.
A general rule within the NHS is that the detailed records stay with the service you use, and summary data gets moved around. When you go to A&E near your home it is likely that there is a link between the hospital and your GP records. This reduces how much time each patient has to spend in A&E. However, if you visit an A&E in a different part of the country the data shared is limited to medication and allergies. They won’t know about your recent outpatient appointments or call up the X-ray from the time you broke your leg three years ago, even if it is directly relevant (like breaking that leg again).
Even within the same hospital there are silos of information. Nurses working across wards might need thirty-plus user names and passwords to access the various systems that have been implemented across the hospital. You may have experienced this as a patient as you provided the same information to multiple staff members from different departments or wondered why you have to be the one to tell the hospital when you last came in and what for.
The difficulty in linking local systems translates to challenges in asking national questions. This is partly due to the devolution of health. Each part of the UK has decided to collect data in slightly different formats, for slightly different purposes and to slightly different levels of accuracy and completeness. This means there is no UK-wide picture of health. As an example, there has been a recent push to get blood pressure checks in barbers because of fears of dangerously high blood pressure going unnoticed. It would be useful to understand how many people with a recording of high blood pressure at a GP surgery went into A&E within the next month, but there is no way of getting that information. This also impacts how patient care is monitored. The longer and more complicated the journey from local system to national oversight, the slower the response is to identifying unsafe services and the longer patients are put at unnecessary risk.
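To make the blood pressure question concrete, here is a minimal sketch of the query that linked data would allow. It assumes that GP readings and A&E attendances share a pseudonymous patient ID, which is exactly what is missing today. All records, names and the 30-day window are made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical, made-up records keyed by a shared pseudonymous patient ID.
gp_high_bp_readings = [
    ("p1", date(2023, 1, 10)),
    ("p2", date(2023, 1, 15)),
    ("p3", date(2023, 2, 1)),
]
ae_attendances = [
    ("p1", date(2023, 1, 25)),  # 15 days after the reading
    ("p2", date(2023, 3, 20)),  # 64 days after the reading
    ("p4", date(2023, 1, 12)),  # no GP reading on record
]


def attended_ae_within(readings, attendances, window_days=30):
    """Return IDs of patients who attended A&E within window_days of a high reading."""
    window = timedelta(days=window_days)
    # Index attendances by patient so each reading only scans that patient's visits.
    visits_by_patient = {}
    for patient_id, visit_date in attendances:
        visits_by_patient.setdefault(patient_id, []).append(visit_date)
    matched = set()
    for patient_id, reading_date in readings:
        for visit_date in visits_by_patient.get(patient_id, []):
            if reading_date <= visit_date <= reading_date + window:
                matched.add(patient_id)
    return matched


matched = attended_ae_within(gp_high_bp_readings, ae_attendances)
patients_with_reading = {patient_id for patient_id, _ in gp_high_bp_readings}
print(f"{len(matched)} of {len(patients_with_reading)} patients attended A&E within 30 days")
```

The logic itself is trivial. The hard part, as described above, is getting both datasets into one place with a common identifier.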
Given the challenges in stitching data together across nations, the rest of this article is going to focus on England; apologies to Wales, Scotland and Northern Ireland.
The purposes of data to save lives in the NHS
Most people are familiar with their data being used for direct care (what the person treating you needs to know about you to make the right decision) but there are three other purposes that save lives:
- Managing Population Health
- Research (and Innovation)
- Operating and Planning NHS Services
These purposes mean that there are different requirements and, much like the NHS doesn’t make photocopiers or fax machines, the NHS doesn’t own the technology that GPs or hospitals use to manage electronic healthcare records. The consequence of differing requirements and multiple technologies is that data is often duplicated and moved around different parts of the NHS using different methods. The downside of an electronic version of the NHS is that there are lots of machines, all storing data in slightly idiosyncratic ways with their own secure interfaces on secure networks that don’t talk to each other. This invites a comparison to older fax technology, still used by the NHS today. Faxing was simple because each fax machine moved data in the same way and was connected to the same network. It turns out that faxes, despite being very insecure and manual, are a wonderfully simple and reliable way of moving information from human eyes to other human eyes.
Below are two figures that illustrate a simplified version of how information for each of these purposes flows between use cases and geographic areas, so that we can take a bird’s eye view.
There is generally very good data in providers that is held locally and there is a huge opportunity for local areas to combine data to provide integrated services across the use cases above. Nationally, there are big gaps that restrict the information available to decision makers in a sensible timeframe about where to invest, what is working and what isn’t.
Figure 1. Basic data requirements of NHS data
Figure 2. An oversimplified diagram of how data moves through the NHS
Managing Population Health
Managing Population Health can be described as “How can we proactively identify groups of people at risk and support interventions to improve health”. From a data perspective you need to identify the person, so you need to know names, NHS numbers and contact details. Sharing this data around is the cause of all major security breaches and a good way to get fined (up to £17.5m or 4% of global turnover).
There is no population health management solution available nationally. The closest we have come to that was during covid where the rules on sharing data within the NHS were relaxed and allowed the NHS to do an amazing job of risk stratifying the entire population and aligning them with the vaccination cohorts and then messaging each person about it. This was an extraordinary achievement. The technology that pulled limited information from GP records and curated them in a single source is not being used to create a national population health management solution because when this data was extracted, we were in a pandemic and during that time the NHS could share confidential data nationally without going through the normal information governance procedures (so long as the data was being used to respond to the pandemic).
Why is this less than ideal
The NHS can’t easily risk stratify the population for different diseases to pro-actively manage care or compare how different patient populations are using NHS services. We often hear about “frequent fliers” in GP surgeries, but we can’t put a number on that nationally to understand the scale of the problem and what is driving usage. This means local systems can’t act to protect their populations and we continue to treat illness instead of maintaining health.
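As an illustration of what “putting a number on it” could look like, the sketch below counts frequent attenders from a list of appointments. The threshold of three appointments, the patient IDs and the variable names are all hypothetical. A real definition would need clinical input and a defined time period.

```python
from collections import Counter

# Hypothetical data: one pseudonymous patient ID per GP appointment.
appointments = ["p1", "p2", "p1", "p3", "p1", "p2", "p1"]

# Assumed cut-off for a "frequent flier"; not an official definition.
FREQUENT_THRESHOLD = 3

visits = Counter(appointments)
frequent = sorted(pid for pid, n in visits.items() if n >= FREQUENT_THRESHOLD)

# Share of all appointments used by the frequent group.
share = sum(visits[pid] for pid in frequent) / len(appointments)

print(f"{len(frequent)} of {len(visits)} patients account for {share:.0%} of appointments")
```

Run locally, this kind of count is easy. Run nationally, it needs the primary care data to be available in one place, which is the gap described above.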
What is being done about it
Not a lot. For now, the NHS in England is limited by restrictions that confine data pulled up from primary care to covid usage. There are other health systems internationally with similar data protection rules where this movement is permitted. If the restrictions on data sharing within the NHS were removed, it could form the basis for a world class preventative population management platform that leverages the scale of the data to learn what works to improve health and reduce healthcare costs. All population health management is currently reliant on local implementations of technology. This means that where you live will determine whether your NHS has the capability to pro-actively target and manage your care across services.
Research (and Innovation)
Research brings non-NHS organisations into the picture: epidemiologists want to know what’s happening to diseases, pharmaceutical companies want to know if their drugs are working, and technology companies want to know if their algorithms are cost effective. Most people (73%) want their data to be used to improve the health of others, but this relies on the trust patients have that the NHS will keep their data safe. This trust is put at risk when the purposes and people using the data are not made clear. The recent indefinite delay of General Practice Data for Planning and Research is a great example of public pressure preventing a reasonable effort to support NHS services due to a lack of communication and transparency.
In the bottom right of Figure 2 there is a parallelogram that isn’t connected to the other data streams and that has a unique solution to this problem. Instead of moving data around the UK, OpenSAFELY moves the analysis to the data. This has meant that for the first time it has been possible to perform research using primary care data at a patient level on a national scale. Oddly, OpenSAFELY isn’t quite part of the NHS; it is a project run by Ben Goldacre based at the University of Oxford. Again, this is only possible due to covid, and only covid analysis is currently allowed on the system.
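The idea of moving the analysis to the data can be sketched in a few lines. This is not the platform’s real interface: the `SecureEnvironment` class, the field names and the small-number threshold of five are all assumptions made for illustration. The point is only that the researcher hands over a function and gets back an aggregate, never the patient rows.

```python
from typing import Callable, Optional


class SecureEnvironment:
    """Hypothetical stand-in for a data holder that never releases patient rows."""

    def __init__(self, records, min_count=5):
        self._records = records      # stays inside the environment
        self._min_count = min_count  # assumed disclosure threshold

    def run(self, analysis: Callable[[list], int]) -> Optional[int]:
        """Run the researcher's analysis next to the data and return only a count."""
        count = analysis(self._records)
        # Suppress small counts so that individuals cannot be singled out.
        return count if count >= self._min_count else None


def count_older_patients_on_medication(records):
    """The researcher's code: it sees rows only while running inside the environment."""
    return sum(1 for r in records if r["age"] >= 65 and r["on_medication"])


# Made-up records held by the data holder.
gp_records = (
    [{"age": 70, "on_medication": True}] * 6
    + [{"age": 40, "on_medication": False}] * 4
)

environment = SecureEnvironment(gp_records)
result = environment.run(count_older_patients_on_medication)
print(result)  # 6
```

In this sketch the only thing that leaves the environment is a single number, and even that is withheld when it is small enough to risk identifying someone.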
Why is this less than ideal
For all other areas outside of covid, researchers need to either have access to disease specific datasets or work with local providers. Even if the use case were expanded to include areas beyond covid, the OpenSAFELY team are constrained by their ability to approve new projects whilst maintaining the infrastructure.
What is being done about it
A third route is now available through local, regional and national “secure data environments” for which there has been a recent announcement of £200m funding from NHS England. There is a national push to use a common data standard across all these secure data environments which will allow for federated analysis across local systems.
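A rough sketch of why a common data standard matters for federated analysis: once each local extract is mapped to the same shape, one query can run everywhere and only the counts need to be combined. The field names, the mapping functions and the data below are all hypothetical and do not represent any specific standard.

```python
# Two made-up local extracts that describe the same thing in different shapes.
site_a_rows = [
    {"nhs_no_hash": "a1", "sex": "F", "yob": 1950},
    {"nhs_no_hash": "a2", "sex": "M", "yob": 1990},
]
site_b_rows = [
    {"pid": "b1", "gender": "female", "birth_year": "1948"},
    {"pid": "b2", "gender": "female", "birth_year": "2001"},
]


def site_a_to_common(row):
    """Map site A's local format to the assumed common format."""
    return {"patient_id": row["nhs_no_hash"], "sex": row["sex"], "birth_year": row["yob"]}


def site_b_to_common(row):
    """Map site B's local format to the assumed common format."""
    sex = {"female": "F", "male": "M"}[row["gender"]]
    return {"patient_id": row["pid"], "sex": sex, "birth_year": int(row["birth_year"])}


def count_women_born_before(rows, year):
    """One query, written once against the common format."""
    return sum(1 for r in rows if r["sex"] == "F" and r["birth_year"] < year)


# Each site runs the same query locally; only the counts are shared and summed.
local_counts = [
    count_women_born_before(map(site_a_to_common, site_a_rows), 1960),
    count_women_born_before(map(site_b_to_common, site_b_rows), 1960),
]
national_count = sum(local_counts)
print(local_counts, national_count)  # [1, 1] 2
```

Without the mapping step, the query would have to be rewritten for every site, which is where much of the effort goes today.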
Operating and Planning NHS Services
Operating and Planning NHS Services is a secondary use of healthcare data: managing and planning jobs, beds and new hospitals today, as well as improving how people are cared for by changing the way the NHS works tomorrow. For this, you don’t need personal data, but you do need to be able to join patients across the different NHS services so that you know information such as how many patients were admitted to hospital, discharged, and then turned up at A&E a week later (something we want to avoid). The operational side of this data often gets overlooked. Health systems need access to close to real time data in order to make decisions such as where ambulances should be sent, or where there are available beds in mental health trusts for people in crisis.
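One common way to join patients across services without passing names or NHS numbers around is to replace the identifier with a keyed hash before the data leaves each service. The sketch below assumes this approach. The key, the made-up NHS numbers and the seven-day window are illustrative only, and this is not a description of how any NHS system actually does it.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Hypothetical secret held by a trusted party, never shared with analysts.
SECRET_KEY = b"example-key-for-illustration-only"


def pseudonymise(nhs_number: str) -> str:
    """Turn an identifier into a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, nhs_number.encode(), hashlib.sha256).hexdigest()


# Made-up records; analysts only ever see the pseudonym, not the number.
discharges = [
    (pseudonymise("9990000001"), date(2023, 5, 1)),
    (pseudonymise("9990000002"), date(2023, 5, 3)),
]
ae_arrivals = [
    (pseudonymise("9990000001"), date(2023, 5, 6)),   # 5 days after discharge
    (pseudonymise("9990000002"), date(2023, 5, 20)),  # 17 days after discharge
]


def returned_within(discharges, arrivals, days=7):
    """Count patients who arrived at A&E within `days` after a hospital discharge."""
    window = timedelta(days=days)
    returned = set()
    for patient, discharged_on in discharges:
        for other, arrived_on in arrivals:
            if patient == other and discharged_on < arrived_on <= discharged_on + window:
                returned.add(patient)
    return len(returned)


print(returned_within(discharges, ae_arrivals))  # 1
```

Because the same number always produces the same pseudonym, records from different services line up, yet nobody doing the analysis needs to know who the patient is.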
Nationally, this relies on bulk data sent from each provider (labelled as strategic and commissioning datasets). There are currently 90 data standards and 182 collections. That means 182 datasets that must be curated, validated and transferred. These differ from extractions that we discussed above in that there is no direct link from these systems to a national database and many of these collections have some manual elements that require spreadsheet tinkering each day/week/month.
There is detailed record level data that is sent by acute hospitals, community services and mental health trusts but it takes around 4 weeks for these to be properly processed and outside of the acute hospitals the data is of such variable quality that it is very difficult to use it for planning or managing operational resources.
You will also notice that this data flows one way. There is no feedback cycle whereby data is shared back to providers outside of the published statistics. This feedback cycle is critical for improving the quality of data as providers are then incentivised to provide accurate data as it supports their own operational needs. Instead, questions that should be able to be answered by previous collections require a new collection due to poor data quality and so the number of collections increases whilst data quality remains static.
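A feedback cycle does not need to be sophisticated to be useful. The sketch below shows an automated check that could be run on each submission and sent straight back to the provider. The required fields and the example rows are hypothetical and are not taken from any real NHS data standard.

```python
from collections import Counter

# Hypothetical required fields; a real data standard would define many more rules.
REQUIRED_FIELDS = ("patient_id", "admission_date", "ward")


def check_row(row):
    """Return a list of problems found in one submitted row."""
    return [f"missing {field}" for field in REQUIRED_FIELDS if not row.get(field)]


def quality_report(submission):
    """Summarise problems across a submission so it can be returned to the provider."""
    issues = Counter()
    valid_rows = 0
    for row in submission:
        problems = check_row(row)
        issues.update(problems)
        if not problems:
            valid_rows += 1
    return {"rows": len(submission), "valid_rows": valid_rows, "issues": dict(issues)}


# Made-up submission with two incomplete rows.
submission = [
    {"patient_id": "x1", "admission_date": "2023-05-01", "ward": "A1"},
    {"patient_id": "x2", "admission_date": "", "ward": "B2"},
    {"patient_id": "", "admission_date": "2023-05-02", "ward": ""},
]

report = quality_report(submission)
print(report)
```

A report like this gives the provider something it can act on the same day, rather than discovering the problem months later when a new collection is requested.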
Why is this less than ideal
Running a national health service based on manually submitted templates rather than automated pipelines is prone to error and means that data teams are constantly overloaded with report requests rather than fixing the underlying data flows or standards.
In addition, as only the hospital data is considered reliable, there is less focus on community services, mental health provision and primary care. The questions seem too hard to answer and so business cases for investment are weaker compared to the acute setting which perpetuates the cycle.
What is being done about it
There is a £480m project to build a federated data platform that will link together the different IT systems found in each provider, alongside a programme to speed up the data submissions through direct extraction. By decreasing the technical cost and increasing the speed of data sharing between providers, trusts, local systems, and the national team, the hope is that the NHS will be able to co-ordinate care more effectively and have the information needed to plan which services are needed where without emailing Excel templates. There are no plans for this to include data from GPs, and the platform cannot directly address the data quality issues we see in mental health and community. However, there would at least be a feedback loop to these care settings which may highlight where the largest inaccuracies lie. This would provide the momentum needed to make these data flows fit for purpose, accelerating the capabilities of local areas to take charge of how to design services that best fit their population.
Looking through the data architecture of the NHS in England you can see the patchwork of policy decisions and investments that have layered on top of each other. The explosion of digitising software has brought with it a tangled network of interfaces that has made moving data around the NHS much more frustrating than it needs to be.
Here are four key tests for you to consider in your local area:
- Can you accurately record and share in real time the care that our population is receiving whether it is by the GP practice, A&E, community health team, mental health team or social care team?
- Can you use your population health management infrastructure to identify individuals that need interventions and invite them?
- Can you enable secure data access to patient-level data including primary care and hospital event data, prescribing data, and enrich it with other sets of data (other settings of care, diagnostics, genomics)?
- Can you accurately understand the demand, capacity and performance of the local health care system in close to real time—and into the future—so you effectively manage the health and care system?
Addressing these issues will maximise the value of the information we hold as a population to improve our health and our communities but we can’t just assume that new technology and funding will solve all our problems. It will require central guidance, local leadership as well as empathy for the clinical teams recording and accessing data. Eight ideas to start with:
- Be clear on your data strategy; how do you approach data, how do you make it work for you, how do you get the most out of your investment, what insights are you getting from it?
- Make sure you have a whole system approach to data—including data from community care and mental health as well as primary care and acute
- Get Primary Care data sharing in a similar way to hospital data so that population health management analysis can shape where we use resources
- Do a stock take of your data architecture recognising it will have grown organically and reactively, now is the time to make sure it is fit for purpose
- Create data flows that you can rely on and embed a feedback loop to improve the quality and put the right incentives in place to make this stick by prioritising, incentivising and standardising data collection
- Make staff’s lives easier by eliminating manual collections, duplicative data processes and overburdensome reporting that is a sticking plaster for legacy data issues – instead get your data flowing and automate your reporting
- Collaborate with partners to codify the population health management data infrastructure across localities
- Ensure code that touches record level data within the NHS is open sourced so that best practices can be shared and we put a stop to unnecessary duplications of effort
The UK has been at the forefront of universal care for 75 years and has the potential to provide global leadership in population healthcare services. The next five years are crucial to take advantage of the investments in local integrated data, secure research environments and federated analysis. We have moved past the fax machine and it’s time to move past the emailed Excel attachment, too.
NHS Key Statistics
Zhang, J., Ashrafian, H., Delaney, B. et al. Impact of primary to secondary care data sharing on care quality in NHS England hospitals. npj Digit. Med. 6, 144 (2023). https://doi.org/10.1038/s41746-023-00891-y
NHS blood pressure checks at the barbers to prevent killer conditions
Warren LR, Clarke J, Arora S, et al. Improving data sharing between acute hospitals in England: an overview of health record system distribution and retrospective observational analysis of inter-hospital transitions of care. BMJ Open 2019;9:e031637. doi: 10.1136/bmjopen-2019-031637
Healthwatch – How do people feel about their data being shared by the NHS
GP Data for Planning and Research: Letter from Parliamentary Under Secretary of State for Health and Social Care to general practices in England – 19 July
Fisher L, Hopcroft LE, Rodgers S, et al. Changes in medication safety indicators in England throughout the covid-19 pandemic using OpenSAFELY: population based, retrospective cohort study of 57 million patients using federated analytics. BMJ Medicine 2023;2:e000392. doi: 10.1136/bmjmed-2022-000392