Q&A for Big Data Session, BGS Autumn Meeting 2020

25 January 2021

Dr Oly Todd is a registrar at Bradford Teaching Hospitals NHS Foundation Trust. He is currently undertaking a PhD research study at the University of Leeds, funded by the Dunhill Medical Trust, investigating the association of blood pressure in older people with frailty. He chaired this Q&A at the BGS Autumn Meeting 2020; the conference is now available on demand here.

The virtual BGS Autumn Meeting included an hour-long workshop on “Big Data”, with a keynote talk by Dr Andy Clegg on the electronic Frailty Index (eFI) mark II, rapid-fire talks by Jane Masoli, Dr Maruko and James van Oppen, and a discussion with the @geridata panel, chaired by Oly Todd. The workshop looked at some of the key challenges in applying big data to research questions around ageing and highlighted helpful resources. There were lots of good questions, not all of which we could properly address on the day, so we continued the discussion and have written it up below, with some links we hope are helpful. Please get in touch if you would like to get involved in the @geridata collaborative. You can also access the talks on the BGS website here.


  • Atul Anand, Geriatrics Clinical Lecturer, University of Edinburgh

  • Cini Bhanu, GP & PhD student, University College London

  • Andy Clegg, Geriatrics Professor, University of Leeds

  • Joe Hollinghurst, Data Scientist, SAIL, Swansea University

  • Jane Masoli, Geriatrics SpR & PhD student, University of Exeter

  • Marc Osterdahl, Geriatrics SpR, Charing Cross, London

  • Mai Stafford, Data Scientist, The Health Foundation

  • Oly Todd, PhD student, University of Leeds

  • James Van Oppen, A&E SpR, University of Leicester

  • Katherine Walesby, Geriatrics SpR & PhD student, University of Edinburgh

  • Chris Wilkinson, Cardiology Clinical Lecturer, Newcastle University

Vicki Goodwin: Much of the routine data is collected via medical records, but this misses huge amounts of data collected by, e.g., community allied health professionals, which would add detail about functional outcomes and patient reported outcome measures. How can we ensure there is data collection and linkage across different professions?

OT: Thanks Vicki, excellent question. If we want to use routine data for the study of ageing, it is vital that more of the things we know matter get recorded in routine data. Currently, cardiovascular vital signs and diagnoses are well recorded in routine data (e.g. blood pressure (BP) and heart rate, or heart attack and heart failure), but measures such as the Timed Up and Go test, the Barthel Index and the Malnutrition Universal Screening Tool (MUST), or key symptoms such as continence and bowel function, are not. I would welcome your thoughts, and those of others, on how we can promote this. I would suggest there are at least two problems:

  1. These assessments are not routinely or universally undertaken. Here, it would be useful at least to take stock of what has been recorded routinely and compare it with other studies in comparable patient groups, to see how large the gap is.

  2. These assessments may be done but not recorded in a way that is accessible to analysis of routine data. A first step may be to develop consensus code lists for geriatric conditions, and for the tests that would need to be recorded in routine data (i.e. quality of care indicators), to inform future policy and audit.

AA: This is a really important question. We have found locally that community health records tend to include much more free text data. This is often very rich in information, but lacks the simpler coding seen in administrative data. One option to consider is natural language processing (NLP) that uses rule-sets to convert patterns of text into coded data. This is a developing field in electronic health record research, but is likely to grow rapidly as artificial intelligence tools further improve in accuracy. However, I agree that an ideal scenario would be to improve standardisation of data entry through the use of structured questionnaires or similar, which would negate the need for extensive exploratory NLP research.
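To illustrate the rule-set idea in the answer above, here is a minimal, hypothetical sketch of rule-based text coding in Python. The concept labels, regular expressions and the 30/20-character negation windows are assumptions invented for this example; they are not SNOMED CT or Read codes and not any panel member's actual method. Real clinical NLP pipelines handle negation, abbreviations and context far more carefully; this only shows the basic pattern-to-code step plus a crude negation check.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical rule set: each illustrative concept label maps to regex patterns.
# These labels are NOT real clinical codes (SNOMED CT, Read, etc.).
RULES = {
    "falls": [r"\bfall(s|en)?\b", r"\bfell\b"],
    "urinary_incontinence": [r"\burinary incontinence\b", r"\bincontinen(t|ce) of urine\b"],
    "walking_aid": [r"\bzimmer\b", r"\bwalking (stick|frame)\b", r"\brollator\b"],
}

# Crude negation check (an assumption for illustration): a cue word followed by
# at most 20 non-full-stop characters right before the matched phrase.
NEGATION = re.compile(r"\b(no|denies|not|without)\b[^.]{0,20}$", re.IGNORECASE)


@dataclass
class CodedFinding:
    concept: str        # illustrative concept label from RULES
    matched_text: str   # the free-text span that triggered the rule
    negated: bool       # True if a negation cue precedes the match


def code_free_text(note: str) -> list[CodedFinding]:
    """Apply every rule to a free-text note and return coded findings."""
    findings = []
    for concept, patterns in RULES.items():
        for pattern in patterns:
            for match in re.finditer(pattern, note, re.IGNORECASE):
                # Look only at the 30 characters before the match for negation.
                window = note[max(0, match.start() - 30):match.start()]
                negated = bool(NEGATION.search(window))
                findings.append(CodedFinding(concept, match.group(0), negated))
    return findings
```

For example, running `code_free_text("Lives alone. No falls since last review. Mobilises with a zimmer frame. Urinary incontinence noted.")` should code "falls" as negated, and "urinary_incontinence" and "walking_aid" as present. Keeping the rules in a plain dictionary makes them easy to review and agree on as a group, which links back to the point about consensus code lists.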

AC: It is perhaps worth thinking in terms of the different electronic health record (EHR) systems across primary and secondary care, rather than professional groups. Most routine data in the UK come from primary care systems, where data can be entered by a range of primary care professionals, typically GPs and practice nurses, although primary care physiotherapists and pharmacists can enter data too. Community allied health professionals are often employed by community healthcare trusts (secondary care, even though they work in the ‘community’), which use different EHR systems (e.g. RiO). I think part of the problem is that linkage to community health records is less common (although some places, including Bradford, have done this), and researchers working with ‘routine data’ are typically based in either primary care or hospital trusts. So I agree with Vicki that this could be an interesting avenue, but the way in might be through closer engagement with people working in community healthcare settings (typically physiotherapists, occupational therapists, podiatrists, mental health staff and district nurses), who will better understand the data being collected.

Susan Shenkin: Thanks very much for a great session. You've highlighted the great opportunities for using a range of data sources for answering different questions. How should new researchers get started, to understand the pros and cons of the different options?

OT: Thanks Susie. I agree there are pros and cons to routine data research (as presented at this session, using the Clinical Practice Research Datalink (CPRD) or the Secure Anonymised Information Linkage (SAIL) Databank) versus traditional cohort studies (e.g. the English Longitudinal Study of Ageing or the Lothian Birth Cohort). The advantages of routine data centre on their generalisability to real-world populations, while the advantages of cohort studies include better standardisation of measurements and richer data. Choosing one over the other may depend on the research question. Hybrids of the two, such as UK Biobank data linked to routine data, may combine the advantages of both. There are also pros and cons of the different routine datasets (e.g. CPRD, SAIL, ResearchOne, QResearch), which relate to cost, time to gain data access, and linkage to other forms of data (e.g. Hospital Episode Statistics or Office for National Statistics death data), as well as more human factors such as what is available to you locally or already in use by your team or collaborators. It is worth talking to researchers who have published using different datasets. Members of our @geridata collaboration use a range of datasets, and we can direct anyone interested to the relevant panel members.

CW: I agree entirely. I think it is really valuable to discuss the datasets you are interested in using with others who have used them before, so that you can understand the strengths and weaknesses of each. This allows researchers to pick the best dataset for the question and to judge how feasible the proposed project is. Part of the @geridata ambition is to provide a pool of people to contact. This paper might also be helpful for an overview.

CB: I agree with all of the above. I was initially guided by my research team: my department had a lot of expertise with The Health Improvement Network (THIN) database, which is similar to CPRD, so we often share code lists and strategies for data cleaning and management, which has been hugely useful. There are some differences between THIN and CPRD with regard to ethics approval and how much data is supplied for projects. I found THIN best suited to my research question, as prescription data are well coded and it offered the large population numbers my study required.

Vicki Goodwin: ...and make sure that routine data in medical records is of good quality?

CW: This is a really important point! Looking up papers that have used the dataset you’re interested in, checking the dataset’s website, and speaking to others may give you some idea (though not always!). And, of course, acknowledge the limitations of the work, including those of the dataset.

CB: Agree with the above. For the large UK primary care databases, plenty of work has been done to guide how we interpret routine data and ensure it is of suitable quality, factoring in the specific limitations of primary care recording. For example, this study looks at different data quality filters.

Helen Jones: How is access to CPRD controlled?

MO: Hi Helen. Access to CPRD is controlled by the Independent Scientific Advisory Committee (ISAC), which was established to ensure that research using CPRD data is high quality and relevant to the public interest for which the data are collected. Access requires a formal application, set out like a short research protocol, and it is usually beneficial to work with a group that includes someone with knowledge of UK primary care. There is currently an expedited process for work involving Covid-19. More details are available here. Good luck!

Helen Jones: How do you ensure patient reported outcome measures (PROMs) are fed back in practice to help improve services?

JvO: PROM instruments report individuals’ health outcomes and can be used to monitor progress. My view is that the data belong just as much to our patients as they do to us, so we need to use them for maximal benefit. Having PROM data at the clinician-patient interface could help to set an agenda for the consultation and provide focus for clinical communication. For instance, rheumatologists commonly use the Disease Activity Score to monitor treatment response and adjust interventions for people living with rheumatoid arthritis. Of course, there is a risk of bias in extrapolating this approach to a wide scale, as clinicians who are already well attuned to practising person-centred medicine will be more likely to review and reflect on available PROM data in order to identify shared outcome goals. One of our current projects is to observe and evaluate how PROMs are used by Emergency Department clinicians when data collected in the waiting room is made readily available for the consultation… so I’ll report back when we know more!

Deborah Thompson: Hi James, are you planning to include experience-based design? We also have some staff experience-based design work that mirrors the patient experience.

JvO: Currently, quality improvement projects and service redesign initiatives often use service metrics (for example, admission duration or time to medication administration) and all too often neglect to ask patients what matters to them. This is probably because electronic databases automatically and rapidly report service metrics, whereas it takes time and money to gather survey and interview feedback from patients. Having accessible, validated and reliable PROMs for older people will allow clinicians and healthcare managers to include person-centred and meaningful endpoints when monitoring the impact of interventions.

MS: This question about PROMs/functional outcomes is interesting. I’ve been wondering how we could demonstrate the value to healthcare/patient outcomes/patient experience of including functional measures. Let me know if you think of a good analytical project on that!

MS provided the following links from The Health Foundation relating to topics discussed at the BGS session, which may be helpful to people interested in this area:

