FORESIGHT: A novel GPT-based pipeline trained on NHS data
February 7, 2023
Zeljko Kraljevic outlines how these foundation models for medicine could enable the integration of diverse medical data, including electronic health records, images, laboratory values, and biological layers such as the genome and gut microbiome...
Over the past four years, the AI world has surged ahead with large language models (LLMs), also known as “foundation models”, which can be adapted to many linguistic tasks. You have probably seen a plethora of media articles recently about some of these models (ChatGPT, DALL·E 2), which can write coherent essays, generate code, produce art and film, and much more.
With the NHS at breaking point, a critical question is whether these AI approaches could be used to improve care. Hospital records hold detailed information about each patient's health status and general clinical history, a large portion of which is stored as unstructured text. Temporal modelling of this medical history, which considers the sequence of events, could be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses, or anticipate complications.
I have developed Foresight as part of the CogStack platform: a novel GPT-based pipeline trained on NHS data to forecast future medical events such as disorders, medications, symptoms and interventions.
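To make the forecasting formulation concrete: a patient's history is treated as an ordered sequence of coded medical events, and the model learns to predict the next event in the sequence. Foresight itself is a GPT (transformer) model trained on de-identified NHS records; the toy bigram counter below is only a minimal stand-in to illustrate the data shape and prediction interface, and all event codes in it are invented examples, not real NHS or SNOMED codes.

```python
from collections import Counter, defaultdict

def train_bigram(timelines):
    """Count event -> next-event transitions across patient timelines."""
    transitions = defaultdict(Counter)
    for timeline in timelines:
        for prev, nxt in zip(timeline, timeline[1:]):
            transitions[prev][nxt] += 1
    return transitions

def forecast_next(transitions, history, top_k=3):
    """Rank candidate next events given the last observed event."""
    last = history[-1]
    return [evt for evt, _ in transitions[last].most_common(top_k)]

# Hypothetical timelines mixing disorder, medication and complication codes.
timelines = [
    ["type2_diabetes", "metformin", "retinopathy"],
    ["type2_diabetes", "metformin", "neuropathy"],
    ["hypertension", "amlodipine", "type2_diabetes", "metformin"],
]

model = train_bigram(timelines)
print(forecast_next(model, ["hypertension", "amlodipine", "type2_diabetes"]))
# -> ['metformin']
```

A GPT model replaces the bigram counts with attention over the full timeline, so predictions can condition on the whole history (and on demographics or other context) rather than only the most recent event.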
In tests at two large King's Health Partners hospitals (King's College Hospital and the South London and Maudsley) and on the US MIMIC-III dataset, Foresight performed well on challenges set by clinicians. The model has many potential applications, including real-world risk estimation, virtual clinical trials, and clinical research to study disease progression and to simulate interventions and counterfactuals, as well as educational use.
Preprint:
https://arxiv.org/abs/2212.08072


We are looking forward to welcoming Professor Honghan Wu, Professor of Health Informatics and AI at the University of Glasgow, who will deliver his talk “Large language model and Radiology: how to facilitate human and AI collaboration?” as part of our Seminar Series.

Abstract: In this upcoming talk, Professor Honghan Wu explores the essential shift from viewing AI as a potential replacement for radiologists to recognizing it as a critical collaborative partner. Moving beyond basic tasks such as detection and triage, the presentation highlights how AI can address practical clinical “pain points”, such as reducing automated protocoling time by up to 60% and decreasing the time spent communicating with providers and patients by 30%. Professor Wu will present recent research on using knowledge retrieval and large language models for clinical report error correction and generation. The session concludes with an examination of the real-world deployment lifecycle, discussing the challenges of monitoring the more than 700 FDA-cleared radiology AI devices currently in practice.

Seminar Series Event: “Large language model and Radiology: how to facilitate human and AI collaboration?”
Date and Time: Thursday 25 June 2026, 15:00 – 16:00 (BST)
Location: Large Committee Room, Hodgkin Building, Guy's Campus
Attendance: Mandatory for all DRIVE-Health students; a calendar invitation has already been sent.
Registration: Alumni and the wider King's College London research community are all welcome; please email drive-health-cdt@kcl.ac.uk to let us know if you would like to attend.

Biography: Honghan Wu is a Professor of Health Informatics and AI, based in the School of Health and Wellbeing at the University of Glasgow, where he leads the research theme of data science and AI. Prof Wu is a co-director of Health Data Research Scotland.
He is also an honorary professor at Hong Kong University, an honorary associate professor at the Institute of Health Informatics, UCL, and a former Turing Fellow of The Alan Turing Institute, the UK's national institute for data science and artificial intelligence. Prof Wu holds a PhD in Computing Science. His current research focuses on machine learning, natural language processing, knowledge graphs, and their applications in medicine.

We are pleased to welcome Simon Ellershaw, PhD Candidate at University College London (UCL) as part of the UKRI UCL Centre for Doctoral Training in AI-enabled Healthcare Systems, who will deliver his talk “Developing Healthcare LLMs: From the NHS to Silicon Valley” as part of our Seminar Series.

Abstract: This talk links my PhD and my Silicon Valley internship through one theme: what it really takes to build and deploy LLMs in healthcare. I will introduce Foresight England (Foresight E), a national-scale generative foundation model trained from scratch on 54.9 million de-identified longitudinal NHS EHRs to model patient timelines and enable zero-shot prediction across around 40,000 coded medical events. As NHS England has paused data access pending review, I will focus on the core methodology and lessons learned. I will then turn to my Parexel internship in San Francisco, where I worked in the company's AI lab on production-focused applications, including pharmacovigilance and protocol de-risking. I will explain how I ended up there, what I worked on, and what I learned, with a candid view of what day-to-day life and work in the Bay Area actually look like. I will also reflect on how the recent generative AI boom has reshaped the problems teams like ours choose to tackle, and the way this work gets built, evaluated, and shipped.

Seminar Series Event: “Developing Healthcare LLMs: From the NHS to Silicon Valley”
Date and Time: Wednesday 27 May 2026, 15:00 – 16:00 (BST)
Location: Judy Dunn, SGDP Building, Denmark Hill Campus
Attendance: Mandatory for all DRIVE-Health students; a calendar invitation has already been sent.
Registration: Alumni and the wider King's College London research community are all welcome; please email drive-health-cdt@kcl.ac.uk to let us know if you would like to attend.
Biography: Simon Ellershaw is a PhD Candidate at University College London (UCL) as part of the UKRI UCL Centre for Doctoral Training in AI-enabled Healthcare Systems, supervised by Prof Richard Dobson and Dr Anoop Shah. His research spans LLM-based generation of hospital discharge summaries, national-scale pre-training of generative models on 57 million electronic health records, and post-training using real-world patient outcomes as verifiable reinforcement-learning rewards. Alongside his PhD, he interned at Parexel AI Labs and now works part-time as an NLP Engineer, developing and deploying production LLM/NLP systems, including applications in pharmacovigilance and quality assurance.



