Flagship project delivers step change in text analytics capability

Nov 15, 2022

The HDR UK National Text Analytics Project team recently came together to share the impacts of their work and opportunities for clinical natural language processing (NLP). Rene Ndoyi, one of the attendees, describes his experience of the HDR UK National Text Analytics project symposium.


Author: Rene Ndoyi, Intern at Institute of Health Informatics


Maximizing text analytics capability for health data research: key learnings from the HDR UK National Text Analytics project symposium

On 28 September 2022, the HDR UK National Text Analytics Project team, led by Professor Richard Dobson (UCL Institute of Health Informatics; King’s College London) and Dr Angus Roberts (King’s College London), came together to share the impacts of their work and opportunities for the clinical natural language processing (NLP) community to deliver and use new NLP tools at this HDR UK symposium.


One of the attendees, Rene Ndoyi, describes his thoughts and learning from the symposium below.


My name is Rene Ndoyi, a recent graduate of the HDR UK Black Internship Programme and intern at the UCL Institute of Health Informatics. The internship programme has been a great help in my quest to develop a career in health data science. Among the many interesting projects that I was introduced to was the National Text Analytics Resource, led by Professor Richard Dobson (UCL Institute of Health Informatics; King’s College London) and Dr Angus Roberts (King’s College London).


This flagship project has delivered a step-change in text analytics capability, enabling a major shift in the UK’s ability to use research-ready, actionable, real-time electronic health records by delivering data-driven systems with potential to transform patient care. The project has built a community and brought together specialised resources that provide researchers with the tools and support to explore unstructured free text clinical data, using natural language processing (NLP) and text analytics.


Sixty people from across HDR UK and the text analytics community attended the symposium to hear about the wide-reaching impacts of the project, learn about methods, tools and challenges for NLP and text analytics research. Attendees also discussed what the community needs to be able to access and use NLP resources for research.


My internship mentor, Natalie Fitzpatrick, recommended that I attend the symposium, as it is one of the many ways in which the project not only brings together a community but also raises awareness of opportunities for NLP research being carried out across HDR UK.


It was very insightful and interesting to learn about the work that has been done and the success the project has achieved over the past five years.


As an early career researcher who is building my skills in data science, I was keen to learn about the various tools and methods that have been developed to address the challenges of using unstructured free text data. A key piece of work is CogStack, a clinical information retrieval and extraction platform that creates richer, more useful clinical information to improve healthcare. The tool lets users query real-time data without having to write thousands of SQL queries.

Another tool I learnt about was MedCAT, which extracts information from electronic health records and links it to biomedical vocabulary systems like SNOMED CT and UMLS. Both of these tools are available for the research community to use via the Health Data Research Innovation Gateway, with the code made open source on GitHub.


Efforts to develop and apply these kinds of tools are important in tackling challenges around bias, transferability and model sharing.


The team described various ways that they are approaching this – from improving access to unstructured data for research, to developing trusted models of governance and standards. They have developed a template model sharing agreement that is being used across 10 different NHS Trusts to date, so that NLP models can be shared easily.


I also learnt that free text data can be analysed in R, a programming language I am currently learning. The idea of coding reproducible, step-by-step workflows and frameworks relates closely to my internship learning experiences. Under Dr Johan Thygesen’s supervision, we are exploring the development of reproducible and extensible frameworks, based on a previous study that developed a framework for COVID-19 trajectories among 57 million adults in England.


Speakers also highlighted the importance of data governance and employing user-centred approaches. Natalie Fitzpatrick gave an interesting talk on creating a donated free text databank to develop and train NLP tools. I was fascinated to hear people’s feedback about this databank. Stakeholders, including patients and the public, researchers, clinicians, and information governance and ethics experts, shared their thoughts through focus groups. There was a lot of support for the databank, but important issues were highlighted, such as the need to address different forms of bias, lack of generalisability and poor data quality, and to ensure that patients can access their data to correct errors.


From my experiences at the symposium, I have no doubt that these efforts will harness more opportunities for improved patient care. I look forward to future meetings and opportunities to learn more about the National Text Analytics Resource project.


