07/08/2025

The CDE-Mapper: iCARE4CVD’s new AI-based tool to standardise health data

iCARE4CVD researchers unveil the CDE-Mapper: an innovative AI tool using Retrieval-Augmented Generation to harmonise complex cardiovascular health data globally.

The iCARE4CVD project is creating one of the world’s largest collections of health data from people living with cardiovascular disease. This includes lab results, diagnoses, co-existing health conditions, biomarkers, and data from wearable devices such as heart rate monitors and fitness trackers. By combining these diverse data sources, the project aims to better understand how heart disease develops and progresses, and to help design more personalised treatments tailored to each individual.

To protect people’s privacy, all this data is kept in a federated database: each dataset stays at the site where it was collected, so researchers can study the information without moving it or exposing personal details.

But there’s a major challenge: health data is often recorded in many different ways, depending on the hospital, country, or technology used. For example, one clinic may describe a condition using a medical term like “heart failure,” while another might store the same condition as a numerical code. Test results might use different units, and even common measurements can vary in how they’re named or structured.

This lack of consistency is especially problematic in a federated system, where data remains spread across different sites, because it makes combining and comparing information extremely difficult. Without a shared “language” and structure, gaining meaningful insights across studies and healthcare systems is nearly impossible.

This raises a critical question: How can we ensure that all of this diverse health data is understood in the same way – so it can be reliably used for research, care, and better outcomes for patients?

The CDE-Mapper

To address this, the iCARE4CVD team developed CDE-Mapper – an innovative tool designed to make health data from different locations understandable, consistent, and analysis-ready.

What sets CDE-Mapper apart is its use of a method called Retrieval-Augmented Generation (RAG). This advanced AI approach allows the tool not only to interpret data based on what it has already learned, but also to actively “look up” relevant information from trusted sources in real time.

CDE-Mapper works by linking medical terms, measurements, and even complex data entries to internationally recognised medical vocabularies, such as SNOMED CT, ICD, and the vocabularies of the OMOP Common Data Model. These controlled vocabularies ensure that health data is interpreted in a consistent way, regardless of where or how it was collected.
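To make the retrieve-then-generate idea concrete, here is a minimal Python sketch under stated assumptions. It is not the CDE-Mapper’s implementation: the four-entry vocabulary, its “demo-” identifiers, the word-overlap (Jaccard) retriever and the build_prompt helper are all hypothetical stand-ins. A real system would search the full SNOMED, ICD or OMOP vocabularies, typically with embeddings, and pass the prompt to a language model.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    concept_id: str  # hypothetical identifier, not a real SNOMED/ICD/OMOP code
    label: str
    vocabulary: str


# Toy stand-in for a controlled vocabulary; all entries are hypothetical.
VOCABULARY = [
    Concept("demo-001", "heart failure", "SNOMED"),
    Concept("demo-002", "hypertensive disorder", "SNOMED"),
    Concept("demo-003", "heart rate", "OMOP"),
    Concept("demo-004", "family history of hypertension", "SNOMED"),
]


def tokens(text: str) -> set[str]:
    """Lower-case alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[Concept]:
    """Retrieval step: rank concepts by Jaccard word overlap with the query."""
    q = tokens(query)
    scored = []
    for concept in VOCABULARY:
        t = tokens(concept.label)
        scored.append((len(q & t) / len(q | t), concept))
    # Python's sort is stable, so ties keep their vocabulary order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [concept for score, concept in scored[:k] if score > 0]


def build_prompt(query: str, candidates: list[Concept]) -> str:
    """Augmentation step: put the retrieved candidates into the model's prompt."""
    options = "\n".join(
        f"- {c.concept_id} ({c.vocabulary}): {c.label}" for c in candidates
    )
    return (
        f"Map the source term '{query}' to exactly one of these standard "
        f"concepts, or answer NONE:\n{options}"
    )


if __name__ == "__main__":
    source_term = "Heart failure (HF)"
    # In a real pipeline this prompt would be sent to a language model (not shown).
    print(build_prompt(source_term, retrieve(source_term)))
```

For example, retrieve("Heart failure (HF)") returns the “heart failure” and “heart rate” entries, and build_prompt lists them as the only answers the model may choose from. Restricting the model to retrieved vocabulary entries is one way to ground its output in trusted sources rather than only in what it memorised during training.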

The system is designed to:

  • Understand both simple (like “blood type”) and complex data (like “family history of high blood pressure measured at different times”).
  • Break down complex terms into smaller parts to better match them to standard medical terms.
  • Learn and improve over time by involving human experts who review and confirm its choices.
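These three design goals can be illustrated in the same hypothetical style. In this sketch a composite variable such as “heart rate measured while sitting (bpm)” is split into a base measurement and qualifiers using simple keyword patterns, each part is looked up separately, and any part without a confident match is queued for an expert. Confirmed answers are stored and reused, which is the simplest possible form of “learning” from review. The patterns, the match table, the 0.8 confidence threshold and all function names are assumptions for illustration; the source does not describe how the CDE-Mapper decomposes terms or incorporates expert feedback.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical qualifier patterns: role -> regex for a known qualifier value.
QUALIFIER_PATTERNS = {
    "body position": r"\b(sitting|standing|supine)\b",
    "unit": r"\b(mg/dl|mmol/l|bpm)\b",
}

# Hypothetical match table: part -> (concept id, confidence). Not real codes.
PART_MATCHES = {
    "heart rate": ("demo-003", 0.95),
    "sitting": ("demo-101", 0.90),
    "bpm": ("demo-201", 0.60),
}

FILLER_WORDS = {"measured", "while", "in", "at"}


def decompose(term: str) -> dict[str, str]:
    """Split a composite term into named qualifiers plus a base measurement."""
    text = term.lower()
    parts: dict[str, str] = {}
    for role, pattern in QUALIFIER_PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            parts[role] = match.group(1)
            text = text.replace(match.group(0), " ")
    # The remaining words, minus filler, are treated as the base measurement.
    words = re.findall(r"[a-z]+", text)
    parts["base"] = " ".join(w for w in words if w not in FILLER_WORDS)
    return parts


@dataclass
class ReviewQueue:
    """Parts whose best match falls below the threshold, awaiting an expert."""
    pending: list[str] = field(default_factory=list)


def map_term(term: str, queue: ReviewQueue, threshold: float = 0.8) -> dict[str, str | None]:
    """Map each part of a term; send low-confidence parts to the review queue."""
    mapping: dict[str, str | None] = {}
    for role, part in decompose(term).items():
        concept_id, confidence = PART_MATCHES.get(part, (None, 0.0))
        if confidence >= threshold:
            mapping[role] = concept_id
        else:
            mapping[role] = None
            if part not in queue.pending:
                queue.pending.append(part)
    return mapping


def record_expert_choice(part: str, concept_id: str, queue: ReviewQueue) -> None:
    """Store an expert-confirmed mapping so later runs reuse it."""
    PART_MATCHES[part] = (concept_id, 1.0)
    if part in queue.pending:
        queue.pending.remove(part)
```

Calling map_term("Heart rate measured while sitting (bpm)", queue) maps the base measurement and the body position but leaves the unit unmapped and queued, because its assumed confidence of 0.6 is below the threshold. Once an expert confirms it with record_expert_choice, the same call maps all three parts.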

Evaluating the tool in practice

To understand how well CDE-Mapper performs in real-world settings, the research team tested the tool on a variety of data sources, including medical literature and patient records, and compared its performance against other widely used AI systems.

The goal was to assess how accurately CDE-Mapper could match different health terms and formats to standard medical vocabularies – particularly in complex and diverse data environments.

They found that the CDE-Mapper:

  • Outperformed existing methods by more than 11% in accuracy when identifying and translating health concepts.

  • Was especially strong in handling complex or multi-part data, such as a heart rate measured in a specific body position or blood levels recorded in particular units.

  • Was more consistent at finding correct matches across medical dictionaries, which is crucial for making sense of real-world health records.

What are the remaining limitations?

While CDE-Mapper is a major step forward, there are still some challenges to be aware of:

  • It relies on expert input. To work well, the tool needs high-quality examples and guidance from healthcare professionals, especially when setting it up.

  • It may struggle with rare or complex cases. For example, unusual heart conditions or specific ECG descriptions can still be difficult to interpret correctly.

  • It depends on comprehensive medical dictionaries. If the standardised vocabularies it uses are missing terms or are out of date, the tool might not find the best match.

  • It requires strong computing power. Because the system uses advanced AI, it currently needs powerful technology, which may not be easily available in all settings.

These areas are actively being improved as the project continues, with the goal of making the tool faster, more flexible, and easier to use in real-world healthcare settings.

What are the implications in practice?

Making sense of real-world health records is crucial for advancing research and improving care. When health data is better organised and standardised:

  • Doctors get clearer insights, leading to better diagnoses and treatment decisions.

  • Researchers can more easily study diseases, spot trends, and develop new therapies.

  • Patients benefit from more personalised care, especially those with complex conditions like heart failure or chronic cardiovascular disease.