The integration of artificial intelligence (AI) into clinical research has transformed how medical data is analyzed, enabling the creation of predictive tools that enhance decision-making. During the ICTHIC webinar “Managing bleeding in cancer: from anticoagulants to artificial intelligence,” Dr. Miren Taberna presented her pioneering work on utilizing AI and natural language processing (NLP) to develop risk assessment models for cancer patients [1].
Her discussion emphasized AI’s potential in predictive modeling for bleeding and thromboembolic events, offering new insights into personalized medicine.
Building a Predictive Infrastructure: Study Design and Data Integration
Dr. Taberna began by outlining the study framework, which drew on data collected at nine hospitals over four years (2014–2018). The research aimed to leverage both structured and unstructured clinical data from electronic health records (EHRs) to develop robust predictive models. A key challenge was harmonizing the diverse EHR systems used across institutions, which required advanced methodologies for de-identification and data integration [1].
The process began with extracting clinical data from EHRs using NLP to interpret free-text notes. This step involved sophisticated algorithms capable of understanding medical terminologies, such as SNOMED-CT, and assigning contextual meanings to terms. For example, the system could distinguish between “bleeding” as a current diagnosis versus a family history or a potential risk factor. Once de-identified and harmonized, the data was subjected to machine learning (ML) techniques to identify patterns and build predictive models [1].
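To make the idea of contextual meaning concrete, the following is a minimal rule-based sketch, not the system used in the study. It labels a mention of "bleeding" by looking for cue phrases earlier in the same sentence. The cue lists, the label names, and the `classify_mention` function are all hypothetical, and the "negated" label is an added assumption that the lecture summary does not mention. A production clinical NLP pipeline would rely on validated lexicons and far richer context handling.

```python
import re

# Hypothetical cue patterns, checked in order. A real system would use a
# much larger, clinically validated lexicon.
CONTEXT_PATTERNS = [
    ("family_history", re.compile(r"\b(family history|mother|father|sibling)\b")),
    ("risk_factor", re.compile(r"\b(risk of|at risk|monitor for)\b")),
    ("negated", re.compile(r"\b(no|denies|without)\b")),
]


def classify_mention(sentence, term="bleeding"):
    """Return a coarse context label for the first occurrence of `term`."""
    text = sentence.lower()
    position = text.find(term)
    if position == -1:
        return "absent"
    # Only the words before the term are inspected, since that is where
    # these particular cues usually appear.
    prefix = text[:position]
    for label, pattern in CONTEXT_PATTERNS:
        if pattern.search(prefix):
            return label
    # No cue found: treat the mention as a current clinical finding.
    return "current_finding"


if __name__ == "__main__":
    for note in (
        "Patient admitted with rectal bleeding.",
        "Family history of bleeding disorders.",
        "High risk of bleeding due to anticoagulation.",
    ):
        print(classify_mention(note), "<-", note)
```

Keeping the rules in an ordered list makes the priority between labels explicit, which is one simple way to resolve sentences that contain more than one cue.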
Natural Language Processing in Action: Ensuring Data Quality
A significant portion of Dr. Taberna’s work revolved around optimizing NLP tools for extracting meaningful data from unstructured sources. She highlighted the complexities of teaching AI systems to manage ambiguities, such as differentiating between acronyms like “MM” (multiple myeloma, millimeters, or malignant melanoma) or identifying anemia through indirect markers like hemoglobin levels, transfusion history, or inferred diagnoses. These processes required rigorous manual annotation and validation by medical experts to ensure the models’ accuracy and reliability [1].
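The "MM" ambiguity can be illustrated with a small keyword-overlap sketch. This is an assumption-laden toy, not the approach described in the lecture: the keyword sets, the `disambiguate_mm` function, and the rule that a number followed by lowercase "mm" denotes a measurement are all hypothetical. It also shows why expert annotation matters, since the function returns "ambiguous" whenever the sentence offers no supporting context.

```python
import re

# Hypothetical context keywords for two clinical expansions of "MM".
MM_SENSES = {
    "multiple myeloma": {"plasma", "marrow", "paraprotein", "bortezomib"},
    "malignant melanoma": {"skin", "lesion", "nevus", "breslow"},
}

# A number directly followed by lowercase "mm" is assumed to be a unit.
# The match is case-sensitive so that text such as "stage 3 MM" is not
# mistaken for a measurement.
MEASUREMENT = re.compile(r"\d+(\.\d+)?\s*mm\b")


def disambiguate_mm(sentence):
    """Guess the meaning of "MM"/"mm" in one sentence, or return "ambiguous"."""
    if MEASUREMENT.search(sentence):
        return "millimeters"
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & keywords) for sense, keywords in MM_SENSES.items()}
    best_sense, best_score = max(scores.items(), key=lambda item: item[1])
    # With no supporting keyword, defer to a human annotator.
    return best_sense if best_score > 0 else "ambiguous"


if __name__ == "__main__":
    print(disambiguate_mm("Nodule measuring 12 mm in the right lung."))
    print(disambiguate_mm("MM diagnosed after bone marrow biopsy showed plasma cells."))
    print(disambiguate_mm("History of MM."))
```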
The external evaluation methodology, designed by Dr. Taberna’s team, involved validating NLP outputs against manually curated datasets. This quality assurance step was critical for maintaining the integrity of data used in predictive modeling, particularly in adapting NLP tools to linguistic variations across regions. The resulting structured database provided a high-quality foundation for developing predictive algorithms [2].
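Comparing NLP output with a manually curated reference set is often summarized with precision, recall, and F1. The summary above does not say which metrics the team reported, so the sketch below is only one plausible way to score such a comparison. The `evaluate_extraction` function and the (document, concept) pairs are hypothetical illustrations, not study data.

```python
def evaluate_extraction(predicted, gold):
    """Score extracted (document_id, concept) pairs against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of extracted items that the annotators also marked.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of annotated items that the system managed to extract.
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up annotations for illustration only.
    gold = {("doc1", "anemia"), ("doc1", "VTE"), ("doc2", "metastasis"), ("doc3", "bleeding")}
    predicted = {("doc1", "anemia"), ("doc2", "metastasis"), ("doc2", "bleeding")}
    print(evaluate_extraction(predicted, gold))
```

Scoring pairs rather than bare concept names means a term extracted from the wrong document counts as an error, which keeps the evaluation tied to individual records.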
Predictive Models for Bleeding and Thromboembolism
Using this enriched dataset, Dr. Taberna’s team created predictive models for two critical complications in cancer patients receiving anticoagulation therapy: major bleeding and recurrent thromboembolism. The bleeding risk model, developed from a cohort of 1,816 patients, identified six key predictors: hemoglobin levels, presence of metastasis, patient age, platelet count, leukocyte count, and serum creatinine levels. These models were trained on 75% of the dataset and validated on the remaining 25%, achieving moderate predictive performance with an area under the curve (AUC) of approximately 0.6, on a scale where 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination [1].
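The 75/25 split and the AUC metric can be sketched in plain Python. This is a generic illustration under stated assumptions rather than the team's pipeline: the function names, the fixed random seed, and the toy scores are hypothetical, and the model-fitting step itself is omitted. The AUC is computed here through its pairwise interpretation, the probability that a randomly chosen patient with the event receives a higher risk score than a randomly chosen patient without it.

```python
import random


def train_validation_split(records, train_fraction=0.75, seed=0):
    """Shuffle records reproducibly and split them into two parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Pairwise AUC: chance that a positive case outscores a negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    train, validation = train_validation_split(range(1816))
    print(len(train), len(validation))
    # Toy labels (1 = event) and toy risk scores, for illustration only.
    print(auc([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1]))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a validation set of a few hundred records; rank-based formulas give the same value more efficiently on larger sets.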
Similarly, a recurrence model for venous thromboembolism (VTE) incorporated additional factors such as family history of VTE, adenocarcinoma histology, and specific thrombotic events like pulmonary embolism or deep vein thrombosis. While the performance of these models was also moderate, their integration into clinical guidelines, such as those of the Spanish Society of Medical Oncology, underscored their potential to refine patient risk stratification [3-4].
Real-World Applications and Challenges
The study analyzed data from nearly three million patients, ultimately narrowing the cohort to 16,400 patients with cancer and VTE. From this population, only those meeting strict inclusion criteria were retained, yielding a smaller dataset for algorithm training. This meticulous approach ensured that the models were based on incident cases with adequate follow-up, reducing biases and enhancing clinical applicability [1].
Despite these advances, Dr. Taberna acknowledged the challenges of translating AI models into everyday practice. A primary concern is ensuring that algorithms are both mathematically sound and clinically relevant. Collaboration between medical and technology teams is essential to develop tools that align with clinical workflows and decision-making processes. Additionally, variations in EHR documentation practices across institutions remain a barrier to broader implementation [1].
Conclusions
Dr. Taberna concluded her presentation by emphasizing the transformative potential of AI in oncology. By enabling the creation of predictive tools tailored to individual patient profiles, AI can help clinicians make more informed decisions and improve outcomes. However, she stressed the importance of robust validation processes and interdisciplinary collaboration to ensure that these technologies meet the highest standards of quality and clinical utility. The success of this research highlights AI’s role as a catalyst for innovation in risk assessment and personalized medicine.
References
1. Muñoz Martín JA et al. Prediction model for major bleeding in anticoagulated patients with cancer-associated venous thromboembolism using machine learning and natural language processing. Clin Transl Oncol. 2024; Online ahead of print.
2. Canales L et al. Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology. JMIR Med Inform. 2021; 9(7):e20492.
3. Muñoz Martín JA et al. Development of a predictive model of venous thromboembolism recurrence in anticoagulated cancer patients using machine learning. Thromb Res. 2023; 228:181-188.
4. Morán LO et al. SEOM clinical guidelines on venous thromboembolism (VTE) and cancer (2023). Clin Transl Oncol. 2024; 26(11):2877-2901.