Abstract
OBJECTIVES: To develop a large-language-model (LLM)-centric workflow for the extraction and migration of clinician-documented colonoscopy recall recommendations from unstructured reports and letters during an enterprise-wide electronic health record (EHR) transition.
MATERIALS AND METHODS: A multi-stage workflow [Optical Character Recognition (OCR) -> LLM -> structured fields] was built around a central GPT-4 Turbo inference step following prompt optimization. Validation was performed on a held-out set (N = 326 notes) using 2-clinician consensus and then benchmarked against a traditional rule-based natural-language-processing (NLP) approach (spaCy v3). Layered quality control (manual review, field validation, and anomaly detection) was used to assess workflow results prior to upload (N = 118 181 total patients).
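The multi-stage workflow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the JSON field schema, and the 1-10 year plausibility bound are all hypothetical, and a canned regex stands in for both the OCR output and the GPT-4 Turbo inference step.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecallRecommendation:
    patient_id: str
    recall_interval_years: int
    source_excerpt: str

def call_llm(report_text: str) -> str:
    """Stand-in for the central GPT-4 Turbo inference step: returns a
    structured JSON payload extracted from unstructured report text.
    Here a simple regex plays the role of the model."""
    match = re.search(r"repeat colonoscopy in (\d+) years?", report_text, re.I)
    return json.dumps({
        "recall_interval_years": int(match.group(1)) if match else None,
        "excerpt": match.group(0) if match else "",
    })

def validate_fields(payload: dict) -> bool:
    """Field validation / anomaly detection stand-in: recall intervals
    outside a plausible 1-10 year range (a hypothetical bound) fail
    validation and would be routed to manual review."""
    years = payload.get("recall_interval_years")
    return isinstance(years, int) and 1 <= years <= 10

def extract_recall(patient_id: str, ocr_text: str) -> Optional[RecallRecommendation]:
    """OCR text in, structured recall recommendation out; returns None
    when the record needs manual review instead of automated upload."""
    payload = json.loads(call_llm(ocr_text))
    if not validate_fields(payload):
        return None
    return RecallRecommendation(patient_id,
                                payload["recall_interval_years"],
                                payload["excerpt"])
```

In the actual system, `call_llm` would wrap an optimized prompt around each OCR'd note, and records failing validation would enter the layered quality-control queue rather than being silently dropped.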
RESULTS: Prompt optimization enabled GPT-4 Turbo to achieve perfect concordance with clinician review in a small test set (macro-F1 = 1.0; N = 100 patients). Expanded validation on a held-out set demonstrated improved F1 (0.89; CI = [0.65, 0.92], N = 326) relative to a traditional rule-based NLP approach (F1 = 0.78; CI = [0.58, 0.82]). The system processed 118 181 records in ≈9 hours (≈2 s/record) at a direct implementation cost of ≈$12 000.
DISCUSSION: An LLM-driven workflow safely migrated preventive-care data at population scale, with potential accuracy improvements over traditional rule-based NLP approaches and substantial reductions in time and cost relative to manual review.
CONCLUSION: LLMs can play a valuable role in high-quality structuring of clinical data, preserving longitudinal care continuity during EHR modernization.