Surgical Informatics Lab

Evaluating capabilities of large language models: Performance of GPT-4 on surgical knowledge assessments

Beaulieu-Jones, Brendin R, Margaret T Berrigan, Sahaj Shah, Jayson S Marwaha, Shuo-Lun Lai, and Gabriel A Brat. 2024. “Evaluating Capabilities of Large Language Models: Performance of GPT-4 on Surgical Knowledge Assessments.” Surgery 175 (4): 936-42.
Last updated on 02/23/2025

Recent Publications

  • Is More Thinking Always Better? First Impressions of ChatGPT-5 in Surgery Conversations
  • Development of a Claims-Based Computable Phenotype for Ulcerative Colitis Flares
  • Evaluating Capabilities of Large Language Models: Performance of GPT4 on Surgical Knowledge Assessments
  • Implications of mappings between International Classification of Diseases clinical diagnosis codes and Human Phenotype Ontology terms
  • Response to: Comment on “Integrating Human Intuition into Prediction Algorithms for Improved Surgical Risk Stratification”
  • Implications of mappings between ICD clinical diagnosis codes and Human Phenotype Ontology terms