Evaluating capabilities of large language models: Performance of GPT-4 on surgical knowledge assessments

Beaulieu-Jones, Brendin R, Margaret T Berrigan, Sahaj Shah, Jayson S Marwaha, Shuo-Lun Lai, and Gabriel A Brat. 2024. “Evaluating Capabilities of Large Language Models: Performance of GPT-4 on Surgical Knowledge Assessments”. Surgery 175 (4): 936-42.

Last updated on 02/23/2025
