Aesthetic Plastic Surgery, 2025 (SCI-Expanded, Scopus)
Background: The integration of large language models (LLMs) into plastic and aesthetic surgery has shown promise. However, research comparing different LLMs in handling clinical scenarios and their temporal consistency remains limited. This study evaluated the performance of ChatGPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet in aesthetic surgery scenarios. The objectives were to compare their overall performance, analyze reliability in complicated and uncomplicated cases, assess temporal consistency, and evaluate performance across five clinical domains: preoperative cautions, postoperative care, holistic approach, algorithmic approach, and surgical planning.
Methods: Twenty-four case scenarios (12 complicated, 12 uncomplicated) were input into the LLMs at three time points (T1, T2, T3) over two weeks. Three blinded board-certified plastic surgeons evaluated responses using a 5-point Likert scale. Statistical analyses were applied.
Results: ChatGPT-4o achieved the highest mean score (4.92), outperforming Gemini 1.5 Pro (3.62) and Claude 3.5 Sonnet (3.21) (p < 0.001). It performed consistently across complicated (4.87) and uncomplicated (4.96) cases (p > 0.05) and demonstrated temporal stability (p > 0.05). Gemini 1.5 Pro showed temporal consistency for complicated cases (p > 0.05) but not for uncomplicated cases. Claude 3.5 Sonnet exhibited significant temporal inconsistencies (p < 0.05). In the domain-specific analyses, ChatGPT-4o was superior to the other models. Claude 3.5 Sonnet had the lowest scores in most domains, except algorithmic approach, where it outperformed Gemini 1.5 Pro (4.4 vs. 4.1, p < 0.05).
Conclusions: LLMs could be promising tools for supporting surgical decision-making. Future research should aim to enhance LLM reliability and validate their real-world applications.
Level of Evidence I: This journal requires that authors assign a level of evidence to each article.
For a full description of these Evidence-Based Medicine ratings, please refer to the Table of Contents or the online Instructions to Authors www.springer.com/00266.