Residency Application Selection Committee Discriminatory Ability in Identifying Artificial Intelligence-Generated Personal Statements

Issam Koleilat, Advaith Bongu, Sumy Chang, Dylan Nieman, Steven Priolo, Nell Maloney Patel

Research output: Contribution to journal › Article › peer-review

Abstract

OBJECTIVE: Advances in artificial intelligence (AI) have given rise to sophisticated algorithms capable of generating human-like text. The goal of this study was to evaluate the ability of human reviewers to reliably differentiate personal statements (PS) written by human authors from those generated by AI software.

SETTING: Four personal statements from the archives of two surgical program directors were de-identified and used as the human samples. Two AI platforms were used to generate nine additional PS.

PARTICIPANTS: Four surgeons from the residency selection committees of two surgical residency programs of a large multihospital system served as blinded reviewers. AI was also asked to evaluate each PS sample for authorship.

DESIGN: Sensitivity, specificity, and accuracy of the reviewers in identifying the PS author were calculated. The kappa statistic for agreement between the hypothesized author and the true author was calculated. Inter-rater reliability was calculated using the kappa statistic with Light's modification, given more than two reviewers in a fully crossed design. Logistic regression was performed to model the impact of perceived creativity, writing quality, and authorship on the likelihood of offering an interview.

RESULTS: Human reviewer sensitivity for identifying an AI-generated PS was 0.87, with a specificity of 0.37 and an overall accuracy of 0.55. Agreement between the reviewers' hypothesized authorship and the true authorship was slight (kappa = 0.19). The reviewers themselves had poor inter-rater reliability (kappa = 0.067), with complete agreement (four of four reviewers) on only two PS, both authored by humans. The odds of offering an interview (versus a composite of "backup" status or no interview) to a perceived human author were 7 times those for a perceived AI author (95% confidence interval 1.5276 to 32.0758, p = 0.0144). AI hypothesized human authorship for twelve of the thirteen PS and was "unsure" about the last.

CONCLUSIONS: The increasing pervasiveness of AI will have far-reaching effects, including on the resident application and recruitment process. Identifying AI-generated personal statements is exceedingly difficult. With the decreasing availability of objective data to assess applicants, a review and potential restructuring of the approach to resident recruitment may be warranted.
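The reviewer-performance metrics reported above (sensitivity, specificity, accuracy, and the kappa statistic) all derive from a 2×2 confusion matrix of reviewer calls against true authorship. A minimal sketch of the standard formulas follows; the counts used here are hypothetical illustrations, not the study's data.

```python
# Standard confusion-matrix metrics for a binary "AI vs. human" call.
# All counts below are hypothetical, not the study's underlying data.

def reviewer_metrics(tp, fn, fp, tn):
    """tp: AI-written PS flagged as AI; fn: AI flagged as human;
    fp: human flagged as AI; tn: human flagged as human."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)        # AI statements correctly identified
    specificity = tn / (tn + fp)        # human statements correctly identified
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_o = accuracy
    p_e = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e)
    return sensitivity, specificity, accuracy, kappa

# Hypothetical example: 36 ratings of AI-authored and 16 of human-authored PS.
sens, spec, acc, kappa = reviewer_metrics(tp=31, fn=5, fp=10, tn=6)
print(f"sens={sens:.2f} spec={spec:.2f} acc={acc:.2f} kappa={kappa:.2f}")
```

The same confusion-matrix construction underlies the reviewer-versus-truth kappa; the inter-rater reliability across the four reviewers additionally requires Light's modification (averaging pairwise kappas), which is omitted from this sketch.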

Original language: American English
Pages (from-to): 780-785
Number of pages: 6
Journal: Journal of Surgical Education
Volume: 81
Issue number: 6
DOIs
State: Published - Jun 2024

ASJC Scopus subject areas

  • Surgery
  • Education

Keywords

  • Artificial intelligence
  • Personal statement
  • Residency application
