TY - JOUR
T1 - Statistics of Generative Artificial Intelligence and Nongenerative Predictive Analytics Machine Learning in Medicine
AU - Rashidi, Hooman H.
AU - Hu, Bo
AU - Pantanowitz, Joshua
AU - Tran, Nam
AU - Liu, Silvia
AU - Chamanzar, Alireza
AU - Gur, Mert
AU - Chang, Chung Chou H.
AU - Wang, Yanshan
AU - Tafti, Ahmad
AU - Pantanowitz, Liron
AU - Hanna, Matthew G.
N1 - Publisher Copyright: © 2024 The Authors
PY - 2025/3
Y1 - 2025/3
N2 - The rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML) in medicine has prompted medical professionals to increasingly familiarize themselves with related topics. This familiarity also demands a grasp of the underlying statistical principles that govern the design, validation, and reproducibility of AI/ML models. Uniquely, the practice of pathology and medicine produces vast amounts of data that can be exploited by AI/ML. The emergence of generative AI, especially in the area of large language models and multimodal frameworks, represents approaches that are starting to transform medicine. Fundamentally, generative and traditional (eg, nongenerative predictive analytics) ML techniques rely on certain common statistical measures to function. However, generative AI also relies on metrics, such as perplexity and the BiLingual Evaluation Understudy (BLEU) score, that assess the quality of generated samples and are typically unfamiliar to most medical practitioners. In contrast, nongenerative predictive analytics ML often uses more familiar metrics tailored to specific tasks, as seen in typical classification studies (ie, confusion matrix measures, such as accuracy, sensitivity, F1 score, and receiver operating characteristic area under the curve) or regression studies (ie, root mean square error and R²). To this end, the goal of this review article (as part 4 of our AI review series) is to provide an overview and comparison of the statistical measures and methodologies used in both generative AI and traditional (ie, nongenerative predictive analytics) ML, along with their strengths and known limitations. By understanding their similarities and differences along with their respective applications, we will become better stewards of this transformative space, which will ultimately enable us to better address our current and future needs and challenges in a more responsible and scientifically sound manner.
KW - accuracy
KW - BiLingual Evaluation Understudy
KW - F1 score
KW - perplexity
KW - precision
KW - receiver operating characteristic area under the curve
UR - http://www.scopus.com/inward/record.url?scp=85212242491&partnerID=8YFLogxK
U2 - 10.1016/j.modpat.2024.100663
DO - 10.1016/j.modpat.2024.100663
M3 - Review article
C2 - 39579984
AN - SCOPUS:85212242491
SN - 0893-3952
VL - 38
JO - Modern Pathology
JF - Modern Pathology
IS - 3
M1 - 100663
ER -