Received: 11 July 2021 | Revised: 22 January 2022 | Accepted: 23 January 2022
DOI: 10.1111/vru.13089

ORIGINAL INVESTIGATION

Accuracy of artificial intelligence software for the detection of confirmed pleural effusion in thoracic radiographs in dogs

Thiago Rinaldi Müller (1), Mauricio Solano (1), Mirian Harumi Tsunemi (2)

(1) Department of Clinical Sciences, Tufts University Cummings School of Veterinary Medicine, North Grafton, Massachusetts, USA
(2) Department of Biostatistics, São Paulo State University, R. Prof. Dr. Antônio Celso Wagner Zanin, São Paulo, Brazil

Correspondence: Thiago Muller, Clinical Sciences Department, Cummings School of Veterinary Medicine, 200 Westboro Rd, North Grafton, MA 01536, USA. Email: Thiago.Muller@tufts.edu

Previous presentation or publication disclosure: The results of this study were presented at the 2021 ACVR Annual Scientific Meeting, Virtual meeting, October 20-23, 2021.

Reporting guideline disclosure: The following was used for guidance in the preparation of this manuscript: Mongan J, Moy L, Kahn CE Jr. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): a guide for authors and reviewers. Radiol Artif Intell 2020;2(2):e200029. https://doi.org/10.1148/ryai.2020200029

ABSTRACT
The use of artificial intelligence (AI) algorithms in diagnostic radiology is a developing area in veterinary medicine and may provide substantial benefit in many clinical settings, ranging from timely image interpretation in the emergency setting when no boarded radiologist is available to allowing boarded radiologists to focus on more challenging cases that require complex medical decision making. Testing of the performance of AI software in veterinary medicine is at an early stage, and only a scant number of validation reports have been published. The purpose of this study was to investigate the performance of an AI algorithm (Vetology AI®) in the detection of pleural effusion in thoracic radiographs of dogs. In this retrospective, diagnostic, case-control study, 62 canine patients were recruited. A control group of 21 dogs with normal thoracic radiographs and a sample group of 41 dogs with confirmed pleural effusion were selected from the electronic medical records at the Cummings School of Veterinary Medicine. The images were cropped to include only the area of interest (i.e., the thorax), and the software then classified images into those with pleural effusion and those without. The AI algorithm determined the presence of pleural effusion with 88.7% accuracy (P < 0.05). The sensitivity and specificity were 90.2% and 85.7%, respectively (positive predictive value, 92.5%; negative predictive value, 81.8%). The application of this technology to the diagnostic interpretation of thoracic radiographs in veterinary medicine appears to be of value and warrants further investigation and testing.

KEYWORDS: imaging algorithms, machine learning, convolutional neural network, pleural effusion, thorax, radiograph, dog, canine, X-ray

1 INTRODUCTION

Recently, technological innovations in artificial intelligence (AI) and machine learning (ML) software have been shown to be useful to medical and veterinary professionals.[1,2]
Artificial intelligence can analyze images to recognize objects of interest and distinguish certain features of a variety of medical conditions across different imaging modalities.[3-9]

Abbreviations: AI, artificial intelligence; CNN, convolutional neural network; DL, deep learning; ML, machine learning.

AI can be defined as a set of computer algorithms that attempt to simulate the problem-solving capacity and cognitive function of the human brain.[10] ML is a form of AI that applies a specific algorithm to observational data points without the need for additional programming by software developers. Another form of AI is representation learning (RL), in which the algorithm detects features that are used to classify a given condition. For example, in the context of this paper, the features could be the abnormal radiographic findings that

a radiologist uses to conclude that there is pleural effusion; the latter is viewed as the condition. RL tends to become more accurate as more data become available. Deep learning (DL), another subset of AI, uses multiple algorithms to analyze data. To develop an efficient DL system, many images need to be available to the software to improve accuracy when compared with the "correct answer." The correct answer can be viewed as the gold standard, which in the field of AI is known as the ground truth.[10,11] A complete explanation of how AI works is beyond the scope of this article; however, two primer articles on the subject can be found in the list of references at the end of this paper.[10,11]

There is considerable potential for AI to provide an initial interpretation of thoracic radiographs in human medicine. For instance, in human patients, AI algorithms have been created to evaluate thoracic radiographs for pneumothorax and tuberculosis or to detect other key extrathoracic findings, such as bony fractures.[12-14] Furthermore, AI has been demonstrated to aid in decision-making, predicting the mortality risk of patients with COVID-19 with high accuracy using the patients' physiological conditions, symptoms, preexisting conditions, and demographic information.[15] AI has also demonstrated the ability to recognize ischemic strokes on MRI.[16] Previous studies have shown that, for some specific tasks, AI systems already outperform humans, for example in the detection of breast cancer using digital mammography.[17] The quantity of peer-reviewed articles published annually involving deep learning or convolutional neural networks (CNNs), another commonly used term that groups different AI methodologies, has increased exponentially in the past 5 years.[18,19]

Advances in AI have yet to be assessed comprehensively in veterinary diagnostic imaging. To date, few studies testing the application of deep learning in veterinary medicine have been published. In 2018, a study comparing two strategies to separate normal versus abnormal thoracic radiographs in dogs showed good performance of the AI software in detecting abnormalities and assisting general practitioners.[20] In 2020, a pilot study used AI techniques to screen thoracic radiographs for the detection of canine left atrial enlargement and compared the results with veterinary radiologist interpretations; the overall accuracy of the CNN algorithm and the veterinary radiologists in that study was identical.[21] Studies have demonstrated classification accuracy in the detection of thoracic abnormalities such as generalized cardiomegaly,[22] tracheal collapse, left atrial enlargement, alveolar pulmonary patterns, pneumothorax, and pulmonary masses in dogs.[20] Another report showed that a CNN was able to identify multiple thoracic lesions in canine as well as feline radiographs.[2]

One of the main challenges for AI software is obtaining a large, high-quality data set for training. Another challenge is training for rare diseases, as there are few examples from which the software can learn, and validation can be difficult given the requirement for large sample sizes.[13] While recent publications evaluating thoracic radiographs in dogs using DL approaches[2,20,23] provided strong evidence for the utility and accuracy of AI, these studies compared radiology reports without confirmation of the various disease processes investigated, including pleural effusion.
While this is an accepted methodology, pitfalls may occur, as abnormal findings that are the focus of any given study might not be systematically mentioned in radiologists' reports. More generally, the radiology report can be subjective and may not be supported by later evidence, such as histopathology or surgical findings. Although the performance of AI algorithms in veterinary medicine is beginning to be tested, validation still lags behind human medicine, especially with respect to the accuracy of the technology.[18,19] The authors believe this new technology should be validated before its application in day-to-day veterinary medicine. However, commercially available products are already being offered to private veterinary practitioners and practices; hence, validation of the technology carries a sense of urgency.

A variety of diseases can lead to the abnormal accumulation of fluid in the pleural space.[24-27] Thoracic radiographs are arguably the most efficient method to detect and subjectively quantify pleural effusion (PE), and the commonly seen radiographic signs of PE have been well described.[28] Positioning, adipose tissue accumulation, and diseases such as pleural nodules or masses can sometimes be misinterpreted as PE.[29]

The AI software selected for this study, Vetology AI Guardian (Vetology Innovations, San Diego, CA, USA), was created to produce diagnostic reports for the evaluation of canine thoracic radiographs. The software uses multiple CNN algorithms and was developed using deep learning best practices. Testing and training involve comparison to a ground truth of ACVR-certified veterinary radiologists' reports. The AI-based software is directed to identify a variety of routinely assessed features in radiographic images of the canine thorax. The purpose of this study was to investigate the performance of an AI algorithm for the detection of confirmed PE in thoracic radiographs of dogs. The authors hypothesized that AI may have satisfactory accuracy as a screening method for the detection of PE.

2 MATERIALS AND METHODS

2.1 Experimental design and selection of subjects

The present retrospective, diagnostic, case-control study was conducted under the approval of the Foster Hospital for Small Animals Hospital Director's office; this includes approval for the use of the data. All images were individually evaluated by two of the authors (T.M., diagnostic imaging resident, and M.S., ACVR-certified veterinary radiologist), with over 10 and 30 years of experience in small animal diagnostic imaging, respectively. The sample size in each group was defined by power analysis (see the sketch below). The electronic medical records of the Cummings School of Veterinary Medicine at Tufts University between January 2009 and October 2020 were retrospectively reviewed. Inclusion criteria were availability of diagnostic-quality orthogonal radiographic projections, a radiographic report containing a diagnosis of PE, and confirmation of pleural fluid by thoracocentesis (17), thoracic ultrasound (13), surgery (9), CT (1), or MRI (1). None of the patients had manual inflation of their lungs or were under general anesthesia for acquisition of the radiographs.
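The paper does not report the parameters used for its power analysis. Purely as an illustration of a standard precision-based approach for diagnostic accuracy studies (Buderer's formula), the sketch below computes the number of affected dogs needed; the expected sensitivity and precision values are assumptions, not figures from the study.

```python
from math import ceil

# Hypothetical inputs -- the study does not report its power-analysis
# parameters; these values are illustrative only.
expected_sens = 0.90  # anticipated sensitivity of the test
precision = 0.10      # desired half-width of the 95% CI
z = 1.96              # two-sided 95% critical value of the normal distribution

# Buderer's formula for the number of disease-positive subjects
n_cases = ceil(z**2 * expected_sens * (1 - expected_sens) / precision**2)
print(n_cases)  # 35 affected dogs needed under these assumptions
```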

Additional normal thoracic radiographs were obtained from patients without clinical evidence of thoracic disease and with no previous history of PE. A study was considered normal when no abnormalities were present in the lungs, cardiovascular structures, pleural space, or mediastinum. These patients were presented for preoperative or preanesthetic examination for conditions unrelated to the thorax, such as cervical spinal pain, spinal cord lesions, and dental disease.

Before the cases were submitted for AI analysis, the authors (T.M. and M.S.) jointly evaluated each of the radiographs and, by consensus, graded the severity of the pleural effusion as mild, moderate, or severe. Patients with a normal thorax and no pleural effusion were assigned to group 1; the confirmed cases of pleural effusion were assigned to group 2.

2.2 Data recording and analysis

Digital radiographs were received by the Vetology AI software in standard DICOM format. Image processing included the application of intensity normalization, denoising, and gamma correction to the extracted images. The images were cropped to include only the area of interest (i.e., the thorax). The software then classified images into those with pleural effusion and those without. The CNN was trained on approximately 2000 images of pleural effusion and approximately 2000 images of normal patients on the TensorFlow platform; the normal set did not contain other diseases. TensorFlow is an open-source machine learning platform that provides a user interface for executing AI algorithms.[30] Training data were sourced from a broad range of digital radiographs from clinical cases of varied canine breeds, ages, geographies, and digital X-ray systems, representative of diverse real-world cases. The training images were selected and labeled for disease status by technicians under the supervision of board-certified radiologists. A k-fold (k = 3) cross-validation algorithm assessed model performance. The CNN architecture is VGG16.[31] The CNN is a binary classifier that outputs a probability of pleural effusion; the specific decision threshold was selected prior to this study based on internal testing.
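As a minimal sketch of the kind of pipeline described above, and not the vendor's actual implementation, the following Python/TensorFlow code shows a preprocessing step (intensity normalization and gamma correction; denoising omitted for brevity) feeding a VGG16-based binary classifier. The input size, classification head, gamma value, fine-tuning strategy, and 0.5 threshold are all assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def preprocess(image: np.ndarray, gamma: float = 0.8) -> np.ndarray:
    """Intensity-normalize and gamma-correct a cropped thorax image.

    The gamma value and normalization scheme are illustrative assumptions.
    """
    img = image.astype(np.float32)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # scale to [0, 1]
    img = np.power(img, gamma)  # gamma correction
    return img

# VGG16 backbone with a binary (effusion / no effusion) head.
# Grayscale DICOM pixels would be replicated to 3 channels for this input.
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False  # whether/how the backbone was fine-tuned is not reported

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # outputs P(pleural effusion)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

THRESHOLD = 0.5  # the study's actual threshold was set by prior internal testing
def classify(prob: float) -> str:
    return "pleural effusion" if prob >= THRESHOLD else "no pleural effusion"
```

The 3-fold cross-validation described above could be layered on top of this with, for example, sklearn.model_selection.KFold; that scaffolding is omitted here for brevity.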
2.3 Statistical analysis

The results generated by the AI software for groups 1 and 2 were statistically evaluated, with statistical significance set at P < 0.05. Efficiency in discriminating effusion was assessed using sensitivity, specificity, predictive values, and accuracy.[32,33] Confidence intervals (CIs) of 95% were constructed for the agreement and efficiency parameters. All statistical analyses were performed by an independent statistician (M.T.) using a statistical software package (R Core Team, 2020, Vienna, Austria). Factors such as age, weight, and breed were not included in the statistical analysis, given that the software was trained to recognize and interpret varied canine breeds of different ages.

3 RESULTS

All studies were produced using commercially available DR equipment (Canon Digital Radiography Systems, CXDI 17 × 17 flat panel detector and Sound Smart DR image postprocessing software, Carlsbad, CA), with exposure techniques that varied according to the thickness of the animal (kVp 80-90, mAs 3.5-8.0). The DR equipment uses grid suppression software. The inclusion criteria were met by 62 dogs of different breeds. The median age of dogs in the normal group was 10 years (range, 3-14 years), and in the pleural effusion group it was 12 years (range, 4-24 years). The median weight of dogs in the normal group was 16 kg (range, 6.0-47.7 kg), and in the pleural effusion group it was 32 kg (range, 7.5-55.0 kg). Dog breeds included American Pit Bull Terrier, American Staffordshire Terrier, Beagle, Border Collie, Boston Terrier, Chihuahua, Cocker Spaniel, French Bulldog, German Shepherd, Goldendoodle, Greyhound, Labrador, Maltese, Miniature Poodle, mixed breed, Portuguese Water Dog, Pug, Schnauzer, Scottish Deerhound, Siberian Husky, and Tibetan Terrier.

Forty-one included dogs had confirmed pleural effusion, and 21 dogs had normal thoracic radiographs. A total of 173 images were evaluated from the 62 patients: 62 right lateral, 49 left lateral, and 62 ventrodorsal images. Evaluators were aware of the animals' presenting complaints for group 1 (n = 21), which included dogs undergoing a preanesthetic workup for conditions unrelated to the thorax. Group 2 (n = 41) included animals with a history of trauma, metastatic disease, cardiomyopathy, thoracic neoplasia, and lung lobe torsion.

The final diagnoses leading to pleural effusion included diaphragmatic hernia (3), chylothorax (4), lung neoplasia (6), heart base mass (1), lung lobe torsion (4), rib neoplasia (4), mediastinal mass (6), metastatic disease (6), fungal disease with pleuritis (1), and heart failure (6). The authors classified the confirmed pleural effusion cases as having mild (9), moderate (22), and severe (10) volumes of pleural effusion.

The AI software detected pleural effusion in 37/41 of the confirmed pleural effusion cases and correctly classified 18/21 of the normal cases. The sensitivity of the AI model in recognizing pleural effusion was 90.2% (95% CI, 0.768-0.972), and the specificity was 85.7% (95% CI, 0.636-0.969). The positive and negative predictive values of the AI software for predicting pleural effusion were 92.5% (95% CI, 0.796-0.984) and 81.8% (95% CI, 0.597-0.948), respectively. The diagnostic accuracy of the AI software was 88.7% (95% CI, 0.781-0.953). Examples of radiographs correctly and incorrectly classified by the AI model are shown in Figures 1 and 2, respectively.
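These metrics follow directly from the 2 × 2 confusion matrix (TP = 37, FN = 4, TN = 18, FP = 3). The short Python sketch below reproduces them, assuming exact (Clopper-Pearson) binomial intervals, which match the reported CIs; the paper does not state which interval method was actually used.

```python
from scipy.stats import beta

def exact_ci(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson exact binomial confidence interval for k successes in n."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

tp, fn = 37, 4  # confirmed effusion cases: detected / missed
tn, fp = 18, 3  # normal cases: correctly classified / called effusion

for name, k, n in [
    ("sensitivity", tp, tp + fn),
    ("specificity", tn, tn + fp),
    ("PPV",         tp, tp + fp),
    ("NPV",         tn, tn + fn),
    ("accuracy",    tp + tn, tp + fn + tn + fp),
]:
    lo, hi = exact_ci(k, n)
    print(f"{name}: {k}/{n} = {k/n:.1%} (95% CI {lo:.3f}-{hi:.3f})")
```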

FIGURE 1 Examples of radiographs correctly classified by the AI model as having pleural effusion. A, Right lateral (kVp 80, mAs 6.5) and B, ventrodorsal (kVp 90, mAs 6.5) radiographic projections of a dog with mild, unilateral signs of pleural effusion. There is a vesicular pattern in the cranioventral aspect of the lung fields on the lateral projection (arrowhead). The free fluid accumulates ventral to the heart, increasing the radiographic opacity of the mediastinal fat (arrow). This dog had surgical confirmation of left cranial lung lobe torsion and pleural effusion.

FIGURE 2 Examples of radiographs incorrectly classified by the AI model as having pleural effusion. A, Left lateral (kVp 80, mAs 6.5) and B, ventrodorsal (kVp 90, mAs 6.5) radiographic projections of a dog without pleural effusion. Note the fat opacity ventral to the heart (arrow). The excessive fat accumulation in the mediastinum was mistakenly assigned a yes-effusion value, likely because the algorithm did not account for the radiographic density of fat allowing visualization of the apex of the heart.

The AI model failed to identify four cases of pleural effusion that had been classified by the two authors as mild (3) (Figure 3) or moderate (1) (Figure 4).

4 DISCUSSION

The authors use the term "validation" of the software specifically to mean determining the sensitivity and specificity of the AI algorithm for detecting pleural effusion in groups 1 and 2, limited to a population of 41 abnormal and 21 normal subjects. It is worth highlighting in this discussion that there is a separate and independent data set used by the developers of any AI software to train their software. The latter is often a larger data set, in the thousands, containing a myriad of pathological processes, including pleural effusion. However, the fact that AI software has been trained with a larger data set covering a myriad of pathological processes does not mean that the software has been validated against an external independent data set.

FIGURE 3 Examples of radiographs incorrectly classified by the AI model as not having pleural effusion. A, Right lateral (kVp 80, mAs 4.5) and B, ventrodorsal (kVp 90, mAs 4.5) radiographic projections of a dog with mild, unilateral pleural effusion. Arrows indicate fluid lines marking a minimal amount of fluid within the pleural space. The fluid in the lateral projection obscures the apex of the heart. This dog had confirmation of a heart base mass by echocardiography; pleural effusion was confirmed by thoracic ultrasound.

FIGURE 4 Examples of radiographs incorrectly classified by the AI model as not having pleural effusion. A, Right lateral (kVp 80, mAs 6.5) and B, ventrodorsal (kVp 90, mAs 6.5) radiographic projections of a dog with moderate, bilateral pleural effusion (arrows). This dog had confirmation of pleuritis due to coccidioidomycosis by thoracentesis. The software did not recognize the soft tissue opacity obscuring portions of the heart as free pleural fluid. The lung borders are highlighted by the fluid (scalloping), as noted by the arrowheads in the lateral view.

The results of this study suggest that the application of the AI model could assist veterinarians in the detection of pleural effusion in thoracic radiographs and possibly serve as a screening tool for triaging patients. Remarkably, a similar sensitivity (91%) and specificity (91%) were obtained by other authors[9] detecting pleural effusion in human patients with the use of CNNs. Similar CNN performance has also been found in veterinary patients.[20,23] However, these previous studies relied only on the radiologist's report, and confirmation of the disease was not obtained. Using the radiologist's report exclusively as the ground truth can create pitfalls,[9] as the accuracy of radiologists in detecting pleural effusion is estimated to vary from 67% to 92%.[34,35] There are no reports evaluating the accuracy of a board-certified radiologist in detecting pleural effusion in the veterinary literature. Pleural effusion can be misinterpreted in cases of pleural masses, adipose tissue accumulation within the mediastinum, and suboptimal patient positioning. In other words, the authors speculate that the radiographic report is not the ideal ground truth, as it is subject to the accuracy of a radiologist. The test group in this work (confirmed cases of pleural effusion) was selected using various confirmation methods, such as thoracentesis or surgery. While the authors reviewed the abnormal cases and by

consensus agreed that all subjects in the test group had radiographic evidence of pleural effusion, the software performance was tested against the proven presence of pleural effusion.

The AI model, tested on a small database of radiographic images, showed high classification accuracy in the detection of pleural effusion in dogs and, as importantly, in the identification of normal thoracic radiographs. One false-negative case of pleural effusion exhibited subtle changes and had been classified as mild by the authors. This highlights the fact that the CNN may encounter difficulties in correctly identifying subtle sets of abnormalities and assign an incorrect output, similar to what a radiologist may encounter in practice. On the other hand, another false-negative case had a moderate volume of pleural effusion; the cause of this misclassification remains unclear.

The aim of the present study was to test the ability of the AI model to detect PE as an isolated abnormality, not to determine the cause or classify the severity of PE. The software was programmed to automatically classify canine thoracic radiographs as either no-pleural-effusion or yes-pleural-effusion. In other words, the ability of the software to discern pleural effusion as the result of, for example, right heart failure or lung lobe torsion was not tested; this would require collecting a larger set of cases of each specific disease process to achieve statistical significance. Moreover, even with a training data set in the thousands, the developers of the software do not have enough data points to classify effusion as mild, moderate, or severe. Further grading of the severity of the pleural effusion would improve the analysis and understanding of the software's limitations, but it remains beyond the capabilities of the software.

The authors argue that any AI software under development reflects the accuracy of the data it was trained against, which was created by specialists; this includes the ability of the imaging experts to accurately provide the key features that are present in any given pathological process. This is one of the key issues that veterinary practitioners should keep at the forefront in order to successfully understand the role of AI in day-to-day practice. While validation of the software to detect pleural effusion has been shown, correlating the abnormal radiographic finding of pleural effusion with other abnormal radiographic findings to determine the etiology of the effusion (e.g., a pulmonary mass, diaphragmatic hernia, or heart disease) was not possible due to the small number of cases.

It is the opinion of the authors that AI software as applied to the veterinary imaging field is here to stay. However, validating such technology is essential to ensure its correct use in day-to-day practice. Although there is no evidence that AI can make the complex problem-solving decisions typical of a radiologist, as reflected in radiology reporting, AI shows promise at least as a screening tool for general practitioners.
The user of such technology must, however, have proper expectations and a clear understanding of its current pitfalls and advantages.

ACKNOWLEDGMENTS
The authors would like to acknowledge Vetology for providing the AI software for testing.

LIST OF AUTHOR CONTRIBUTIONS
Category 1
(a) Conception and Design: Solano, Müller, Tsunemi
(b) Acquisition of Data: Müller
(c) Analysis and Interpretation of Data: Solano, Müller, Tsunemi
Category 2
(a) Drafting the Article: Müller
(b) Revising Article for Intellectual Content: Solano, Müller, Tsunemi
Category 3
(a) Final Approval of the Completed Article: Solano, Müller, Tsunemi
Category 4
(a) Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved: Solano, Müller, Tsunemi

CONFLICT OF INTEREST
The authors have declared no conflict of interest.

ORCID
Thiago Rinaldi Müller https://orcid.org/0000-0002-7494-8588

REFERENCES
1. Cihan P, Gökçe E, Kalipsiz O. A review of machine learning applications in veterinary field. Kafkas Univ Vet Fak Derg 2017;23:673-680.
2. Boissady E, de La Comble A, Zhu X, Hespel A-M. Artificial intelligence evaluating primary thoracic lesions has an overall lower error rate compared to veterinarians or veterinarians in conjunction with the artificial intelligence. Vet Radiol Ultrasound 2020:1-9.
3. Vinicki K, Ferrari P, Belic M, Turk R. Using convolutional neural networks for determining reticulocyte percentage in cats. arXiv preprint arXiv:1803.04873; 2018. http://arxiv.org/abs/1803.04873
4. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60-88.
5. Vetology launches pet X-ray-reading AI software [accessed 20.01.21]. https://www.veterinarypracticenews.com/vetology-launches-pet-X-ray-reading-aisoftware/; 2018.
6. Artificial intelligence powers new X-ray software [accessed 20.01.21]. https://todaysveterinarybusiness.com/artificial-intelligence-powers-new-X-raysoftware/; 2018.
7. Chassagnon G, Vakalopoulou M, Paragios N, Revel MP. Artificial intelligence applications for thoracic imaging. Eur J Radiol 2020;123:1-6.
8. Winkel DJ, Heye T, Weikert TJ, Boll DT, Stieltjes B. Evaluation of an AI-based detection software for acute findings in abdominal computed tomography scans: toward an automated work list prioritization of routine CT examinations. Invest Radiol 2019;54:55-59.

9. Cicero M, Bilbily A, Colak E, Dowdell T, Gray B, Perampaladas K, Barfett J. Training and validating a deep convolutional neural network for computer-aided detection and classification of abnormalities on frontal chest radiographs. Invest Radiol 2017;52:281-287.
10. Chartrand G, Cheng PM, Vorontsov E, Drozdzal M, Turcotte S, Pal CJ, Kadoury S, Tang A. Deep learning: a primer for radiologists. Radiographics 2017;37:2113-2131.
11. Choy G, Khalilzadeh O, Michalski M, Do S, Samir AE, Pianykh OS, Geis JR, Pandharipande PV, Brink JA, Dreyer KJ. Current applications and future impact of machine learning in radiology. Radiology 2018;288:318-328.
12. Burns JE, Yao J, Muñoz H, Summers RM. Automated detection, localization, and classification of traumatic vertebral body fractures in the thoracic and lumbar spine at CT. Radiology 2016;278(1):64-73.
13. Lakhani P, Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 2017;284(2):574-582.
14. Taylor AG, Mielke C, Mongan J. Automated detection of moderate and large pneumothorax on frontal chest X-rays using deep convolutional neural networks: a retrospective study. PLoS Med 2018;15(11):e1002697.
15. Pourhomayoun M, Shakibi M. Predicting mortality risk in patients with COVID-19 using machine learning to help medical decision-making. Smart Health 2021;20. https://doi.org/10.1016/j.smhl.2020.100178
16. Maier O, Schröder C, Forkert ND, Martinetz T, Handels H. Classifiers for ischemic stroke lesion segmentation: a comparison study. PLoS One 2015;10(12):e0145118.
17. Rodriguez-Ruiz A, Lång K, Gubern-Mérida A, et al. Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst 2019;111(9):916-922. https://doi.org/10.1093/jnci/djy222
18. Sahiner B, Pezeshk A, Hadjiiski LM, et al. Deep learning in medical imaging and radiation therapy. Med Phys 2019;46:e1-e36.
19. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60-88.
20. Yoon Y, Hwang T, Lee H. Prediction of radiographic abnormalities by the use of bag-of-features and convolutional neural networks. Vet J 2018;237:43-48. https://doi.org/10.1016/j.tvjl.2018.05.009
21. Li S, Wang Z, Visser LC, Wisner ER, Cheng H. Pilot study: application of artificial intelligence for detecting left atrial enlargement on canine thoracic radiographs. Vet Radiol Ultrasound 2020;61:611-618. https://doi.org/10.1111/vru.12901
22. Burti S, Longhin Osti V, Zotti A, Banzato T. Use of deep learning to detect cardiomegaly on thoracic radiographs in dogs. Vet J 2020;262.
23. Banzato T, Wodzinski M, Burti S, et al. Automatic classification of canine thoracic radiographs using deep learning. Sci Rep 2021;11:3964. https://doi.org/10.1038/s41598-021-83515-3
24. O'Brien PJ, Lumsden JH. Cytologic examination of body cavity fluids. Semin Vet Med Surg (Small Anim) 1988;3:140-156.
25. Dempsey SM, Ewing PJ. A review of the pathophysiology, classification, and analysis of canine and feline cavitary effusions. J Am Anim Hosp Assoc 2011;47:1-11.
26. Noone KE. Pleural effusions and diseases of the pleura. Vet Clin North Am Small Anim Pract 1985;15:1069-1084.
27. Forrester SD, Troy GC, Fossum TW. Pleural effusions: pathophysiology and diagnostic considerations. Comp Cont Ed Pract Vet 1988;10:121-136.
28. Lynch KC, Oliveira CR, Matheson JS, et al. Detection of pneumothorax and pleural effusion with horizontal beam radiography. Vet Radiol Ultrasound 2012;53(1):38.
29. Thrall DE. Canine and feline pleural space. In: Thrall DE, ed. Textbook of Veterinary Diagnostic Radiology. 7th ed. St. Louis, MO: Saunders Elsevier; 2018:670-678.
30. Abadi M, Barham P, Chen J, et al. TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16); 2016:265-284.
31. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. Presented as a conference paper at the International Conference on Learning Representations (ICLR), San Diego, CA, 2015. arXiv preprint arXiv:1409.1556 (2014).
32. Altman DG, Machin D, Bryant TN, Gardner MJ. Statistics with Confidence. 2nd ed. London: British Medical Journal; 2000:28-29.
33. Collett D. Modelling Binary Data. Boca Raton, FL: Chapman & Hall/CRC; 1999:24.
34. Ruskin J, Gurney JW, Thorsen MK, Goodman LR. Detection of pleural effusions on supine chest radiographs. AJR Am J Roentgenol 1987;148:681-683.
35. Kitazono MT, Lau CT, Parada AN, Renjen P, Miller WT Jr. Differentiation of pleural effusions from parenchymal opacities: accuracy of bedside chest radiography. AJR Am J Roentgenol 2010;194:407-412.

How to cite this article: Müller TR, Solano M, Tsunemi MH. Accuracy of an artificial intelligence software for detection of confirmed pleural effusion in thoracic radiographs in dogs. Vet Radiol Ultrasound 2022;1-7. https://doi.org/10.1111/vru.13089
In: Thrall DE (7th ed): Text-book of veterinary diagnostic radiology. St. Louis, MO: Saunders Elsevier,2018; 670–678.30. Abadi M, Barham P, Chen J, et al. “TensorFlow: A System for Large-Scale Machine Learning TensorFlow: A system for large-scale machinelearning,” in 12th USENIX Symposium on Operating Systems Designand Implementation (OSDI ’16), 2016, pp. 265–28431. Simonyan K, Zisserman A. “Very deep convolutional networks forlarge-scale image recognition.” Presented as a conference paper inter-national: conference on learning representation ICLR, 2015, SanDiego., CA: arXiv preprint arXiv:1409.1556 (2014).32. Altman DG, Machin D, Bryant TN, and Gardner MJ (2000). Statisticswith Confidence, second edition. British Medical Journal, London, pp.28-29.33. Collett D (1999). Modeling Binary Data. Chapman & Hall/CRC, BocaRaton Florida, pp. 24.34. Ruskin J, Gurney JW, Thorsen MK, Goodman LR. Detection of pleuraleffusions on supine chest radiographs. American Journal of Radiology.1987; 148: 681–683.35. Kitazono MT, Lau CT, Parada AN, Renjen P, Miller WT Jr. Differen-tiation of pleural effusions from parenchymal opacities: Accuracy ofbedside chest radiography. American Journal of Radiology. 2010; 194:407–412.How to cite this article: Müller TR, Solano M, Tsunemi MH.Accuracy of an artificial intelligence software for detection ofconfirmed pleural effusion in thoracic radiographs in dogs. VetRadiol Ultrasound. 2022;1–7.https://doi.org/10.1111/vru.13089