Computational Approaches for Cancer Workshop 2022
Frederick National Laboratory
More info: https://ncihub.org/groups/cafcw/cafcw22
Quantum Computing Approach Using the Anticancer Properties of Medicinal Plants, by the ICECBS Consortium
Vijay P. Bhatkar, Kenneth Buetow, Souvik Chakravarty, Sasha Cocquyt, Parvati Dev, Devdatt Dubhashi, Shanker Gupta, Haresh K.P, B Jayaram, Cezary Mazurek, Asheet K Nath, Koninika Ray, Amit Saxena, Smita Saxena, Akshay Seetharam, Samta Sharma, Shashank Shekhar, Anil Srivastava, Neelakantan Subramanian, Pushpa Tandon
Ensemble Learning of Attention-based Models for Whole Slide Image Comprehension
Hong-Jun Yoon, Adam Saunders, Folami Alamudun, Sajal Dash, Jacob Hinkle, and Aristeidis Tsaris
Supporting a Community of Cancer Models with the CANDLE Checkpoint Module
Rajeev Jain, Justin M. Wozniak, Jamaludin Mohd Yusof, George Zaki and Sunita Menon
High-Performance and Parallel Workflow for Generating Polygenic Risk Scores Using Multiple Algorithms
Alex Rodriguez, Ravi Madduri
Methods: We developed a workflow that takes input GWAS summary statistics, performs standard quality control, and generates PRS with multiple software packages: PLINK, PRSice-2, LDpred-2, lassosum, PRS-CSx, and SBayesR. The workflow was configured to leverage the high-performance computing resources of the ORNL KDI cluster and can also be configured to run on cloud resources.
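A minimal sketch of how such a per-tool parallel fan-out could be orchestrated is shown below. The wrapper script names and the Python driver are illustrative assumptions, not the actual PRS-dev pipeline code.

    # Hypothetical sketch of the parallel PRS fan-out described above; the tool
    # names are real, but the wrapper scripts are placeholders.
    import subprocess
    from concurrent.futures import ProcessPoolExecutor

    # Each entry maps a PRS method to a hypothetical wrapper script that runs
    # the QC'd GWAS summary statistics through that tool.
    PRS_TOOLS = {
        "PLINK":    ["bash", "run_plink_prs.sh"],
        "PRSice-2": ["bash", "run_prsice2.sh"],
        "LDpred-2": ["Rscript", "run_ldpred2.R"],
        "lassosum": ["Rscript", "run_lassosum.R"],
        "PRS-CSx":  ["bash", "run_prscsx.sh"],
        "SBayesR":  ["bash", "run_sbayesr.sh"],
    }

    def run_tool(name, cmd):
        """Launch one PRS tool and return its exit status."""
        result = subprocess.run(cmd, capture_output=True, text=True)
        return name, result.returncode

    if __name__ == "__main__":
        # Fan the tools out in parallel, mirroring the per-software
        # parallelism the workflow uses on the cluster.
        with ProcessPoolExecutor(max_workers=len(PRS_TOOLS)) as pool:
            futures = [pool.submit(run_tool, n, c) for n, c in PRS_TOOLS.items()]
            for f in futures:
                name, rc = f.result()
                print(f"{name}: {'ok' if rc == 0 else f'failed ({rc})'}")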
Results: We used the pipeline to generate new PRS and to validate PRS generated by other large consortia across multiple VA MVP-related projects. One such project evaluated the ability of genome-wide PRS to predict prostate cancer risk compared to a recently developed PRS of 269 established prostate cancer risk variants with multi-ancestry weights [REF]. Genome-wide PRS approaches included LDpred2, PRS-CSx, and EB-PRS. The results showed that the 269-variant PRS had significantly larger AUCs in both African and European ancestry men, and men of African and European ancestry in the top PRS decile had larger odds of prostate cancer. The use of the pipeline in this investigation suggested that genome-wide PRS may not improve prostate cancer risk discrimination compared to a PRS restricted to genome-wide significant variants. The analysis was executed in less than 5 hours on 20 nodes of the OLCF KDI cluster for all of the software included, with the workflow running each PRS package in parallel.
Conclusions: The open-source pipeline is available at: https://github.com/exascale-genomics/PRS-dev
Temporal Stability of Immuno-Phenotype Radiomic Score in Melanoma
Nizam Ahamed, Evan Porter, Baher Elgohari, Mohamed Abdelhakiem, John Kirkwood, Diwakar Davar, Zaid Siddiqui
Pure seminoma subtyping using computational approaches
Kirill E. Medvedev, Anna V. Savelyeva, Aditya Bagrodia, Liwei Jia, Nick V. Grishin
Deep Learning in Cervical Cancer: Searchable Catalogs and Smart Data Curation
Dani Ushizima, Andrea Bianchi, Fatima Medeiros, Claudia Carneiro, Débora Diniz, Breno Keller, Mariana Rezende, Daniel Silva, Flavio Araujo, Romuere Silva, Marcone Souza
Comparison of Radiomics from Prostate Bi-parametric MRI and Pharmacokinetic Parameters from Dynamic Contrast-Enhanced MRI for Risk Stratification of PI-RADS=3 Prostate Cancer Lesions
Aaron Ng, Michael Sobota, Ansh Roge, Amogh Hiremath, Nathaniel Braman, Sree Harsha Tirumani, Leonardo Kayat Bittencourt, Lee Ponsky, Anant Madabhushi, Rakesh Shiradkar
Transfer Learning for Language Model Adaptation: A case-study with Hepatocellular Carcinoma
Amara Tariq, Omar Kallas, Patricia Balthazar, Scott Lee, Terry Desser, Daniel Rubin, Judy Wawira Gichoya, Imon Banerjee
There are significant variations in hepatocellular carcinoma (HCC) screening and diagnosis protocols across institutions; as a result, patients may receive a mix of imaging studies (US, CT, MRI) over their longitudinal screening, which makes standardized reporting of outcomes challenging. Natural language processing (NLP) can be used to classify imaging reports according to standard guidelines; however, structured LI-RADS reporting for HCC screening with MR has only limited data available, owing to the recent introduction of the standard and the high cost of the exam. If NLP algorithms can be systematically adapted between domains (US or MR) and institutions, diagnosis and screening reporting can be standardized, accelerating information dissemination to medical practitioners and thus improving patient care and treatment planning.
Methodology
We experimented with transferring language models (LM) between radiology reports of ultrasound (US) and MR scans obtained from two different institutions (Inst1 and Inst2) and developed an HCC diagnosis extraction model for both MR and US screening studies. A transferred LM for MR means that an LM was first trained on US reports from Inst1 and then used to initialize the representations of words shared with the MR vocabulary of an LM that was further trained on MR reports from Inst2 (and vice versa for the US LM). We collected 12,116 abdominal US studies performed at Inst1 to screen for HCC without a LI-RADS score (untemplated), and 1,744 templated studies with a LI-RADS score (~10% malignant cases with LI-RADS > 2). We also collected 9,470 untemplated MR studies conducted at Inst2 without LI-RADS reporting, and prepared 1,087 LI-RADS-annotated reports (~50% malignant cases) with the help of radiologists, since benign cases are often not coded with LI-RADS in practice. We experimented with transferring two types of LM, each also trained from scratch (including the vocabulary space) for comparison: 1) word2vec (context-independent) and 2) BERT (context-dependent). Three classifier pipelines were tested, i.e., word2vec + Random Forest, word2vec + 1D-CNN, and BERT + Random Forest, with malignant vs. benign as target labels. Classification performance was used to benchmark the quality of natively trained and transferred LMs.
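For illustration, a minimal sketch of one of the three pipelines (word2vec + Random Forest) is given below, assuming gensim and scikit-learn; the tokenized reports, file names, and hyperparameters are placeholders rather than the authors' actual configuration.

    # Minimal sketch of the word2vec + Random Forest pipeline; data and
    # settings are illustrative placeholders, not the study's code.
    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical tokenized radiology reports and malignant(1)/benign(0) labels.
    reports = [["liver", "lesion", "hyperintensity"], ["no", "focal", "lesion"]]
    labels = [1, 0]

    # Step 1: train a context-independent LM on the report tokens.
    w2v = Word2Vec(sentences=reports, vector_size=100, min_count=1, epochs=20)

    # To mimic the transfer setting, a model pre-trained on US reports could
    # instead be loaded and further trained on MR reports, e.g.:
    #   w2v = Word2Vec.load("us_inst1_word2vec.model")   # hypothetical path
    #   w2v.build_vocab(reports, update=True)
    #   w2v.train(reports, total_examples=len(reports), epochs=20)

    def embed(report):
        """Average word vectors into one fixed-length report representation."""
        vecs = [w2v.wv[t] for t in report if t in w2v.wv]
        return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

    # Step 2: classify malignant vs. benign from the averaged embeddings.
    X = np.vstack([embed(r) for r in reports])
    clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
    print(clf.predict(X))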
Results
The fine-tuned LM performs better when paired with any of the selected classifiers for the more challenging task of classifying untemplated MR reports, with a highest overall weighted F1-score of 0.90. Similarly, US-fine-tuned language models (best weighted F1-score 0.95) perform better when paired with any of the classifiers (Random Forest or 1D-CNN) for the more challenging task of classifying US reports without a template. The learnt LM space clearly demonstrates that semantically similar words (e.g., ‘isointensity’, ‘hypointensity’, ‘hyperintensity’, ‘intense’, ‘bright’, ‘hypointense’) in the original and fine-tuned language spaces are mapped close together even when they originate from two separate domains.
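The neighborhood structure described above can be inspected with a query such as the following sketch; the model file name is a hypothetical placeholder.

    # Illustrative check of the claim above: in a trained word2vec space,
    # nearest neighbors of an intensity term should be other intensity terms.
    from gensim.models import Word2Vec

    w2v = Word2Vec.load("mr_transferred_word2vec.model")  # hypothetical file
    for term in ["hypointensity", "hyperintensity", "bright"]:
        if term in w2v.wv:
            print(term, "->", [w for w, _ in w2v.wv.most_similar(term, topn=5)])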
Discussion
The study reports a successful transfer of language models from one radiology domain to a similar domain and compares it with training without adaptation. Experimental results showed that fine-tuning the word-embedding models with similar-domain adaptation (US → MR and MR → US), even for multi-institutional reports, preserves more semantic knowledge for the downstream HCC classification task than training the language model from scratch.