Stephen Wicks, Ph.D., Senior Data Scientist, Translational Medicine, Clarivate Analytics. Dr. Wicks draws on extensive experience in genetics, neuroscience, and systems biology to help partners develop data organization and integration strategies, with a particular focus on extracting insights from the careful integration of traditionally siloed data streams. Dr. Wicks obtained his Ph.D. in Neuroscience from the University of British Columbia and worked for many years in both academic and industrial environments before joining the Life Sciences Professional and Consulting Services division of Clarivate.
Title: GWAS PLINK workflow integration
A genome-wide association study (GWAS) analyzes high-throughput genotyping data, typically single-nucleotide polymorphisms (SNPs), to identify genomic variants, or clusters of variants, that are associated with a phenotype of interest. Such analyses can be vital for highlighting genetic variants associated with quantitative differences in measured phenotypic traits related to health and disease. The tranSMART platform currently supports storing GWAS association test results (the GWAS tab) and visualizing them through the GWAVA plugin developed by Pfizer. Association test results are indexed by polymorphism, not by subject: the result file contains no subject-level data, only population-level summary statistics reflecting how variants are distributed in the study population. This makes it challenging to integrate such results meaningfully with the subject-indexed tranSMART platform. As a consequence, the analysis of variant data from clinical or experimental populations in tranSMART has followed a largely separate workflow.
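To illustrate why an association result cannot be joined back to individual subjects, the following minimal Python sketch collapses hypothetical subject-level genotypes for a single SNP into case/control allele counts and computes a basic 1-df allelic chi-square test (no continuity correction). The subject IDs, genotype values, group labels, and the SNP name `snp_1` are invented for illustration, and the result-row field names only loosely mirror common association-output columns. The point is that subject identity is discarded at the aggregation step: the output row is indexed by polymorphism and holds only frequencies and test statistics.

```python
import math

# Hypothetical subject-level data for one SNP: subject ID -> (group, copies of allele A1).
GENOTYPES = {
    "S01": ("case", 2), "S02": ("case", 1), "S03": ("case", 1), "S04": ("case", 0),
    "S05": ("control", 0), "S06": ("control", 1), "S07": ("control", 0), "S08": ("control", 0),
}


def allele_counts(genotypes):
    """Collapse per-subject dosages into per-group [A1, A2] allele counts (diploid)."""
    counts = {"case": [0, 0], "control": [0, 0]}
    for group, dosage in genotypes.values():
        counts[group][0] += dosage
        counts[group][1] += 2 - dosage
    return counts


def allelic_test(snp_id, genotypes):
    """Return a polymorphism-indexed result row from the 2x2 group-by-allele table."""
    counts = allele_counts(genotypes)
    table = [counts["case"], counts["control"]]
    total = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    # Pearson chi-square: sum of (observed - expected)^2 / expected over the 4 cells.
    chisq = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chisq += (table[i][j] - expected) ** 2 / expected
    return {
        "SNP": snp_id,
        "F_A": table[0][0] / row_totals[0],  # A1 frequency in cases
        "F_U": table[1][0] / row_totals[1],  # A1 frequency in controls
        "CHISQ": chisq,
        # Upper-tail p-value of a 1-df chi-square: P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
        "P": math.erfc(math.sqrt(chisq / 2)),
    }


result = allelic_test("snp_1", GENOTYPES)
print(result)  # the row carries no subject IDs, only per-SNP summary values
```

Because every subject is reduced to an allele tally before the test is run, there is no key in the result that could link a row back to a tranSMART subject.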
In a collaboration with the University of Liverpool, Thomson Reuters has enabled functionality for loading, storing, and analyzing high-throughput, patient-level genotype data encoded in PLINK binary format. This implementation loads the variant (genotype) data from the BED file and the platform/map data from the BIM file as-is, but rewrites the phenotype column of the associated FAM file according to each subject's subset1/subset2 membership in cohorts defined in the tranSMART UI. That is, users can define arbitrary cohorts in a standard tranSMART workflow and use those cohorts to drive a PLINK analysis (association test, logistic regression, or linear regression) on the source PLINK dataset from within a GWAS Advanced Workflow. The end result is a file, such as an association test result file, that can then be stored and visualized using the existing GWAS workflows in tranSMART.
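The cohort-to-phenotype step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual tranSMART/Liverpool implementation: it assumes the tranSMART subject ID matches the within-family ID (column 2) of the FAM file, that subset1 is coded as cases and subset2 as controls, and that subjects in neither subset are set to missing so PLINK drops them. The names `rewrite_fam`, `plink_command`, and `ANALYSIS_FLAGS` are hypothetical; the PLINK options used (`--bed`, `--bim`, `--fam`, `--assoc`, `--logistic`, `--out`) are standard PLINK 1.9 flags.

```python
# Hypothetical sketch: recode the .fam phenotype column from tranSMART subsets,
# then build (but do not run) a PLINK 1.9 command line.

# PLINK 1.x case/control codes for column 6 of a .fam file.
CASE, CONTROL, MISSING = "2", "1", "-9"

# Assumed mapping from workflow analysis names to PLINK flags (case/control only).
ANALYSIS_FLAGS = {"association": "--assoc", "logistic": "--logistic"}


def rewrite_fam(fam_text, subset1, subset2):
    """Return .fam text with column 6 recoded: subset1 -> case, subset2 -> control.

    Assumes the tranSMART subject ID equals the within-family ID (column 2).
    Subjects in neither subset get the missing code; overlapping subsets are rejected.
    """
    out = []
    for line in fam_text.splitlines():
        if not line.strip():
            continue
        # .fam columns: FID, IID, father ID, mother ID, sex, phenotype.
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"expected 6 .fam columns, got {len(fields)}: {line!r}")
        iid = fields[1]
        if iid in subset1 and iid in subset2:
            raise ValueError(f"subject {iid} is in both subsets")
        if iid in subset1:
            fields[5] = CASE
        elif iid in subset2:
            fields[5] = CONTROL
        else:
            fields[5] = MISSING
        out.append(" ".join(fields))
    return "\n".join(out) + "\n"


def plink_command(bfile_prefix, fam_path, analysis, out_prefix):
    """Reuse the original .bed/.bim files but point PLINK at the rewritten .fam."""
    return [
        "plink",
        "--bed", f"{bfile_prefix}.bed",
        "--bim", f"{bfile_prefix}.bim",
        "--fam", fam_path,
        ANALYSIS_FLAGS[analysis],
        "--out", out_prefix,
    ]


fam = "F1 S01 0 0 1 -9\nF2 S02 0 0 2 -9\nF3 S03 0 0 1 -9\n"
print(rewrite_fam(fam, subset1={"S01"}, subset2={"S02"}))
print(plink_command("study", "study_subsets.fam", "logistic", "subset_logistic"))
```

Passing `--bed`, `--bim`, and `--fam` separately, rather than a single `--bfile` prefix, lets the large BED and BIM files be reused unchanged while only the small FAM file is regenerated for each cohort definition. Linear regression is left out of the mapping because PLINK's `--linear` expects a quantitative phenotype rather than the case/control coding produced here.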