A recap of the reference research for this project: “Multiple testing in genome-wide association studies via hidden Markov models”
Our project is inspired by the paper “Multiple testing in genome-wide association studies via hidden Markov models” by Zhi Wei, Wenguang Sun, Kai Wang, and Hakon Hakonarson (Wei et al. 2009). The authors argue that conventional p-value based testing procedures are inefficient in genome-wide association studies (GWAS) because they do not take linkage disequilibrium (LD) among SNPs into account. The paper proposes a procedure called the Pooled Local Index of Significance (PLIS), which models the dependence among SNPs with a hidden Markov model and is shown to be optimal in the sense that it controls the false discovery rate (FDR) and, among all valid FDR procedures, has the lowest false negative rate (FNR). By re-ranking the significance of all SNPs with LD taken into account, PLIS achieves greater power than conventional p-value based approaches. In the paper’s simulations, PLIS outperforms conventional FDR procedures at detecting disease-associated SNPs.
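To make the re-ranking idea concrete, here is a minimal Python sketch of the thresholding step used by LIS-type procedures. It assumes that a LIS value has already been computed for every SNP from a fitted hidden Markov model, where the LIS of a SNP is the posterior probability that the SNP is not associated with the disease given the test statistics of all SNPs. The function name lis_step_up and the example values are hypothetical and are not taken from the authors' library.

```python
from itertools import accumulate


def lis_step_up(lis_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Hypotheses are ranked by increasing LIS, and the k top-ranked ones are
    rejected, where k is the largest rank whose running average of sorted
    LIS values is still at most alpha.
    """
    # Indices of the hypotheses, ordered from smallest to largest LIS.
    order = sorted(range(len(lis_values)), key=lambda i: lis_values[i])
    running_sums = accumulate(lis_values[i] for i in order)

    k = 0
    for rank, total in enumerate(running_sums, start=1):
        if total / rank <= alpha:
            k = rank

    rejected = [False] * len(lis_values)
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical LIS values for five SNPs (illustration only, not real data).
example_lis = [0.01, 0.90, 0.02, 0.30, 0.08]
rejected = lis_step_up(example_lis, alpha=0.05)
print(rejected)
```

In this toy example the SNP with LIS 0.08 is rejected even though 0.08 is larger than 0.05, because the threshold applies to the running average of the ranked LIS values (0.01, 0.015, and about 0.037 for the first three) rather than to each value individually.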
In this project, we begin with a demonstration on a sample dataset from the library that the researchers created for their paper. We then run a simulation to check whether the PLIS method actually outperforms p-value based methods. Finally, we apply both PLIS and p-value based methods to a real Type 1 Diabetes (T1D) dataset to see whether they differ in which disease-associated SNPs they detect.
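The kind of simulation described above can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions, not the setup of the paper or of our project: a single chain of SNPs, a two-state Markov chain with made-up transition probabilities, standard normal z-values under the null and N(mu, 1) under the alternative, and LIS values computed with the true parameters (an "oracle") rather than estimated from the data. The p-value based comparison is the Benjamini-Hochberg (BH) procedure, and all names and parameter values are hypothetical.

```python
import math
import random


def normal_pdf(x, mean):
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def simulate_chain(m, trans, init, mu, rng):
    """Simulate hidden states (0 = null, 1 = associated) and z-values."""
    states, z = [], []
    state = 0 if rng.random() < init[0] else 1
    for _ in range(m):
        states.append(state)
        z.append(rng.gauss(mu if state else 0.0, 1.0))
        state = 0 if rng.random() < trans[state][0] else 1
    return states, z


def oracle_lis(z, trans, init, mu):
    """P(state_i = 0 | all z) via the scaled forward-backward algorithm."""
    m = len(z)
    emit = [(normal_pdf(x, 0.0), normal_pdf(x, mu)) for x in z]
    fwd, prev = [], None
    for i in range(m):
        if i == 0:
            a = [init[s] * emit[0][s] for s in (0, 1)]
        else:
            a = [emit[i][s] * (prev[0] * trans[0][s] + prev[1] * trans[1][s])
                 for s in (0, 1)]
        total = a[0] + a[1]
        prev = [a[0] / total, a[1] / total]  # rescale to avoid underflow
        fwd.append(prev)
    bwd = [None] * m
    bwd[m - 1] = [1.0, 1.0]
    for i in range(m - 2, -1, -1):
        b = [sum(trans[s][t] * emit[i + 1][t] * bwd[i + 1][t] for t in (0, 1))
             for s in (0, 1)]
        total = b[0] + b[1]
        bwd[i] = [b[0] / total, b[1] / total]
    return [f[0] * b[0] / (f[0] * b[0] + f[1] * b[1]) for f, b in zip(fwd, bwd)]


def reject_top_k(scores, keep_rank):
    """Reject the k smallest scores, k = largest rank passing keep_rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k, running = 0, 0.0
    for rank, i in enumerate(order, start=1):
        running += scores[i]
        if keep_rank(rank, scores[i], running):
            k = rank
    rejected = [False] * len(scores)
    for i in order[:k]:
        rejected[i] = True
    return rejected


def summarize(rejected, states):
    """Return (number of true discoveries, false discovery proportion)."""
    true_hits = sum(1 for r, s in zip(rejected, states) if r and s == 1)
    total = sum(rejected)
    return true_hits, (total - true_hits) / max(total, 1)


# Hypothetical simulation settings (assumptions for illustration only).
rng = random.Random(0)
m, alpha, mu = 2000, 0.10, 2.0
trans = [[0.95, 0.05], [0.20, 0.80]]
init = [0.8, 0.2]

states, z = simulate_chain(m, trans, init, mu, rng)
lis = oracle_lis(z, trans, init, mu)
pvals = [math.erfc(abs(x) / math.sqrt(2.0)) for x in z]  # two-sided p-values

# LIS rule: running average of ranked LIS values must stay at most alpha.
lis_rejected = reject_top_k(lis, lambda rank, v, run: run / rank <= alpha)
# BH rule: ranked p-value must be at most rank * alpha / m.
bh_rejected = reject_top_k(pvals, lambda rank, v, run: v <= rank * alpha / m)

lis_hits, lis_fdp = summarize(lis_rejected, states)
bh_hits, bh_fdp = summarize(bh_rejected, states)
print("LIS: true discoveries =", lis_hits, "FDP =", round(lis_fdp, 3))
print("BH:  true discoveries =", bh_hits, "FDP =", round(bh_fdp, 3))
```

A single run like this gives only one realization; comparing the two approaches fairly would require averaging the false discovery proportion and the number of true discoveries over many replications, and a realistic analysis would estimate the HMM parameters from the data instead of using the true ones.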