In this lecture we will discuss the role of statistics in the “primary” and “secondary” analysis of DNA sequencing data. By these we mean, respectively, the process of turning the raw signal of the sequencing instrument into sequence reads, and the process of aligning and assembling the reads to detect variations from the reference genome. We will see that statistical ideas have been pivotal at every step in the development of successful analysis methodologies. After presenting examples from previous and current generations of sequencing technologies, we will briefly examine some new statistical challenges posed by emerging (third-generation) sequencing technologies.
Professor Wing Hung Wong’s current research is motivated by problems from personalized medicine and systems biology. He is developing Bayesian nonparametric methods and high-performance computing solutions to these problems. In the past, his group has developed a number of widely used bioinformatics tools, and technologies from his group have led to the formation of several companies in the space of genomics data analysis and personalized prognostics. Professor Wong is the Stephen R. Pierce Family Goldman Sachs Professor in Science and Human Health at Stanford University, and is a member of the National Academy of Sciences of the USA.