Session 3a, Statistical Genomics

This session will be held in the Erskine Building, Room 445

10:50 — 11:10

Statistical methods for microarray-based gene set analysis

Sarah Song
University of Auckland

The analysis of gene sets has become a popular topic in recent times, with researchers attempting to improve the interpretability of their microarray analyses through the inclusion of supplementary biological information. While a number of options for gene set analysis exist, most do not incorporate inter-gene correlation information, even though such correlations are known to be biologically relevant. In this talk, the characteristics of some of the most widely used gene set analysis methods will be examined, based on their performance on both simulated and real data sets. In particular, the importance of incorporating correlation information into the analysis process will be investigated.

11:10 — 11:30

A novel statistical model to identify biomarkers in 2D proteomic gels

Steven Wu
University of Auckland

University of Otago

University of Auckland

University of Auckland

Proteomic technologies are used to identify differentially expressed plasma proteins that may serve as biomarkers to predict disease. In this study, our aim is to use 2D-gel electrophoresis to identify sets of proteins in early pregnancy plasma that are associated with the subsequent development of pre-eclampsia, a severe hypertensive complication of pregnancy. However, due to technical issues, traditional statistical methods lack the power to detect significant changes in protein abundance between women with and without the disease. We have developed a novel statistical model of 2D-gel data that incorporates both the probability that a spot is expressed and the conditional probability of expression intensity. The model also takes account of threshold detection levels. Using this model, we have developed two approaches for identifying spots implicated in differences between women with and without pre-eclampsia. These approaches use either a Likelihood Ratio Test or a Bayesian MCMC procedure to identify significant spots. In this talk, I present our model and discuss the relative merits of the two approaches.

11:30 — 11:50

Incorporating Genotype Uncertainty Into Mark-recapture-type Models For Estimating Abundance Using DNA Samples

Janine Wright
University of Otago

The use of genetic tags (from non-invasive samples such as hair and faeces) to identify individual animals is increasingly common in wildlife studies. Non-invasive genetic sampling has many advantages and huge potential, but while it is possible to generate significant amounts of data from these samples, the biggest challenge in applying the approach is overcoming inherent errors. Genotyping errors arise when poor sample quality, due to an insufficient quantity of DNA, leads to failure of DNA amplification at one or more loci. As a result, heterozygous individuals are scored as homozygotes at those loci because only one allele is detected (termed 'allelic drop-out'). False alleles are also possible. Error rates will be species-specific, and will depend on the source of the samples and the way they have been handled. If errors go undetected and the genotypes are naively used in mark-recapture models, significant overestimates of population size can occur. Using data from the brush-tailed possum (Trichosurus vulpecula) in New Zealand and the European badger (Meles meles), we describe a method based on Bayesian imputation that allows us to model data from samples that include uncertain genotypes.

11:50 — 12:10

Filling in the Blanks - Inferring Genetic Relationships Between Individuals Based on Incomplete Information

Steven Miller
University of Auckland

Grant Harper
Department of Conservation

James Russell
University of Auckland

Hamish MacInnes
University of Auckland

Rachel Fewster
University of Auckland

A common method for inferring individuals’ genetic relationships is to calculate the probability each individual is a member of each of a set of potential populations. This is achieved most simply by building population allele profiles based on individuals sampled from that population, then using the Hardy-Weinberg Equilibrium equations to calculate the probability that individuals’ genotypes could have been drawn from those genetic profiles. The individuals’ sets of probabilities can then be used to characterise groups of genetically similar individuals. However, these relative comparisons are invalid if individuals possess different levels of completeness for the selected genetic traits. It is not always feasible to characterise every individual for every genetic trait selected for the analysis. We present a novel method for inferring an individual’s complete-information probability of belonging to a population when that individual has an incomplete set of genetic information. We apply this method to data from a Rattus sp. post-eradication repopulation scenario from Pearl Island, Stewart Island (Rakiura), New Zealand. In this scenario, the reappearance of rats following eradication needed to be identified as a reinvasion from the surrounding mainland, or a failed eradication on the island itself.
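As a rough illustration of the standard calculation described in this abstract (not the authors' novel method, which is not detailed here), the following minimal Python sketch builds a population allele profile and applies the Hardy-Weinberg proportions to a genotype. The function names, the data layout, and the assumptions that loci are independent and that missing loci are simply skipped are all hypothetical choices made for the example.

```python
from collections import Counter
from math import prod


def allele_frequencies(population):
    """Estimate allele frequencies per locus from sampled individuals.

    `population` is a list of individuals; each individual is a list with one
    entry per locus, either a pair of alleles or None if that locus is missing.
    (This layout is an assumption made for the sketch.)
    """
    n_loci = len(population[0])
    profile = []
    for locus in range(n_loci):
        counts = Counter()
        for individual in population:
            if individual[locus] is not None:
                counts.update(individual[locus])  # counts both alleles
        total = sum(counts.values())
        profile.append({allele: c / total for allele, c in counts.items()})
    return profile


def genotype_probability(genotype, profile):
    """Hardy-Weinberg probability of a multi-locus genotype.

    Homozygote: p^2; heterozygote: 2pq. Loci are treated as independent,
    and missing loci (None) are skipped, which is the naive treatment.
    """
    per_locus = []
    for alleles, freqs in zip(genotype, profile):
        if alleles is None:
            continue
        a, b = alleles
        pa, pb = freqs.get(a, 0.0), freqs.get(b, 0.0)
        per_locus.append(pa * pa if a == b else 2 * pa * pb)
    return prod(per_locus)


population = [[("A", "A"), ("C", "C")], [("A", "B"), ("C", "D")]]
profile = allele_frequencies(population)
complete = genotype_probability([("A", "B"), ("C", "D")], profile)
incomplete = genotype_probability([("A", "B"), None], profile)
```

In this toy example the individual with a missing locus receives a larger probability than the fully typed individual, simply because fewer terms are multiplied together. This is the kind of incomparability between individuals with different levels of completeness that the abstract refers to.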
