Seminars
The Mathematics and Statistics Thursday Seminar Series is held at 3:00pm on Thursdays during term time, in Room 446, Erskine Building.
Seminars may be of three formats:
- specialised, which involves a 50-minute presentation with questions, or
- general, which is geared towards a general mathematically literate audience and involves a 30-minute presentation followed by 20 minutes of questions and discussion with afternoon tea, or
- interdisciplinary colloquium, which is geared towards a general scientifically literate audience and involves a 30-minute presentation followed by 20 minutes of questions and discussion with afternoon tea.
Also see the concurrent primer series: primers are short expository introductions to a field of research by someone who is active in the area.
Additional research seminars may occur on other days as arranged.
For all enquiries, please contact
Dr Raazesh Sainudiin
r.sainudiin@math.canterbury.ac.nz
Forthcoming Seminars
Greg Ewing (University of Vienna)
Inference under the Coalescent
Wednesday, 23 May 2012, 1.00pm
Room 441, Erskine Building
Abstract. I introduce the Wright-Fisher population model and its extension, the Kingman Coalescent. These population models take into account the non-independence of samples from populations and permit modeling of complex population histories. Using DNA or RNA sequence data, we infer population parameters such as population sizes, split times, and migration rates under these models with both traditional MHMCMC and likelihood-free ABC. Limitations and future work are also discussed.
Robin Tiffen (Teaching Fellow, Department of Mathematics and Statistics)
NCEA: what on earth does it mean?
Thursday, 24 May 2012, 3.00pm
Room 446, Erskine Building
Abstract. Our national secondary qualification was introduced 10 years ago and is currently undergoing revision. Initially, mathematics was largely externally assessed. Course design was flexible with a large number of achievement and unit standards available from which to select. The current realignment of standards has removed all the unit standards from curriculum based courses and reduced to three the number of externally assessed achievement standards. The remainder of the assessment, sometimes more than half, is internally assessed by schools. In this seminar, I will look at the latest incarnation of NCEA and how much (or how little) students arriving at university from school may know.
Patrick W. Saart (University of Canterbury)
Nonparametric Specification Test in Semiparametric Autoregressive Conditional Duration Model
Wednesday, 30 May 2012, 1.00pm
Room 441, Erskine Building
Abstract. A crucially important advantage of the semiparametric regression approach to the nonlinear autoregressive conditional duration (ACD) model developed in Saart et al. (2011), i.e. the so-called Semiparametric ACD (SEMI-ACD) model, is that its estimation method does not require a parametric assumption on the conditional distribution of the standardized duration process or, therefore, on the shape of the baseline hazard function. The research in this paper complements that of Saart et al. (2011) by introducing a nonparametric procedure to test the parametric density function of the ACD error through the use of the SEMI-ACD based residual. The hypothetical structure of the test is useful not only for establishing a better parametric ACD model, but also for the specification testing of a number of financial market microstructure hypotheses, especially those related to information asymmetry in finance. The testing procedure introduced in this paper differs in many ways from those discussed in the existing literature, for example Ait-Sahalia (1996), Gao and King (2004) and Fernandes and Grammig (2005). We show theoretically and experimentally the statistical validity of our testing procedure, while demonstrating its usefulness and practicality using datasets from the New York and Australian stock exchanges.
Erkan Buzbas (Stanford University)
Approximate Bayesian Computation when Simulating Data is Difficult
Thursday, 31 May 2012, 1.00pm
Room 445, Erskine Building
Abstract. Complex stochastic systems result in model likelihoods that are difficult to evaluate, defying standard computational approaches to inference. Approximate Bayesian Computation (ABC) methods perform inference without evaluating the likelihoods and thus have been useful for testing complex models in ecology, evolution, and genetics. However, central to the success of ABC methods is inexpensive simulation of data sets under the model of interest, which is not always computationally feasible. Modelling the system of interest at a level of complexity at which statistical inference is computationally feasible is therefore a common practice, which often requires an upfront reduction in the complexity of the scientific model originally hypothesized. In this talk, I present a generic family of approximate Bayesian methods (AABC) which bypasses an upfront reduction in the complexity of the model. AABC methods deliver inference under models for which standard ABC methods are computationally infeasible, by relaxing the requirement that all simulated data sets used to assess the likelihoods should be generated under the complex model of interest. AABC methods treat a limited number of simulated data sets generated under the model of interest as fixed background information and use an auxiliary model to efficiently simulate the large number of data sets required to perform ABC. My approach allows data sets that are otherwise difficult to simulate to be simulated approximately from the model of interest, thereby extending the class of models to which ABC methods are applicable. I show that the posterior distribution targeted by AABC methods converges to the "true" value of the parameter generating the data as the number of simulated data sets and the sample size of the observed data set simultaneously increase.
I demonstrate the use of AABC methods by testing the common origin hypothesis of Central African Pygmy populations, a part of our umbrella project METHIS aiming to reconstruct the recent history of admixed populations.
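For readers unfamiliar with ABC, the basic likelihood-free idea described in the abstract can be illustrated with plain rejection ABC. This is only a minimal sketch of standard rejection ABC on a toy problem, not the speaker's AABC method; the function name abc_rejection, the toy normal model, the uniform prior and the tolerance are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, tolerance, n_draws, rng):
    """Plain rejection ABC: keep prior draws whose simulated summary statistic
    lands within `tolerance` of the observed summary statistic."""
    target = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                        # draw a parameter from the prior
        simulated = simulator(theta, len(observed), rng)  # simulate a data set under theta
        if abs(summary(simulated) - target) <= tolerance:
            accepted.append(theta)                        # no likelihood is ever evaluated
    return accepted


# Toy problem (hypothetical): infer the mean of a normal distribution with known sd 1.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
posterior_sample = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.fmean,
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)
```

The cost of the loop is dominated by the simulator call, which is why the abstract stresses that ABC depends on inexpensive simulation under the model of interest.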
Previous Seminars
Jeanette McLeod (University of Canterbury)
Graph Connectivity in the Streaming Model
Thursday, 17 May 2012, 3.00pm
Room 446, Erskine Building
Abstract. Over the past decade there has been a significant amount of interest in the study of massive graphs whose edge sets are too large to be stored in memory. This has given rise to the streaming model of computation where algorithms are restricted to a single pass over the edge set and have significantly less memory than would be required to store the entire graph. Determining the types of graph problems that can be solved efficiently in this model is difficult, and in fact, for many graph properties, the restrictions of the model make it impossible to determine whether a given graph has the property (Feigenbaum et al., 2005). In this talk, three single-pass streaming algorithms based on results of Nagamochi and Ibaraki (1992), and Zelke (2007) will be presented. The first is an algorithm for finding the block-cutpoint graph of a simple input graph G = (V,E) presented as a stream of edges. It has worst case complexity O(|E|). The second is an algorithm for finding all 2-(vertex) separators of a 2-connected simple graph G = (V,E), and has worst case complexity O(|V|^2). Finally we give an algorithm which will find all 2-edge cuts of a 2-connected graph G = (V,E) in time O(|V|^2).
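As a simple illustration of the streaming model itself, the sketch below counts connected components in one pass over an edge stream while storing only one integer per vertex. It is a textbook union-find example, not one of the three algorithms presented in the talk, and the function name count_components is hypothetical.

```python
def count_components(num_vertices, edge_stream):
    """Count connected components in a single pass over `edge_stream`.

    Only O(|V|) memory is used: the edges themselves are never stored.
    Vertices are assumed to be labelled 0 .. num_vertices - 1.
    """
    parent = list(range(num_vertices))

    def find(v):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = num_vertices
    for u, w in edge_stream:          # each edge is seen exactly once
        root_u, root_w = find(u), find(w)
        if root_u != root_w:          # the edge joins two components
            parent[root_u] = root_w
            components -= 1
    return components


# The stream can be any iterable, e.g. a generator reading edges from disk.
n_components = count_components(5, iter([(0, 1), (1, 2), (3, 4)]))
```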
Deidre Wall (National University of Ireland in Galway)
Graphical Comparisons of Survivor Functions and an Interactive Surrogate Plot for RPART trees
Wednesday, 16 May 2012, 3.00pm
Room 235, Erskine Building
Abstract. Classification and Regression Trees (CART) are a simple non-parametric regression approach. The main feature of CART is that the data are recursively partitioned into groups. At any given node in a CART, the best split s is chosen and the data are split into subsets using this criterion. Next, the best splits for each of these subsets are found and the data are partitioned again. This continues until some stopping criterion is reached. A surrogate split is a split which most accurately predicts the action of s, the best split at a node. Surrogates are useful for identifying variables that do not appear in the tree but that, if the first split variable is removed from the model, may appear and give a similar tree structure. Here we will use an interactive surrogate plot for RPART trees to identify these surrogate splits and to plot the tree using these surrogates, with clinical and pathological variables used to predict Oncotype DX. Oncotype DX is a test that analyses 16 genes in patients with breast cancer and assigns each patient an Oncotype DX Recurrence Score, which is an estimate of their likelihood of developing a breast cancer recurrence within 10 years.
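The notions of a best split and a surrogate split can be sketched in a few lines. The code below is an illustration only: it assumes Gini impurity as the splitting criterion and measures a surrogate simply by the fraction of observations it sends the same way as the primary split. It is not the RPART implementation, and the names gini, best_split and surrogate_agreement, as well as the toy data, are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best split `value <= threshold`
    on one numeric variable, or (None, impurity) if no split improves on the node."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, gini(labels))
    for k in range(1, n):
        low, high = pairs[k - 1][0], pairs[k][0]
        if low == high:
            continue  # cannot split between equal values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((low + high) / 2, impurity)
    return best


def surrogate_agreement(primary_goes_left, values, threshold):
    """Fraction of observations that the candidate split `value <= threshold`
    sends in the same direction as the primary split."""
    same = sum((v <= threshold) == left for v, left in zip(values, primary_goes_left))
    return same / len(values)


x = [1, 2, 3, 10, 11, 12]           # primary split variable
y = ["a", "a", "a", "b", "b", "b"]  # class labels
z = [5, 4, 8, 20, 30, 7]            # candidate surrogate variable
threshold, impurity = best_split(x, y)
goes_left = [v <= threshold for v in x]
agreement = surrogate_agreement(goes_left, z, 6.5)
```

A variable whose best candidate split has high agreement with the primary split is one that could stand in for the primary variable if that variable were missing or removed.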
Frank Lad (University of Canterbury)
Completing the Logarithmic Scoring Rule for Assessing Probability Distributions
Thursday, 10 May 2012, 3.00pm
Room 446, Erskine Building
Abstract. We propose and motivate an expanded version of the logarithmic score for forecasting distributions, termed the Total Log score. The expectation of the Total Log score equals the Negentropy plus the Negextropy of the distribution, and also equals the sum of the Negentropies (equivalently, the Negextropies) for the events constituting the partition generated by the quantity in question. We examine both discrete and continuous forms of the scoring rule, and we discuss issues of scaling for scoring assessments. The Total Log score analysis suggests the dual tracking of the quadratic score along with the usual log score when assessing the qualities of probability distributions. An application to the sequential scoring of forecast distributions for the daily rate of stock returns displays the usefulness of the proposal. The outlook provided by our analysis suggests a general theory of complementary distributions, begging for discussion and further analysis.
(Paper written with Giuseppe Sanfilippo and Gianna Agro, University of Palermo)
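The identity stated in the abstract can be checked numerically for a discrete distribution. The sketch below assumes that the Total Log score awarded to a probability vector p when outcome i occurs is log p_i plus the sum of log(1 - p_j) over the other outcomes j; this form is an assumption made for illustration and is not quoted from the paper. Under that assumption, the expected score equals the negentropy plus the negextropy, and also the sum of the negentropies of the individual events.

```python
import math


def total_log_score(p, i):
    """Assumed form of the Total Log score when outcome i occurs:
    log p_i + sum over j != i of log(1 - p_j)."""
    return math.log(p[i]) + sum(math.log(1.0 - q) for j, q in enumerate(p) if j != i)


def negentropy(p):
    """Sum of p_i log p_i (the negative of the Shannon entropy)."""
    return sum(q * math.log(q) for q in p)


def negextropy(p):
    """Sum of (1 - p_i) log(1 - p_i) (the negative of the extropy)."""
    return sum((1.0 - q) * math.log(1.0 - q) for q in p)


def expected_total_log_score(p):
    """Expectation of the score when the outcome is distributed according to p."""
    return sum(q * total_log_score(p, i) for i, q in enumerate(p))


def sum_of_event_negentropies(p):
    """Sum over events of the negentropy of the two-point distribution (p_i, 1 - p_i)."""
    return sum(negentropy([q, 1.0 - q]) for q in p)


p = [0.2, 0.3, 0.5]  # every probability must lie strictly between 0 and 1
expected = expected_total_log_score(p)
```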
Mark Hickman (University of Canterbury)
Euclidean Signature Curves
Thursday, 3 May 2012, 3.00pm
Room 446, Erskine Building
Abstract. In 1998, Calabi et al. suggested the use of signature curves in object recognition problems. In that paper it was claimed that two curves were congruent under a group G if and only if their G-signature curves were identical. For the next decade the research was mainly focused on finding a numerically stable method to compute the signature curve. However in 2009 Musso and Nicolodi produced examples of non-congruent curves that nonetheless had identical Euclidean signature curves.
In this talk, I will simplify the Musso and Nicolodi construction to produce two non-congruent simple closed curves of the same length that have identical signature curves. I will then give a corrected version of the fundamental “theorem” of the 1998 paper.
Rebecca Killick (University of Lancaster)
Efficient Detection of Multiple Changepoints with an Oceanographic Time Series
Please note the room and time change.
Thursday, 26 April 2012, 2.00pm (changed from 3.00pm)
Room 101 (changed from Room 446), Erskine Building
Abstract. We consider the problem of detecting multiple changepoints in large oceanographic data sets. In this setting the amount of data being collected is continually increasing and consequently the number of changepoints will also increase with time. An efficient and accurate analysis of such data is of considerable interest to those working in the energy sector as understanding the characteristics of the ocean environment is central to reliable design and operation of marine and coastal structures. Detecting the presence of changepoints in oceanographic time-series is of particular importance, since statistical and engineering modelling of the ocean environment, structural loading and response typically assumes stationarity of the environment (in time). Drawing on recent work on efficient search methods by Killick et al. (2011), we compare and contrast the effect of different approaches to this data, focusing in particular on computational and statistical aspects. The talk will conclude by highlighting the importance of such computationally efficient methods in an oceanographic setting.
Professor Geoffrey Grimmett (University of Cambridge)
Probability, the Science of Uncertainty
Forder Lecture, 2012
Thursday, 19 April 2012, 5.30pm
Room 031, Erskine Building
Abstract.
“Always be a little improbable” [Wilde]
Probability theory is a mature area of mathematics that was developed largely in France using the language of gambling. Why did we bother?
The role of modern probability will be discussed and illustrated with many examples from “real life”, including gambling, parenthood, and the sinking of the Titanic.
“This branch of mathematics is the only one, I believe, in which good writers frequently get results which are entirely erroneous.” [Peirce]
Iain Raeburn (University of Otago)
Representations of Semigroups and Equilibrium States
Thursday, 29 March 2012, 3.00pm
Room 446, Erskine Building
Abstract. By an isometry we mean a linear operator on Hilbert space which preserves the norm. The product (composition) of two isometries is an isometry, and one can consider representations of semigroups by isometries. This turns out to be a very fruitful activity, one which has attracted the attention of many mathematicians over the past 50 years.
In this talk we will discuss this program, focusing on a particularly insightful family of examples introduced by Nica (1993), and a semigroup built from the natural numbers which fits Nica's mould in a nontrivial way. This semigroup generates an operator-algebraic dynamical system which is proving to be an inspiring model for the study of equilibrium states. We will explain what this means, and describe some recent joint work with Marcelo Laca on the subject.
Frank D'Amico (University of Pau & Pays Adour, France)
Detection of Significant Changes in Short Time Series: Applied Issues in Ecology
Wednesday, 28 March 2012, 3.00pm
Room 235, Erskine Building
Abstract. In ecological and environmental studies, sudden and significant changes in temporal patterns often happen on short time scales. Some of these changes are naturally driven and predictable whilst others are unpredictable. They can require change-point analyses at daily, monthly or annual scales, with the added difficulty that the observations are not necessarily numerous and that decisions must be made very quickly. This presents statisticians with a paradoxical methodological difficulty. Existing methods have been developed for long time series of statistical indices (including hydrology and climatology applications). These methods need to be adapted to (i) study phenomena of short duration, i.e. small sample sizes, through resampling techniques, and (ii) take into account the dependence between successive observations induced by the fact that they are time series.
The aim of this seminar is not to detail the mathematical framework but rather to focus on applied issues where it is crucial to determine potential dates of any sudden change of abundance and to link them to external phenomena where a significant decrease or increase occurs (such as accidental pollution, natural disease outbreaks, annual behavioural routines, etc.).
Professor Thomas Lumley (University of Auckland)
Design and Analysis Issues in a Two-phase DNA Resequencing Study
Friday, 23 March 2012, 3.00pm
Room 101, Erskine Building
Abstract. The CHARGE Consortium is a group of cohorts that has conducted extensive genetic association studies based on genome-wide SNP-chip data. We are in the process of conducting a resequencing study to follow up on the associations and attempt to find the functional DNA variants responsible for them.
I will talk about the subsampling design, issues of efficiency and robustness in estimation and testing, and the impact on analysis of being unable to share individual-level data between cohorts.
Rick Beatson (University of Canterbury)
Approximation Theory: Analysis, Algorithms and Applications
Thursday, 22 March 2012, 3.00pm
Room 446, Erskine Building
Abstract. Approximation Theory can be viewed as the study of functions and operators for approximation. Often this is a mix of analysis of the tools (the function spaces), algorithms (what are the most efficient and stable algorithms for using the tools) and applications. The subject draws from many areas of mathematics, including classical analysis, harmonic analysis, special functions, numerical linear algebra, et cetera. Typically results are proven analytically but are best understood and visualised geometrically.
I will illustrate the analysis, algorithms, applications circle in the particular instance of fast methods for computing with radial basis functions. The talk will include lots of pictures and animations, and some applications.
In Kang (University of Canterbury)
Wavelets, ICA and Statistical Parametric Mapping: with Applications to Agitation-Sedation Modelling, Detecting Change Points and to Neuroinformatics
Monday, 19 March 2012, 9.00am
Room 101, Erskine Building
PhD seminar
Abstract. The wavelet methods developed, advocated and used in this thesis are primarily based on the discrete wavelet transform (DWT), wavelet thresholding and density estimation via wavelet smoothing. First, a suite of wavelet techniques based on the DWT is advocated and applied successfully to assess whether an ICU patient's simulated agitation-sedation (A-S) status reflects their true dynamic A-S profile. The use of quantitative modelling to enhance understanding of the A-S system and the provision of an A-S simulation platform are key tools in this area of patient critical care. Secondly, novel wavelet density metrics are developed, a wavelet time coverage index (WTCI) and a wavelet probability band (WPB), based on a Bayesian density estimation. This led to the development of two numeric metrics, the average normalized wavelet density and the relative average normalized wavelet density; both are shown to be in close agreement with our DWT and earlier metrics. The DWT and WPB approaches also yield excellent visual assessment tools and are generalisable to any study involving bivariate time series of a large number of units (patients, households, etc.) and of significant length. Wavelet thresholding and independent component analysis (ICA) are tested as denoising methods, and applied to brain image data, as part of the neuro-informatics study of Turner et al. (2003). ICA methods are then implemented for denoising all the cerebral function data, on a voxel-by-voxel basis. This is performed as a pre-processing step to the creation of statistical parametric maps (SPMs), used to model brain function with respect to personality as non-linear models. The results derived from our novel SPM-ICA approach support the theory of a biological basis for personality and report more de/activation clusters in the brain, as related to specific personality traits, than Turner et al. (2003).
Our work gives credence to a growing body of thought on the need for non-linear modelling in psychometric research (Cloninger, 2008). Our work also has the potential to increase momentum for patient-specific drugs for depressives. Lastly, we develop a DWT-based methodology for change point analysis that uses modified, maximal overlap DWT (MODWT) coefficients to link to a novel shifting DWT (SDWT) methodology, a combination of the DWT and MODWT for the change point detection problem, which is shown to provide an accurate and computationally efficient change point location method. SDWT can be generalized to find multiple change points, by way of a binary segmentation procedure and iteration.
Martina Gallenberger (Institute of Biomathematics and Biometry, Helmholtz Zentrum München, Germany)
Parameter Estimation for ARX-Hammerstein Models based on Matrix-valued Kernels
Thursday, 15 March 2012, 3.00pm
Room 446, Erskine Building
Abstract. Hammerstein models are a special class of nonlinear models given as a combination of an unknown nonlinearity and a linear dynamical system. Choosing the linear system to have an autoregressive structure with external inputs (ARX) results in ARX-Hammerstein models. Usually the nonlinearity is unknown and therefore identification of these models consists of estimating the nonlinearity and the ARX system's parameters.
This talk presents an estimation method for ARX-Hammerstein models using least squares support vector machines (LS-SVM) to model the nonlinearity. We consider multiple input/multiple output models which leads to approximation of vector-valued functions based on matrix-valued kernels. We want to apply this method to parameter estimation in a nonlinear ODE model describing the glucose-insulin regulatory system. Recent experimental results indicate the relevance of the beta-cell cycle for the development of diabetes mellitus. Therefore, we developed a mathematical model that connects the dynamics of glucose and insulin concentration with the beta-cell cycle.
Vince Bidwell (Private consultant, formerly at Lincoln Ventures Ltd)
The Eigenstructure Representation of Groundwater Dynamics, as a Precursor for Aquifer Management
Thursday, 8 March 2012, 3.00pm
Room 446, Erskine Building
Abstract. Abstraction of groundwater for irrigation has environmental effects on surface waters connected to the aquifers. There is a requirement by the regulatory authorities for design of robust rules for management of the groundwater resource. Groundwater is a distributed system of stored water that responds dynamically to climatically-driven recharge and pumped abstraction. This system can be mathematically described by the eigenvalues and eigenfunctions of the groundwater flow equation. The equivalent continuous-time, discrete-space, numerical groundwater models can also be represented in modal (eigenstructure) state-space format. This format forms the basis for design of system control. A simplification for practical control design is achieved by application of analytical solutions for the eigenstructure of groundwater dynamics for simple aquifers. These solutions enable spreadsheet-based computation and presentation of control options for public debate.
Jeroen Schillewaert (Free University of Brussels)
A Geometric Approach to the Freudenthal-Tits Magic Square
Thursday, 1 March 2012, 3.00pm
Room 446, Erskine Building
Abstract. Incidence geometries are closely related to division rings and the validity of certain geometric statements can determine the underlying algebraic structure of the objects.
The main goals of the project are to give a uniform axiomatic description of the geometries related to exceptional Lie algebras occurring in the Freudenthal-Tits magic square and to obtain a purely geometric characterization of the exceptional geometries E6, E7 and a construction of E8.
(Joint work with H. Van Maldeghem)
Ian Wanless (Monash University)
Latin Squares
Thursday, 23 February 2012, 3.00pm
Room 446, Erskine Building
Abstract. A Latin Square is a matrix in which each row and column is a permutation of the same set of symbols. Examples include completed sudoku puzzles, but also the Cayley table of any finite group.
In this seminar I will look at some applications of Latin Squares and explore several research questions that have entertained me over the years, including such basic questions as “How many Latin Squares are there?”
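The definition above, and the remark that the Cayley table of a finite group is a Latin square, can be checked with a short sketch. The function names is_latin_square and cyclic_cayley_table are hypothetical, and the cyclic group of order n under addition modulo n is used only as a convenient example of a finite group.

```python
def is_latin_square(square):
    """Return True if every row and every column of the n x n array `square`
    is a permutation of the same set of n symbols."""
    n = len(square)
    if n == 0 or any(len(row) != n for row in square):
        return False
    symbols = set(square[0])
    if len(symbols) != n:
        return False  # the first row repeats a symbol
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def cyclic_cayley_table(n):
    """Cayley table of the cyclic group of order n (addition modulo n)."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]


table = cyclic_cayley_table(5)
table_is_latin = is_latin_square(table)
```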
Tony Dale (NZi3, University of Canterbury)
Massively Parallel Computing on CPU Supercomputers
Friday, 10 February 2012, 3.00pm
Room 446, Erskine Building
Abstract. GPUs are not the only platform available for stream processing and the SIMD model. Massive parallelization on hundreds of cores using OpenMP, OpenCL or other libraries and compilers is possible on CPU-type supercomputers.
In this presentation, the BlueFern technical team will describe options for many-core computing on the BlueFern systems:
- what is possible
- getting access (it's free!)
- Q + A
Anna MacDonald (University of Canterbury)
“Black-Box” Solution for Threshold Estimation in Extreme Value Modelling
Thursday, 19 January 2012, 11:00am
Room 446, Erskine Building
Abstract. A plethora of recent articles have proposed various extreme value mixture models for threshold estimation and quantifying the corresponding uncertainty. These mixture models typically treat the threshold as a parameter, so it can be objectively estimated using standard inference tools, avoiding the usual graphical diagnostics which require expert (subjective) judgment. These mixture models are typically easy to automate for application to multiple datasets, or in forecasting situations, for which various ad hoc adaptations have had to be made in the past to overcome the threshold estimation problem.
This talk will outline one particularly flexible mixture model which splices together the usual extreme value model for the upper tail behaviour, with the threshold as a parameter, and the “bulk” of the distribution below the threshold captured by a non-parametric kernel density estimator. This representation avoids the need to specify a priori a particular parametric model for the bulk distribution, and only really requires the trivial assumption of a smooth density, which is realistic in most applications. Unlike other mixture models in the literature, this mixture model setup also allows for extension to the non-stationary case and is flexible enough to cope with a number of characteristics that the data may hold. Inference for all the parameters, including threshold and kernel bandwidth, is carried out in a Bayesian paradigm, potentially allowing sources of expert information to be included which can help with the inherent sparsity of extremal sample information.