Frederick (a.k.a. Erick) A. Matsen
Allan Wilson Postdoctoral fellow
Biomathematics Research Centre
University of Canterbury
Private Bag 4800
Christchurch
New Zealand
+64 3 364 2987 x7431


Research focus


I develop mathematical techniques and computer algorithms to improve our understanding of evolution. My current research is motivated by two main questions. First, how can we (better) reconstruct evolutionary history from present-day DNA sequences? Second, how do organisms diversify (e.g. speciate)?

How can we (better) reconstruct evolutionary history from present-day DNA sequences?

This is a very big and somewhat old question, with hundreds of scientists working on different aspects. The field that has developed around it is called phylogenetics. The general idea is that organisms with similar DNA sequences are usually more closely related than organisms with quite different DNA sequences. Making this formal and then running many computations on a computer leads to a tree diagram showing interrelationships; this diagram is called a phylogenetic tree. My contributions to this big project are in two areas: phylogenetic mixtures and theoretical analysis of Bayesian methods.
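As a rough illustration of the "similar sequences, closer relatives" idea, the sketch below computes the fraction of differing sites between every pair of aligned sequences. The function names and the three toy sequences are invented for this example; real phylogenetic methods correct such raw distances using a model of sequence evolution, which this sketch does not attempt.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distances(seqs: dict) -> dict:
    """Map each unordered pair of names (sorted) to its raw distance."""
    return {
        (s, t): hamming_fraction(seqs[s], seqs[t])
        for s, t in combinations(sorted(seqs), 2)
    }


# Invented toy alignment (not real data): taxon_a and taxon_b are nearly
# identical, while taxon_c differs from both at several sites.
toy = {
    "taxon_a": "ACGTACGTAC",
    "taxon_b": "ACGTACGTAT",
    "taxon_c": "ACGAACCTTT",
}

if __name__ == "__main__":
    for pair, d in pairwise_distances(toy).items():
        print(pair, d)
```

On this toy input the smallest distance is between taxon_a and taxon_b, so a distance-based method would group those two together before joining taxon_c.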

Mixtures: It is well established from a theoretical perspective that if sequences evolve under a single (simple) model then a large amount of sequence data will reconstruct the tree correctly with high probability. However, it is now known that different parts of a sequence evolve in different ways; this is formulated statistically as a phylogenetic mixture model. In contrast to the single-process case, it is known that data from mixtures of processes does not uniquely determine a tree. Mike Steel and I recently realized that even more is true: it is possible to have a mixture of two processes on one tree such that the resulting data looks exactly like a single process on a different tree. I'm now interested in whether these sorts of issues really do pose a problem for phylogenetics researchers. Recent work with Mossel and Steel partially addresses this question through a combination of geometric and combinatorial means.
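In symbols, the mimicking phenomenon can be sketched as follows. The notation is an assumption made for this illustration and is not taken from the papers: p(T, θ) denotes the distribution of site patterns generated on tree T with parameters θ (for example, branch lengths), and α is the mixing weight.

```latex
% Two-class mixture on a single tree T (notation assumed for illustration)
p_{\mathrm{mix}} \;=\; \alpha \, p(T, \theta_1) \;+\; (1 - \alpha)\, p(T, \theta_2),
\qquad 0 < \alpha < 1 .

% Mimicking: for suitable choices of T, \theta_1, \theta_2 and \alpha,
% there is a tree T' with a different topology and parameters \theta' with
p_{\mathrm{mix}} \;=\; p(T', \theta') .
```

When such an equality holds, no amount of sequence data can distinguish the mixture on T from the single process on T'.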

Bayesian: There are many different ways of building phylogenetic trees, and one class of them is known as Bayesian methods. One advantage of these methods is that they can give posterior probabilities, which are (more or less) an estimate of how correct certain parts of the tree are. However, it can happen that even if there is no actual evidence determining how a certain set of species evolved, the methods can choose one scenario and attach a very high posterior probability to the story. This problem is called the "star tree paradox," and Mike Steel and I recently showed analytically that it can persist even when the methods are given arbitrarily long DNA sequences.

How do organisms diversify?

This is an even older question, with a correspondingly bigger literature. My focus is on only one approach, which is based on looking at the "shape" or overall structure of phylogenetic trees. A quick review of a couple of virus trees shows that different evolutionary scenarios can lead to different tree shapes.

In order to use tree shape in a scientific fashion, we need ways of quantifying it. So far I have written about tree shape in three ways: geometric, algebraic/combinatorial, and recursive. The recursive (optimization) approach has been the most productive for applications to data. I am currently applying this framework with Katherine St. John to search for evidence of tree reconstruction bias in modern tree reconstruction algorithms. I am also collaborating with Alexei Drummond to apply these techniques to test for coalescent model mis-specification.
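To give a flavor of what a recursively defined tree shape statistic looks like, here is a minimal sketch that computes two classical ones, the Colless imbalance and the Sackin index, in a single recursion over a rooted binary tree. These are standard textbook statistics chosen for illustration; they are not claimed to be the statistics optimized in my papers or computed by the software listed below. The nested-tuple tree encoding and the function name are assumptions of this example.

```python
def shape_stats(tree):
    """Return (n_leaves, colless, sackin) for a rooted binary tree.

    Encoding (assumed for this sketch): an internal node is a 2-tuple
    (left, right); anything that is not a tuple is a leaf.
    """
    if not isinstance(tree, tuple):
        return 1, 0, 0  # a single leaf has no imbalance and depth 0
    left, right = tree
    n_left, colless_left, sackin_left = shape_stats(left)
    n_right, colless_right, sackin_right = shape_stats(right)
    n = n_left + n_right
    # Colless: add the difference in leaf counts at this node.
    colless = colless_left + colless_right + abs(n_left - n_right)
    # Sackin (sum of leaf depths): every leaf below this node is one
    # level deeper than it was in its own subtree.
    sackin = sackin_left + sackin_right + n
    return n, colless, sackin


# Two four-leaf shapes: a fully unbalanced "caterpillar" and a balanced tree.
caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))

if __name__ == "__main__":
    print(shape_stats(caterpillar))
    print(shape_stats(balanced))
```

Both statistics are larger for the caterpillar than for the balanced tree, which is the sense in which they quantify imbalance.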

Other projects

In the past, John Wakeley and I investigated a class of models between the lattice model and the island model, and were able to show that these models converge back to the island model as the number of subpopulations goes to infinity. For this project I applied some nice theory about random walks on graphs.

I have also worked on the evolution of language with Martin Nowak. Rather than approach learning theory from the classical angle of an idealized teacher-learner pair, we investigated a model where the agents try to find a common language. We found some remarkably simple individual strategies which led to the population finding a common language with high probability given some constraints on the underlying space of languages.

Publications

[PDF] F. A. Matsen, E. Mossel, and M. Steel. Mixed-up trees: the structure of phylogenetic mixtures. arXiv:0705.4328 [q-bio.PE], 2007.

[PDF] F. A. Matsen and M. Steel. Phylogenetic mixtures on a single tree can mimic a tree of another topology. arXiv:0704.2260 [q-bio.PE], 2007.

[PDF] M. Steel and F. A. Matsen. The Bayesian star paradox persists for long finite sequences. Molecular Biology and Evolution, 24(4):1075--1079, April 2007.

[PDF] F.A. Matsen. Optimization over a class of tree shape statistics. accepted to IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2006.

[PDF] F.A. Matsen and S.N. Evans. Ubiquity of synonymity: almost all large binary trees are not uniquely identified by their spectra or their immanantal polynomials. arXiv:q-bio/0512010, 2006.

[PDF] F.A. Matsen. A geometric approach to tree shape statistics. Systematic Biology, 55(4):652--661, 2006.

[PDF] F.A. Matsen and J. Wakeley. Convergence to the island-model coalescent process in populations with restricted migration. Genetics, 172(1):701--708, January 2006.

[PDF] F.A. Matsen and M.A. Nowak. Win-stay, lose-shift in language learning from peers. PNAS, 101(52):18053--18057, December 2004. Commentary by K. Sigmund.


Software

simmons     My software to compute tree shape statistics.

alga, etc.     The source code for the genetic algorithm and related software described in Optimization...


Other interests

Computer programming:
I am completely obsessed with the very fast French functional/imperative language ocaml which was first shown to me by my buddy Martin Willensdorfer. Functional languages are appropriate for experimenting with combinatorics, and I find writing a nice double recursion to be almost as satisfying as coming up with a mathematical proof. I'm also a big fan of perl, and use it daily.

Free software:
I don't run any commercial software on my machine. This isn't a philosophical viewpoint; it just works better. I run gentoo linux and the ratpoison window manager. The editor is vim, naturally. I make extensive use of various free scientific computing packages, including the GNU scientific library GSL with advanced random number generation, the canonical linear algebra package LAPACK, and the GNU linear programming kit GLPK. There are nice ocaml front-ends to all of these.

The rest of life:
When I'm not geeking out about science and computers I backcountry ski, climb, whitewater kayak, ride bikes and practice Ashtanga yoga. I also love to spend time in the backyard of my parents' house hanging out with the folks and my friends from Seattle.


Miscellany

Why are there over 2500 hits on Google Scholar for F.A. Matsen?
I'm actually the fourth Frederick A. Matsen in my family, and both my father, an orthopaedic surgeon, and my grandfather, a theoretical physicist, are quite prolific. I could have avoided this name collision by using my nickname, but I'm proud of this heritage. Apologies.

Where did you get that cool shoulder bag?
It's made by my buddy Eli, who quit his engineering job and decided to start making messenger bags out of re-used materials. His company is called Alchemy Goods.


Last modified June 5, 2007.
This document was translated from LaTeX by HEVEA.