Frederick (a.k.a. Erick) A. Matsen
Allan Wilson Postdoctoral Fellow
Biomathematics Research Centre
University of Canterbury
Private Bag 4800
Christchurch
New Zealand
+64 3 364 2987 x7431
Research focus
I develop mathematical techniques and computer algorithms to improve
our understanding of evolution. My current research is motivated by two main
questions: first, how can we (better) reconstruct evolutionary
history from present-day DNA sequences? Second, how do organisms
diversify (e.g. speciate)?
How can we (better) reconstruct evolutionary
history from present-day DNA sequences?
This is a very big and somewhat old question, with hundreds of scientists
working on different aspects. The field which has developed is called
phylogenetics. The general idea is that organisms with similar DNA
sequences are usually more closely related than organisms with
quite different DNA sequences. Making this formal and then running
many computations on a computer leads to a tree diagram showing
interrelationships; this diagram is called a phylogenetic tree.
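As a toy illustration of the idea that similar sequences suggest close
relationship, the sketch below counts mismatches between a few aligned
sequences and reports the closest pair. The sequences, taxon names, and the
helper `hamming` are all hypothetical and invented for this example; real
phylogenetic methods rely on explicit models of sequence evolution rather
than raw mismatch counts.

```python
from itertools import combinations

# Hypothetical aligned sequences, invented for illustration only.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACCTTC",
    "D": "TCGAACCTTG",
}


def hamming(s, t):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t))


# Pairwise mismatch counts for every unordered pair of taxa.
distances = {
    (x, y): hamming(seqs[x], seqs[y])
    for x, y in combinations(sorted(seqs), 2)
}

# The pair with the fewest mismatches is the first candidate for "closest relatives".
closest = min(distances, key=distances.get)
```

A distance-based tree-building method would repeatedly join such closest
pairs; the details of that step vary from method to method and are not
shown here.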
My contributions to this big project are in two areas: phylogenetic
mixtures and theoretical analysis of Bayesian methods.
Mixtures: It is well established from a theoretical perspective that if
sequences evolve under a single (simple) model then a large amount
of sequence data will reconstruct the tree correctly with high
probability. However, it is now known that different parts of a
sequence evolve in different ways; this is formulated statistically
as a phylogenetic mixture model. In contrast to the single-process
case, it is known that data from mixtures of processes does not
uniquely determine a tree. Mike Steel and I recently realized that
even more is true: it is possible to have a mixture of two processes on
one tree such that the resulting data looks exactly like a single
process on a different tree. I'm now interested in whether these
sorts of issues really do pose a problem for phylogenetics researchers. Recent
work with Mossel and Steel partially addresses this question
through a combination of geometric and combinatorial means.
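To make the notion of a mixture concrete, here is a minimal sketch that
computes site pattern probabilities on a four-taxon tree and then averages
two such distributions. It assumes a two-state symmetric substitution model
and arbitrary edge parameters chosen only for illustration; the function
names are hypothetical, and the sketch does not reproduce the results
described above.

```python
from itertools import product


def edge_prob(p_change, parent, child):
    """Probability of seeing `child` below `parent` on an edge that
    changes state with probability `p_change` (two-state symmetric model)."""
    return p_change if parent != child else 1.0 - p_change


def quartet_pattern_probs(p):
    """Site pattern distribution on the quartet tree ((a,b),(c,d)).

    `p` = (pa, pb, pc, pd, p_internal) gives per-edge change probabilities.
    The two internal node states x and y are summed out, with a uniform
    distribution on x.
    """
    pa, pb, pc, pd, p_int = p
    probs = {}
    for a, b, c, d in product((0, 1), repeat=4):
        total = 0.0
        for x, y in product((0, 1), repeat=2):
            total += (0.5
                      * edge_prob(pa, x, a) * edge_prob(pb, x, b)
                      * edge_prob(p_int, x, y)
                      * edge_prob(pc, y, c) * edge_prob(pd, y, d))
        probs[(a, b, c, d)] = total
    return probs


def mixture(weight, probs1, probs2):
    """Weighted average of two site pattern distributions."""
    return {k: weight * probs1[k] + (1.0 - weight) * probs2[k] for k in probs1}


# Two processes on the same tree shape with different (made-up) edge parameters.
single1 = quartet_pattern_probs((0.05, 0.30, 0.05, 0.30, 0.10))
single2 = quartet_pattern_probs((0.30, 0.05, 0.30, 0.05, 0.10))
mixed = mixture(0.5, single1, single2)
```

The identifiability question is then whether a distribution like `mixed`
could also arise from some other tree, which this sketch does not attempt
to answer.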
Bayesian: There are many different ways of building phylogenetic
trees, and one class of such methods is known as Bayesian methods.
One advantage of these methods is that they can give posterior
probabilities, which are (more or less) an estimate of how
correct certain parts of the tree are.
However, it can happen that even if there is no actual evidence
determining how a certain set of species evolved, the methods can
choose one scenario and attach a very high posterior probability to
the story. This problem is called the "star tree paradox," and
Mike Steel and I recently showed analytically that it can persist
even when the methods are given arbitrarily long DNA sequences.
How do organisms diversify?
This is an even older question, with a correspondingly bigger
literature. My focus is on only one approach, which is based on
looking at the "shape" or overall structure of phylogenetic trees.
A quick look at a couple of virus trees shows that different
evolutionary scenarios can lead to different tree shapes.
In order to use tree shape in a scientific fashion, we need ways of
quantifying it. So far I have written about
tree shape in three ways: geometric,
algebraic/combinatorial, and recursive.
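For readers unfamiliar with the idea, one classical way of quantifying
tree shape is the Colless imbalance index, which sums, over internal nodes,
the difference in leaf counts between the two subtrees. The sketch below is
only a standard textbook statistic, not one of the three approaches just
listed; the nested-tuple tree encoding and the function name `colless` are
assumptions made for this example.

```python
def colless(tree):
    """Return (number of leaves, Colless imbalance) for a rooted binary tree.

    A leaf is any non-tuple value; an internal node is a pair (left, right).
    """
    if not isinstance(tree, tuple):
        return 1, 0
    left, right = tree
    n_left, c_left = colless(left)
    n_right, c_right = colless(right)
    # Each internal node contributes the leaf-count difference of its subtrees.
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


# Two hypothetical four-leaf trees with different shapes.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

_, balanced_score = colless(balanced)
_, caterpillar_score = colless(caterpillar)
```

The perfectly balanced tree scores lower than the caterpillar (comb) tree,
so a single number already separates these two shapes.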
The recursive (optimization) approach has been the most productive for applications
to data. I am currently applying this framework with Katherine St.
John to search for evidence of tree reconstruction bias in modern
tree reconstruction algorithms. I am also collaborating with Alexei
Drummond to apply these techniques to test for coalescent model
mis-specification.
Other projects
In the past, John Wakeley and I investigated a class of models
between the lattice model and the island model, and were able to
show that these models converge to the island model as the
number of subpopulations goes to infinity. For this project I
applied some nice theory about random walks on graphs.
I have also worked on the evolution of language with Martin Nowak.
Rather than approach learning theory from the classical angle of an
idealized teacher-learner pair, we investigated a model where the
agents try to find a common language. We found some remarkably
simple individual strategies which led to the population finding a
common language with high probability given some constraints on the
underlying space of languages.
Publications
[PDF]
F. A. Matsen, E. Mossel, and M. Steel.
Mixed-up trees: the structure of phylogenetic mixtures.
arXiv:0705.4328 [q-bio.PE], 2007.
[PDF]
F. A. Matsen and M. Steel.
Phylogenetic mixtures on a single tree can mimic a tree of another
topology.
arXiv:0704.2260 [q-bio.PE], 2007.
[PDF]
M. Steel and F. A. Matsen.
The Bayesian star paradox persists for long finite sequences.
Molecular Biology and Evolution, 24(4):1075–1079, April 2007.
[PDF]
F.A. Matsen.
Optimization over a class of tree shape statistics.
accepted to IEEE/ACM Transactions on Computational Biology and
Bioinformatics, 2006.
[PDF]
F.A. Matsen and S.N. Evans.
Ubiquity of synonymity: almost all large binary trees are not
uniquely identified by their spectra or their immanantal polynomials.
arXiv:q-bio/0512010, 2006.
[PDF]
F.A. Matsen.
A geometric approach to tree shape statistics.
Systematic Biology, 55(4):652–661, 2006.
[PDF]
F.A. Matsen and J. Wakeley.
Convergence to the island-model coalescent process in populations
with restricted migration.
Genetics, 172(1):701–708, January 2006.
[PDF]
F.A. Matsen and M.A. Nowak.
Win-stay, lose-shift in language learning from peers.
PNAS, 101(52):18053–18057, December 2004.
Commentary by K. Sigmund.
Software
simmons
My software to compute tree shape statistics.
alga, etc.
The source code for the genetic algorithm and related software described in
Optimization...
Other interests
Computer programming:
I am completely obsessed with the very fast French functional/imperative
language ocaml, which was first shown to me by my buddy Martin
Willensdorfer. Functional languages are appropriate for
experimenting with combinatorics, and I find writing a nice
double recursion to be almost as satisfying as coming up with a
mathematical proof. I'm also a big fan of perl, and use it daily.
Free software:
I don't run any commercial software on my machine. This isn't a
philosophical viewpoint; it just works better. I run gentoo linux and
the ratpoison window manager. The editor is vim, naturally.
I make extensive use of various free scientific computing packages,
including the GNU scientific library (GSL) with advanced random number
generation, the canonical linear algebra package LAPACK, and the GNU
linear programming kit (GLPK). There are nice ocaml front-ends to all
of these.
The rest of life:
When I'm not geeking out about science
and computers I backcountry ski, climb, whitewater kayak, ride bikes and practice
Ashtanga yoga. I also love to spend time in the backyard of my
parents' house hanging out with the folks and my friends from
Seattle.
Miscellany
Why are there over 2500 hits on Google Scholar for F.A. Matsen?
I'm actually the fourth Frederick A. Matsen in my family, and both my
father, an orthopaedic surgeon, and my grandfather, a theoretical
physicist, are quite prolific. I could have avoided this
name collision by using my nickname, but I'm proud of this
heritage. Apologies.
Where did you get that cool shoulder bag?
It's made by my buddy Eli, who quit his engineering job and
decided to start making messenger bags out of re-used materials.
His company is called Alchemy Goods.