anagallis: a program for parsimony analysis of character hierarchies


In parsimony analysis, the problem of inapplicables (see Maddison W.P. 1993. Syst. Biol. 42, 576-581) can be overcome by maximizing the amount of similarity that can be interpreted as homology, an idea that I first discussed in this 2002 talk.

Maximization of homology also provides the key to extending parsimony to the analysis of unaligned sequence data, as discussed in this 2004 talk and in this 2005 paper. That paper shows that in tree alignment programs such as POY, cost regime 3221 (gap opening cost three, transition and transversion costs two, and gap extension cost one) provides an optimal approximation for the cost set that maximizes homology when all instances of homology are equally weighted. A discussion of differential weighting of homologies can be found in this 2015 paper (section on approximations and section on sensitivity analysis).
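To make the 3221 cost regime concrete, here is a minimal Python sketch that scores a given pairwise alignment under those costs. It is only an illustration, not code from POY or anagallis: the function name alignment_cost is hypothetical, and the sketch assumes that the first position of a gap is charged the opening cost and every further position of the same gap the extension cost. The text above does not say which gap convention applies, so treat that as an assumption.

```python
def alignment_cost(row_a, row_b, gap_open=3, substitution=2, gap_ext=1):
    """Score a pairwise alignment under an affine cost regime such as 3221.

    Assumption (not stated in the source): the first position of a gap
    costs gap_open and each further position of that gap costs gap_ext.
    Transitions and transversions share one substitution cost.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")

    cost = 0
    open_gap_row = None  # row (0 or 1) that holds the gap currently being extended
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            continue  # all-gap column carries no information
        if a == "-" or b == "-":
            row = 0 if a == "-" else 1
            cost += gap_ext if open_gap_row == row else gap_open
            open_gap_row = row
        else:
            if a != b:
                cost += substitution
            open_gap_row = None
    return cost


if __name__ == "__main__":
    print(alignment_cost("AC--T", "ACGGT"))  # one gap of length two
    print(alignment_cost("ACGT", "AGGT"))    # one substitution
```

Under the stated assumption, a gap of length two costs 3 + 1 = 4 and a single substitution costs 2.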

Inapplicables as they arise in the classic approach are a special case of inapplicables as they arise in sequence data. This special case can be tackled with algorithms that are computationally less complex. A discussion can be found in the above 2015 paper (section on inapplicables). Anagallis is a computer program that provides tree searches with such algorithms.

I announced the imminent release of anagallis at WHS XXXII (Rostock, 3-7 August 2013) but afterwards decided to postpone release until the program could guarantee optimality of tree scores under a much wider range of conditions than initially announced (see the current program documentation for some details). As it turned out, that took far longer than I initially thought. My Rostock presentation is available at ResearchGate here. The related presentation from my 2012 Riverside WHS XXXI talk can be found here.

The first public version of anagallis (version 0.998 beta) was released on 16 April 2018. Version 1.01, the current version and the first to include a macOS executable, was released on 11 December 2018. A dump of its built-in documentation can be found here (it is also available as this OSF preprint). It includes a discussion of the theory behind the program, of the high-level structure of its main optimization algorithm, and of the scope and limits of that algorithm. A gzipped tarfile that contains the macOS executable, a statically linked 32-bit Linux executable, and several files to get started with the program can be found here. Basic instructions, including how to get started once the tarfile is downloaded, are here.

In October 2017, Brazeau, Guillerme and Smith posted this interesting paper on morphological analysis with inapplicable data on bioRxiv. The main difference with my approach seems to be that they independently optimize single-column characters with inapplicables rather than character hierarchies as a whole. This may give good results under a wide range of conditions, but in general the optimization of a character hierarchy on a tree cannot be reduced to a series of independent single-character optimizations on that tree. Doing so may yield a fast approximation for the score of a character hierarchy on a tree, but it can miss optimal state reconstructions, miss the optimal score, and ultimately identify non-optimal trees as optimal during tree search. The first of these three issues can be illustrated using the example of their Fig. 3. It is discussed in more detail here (also available as this OSF preprint).
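As a point of reference for what "independent single-column optimization" means, the following minimal Python sketch scores a small matrix column by column with the textbook Fitch algorithm, recoding inapplicables ("-") as missing data. It is neither the algorithm of Brazeau, Guillerme and Smith nor the one implemented in anagallis; the function names, the toy tree and the toy matrix are all hypothetical. The point is only that each column is scored without any knowledge of the column it depends on.

```python
MISSING = {"-", "?"}  # inapplicable and unknown, both recoded as missing here


def fitch_steps(tree, states):
    """Count Fitch steps for one character on a rooted binary tree.

    tree: nested 2-tuples with taxon names (str) at the tips.
    states: dict mapping each taxon name to a set of possible states.
    """
    steps = 0

    def downpass(node):
        nonlocal steps
        if isinstance(node, str):
            return states[node]
        left, right = (downpass(child) for child in node)
        shared = left & right
        if shared:
            return shared
        steps += 1  # no shared state: one change is required here
        return left | right

    downpass(tree)
    return steps


def independent_score(tree, matrix):
    """Sum of per-column Fitch scores, each column treated on its own."""
    n_columns = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_columns):
        observed = {row[i] for row in matrix.values()} - MISSING
        if not observed:
            continue  # column without any observed state adds no steps
        states = {
            taxon: ({row[i]} if row[i] not in MISSING else set(observed))
            for taxon, row in matrix.items()
        }
        total += fitch_steps(tree, states)
    return total


if __name__ == "__main__":
    # Column 0: tail absent (0) or present (1); column 1: tail colour,
    # inapplicable ("-") when the tail is absent.
    toy_tree = (("A", "B"), ("C", "D"))
    toy_matrix = {"A": "10", "B": "11", "C": "0-", "D": "0-"}
    print(independent_score(toy_tree, toy_matrix))
```

In this sketch the colour column is scored as if taxa C and D could have any tail colour, which is exactly the kind of column-wise independence that a whole-hierarchy optimization avoids.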