In phylogenetic analysis we often face the problem that several subclade topologies are known or easily inferred and well supported by bootstrap analysis, whereas basal branching patterns can be neither unambiguously estimated nor well supported by the usual methods, such as maximum parsimony [1], neighbor-joining [2], or maximum likelihood [3]. The ProfDist software implements the profile neighbor-joining method (PNJ) as described in [4]. ProfDist inherits the accuracy and robustness of profiles and the time efficiency of neighbor-joining. The software represents each subclade by a sequence profile and estimates evolutionary distances between profiles to obtain a matrix of distances between subclades. Subclades can be defined automatically, semiautomatically, or manually before a variant of the neighbor-joining algorithm [5] is applied to reconstruct a phylogenetic tree. The robustness of the profile-based tree estimate is validated by a variant of the bootstrap procedure [6], [7]. The resulting trees, or a consensus tree including edge lengths, can be visualized with standard tree viewers such as ATV [8], TREEVIEW [9], NJplot [10], or hyperbolic-tree [11]. The main feature of ProfDist is its efficient implementation of PNJ. ProfDist also combines this new algorithm with standard phylogenetic methods, each of which can be run step by step or independently using the buttons at the top of the ProfDist main window. To improve usability, several file formats are supported, e.g., NEWICK for trees and FASTA or EMBL for sequences. In the preferences window, the essential parameters of PNJ are shown and can be adjusted: (1) the number of bootstrap replicates, (2) the distance estimation method, (3) a user-defined substitution model, (4) the PNJ agglomeration procedure, and (5) the path to the preferred tree viewer. The implemented distance estimation methods are JC [12], K2P [13], GTR [14], [15], and the Log-Det transformation [16].
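To make the simplest two of the listed distance estimators concrete, the following minimal Python sketch computes the standard JC and K2P distances for a pair of aligned nucleotide sequences. It is an illustration only and not ProfDist code: the function names (`count_differences`, `jc_distance`, `k2p_distance`) are hypothetical, and skipping gaps and ambiguity codes is a simplifying assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a, seq_b):
    """Return (transitions, transversions, compared_sites) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = sites = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguity codes are ignored
        sites += 1
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions, sites


def jc_distance(seq_a, seq_b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3), p = fraction of differing sites."""
    ts, tv, n = count_differences(seq_a, seq_b)
    if n == 0:
        raise ValueError("no comparable sites")
    arg = 1.0 - 4.0 * (ts + tv) / (3.0 * n)
    # The distance is undefined (saturated) when the log argument is not positive.
    return math.inf if arg <= 0 else -0.75 * math.log(arg)


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance: d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    ts, tv, n = count_differences(seq_a, seq_b)
    if n == 0:
        raise ValueError("no comparable sites")
    p_ts, q_tv = ts / n, tv / n  # transition and transversion fractions
    a, b = 1.0 - 2.0 * p_ts - q_tv, 1.0 - 2.0 * q_tv
    return math.inf if a <= 0 or b <= 0 else -0.5 * math.log(a * math.sqrt(b))
```

Applying either function to every pair of sequences yields a distance matrix of the kind that neighbor-joining takes as input; how PNJ extends such distances from single sequences to profiles is described in [4] and is not reproduced here.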
The profile neighbor-joining algorithm is driven by only two parameters: (1) the minimal bootstrap value at which a group of sequences is accepted as a trustworthy monophyletic group and transformed into a sequence profile, and (2) a percent identity threshold that defines a priori profiles, preventing low bootstrap values caused by high sequence identities. Because we used memory- and time-efficient algorithms and data structures, PNJ is suitable for reconstructing large phylogenetic trees. In particular, if certain aligned sequences are known in advance to be monophyletic and are available in EMBL format (e.g., as in the case of the European rRNA database [17]), the time and memory efficiency of PNJ is improved drastically once more.
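As a rough illustration of the two ingredients named above, the sketch below computes the percent identity of two aligned sequences and collapses a group of aligned sequences into a profile. It assumes that a profile is simply the vector of column-wise relative nucleotide frequencies, which may differ in detail from the definition used in [4]; the names `percent_identity` and `build_profile` are hypothetical and not part of ProfDist.

```python
from collections import Counter

ALPHABET = "ACGT"


def percent_identity(seq_a, seq_b):
    """Percentage of identical sites among positions where both sequences have A, C, G or T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in ALPHABET and b in ALPHABET  # assumption: gaps are ignored
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def build_profile(sequences):
    """Return one dict of relative nucleotide frequencies per alignment column."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for column in zip(*sequences):
        counts = Counter(ch for ch in (c.upper() for c in column) if ch in ALPHABET)
        total = sum(counts.values())
        # Columns without any valid nucleotide get all-zero frequencies.
        profile.append({nt: (counts[nt] / total if total else 0.0) for nt in ALPHABET})
    return profile
```

In this simplified picture, sequences whose pairwise identity exceeds the chosen threshold, or groups whose bootstrap support exceeds the chosen minimum, would be passed to `build_profile` and then treated as a single taxon in the next neighbor-joining round.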
The algorithm and the program are described further in [18].