PhyDesign: An Overview

The PhyDesign web application consists of three components. The last of these produces profiles of phylogenetic informativeness, calculates net and relative (per-site) informativeness over specified time intervals, and offers the ability to integrate informativeness over epochs of interest.

You can use the following files to test the PhyDesign web application (Right click and save link as...):
If you use PhyDesign, please cite the following article:

Lopez-Giraldez F., and J.P. Townsend, 2011. PhyDesign: an online application for profiling phylogenetic informativeness. BMC Evol. Biol., 11: 152.

and the program used to calculate the rates:

Pond, S.L.K., Frost, S.D.W., and S.V. Muse, 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics, 21(5): 676–9.

Olsen, G.J., unpublished. DNArates.

Mayrose, I., Graur, D., Ben-Tal, N., and T. Pupko, 2004. Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol. Biol. Evol., 21: 1781–91.

The phylogenetic informativeness methods keep evolving. We recently incorporated into the website the methods for phylogenetic signal and noise analysis published in the following manuscript:

Townsend J.P., Su Z., and Y. I. Tekle, 2012. Phylogenetic signal and noise: predicting the power of a data set to resolve phylogeny. Syst. Biol., 61: 835-49.

We will soon incorporate the methods for phylogenetic signal and noise analysis based on the GTR model for a four-taxon tree with uneven subtending branches. Meanwhile, this analysis can be performed using this Wolfram Mathematica notebook, following these instructions.

Using PhyDesign

1) Uploading information for analysis

To analyze phylogenetic informativeness, PhyDesign requires the site rate distribution for each locus. To obtain the rates, the user needs to input two files: (1) an alignment, in Nexus or Phylip format, of the loci of interest, pruned to contain a set of taxa for which the tree topology is fairly well known, and (2) an ultrametric tree for those taxa in Newick (recommended) or Nexus format. The ultrametric tree can be either a chronogram (an ultrametric tree with branch lengths proportional to time) or a tree in unspecified molecular evolutionary units. After uploading the alignment and tree files, the user chooses from the drop-down menu a program with which to estimate the substitution rates. Once a program has been chosen, advanced options become available, offering different evolutionary models and parameters. For DNA sequences, we recommend HyPhy, for which a HyPhy batch file was created to implement all time-reversible models. Unlike DNArates, HyPhy also accepts multifurcating trees. For amino acid sequences, rate4site is provided. To facilitate extensive analyses of large datasets, the user is also asked to provide an email address to which a link to the rate results will be sent.

Alternatively, if the rate distribution for each locus is already known, the user can input a file that lists these rates, in a specific format, for the markers to be analyzed. (For more information on accepted file types and formats, check the FAQ.) If an alignment and a tree are uploaded instead, the site rate vectors will first be calculated. A correctly formatted file containing these rates can then be downloaded for future use in the site rate form, eliminating the need to repeat this calculation step.
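Because the tree must be ultrametric, it can be worth checking it before uploading. The following is a minimal sketch, not part of PhyDesign: the function names `tip_depths` and `is_ultrametric` are hypothetical, and the parser assumes a plain Newick string with branch lengths and no quoted labels or bracketed comments. It computes every root-to-tip distance and tests whether they agree within a relative tolerance.

```python
def tip_depths(newick):
    """Return {tip_name: root-to-tip distance} for a simple Newick string."""
    text = newick.strip().rstrip(";")
    pos = 0

    def parse_label():
        # Read characters up to the next structural symbol.
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ":,()":
            pos += 1
        return text[start:pos].strip()

    def parse_length():
        # Optional ":length" after a node; a missing length counts as 0.
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            return float(text[start:pos])
        return 0.0

    def parse_node():
        # Returns tip distances measured from this node's parent.
        nonlocal pos
        tips = {}
        if text[pos] == "(":
            pos += 1  # consume "("
            while True:
                tips.update(parse_node())
                if text[pos] == ",":
                    pos += 1
                    continue
                break
            pos += 1  # consume ")"
            parse_label()  # internal label or support value, ignored
        else:
            tips[parse_label()] = 0.0
        length = parse_length()
        return {name: depth + length for name, depth in tips.items()}

    return parse_node()


def is_ultrametric(newick, rel_tol=1e-6):
    """True if all root-to-tip distances agree within rel_tol of the largest."""
    depths = list(tip_depths(newick).values())
    return max(depths) - min(depths) <= rel_tol * max(depths)


print(is_ultrametric("((A:1,B:1):2,(C:1.5,D:1.5):1.5);"))
print(is_ultrametric("((A:1,B:2):2,C:3);"))
```

The first example tree has every tip at the same distance from the root, while in the second tree tip B sits further from the root than A and C. The tolerance is an arbitrary choice for this sketch; trees written with rounded branch lengths may need a looser one.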

  Form1   Form2

2) Downloading and choosing rates for profiling

By clicking the emailed link, the user will be taken to a page similar to the one in the image. The table presents the rate estimation results for the alignment sites, divided into the partitions/genes (rows) defined in the input alignment. For each partition there is a link to the original output produced by the rate estimation program. Please check this file in case PhyDesign has missed an error message produced by the rate estimation program. The table also summarizes the number of sites for which a mutation rate could be calculated (#Rates) and the number of faulty sites for which this calculation was not possible (#Undef.). The last two columns of the table are used to specify the partitions to be profiled and the colors used to represent them. Both the selected partitions and their colors can be modified here or later.

Additionally, if the rates were estimated with PhyDesign, two results files are offered as downloads in the left panel: (1) a compressed file containing individual rate files for each locus, and (2) a single file containing site rate vectors for all loci. The latter can be downloaded and later uploaded in the site rate form, eliminating the need to repeat the rate calculation. To obtain the profiles for the selected partitions, click the "Get Profiles" button.
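The per-partition counts in the table can be reproduced offline from a downloaded rate vector. The sketch below is only an illustration: the function `summarize_rates` is hypothetical, and it assumes that undefined sites appear as non-numeric tokens, NaN, infinite, or negative values, which may not match how PhyDesign or the rate estimation programs actually mark them. The use of the sample standard deviation is likewise an assumption.

```python
import math
import statistics


def summarize_rates(raw_values):
    """Count defined and undefined site rates and summarize the defined ones."""
    rates = []
    undefined = 0
    for value in raw_values:
        try:
            rate = float(value)
        except (TypeError, ValueError):
            # Non-numeric token (for example "NA") or missing value.
            undefined += 1
            continue
        if math.isnan(rate) or math.isinf(rate) or rate < 0:
            undefined += 1
        else:
            rates.append(rate)
    return {
        "n_rates": len(rates),          # analogous to #Rates
        "n_undefined": undefined,       # analogous to #Undef.
        "mean": statistics.mean(rates) if rates else None,
        # Sample standard deviation; needs at least two rates.
        "stdev": statistics.stdev(rates) if len(rates) > 1 else None,
    }


summary = summarize_rates(["0.1", "0.3", "NA", "nan"])
print(summary["n_rates"], summary["n_undefined"])
```

A partition with a large share of undefined sites is a good candidate for checking the original output of the rate estimation program before profiling.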

  Rate table

3) Manipulating profile: graph and quantitation

This is the interface through which the user can select partitions and colors for profiles, adjust the range of time values (x-axis), obtain net and per site informativeness (more info here), integrate over different epochs, and download the results. The sidebar contains all controls for manipulation of the data and graphics; the results are displayed on the main part of the page.

This interface has two main functions: displaying a graphical representation of phylogenetic informativeness, and calculating quantitative measures of informativeness (described later). By default, two graphs are shown in the main section: the ultrametric tree and, aligned to it, the phylogenetic informativeness profiles. In the sidebar, the first four sections relate primarily to controlling the visual display. First, the user must select which partitions will be analyzed and displayed on the graph. By clicking the color wheel, the user can change the color of the profiles. It is also possible to hide the tree used in the analysis by checking the "Hide tree" box. The user can also switch between net and per-site informativeness, depending on which is more useful. Finally, the user can choose the minimum and maximum time values to be displayed and analyzed.
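To give a feel for what a profile represents, here is a minimal sketch that assumes the per-site informativeness formula of Townsend (2007), in which a site evolving at rate λ contributes 16 λ² T exp(-4 λ T) at time T. The function names are hypothetical and this is not PhyDesign's own code; it only illustrates how net and per-site values relate.

```python
import math


def site_informativeness(rate, t):
    """Assumed per-site formula: 16 * rate^2 * t * exp(-4 * rate * t)."""
    return 16.0 * rate ** 2 * t * math.exp(-4.0 * rate * t)


def profile(rates, times, per_site=False):
    """Informativeness of one locus at each time point.

    Net informativeness sums over all sites; per-site informativeness
    divides that sum by the number of sites.
    """
    values = []
    for t in times:
        net = sum(site_informativeness(r, t) for r in rates)
        values.append(net / len(rates) if per_site else net)
    return values


# Hypothetical locus with three site rates, evaluated at four time points.
rates = [0.05, 0.25, 1.0]
times = [0.25, 1.0, 5.0, 20.0]
print(profile(rates, times))
print(profile(rates, times, per_site=True))
```

Under this formula a single site peaks at T = 1/(4λ), so fast sites are most informative for recent divergences and slow sites for deep ones. Net informativeness favors long loci, whereas per-site informativeness allows loci of different lengths to be compared.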

  Profile screenshot
 

This is a sample chronogram used in the analysis process, as displayed on the profiling page. The tree and phylogenetic informativeness profiles can be downloaded as displayed, in SVG format, by right-clicking on the graph. The SVG files can be further modified in any vector graphics program (such as Adobe Illustrator or the open-source editor Inkscape).

  Tree
 

The quantitation tool at the bottom of the sidebar allows the user to integrate the profiles over specified epochs (equivalently, to calculate the area under each profile within those epochs) and obtain informativeness values. The downloadable quantitation results file is updated with the new calculations every time the quantitation form is submitted. The quantitation data table includes the number of rates, the mean rate, the standard deviation of the rates, and the integration values from the quantitation tool. Integrating phylogenetic informativeness over specific epochs provides a metric for ranking loci. Integration values will be largest for the loci that have the highest probability of a substitution in the given epoch that will not be obscured by subsequent evolution. Note, however, that phylogenetic informativeness plots quantitate and display predicted signal and do not account for phylogenetic noise. Quantitative results should therefore be interpreted with care, in light of the homoplasy that is likely to arise and diminish utility during epochs deeper than the peak of informativeness for a given profile.
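The ranking idea can be sketched offline. The example below again assumes the Townsend (2007) per-site formula, 16 λ² T exp(-4 λ T), whose integral has the closed form -(4 λ T + 1) exp(-4 λ T); the function names and the two toy loci are hypothetical, and PhyDesign may compute its integration values differently (for instance numerically).

```python
import math


def integrated_site_informativeness(rate, t_start, t_end):
    """Integral of 16*rate^2*T*exp(-4*rate*T) for T from t_start to t_end."""
    def antiderivative(t):
        return -(4.0 * rate * t + 1.0) * math.exp(-4.0 * rate * t)
    return antiderivative(t_end) - antiderivative(t_start)


def rank_loci(rates_by_locus, t_start, t_end):
    """Return (locus, integrated net informativeness) pairs, best first."""
    scores = {
        locus: sum(
            integrated_site_informativeness(r, t_start, t_end) for r in rates
        )
        for locus, rates in rates_by_locus.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two toy loci with three sites each: one fast-evolving, one slow-evolving.
loci = {"fast": [1.0, 1.0, 1.0], "slow": [0.05, 0.05, 0.05]}
print(rank_loci(loci, 0.0, 0.5))    # recent epoch
print(rank_loci(loci, 10.0, 20.0))  # deep epoch
```

In this toy case the fast locus ranks first for the recent epoch and the slow locus for the deep epoch. As with the profiles themselves, these values reflect predicted signal only and say nothing about noise.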

We have tested PhyDesign with 2000 loci of 1000 bp each. Although plotting 2000 profiles is possible, numerical informativeness values may be more practical for problems of this size. To analyze a larger volume of data, the user can download:

- Perl module for Signal and Noise analysis.

  Quantitate tool Integration sample