Infer presence and mode of polyploidy

Below are the inputs, commands, and outputs for several analyses with GRAMPA. The inputs are simulated data; see our paper for details on the simulations.

Important!

These examples were done with earlier versions of GRAMPA (<1.4.0) so some of the command line options and output formats may have changed, but the general idea and results remain the same. See the README for up-to-date info on options and formats.

Allopolyploidy

Inputs

1000 gene trees were simulated with gain and loss using JPrIME based on the following allopolyploid-like MUL-tree:

In this scenario, lineages B and C hybridized to form an allopolyploid lineage that diversified into the x,y,z clade.

We then remove one polyploid clade from the MUL-tree to get a singly-labeled tree as input for GRAMPA. This is the same topology as above, except that the x,y,z clade sister to C is removed. This is the type of tree that typical phylogenetic reconstruction programs would produce even in the presence of allopolyploidy:

So, the input files for this search are:

  1. Singly-labeled species tree: spec_tree_3a.tre
  2. 1000 gene trees simulated from an allopolyploid MUL-tree: gene_trees_3a.txt

GRAMPA command

In practice we would not know whether this tree contains allopolyploidy, autopolyploidy, or no polyploidy at all, so we want GRAMPA to consider every node as a possible polyploid lineage. To do that, we leave out -h1 and -h2.

python grampa.py -s spec_tree_3a.tre -g gene_trees_3a.txt -o allo_example_output -f allo_test -v 0
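When running several searches it can be convenient to assemble this command from a script. The sketch below only builds the argument list from the flags used above. The helper name build_grampa_command is hypothetical (it is not part of GRAMPA), and the sketch assumes grampa.py is in the working directory.

```python
import sys

def build_grampa_command(spec_tree, gene_trees, out_dir, prefix, script="grampa.py"):
    """Assemble the argument list for the search shown above.

    Only the flags used in this example are emitted. -h1 and -h2 are
    deliberately left out so that every node is searched.
    """
    return [
        sys.executable, script,   # current Python interpreter, then the script
        "-s", spec_tree,          # singly-labeled species tree
        "-g", gene_trees,         # gene trees file
        "-o", out_dir,            # output directory
        "-f", prefix,             # prefix for output file names
        "-v", "0",
    ]

cmd = build_grampa_command(
    "spec_tree_3a.tre", "gene_trees_3a.txt", "allo_example_output", "allo_test"
)
print(" ".join(cmd[1:]))

# To actually launch the search (assumption: grampa.py is present locally):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a file name contains spaces.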

Outputs

The above command creates the directory allo_example_output containing four output files.

Since we are trying to determine the mode of polyploidy, we are interested in the allo_test_out.txt file. This file contains log info and the total reconciliation scores for each MUL-tree considered and looks something like this:

# Tree #    H1 node H2 node Tree string Total score
ST          ((((((x,y)<1>,z)<2>,B)<3>,A)<4>,C)<5>,D)<6> 7980
MT-1    A   A   ((((((x,y)<1>,z)<2>,B)<3>,(A+,A*)<4>)<5>,C)<6>,D)<7>    8272
MT-2    A   C   ((((((x,y)<1>,z)<2>,B)<3>,A+)<4>,(C,A*)<5>)<6>,D)<7>    8767
MT-3    A   B   ((((((x,y)<1>,z)<2>,(B,A*)<3>)<4>,A+)<5>,C)<6>,D)<7>    8777
MT-4    A   D   ((((((x,y)<1>,z)<2>,B)<3>,A+)<4>,C)<5>,(D,A*)<6>)<7>    8553
.
.
.
MT-126  <5> <6> (((((((x+,y+)<1>,z+)<2>,B+)<3>,A+)<4>,C+)<5>,D)<6>,(((((x*,y*)<7>,z*)<8>,B*)<9>,A*)<10>,C*)<11>)<12>    8420
MT-127  <5> <5> (((((((x+,y+)<1>,z+)<2>,B+)<3>,A+)<4>,C+)<5>,(((((x*,y*)<6>,z*)<7>,B*)<8>,A*)<9>,C*)<10>)<11>,D)<12>    7824
# ---------
The MUL-tree with the minimum parsimony score is MT-74: ((((((x+,y+)<1>,z+)<2>,B)<3>,A)<4>,(C,((x*,y*)<5>,z*)<6>)<7>)<8>,D)<9>
Score = 5018

GRAMPA tells us MUL-tree 74 is the lowest scoring tree:

((((((x+,y+)<1>,z+)<2>,B)<3>,A)<4>,(C,((x*,y*)<5>,z*)<6>)<7>)<8>,D)<9>

Notice that this is the same topology that was used to simulate the gene trees. GRAMPA has successfully identified an allopolyploid MUL-tree and placed the second polyploid lineage on the correct branch!
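The lowest-scoring tree can also be pulled out of the file with a short script rather than by eye. This is a minimal sketch that assumes the summary lines are worded as in the examples on this page; the function name and regular expressions are illustrative, and newer GRAMPA versions may format these lines differently.

```python
import re

# Patterns written against the summary wordings shown on this page
# (assumption: other GRAMPA versions may word these lines differently).
BEST_LINE = re.compile(r"minimum parsimony score is .*?\b(MT-\d+|ST)\)?:\s*(\S+)")
SCORE_LINE = re.compile(r"^Score = (\d+)")

def read_summary(lines):
    """Return (tree_id, tree_string, score) from the summary at the end of the file."""
    tree_id = tree = score = None
    for line in lines:
        best = BEST_LINE.search(line)
        if best:
            tree_id, tree = best.groups()
        hit = SCORE_LINE.match(line.strip())
        if hit:
            score = int(hit.group(1))
    return tree_id, tree, score

# Summary lines copied from the output above.
example = [
    "# ---------",
    "The MUL-tree with the minimum parsimony score is MT-74: "
    "((((((x+,y+)<1>,z+)<2>,B)<3>,A)<4>,(C,((x*,y*)<5>,z*)<6>)<7>)<8>,D)<9>",
    "Score = 5018",
]
tree_id, tree, score = read_summary(example)
print(tree_id, score)
```

On a real run, the same function could be given the open file handle for allo_example_output/allo_test_out.txt instead of the example list.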

Autopolyploidy

1000 gene trees were simulated with gain and loss using JPrIME based on the following autopolyploid-like MUL-tree:

In this scenario, a lineage sister to species C underwent autopolyploidization and subsequently diversified into the x,y,z clade.

We then remove one polyploid clade from the MUL-tree to get a singly-labeled tree as input for GRAMPA. This is the same topology as above, except that one x,y,z clade is removed. This is the type of tree that typical phylogenetic reconstruction programs would produce even in the presence of autopolyploidy:

So, the input files for this search are:

  1. Singly-labeled species tree: spec_tree_18.tre
  2. 1000 gene trees simulated from an autopolyploid MUL-tree: gene_trees_18.txt

GRAMPA command

In practice we would not know whether this tree contains allopolyploidy, autopolyploidy, or no polyploidy at all, so we want GRAMPA to consider every node as a possible polyploid lineage. To do that, we leave out -h1 and -h2.

python grampa.py -s spec_tree_18.tre -g gene_trees_18.txt -o auto_example_output -f auto_test -v 0

Outputs

The above command creates the directory auto_example_output containing four output files.

Since we are trying to determine the mode of polyploidy, we are interested in the auto_test_out.txt file. This file contains log info and the total reconciliation scores for each MUL-tree considered and looks something like this:

# Tree #    H1 node H2 node Tree string Total score
ST          (((B,A)<1>,(((x,y)<2>,z)<3>,C)<4>)<5>,D)<6> 5476
MT-1    A   A   (((B,(A+,A*)<1>)<2>,(((x,y)<3>,z)<4>,C)<5>)<6>,D)<7>    6280
MT-2    A   C   (((B,A+)<1>,(((x,y)<2>,z)<3>,(C,A*)<4>)<5>)<6>,D)<7>    6244
MT-3    A   B   ((((B,A*)<1>,A+)<2>,(((x,y)<3>,z)<4>,C)<5>)<6>,D)<7>    6115
MT-4    A   D   (((B,A+)<1>,(((x,y)<2>,z)<3>,C)<4>)<5>,(D,A*)<6>)<7>    6088
.
.
.
MT-132  <5> <6> ((((B+,A+)<1>,(((x+,y+)<2>,z+)<3>,C+)<4>)<5>,D)<6>,((B*,A*)<7>,(((x*,y*)<8>,z*)<9>,C*)<10>)<11>)<12>    6040
MT-133  <5> <5> ((((B+,A+)<1>,(((x+,y+)<2>,z+)<3>,C+)<4>)<5>,((B*,A*)<6>,(((x*,y*)<7>,z*)<8>,C*)<9>)<10>)<11>,D)<12>    6103
# ---------
The MUL-tree with the minimum parsimony score is MT-57: (((B,A)<1>,((((x+,y+)<2>,z+)<3>,((x*,y*)<4>,z*)<5>)<6>,C)<7>)<8>,D)<9>
Score = 4807

GRAMPA tells us MUL-tree 57 is the lowest scoring tree:

(((B,A)<1>,((((x+,y+)<2>,z+)<3>,((x*,y*)<4>,z*)<5>)<6>,C)<7>)<8>,D)<9>

Notice that this is the same topology that was used to simulate the gene trees. GRAMPA has successfully identified an autopolyploid MUL-tree on the correct branch!
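In the two MUL-trees reported on this page, the duplicated clades (tips marked + and *) are sister to each other in the autopolyploid case and attached to different branches in the allopolyploid case. A rough way to check this from a tree string is sketched below, assuming that this pattern is what separates the two modes. The parser and function names are illustrative and not part of GRAMPA.

```python
import re

def parse_newick(s):
    """Parse a Newick-like string with <n> internal labels into nested tuples of tip names."""
    s = re.sub(r"<\d+>", "", s.strip().rstrip(";"))  # drop internal node labels
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]               # tip name

    return node()

def tips(tree):
    """List all tip names below a node."""
    return [tree] if isinstance(tree, str) else [t for child in tree for t in tips(child)]

def copies_are_sister(tree):
    """True if some node has one child holding only + tips and the other only the matching * tips."""
    if isinstance(tree, str):
        return False
    if len(tree) == 2:
        a, b = tips(tree[0]), tips(tree[1])
        marks = {frozenset(t[-1] for t in a), frozenset(t[-1] for t in b)}
        same_species = sorted(t[:-1] for t in a) == sorted(t[:-1] for t in b)
        if marks == {frozenset("+"), frozenset("*")} and same_species:
            return True
    return any(copies_are_sister(child) for child in tree)

# Lowest-scoring MUL-trees copied from the two examples on this page.
auto_tree = "(((B,A)<1>,((((x+,y+)<2>,z+)<3>,((x*,y*)<4>,z*)<5>)<6>,C)<7>)<8>,D)<9>"
allo_tree = "((((((x+,y+)<1>,z+)<2>,B)<3>,A)<4>,(C,((x*,y*)<5>,z*)<6>)<7>)<8>,D)<9>"

for name, tree in [("MT-57", auto_tree), ("MT-74", allo_tree)]:
    mode = "autopolyploid-like" if copies_are_sister(parse_newick(tree)) else "allopolyploid-like"
    print(name, mode)
```

The check only looks at topology, so it should be read as a convenience for labeling results, not as a replacement for inspecting the scores.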

No polyploidy

1000 gene trees were simulated with gain and loss using JPrIME based on the following singly-labeled tree:

In this scenario, no polyploidy has occurred and this is the same tree we give to GRAMPA.

So, the input files for this search are:

  1. Singly-labeled species tree: spec_tree_33.tre
  2. 1000 gene trees simulated from the singly-labeled tree: gene_trees_33.txt

GRAMPA command

In practice we would not know whether this tree contains allopolyploidy, autopolyploidy, or no polyploidy at all, so we want GRAMPA to consider every node as a possible polyploid lineage. To do that, we leave out -h1 and -h2.

python grampa.py -s spec_tree_33.tre -g gene_trees_33.txt -o nop_example_output -f nop_test -v 0

Outputs

The above command creates the directory nop_example_output containing four output files.

Since we are trying to determine the mode of polyploidy, we are interested in the nop_test_out.txt file. This file contains log info and the total reconciliation scores for each MUL-tree considered and looks something like this:

# Tree #    H1 node H2 node Tree string Total score
ST          ((((((x,y)<1>,z)<2>,B)<3>,A)<4>,C)<5>,D)<6> 4115
MT-1    A   A   ((((((x,y)<1>,z)<2>,B)<3>,(A+,A*)<4>)<5>,C)<6>,D)<7>    4423
MT-2    A   C   ((((((x,y)<1>,z)<2>,B)<3>,A+)<4>,(C,A*)<5>)<6>,D)<7>    4753
MT-3    A   B   ((((((x,y)<1>,z)<2>,(B,A*)<3>)<4>,A+)<5>,C)<6>,D)<7>    4929
MT-4    A   D   ((((((x,y)<1>,z)<2>,B)<3>,A+)<4>,C)<5>,(D,A*)<6>)<7>    4754
.
.
.
MT-126  <5> <6> (((((((x+,y+)<1>,z+)<2>,B+)<3>,A+)<4>,C+)<5>,D)<6>,(((((x*,y*)<7>,z*)<8>,B*)<9>,A*)<10>,C*)<11>)<12>    4760
MT-127  <5> <5> (((((((x+,y+)<1>,z+)<2>,B+)<3>,A+)<4>,C+)<5>,(((((x*,y*)<6>,z*)<7>,B*)<8>,A*)<9>,C*)<10>)<11>,D)<12>    4877
# ---------
The tree with the minimum parsimony score is the singly-labled tree (ST):   ((((((x,y)<1>,z)<2>,B)<3>,A)<4>,C)<5>,D)<6>
Score = 4115

GRAMPA tells us that the singly-labeled tree is the lowest scoring tree:

((((((x,y)<1>,z)<2>,B)<3>,A)<4>,C)<5>,D)<6>

Notice that this is the same topology that was used to simulate the gene trees. GRAMPA has successfully determined that no polyploidy has occurred among these lineages!
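The same conclusion can be reached by reading the score rows directly and checking whether the singly-labeled tree (ST) has the minimum score. The sketch below assumes the whitespace-separated row layout shown above (ST rows with no H1/H2 columns, MT rows with both) and integer scores; the function name is illustrative. It is run on the excerpt of rows shown on this page, not on the full file.

```python
def parse_score_rows(lines):
    """Collect the ST and MT-* rows; header, log, and summary lines are skipped."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields or not (fields[0] == "ST" or fields[0].startswith("MT-")):
            continue
        if len(fields) == 3:        # ST row: id, tree string, score
            h1 = h2 = None
        elif len(fields) == 5:      # MUL-tree row: id, H1, H2, tree string, score
            h1, h2 = fields[1], fields[2]
        else:
            continue                # unexpected layout, ignore
        rows.append({"id": fields[0], "h1": h1, "h2": h2,
                     "tree": fields[-2], "score": int(fields[-1])})
    return rows

# Rows copied from the no-polyploidy output excerpt above.
excerpt = """\
# Tree #    H1 node H2 node Tree string Total score
ST          ((((((x,y)<1>,z)<2>,B)<3>,A)<4>,C)<5>,D)<6> 4115
MT-1    A   A   ((((((x,y)<1>,z)<2>,B)<3>,(A+,A*)<4>)<5>,C)<6>,D)<7>    4423
MT-2    A   C   ((((((x,y)<1>,z)<2>,B)<3>,A+)<4>,(C,A*)<5>)<6>,D)<7>    4753
MT-3    A   B   ((((((x,y)<1>,z)<2>,(B,A*)<3>)<4>,A+)<5>,C)<6>,D)<7>    4929
MT-4    A   D   ((((((x,y)<1>,z)<2>,B)<3>,A+)<4>,C)<5>,(D,A*)<6>)<7>    4754
MT-126  <5> <6> (((((((x+,y+)<1>,z+)<2>,B+)<3>,A+)<4>,C+)<5>,D)<6>,(((((x*,y*)<7>,z*)<8>,B*)<9>,A*)<10>,C*)<11>)<12>    4760
MT-127  <5> <5> (((((((x+,y+)<1>,z+)<2>,B+)<3>,A+)<4>,C+)<5>,(((((x*,y*)<6>,z*)<7>,B*)<8>,A*)<9>,C*)<10>)<11>,D)<12>    4877
# ---------
Score = 4115
""".splitlines()

rows = parse_score_rows(excerpt)
best = min(rows, key=lambda r: r["score"])
if best["id"] == "ST":
    print("Lowest score is the singly-labeled tree: no polyploidy supported")
else:
    print("Lowest score is", best["id"], "with H1 =", best["h1"], "and H2 =", best["h2"])
```

For the allopolyploid and autopolyploid examples the excerpts on this page omit the winning rows (MT-74 and MT-57), so this check would need the complete output file to reproduce those results.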