I want to make a tree (cluster) using Interactive Tree of Life web-based tool (iTOL). As an input file (or string) this tool uses Newick format which is a way of representing graph-theoretical trees with edge lengths using parentheses and commas. Beside that, additional information might be supported such as bootstrapped values of cluster's nodes.
For example, here I created dataset for a cluster analysis using clusterGeneration
package:
library(clusterGeneration)
set.seed(1)
tmp1 <- genRandomClust(numClust=3, sepVal=0.3, numNonNoisy=5,
numNoisy=3, numOutlier=5, numReplicate=2, fileName="chk1")
data <- tmp1$datList[[2]]
Afterwards I performed cluster analysis and assessed support for the cluster's nodes by bootstrap using pvclust
package:
set.seed(2)
y <- pvclust(data=data,method.hclust="average",method.dist="correlation",nboot=100)
plot(y)
Here is the cluster and bootstrapped values:
In order to make a Newick file, I used ape
package:
library(ape)
yy<-as.phylo(y$hclust)
write.tree(yy,digits=2)
write.tree
function will print tree in a Newick format:
((x2:0.45,x6:0.45):0.043,((x7:0.26,(x4:0.14,(x1:0.14,x3:0.14):0.0064):0.12):0.22,(x5:0.28,x8:0.28):0.2):0.011);
Those numbers represent branch lengths (cluster's edge lengths). Following instructions from iTOL help page ("Uploading and working with your own trees" section) I manually added bootstrapped values into my Newick file (bolded values below):
((x2:0.45,x6:0.45)74:0.043,((x7:0.26,(x4:0.14,(x1:0.14,x3:0.14)55:0.0064)68:0.12)100:0.22,(x5:0.28,x8:0.28)100:0.2)63:0.011);
It works fine when I upload the string into iTOL. However, I have a huge cluster and doing it by hand seems tedious...
QUESTION: What would be a code that can perform it instead of manual typing?
Bootstrap values can be obtained by:
(round(y$edges,2)*100)[,1:2]
Branch lengths used to form Newick file can be obtained by:
yy$edge.length
I tried to figure out how write.tree
function works after debugging it. However, I noticed that it internally calls function .write.tree2
and I couldn't understand how to efficiently change the original code and obtain bootstrapped values in appropriate position in a Newick file.
Any suggestion are welcome.
Here is one solution for you: objects of class
phylo
have an available slot callednode.label
that, appropriately, gives you the label of a node. You can use it to store your bootstrap values. There will be written in your Newick File at the appropriate place as you can see in the code of.write.tree2
:The real difficulty is to find the proper order of the nodes. I searched and searched but couldn't find a way to find the right order a posteriori.... so that means we will have to get that information during the transformation from an object of class
hclust
to an object of classphylo
.And luckily, if you look into the function
as.phylo.hclust
, there is a vector containing the nodes index in their correct order vis-à-vis the previoushclust
object:Which means we can make our own
as.phylo.hclust
with anodenames
parameter as long as it is in the same order as the nodes in thehclust
object (which is the case in your example sincepvclust
keeps a coherent order internally, i. e. the order of the nodes in the hclust is the same as in the table in which you picked the bootstraps):In the end, here is how you would use this new function in your case study: