Using a function on a column from tree file class Phylo

I have a phylogenetic tree with many tips and internal nodes. I have a list of node ids from the tree. These are part of a separate table. I want to add a new column to the table, children. To get the descendants (nodes and tips), I am using phangorn::Descendants(tree, NODEID, type = 'all'). I can add length to get the number of descendants. For example,

phangorn::Descendants(tree, 12514, type = 'all')
[1] 12515 12517 12516  5345  5346  5347  5343  5344

length(phangorn::Descendants(tree, 12514, type = 'all'))
[1] 8

I would like to take the 'nodes' column of my data frame and apply length(phangorn::Descendants(tree, NODEID, type = 'all')) to each value, creating a new column that holds the number of descendants for each input node.

Here is an example:

tests <- data.frame(nodes=c(12551, 12514, 12519))
length(phangorn::Descendants(tree, 12519, type = 'all'))
[1] 2
length(phangorn::Descendants(tree, 12514, type = 'all'))
[1] 8
length(phangorn::Descendants(tree, 12551, type = 'all'))
[1] 2
tests$children <- length(phangorn::Descendants(tree, tests$nodes, type = 'all'))
tests
  nodes children
1 12551        3
2 12514        3
3 12519        3

As shown above, every row gets 3, which is the number of rows in the data.frame, rather than the per-node number of children calculated above. It should be:

tests
  nodes children
1 12551        2
2 12514        8
3 12519        2

If you have any tips or ideas on how I can get this to behave as expected, that would be great. I have a feeling I need to use apply(), or index into the result before calling length(). Thank you in advance.

Answer by CRP (best answer)

You're super close! Here's one quick solution using sapply! There are more alternatives but this one seems to follow the structure of your question!

Generating some data

library(ape)

ntips <- 10
tree <- rtree(ntips)
targetNodes <- data.frame(nodes=seq(ntips+1, ntips+tree$Nnode))

Note that I'm storing all the relevant nodes in the targetNodes object. This is equivalent to the following object in your question:

tests <- data.frame(nodes=c(12551, 12514, 12519))

Using sapply

Now, let's use sapply to repeat the same operation across all the relevant nodes in targetNodes:

targetNodes$children <- sapply(targetNodes$nodes, function(x){
  length(phangorn::Descendants(tree, x, type = 'all'))
})

I'm saving the output of our sapply function by creating a new column in targetNodes.
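A type-stable variant of the same idea, as a rough sketch: vapply() works like sapply() but also checks that each call returns exactly one integer. The object names follow the example above; the seed is my own assumption, added only so the random tree is reproducible.

```r
library(ape)       # rtree()
library(phangorn)  # Descendants()

set.seed(1)  # assumption: fixed seed only so the example is reproducible
ntips <- 10
tree <- rtree(ntips)

# Internal nodes of a phylo object are numbered ntips + 1, ..., ntips + Nnode.
targetNodes <- data.frame(nodes = seq(ntips + 1, ntips + tree$Nnode))

# vapply() enforces that every call returns a single integer.
targetNodes$children <- vapply(
  targetNodes$nodes,
  function(x) length(phangorn::Descendants(tree, x, type = "all")),
  integer(1)
)

targetNodes
```

If any call returned something other than one integer, vapply() would stop with an error instead of silently returning a list, which can make this kind of mistake easier to spot.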

Good luck!

Answer by klash

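You were even closer: using lengths instead of length should work. When you pass a vector of nodes, Descendants appears to return a list with one element per node, so length() gives the number of nodes (3 here), while lengths() gives the length of each element.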

tests$children <- lengths(phangorn::Descendants(tree, tests$nodes, type = 'all'))
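The difference between the two functions can be seen without a tree at all. Below is a minimal base R sketch; the list `desc` is made-up data standing in for a per-node list of descendants, not real phangorn output.

```r
# Hypothetical stand-in for the list returned for three nodes:
# one integer vector of descendant ids per node (made-up values).
desc <- list(c(1L, 2L), 1:8, c(3L, 4L))

# length() counts the elements of the list itself (one per node).
n_nodes <- length(desc)

# lengths() returns the length of each element (descendants per node).
n_children <- lengths(desc)

print(n_nodes)     # 3
print(n_children)  # 2 8 2
```

Assigning a single number such as `n_nodes` to a data frame column recycles it across every row, which would explain the column of 3s in the question.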