How should I analyze differences between phyletic patterns?


I have found orthogroups (with OrthoFinder) in the complete archaeal proteomes of the genus Halorubrum. As a result, I have a dataframe with the number of proteins in each orthogroup for each organism (orthogroups in rows, species in columns), in which I replaced every value greater than 1 with 1 to obtain phyletic patterns. In the end I have this dataframe:
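A minimal sketch of the binarization step, assuming the counts are already in a pandas DataFrame with orthogroups as the index and one column per species. The orthogroup IDs, the counts and the `non_thermo_1` column are made up for illustration.

```python
import pandas as pd

# Hypothetical gene counts: orthogroups in rows, species in columns,
# each cell = number of proteins of that species in that orthogroup.
counts = pd.DataFrame(
    {
        "aethiopicum": [3, 0, 1],
        "coriense": [1, 2, 0],
        "non_thermo_1": [0, 1, 4],  # hypothetical non-thermophilic species
    },
    index=["OG0000000", "OG0000001", "OG0000002"],
)

# Phyletic pattern: 1 if the species has at least one protein in the
# orthogroup, 0 otherwise (same as replacing every count > 1 with 1).
patterns = (counts > 0).astype(int)
print(patterns)
```

For non-negative counts, `counts.clip(upper=1)` gives the same values.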

Phyletic patterns of genus Halorubrum - df 'ogroups_patterns'

There are 13 thermophilic species (['aethiopicum', 'coriense', 'tebenquichense', 'vacuolatum', 'lipolyticum', 'saccharovorum', 'terrestre', 'salsamenti', 'yunnanense', 'sodomense', 'distributum', 'aidingense', 'arcis']); the remaining species are non-thermophilic. The question is: how should I analyze this data if my goal is to find how the phyletic patterns of the thermophiles differ from those of the non-thermophiles?
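One way to compare the two groups, sketched under assumptions: every species not in the thermophile list is treated as non-thermophilic, the presence frequency of each orthogroup is computed in each group, and the association is tested per orthogroup with Fisher's exact test (`scipy.stats.fisher_exact`). Fisher's test is a swapped-in choice here, not part of the original attempt; the helper `compare_groups` and the toy columns `other_1`/`other_2` are hypothetical.

```python
import pandas as pd
from scipy.stats import fisher_exact

THERMOPHILES = ['aethiopicum', 'coriense', 'tebenquichense', 'vacuolatum',
                'lipolyticum', 'saccharovorum', 'terrestre', 'salsamenti',
                'yunnanense', 'sodomense', 'distributum', 'aidingense', 'arcis']


def compare_groups(patterns: pd.DataFrame, thermo: list) -> pd.DataFrame:
    """Per-orthogroup presence frequency in each group plus a Fisher p-value."""
    is_thermo = patterns.columns.isin(thermo)
    t = patterns.loc[:, is_thermo]   # thermophile columns
    o = patterns.loc[:, ~is_thermo]  # all remaining (non-thermophilic) columns
    res = pd.DataFrame({
        "freq_thermo": t.mean(axis=1),
        "freq_other": o.mean(axis=1),
    })
    res["diff"] = res["freq_thermo"] - res["freq_other"]
    pvals = []
    for og in patterns.index:
        a = int(t.loc[og].sum())  # thermophiles that carry the orthogroup
        c = int(o.loc[og].sum())  # non-thermophiles that carry it
        # 2x2 table: rows = group, columns = present / absent
        table = [[a, t.shape[1] - a], [c, o.shape[1] - c]]
        _, p = fisher_exact(table)
        pvals.append(p)
    res["fisher_p"] = pvals
    return res.sort_values("fisher_p")


# Toy patterns; 'other_1' and 'other_2' stand in for non-thermophiles.
patterns = pd.DataFrame(
    {"aethiopicum": [1, 1, 0], "coriense": [1, 1, 0],
     "other_1": [0, 1, 1], "other_2": [0, 1, 1]},
    index=["OG_a", "OG_b", "OG_c"],
)
result = compare_groups(patterns, THERMOPHILES)
print(result)
```

With thousands of orthogroups the p-values need a multiple-testing correction (for example Benjamini-Hochberg), and species of one genus are not independent observations, so small p-values here are a screening signal rather than a final answer.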

I tried to calculate the Jaccard index for every orthogroup:

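```python
# Number of thermophiles carrying each orthogroup divided by the number of
# all species carrying it. Note: once 'J' exists, re-running this line
# would include 'J' itself in the denominator (numeric_only=True).
ogroups_patterns['J'] = (
    ogroups_patterns_terms.sum(axis=1, numeric_only=True)
    / ogroups_patterns.sum(axis=1, numeric_only=True)
)
```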

where `ogroups_patterns_terms` is a dataframe with the same phyletic patterns as in the screenshot, but restricted to the thermophilic species.
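For reference, the Jaccard index of two sets A and B is |A ∩ B| / |A ∪ B|. One possible reading of a per-orthogroup Jaccard index (an interpretation, not something stated in the question) compares the set P of species that carry the orthogroup with the set T of thermophilic species. The ratio in the attempt divides |P ∩ T| by |P| only, so it measures what fraction of the carriers are thermophiles rather than the Jaccard index. This sketch computes both; the helper `thermo_overlap` and the toy data are hypothetical.

```python
import pandas as pd


def thermo_overlap(patterns: pd.DataFrame, thermo: list) -> pd.DataFrame:
    p = patterns.to_numpy(dtype=bool)   # orthogroup x species presence (set P per row)
    t = patterns.columns.isin(thermo)   # thermophile indicator per species (set T)
    inter = (p & t).sum(axis=1)         # |P ∩ T|
    union = (p | t).sum(axis=1)         # |P ∪ T|
    carriers = p.sum(axis=1)            # |P|; assumes no all-zero rows
    return pd.DataFrame(
        {"jaccard": inter / union,             # |P ∩ T| / |P ∪ T|
         "thermo_fraction": inter / carriers},  # |P ∩ T| / |P|, the ratio in the attempt
        index=patterns.index,
    )


# Toy patterns; 'other_1' and 'other_2' stand in for non-thermophiles.
patterns = pd.DataFrame(
    {"aethiopicum": [1, 1, 1], "coriense": [1, 0, 1],
     "other_1": [0, 0, 1], "other_2": [0, 0, 1]},
    index=["OG_a", "OG_b", "OG_c"],
)
# Only the thermophiles present in the toy table are listed here;
# the full list from the question would go here in practice.
res = thermo_overlap(patterns, ["aethiopicum", "coriense"])
print(res)
```

`OG_b` shows the difference: an orthogroup found in a single thermophile scores 1.0 as a carrier fraction but only 0.5 as a Jaccard index, because the other thermophile lacks it.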

But I have no idea whether this is the correct way to calculate the index in this case. Maybe taking zeros (absences) into account in the formula would be a good idea, but I'm not sure how to code it.
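Counting shared absences (species that neither carry the orthogroup nor are thermophilic) turns the Jaccard index into the simple matching coefficient, (n11 + n00) / n, where n is the number of species. A minimal sketch, assuming a 0/1 patterns DataFrame with species as columns; the helper `simple_matching` and the toy data are hypothetical.

```python
import pandas as pd


def simple_matching(patterns: pd.DataFrame, thermo: list) -> pd.Series:
    p = patterns.to_numpy(dtype=bool)   # orthogroup x species presence
    t = patterns.columns.isin(thermo)   # thermophile indicator per species
    # n11 (present and thermophilic) + n00 (absent and non-thermophilic)
    matches = (p == t).sum(axis=1)
    return pd.Series(matches / p.shape[1], index=patterns.index, name="smc")


# Toy patterns; 'other_1' and 'other_2' stand in for non-thermophiles.
patterns = pd.DataFrame(
    {"aethiopicum": [1, 1, 0], "coriense": [1, 0, 0],
     "other_1": [0, 0, 1], "other_2": [0, 0, 1]},
    index=["OG_a", "OG_b", "OG_d"],
)
smc = simple_matching(patterns, ["aethiopicum", "coriense"])
print(smc)
```

When one group is much larger than the other, shared absences can dominate this score, which is the usual argument for the Jaccard index ignoring them.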
Any tip would be extremely helpful; I'm really stuck at this part and have no idea what to do or how to code it. Big thanks in advance!
