Plot multiple dataframes in one plot with facet_wrap


I have a dataset df that looks like this:

ID      Week    VarA    VarB    VarC    VarD
s001    w1      2       5       4       7
s001    w2      4       5       2       3
s001    w3      7       2       0       1
s002    w1      4       0       9       8
s002    w2      1       5       2       5
s002    w3      7       3       6       0
s001    w1      6       5       7       9
s003    w2      2       0       1       0
s003    w3      6       9       3       4

For each ID, I am trying to plot its progress by Week for each variable (VarB, VarC, VarD), with VarA as the reference data.

I reshape the data with df.melt() into the long format below, and the code that follows works.

ID     Week  Var  Value
s001    w1  VarA    2
s001    w2  VarA    4
s001    w3  VarA    7
s002    w1  VarA    4
s002    w2  VarA    1
s002    w3  VarA    7
s001    w1  VarA    6
s003    w2  VarA    2
s003    w3  VarA    6
s001    w1  VarB    5
s001    w2  VarB    5
...
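
For reference, the long layout above can be produced with a melt call along these lines. This is only a sketch: the question does not show the original call, so the var_name and value_name arguments are assumptions chosen to match the column headers above, and the definition of idlist is likewise a guess at how it was built.

```python
import pandas as pd

# Wide data copied from the question
df = pd.DataFrame({
    "ID":   ["s001", "s001", "s001", "s002", "s002", "s002", "s001", "s003", "s003"],
    "Week": ["w1", "w2", "w3", "w1", "w2", "w3", "w1", "w2", "w3"],
    "VarA": [2, 4, 7, 4, 1, 7, 6, 2, 6],
    "VarB": [5, 5, 2, 0, 5, 3, 5, 0, 9],
    "VarC": [4, 2, 0, 9, 2, 6, 7, 1, 3],
    "VarD": [7, 3, 1, 8, 5, 0, 9, 0, 4],
})

# Keep ID and Week as identifiers; stack VarA..VarD into Var/Value pairs
df_melt = df.melt(id_vars=["ID", "Week"], var_name="Var", value_name="Value")

# Assumed definition of idlist: the distinct IDs, in order of appearance
idlist = df_melt["ID"].unique()

print(df_melt.head())
```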

Code:

from plotnine import (ggplot, aes, geom_point, ggtitle, labs,
                      theme, element_text)

# df_melt is the melted dataframe shown above; idlist holds the IDs to plot
for id in idlist:

    # get VarA into new df
    newdf = df_melt[df_melt.Var == 'VarA']

    # remove rows with VarA so they won't be included in facet_wrap()
    tmp = df_melt[df_melt.Var != 'VarA']

    plot2 = ggplot() + ggtitle(id) + labs(x='Week', y='Value') \
        + geom_point(newdf[newdf['ID'] == id], aes(x='Week', y='Value')) \
        + geom_point(tmp[tmp['ID'] == id], aes(x='Week', y='Value', color='Var')) \
        + theme(axis_text_x=element_text(rotation=45))

    print(plot2)

However, when I add facet_wrap('Var', ncol=3, scales='free'), I get the error below:

IndexError: arrays used as indices must be of integer (or boolean) type

I also couldn't connect the points with lines using geom_line().

This is my expected output: [image not available]

Is this because different dataframes are used? Is there a way to combine multiple geom_point() layers, each with its own dataframe, with facet_wrap in one ggplot object?

There are 2 answers below.

Answer by has2k1 (best answer)

The issue in the question is caused by a bug that can be reproduced with the following code. The bug has been fixed, and the fix will be included in the next version of plotnine.

import pandas as pd
from plotnine import *

df1 = pd.DataFrame({
    'x': list("abc"),
    'y': [1, 2, 3],
    'g': list("AAA")
})

df2 = pd.DataFrame({
    'x': list("abc"),
    'y': [4, 5, 6],
    'g': list("AAB")
})

(ggplot(aes("x", "y"))
 + geom_point(df1)
 + geom_point(df2)
 + facet_wrap("g", scales="free_x")
)

Answer by kaixas K

In addition to the bug fix mentioned by @has2k1, I found a way to add the VarA reference points: rename the Var column in the reference dataframe so that the two dataframes no longer share that column name. facet_wrap then works on only one of the dataframes.

for pt in idlist:
    # get VarA into new df and rename 'Var' so facet_wrap('Var') ignores this layer
    # (rename returns a new dataframe, avoiding inplace changes on a slice)
    newdf = df_melt[df_melt.Var == 'VarA'].rename(columns={'Var': 'RefVar'})

    # remove rows with VarA so they won't be included in facet_wrap()
    tmp = df_melt[df_melt.Var != 'VarA']

    plot2 = ggplot() \
        + geom_point(tmp[tmp['ID'] == pt], aes(x='Week', y='Value', color='Var')) \
        + facet_wrap('Var', ncol=1, scales='free') \
        + geom_point(newdf[newdf['ID'] == pt], aes(x='Week', y='Value')) \
        + labs(x='Week', y='Value') + ggtitle(pt) \
        + theme(axis_text_x=element_text(rotation=45), subplots_adjust={'hspace': 0.6})

    print(plot2)