I have a dataset on which I need to run a chi-square test, with gender as the explanatory variable and Visit as the outcome variable. The gender column, Sex, has the values Male, Female, and NotAnswered. The NotAnswered count is large, but I don't want it in the test. If I use
pd.crosstab(df['Sex'], df['Visit'], margins=True)
I get three rows and two columns (not counting the totals). Is there any way to omit "NotAnswered" from the crosstab result? I want the crosstab table to have only two rows, Female and Male.
Code to generate sample data:
import pandas as pd
import numpy as np
sexes = np.concatenate([np.random.choice(["Male", "Female"], 90), ["NotAnswered"] * 10])
np.random.shuffle(sexes)
visits = np.random.randint(0, 2, 100)
# create the DataFrame
df = pd.DataFrame({
    "Sex": sexes,
    "Visit": visits
})
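For reference, here is a minimal sketch of the kind of result I'm after, assuming it is acceptable to drop the "NotAnswered" rows before building the table. The names `rng`, `answered`, and `table` are my own, and the fixed seed is only there to make the sample reproducible. I'm not sure this is the idiomatic way to do it.

```python
import numpy as np
import pandas as pd

# Reproducible version of the sample data (the seed is an assumption for illustration).
rng = np.random.default_rng(0)
sexes = np.concatenate([rng.choice(["Male", "Female"], 90), ["NotAnswered"] * 10])
rng.shuffle(sexes)
df = pd.DataFrame({"Sex": sexes, "Visit": rng.integers(0, 2, 100)})

# Keep only the rows where Sex was actually answered, then cross-tabulate.
answered = df[df["Sex"] != "NotAnswered"]
table = pd.crosstab(answered["Sex"], answered["Visit"])
print(table)
```

I left out `margins=True` here because it adds an "All" row and column of totals, which I would not want to pass to the chi-square test.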