Subsetting a Boolean variable in Python


When a DataFrame ("Given_DF") has a Boolean column (such as B below), how can one subset the DataFrame to keep only the rows where B is True?

Given_DF

ID    A     B
0    123   True
1    456   False
2    789   False
3    132   True
4    465   False

The 'Desired' subset is the DataFrame with only two rows (with ID 0 and 3).

  1. Tried subsetting B as a column,

    Desired = Given_DF["B"].isin(True)  
    
  2. Tried indexing on variable B and using loc to subset to the "True" occurrences of B.

    prep.sort_index(level=["B"])
    Desired = prep.loc["True"]
    

Neither attempt worked. Help would be appreciated.


There are 2 best solutions below

Barmar On

The same way you subset with any other type. Put an expression that matches your condition inside the subscript of the df.

Desired = Given_DF[Given_DF["B"] == True]

or more simply

Desired = Given_DF[Given_DF["B"]]

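.isin() is used when you have a collection of values you want to match, but True is not a collection. You'd have to write .isin([True]), and even then the result is only a Boolean mask that still has to be used to index the DataFrame, e.g. Given_DF[Given_DF["B"].isin([True])].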
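As a minimal, self-contained sketch of the points above, assuming pandas is installed and that Given_DF is rebuilt from the table in the question with ID as an ordinary column (the names desired and via_isin are only illustrative):

```python
import pandas as pd

# Rebuild the example frame from the question (assumption: ID is a regular column).
Given_DF = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3, 4],
        "A": [123, 456, 789, 132, 465],
        "B": [True, False, False, True, False],
    }
)

# A Boolean column is already a valid row mask, so it can index the frame directly.
desired = Given_DF[Given_DF["B"]]

# Passing a bare scalar to .isin() raises TypeError, because it expects a list-like.
try:
    Given_DF["B"].isin(True)
except TypeError as exc:
    print("isin(True) failed:", exc)

# With a list, .isin() returns a Boolean mask, which then selects the rows.
via_isin = Given_DF[Given_DF["B"].isin([True])]

print(desired)
```

Both desired and via_isin should hold the rows with ID 0 and 3. The plain mask is usually the simpler choice when the column is already Boolean.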

Ingwersen_erik On

There are multiple ways to achieve your desired output. Here are a few of them:

# Option 1
filtered_df = Given_DF[Given_DF["B"] == True]

# Option 2: Using `.loc`
filtered_df = Given_DF.loc[Given_DF["B"] == True, :]

# Option 3: Using pd.DataFrame.query
filtered_df = Given_DF.query("B == True")

print(filtered_df)
# Prints:
#
#    ID    A     B
# 0   0  123  True
# 3   3  132  True

If you want to select specific columns after filtering for rows where column "B" is True, you can use the following:

# Filter on column B using option 1 (shown above), then select column "A"
Given_DF[Given_DF["B"] == True]["A"]

# Filter on column B using option 2 (shown above), then select columns "ID" and "A"
Given_DF.loc[Given_DF["B"] == True, ["ID", "A"]]

# Filter on column B using option 3 (shown above), then take the index values
remaining_indexes = Given_DF.query("B == True").index

# You can then use these index values to filter `Given_DF`, or apply them to many other
# use cases:
Given_DF.loc[Given_DF.index.isin(remaining_indexes), :]
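To check that the options above agree with each other, here is a small sketch, again assuming pandas is installed and that Given_DF is rebuilt from the question's table. The names opt1, opt2, opt3, round_trip and direct are illustrative, and the direct .loc lookup is a shorter alternative that assumes the index labels are unique:

```python
import pandas as pd

# Rebuild the example frame from the question (assumption: ID is a regular column).
Given_DF = pd.DataFrame(
    {
        "ID": [0, 1, 2, 3, 4],
        "A": [123, 456, 789, 132, 465],
        "B": [True, False, False, True, False],
    }
)

# The three filtering options from the answer.
opt1 = Given_DF[Given_DF["B"] == True]
opt2 = Given_DF.loc[Given_DF["B"] == True, :]
opt3 = Given_DF.query("B == True")

# Index labels of the rows that survived the filter.
remaining_indexes = opt3.index

# Filtering by index membership, as in the answer.
round_trip = Given_DF.loc[Given_DF.index.isin(remaining_indexes), :]

# Passing the labels straight to .loc selects the same rows here.
direct = Given_DF.loc[remaining_indexes]

print(opt1.equals(opt2), opt1.equals(opt3), round_trip.equals(direct))
```

The index-based form is mainly useful when the same set of rows has to be pulled from another object that shares the index.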