Select rows based on data variable in column and remove others

Question

Select rows based on data variable in column and remove others

83 Views Asked by Neelesh Gupta At 03 July 2022 at 08:19

I have created a DataFrame (df):

    index   compound    e_above_hull         space
    0   CaFeO3  0.052160963499999546    {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
    1   BaSiO3  0.5612689803333337  {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
    2   BaGeO3  0.23235826850000052 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
    3   CdIrO3  0.46319859100000027 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
    4   BaZrO3  4.992200000053515e-05   {'symprec': 0.1, 'source': 'spglib', 'symbol': 'I4/mcm', 'number': 140, 'point_group': '4/mmm', 'crystal_system': 'tetragonal', 'hall': '-I 4 2c'}
    5   CeGaO3  0.12483252989999993 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pnma', 'number': 62, 'point_group': 'mmm', 'crystal_system': 'orthorhombic', 'hall': '-P 2ac 2n'}

Now I want to select those rows only which contains ('symbol': 'Pm-3m') (or 'hall': '-P 4 2 3' or 'number': 221, ). I tried this way but this does not make any change in Dataframe.

df[df['space'].str.contains("'symbol': 'Pm-3m'")==True]

My list contains 2000 plus items, I have shown few of them only. Final output

index   compound    e_above_hull         space
0   CaFeO3  0.052160963499999546    {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
1   BaSiO3  0.5612689803333337  {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
2   BaGeO3  0.23235826850000052 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
3   CdIrO3  0.46319859100000027 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}

Further, my desired Dataframe is like:

index   compound    e_above_hull            space
        0   CaFeO3  0.052160963499999546     Pm-3m
        1   BaSiO3  0.5612689803333337       Pm-3m
        2   BaGeO3  0.23235826850000052      Pm-3m
        3   CdIrO3  0.46319859100000027      Pm-3m

Original Q&A

There are 1 best solutions below

**itprorh66** · Answer 1 · 2022-07-04T13:25:50.300000

Based on what I gather from your question you are asking how to:

filter the contents of your dataframe such that it only contains entries satisfying the following criteria: 'symbol': 'Pm-3m') or ('hall': '-P 4 2 3' or ('number': 221).
replace the contents of the column 'space' with 'Pm-3m' for the filtered rows.

While the following approach may not be the most efficient, it does get the job done. I am sure there are ways to use either df.map, df.filter, or df.query to do the job more efficiently but the following is the approach I would use:

Given the dataframe:

    compound e_above_hull   space
Index           
0   CaFeO3  0.052161    {'symprec': 0.1, 'source': 'spglib', 'symbol':...
1   BaSiO3  0.561269    {'symprec': 0.1, 'source': 'spglib', 'symbol':...
2   BaGeO3  0.232358    {'symprec': 0.1, 'source': 'spglib', 'symbol':...
3   CdIrO3  0.463199    {'symprec': 0.1, 'source': 'spglib', 'symbol':...
4   BaZrO3  0.000050    {'symprec': 0.1, 'source': 'spglib', 'symbol':...
5   CeGaO3  0.124833    {'symprec': 0.1, 'source': 'spglib', 'symbol':...

Add the Check Column to this dataframe which evaluates to True if the criteria are met using:

df['Check'] = [x['symbol'] == 'Pm-3m' or x['hall'] == '-P 4 2 3' or x['number'] == 221 for x in df['space'].to_list()]

Then filter and modify the dataframe as follows:

dfa = df[ df['Check'] == True][['compound', 'e_above_hull', 'space']]
dfa['space'] = ['Pm-3m' for x in dfa['space'].to_list()]

This produces the following result:

    compound    e_above_hull    space
Index           
0   CaFeO3  0.052161    Pm-3m
1   BaSiO3  0.561269    Pm-3m
2   BaGeO3  0.232358    Pm-3m
3   CdIrO3  0.463199    Pm-3m

Select rows based on data variable in column and remove others

There are 1 best solutions below

Related Questions in PYTHON-3.X

Related Questions in DATAFRAME

Related Questions in PYMATGEN

Trending Questions

Popular # Hahtags

Popular Questions