I have created a DataFrame (df):
index compound e_above_hull space
0 CaFeO3 0.052160963499999546 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
1 BaSiO3 0.5612689803333337 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
2 BaGeO3 0.23235826850000052 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
3 CdIrO3 0.46319859100000027 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
4 BaZrO3 4.992200000053515e-05 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'I4/mcm', 'number': 140, 'point_group': '4/mmm', 'crystal_system': 'tetragonal', 'hall': '-I 4 2c'}
5 CeGaO3 0.12483252989999993 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pnma', 'number': 62, 'point_group': 'mmm', 'crystal_system': 'orthorhombic', 'hall': '-P 2ac 2n'}
Now I want to select those rows only which contains ('symbol': 'Pm-3m') (or 'hall': '-P 4 2 3' or 'number': 221, ). I tried this way but this does not make any change in Dataframe.
df[df['space'].str.contains("'symbol': 'Pm-3m'")==True]
My list contains 2000 plus items, I have shown few of them only. Final output
index compound e_above_hull space
0 CaFeO3 0.052160963499999546 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
1 BaSiO3 0.5612689803333337 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
2 BaGeO3 0.23235826850000052 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
3 CdIrO3 0.46319859100000027 {'symprec': 0.1, 'source': 'spglib', 'symbol': 'Pm-3m', 'number': 221, 'point_group': 'm-3m', 'crystal_system': 'cubic', 'hall': '-P 4 2 3'}
Further, my desired Dataframe is like:
index compound e_above_hull space
0 CaFeO3 0.052160963499999546 Pm-3m
1 BaSiO3 0.5612689803333337 Pm-3m
2 BaGeO3 0.23235826850000052 Pm-3m
3 CdIrO3 0.46319859100000027 Pm-3m
Based on what I gather from your question you are asking how to:
While the following approach may not be the most efficient, it does get the job done. I am sure there are ways to use either df.map, df.filter, or df.query to do the job more efficiently but the following is the approach I would use:
Given the dataframe:
Add the Check Column to this dataframe which evaluates to True if the criteria are met using:
Then filter and modify the dataframe as follows:
This produces the following result: