Iterating over a pandas DataFrame with iterrows() yields a sequence of (index, Series) tuples:
for timestamp, row in df.iterrows():
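To make the shape of those tuples concrete, here is a minimal sketch using a made-up two-row frame (the values and the `pairs` name are only illustrative, not my real data). Each `row` comes back as a Series whose index is the column MultiIndex, so its labels are tuples.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with 2-level column labels.
columns = pd.MultiIndex.from_tuples([("AACT", "close"), ("ABILF", "close")])
index = pd.to_datetime(["2022-01-04 00:00:00", "2022-01-04 00:01:00"])
df = pd.DataFrame([[1.0, np.nan], [np.nan, 2.0]], index=index, columns=columns)

# iterrows() yields (index label, row Series) pairs.
pairs = list(df.iterrows())
timestamp, row = pairs[0]

# timestamp is the row's index label; row is a Series
# indexed by the (ticker, field) column tuples.
print(type(timestamp).__name__, type(row).__name__)
print(list(row.index))
```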
I am aware that iterrows() is slow, but I am ignoring that issue for now.
Some of the returned rows will contain None or NaN values. I want to remove these. (Not from the DataFrame but from a copy of each row returned by iterrows().)
I also want to remove a subset of "columns". Columns are named with a 2-level MultiIndex.
Here's an idea of what the DataFrame looks like:
                     AACT                ABILF                 ...
                     open high low close  open high low close  ...
timestamp                                                      ...
2022-01-04 00:00:00   NaN  NaN NaN   NaN   NaN  NaN NaN   NaN  ...
2022-01-04 00:01:00   NaN  NaN NaN   NaN   NaN  NaN NaN   NaN  ...
2022-01-04 00:02:00   NaN  NaN NaN   NaN   NaN  NaN NaN   NaN  ...
2022-01-04 00:03:00   NaN  NaN NaN   NaN   NaN  NaN NaN   NaN  ...
2022-01-04 00:04:00   NaN  NaN NaN   NaN   NaN  NaN NaN   NaN  ...
All the values happen to be NaN here; in general that will not be the case.
Because I do not know how to approach this problem, here is some pseudocode:
    for timestamp, row in df.iterrows():
        # drop() and dropna() return a new Series, so the result is reassigned
        row = row.drop([('AACT', 'open'), ('AACT', 'high'), ('AACT', 'low')])
        row = row.drop([('ABILF', 'open'), ('ABILF', 'high'), ('ABILF', 'low')])
        row = row.dropna()
        # remaining data is `('AACT', 'close')` and `('ABILF', 'close')`
        # iff values in this `Series` are non-NaN
There is a `level=...` argument in the `pandas.DataFrame.drop()` method; see the documentation. See the following reproducible example:
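A minimal sketch of the `level=` idea, assuming a small made-up frame shaped like the one in the question. The sample values and the names `closes` and `rows` are illustrative and not taken from the original post. The unwanted second-level labels are dropped once for every ticker, and `dropna()` is then applied to each row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the layout in the question.
columns = pd.MultiIndex.from_product(
    [["AACT", "ABILF"], ["open", "high", "low", "close"]]
)
index = pd.date_range("2022-01-04", periods=3, freq="min", name="timestamp")
df = pd.DataFrame(np.nan, index=index, columns=columns)
df.loc[index[1], ("AACT", "close")] = 10.5
df.loc[index[2], ("ABILF", "close")] = 2.25

# Drop the unwanted labels on the second column level, for all tickers at once.
closes = df.drop(columns=["open", "high", "low"], level=1)

# dropna() returns a new Series per row; the DataFrame itself is not modified.
rows = {}
for timestamp, row in closes.iterrows():
    rows[timestamp] = row.dropna()

for timestamp, row in rows.items():
    print(timestamp, row.to_dict())
```

Dropping by level before the loop avoids listing every `(ticker, field)` tuple by hand, which matters when there are many tickers.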