I have the following dataframe:
data = [
[ID, '2022-04-23T03:36:26Z', 60, 10, 83],
[ID, '2022-04-23T03:37:30Z', Nan, Nan, Nan],
[ID, '2022-04-23T03:37:48Z', Nan, Nan, Nan],
[ID, '2022-04-23T03:38:24Z', 61, 11, 72],
[ID, '2022-04-23T03:44:20Z', 63, 13, 75],
[ID, '2022-04-23T03:45:02Z', Nan, Nan, Nan],
[ID, '2022-04-23T03:45:06Z', Nan, Nan, Nan],
[ID, '2022-04-23T03:45:08Z', Nan, Nan, Nan],
[ID, '2022-04-23T03:45:12Z', Nan, Nan, Nan],
[ID, '2022-04-23T03:45:48Z', 69, 15, 61]
]
df = pd.DataFrame(data=data,
columns=['ID', 'time', 'latitude', 'longitude', 'speed')
The problem is that for some rows I have only the time value e.g. row 2 and 3. For these rows, I want to calculate the average speed based on time, latitude and longitude of the row preceding (row 1) and following (row 4) the Nan speed rows.
For example, the speed value in row 2 and 3 should be an average speed value which is based on the travelled distance (maybe using Haversine formula) divided by the total amount of time ('2022-04-23T03:38:24Z' - '2022-04-23T03:36:26Z').
How can I write this in Python?
pandas.DataFrame.interpolatemay be what you're looking for if you're looking for a naive approach (there's other options if you're looking for something more specific just see the docs):Result: