Using Seaborn to plot pH over time with approximately 5000 points

Question

Using Seaborn to plot pH over time with approximately 5000 points

58 Views Asked by dr_jack3l At 09 January 2024 at 19:20

I am trying to learn more about Pandas/Matplotlib/Seaborn, by working with datasets at work. I would like to get to automating reports with Seaborn graphs, granted, that is probably overkill for basic graphs...but I like the way they look. I am struggling to get a simple graph to work. We pull data weekly, and its 4500-5000 datapoints. I want to work with pH and datetime to better understand working with large datasets.

Here is a snapshot of how I have initially set the data up. The csv file had a bunch of crap in top rows, so I set custom names for the columns I want to work with and removed the first 7 so that it starts with my data:

import pandas as pd
import seaborn as sns
import os
from datetime import datetime

df = pd.read_csv(r"Datasets\Effluent data export 12.28.2023 through 01.03.2024.csv")
df = df.drop(df.index[0:8]).rename(columns={df.columns[0]: 'Date', df.columns[1]: 'Level (in)', df.columns[2]: 'Flow Rate (gpm)', df.columns[3]: 'Total Flow (Gal)', df.columns[12]: 'pH', df.columns[13]: 'Temperature (C)'})
df = df.dropna(subset=['pH'])

I added the dropna for pH because using df.info() I noticed that there was one row of NA data and I don't know how to use that so I removed it. I am unsure how to work with NA as far as replacing it with something to maintain true data integrity, but I can figure that out later.

My graph is coming out with black lines all over the X and Y axis, which is probably due to it trying to show ALL of the ticks at once. I tried to solve this by changing the date from an object to datetime using:

df['Date'] = pd.to_datetime(df['Date'])

That got my date column to go from an object to a datetime64 type, and I have equal non-null counts for both of these columns now.

My issue now is that I don't know how to make a graph with this much data. I know that I need to go back and better understand Numpy, Matplotlib, and Pandas...but I am hoping someone can help me understand what it is that I actually need to learn/focus on for how to work with the graphs to show this data. Currently, I have tried these two things, and the result is ultimately the same in both cases with a line that doesn't look right, and no real graph:

sns.lineplot(data=df, x="Date", y="pH")
sns.scatterplot(data=df, x="Date", y="pH")

Here is an example of the scatterplot graph, which due to the pH being fairly tight, should look more like a parallel line and not one with a negative slope as shown:

Seaborn Scatterplot Fail

My ultimate goal is to create graphs for weeks, months, and years, for our facility data. But I don't know how to work with sets this large, or what graphs would be good. I want to look at water level, flow rate, pH, and temperature (see my .rename() for the actual names but I need to figure out how to make a meaningful graph of datetime vs pH for now so I can see how it works.

Original Q&A

There are 1 best solutions below

**Daweo** · Answer 1 · 2024-01-09T19:41:00.393000

create graphs for weeks, months, and years

I suggest trying grouping by period, consider following simple example

import pandas as pd
df = pd.DataFrame({"when":[f"2024-1-{i}" for i in range(1,31)],"value":range(100,130)})
df["when"] = pd.to_datetime(df["when"])
df_weekly_median = df.groupby(df["when"].dt.to_period("W"))["value"].median()
print(df_weekly_median)

output

when
2024-01-01/2024-01-07    103.0
2024-01-08/2024-01-14    110.0
2024-01-15/2024-01-21    117.0
2024-01-22/2024-01-28    124.0
2024-01-29/2024-02-04    128.5
Freq: W-SUN, Name: value, dtype: float64

Explanation: I create dataframe with dates from 2024-1-1 to 2024-1-30 with values from 100 to 129 (range is inclusive-exclusive) then I convert it to datetime and I use groupby to aggregate data, in this case into weeks (W, see Period aliases for allowed values) and taking median of it.

Using Seaborn to plot pH over time with approximately 5000 points

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in SEABORN

Related Questions in GRAPHING

Trending Questions

Popular # Hahtags

Popular Questions