Plotting Non-Uniform Time Series Data from a Text File

This question is a follow-up to "How to read a .txt file to graph a plot".

I have a file with time series data in the following format:

00:01:28,102,103,103 20-03-2024
00:02:16,111,110,110
00:02:33,108,109,109
00:02:49,107,108,108
...24 hours read...  # not in the measurement file
23:58:54,111,112,112
23:59:11,109,110,110
23:59:47,115,116,117
00:00:04,115,116,116 21-03-2024
00:00:20,121,122,120
00:00:36,124,125,125
...24 hours read...
23:59:02,115,115,116
23:59:19,114,114,114
23:59:51,113,114,115
00:00:07,113,114,115 22-03-2024
00:00:24,116,117,115
00:00:45,115,115,116
...24 hours read...
23:59:08,101,101,100
23:59:32,103,103,102
23:59:48,102,102,102
...Next day...

Each line contains a timestamp and three numerical readings. The first line of each day also ends with that day's date (dd-mm-YYYY). I am trying to plot this data with pandas and matplotlib, but I run into two problems: the x-axis labels (hours) overlap, and the plot loads slowly.

Here's my current approach to plotting:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# C0Temp ... C5Temp and LineColorTemp1Text come from the parsed file (not shown)
plt.figure(figsize=(15,9))
plt.xlabel('Day')
plt.ylabel('Voltage')
# Plot three series from the data
plt.plot(C0Temp, C1Temp, label="Voltage", color=LineColorTemp1Text)
plt.plot(C2Temp, C3Temp, label="Max", color='r')
plt.plot(C4Temp, C5Temp, label="Min", color='g')
plt.legend()

# Attempt to format the x-axis to handle daily data
locator = mdates.AutoDateLocator(minticks=12, maxticks=24)
plt.gca().xaxis.set_major_locator(locator)
plt.xticks(rotation=45)
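The question does not show how C0Temp and the other x arrays are built. If they hold the timestamps as strings (an assumption), matplotlib treats each distinct string as a category and draws one tick label per reading, which would explain both the overlapping labels and the slow rendering; AutoDateLocator only makes sense on a real date axis. A minimal sketch with hypothetical sample values (time_strings and voltage are illustrative names) that converts the strings to datetimes before plotting:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical readings: timestamps as strings, the way they come out of a text file
time_strings = ["2024-03-20 00:01:28", "2024-03-20 00:02:16",
                "2024-03-20 00:02:33", "2024-03-20 00:02:49"]
voltage = [102, 111, 108, 107]

# Convert to real datetimes so matplotlib builds a date axis instead of a categorical one
times = pd.to_datetime(time_strings, format="%Y-%m-%d %H:%M:%S")

fig, ax = plt.subplots(figsize=(15, 9))
ax.plot(times, voltage, label="Voltage")
ax.set_xlabel("Time")
ax.set_ylabel("Voltage")

# A date locator now applies; ConciseDateFormatter keeps the tick labels short
locator = mdates.AutoDateLocator(minticks=12, maxticks=24)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.legend()
```

With a date axis, the number of tick labels is chosen by the locator rather than by the number of readings, so it stays small even for a full day of data.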

I'm looking for guidance on how to effectively plot this data day by day or even across months, ensuring the x-axis labels are readable and the plot loads efficiently.

Answer by Trenton McKinney

Because the text file's format is not uniform, the straightforward approach is to parse it line by line. That way the script can handle lines with and without a trailing date and skip non-data lines (e.g., "...24 hours read..." and "...Next day..."), so that only actual measurements end up in a structured dataset for analysis and plotting.
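As an alternative to a Python-level loop, which may become slow for files that span months, the same parsing can be sketched in vectorized form with Series.str.extract and ffill. This is not the answer's method; it assumes the readings are plain numbers and that the date is separated from the last reading by whitespace, as in the question's sample. The inline sample (raw) stands in for reading 00_test.txt, and the names pattern, parts and df_vec are illustrative.

```python
import io

import pandas as pd

# Stand-in for open('00_test.txt').read(): a few lines in the question's format
raw = io.StringIO(
    "00:01:28,102,103,103 20-03-2024\n"
    "00:02:16,111,110,110\n"
    "...24 hours read...\n"
    "23:59:47,115,116,117\n"
    "00:00:04,115,116,116 21-03-2024\n"
    "00:00:20,121,122,120\n"
)
lines = pd.Series(raw.read().splitlines())

# One regex for every data line: time, three values, optional trailing date
pattern = (
    r"^(?P<time>\d{2}:\d{2}:\d{2}),(?P<v1>[\d.]+),(?P<v2>[\d.]+),(?P<v3>[\d.]+)"
    r"(?:\s+(?P<date>\d{2}-\d{2}-\d{4}))?\s*$"
)
# Lines that do not match (comments, blanks) come back as all-NaN rows and are dropped
parts = lines.str.extract(pattern).dropna(subset=["time"])

# Carry each day's date forward to the lines that follow it,
# then drop any readings that appear before the first date
parts = parts.assign(date=parts["date"].ffill()).dropna(subset=["date"])

df_vec = pd.DataFrame(
    {
        "DateTime": pd.to_datetime(parts["date"] + " " + parts["time"],
                                   format="%d-%m-%Y %H:%M:%S"),
        "Value1": parts["v1"].astype(float),
        "Value2": parts["v2"].astype(float),
        "Value3": parts["v3"].astype(float),
    }
).set_index("DateTime")
print(df_vec)
```

The date-bearing line keeps its own reading, and every later line inherits that date through ffill until the next date appears.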

My recommendation is to standardize the measurement output format.
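For illustration, if the logger wrote the full date and time on every line (a hypothetical format such as 2024-03-20 00:01:28,102,103,103), no custom parsing would be needed and pandas could read the whole file in one call. In this sketch, an inline sample stands in for the file and df_std is an illustrative name:

```python
import io

import pandas as pd

# Hypothetical standardized format: full timestamp on every line, no comment lines
sample = io.StringIO(
    "2024-03-20 00:01:28,102,103,103\n"
    "2024-03-20 00:02:16,111,110,110\n"
    "2024-03-21 00:00:04,115,116,116\n"
)

df_std = pd.read_csv(
    sample,
    header=None,  # the file has no header row
    names=["DateTime", "Value1", "Value2", "Value3"],
)
df_std["DateTime"] = pd.to_datetime(df_std["DateTime"], format="%Y-%m-%d %H:%M:%S")
df_std = df_std.set_index("DateTime")
print(df_std)
```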

Parse File

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Initialize variables
timestamps = []
values1 = []
values2 = []
values3 = []
current_date = None

# Parse each line, handling lines with and without a trailing date
# 00_test.txt is the data from the OP in a text file
with open('00_test.txt', "r") as file:
    for line in file:
        parts = line.strip().split(',')

        # Skip blank lines and non-data lines such as "...24 hours read..." or "...Next day..."
        if len(parts) != 4:
            continue

        time, val1, val2, val3 = parts

        # The first line of each day ends with "<value> dd-mm-YYYY"; split off the date
        if ' ' in val3:
            val3, date = val3.split(maxsplit=1)
            current_date = pd.to_datetime(date, format="%d-%m-%Y")

        # Record the reading (including the first one of each day) once a date is known
        if current_date is not None:
            datetime_str = f"{current_date.date()} {time}"
            timestamps.append(pd.to_datetime(datetime_str, format="%Y-%m-%d %H:%M:%S"))
            values1.append(float(val1))
            values2.append(float(val2))
            values3.append(float(val3))

Create DataFrame

# Build the DataFrame once, after the loop (much cheaper than growing it row by row)
df = pd.DataFrame({'DateTime': timestamps, 'Value1': values1, 'Value2': values2, 'Value3': values3})
df.set_index('DateTime', inplace=True)

Plot

The plot draws every reading as a marker and labels both axes. Major ticks on the x-axis mark each day, formatted as '%Y-%m-%d', and minor ticks mark 04:00, 08:00, 12:00, 16:00 and 20:00 (HourLocator(byhour=range(4, 21, 4))), so date and time labels never share a position. Major tick labels are rotated 90 degrees and minor tick labels stay horizontal; both are centered on their ticks. Solid grid lines mark day boundaries and dotted grid lines mark the hours. tight_layout() leaves room for the rotated labels.

Many existing questions already cover plotting pandas DataFrames and formatting a datetime x-axis. I encourage you to explore them and adjust the plot to your requirements. For further plotting questions or specific adjustments, please post a new question that references those discussions.

# Plot the DataFrame directly; x_compat=True makes pandas use matplotlib's date units,
# so the mdates locators and formatters below apply as expected
ax = df.plot(marker='.', figsize=(15, 9), xlabel='Time', ylabel='Voltage', x_compat=True)

# Major ticks: one per day, labeled in 'Y-m-d' format
ax.xaxis.set_major_locator(mdates.DayLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

# Minor ticks: every 4 hours from 04:00 to 20:00 (adjust the interval as needed)
ax.xaxis.set_minor_locator(mdates.HourLocator(byhour=range(4, 21, 4)))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%H:%M'))

# Enhance the display for readability
plt.setp(ax.xaxis.get_majorticklabels(), rotation=90, ha="center")  # Rotate date labels vertically
plt.setp(ax.xaxis.get_minorticklabels(), rotation=0, ha="center")  # Keep time labels horizontal and centered

ax.xaxis.grid(True, which='major', linestyle='-', linewidth=0.5, color='black')  # Major grid lines (days)
ax.xaxis.grid(True, which='minor', linestyle=':', linewidth=0.5, color='gray')  # Minor grid lines (hours)

plt.tight_layout()  # Adjust layout to make room for tick labels
plt.show()

df

                     Value1  Value2  Value3
DateTime                                   
2024-03-20 00:01:28   102.0   103.0   103.0
2024-03-20 00:02:16   111.0   110.0   110.0
2024-03-20 00:02:33   108.0   109.0   109.0
2024-03-20 00:02:49   107.0   108.0   108.0
2024-03-20 23:58:54   111.0   112.0   112.0
2024-03-20 23:59:11   109.0   110.0   110.0
2024-03-20 23:59:47   115.0   116.0   117.0
2024-03-21 00:00:04   115.0   116.0   116.0
2024-03-21 00:00:20   121.0   122.0   120.0
2024-03-21 00:00:36   124.0   125.0   125.0
2024-03-21 23:59:02   115.0   115.0   116.0
2024-03-21 23:59:19   114.0   114.0   114.0
2024-03-21 23:59:51   113.0   114.0   115.0
2024-03-22 00:00:07   113.0   114.0   115.0
2024-03-22 00:00:24   116.0   117.0   115.0
2024-03-22 00:00:45   115.0   115.0   116.0
2024-03-22 23:59:08   101.0   101.0   100.0
2024-03-22 23:59:32   103.0   103.0   102.0
2024-03-22 23:59:48   102.0   102.0   102.0
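For ranges of weeks or months, plotting every reading draws many thousands of markers, which is one likely reason such plots render slowly. One option, not part of the answer above, is to downsample with DataFrame.resample before plotting, for example to an hourly mean with a min/max band. The sketch below uses synthetic data: df_demo and its random readings are purely illustrative stand-ins for the parsed df.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for df: about four weeks of readings at irregular 10-39 s spacing
rng = np.random.default_rng(0)
offsets = np.cumsum(rng.integers(10, 40, size=100_000))
idx = pd.Timestamp("2024-03-20") + pd.to_timedelta(offsets, unit="s")
df_demo = pd.DataFrame({"Value1": 110 + rng.normal(0, 3, len(idx))}, index=idx)

# Downsample to one row per hour: the mean plus the hour's min and max
hourly = df_demo["Value1"].resample("1h").agg(["mean", "min", "max"])

fig, ax = plt.subplots(figsize=(15, 9))
ax.plot(hourly.index, hourly["mean"], label="Hourly mean")
ax.fill_between(hourly.index, hourly["min"], hourly["max"], alpha=0.3, label="Hourly min-max")
ax.set_xlabel("Time")
ax.set_ylabel("Voltage")
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

The plot then has a few hundred points instead of 100,000, while the shaded band still shows the range of the raw readings. The same idea applies to the answer's df, e.g. df.resample('1h').mean(), with a coarser rule such as '1D' for multi-month views.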