I'd like to parse a string to an object representing a timestamp, but I'm having some trouble adding the right timezone to it.
Specifically, when parsing, I can't find a way to make a distinction between strings that have a timezone-offset added, and those that don't.
Use case
I'm reading in several xml-files. In those files, several timestamps are present, which can be either UTC, in my local timezone, or some third timezone.
Or they may not have any timezone-information as part of the string, in which case we should fall back to a default-timezone, specified elsewhere in the xml. That default can, again, be either UTC, my local timezone, or a third zone.
Ultimately, I want to translate all of these timestamps to UTC.
So, I may have the following data (on a PC with system-timezone Europe/Amsterdam, currently UTC+2):
File_one.xml
<data>
<timestamp eventName="First">2023-5-1T12:01:00Z</timestamp>
<timestamp eventName="Second">2023-5-1T12:02:00+02:00</timestamp>
<timestamp eventName="Third">2023-5-1T12:03:00+04:00</timestamp>
<timestamp eventName="Fourth">2023-5-1T12:04:00</timestamp>
</data>
<configuration>
<timezone>Europe/Amsterdam</timezone>
</configuration>
And file_two.xml
<data>
<timestamp eventName="Fifth">2023-5-1T12:05:00Z</timestamp>
<timestamp eventName="Sixth">2023-5-1T12:06:00+02:00</timestamp>
<timestamp eventName="Seventh">2023-5-1T12:07:00+04:00</timestamp>
<timestamp eventName="Eighth">2023-5-1T12:08:00</timestamp>
</data>
<configuration>
<timezone>America/New_York</timezone>
</configuration>
Which, after all parsing, should result in the following timestamps in UTC:
First 2023-05-01T12:01:00
Second 2023-05-01T10:02:00
Third 2023-05-01T08:03:00
Fourth 2023-05-01T10:04:00
Fifth 2023-05-01T12:05:00
Sixth 2023-05-01T10:06:00
Seventh 2023-05-01T08:07:00
Eighth 2023-05-01T16:08:00
My main problem, what this question is about, is distinguishing Eighth from Sixth.
Approaches that don't work.
I've tried using both DateTime and DateTimeOffset, with their Parse/TryParse methods, and both seem to assume my local timezone when parsing values without a timezone. Adding a timezone afterwards to the values that came out in my local timezone isn't going to work either, because that would mess up the values that actually are in my local timezone, i.e. Second and Sixth.
Another approach I tried was using TryParseExact to first parse those values with or without timezone-information, but unfortunately my actual timestamps aren't as tidy as the example here, and I'm not sure about all the exact formats that I can expect. What I would like is just distinguishing between an offset and no offset at all.
(I may have mixed up the words timezone and time-offset. In this context, the difference doesn't really matter)
I ran into a similar situation, so I'll describe what I did in that case in the hope that it points you in the right direction. If it's completely off, let me know in a comment and I'll remove this.
I'm assuming you have no control over how the files are generated, and you can't just fix the documents to use a single format including timezone.
I have a couple of tabular data files in various formats (Excel/CSV/TSV/JSON and more), each with its own date-time formats. I wrote the following helper method, which checks the input against a list of formats, ordered from most to least specific.
Speed was not important for me here. I imagine you can adopt the same approach, passing different formats, timezones and cultures so that your cases are covered. (It's not pretty, I know.)
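To make the idea concrete for the question's case, here is a minimal sketch (not my original helper). The class name, method name and format list are made up for illustration, and you would need to extend the formats to match your real data. It relies on the "K" format specifier combined with DateTimeStyles.RoundtripKind: as far as I know, DateTime then comes back with Kind Utc for a trailing "Z", Local for an explicit offset, and Unspecified when the string carries no timezone information at all, which is exactly the distinction you are after.

```csharp
using System;
using System.Globalization;

public static class TimestampParser
{
    // Hypothetical format list, ordered from most to least specific.
    // "K" matches "Z", an offset such as "+02:00", or nothing at all.
    private static readonly string[] Formats =
    {
        "yyyy-M-d'T'H:mm:ssK",
        "yyyy-M-d H:mm:ssK",
        "yyyy-M-d'T'H:mmK",
    };

    // Parses a timestamp and returns it in UTC. Strings without any
    // timezone information are interpreted in defaultZone.
    public static DateTime ParseToUtc(string text, TimeZoneInfo defaultZone)
    {
        if (!DateTime.TryParseExact(text, Formats, CultureInfo.InvariantCulture,
                DateTimeStyles.RoundtripKind, out DateTime parsed))
        {
            throw new FormatException($"Unrecognised timestamp: '{text}'");
        }

        // Kind is Unspecified only when the string had neither "Z" nor an offset.
        return parsed.Kind == DateTimeKind.Unspecified
            ? TimeZoneInfo.ConvertTimeToUtc(parsed, defaultZone)
            : parsed.ToUniversalTime();
    }
}
```

You would call it with the zone from the file's configuration element, e.g. TimestampParser.ParseToUtc("2023-5-1T12:08:00", TimeZoneInfo.FindSystemTimeZoneById("America/New_York")). Whether an ID like that resolves depends on your platform and .NET version, so treat that lookup as an assumption to verify.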
Another option I'd consider exploring is the NodaTime library, which was created for dealing with this type of situation.
See, for example, this doc page:
https://nodatime.org/2.2.x/userguide/type-choices
NodaTime also includes a time zone database that works with IDs like 'Europe/Amsterdam' (which .NET's TimeZoneInfo does not use on Windows, at least in older versions). See for example this snippet that converts a PHP datetime type to a .NET DateTime: