Basically I want a timestring parsing function whose output behaves like the result from
x = as.POSIXct("2001-01-01 01:00:00 -0600", tz="America/Chicago")
y = as.POSIXct("2001-01-01 07:00:00Z", tz="UTC")
z = as.POSIXct("2001-01-01 13:00:00 +0600", tz="Asia/Omsk")
...but without requiring me to manually set tz.
By "behaves like" I mean times with equal UTC representations should compare equal (x == y ==> TRUE, z == y ==> TRUE), but their original timezone info should still be available as needed (attr(x, "tzone") == attr(y, "tzone") ==> FALSE, strftime(x, format="%H", tz="America/Chicago") ==> "01", strftime(x, format="%H", tz="UTC") ==> "07").
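As a quick check of that behavior, the three manually zoned values can be compared directly. This is only a sketch and assumes the system's tz database knows America/Chicago and Asia/Omsk; the names same_instant, zones, and hour_local_vs_utc are introduced purely for illustration.

```r
# Three timestamps for the same instant, each with its zone set by hand.
x <- as.POSIXct("2001-01-01 01:00:00 -0600", tz = "America/Chicago")
y <- as.POSIXct("2001-01-01 07:00:00Z", tz = "UTC")
z <- as.POSIXct("2001-01-01 13:00:00 +0600", tz = "Asia/Omsk")

# Comparison happens on the underlying instant, not on the displayed clock time.
same_instant <- c(x == y, z == y)

# Each value still remembers the zone it was created with.
zones <- c(attr(x, "tzone"), attr(y, "tzone"), attr(z, "tzone"))

# The same value can be rendered in its own zone or in UTC on demand.
hour_local_vs_utc <- c(
  strftime(x, format = "%H", tz = "America/Chicago"),
  strftime(x, format = "%H", tz = "UTC")
)

print(same_instant)
print(zones)
print(hour_local_vs_utc)
```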
As far as I can tell, all three of base::strptime, lubridate::parse_date_time, and anytime::anytime do parse %z correctly, but their options for using it are limited to "discard %z entirely" (f("12:00:00-0600", tz="UTC") ==> 12:00:00Z) or "use %z to adjust time to $TZ before returning" (f("12:00:00-0600", tz="UTC") ==> 18:00:00Z).
I want f("12:00:00-0600", tz="as_parsed") ==> 12:00:00-0600.
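The two existing behaviors can be reproduced in base R by leaving %z out of the format or including it. This is a minimal illustration using as.POSIXct with an explicit format; the variable names s, fmt, discarded, and adjusted are illustrative only.

```r
s <- "2001-01-01 12:00:00-0600"
fmt <- "%Y-%m-%d %H:%M:%S"

# Offset left out of the format: the trailing "-0600" is ignored and the
# clock time is taken at face value in the requested zone.
discarded <- as.POSIXct(s, format = fmt, tz = "UTC")

# Offset included via %z: the instant is computed from the offset and then
# displayed in the requested zone, so the original offset is no longer visible.
adjusted <- as.POSIXct(s, format = paste0(fmt, "%z"), tz = "UTC")

print(format(discarded, "%H:%M:%S %Z"))
print(format(adjusted, "%H:%M:%S %Z"))
```

In both cases the result carries tzone "UTC"; neither keeps any record that the source said -0600.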
I'm writing functions that collect weather data from many different sources in many timezones. Most of the decisions to be made with the timestamps are explicitly about local time, e.g. finding solar noon or sunset, so converting to UTC (or any other fixed time zone) makes life harder rather than easier. Discarding time zone info seems certain to create bugs eventually, since I do sometimes need to compare times between sources.
Essentially, R seems to support two major approaches to timezone usage: "treat everything as UTC", or "treat everything as my machine's local time." I wish there were a third option: "treat everything as the datasource's local time, which is already recorded in the timestamp," and this whole post is just to say I'm mad because I can't find any way to support that third way without manually re-parsing strings.
On consideration, here's a reasonably simple way of extracting the offset without completely reinventing the timezone parser:
This obviously imposes very strict input format requirements, but arguably captures most of the formats whose timezone information can be trusted in the first place.
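One way the offset-extraction idea might look is sketched below. It is not a definitive implementation: the helper name parse_keep_offset is hypothetical, it handles one string at a time (a POSIXct vector carries only a single tzone attribute), it accepts only a trailing +hhmm or -hhmm offset in whole hours, and it assumes the system's tz database provides the "Etc/GMT+N" zones.

```r
# Hypothetical helper: parse one timestamp ending in a numeric offset such as
# "-0600" and return a POSIXct whose tzone attribute reflects that offset.
parse_keep_offset <- function(s, format = "%Y-%m-%d %H:%M:%S") {
  stopifnot(is.character(s), length(s) == 1)

  # Pull the trailing sign, hours, and minutes off the end of the string.
  m <- regmatches(s, regexec("([+-])([0-9]{2})([0-9]{2})$", s))[[1]]
  if (length(m) == 0) stop("no trailing +hhmm/-hhmm offset in: ", s)
  if (m[4] != "00") stop("this sketch only handles whole-hour offsets")
  hours <- as.integer(m[3])

  # "Etc/GMT" zone names use an inverted sign: Etc/GMT+6 means UTC-06:00.
  tz <- if (hours == 0) {
    "UTC"
  } else {
    paste0("Etc/GMT", if (m[2] == "-") "+" else "-", hours)
  }

  # Let %z work out the instant, then relabel it with the fixed-offset zone.
  # The space before %z matches zero or more spaces in the input.
  out <- as.POSIXct(s, format = paste0(format, " %z"), tz = "UTC")
  attr(out, "tzone") <- tz
  out
}

x <- parse_keep_offset("2001-01-01 01:00:00 -0600")
y <- parse_keep_offset("2001-01-01 07:00:00 +0000")
z <- parse_keep_offset("2001-01-01 13:00:00 +0600")

print(c(x == y, z == y))          # compared as instants
print(format(x, "%H:%M:%S %z"))   # rendered in the offset it was parsed with
print(format(x, "%H", tz = "UTC"))
```

Because the result uses a fixed-offset zone rather than a named region, it preserves the offset recorded in the timestamp but knows nothing about daylight saving rules at the source.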