@infotroph
Last active April 27, 2017 19:31
The R datetime behavior I want (that probably doesn't exist)

What I want

Basically I want a timestring parsing function whose output behaves like the result from

x = as.POSIXct("2001-01-01 01:00:00 -0600", tz="America/Chicago")
y = as.POSIXct("2001-01-01 07:00:00Z", tz="UTC")
z = as.POSIXct("2001-01-01 13:00:00 +0600", tz="Asia/Omsk")

...but without requiring me to manually set tz. By "behaves like" I mean times with equal UTC representations should compare equal (x == y ==> TRUE, z == y ==> TRUE) but their original timezone info should still be available as needed (attr(x, "tzone") == attr(y, "tzone") ==> FALSE, strftime(x, format="%H", tz="America/Chicago") ==> "01", strftime(x, format="%H", tz="UTC") ==> "07").

As far as I can tell, all three of base::strptime, lubridate::parse_date_time, and anytime::anytime do parse %z correctly, but their options for using it are limited to "discard %z entirely" (f("12:00:00-0600", tz="UTC") ==> 12:00:00Z) or "use %z to adjust time to $TZ before returning" (f("12:00:00-0600", tz="UTC") ==> 18:00:00Z). I want f("12:00:00-0600", tz="as_parsed") ==> 12:00:00-0600, where "as_parsed" is a made-up value standing in for "keep whatever offset the string carried."
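To make the two existing behaviors concrete, here is a minimal base R sketch using strptime only. The variable names are just for illustration, and a date is prepended to the example time so the result does not depend on the current day.

```r
# One timestamp carrying an explicit -0600 offset.
stamp <- "2001-01-01 12:00:00-0600"

# Behavior 1: the format has no %z, so the trailing offset is ignored
# and the clock time is taken to be UTC.
discarded <- as.POSIXct(strptime(stamp, "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# Behavior 2: the format includes %z, so the offset is applied and the
# result is shifted into the requested tz (UTC here).
adjusted <- as.POSIXct(strptime(stamp, "%Y-%m-%d %H:%M:%S%z", tz = "UTC"))

format(discarded, "%H:%M:%S", tz = "UTC")  # "12:00:00"
format(adjusted, "%H:%M:%S", tz = "UTC")   # "18:00:00"
```

Neither result remembers that the source wrote 12:00 at -0600: the first loses the offset, the second loses the local clock time.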

Why I want it

I'm writing functions that collect weather data from many different sources in many timezones. Most of the decisions to be made with the timestamps are explicitly about local time, e.g. finding solar noon or sunset, so converting to UTC (or any other fixed time zone) makes life harder rather than easier, and discarding time zone info seems certain to create bugs eventually -- I do sometimes need to compare times between sources.

Essentially, R seems to support two major approaches to timezone usage: "treat everything as UTC", or "treat everything as my machine's local time." I wish there were a third option: "treat everything as the datasource's local time, which is already recorded in the timestamp," and this whole post is just to say I'm mad because I can't find any way to support that third way without manually re-parsing strings.

@infotroph
Author

On consideration, here's a reasonably simple way of extracting the offset without completely reinventing the timezone parser:

times = c("2001-01-01 01:00:00 -0600", "2001-01-01 01:00:00Z", "2001-01-01 01:00:00 +0600")
# offset = (clock time with the offset stripped, read as UTC) - (true instant, offset applied)
difftime(
    lubridate::parse_date_time(sub(" ?([-+]\\d{4}|Z)$", "", times), "ymdHMS"),  # offset stripped
    lubridate::parse_date_time(times, "ymdHMSz"),                               # offset applied
    units="hours")
# Time differences in hours
# [1] -6  0  6

This obviously imposes very strict input format requirements, but arguably captures most of the formats whose timezone information can be trusted in the first place.
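The same trick can be wrapped into a small helper that returns the instant and the offset side by side. This is only a sketch in base R: parse_with_offset is a hypothetical name, it assumes exactly the "YYYY-MM-DD HH:MM:SS" plus "Z" or "+hhmm" layout used above, and it keeps the offset in a separate column because a POSIXct vector carries a single tzone attribute for all of its elements.

```r
# Hypothetical helper: split each timestamp into its UTC instant and the
# numeric offset it was written with. Assumes "YYYY-MM-DD HH:MM:SS" followed
# by "Z" or a signed four-digit offset, optionally separated by a space.
parse_with_offset <- function(times) {
  offset_re <- " ?(Z|[-+][0-9]{4})$"
  # Clock time with the offset stripped, read as if it were UTC.
  naive <- as.POSIXct(strptime(sub(offset_re, "", times),
                               "%Y-%m-%d %H:%M:%S", tz = "UTC"))
  # Put exactly one space before the offset, then spell "Z" as "+0000"
  # so that %z can read it.
  norm <- sub(offset_re, " \\1", times)
  norm <- sub("Z$", "+0000", norm)
  # True instant, with the offset applied.
  utc <- as.POSIXct(strptime(norm, "%Y-%m-%d %H:%M:%S %z", tz = "UTC"))
  data.frame(
    utc = utc,
    offset_hours = as.numeric(difftime(naive, utc, units = "hours")),
    local_clock = format(naive, "%H:%M:%S"),
    stringsAsFactors = FALSE
  )
}

times <- c("2001-01-01 01:00:00 -0600", "2001-01-01 01:00:00Z",
           "2001-01-01 01:00:00 +0600")
res <- parse_with_offset(times)
res$offset_hours  # -6  0  6
```

Comparisons between sources can then use the utc column, while local-time questions use offset_hours or local_clock. A fixed offset is not a full time zone, so DST rules for the source location are still not recoverable from it.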

@ashiklom

You should post this as an issue in lubridate. Somebody might pick it up. FWIW, it doesn't look like Unix's date handles this particularly well either.