Skip to content

Instantly share code, notes, and snippets.

@Ironholds
Created October 20, 2014 21:18
Show Gist options
  • Select an option

  • Save Ironholds/c6ddcd7b377844e66803 to your computer and use it in GitHub Desktop.

Select an option

Save Ironholds/c6ddcd7b377844e66803 to your computer and use it in GitHub Desktop.
with_tz and data.table
#For our test object, we have x, a data.table
timestamp tz order
1: 20140907142430 Europe/Warsaw 1
2: 20141005162245 Europe/Paris 2
3: 20140908142859 Europe/Rome 3
4: 20140903093741 Asia/Yerevan 4
5: 20140805102142 America/Sao_Paulo 5
6: 20140905142222 Europe/Brussels 6
7: 20140819225906 Europe/Rome 7
8: 20140811190410 Europe/London 8
9: 20140815171858 Europe/Moscow 9
10: 20140723095634 Europe/Berlin 10
11: 20140729061649 Asia/Phnom_Penh 11
12: 20140726132147 Europe/Moscow 12
13: 20140817230428 Europe/Paris 13
14: 20140913020824 America/Denver 14
15: 20140828202308 Europe/Paris 15
16: 20140919102124 Europe/Warsaw 16
17: 20140915154949 Asia/Calcutta 17
18: 20140919213634 Europe/Amsterdam 18
19: 20140828082859 America/New_York 19
20: 20141005142320 America/New_York 20
#Okay, so let's use with_tz over those timestamp/tz combos. Of course, it doesn't work with multiple tzs,
#So we'll have to use j= and by="tz" - classic data.table use case, really.
x <- x[, j = {
+
+ #Copy object
+ copy.dt <- copy(.SD)
+
+ #Convert timestamps
+ timestamps <- strptime(x, format = "%Y%m%d%H%M%S", tz = "UTC")
+
+ #Extract and include the original, and also the modified one, both as character strings for safety
+ copy.dt$original_timestamp <- as.character(timestamps)
+ copy.dt$new_timestamp <- as.character(with_tz(timestamps,tzone = copy.dt$tz[1]))
+
+ #Return
+ copy.dt <- copy.dt[,"timestamp" :=NULL]
+ copy.dt
+
+ }, by = "tz"]
#Reorder so the two sets are comparable.
>x <- x[order(x$order),]
> x
tz order original_timestamp new_timestamp
1: Europe/Warsaw 1 2014-09-07 14:24:30 2014-09-07 14:24:30
2: Europe/Paris 2 2014-10-05 16:22:45 2014-10-05 16:22:45
3: Europe/Rome 3 2014-09-08 14:28:59 2014-09-08 14:28:59
4: Asia/Yerevan 4 2014-09-03 09:37:41 2014-09-03 09:37:41
5: America/Sao_Paulo 5 2014-08-05 10:21:42 2014-08-05 10:21:42
6: Europe/Brussels 6 2014-09-05 14:22:22 2014-09-05 14:22:22
7: Europe/Rome 7 2014-08-19 22:59:06 2014-08-19 22:59:06
8: Europe/London 8 2014-08-11 19:04:10 2014-08-11 19:04:10
9: Europe/Moscow 9 2014-08-15 17:18:58 2014-08-15 17:18:58
10: Europe/Berlin 10 2014-07-23 09:56:34 2014-07-23 09:56:34
11: Asia/Phnom_Penh 11 2014-07-29 06:16:49 2014-07-29 06:16:49
12: Europe/Moscow 12 2014-07-26 13:21:47 2014-07-26 13:21:47
13: Europe/Paris 13 2014-08-17 23:04:28 2014-08-17 23:04:28
14: America/Denver 14 2014-09-13 02:08:24 2014-09-13 02:08:24
15: Europe/Paris 15 2014-08-28 20:23:08 2014-08-28 20:23:08
16: Europe/Warsaw 16 2014-09-19 10:21:24 2014-09-19 10:21:24
17: Asia/Calcutta 17 2014-09-15 15:49:49 2014-09-15 15:49:49
18: Europe/Amsterdam 18 2014-09-19 21:36:34 2014-09-19 21:36:34
19: America/New_York 19 2014-08-28 08:28:59 2014-08-28 08:28:59
20: America/New_York 20 2014-10-05 14:23:20 2014-10-05 14:23:20
#...at which point we discover that no modification took place at all.
@arunsrinivasan

Copy link
Copy Markdown

I think the line

timestamps <- strptime(x, format = "%Y%m%d%H%M%S", tz = "UTC")

should be:

timestamps <- strptime(timestamp, format = "%Y%m%d%H%M%S", tz = "UTC")

Here's what I did:

require(data.table) ## 1.9.5
require(lubridate) ## 1.3.3

x = setDT(read.table(text="timestamp                tz order
 1: 20140907142430     Europe/Warsaw     1
 2: 20141005162245      Europe/Paris     2
 3: 20140908142859       Europe/Rome     3
 4: 20140903093741      Asia/Yerevan     4
 5: 20140805102142 America/Sao_Paulo     5
 6: 20140905142222   Europe/Brussels     6
 7: 20140819225906       Europe/Rome     7
 8: 20140811190410     Europe/London     8
 9: 20140815171858     Europe/Moscow     9
10: 20140723095634     Europe/Berlin    10
11: 20140729061649   Asia/Phnom_Penh    11
12: 20140726132147     Europe/Moscow    12
13: 20140817230428      Europe/Paris    13
14: 20140913020824    America/Denver    14
15: 20140828202308      Europe/Paris    15
16: 20140919102124     Europe/Warsaw    16
17: 20140915154949     Asia/Calcutta    17
18: 20140919213634  Europe/Amsterdam    18
19: 20140828082859  America/New_York    19
20: 20141005142320  America/New_York    20
", header=TRUE, stringsAsFactors=FALSE))

x[, j = {    
    #Copy object
    copy.dt <- copy(.SD)

    #Convert timestamps
    timestamps <- strptime(timestamp, format = "%Y%m%d%H%M%S", tz = "UTC")

    #Extract and include the original, and also the modified one, both as character strings for safety
    copy.dt$original_timestamp <- as.character(timestamps)
    copy.dt$new_timestamp <- as.character(with_tz(timestamps,tzone = copy.dt$tz[1]))

    #Return
    copy.dt <- copy.dt[,"timestamp" :=NULL]
    copy.dt    
}, by = "tz"]

And I get this:

#                    tz order  original_timestamp       new_timestamp
#  1:     Europe/Warsaw     1 2014-09-07 14:24:30 2014-09-07 16:24:30
#  2:     Europe/Warsaw    16 2014-09-19 10:21:24 2014-09-19 12:21:24
#  3:      Europe/Paris     2 2014-10-05 16:22:45 2014-10-05 18:22:45
#  4:      Europe/Paris    13 2014-08-17 23:04:28 2014-08-18 01:04:28
#  5:      Europe/Paris    15 2014-08-28 20:23:08 2014-08-28 22:23:08
#  6:       Europe/Rome     3 2014-09-08 14:28:59 2014-09-08 16:28:59
#  7:       Europe/Rome     7 2014-08-19 22:59:06 2014-08-20 00:59:06
#  8:      Asia/Yerevan     4 2014-09-03 09:37:41 2014-09-03 11:37:41
#  9: America/Sao_Paulo     5 2014-08-05 10:21:42 2014-08-05 12:21:42
#10:   Europe/Brussels     6 2014-09-05 14:22:22 2014-09-05 16:22:22
#11:     Europe/London     8 2014-08-11 19:04:10 2014-08-11 21:04:10
#12:     Europe/Moscow     9 2014-08-15 17:18:58 2014-08-15 19:18:58
#13:     Europe/Moscow    12 2014-07-26 13:21:47 2014-07-26 15:21:47
#14:     Europe/Berlin    10 2014-07-23 09:56:34 2014-07-23 11:56:34
#15:   Asia/Phnom_Penh    11 2014-07-29 06:16:49 2014-07-29 08:16:49
#16:    America/Denver    14 2014-09-13 02:08:24 2014-09-13 04:08:24
#17:     Asia/Calcutta    17 2014-09-15 15:49:49 2014-09-15 17:49:49
#18:  Europe/Amsterdam    18 2014-09-19 21:36:34 2014-09-19 23:36:34
#19:  America/New_York    19 2014-08-28 08:28:59 2014-08-28 10:28:59
#20:  America/New_York    20 2014-10-05 14:23:20 2014-10-05 16:23:20

What am I missing?

@Ironholds

Copy link
Copy Markdown
Author

Bah; typo in my code! Not the intended error. What I wanted to display was; look at the time offsets and how they vary by tz. The crucial element is that they actually don't seem to.

@arunsrinivasan

Copy link
Copy Markdown

In your code, I see that the last two columns are identical. But in mine, IIUC, they work as they are intended to. If we both agree to that, then:

  1. what's the data.table version and lubridate version you have?
  2. What's the R-version, your OS, 32/64-bit?
    Maybe ideal is to post your sessionInfo().

@Ironholds

Copy link
Copy Markdown
Author

No, they don't work as intended in yours.

The code should be producing different offsets in each new_timestamp for each timezone, since timezone offsets vary. Instead, we see the same 2-hour offset each time, presumably becaise the first timezone was Europe/Warsaw, which is UTC+2. Take a look at rows 8 and 19 for a prominent demonstration.

@Ironholds

Copy link
Copy Markdown
Author

Session info:

R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] lubridate_1.3.3

loaded via a namespace (and not attached):
[1] digest_0.6.4 memoise_0.2.1 plyr_1.8.1 Rcpp_0.11.3 stringr_0.6.2
[6] tools_3.0.2

I'm wondering if this could be memoise cacheing the results, and it being cached throughout the calls because of the expression's scope.

@Ironholds

Copy link
Copy Markdown
Author

Reported here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment