Skip to content

Instantly share code, notes, and snippets.

@okumin
Last active January 9, 2025 16:30
Show Gist options
  • Save okumin/b33876398a106068b4fd7ddd3049ae8c to your computer and use it in GitHub Desktop.
Save okumin/b33876398a106068b4fd7ddd3049ae8c to your computer and use it in GitHub Desktop.

Hive's TIMESTAMP

Types

INT96

Impala started using INT96 as timestamp, and Hive and Spark also followed Impala for compabibility. This is the discussion in ML, and PARQUET-323 is a related ticket. As for timezone, it has a bit more complicated context.

TIMESTAMP logical type over INT64

TIMESTAMP is a new TIMESTAMP format defined in Parquet.

Questions

Offset of INT96 timestamp

  • Hive: Unix timestamp(UTC's timestamp)
  • Other tools or specific Hive: Various

Why Hive uses INT96 on write?

That's because hive.parquet.write.int64.timestamp is disabled by default.

Should Iceberg tables allow INT96 as timestamp?

Yes, only for compatibility.

How does Iceberg decode INT96 in Iceberg tables?

The INT96 is decoded as a unix timestamp, and it is converted into OffsetDateTime with UTC.

Should Iceberg decode INT96 as OffsetDateTime and Hive translates it into LocalDateTime?

I am not sure why Iceberg doesn't decode it as LocalDateTime. The original TimestampInt96Reader returned LocalDateTime but this PR changed it into OffsetDateTime.

https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit?tab=t.0#heading=h.n699ftkvhjlo

Should we always decode INT96 as it is?

Maybe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment