Impala started using INT96 to store timestamps, and Hive and Spark followed Impala for compatibility. This is the discussion on the mailing list (ML), and PARQUET-323 is a related ticket. Time zone handling has a more complicated history.
TIMESTAMP is the newer timestamp logical type defined in Parquet (stored as INT64). Iceberg's two timestamp types map to it as follows:
- Iceberg's timestamp with time zone (isAdjustedToUTC=true, unit=MICROS)
  - Epoch microseconds of the OffsetDateTime
- Iceberg's timestamp without time zone (isAdjustedToUTC=false, unit=MICROS)
  - Epoch microseconds of localDateTime.atOffset(ZoneOffset.UTC)
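To illustrate the two encodings, here is a minimal Java sketch using plain java.time arithmetic. The class and method names are hypothetical and are not Iceberg's own utilities.

```java
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class, not Iceberg's API.
class TimestampMicros {
  // 1970-01-01T00:00Z, the zero point for both conversions.
  static final OffsetDateTime EPOCH =
      OffsetDateTime.of(1970, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);

  // Timestamp with time zone (isAdjustedToUTC=true):
  // the instant itself, as microseconds since the epoch.
  static long microsFromOffsetDateTime(OffsetDateTime value) {
    return ChronoUnit.MICROS.between(EPOCH, value);
  }

  // Timestamp without time zone (isAdjustedToUTC=false):
  // the wall-clock reading, treated as if it were UTC.
  static long microsFromLocalDateTime(LocalDateTime value) {
    return ChronoUnit.MICROS.between(EPOCH, value.atOffset(ZoneOffset.UTC));
  }
}
```

For example, 2021-01-01T09:00+09:00 yields 1609459200000000. A LocalDateTime of 2021-01-01T09:00 yields 1609491600000000, which is nine hours later, because its wall-clock reading is treated as UTC.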
For INT96, the meaning depends on the writer:
- Hive: Unix timestamp (i.e., a timestamp in UTC)
- Other tools, or Hive in specific configurations: various
Hive writes INT96 because hive.parquet.write.int64.timestamp is disabled by default.
In Iceberg, an INT96 value is decoded as a Unix timestamp and converted into an OffsetDateTime in UTC.
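Here is a minimal Java sketch of that decoding. It assumes the common Impala/Hive INT96 layout: 8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number. The names `Int96Timestamps`, `decode`, and `encode` are hypothetical. This is not Iceberg's TimestampInt96Reader.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, assuming the Impala/Hive INT96 layout.
class Int96Timestamps {
  // Julian day number of 1970-01-01 (the Unix epoch).
  static final long UNIX_EPOCH_JULIAN_DAY = 2_440_588L;

  // bytes 0-7: nanoseconds within the day; bytes 8-11: Julian day (both little-endian).
  static OffsetDateTime decode(byte[] int96) {
    if (int96.length != 12) {
      throw new IllegalArgumentException("INT96 value must be 12 bytes");
    }
    ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
    long nanosOfDay = buf.getLong();
    int julianDay = buf.getInt();
    long epochSeconds = TimeUnit.DAYS.toSeconds(julianDay - UNIX_EPOCH_JULIAN_DAY);
    // Treat the value as a UTC instant (a Unix timestamp) and attach the UTC offset.
    return Instant.ofEpochSecond(epochSeconds, nanosOfDay).atOffset(ZoneOffset.UTC);
  }

  // Inverse of decode, useful for building test inputs.
  static byte[] encode(Instant instant) {
    long epochDay = Math.floorDiv(instant.getEpochSecond(), 86_400L);
    long secondOfDay = Math.floorMod(instant.getEpochSecond(), 86_400L);
    long nanosOfDay = TimeUnit.SECONDS.toNanos(secondOfDay) + instant.getNano();
    return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
        .putLong(nanosOfDay)
        .putInt((int) (epochDay + UNIX_EPOCH_JULIAN_DAY))
        .array();
  }
}
```

Because the result always carries ZoneOffset.UTC, the writer's local time zone is not recoverable from the value. This matches the "Unix timestamp" interpretation Hive uses.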
I am not sure why Iceberg doesn't decode INT96 values as LocalDateTime. The original TimestampInt96Reader returned LocalDateTime, but this PR changed it to OffsetDateTime.
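Once the INT96 value has been turned into UTC-based epoch microseconds, returning an OffsetDateTime or a LocalDateTime is a choice of result type. The wall-clock fields are identical, and only the OffsetDateTime carries the UTC offset. Per the TIMESTAMP mapping earlier, the first corresponds to Iceberg's timestamp with time zone and the second to timestamp without time zone. This is a minimal sketch with hypothetical names, not Iceberg code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical helper class, not Iceberg's API.
class Int96ResultTypes {
  // Epoch microseconds as an instant with an explicit UTC offset.
  static OffsetDateTime toOffsetDateTime(long epochMicros) {
    return Instant.EPOCH.plus(epochMicros, ChronoUnit.MICROS).atOffset(ZoneOffset.UTC);
  }

  // The same microseconds as a bare wall-clock value (the UTC offset is dropped).
  static LocalDateTime toLocalDateTime(long epochMicros) {
    return toOffsetDateTime(epochMicros).toLocalDateTime();
  }
}
```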
Maybe.