rifkiamil / Trino_JSON_Processing_chatgpt.md

Created May 5, 2025 15:07

Trino JSON Processing in Data Lakes: Engine Internals, Comparisons, and SIMD Aspects

Engine Internals and JSON Processing Enhancements

Trino’s JSON architecture: Trino (formerly PrestoSQL) is a distributed MPP query engine where workers scan data in parallel and pipeline results in memory. JSON in a data lake (e.g. files on S3 or HDFS) is typically handled via the Hive connector, which treats JSON files as line-oriented text. Each JSON object (or array) is expected to be a record – often one JSON per line (NDJSON). Trino splits large JSON files into segments for parallel reading, aligning splits on record boundaries (usually newline delimited) so that no JSON object is cut in half between workers. This ensures each split contains whole JSON records for valid parsing. Internally, Trino uses a LinePageSource to read text files and find record boundaries (e.g. newline positions) so that each worker thread reads a chunk of the file and emits complete JSON rows. For extremely large JSON objects th
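The record-boundary alignment described above can be sketched in a few lines. This is an illustration of the general technique for newline-delimited files, not Trino's actual code: the function names `plan_splits` and `read_split` are hypothetical, and the sketch assumes one JSON object per line and the common convention that a record belongs to the split in which its first byte falls.

```python
import io
import json


def plan_splits(size: int, split_size: int) -> list[tuple[int, int]]:
    """Cut a file of `size` bytes into fixed-size byte ranges [start, end)."""
    return [(s, min(s + split_size, size)) for s in range(0, size, split_size)]


def read_split(data: bytes, start: int, end: int):
    """Yield the JSON records whose first byte lies in [start, end).

    Assumed convention: a reader that does not start at byte 0 backs up one
    byte and discards everything up to the next newline, because that text
    belongs to a record owned by the previous split. A reader may run past
    `end` to finish the last record it started.
    """
    buf = io.BytesIO(data)
    if start > 0:
        buf.seek(start - 1)
        buf.readline()  # skip the tail of the record that began before `start`
    while buf.tell() < end:
        line = buf.readline()
        if not line:
            break  # end of file
        if line.strip():
            yield json.loads(line)


# Three NDJSON records; the middle one straddles two split boundaries.
data = b'{"id": 1}\n{"id": 2, "tags": ["a", "b"]}\n{"id": 3}\n'
splits = plan_splits(len(data), 16)
records = [rec for start, end in splits for rec in read_split(data, start, end)]
ids = [rec["id"] for rec in records]
```

Because each record is claimed by exactly one split, the splits can be processed independently (for example by different workers) and every record is still parsed exactly once, even when the byte ranges cut through the middle of a JSON object.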

rifkiamil / Trino_JSON_Processing_gemini.md

Created May 5, 2025 15:13

Trino's Efficiency in Processing JSON Files in Data Lakes: A Technical Deep Dive

Introduction

Trino, the distributed SQL query engine formerly known as PrestoSQL, is engineered for high-performance, interactive analytics across a multitude of heterogeneous data sources.1 Its architecture is particularly well-suited for querying large datasets residing in data lakes, whether deployed on-premises using HDFS or in the cloud on object storage systems like Amazon S3, Google Cloud Storage, or Azure Blob Storage.2 A key capability enabling this is Trino's schema-on-read approach, allowing users to query data in various formats directly where it resides, without requiring upfront transformation and loading into a proprietary storage system.3

JSON (JavaScript Object Notation) has become ubiquitous in modern data ecosystems, frequently used for API payloads, application logs, configuration files, and semi-structured data exchange.7 However, querying JSON efficiently at scale presents significant challen

rifkiamil / Looker_Studio_Community_Connector_for_Speckle.md

Created May 11, 2025 19:06

Building a Looker Studio Community Connector for Speckle API

Let's look at the Speckle APIs and three methods for getting data into Looker Studio.

Speckle Service APIs: GraphQL vs. REST

Speckle provides both GraphQL and REST APIs to interact with your data on a Speckle Server. They serve similar purposes (accessing streams, commits, objects, etc.) but differ in how you structure your requests and receive data.
