
Venki Balakrishnan venkyvb

venkyvb / README.md
Created January 13, 2023 16:24 — forked from PawaritL/README.md
Parse nested JSON into your ideal, customizable Spark schema (StructType)

Is Spark's JSON schema inference too inflexible for your liking?

Common Scenarios:

  • Spark's automatic schema inference does not apply the type casting you want
  • You want to completely drop irrelevant fields when parsing
  • You want to avoid some highly nested fields simply by casting some outer fields as strings
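All three scenarios come down to giving Spark an explicit schema instead of letting it infer one. The sketch below illustrates that idea under stated assumptions; it is not the gist's own implementation. The helpers `reference_to_ddl` and `ddl_type` and the `as_string` parameter are hypothetical names. The code walks a reference dict such as REFERENCE_EXAMPLE and emits a Spark DDL schema string rather than a `StructType` object. It assumes homogeneous arrays, where the first element decides the element type, and it maps JSON integers to BIGINT, matching the `long` Spark infers for them.

```python
# Hypothetical helper: turn a reference JSON example (a Python dict) into a
# Spark DDL schema string. Not the gist's code; a sketch of the same idea.

# bool must be checked before int, because bool is a subclass of int.
_SCALARS = [(bool, "BOOLEAN"), (int, "BIGINT"), (float, "DOUBLE"), (str, "STRING")]


def _quote(name: str) -> str:
    # Backtick-quote every field name; embedded backticks are doubled.
    return "`" + name.replace("`", "``") + "`"


def ddl_type(value, path="", as_string=frozenset()):
    """Return the Spark DDL type for one value of the reference example.

    `path` uses dots for struct fields and "[]" for array elements,
    e.g. "address[].city" (an assumed convention for `as_string`).
    """
    if path in as_string:
        return "STRING"  # keep this whole subtree as a single string column
    if isinstance(value, dict):
        if not value:
            return "STRING"  # empty object: no fields to describe, fall back
        fields = ", ".join(
            f"{_quote(k)}: {ddl_type(v, f'{path}.{k}' if path else k, as_string)}"
            for k, v in value.items()
        )
        return f"STRUCT<{fields}>"
    if isinstance(value, list):
        # Assumes homogeneous arrays; an empty list falls back to ARRAY<STRING>.
        elem = ddl_type(value[0], path + "[]", as_string) if value else "STRING"
        return f"ARRAY<{elem}>"
    for py_type, spark_type in _SCALARS:
        if isinstance(value, py_type):
            return spark_type
    return "STRING"  # JSON null (None) or anything unexpected


def reference_to_ddl(example: dict, as_string=()) -> str:
    """Build a top-level DDL string such as "`a` STRING, `b` BIGINT"."""
    as_string = frozenset(as_string)
    return ", ".join(
        f"{_quote(k)} {ddl_type(v, k, as_string)}" for k, v in example.items()
    )
```

The resulting string can be passed to `spark.read.schema(...)` or to `pyspark.sql.functions.from_json(col, schema)`, both of which accept DDL-formatted strings. Fields that are absent from the reference example are not in the schema, so they are dropped when parsing. A path listed in `as_string` becomes a STRING column, and Spark's JSON parser fills it with the raw JSON text of that subtree, so you can skip describing deep nesting.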

Step 1: Provide your (ideal) JSON data example

REFERENCE_EXAMPLE = {
  "firstName": "Will",
venkyvb / sparkJsonSchema
Created February 25, 2023 20:16
FHIRTransactionSchema
root
 |-- abatementAge: struct (nullable = true)
 |    |-- code: string (nullable = true)
 |    |-- system: string (nullable = true)
 |    |-- unit: string (nullable = true)
 |    |-- value: long (nullable = true)
 |-- abatementDateTime: string (nullable = true)
 |-- abatementString: string (nullable = true)
 |-- address: array (nullable = true)
 |    |-- element: struct (containsNull = true)