@paulnicholsen27
Created September 10, 2025 14:06
25/09/10 10:02:45 INFO FileSourceStrategy: Post-Scan Filters:
25/09/10 10:02:45 INFO ShufflePartitionsUtil: For shuffle(6), advisory target size: 67108864, actual target size 1048576, minimum partition size: 1048576
25/09/10 10:02:45 INFO SparkContext: Starting job: $anonfun$withThreadLocalCaptured$1 at FutureTask.java:264
25/09/10 10:02:45 INFO DAGScheduler: Got job 16 ($anonfun$withThreadLocalCaptured$1 at FutureTask.java:264) with 1 output partitions
25/09/10 10:02:45 INFO DAGScheduler: Final stage: ResultStage 26 ($anonfun$withThreadLocalCaptured$1 at FutureTask.java:264)
25/09/10 10:02:45 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 25)
25/09/10 10:02:45 INFO DAGScheduler: Missing parents: List()
25/09/10 10:02:45 INFO DAGScheduler: Submitting ResultStage 26 (MapPartitionsRDD[65] at $anonfun$withThreadLocalCaptured$1 at FutureTask.java:264), which has no missing parents
25/09/10 10:02:45 INFO MemoryStore: Block broadcast_29 stored as values in memory (estimated size 113.4 KiB, free 426.6 MiB)
25/09/10 10:02:45 INFO MemoryStore: Block broadcast_29_piece0 stored as bytes in memory (estimated size 40.8 KiB, free 426.6 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Added broadcast_29_piece0 in memory on ncias-d3613-v.nci.nih.gov:43165 (size: 40.8 KiB, free: 433.8 MiB)
25/09/10 10:02:45 INFO SparkContext: Created broadcast 29 from broadcast at DAGScheduler.scala:1611
25/09/10 10:02:45 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 26 (MapPartitionsRDD[65] at $anonfun$withThreadLocalCaptured$1 at FutureTask.java:264) (first 15 tasks are for partitions Vector(0))
25/09/10 10:02:45 INFO TaskSchedulerImpl: Adding task set 26.0 with 1 tasks resource profile 0
25/09/10 10:02:45 INFO TaskSetManager: Starting task 0.0 in stage 26.0 (TID 17) (ncias-d3613-v.nci.nih.gov, executor driver, partition 0, NODE_LOCAL, 9265 bytes)
25/09/10 10:02:45 INFO Executor: Running task 0.0 in stage 26.0 (TID 17)
25/09/10 10:02:45 INFO ShuffleBlockFetcherIterator: Getting 1 (12.3 KiB) non-empty blocks including 1 (12.3 KiB) local and 0 (0.0 B) host-local and 0 (0.0 B) push-merged-local and 0 (0.0 B) remote blocks
25/09/10 10:02:45 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
25/09/10 10:02:45 INFO CodeGenerator: Code generated in 5.752978 ms
25/09/10 10:02:45 INFO CodeGenerator: Code generated in 9.67193 ms
25/09/10 10:02:45 INFO Executor: Finished task 0.0 in stage 26.0 (TID 17). 27006 bytes result sent to driver
25/09/10 10:02:45 INFO TaskSetManager: Finished task 0.0 in stage 26.0 (TID 17) in 59 ms on ncias-d3613-v.nci.nih.gov (executor driver) (1/1)
25/09/10 10:02:45 INFO TaskSchedulerImpl: Removed TaskSet 26.0, whose tasks have all completed, from pool
25/09/10 10:02:45 INFO DAGScheduler: ResultStage 26 ($anonfun$withThreadLocalCaptured$1 at FutureTask.java:264) finished in 0.076 s
25/09/10 10:02:45 INFO DAGScheduler: Job 16 is finished. Cancelling potential speculative or zombie tasks for this job
25/09/10 10:02:45 INFO TaskSchedulerImpl: Killing all running tasks in stage 26: Stage finished
25/09/10 10:02:45 INFO DAGScheduler: Job 16 finished: $anonfun$withThreadLocalCaptured$1 at FutureTask.java:264, took 0.079833 s
25/09/10 10:02:45 INFO CodeGenerator: Code generated in 7.817072 ms
25/09/10 10:02:45 INFO MemoryStore: Block broadcast_30 stored as values in memory (estimated size 1026.0 KiB, free 425.6 MiB)
25/09/10 10:02:45 INFO MemoryStore: Block broadcast_30_piece0 stored as bytes in memory (estimated size 5.8 KiB, free 425.6 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Added broadcast_30_piece0 in memory on ncias-d3613-v.nci.nih.gov:43165 (size: 5.8 KiB, free: 433.8 MiB)
25/09/10 10:02:45 INFO SparkContext: Created broadcast 30 from $anonfun$withThreadLocalCaptured$1 at FutureTask.java:264
25/09/10 10:02:45 INFO FileSourceStrategy: Pushed Filters:
25/09/10 10:02:45 INFO FileSourceStrategy: Post-Scan Filters:
25/09/10 10:02:45 INFO FileSourceStrategy: Pushed Filters:
25/09/10 10:02:45 INFO FileSourceStrategy: Post-Scan Filters:
25/09/10 10:02:45 INFO ParquetUtils: Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
25/09/10 10:02:45 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
25/09/10 10:02:45 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
25/09/10 10:02:45 INFO SQLHadoopMapReduceCommitProtocol: Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
25/09/10 10:02:45 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
25/09/10 10:02:45 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
25/09/10 10:02:45 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
25/09/10 10:02:45 INFO CodeGenerator: Code generated in 23.576945 ms
25/09/10 10:02:45 INFO MemoryStore: Block broadcast_31 stored as values in memory (estimated size 203.2 KiB, free 425.4 MiB)
25/09/10 10:02:45 INFO MemoryStore: Block broadcast_31_piece0 stored as bytes in memory (estimated size 35.7 KiB, free 425.3 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Added broadcast_31_piece0 in memory on ncias-d3613-v.nci.nih.gov:43165 (size: 35.7 KiB, free: 433.8 MiB)
25/09/10 10:02:45 INFO SparkContext: Created broadcast 31 from parquet at ekg_export.java:1285
25/09/10 10:02:45 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
25/09/10 10:02:45 INFO CodeGenerator: Code generated in 29.606664 ms
25/09/10 10:02:45 INFO MemoryStore: Block broadcast_32 stored as values in memory (estimated size 203.6 KiB, free 425.1 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_23_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 22.6 KiB, free: 433.8 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_28_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 54.2 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO MemoryStore: Block broadcast_32_piece0 stored as bytes in memory (estimated size 35.8 KiB, free 425.4 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Added broadcast_32_piece0 in memory on ncias-d3613-v.nci.nih.gov:43165 (size: 35.8 KiB, free: 433.8 MiB)
25/09/10 10:02:45 INFO SparkContext: Created broadcast 32 from parquet at ekg_export.java:1285
25/09/10 10:02:45 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4194304 bytes, open cost is considered as scanning 4194304 bytes.
25/09/10 10:02:45 INFO SparkContext: Starting job: parquet at ekg_export.java:1285
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_9_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 8.5 KiB, free: 433.8 MiB)
25/09/10 10:02:45 INFO DAGScheduler: Got job 17 (parquet at ekg_export.java:1285) with 2 output partitions
25/09/10 10:02:45 INFO DAGScheduler: Final stage: ResultStage 27 (parquet at ekg_export.java:1285)
25/09/10 10:02:45 INFO DAGScheduler: Parents of final stage: List()
25/09/10 10:02:45 INFO DAGScheduler: Missing parents: List()
25/09/10 10:02:45 INFO DAGScheduler: Submitting ResultStage 27 (MapPartitionsRDD[73] at parquet at ekg_export.java:1285), which has no missing parents
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_6_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 7.0 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_20_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 22.0 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_19_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 22.6 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_7_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 15.0 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO MemoryStore: Block broadcast_33 stored as values in memory (estimated size 254.4 KiB, free 425.5 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_29_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 40.8 KiB, free: 434.0 MiB)
25/09/10 10:02:45 INFO MemoryStore: Block broadcast_33_piece0 stored as bytes in memory (estimated size 86.8 KiB, free 425.5 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Added broadcast_33_piece0 in memory on ncias-d3613-v.nci.nih.gov:43165 (size: 86.8 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_26_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 24.7 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO SparkContext: Created broadcast 33 from broadcast at DAGScheduler.scala:1611
25/09/10 10:02:45 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 27 (MapPartitionsRDD[73] at parquet at ekg_export.java:1285) (first 15 tasks are for partitions Vector(0, 1))
25/09/10 10:02:45 INFO TaskSchedulerImpl: Adding task set 27.0 with 2 tasks resource profile 0
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_11_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 8.5 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO TaskSetManager: Starting task 0.0 in stage 27.0 (TID 18) (ncias-d3613-v.nci.nih.gov, executor driver, partition 0, PROCESS_LOCAL, 10115 bytes)
25/09/10 10:02:45 INFO TaskSetManager: Starting task 1.0 in stage 27.0 (TID 19) (ncias-d3613-v.nci.nih.gov, executor driver, partition 1, PROCESS_LOCAL, 10064 bytes)
25/09/10 10:02:45 INFO Executor: Running task 0.0 in stage 27.0 (TID 18)
25/09/10 10:02:45 INFO Executor: Running task 1.0 in stage 27.0 (TID 19)
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_14_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 8.3 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_15_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 16.9 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO BlockManagerInfo: Removed broadcast_25_piece0 on ncias-d3613-v.nci.nih.gov:43165 in memory (size: 25.3 KiB, free: 433.9 MiB)
25/09/10 10:02:45 INFO CodeGenerator: Code generated in 21.971419 ms
25/09/10 10:02:45 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
25/09/10 10:02:45 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
25/09/10 10:02:45 INFO SQLHadoopMapReduceCommitProtocol: Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
25/09/10 10:02:45 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
25/09/10 10:02:45 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
25/09/10 10:02:45 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
25/09/10 10:02:45 INFO CodeGenerator: Code generated in 26.947958 ms
25/09/10 10:02:45 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
25/09/10 10:02:45 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
25/09/10 10:02:45 INFO SQLHadoopMapReduceCommitProtocol: Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
25/09/10 10:02:45 INFO FileOutputCommitter: File Output Committer Algorithm version is 1
25/09/10 10:02:45 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
25/09/10 10:02:45 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
25/09/10 10:02:45 INFO FileScanRDD: Reading File path: file:///data/users/nicholsenpm/airflow_extractions/BTRIS_for_Databricks_09082025_140444/input/ekg_notional/ekg-notional.parquet, range: 0-33734, partition values: [empty row]
25/09/10 10:02:45 INFO CodecConfig: Compression: GZIP
25/09/10 10:02:45 INFO CodecConfig: Compression: GZIP
25/09/10 10:02:45 INFO CodecPool: Got brand-new decompressor [.zstd]
25/09/10 10:02:45 INFO ParquetOutputFormat: ParquetRecordWriter [block size: 134217728b, row group padding size: 8388608b, validating: false]
25/09/10 10:02:45 ERROR Executor: Exception in task 1.0 in stage 27.0 (TID 19)
java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
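The "failed to map segment from shared object" failure above is what ultimately aborts the job: zstd-jni extracted its native library into /tmp and the loader could not map it as executable, which typically means the temp filesystem is mounted noexec. The read path needs the zstd codec because the input parquet file is zstd-compressed (see "Got brand-new decompressor [.zstd]" above), even though the output is being written with GZIP. A minimal sketch of how one might verify the mount options and point the JVMs at an exec-permitted scratch directory instead, assuming zstd-jni falls back to the JVM's java.io.tmpdir when extracting its bundled .so; the scratch path is hypothetical:

    # check whether /tmp carries the noexec mount option
    findmnt -no OPTIONS /tmp

    # re-run with the driver and executor temp dirs redirected
    # (scratch path is hypothetical; keep the rest of the submit command unchanged)
    spark-submit \
      --conf "spark.driver.extraJavaOptions=-Djava.io.tmpdir=/data/users/nicholsenpm/scratch_tmp" \
      --conf "spark.executor.extraJavaOptions=-Djava.io.tmpdir=/data/users/nicholsenpm/scratch_tmp" \
      ...

Remounting /tmp with the exec option would also avoid the error, but redirecting java.io.tmpdir keeps the change scoped to this job.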
25/09/10 10:02:45 INFO ParquetWriteSupport: Initialized Parquet WriteSupport with Catalyst schema:
{
  "type" : "struct",
  "fields" : [ {
    "name" : "date_administered",
    "type" : "date",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "observation_text",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "btris_category",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "administering_provider",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "cris_orderset_category",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "cris_orderset_subcategory",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "ekg_id",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "sequence",
    "type" : "double",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "observation_name",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "unit_of_measure",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "timestamp_date_administered",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "allocated_to_protocol",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "other_order_detail",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "priority",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "protocol_number_fbum",
    "type" : {
      "type" : "array",
      "elementType" : "string",
      "containsNull" : true
    },
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "mrn",
    "type" : "string",
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "protocol_number",
    "type" : {
      "type" : "array",
      "elementType" : "string",
      "containsNull" : true
    },
    "nullable" : true,
    "metadata" : { }
  }, {
    "name" : "primary_date_time",
    "type" : "date",
    "nullable" : true,
    "metadata" : { }
  } ]
}
and corresponding Parquet message type:
message spark_schema {
  optional int32 date_administered (DATE);
  optional binary observation_text (STRING);
  optional binary btris_category (STRING);
  optional binary administering_provider (STRING);
  optional binary cris_orderset_category (STRING);
  optional binary cris_orderset_subcategory (STRING);
  optional binary ekg_id (STRING);
  optional double sequence;
  optional binary observation_name (STRING);
  optional binary unit_of_measure (STRING);
  optional binary timestamp_date_administered (STRING);
  optional binary allocated_to_protocol (STRING);
  optional binary other_order_detail (STRING);
  optional binary priority (STRING);
  optional group protocol_number_fbum (LIST) {
    repeated group list {
      optional binary element (STRING);
    }
  }
  optional binary mrn (STRING);
  optional group protocol_number (LIST) {
    repeated group list {
      optional binary element (STRING);
    }
  }
  optional int32 primary_date_time (DATE);
}
25/09/10 10:02:45 WARN TaskSetManager: Lost task 1.0 in stage 27.0 (TID 19) (ncias-d3613-v.nci.nih.gov executor driver): java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
25/09/10 10:02:45 ERROR TaskSetManager: Task 1 in stage 27.0 failed 1 times; aborting job
25/09/10 10:02:45 INFO TaskSchedulerImpl: Cancelling stage 27
25/09/10 10:02:45 INFO TaskSchedulerImpl: Killing all running tasks in stage 27: Stage cancelled: Job aborted due to stage failure: Task 1 in stage 27.0 failed 1 times, most recent failure: Lost task 1.0 in stage 27.0 (TID 19) (ncias-d3613-v.nci.nih.gov executor driver): java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:
25/09/10 10:02:45 INFO Executor: Executor is trying to kill task 0.0 in stage 27.0 (TID 18), reason: Stage cancelled: Job aborted due to stage failure: Task 1 in stage 27.0 failed 1 times, most recent failure: Lost task 1.0 in stage 27.0 (TID 19) (ncias-d3613-v.nci.nih.gov executor driver): java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:
25/09/10 10:02:45 INFO TaskSchedulerImpl: Stage 27 was cancelled
25/09/10 10:02:45 INFO DAGScheduler: ResultStage 27 (parquet at ekg_export.java:1285) failed in 0.196 s due to Job aborted due to stage failure: Task 1 in stage 27.0 failed 1 times, most recent failure: Lost task 1.0 in stage 27.0 (TID 19) (ncias-d3613-v.nci.nih.gov executor driver): java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:
25/09/10 10:02:45 INFO DAGScheduler: Job 17 failed: parquet at ekg_export.java:1285, took 0.203010 s
25/09/10 10:02:45 ERROR FileFormatWriter: Aborting job b03decbb-2ceb-49e5-b107-bde4d9b57b78.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 27.0 failed 1 times, most recent failure: Lost task 1.0 in stage 27.0 (TID 19) (ncias-d3613-v.nci.nih.gov executor driver): java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2898)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2834)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2833)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2833)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1253)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1253)
at scala.Option.foreach(Option.scala:437)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1253)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3102)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3036)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3025)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:995)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2393)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$4(FileFormatWriter.scala:307)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:271)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:304)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:190)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:190)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$executeCollect$1(AdaptiveSparkPlanExec.scala:392)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:420)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:392)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:869)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:391)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:364)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:243)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:802)
at ekg_export.main(ekg_export.java:1285)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1034)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:199)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:222)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1125)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1134)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
25/09/10 10:02:45 INFO CodecPool: Got brand-new compressor [.gz]
25/09/10 10:02:45 ERROR Utils: Aborting task
org.apache.spark.TaskKilledException
at org.apache.spark.TaskContextImpl.killTaskIfInterrupted(TaskContextImpl.scala:267)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:130)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage21.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage21.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:91)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:403)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1397)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:410)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
25/09/10 10:02:45 WARN FileOutputCommitter: Could not delete file:/data/users/nicholsenpm/airflow_extractions/BTRIS_for_Databricks_09082025_140444/output/ekg_export/_temporary/0/_temporary/attempt_202509101002456434935337330780011_0027_m_000000_18
25/09/10 10:02:45 ERROR FileFormatWriter: Job job_202509101002456434935337330780011_0027 aborted.
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 27.0 failed 1 times, most recent failure: Lost task 1.0 in stage 27.0 (TID 19) (ncias-d3613-v.nci.nih.gov executor driver): java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2898)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2834)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2833)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2833)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1253)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1253)
at scala.Option.foreach(Option.scala:437)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1253)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3102)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3036)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3025)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:995)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2393)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeWrite$4(FileFormatWriter.scala:307)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.writeAndCommit(FileFormatWriter.scala:271)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeWrite(FileFormatWriter.scala:304)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:190)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:190)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$executeCollect$1(AdaptiveSparkPlanExec.scala:392)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:420)
at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:392)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:107)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:201)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:108)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:66)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:107)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:32)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:85)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:83)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:142)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:869)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:391)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:364)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:243)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:802)
at ekg_export.main(ekg_export.java:1285)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1034)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:199)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:222)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1125)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1134)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
25/09/10 10:02:46 INFO SparkContext: Invoking stop() from shutdown hook
25/09/10 10:02:46 INFO SparkContext: SparkContext is stopping with exitCode 0.
25/09/10 10:02:46 INFO Executor: Executor interrupted and killed task 0.0 in stage 27.0 (TID 18), reason: Stage cancelled: Job aborted due to stage failure: Task 1 in stage 27.0 failed 1 times, most recent failure: Lost task 1.0 in stage 27.0 (TID 19) (ncias-d3613-v.nci.nih.gov executor driver): java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:
25/09/10 10:02:46 WARN TaskSetManager: Lost task 0.0 in stage 27.0 (TID 18) (ncias-d3613-v.nci.nih.gov executor driver): TaskKilled (Stage cancelled: Job aborted due to stage failure: Task 1 in stage 27.0 failed 1 times, most recent failure: Lost task 1.0 in stage 27.0 (TID 19) (ncias-d3613-v.nci.nih.gov executor driver): java.lang.UnsatisfiedLinkError: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: /tmp/libzstd-jni-1.5.5-411526472785064416777.so: failed to map segment from shared object
no zstd-jni-1.5.5-4 in java.library.path: [/usr/java/packages/lib, /usr/lib64, /lib64, /lib, /usr/lib]
Unsupported OS/arch, cannot find /linux/amd64/libzstd-jni-1.5.5-4.so or load zstd-jni-1.5.5-4 from system libraries. Please try building from source the jar or providing libzstd-jni-1.5.5-4 in your system.
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2678)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1890)
at com.github.luben.zstd.util.Native$1.run(Native.java:69)
at com.github.luben.zstd.util.Native$1.run(Native.java:67)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.github.luben.zstd.util.Native.loadLibrary(Native.java:67)
at com.github.luben.zstd.util.Native.load(Native.java:154)
at com.github.luben.zstd.util.Native.load(Native.java:85)
at com.github.luben.zstd.ZstdOutputStreamNoFinalizer.<clinit>(ZstdOutputStreamNoFinalizer.java:18)
at com.github.luben.zstd.RecyclingBufferPool.<clinit>(RecyclingBufferPool.java:18)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:90)
at org.apache.parquet.hadoop.codec.ZstandardCodec.createInputStream(ZstandardCodec.java:83)
at org.apache.parquet.hadoop.CodecFactory$HeapBytesDecompressor.decompress(CodecFactory.java:112)
at org.apache.parquet.hadoop.ColumnChunkPageReadStore$ColumnChunkPageReader.readDictionaryPage(ColumnChunkPageReadStore.java:236)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.<init>(VectorizedColumnReader.java:120)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initColumnReader(VectorizedParquetRecordReader.java:437)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:427)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:335)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:233)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:286)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:131)
at org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext(DataSourceScanExec.scala:593)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.columnartorow_nextBatch_0$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage22.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:385)
at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:893)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:893)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:141)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:621)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:624)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:)
25/09/10 10:02:46 INFO TaskSchedulerImpl: Removed TaskSet 27.0, whose tasks have all completed, from pool
25/09/10 10:02:46 INFO SparkUI: Stopped Spark web UI at http://ncias-d3613-v.nci.nih.gov:4042
25/09/10 10:02:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
25/09/10 10:02:46 INFO MemoryStore: MemoryStore cleared
25/09/10 10:02:46 INFO BlockManager: BlockManager stopped
25/09/10 10:02:46 INFO BlockManagerMaster: BlockManagerMaster stopped
25/09/10 10:02:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/09/10 10:02:46 INFO SparkContext: Successfully stopped SparkContext
25/09/10 10:02:46 INFO ShutdownHookManager: Shutdown hook called
25/09/10 10:02:46 INFO ShutdownHookManager: Deleting directory /tmp/spark-9f42b74d-0290-4162-acc7-fc8ae3aeb5b4
25/09/10 10:02:46 INFO ShutdownHookManager: Deleting directory /tmp/spark-162e62e9-074b-48dd-8df6-95a706b47fe0
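Note on the failure above: the job aborts because zstd-jni extracts its native library to /tmp and the JVM then reports "failed to map segment from shared object" when loading it, which typically means /tmp is mounted noexec (or is out of space) on this host; Parquet needs the codec to read zstd-compressed dictionary pages. One possible workaround is to point the JVM's temp directory at a filesystem that allows executable mappings before any zstd/Parquet class initializes. This is only a sketch under that assumption; the replacement temp path below is hypothetical and the rest of ekg_export.main is elided.

    import org.apache.spark.sql.SparkSession;

    public class EkgExportTmpdirSketch {
        public static void main(String[] args) {
            // Assumption: this directory exists and is NOT mounted noexec,
            // unlike /tmp where libzstd-jni failed to map. Must be set before
            // the first Parquet/zstd class triggers the native library load.
            System.setProperty("java.io.tmpdir", "/data/users/nicholsenpm/tmp");

            SparkSession spark = SparkSession.builder()
                    .appName("ekg_export")
                    .getOrCreate();

            // ... existing read/transform/write logic from ekg_export.main ...

            spark.stop();
        }
    }

In local mode (as in this log, where the executor is the driver JVM) setting the property in-process is enough; on a real cluster the equivalent would be passing the same -Djava.io.tmpdir option through spark.driver.extraJavaOptions and spark.executor.extraJavaOptions at submit time.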