(pipeline_extractor) nicholsenpm@ncias-d3613-v /data/users/nicholsenpm $ /opt/pipelines/spark-3.5.6-bin-hadoop3-scala2.13/bin/spark-submit \
    --conf spark.sql.parquet.compression.codec=gzip \
    airflow_extractions/BTRIS_CB_Color_Coding_08212025_140528/src/btris_procedure_processed/btris_procedure_processed.jar \
    airflow_extractions/BTRIS_CB_Color_Coding_08212025_140528/output/btris_observation_general_clean \
    airflow_extractions/BTRIS_CB_Color_Coding_08212025_140528/output/btris_red_ancestor_descendant_clean \
    airflow_extractions/BTRIS_CB_Color_Coding_08182025_133543/airflow_extractions/BTRIS_CB_Color_Coding_08212025_140528/output/btris_procedure_processed
25/08/26 11:49:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
25/08/26 11:49:51 INFO SparkContext: Running Spark version 3.5.6
25/08/26 11:49:51 INFO SparkContext: OS info Linux, 4.18.0-553.69.1.el8_10.x86_64, amd64
25/08/26 11:49:51 INFO SparkContext: Java version 11.0.27
25/08/26 11:49:51 INFO ResourceUtils: ==============================================================
25/08/26 11:49:51 INFO ResourceUtils: No custom resources configured for spark.driver.
25/08/26 11:49:51 INFO ResourceUtils: ==============================================================
25/08/26 11:49:51 INFO SparkContext: Submitted application: btris_procedure_processed
25/08/26 11:49:51 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
25/08/26 11:49:51 INFO ResourceProfile: Limiting resource is cpu
25/08/26 11:49:51 INFO ResourceProfileManager: Added ResourceProfile id: 0
25/08/26 11:49:51 INFO SecurityManager: Changing view acls to: nicholsenpm
25/08/26 11:49:51 INFO SecurityManager: Changing modify acls to: nicholsenpm
25/08/26 11:49:51 INFO SecurityManager: Changing view acls groups to:
25/08/26 11:49:51 INFO SecurityManager: Changing modify acls groups to:
25/08/26 11:49:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: nicholsenpm; groups with view permissions: EMPTY; users with modify permissions: nicholsenpm; groups with modify permissions: EMPTY
25/08/26 11:49:52 INFO Utils: Successfully started service 'sparkDriver' on port 36645.
25/08/26 11:49:52 INFO SparkEnv: Registering MapOutputTracker
25/08/26 11:49:52 INFO SparkEnv: Registering BlockManagerMaster
25/08/26 11:49:52 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
25/08/26 11:49:52 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
25/08/26 11:49:52 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
25/08/26 11:49:52 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5363a0fb-85e6-40b1-b9bd-2bfa4ebf76ea
25/08/26 11:49:52 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
25/08/26 11:49:52 INFO SparkEnv: Registering OutputCommitCoordinator
25/08/26 11:49:52 INFO JettyUtils: Start Jetty 0.0.0.0:4040 for SparkUI
25/08/26 11:49:52 INFO Utils: Successfully started service 'SparkUI' on port 4040.
25/08/26 11:49:52 INFO SparkContext: Added JAR file:/data/users/nicholsenpm/airflow_extractions/BTRIS_CB_Color_Coding_08212025_140528/src/btris_procedure_processed/btris_procedure_processed.jar at spark://ncias-d3613-v.nci.nih.gov:36645/jars/btris_procedure_processed.jar with timestamp 1756223391745
25/08/26 11:49:52 INFO Executor: Starting executor ID driver on host ncias-d3613-v.nci.nih.gov
25/08/26 11:49:52 INFO Executor: OS info Linux, 4.18.0-553.69.1.el8_10.x86_64, amd64
25/08/26 11:49:52 INFO Executor: Java version 11.0.27
25/08/26 11:49:52 INFO Executor: Starting executor with user classpath (userClassPathFirst = false): ''
25/08/26 11:49:52 INFO Executor: Created or updated repl class loader org.apache.spark.util.MutableURLClassLoader@1dcedc93 for default.
25/08/26 11:49:52 INFO Executor: Fetching spark://ncias-d3613-v.nci.nih.gov:36645/jars/btris_procedure_processed.jar with timestamp 1756223391745
25/08/26 11:49:52 INFO TransportClientFactory: Successfully created connection to ncias-d3613-v.nci.nih.gov/10.133.109.39:36645 after 34 ms (0 ms spent in bootstraps)
25/08/26 11:49:52 INFO Utils: Fetching spark://ncias-d3613-v.nci.nih.gov:36645/jars/btris_procedure_processed.jar to /tmp/spark-00d6c64f-8cd2-4e4c-8d6d-0458da8711f8/userFiles-cdd97551-e423-4b0c-b985-ce35a52481a1/fetchFileTemp17430073055883655304.tmp
25/08/26 11:49:52 INFO Executor: Adding file:/tmp/spark-00d6c64f-8cd2-4e4c-8d6d-0458da8711f8/userFiles-cdd97551-e423-4b0c-b985-ce35a52481a1/btris_procedure_processed.jar to class loader default
25/08/26 11:49:52 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40415.
25/08/26 11:49:52 INFO NettyBlockTransferService: Server created on ncias-d3613-v.nci.nih.gov:40415
25/08/26 11:49:52 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
25/08/26 11:49:52 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ncias-d3613-v.nci.nih.gov, 40415, None)
25/08/26 11:49:52 INFO BlockManagerMasterEndpoint: Registering block manager ncias-d3613-v.nci.nih.gov:40415 with 434.4 MiB RAM, BlockManagerId(driver, ncias-d3613-v.nci.nih.gov, 40415, None)
25/08/26 11:49:52 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ncias-d3613-v.nci.nih.gov, 40415, None)
25/08/26 11:49:52 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, ncias-d3613-v.nci.nih.gov, 40415, None)
25/08/26 11:49:53 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
25/08/26 11:49:53 INFO SharedState: Warehouse path is 'file:/data/users/nicholsenpm/spark-warehouse'.
25/08/26 11:49:54 INFO InMemoryFileIndex: It took 53 ms to list leaf files for 1 paths.
25/08/26 11:49:54 INFO SparkContext: Starting job: parquet at btris_procedure_processed.java:169
25/08/26 11:49:54 INFO DAGScheduler: Got job 0 (parquet at btris_procedure_processed.java:169) with 1 output partitions
25/08/26 11:49:54 INFO DAGScheduler: Final stage: ResultStage 0 (parquet at btris_procedure_processed.java:169)
25/08/26 11:49:54 INFO DAGScheduler: Parents of final stage: List()
25/08/26 11:49:54 INFO DAGScheduler: Missing parents: List()
25/08/26 11:49:54 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at btris_procedure_processed.java:169), which has no missing parents
25/08/26 11:49:54 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 103.5 KiB, free 434.3 MiB)
25/08/26 11:49:54 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 37.3 KiB, free 434.3 MiB)
25/08/26 11:49:54 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ncias-d3613-v.nci.nih.gov:40415 (size: 37.3 KiB, free: 434.4 MiB)
25/08/26 11:49:54 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1611
25/08/26 11:49:54 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at btris_procedure_processed.java:169) (first 15 tasks are for partitions Vector(0))
25/08/26 11:49:54 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks resource profile 0
25/08/26 11:49:54 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (ncias-d3613-v.nci.nih.gov, executor driver, partition 0, PROCESS_LOCAL, 9591 bytes)
25/08/26 11:49:54 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
25/08/26 11:49:55 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 3045 bytes result sent to driver
25/08/26 11:49:55 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 489 ms on ncias-d3613-v.nci.nih.gov (executor driver) (1/1)
25/08/26 11:49:55 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
25/08/26 11:49:55 INFO DAGScheduler: ResultStage 0 (parquet at btris_procedure_processed.java:169) finished in 0.645 s
25/08/26 11:49:55 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
25/08/26 11:49:55 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
25/08/26 11:49:55 INFO DAGScheduler: Job 0 finished: parquet at btris_procedure_processed.java:169, took 0.690863 s
25/08/26 11:49:55 INFO BlockManagerInfo: Removed broadcast_0_piece0 on ncias-d3613-v.nci.nih.gov:40415 in memory (size: 37.3 KiB, free: 434.4 MiB)
25/08/26 11:49:56 INFO InMemoryFileIndex: It took 3 ms to list leaf files for 1 paths.
25/08/26 11:49:56 INFO SparkContext: Starting job: parquet at btris_procedure_processed.java:170
25/08/26 11:49:56 INFO DAGScheduler: Got job 1 (parquet at btris_procedure_processed.java:170) with 1 output partitions
25/08/26 11:49:56 INFO DAGScheduler: Final stage: ResultStage 1 (parquet at btris_procedure_processed.java:170)
25/08/26 11:49:56 INFO DAGScheduler: Parents of final stage: List()
25/08/26 11:49:56 INFO DAGScheduler: Missing parents: List()
25/08/26 11:49:56 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at parquet at btris_procedure_processed.java:170), which has no missing parents
25/08/26 11:49:56 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 103.5 KiB, free 434.3 MiB)
25/08/26 11:49:56 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 37.3 KiB, free 434.3 MiB)
25/08/26 11:49:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ncias-d3613-v.nci.nih.gov:40415 (size: 37.3 KiB, free: 434.4 MiB)
25/08/26 11:49:56 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1611
25/08/26 11:49:56 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at parquet at btris_procedure_processed.java:170) (first 15 tasks are for partitions Vector(0))
25/08/26 11:49:56 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks resource profile 0
25/08/26 11:49:56 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1) (ncias-d3613-v.nci.nih.gov, executor driver, partition 0, PROCESS_LOCAL, 9595 bytes)
25/08/26 11:49:56 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
25/08/26 11:49:56 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 2137 bytes result sent to driver
25/08/26 11:49:56 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 23 ms on ncias-d3613-v.nci.nih.gov (executor driver) (1/1)
25/08/26 11:49:56 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
25/08/26 11:49:56 INFO DAGScheduler: ResultStage 1 (parquet at btris_procedure_processed.java:170) finished in 0.048 s
25/08/26 11:49:56 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
25/08/26 11:49:56 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
25/08/26 11:49:56 INFO DAGScheduler: Job 1 finished: parquet at btris_procedure_processed.java:170, took 0.052047 s
Exception in thread "main" java.lang.RuntimeException: Error invoking transform
at btris_procedure_processed.main(btris_procedure_processed.java:191)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1034)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:199)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:222)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1125)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1134)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at btris_procedure_processed.main(btris_procedure_processed.java:188)
... 12 more
Caused by: org.apache.spark.sql.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `ancestor_concept` cannot be resolved. Did you mean one of the following? [`status_concept`, `domain_concept`, `sequence`, `appl_source_cd`, `event_guid`].;
'Filter coalesce(('ancestor_concept <=> C3146179), false)
+- Relation [observation_guid#0,subject_guid#1,child_flag#2,parent_guid#3,event_guid#4,sequence#5,observation_name#6,observation_name_concept#7,observation_value_text#8,observation_value_numeric#9,observation_value_concept#10,observation_value_name#11,unit_of_measure#12,date_administered#13,appl_source_cd#14,observation_note#15,status#16,status_concept#17,domain_concept#18,timestamp_date_administered#19] parquet
at org.apache.spark.sql.errors.QueryCompilationErrors$.unresolvedAttributeError(QueryCompilationErrors.scala:306)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.org$apache$spark$sql$catalyst$analysis$CheckAnalysis$$failUnresolvedAttribute(CheckAnalysis.scala:141)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$6(CheckAnalysis.scala:299)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$6$adapted(CheckAnalysis.scala:297)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
at scala.collection.immutable.Vector.foreach(Vector.scala:1856)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:243)
at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
at scala.collection.AbstractIterable.foreach(Iterable.scala:926)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:243)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5(CheckAnalysis.scala:297)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$5$adapted(CheckAnalysis.scala:297)
at scala.collection.immutable.List.foreach(List.scala:333)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2(CheckAnalysis.scala:297)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis0$2$adapted(CheckAnalysis.scala:215)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:244)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0(CheckAnalysis.scala:215)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis0$(CheckAnalysis.scala:197)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis0(Analyzer.scala:202)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:193)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:171)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:202)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:225)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:222)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:77)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:138)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:219)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:546)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:219)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:900)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:218)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:206)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:212)
at org.apache.spark.sql.Dataset$.apply(Dataset.scala:76)
at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:4357)
at org.apache.spark.sql.Dataset.filter(Dataset.scala:1676)
at PipelineLogic.transform(btris_procedure_processed.java:24)
... 17 more
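The `AnalysisException` above is the heart of the failure: the transform at `btris_procedure_processed.java:24` filters on `ancestor_concept <=> 'C3146179'`, but the DataFrame it received carries the `btris_observation_general` schema (`status_concept`, `domain_concept`, ...), not the ancestor/descendant schema — so the filter is almost certainly being applied to the wrong input DataFrame. Spark's "Did you mean one of the following?" hint is just a closest-match search over the available column names. As a minimal stand-in (this is not Spark's actual implementation, just an illustration of the idea using Python's `difflib`):

```python
import difflib

# Column names taken from the parquet relation printed in the error above.
OBSERVATION_COLUMNS = [
    "observation_guid", "subject_guid", "child_flag", "parent_guid",
    "event_guid", "sequence", "observation_name", "observation_name_concept",
    "observation_value_text", "observation_value_numeric",
    "observation_value_concept", "observation_value_name", "unit_of_measure",
    "date_administered", "appl_source_cd", "observation_note", "status",
    "status_concept", "domain_concept", "timestamp_date_administered",
]

def suggest_columns(requested: str, available: list[str], n: int = 5) -> list[str]:
    """If `requested` is absent, return the closest-matching column names,
    in the spirit of Spark's UNRESOLVED_COLUMN.WITH_SUGGESTION hint."""
    if requested in available:
        return []
    return difflib.get_close_matches(requested, available, n=n, cutoff=0.3)

# The missing column from the failed filter produces suggestions much like
# the ones Spark printed (e.g. status_concept, domain_concept).
print(suggest_columns("ancestor_concept", OBSERVATION_COLUMNS))
```

The useful diagnostic reading: when every suggestion belongs to a *different* table's schema, the bug is usually a swapped or mis-ordered input path rather than a typo in the column name.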
25/08/26 11:49:56 INFO SparkContext: Invoking stop() from shutdown hook
25/08/26 11:49:56 INFO SparkContext: SparkContext is stopping with exitCode 0.
25/08/26 11:49:56 INFO SparkUI: Stopped Spark web UI at http://ncias-d3613-v.nci.nih.gov:4040
25/08/26 11:49:56 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
25/08/26 11:49:56 INFO MemoryStore: MemoryStore cleared
25/08/26 11:49:56 INFO BlockManager: BlockManager stopped
25/08/26 11:49:56 INFO BlockManagerMaster: BlockManagerMaster stopped
25/08/26 11:49:56 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
25/08/26 11:49:56 INFO SparkContext: Successfully stopped SparkContext
25/08/26 11:49:56 INFO ShutdownHookManager: Shutdown hook called
25/08/26 11:49:56 INFO ShutdownHookManager: Deleting directory /tmp/spark-00d6c64f-8cd2-4e4c-8d6d-0458da8711f8
25/08/26 11:49:56 INFO ShutdownHookManager: Deleting directory /tmp/spark-2be6c856-6795-4c90-a95d-df8735bd214a
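Note that the failure only surfaces at analysis time, after both parquet read jobs (jobs 0 and 1) have already run, and the driver still reports `exitCode 0` in its shutdown log line. A cheap schema precondition at the top of the transform would fail faster and name the problem directly. A sketch of such a guard, written here as plain Python over a column-name list (in the actual Java transform the equivalent input would be `df.columns()`; the function name and wording are mine, not the pipeline's):

```python
def require_columns(columns: list[str], required: set[str], name: str = "input") -> None:
    """Fail early, reporting all missing column names at once,
    before any filter or join referencing them is attempted."""
    missing = sorted(required - set(columns))
    if missing:
        raise ValueError(
            f"{name} is missing required column(s) {missing}; "
            f"available: {sorted(columns)}"
        )

# For this pipeline, the ancestor/descendant input would be checked for
# 'ancestor_concept' (and likely 'descendant_concept') right after loading,
# so a swapped input path fails with a clear message instead of a deep
# AnalysisException stack trace.
```

Applied here, the guard would have flagged that the DataFrame passed to the `ancestor_concept` filter was the observation-general dataset, pointing straight at the argument order of the two input paths in the `spark-submit` invocation.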