com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException
Serialization trace:
filesToBeDeleted (org.apache.hudi.avro.model.HoodieListingBasedRollbackRequest)
at com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:144)
at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:543)
at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:731)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:391)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.read(DefaultArraySerializers.java:302)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:813)
at com.twitter.chill.WrappedArraySerializer.read(WrappedArraySerializer.scala:36)
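A minimal diagnostic sketch (not from the gist): making Kryo registration explicit in the Spark session config can surface serialization problems like this NullPointerException at setup time rather than mid-job. The config keys are standard Spark ones; the app name and master are assumptions.

// Hedged sketch: explicit Kryo setup to isolate serializer-specific failures.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().
  appName("kryo-npe-diagnosis").   // assumed app name
  master("local[*]").              // assumed master
  config("spark.serializer", "org.apache.spark.serializer.KryoSerializer").
  // Registering the failing class up front makes serializer misconfiguration fail fast
  config("spark.kryo.classesToRegister",
    "org.apache.hudi.avro.model.HoodieListingBasedRollbackRequest").
  getOrCreate()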
21/09/10 14:34:34 WARN DagScheduler: Executing node "first_delete" :: {"name":"ec2e6f6d-8685-4828-831e-fba844da1ef2","num_partitions_delete":50,"num_records_delete":8000,"config":"first_delete"}
21/09/10 14:38:35 ERROR Executor: Exception in task 0.0 in stage 106.0 (TID 2445)
java.io.InterruptedIOException: getFileStatus on s3a://siva-test-bucket-june-16/hudi_testing/hudi_metadata1/output/1970/01/15/76112e22-c3b9-4497-9e9b-2c5a449d7477-0_20-21-148_20210910143308.parquet: com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException(S3AUtils.java:352)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:177)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:151)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2201)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2163)
at org.apache.hadoop.fs.s3a.S3
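The "Timeout waiting for connection from pool" failure above typically means the S3A client's HTTP connection pool is exhausted. A hedged sketch of the standard Hadoop S3A knobs that widen it (the values shown are assumptions to be tuned per workload):

// Hedged sketch: widen the S3A connection pool and thread limits (standard Hadoop S3A keys).
spark.sparkContext.hadoopConfiguration.set("fs.s3a.connection.maximum", "200") // assumed value
spark.sparkContext.hadoopConfiguration.set("fs.s3a.threads.max", "64")         // assumed value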
21/09/09 20:25:13 WARN SparkSqlCreateTableNode: ----- Running the following Spark SQL query -----
21/09/09 20:25:13 WARN SparkSqlCreateTableNode: create table table1 (timestamp bigint,
_row_key string,
rider string,
driver string,
begin_lat double,
begin_lon double,
end_lat double,
end_lon double,
fare double,
// Prepare the bootstrap base path: 1000 records, partitioned by a derived date column
import org.apache.spark.sql.functions._

val df = spark.read.format("parquet").load("/tmp/bootstrap_src").limit(1000)
val df1 = df.select(col("*"), substring(col("created_at"), 0, 10).as("date_col")).drop("created_at")
df1.write.parquet("/tmp/bootstrap_src_parquet/")

// Read the prepared data back and inspect the schema
val parquetDf = spark.read.format("parquet").load("/tmp/bootstrap_src_parquet/")
parquetDf.printSchema
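A hedged sketch of the metadata-only bootstrap this parquet path appears to be prepared for (option keys per the Hudi 0.9.x docs; the table name, record key, and target path are assumptions):

// Hedged sketch: metadata-only bootstrap from the prepared base path.
// The empty-DataFrame write pattern follows the Hudi bootstrap documentation.
spark.emptyDataFrame.write.format("hudi").
  option("hoodie.table.name", "bootstrap_tbl").                  // assumed table name
  option("hoodie.datasource.write.operation", "bootstrap").
  option("hoodie.bootstrap.base.path", "/tmp/bootstrap_src_parquet/").
  option("hoodie.bootstrap.keygen.class", "org.apache.hudi.keygen.SimpleKeyGenerator").
  option("hoodie.datasource.write.recordkey.field", "_row_key"). // assumed record key
  option("hoodie.datasource.write.partitionpath.field", "date_col").
  mode("Overwrite").
  save("/tmp/hudi_bootstrap_tbl")                                // assumed target path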
---
title: "Release 0.9.0"
sidebar_position: 2
layout: releases
toc: true
last_modified_at: 2021-08-26T08:40:00-07:00
---
# [Release 0.9.0](https://github.com/apache/hudi/releases/tag/release-0.9.0) ([docs](/docs/quick-start-guide))
## Download Information
spark.time(updatesDf.write.format("hudi").
  option("hoodie.upsert.shuffle.parallelism", "500").
  option(PRECOMBINE_FIELD.key(), "created_at").
  option(RECORDKEY_FIELD.key(), "id").
  option(PARTITIONPATH_FIELD.key(), "type").
  option("hoodie.parquet.compression.codec", "SNAPPY").
  option(OPERATION.key(), "upsert").
  option("hoodie.datasource.write.table.name", "hudi_3").
  option("hoodie.table.name", "hudi_4").
  mode("Append").
  save("s3a://siva-test-bucket-june-16/hudi_testing/hudi_4/"))
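A hedged follow-up to verify the upsert: a plain snapshot read of the same base path (on older Hudi/Spark combinations the path may need a partition glob such as basePath + "/*"):

// Hedged sketch: snapshot-read the table back and spot-check a few rows.
val hudiDf = spark.read.format("hudi").load("s3a://siva-test-bucket-june-16/hudi_testing/hudi_4/")
hudiDf.select("id", "type", "created_at").show(5, false)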
21/08/23 06:20:56 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Requesting driver to remove executor 14 for reason Container from a bad node: container_1629694615075_0009_01_000015 on host: ip-172-31-41-172.us-east-2.compute.internal. Exit status: 143. Diagnostics: [2021-08-23 06:20:56.504]Container killed on request. Exit code is 143
[2021-08-23 06:20:56.504]Container exited with a non-zero exit code 143.
[2021-08-23 06:20:56.505]Killed by external signal
.
21/08/23 06:20:56 ERROR cluster.YarnScheduler: Lost executor 14 on ip-172-31-41-
scala> import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.QuickstartUtils._
scala> import scala.collection.JavaConversions._
import scala.collection.JavaConversions._
scala> import org.apache.spark.sql.SaveMode._
import org.apache.spark.sql.SaveMode._
scala> import org.apache.hudi.DataSourceReadOptions._
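The imports above are from the Hudi quickstart; a hedged sketch of the insert step that usually follows, continuing the same REPL session (QuickstartUtils API per the quickstart guide; the local base path is an assumption):

// Hedged sketch: generate sample trips and write them as a Hudi copy-on-write table.
val tableName = "hudi_trips_cow"
val basePath = "file:///tmp/hudi_trips_cow"   // assumed local path

val dataGen = new DataGenerator
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2))
df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option("hoodie.datasource.write.precombine.field", "ts").
  option("hoodie.datasource.write.recordkey.field", "uuid").
  option("hoodie.datasource.write.partitionpath.field", "partitionpath").
  option("hoodie.table.name", tableName).
  mode("Overwrite").
  save(basePath)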
21/08/18 07:00:38 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1184
21/08/18 07:00:38 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[6] at map at AvroDFSSource.java:65) (first 15 tasks are for partitions Vector(0))
21/08/18 07:00:38 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
21/08/18 07:00:38 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 0, PROCESS_LOCAL, 8017 bytes)
21/08/18 07:00:38 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
21/08/18 07:00:38 INFO rdd.NewHadoopRDD: Input split: s3a://siva-test-bucket-june-16/hudi_testing/hudi-integ-test-suite/input/1/d7a2ecaa-5acc-4b04-a2fb-87b88eea6908.avro:0+158531
21/08/18 07:00:38 WARN mapreduce.AvroKeyInputFormat: Reader schema was not set. Use AvroJob.setInputKeySchema() if desired.
21/08/18 07:00:38 INFO mapreduce.AvroKeyInputFormat: Using a reader schema equal to the writer schema.
21/08/17 04:15:02 INFO netty.NettyBlockTransferService: Server created on ip-172-31-33-9.us-east-2.compute.internal:38471
21/08/17 04:15:02 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/08/17 04:15:02 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ip-172-31-33-9.us-east-2.compute.internal, 38471, None)
21/08/17 04:15:02 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-172-31-33-9.us-east-2.compute.internal:38471 with 2.7 GiB RAM, BlockManagerId(driver, ip-172-31-33-9.us-east-2.compute.internal, 38471, None)
21/08/17 04:15:02 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ip-172-31-33-9.us-east-2.compute.internal, 38471, None)
21/08/17 04:15:02 INFO storage.BlockManager: external shuffle service port = 7337
21/08/17 04:15:02 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, ip-172-31-33-9.us-east-2.compute.internal, 38471, Non
gpg --list-keys
/Users/nsb/.gnupg/pubring.kbx
-----------------------------
pub rsa4096 2014-01-09 [SC]
546ADE39552C5326120BB450B4F1CCC4D3541808
uid [ unknown] Suneel Marthi (CODE SIGNING KEY) <[email protected]>
sub rsa4096 2014-01-09 [E]
pub rsa4096 2019-07-29 [SC]
AF9BAF79D311A3D3288E583F24A499037262AAA4