Gabor Somogyi gaborgsomogyi

0.10.0.1 => 2.0.0 (https://issues.apache.org/jira/browse/SPARK-18057):
- Java7 support dropped
- Test only: server.boundPort added ListenerName parameter
- Test only: server.replicaManager.getPartition instead of (tp.topic, tp.partition) only TopicPartition is the parameter
- Test only: server.getLogManager().logDirs moved to server.getLogManager().liveLogDirs
2.0.0 => 2.1.0 (https://issues.apache.org/jira/browse/SPARK-25954):
- No breaking
- Java11 support added
2.1.0 => 2.1.1 (https://issues.apache.org/jira/browse/SPARK-26916):
No breaking

Using Gawk:

git log --author="Your_Name_Here" --pretty=tformat: --numstat | gawk '{ add += $1; subs += $2; loc += $1 - $2 } END { printf "added lines: %s removed lines: %s total lines: %s\n", add, subs, loc }' -

Using Awk on Mac OSX:

git log --author="Your_Name_Here" --pretty=tformat: --numstat | awk '{ add += $1; subs += $2; loc += $1 - $2 } END { printf "added lines: %s, removed lines: %s, total lines: %s\n", add, subs, loc }' -

Lag calculation

The lag in Kafka topic calculated in the following way: endOffsets - committedOffsets

endOffsets

To fetch this the following steps can be made:

Get available TopicPartitions with AdminClient describeTopics(java.util.Collection<java.lang.String> topicNames) API.
Create a consumer and fetch the endOffsets with endOffsets(java.util.Collection<TopicPartition> partitions) API.

Common:

val groupIdPrefix = spark-kafka-sources or configured with kafka.groupIdPrefix

Driver:

var nextId = 0
s"${groupIdPrefix}-${UUID.randomUUID}-${metadataPath.hashCode}-driver-${nextId}"
nextId += 1

Executor:

s"${groupIdPrefix}-${UUID.randomUUID}-${metadataPath.hashCode}-executor"

The old scala API: kafka.(consumer|producer)
The new java API: org.apache.kafka.(consumer|producer)

Parameters are arriving to source/sink lowercase.

maxOffsetsPerTrigger parameter as an example:

KafkaSourceProvider uses caseInsensitiveParams which converts keys to lowercase
KafkaMicroBatchStream uses CaseInsensitiveStringMap where get operation uses lowercase conversion
KafkaSource uses CaseInsensitiveMap where get operation uses lowercase conversion

In the last case CaseInsensitiveMap extends Map and as said it provides lowercase key lookup but in the [interface](https://github.com/apache/spark/blob/3e4

	ContainerId string format is changed if RM restarts with work-preserving recovery enabled.

	It used to be such format:
	container_{clusterTimestamp}_{appId}_{attemptId}_{containerId}
	e.g.: container_1410901177871_0001_01_000005.

	It is now changed to:
	container_e{epoch}_{clusterTimestamp}_{appId}_{attemptId}_{containerId}
	e.g.: container_e17_1410901177871_0001_01_000005.

	$ cat consumer.properties
	security.protocol=SASL_SSL
	sasl.kerberos.service.name=kafka
	ssl.truststore.location=/etc/cdep-ssl-conf/CA_STANDARD/truststore.jks
	ssl.truststore.password=cloudera

	$ cat jaas.conf
	KafkaClient {
	com.sun.security.auth.module.Krb5LoginModule required
	useKeyTab=true

	$ spark-shell
	> spark.sql("SET -v").show(999, false)