- 0.10.0.1 => 2.0.0 (https://issues.apache.org/jira/browse/SPARK-18057):
  - Java 7 support dropped
  - Test only: `server.boundPort` added `ListenerName` parameter
  - Test only: `server.replicaManager.getPartition` instead of `(tp.topic, tp.partition)` only `TopicPartition` is the parameter
  - Test only: `server.getLogManager().logDirs` moved to `server.getLogManager().liveLogDirs` (see the sketch below)
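A rough sketch of what the changed test-only call sites look like, assuming `server` is an embedded `kafka.server.KafkaServer` started by the test harness (the topic name and partition are placeholders):

```scala
import kafka.server.KafkaServer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.network.ListenerName
import org.apache.kafka.common.security.auth.SecurityProtocol

// Sketch only: `server` is an embedded broker started elsewhere by the tests.
def inspect(server: KafkaServer): Unit = {
  // boundPort now takes a ListenerName parameter
  val port = server.boundPort(ListenerName.forSecurityProtocol(SecurityProtocol.PLAINTEXT))

  // getPartition now takes a TopicPartition instead of (topic, partition)
  val maybePartition = server.replicaManager.getPartition(new TopicPartition("topic", 0))

  // logDirs moved to liveLogDirs
  val dirs = server.getLogManager().liveLogDirs

  println(s"bound port: $port, partition: $maybePartition, log dirs: $dirs")
}
```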
- 2.0.0 => 2.1.0 (https://issues.apache.org/jira/browse/SPARK-25954):
  - No breaking changes
  - Java 11 support added
- 2.1.0 => 2.1.1 (https://issues.apache.org/jira/browse/SPARK-26916):
  - No breaking changes
- Using Gawk:
git log --author="Your_Name_Here" --pretty=tformat: --numstat | gawk '{ add += $1; subs += $2; loc += $1 - $2 } END { printf "added lines: %s removed lines: %s total lines: %s\n", add, subs, loc }' -
- Using Awk on Mac OSX:
git log --author="Your_Name_Here" --pretty=tformat: --numstat | awk '{ add += $1; subs += $2; loc += $1 - $2 } END { printf "added lines: %s, removed lines: %s, total lines: %s\n", add, subs, loc }' -
The lag in a Kafka topic is calculated the following way: endOffsets - committedOffsets.
To fetch these, the following steps can be taken (see the sketch below):
- Get the available `TopicPartition`s with the `AdminClient` `describeTopics(java.util.Collection<java.lang.String> topicNames)` API.
- Create a consumer and fetch the end offsets with the `endOffsets(java.util.Collection<TopicPartition> partitions)` API.
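A minimal sketch of the lag calculation (broker address, topic and group id are illustrative; error handling omitted; the committed offsets of the group are read here via `AdminClient.listConsumerGroupOffsets`):

```scala
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.admin.AdminClient
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.ByteArrayDeserializer

// Per-partition lag = endOffset - committedOffset for a given topic and consumer group.
def topicLag(bootstrap: String, topic: String, groupId: String): Map[TopicPartition, Long] = {
  val adminProps = new Properties()
  adminProps.put("bootstrap.servers", bootstrap)
  val admin = AdminClient.create(adminProps)

  val consumerProps = new Properties()
  consumerProps.put("bootstrap.servers", bootstrap)
  consumerProps.put("group.id", groupId)
  consumerProps.put("key.deserializer", classOf[ByteArrayDeserializer].getName)
  consumerProps.put("value.deserializer", classOf[ByteArrayDeserializer].getName)
  val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](consumerProps)

  try {
    // 1. Get the available TopicPartitions of the topic (assumed to exist)
    val partitions = admin.describeTopics(java.util.Arrays.asList(topic)).all().get()
      .get(topic).partitions().asScala
      .map(info => new TopicPartition(topic, info.partition()))

    // 2. End offsets from a consumer, committed offsets of the group from the AdminClient
    val endOffsets = consumer.endOffsets(partitions.asJava).asScala
    val committed = admin.listConsumerGroupOffsets(groupId)
      .partitionsToOffsetAndMetadata().get().asScala

    endOffsets.map { case (tp, end) =>
      tp -> (end.longValue() - committed.get(tp).map(_.offset()).getOrElse(0L))
    }.toMap
  } finally {
    consumer.close()
    admin.close()
  }
}
```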
Common:
- val groupIdPrefix = `spark-kafka-sources`, or configured with `kafka.groupIdPrefix`
Driver:
- var nextId = 0
- s"${groupIdPrefix}-${UUID.randomUUID}-${metadataPath.hashCode}-driver-${nextId}"
- nextId += 1
Executor:
- s"${groupIdPrefix}-${UUID.randomUUID}-${metadataPath.hashCode}-executor"
ContainerId string format is changed if the RM restarts with work-preserving recovery enabled.
It used to be in the format container_{clusterTimestamp}_{appId}_{attemptId}_{containerId}, e.g. container_1410901177871_0001_01_000005.
It is now changed to container_e{epoch}_{clusterTimestamp}_{appId}_{attemptId}_{containerId}, e.g. container_e17_1410901177871_0001_01_000005.
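A small sketch that distinguishes the two formats (the regex and helper are illustrative, not part of YARN):

```scala
// Matches both the old and the new (epoch-prefixed) ContainerId string formats.
val containerIdPattern = """container_(e\d+_)?(\d+)_(\d+)_(\d+)_(\d+)""".r

def epochOf(containerId: String): Option[Long] = containerId match {
  case containerIdPattern(epoch, _, _, _, _) =>
    // the epoch group is null for the old format
    Option(epoch).map(_.stripPrefix("e").stripSuffix("_").toLong)
  case _ => None
}

assert(epochOf("container_1410901177871_0001_01_000005").isEmpty)           // old format
assert(epochOf("container_e17_1410901177871_0001_01_000005").contains(17L)) // new format, epoch 17
```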
rm -rf .git/gitk.cache
$ spark-shell
> spark.sql("SET -v").show(999, false)
- The old Scala API: kafka.(consumer|producer)
- The new Java API: org.apache.kafka.clients.(consumer|producer)
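At the import level the difference looks like this (the old Scala classes are only on the classpath with the old core jar, so they are shown commented out):

```scala
// Old Scala API (needs the old kafka_2.x core jar):
// import kafka.consumer.Consumer
// import kafka.producer.Producer

// New Java API (shipped in kafka-clients):
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.KafkaProducer
```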
Parameters arrive at the source/sink in lowercase.
Taking the maxOffsetsPerTrigger parameter as an example:
- `KafkaSourceProvider` uses caseInsensitiveParams which converts keys to lowercase
- `KafkaMicroBatchStream` uses `CaseInsensitiveStringMap` where the get operation uses lowercase conversion
- `KafkaSource` uses `CaseInsensitiveMap` where the get operation uses lowercase conversion
In the last case, `CaseInsensitiveMap` extends `Map` and, as said, provides lowercase key lookup, but in the [interface](https://github.com/apache/spark/blob/3e4
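A minimal sketch of the lowercase-key lookup behaviour (an illustrative stand-in, not the actual CaseInsensitiveMap / CaseInsensitiveStringMap classes):

```scala
import java.util.Locale

// Stores keys lowercased and lowercases the key on lookup as well.
class LowercaseKeyMap(options: Map[String, String]) {
  private val normalized =
    options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] =
    normalized.get(key.toLowerCase(Locale.ROOT))
}

val opts = new LowercaseKeyMap(Map("maxOffsetsPerTrigger" -> "1000"))
assert(opts.get("maxoffsetspertrigger").contains("1000")) // case of the caller's key does not matter
assert(opts.get("MAXOFFSETSPERTRIGGER").contains("1000"))
```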
$ cat consumer.properties
security.protocol=SASL_SSL
sasl.kerberos.service.name=kafka
ssl.truststore.location=/etc/cdep-ssl-conf/CA_STANDARD/truststore.jks
ssl.truststore.password=cloudera

$ cat jaas.conf
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
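A sketch of wiring the properties above into a plain consumer (broker address and group id are made up; the JAAS file is normally handed to the JVM with -Djava.security.auth.login.config=jaas.conf):

```scala
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

val props = new Properties()
props.put("bootstrap.servers", "broker.example.com:9093")  // assumed broker address
props.put("group.id", "test-group")                        // assumed group id
props.put("security.protocol", "SASL_SSL")
props.put("sasl.kerberos.service.name", "kafka")
props.put("ssl.truststore.location", "/etc/cdep-ssl-conf/CA_STANDARD/truststore.jks")
props.put("ssl.truststore.password", "cloudera")
props.put("key.deserializer", classOf[StringDeserializer].getName)
props.put("value.deserializer", classOf[StringDeserializer].getName)

val consumer = new KafkaConsumer[String, String](props)
```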