Gabor Somogyi gaborgsomogyi

  • Using Gawk:

git log --author="Your_Name_Here" --pretty=tformat: --numstat | gawk '{ add += $1; subs += $2; loc += $1 - $2 } END { printf "added lines: %s removed lines: %s total lines: %s\n", add, subs, loc }' -

  • Using awk on macOS:

git log --author="Your_Name_Here" --pretty=tformat: --numstat | awk '{ add += $1; subs += $2; loc += $1 - $2 } END { printf "added lines: %s, removed lines: %s, total lines: %s\n", add, subs, loc }' -
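The arithmetic in the awk one-liners can be sketched in Scala as follows. This is only an illustration: `NumstatTotals` and `totals` are hypothetical names, and the sketch assumes the input is the tab-separated `git log --numstat` output piped to standard input.

```scala
import scala.util.Try

object NumstatTotals {
  // Sums the first two columns of `git log --numstat` output.
  // Returns (added, removed, added - removed).
  def totals(lines: Iterator[String]): (Long, Long, Long) = {
    // Binary files are reported as "-"; count them as 0,
    // which matches awk's numeric coercion of non-numeric fields.
    def toCount(field: String): Long = Try(field.toLong).getOrElse(0L)

    val (added, removed) = lines
      .map(_.split("\t"))
      .filter(_.length >= 2) // skips the empty separator lines between commits
      .foldLeft((0L, 0L)) { case ((a, r), fields) =>
        (a + toCount(fields(0)), r + toCount(fields(1)))
      }
    (added, removed, added - removed)
  }

  def main(args: Array[String]): Unit = {
    val (added, removed, net) = totals(scala.io.Source.stdin.getLines())
    println(s"added lines: $added, removed lines: $removed, total lines: $net")
  }
}
```

Note that "total lines" here is the net change (added minus removed), not the size of the code base.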

Common:

  • val groupIdPrefix = "spark-kafka-sources", or the value configured with kafka.groupIdPrefix

Driver:

  • var nextId = 0
  • s"${groupIdPrefix}-${UUID.randomUUID}-${metadataPath.hashCode}-driver-${nextId}"
  • nextId += 1

Executor:

  • s"${groupIdPrefix}-${UUID.randomUUID}-${metadataPath.hashCode}-executor"
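
The naming scheme above can be put together as a small runnable sketch. `GroupIdGenerator`, its method names, and the sample metadata path are assumptions made for illustration; they are not taken from the Spark code base.

```scala
import java.util.UUID

// Hypothetical wrapper around the driver/executor naming scheme in the notes.
final class GroupIdGenerator(groupIdPrefix: String, metadataPath: String) {
  // Driver-side counter, mirroring `var nextId` in the notes.
  private var nextId = 0

  // Each call yields a fresh driver group ID and advances the counter.
  def nextDriverGroupId(): String = {
    val id = s"$groupIdPrefix-${UUID.randomUUID}-${metadataPath.hashCode}-driver-$nextId"
    nextId += 1
    id
  }

  // Executor group IDs carry no counter, only the fixed "executor" suffix.
  def executorGroupId(): String =
    s"$groupIdPrefix-${UUID.randomUUID}-${metadataPath.hashCode}-executor"
}

object GroupIdDemo {
  def main(args: Array[String]): Unit = {
    // The metadata path below is a made-up example value.
    val gen = new GroupIdGenerator("spark-kafka-sources", "/tmp/checkpoint/sources/0")
    println(gen.nextDriverGroupId()) // ends with "-driver-0"
    println(gen.nextDriverGroupId()) // ends with "-driver-1"
    println(gen.executorGroupId())   // ends with "-executor"
  }
}
```

Because `String.hashCode` can be negative, the generated ID may contain a double hyphen before the hash segment.
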

The ContainerId string format changes if the RM restarts with work-preserving recovery enabled.
The format used to be:
container_{clusterTimestamp}_{appId}_{attemptId}_{containerId}
e.g.: container_1410901177871_0001_01_000005.
It is now:
container_e{epoch}_{clusterTimestamp}_{appId}_{attemptId}_{containerId}
e.g.: container_e17_1410901177871_0001_01_000005.

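
Code that parses container IDs has to accept both formats. The sketch below shows one way to do that with a single regular expression whose epoch segment is optional. `ContainerIdParts` and `ContainerIdParser` are hypothetical helpers written for this note and are not part of the YARN API.

```scala
import scala.util.Try

// Hypothetical holder for the fields named in the two formats above.
final case class ContainerIdParts(
    epoch: Option[Int],
    clusterTimestamp: Long,
    appId: Int,
    attemptId: Int,
    containerId: Long)

object ContainerIdParser {
  // Optional "e{epoch}_" segment, followed by four numeric fields.
  private val Pattern = """container_(?:e(\d+)_)?(\d+)_(\d+)_(\d+)_(\d+)""".r

  def parse(s: String): Option[ContainerIdParts] = s match {
    case Pattern(epoch, ts, app, attempt, container) =>
      // An unmatched optional group is null, so Option(epoch) becomes None.
      // Try guards against digit runs too long for Int/Long.
      Try(ContainerIdParts(
        Option(epoch).map(_.toInt), ts.toLong, app.toInt, attempt.toInt, container.toLong)).toOption
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("container_1410901177871_0001_01_000005"))
    println(parse("container_e17_1410901177871_0001_01_000005"))
  }
}
```

With the two example IDs, the old format parses with `epoch = None` and the new one with `epoch = Some(17)`; the remaining fields are identical.
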
$ spark-shell
> spark.sql("SET -v").show(999, false)
  • The old scala API: kafka.(consumer|producer)
  • The new java API: org.apache.kafka.(consumer|producer)

Parameter keys arrive at the source/sink in lowercase.

Taking the maxOffsetsPerTrigger parameter as an example:

  • KafkaSourceProvider uses caseInsensitiveParams, which converts keys to lowercase
  • KafkaMicroBatchStream uses CaseInsensitiveStringMap, whose get operation lowercases the key
  • KafkaSource uses CaseInsensitiveMap, whose get operation lowercases the key
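
The lookup behaviour described in the list can be illustrated with a simplified stand-in. `LowercaseKeyMap` is a hypothetical class written for this note; it is not Spark's `CaseInsensitiveMap` or `CaseInsensitiveStringMap`, and it only mimics the lowercase-on-lookup idea.

```scala
import java.util.Locale

// Simplified stand-in: keys are lowercased once on construction
// and again on every lookup.
final class LowercaseKeyMap(original: Map[String, String]) {
  private val lowered: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] =
    lowered.get(key.toLowerCase(Locale.ROOT))
}

object LowercaseKeyMapDemo {
  def main(args: Array[String]): Unit = {
    val options = new LowercaseKeyMap(Map("maxOffsetsPerTrigger" -> "1000"))
    println(options.get("maxOffsetsPerTrigger")) // Some(1000)
    println(options.get("MAXOFFSETSPERTRIGGER")) // Some(1000)
    println(options.get("minPartitions"))        // None
  }
}
```

Any spelling of the key resolves to the same entry, which is why an option such as maxOffsetsPerTrigger is found regardless of the casing used by the caller.
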

In the last case CaseInsensitiveMap extends Map and as said it provides lowercase key lookup but in the [interface](https://github.com/apache/spark/blob/3e4

$ cat consumer.properties
security.protocol=SASL_SSL
sasl.kerberos.service.name=kafka
ssl.truststore.location=/etc/cdep-ssl-conf/CA_STANDARD/truststore.jks
ssl.truststore.password=cloudera
$ cat jaas.conf
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true