Kinda old demo of the use of Confluent Cloud to bridge AWS and GCP. With C3 and KSQL and stuff!
# Apache Kafka ecosystem “as a service” on GCP with Confluent Cloud
Gwen Shapira <[email protected]>
v0.01, 9 Jun 2018
## Setup: 24h prior to demo
* Make sure you have a user on https://confluent.cloud with 2 clusters, one on AWS and one on GCP
* Make sure you have the ccloud CLI on your laptop
* Create a "utility" node on AWS with CPE 5.0:
ssh -i ~/Dropbox/keys/gwen-demo-keys.pem [email protected]
wget https://s3-us-west-2.amazonaws.com/confluent-packages-5.0.0-beta/beta30/archive/5.0/confluent-5.0.0-beta30-2.11.tar.gz
tar -xzf confluent-5.0.0-beta30-2.11.tar.gz
* Create a "utility" node on GCP with CPE 5.0:
gcloud compute --project "gwen-test-202722" ssh --zone "us-central1-c" "gwen-demo"
wget https://s3-us-west-2.amazonaws.com/confluent-packages-5.0.0-beta/beta30/archive/5.0/confluent-5.0.0-beta30-2.11.tar.gz
tar -xzf confluent-5.0.0-beta30-2.11.tar.gz
sudo apt-get install default-jdk
and also CPE 4.1, because we need the Connect sink connectors and they don't currently work in 5.0:
wget https://s3-us-west-2.amazonaws.com/confluent-packages-4.1.1/archive/4.1/confluent-4.1.1-2.11.tar.gz
tar -xzf confluent-4.1.1-2.11.tar.gz
* Make sure the following ports are open to the world on GCP: 9021 (C3), 8081 (Schema Registry)
* Start Confluent Platform on GCP:
** Copy ccloud-gcp/config to ~/.ccloud/config on GCP and ccloud-aws/config to ~/.ccloud/config on AWS
** Copy Yeva's delta config generator to both GCP and AWS: https://raw.githubusercontent.com/confluentinc/quickstart-demos/master/ccloud/ccloud-generate-cp-configs.sh
** Run Yeva's delta config generator to get the delta_configs directory (one way to run it is sketched below)
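For example (a sketch, run on both utility nodes; I'm assuming the script reads the cluster config from ~/.ccloud/config and writes its output into delta_configs/, which is what the steps below expect, so check the script itself for the exact invocation):
wget https://raw.githubusercontent.com/confluentinc/quickstart-demos/master/ccloud/ccloud-generate-cp-configs.sh
chmod +x ccloud-generate-cp-configs.sh
./ccloud-generate-cp-configs.sh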
** Update Schema Registry config with delta and start Schema Registry:
cat delta_configs/schema-registry-ccloud.delta >> confluent-5.0.0-beta30/etc/schema-registry/schema-registry.properties
sed -i 's/kafkastore.connection.url=localhost:2181/#kafkastore.connection.url=localhost:2181/g' confluent-5.0.0-beta30/etc/schema-registry/schema-registry.properties
confluent-5.0.0-beta30/bin/schema-registry-start -daemon confluent-5.0.0-beta30/etc/schema-registry/schema-registry.properties
** Install BigQuery Connector:
confluent-5.0.0-beta30/bin/confluent-hub install wepay/bigquery-sink-connector:1.1.0
and copy it to the 4.1.1 plugin path:
cp -r ./confluent-5.0.0-beta30/share/confluent-hub-components/wepay-kafka-connect-bigquery/ ./confluent-4.1.1/share/java/
** Update Connect configs and start Connect:
There's no delta script for Connect, so copy the Java client configs from the Confluent Cloud client page into confluent-4.1.1/etc/kafka/connect-distributed.properties and replace the placeholder secrets with the actual secrets.
Get rid of (or bump to 3) any settings where the replication factor is 1, since replication factor 1 won't work against the cloud cluster. A sketch of the resulting delta follows the start command below.
confluent-4.1.1/bin/connect-distributed -daemon confluent-4.1.1/etc/kafka/connect-distributed.properties
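A minimal sketch of what the hand-copied delta might look like, appended to the worker config before starting it (the bootstrap servers, API key, and secret are placeholders, not real values):
cat <<EOF >> confluent-4.1.1/etc/kafka/connect-distributed.properties
# Confluent Cloud (GCP) connection -- placeholders, substitute your own values
bootstrap.servers=<GCP_CCLOUD_BOOTSTRAP_SERVERS>
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=PLAIN
producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";
consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=PLAIN
consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<API_KEY>" password="<API_SECRET>";
# Connect's internal topics need replication factor 3 on the cloud cluster
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
EOF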
** Update Control Center configs and start Control Center:
cat delta_configs/control-center-ccloud.delta >> confluent-5.0.0-beta30/etc/confluent-control-center/control-center-production.properties
Add the same Confluent Cloud configs for the other (AWS) cluster, with that cluster's credentials. They are the same settings, but every key is prefixed with "confluent.controlcenter.kafka.aws." (see the sketch after this step).
sudo confluent-5.0.0-beta30/bin/control-center-start -daemon confluent-5.0.0-beta30/etc/confluent-control-center/control-center-production.properties
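For example, appended before starting Control Center (the AWS bootstrap servers and credentials are placeholders):
cat <<EOF >> confluent-5.0.0-beta30/etc/confluent-control-center/control-center-production.properties
# Second (AWS) cluster for Control Center -- placeholders, substitute real values
confluent.controlcenter.kafka.aws.bootstrap.servers=<AWS_CCLOUD_BOOTSTRAP_SERVERS>
confluent.controlcenter.kafka.aws.security.protocol=SASL_SSL
confluent.controlcenter.kafka.aws.sasl.mechanism=PLAIN
confluent.controlcenter.kafka.aws.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="<AWS_API_KEY>" password="<AWS_API_SECRET>";
EOF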
* Load clickstream data into AWS
ccloud -c ccloud-aws topic create aws.users
ccloud -c ccloud-aws topic create aws.pageviews
ccloud -c ccloud-aws topic list
ssh -i ~/Dropbox/keys/gwen-demo-keys.pem [email protected]
cat delta_configs/ksql-datagen.delta >> confluent-5.0.0-beta30/etc/ksql/datagen.properties
confluent-5.0.0-beta30/bin/ksql-datagen quickstart=users format=avro topic=aws.users maxInterval=1000 iterations=100 schemaRegistryUrl=http://35.232.79.78:8081 propertiesFile=confluent-5.0.0-beta30/etc/ksql/datagen.properties &>/dev/null &
confluent-5.0.0-beta30/bin/ksql-datagen quickstart=pageviews format=json topic=aws.pageviews maxInterval=100 iterations=1000 propertiesFile=confluent-5.0.0-beta30/etc/ksql/datagen.properties &>/dev/null &
* Start KSQL
cp delta_configs/ksql-server-ccloud.delta ./ksql-server-ccloud.properties
cat <<EOF >> ./ksql-server-ccloud.properties
listeners=http://0.0.0.0:8088
ksql.server.ui.enabled=true
auto.offset.reset=earliest
commit.interval.ms=0
cache.max.bytes.buffering=0
ksql.schema.registry.url=http://localhost:8081
state.dir=ksql-server/data-ccloud/kafka-streams
EOF
confluent-5.0.0-beta30/bin/ksql-server-start ksql-server-ccloud.properties > ksql-server-ccloud.stdout 2>&1 &
* Print this script and take it with you
## Setup: 10 min prior to demo
* Validate that you can connect to the clusters and view topics:
ccloud -c ccloud-gcp topic list
ccloud -c ccloud-aws topic list
* Open a browser window, logged in to https://confluent.cloud
* Open a browser window to C3: http://35.232.79.78:9021
* Make sure C3, Connect, Schema Registry and KSQL are all running in GCP
* Produce pageviews slowly: to AWS if Replicator is working, to GCP if not:
confluent-5.0.0-beta30/bin/ksql-datagen quickstart=pageviews format=json topic=gcp.aws.pageviews maxInterval=1000 propertiesFile=confluent-5.0.0-beta30/etc/ksql/datagen.properties &>/dev/null &
## Demo:
Let's see how Confluent Cloud lets you easily spin up Kafka clusters and manage data across multiple cloud providers.
You can see that I already have two clusters here, one on AWS and one on GCP. If I try to create another cluster, I can select which cloud provider to use. You can even see the price change.
Now let's go to my management console and take a look.
I can see the topics in AWS and I can even drill down and look at the data in the topic.
Suppose that I actually need this data in GCP. Let's start a Replicator process and wait a bit. This process copies events from the AWS topic to the GCP topic in real time, as they happen. It is also highly configurable: you control what to copy and how, and you can even modify events on the fly, for example hiding sensitive information as it moves from on-premises to the cloud. (A sketch of a Replicator connector config is below, for reference.)
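For reference, if you'd rather submit Replicator to the Connect worker yourself instead of starting it from the UI, a connector config roughly along these lines should do the same copy; all bootstrap servers and credentials are placeholders, and extra settings (license, etc.) may be needed:
curl -s -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "replicate-aws-to-gcp",
  "config": {
    "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
    "key.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
    "value.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
    "topic.whitelist": "aws.users,aws.pageviews",
    "topic.rename.format": "gcp.${topic}",
    "src.kafka.bootstrap.servers": "<AWS_CCLOUD_BOOTSTRAP_SERVERS>",
    "src.kafka.security.protocol": "SASL_SSL",
    "src.kafka.sasl.mechanism": "PLAIN",
    "src.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<AWS_API_KEY>\" password=\"<AWS_API_SECRET>\";",
    "dest.kafka.bootstrap.servers": "<GCP_CCLOUD_BOOTSTRAP_SERVERS>",
    "dest.kafka.security.protocol": "SASL_SSL",
    "dest.kafka.sasl.mechanism": "PLAIN",
    "dest.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<GCP_API_KEY>\" password=\"<GCP_API_SECRET>\";"
  }
}'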
Now let's switch to the GCP cluster and look at the data. YAY! We can also check out the schema definitions on the topics.
We have the data, so we should probably do something useful with it. Let's count pageviews by region and gender in a 30-second window (a sketch of the KSQL statements is below).
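A sketch of the KSQL statements, run from the KSQL UI or CLI, assuming the replicated topics landed in GCP as gcp.aws.pageviews (JSON) and gcp.aws.users (Avro); adjust the topic names to whatever Replicator actually produced:
CREATE STREAM pageviews (viewtime BIGINT, userid VARCHAR, pageid VARCHAR)
  WITH (KAFKA_TOPIC='gcp.aws.pageviews', VALUE_FORMAT='JSON');
CREATE TABLE users
  WITH (KAFKA_TOPIC='gcp.aws.users', VALUE_FORMAT='AVRO', KEY='userid');
CREATE STREAM pageviews_enriched AS
  SELECT users.userid AS userid, pageid, regionid, gender
  FROM pageviews LEFT JOIN users ON pageviews.userid = users.userid;
CREATE TABLE pageviews_regions AS
  SELECT gender, regionid, COUNT(*) AS numviews
  FROM pageviews_enriched
  WINDOW TUMBLING (SIZE 30 SECONDS)
  GROUP BY gender, regionid;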
We have the data in GCP, but it doesn't need to stay in Kafka. As we've seen, Kafka is at its best when it integrates systems. Let's start streaming the data to BigQuery so we can run a few reports (a sketch of the sink connector config is below).
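For reference, submitting the BigQuery sink to the Connect worker might look roughly like this; the topic, GCP project, dataset, and key file are placeholders, and the exact property names should be checked against the wepay connector's 1.1.0 docs:
curl -s -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "bigquery-sink",
  "config": {
    "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
    "topics": "PAGEVIEWS_REGIONS",
    "project": "<GCP_PROJECT_ID>",
    "datasets": ".*=<BIGQUERY_DATASET>",
    "keyfile": "/path/to/service-account-key.json",
    "autoCreateTables": "true",
    "sanitizeTopics": "true"
  }
}'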