saltmaster
                       ,- kafka-0, ..., kafka-n
                      /
bastion --- cdh-edge
                      \
                       `- cdh-mgr1
                          cdh-dn-0
                          cdh-dn-1, ..., cdh-dn-n
kafka, kafka_manager, platform_testing_general, elk, zookeeper
cloudera_edge, console_frontend, console_backend_data_logger, console_backend_data_manager, graphite, gobblin, deployment_manager, package_repository, data_service, impala-shell, yarn-gateway, hbase_opentsdb_tables, hdfs_cleaner, master_dataset, elk, logserver, kibana_dashboard, jupyter, cloudera_manager, platform_testing_cdh, mysql_connector, pnda_restart
cron, at minutes 0 and 30 of each hour, runs gobblin-mapreduce.sh --conf /opt/pnda/gobblin/configs/mr.pull
KafkaSimpleSource ► PNDAConverter ► [SchemaRowCheckPolicy] ► PNDAKiteWriter to HDFS via cdh-mgr1
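The four stages above can be sketched roughly as follows. This is a minimal illustration in Python, not the Gobblin Java classes themselves: the function names, the JSON payload format, and the required fields `source` and `timestamp` are all assumptions made for the example, and the writer is a plain callable standing in for the HDFS dataset writer. The row check is optional, mirroring the brackets around SchemaRowCheckPolicy.

```python
import json
from typing import Callable, Iterable, Iterator, Optional

Record = dict

# Hypothetical field names, chosen only for this illustration.
REQUIRED_FIELDS = ("source", "timestamp")


def kafka_source(messages: Iterable[bytes]) -> Iterator[bytes]:
    """Stand-in for the source stage: yields raw message payloads."""
    yield from messages


def convert(raw: bytes) -> Optional[Record]:
    """Stand-in for the converter stage.

    Assumes JSON payloads for simplicity; returns None when the payload
    cannot be decoded into a dict.
    """
    try:
        value = json.loads(raw)
    except ValueError:  # covers JSON syntax errors and invalid UTF-8
        return None
    return value if isinstance(value, dict) else None


def row_check(record: Record) -> bool:
    """Stand-in for the row-level policy: all required fields present."""
    return all(field in record for field in REQUIRED_FIELDS)


def run_pipeline(
    messages: Iterable[bytes],
    write: Callable[[Record], None],
    check: Optional[Callable[[Record], bool]] = row_check,
) -> int:
    """Pull, convert, optionally check, then write. Returns records written."""
    written = 0
    for raw in kafka_source(messages):
        record = convert(raw)
        if record is None:
            continue  # undecodable payload is dropped
        if check is not None and not check(record):
            continue  # record failed the row-level check
        write(record)
        written += 1
    return written


if __name__ == "__main__":
    sink = []  # stand-in for the dataset writer
    batch = [b'{"source": "a", "timestamp": 1}', b"not json", b'{"source": "b"}']
    print(run_pipeline(batch, sink.append), sink)
```

Passing `check=None` skips the row-level stage, so only undecodable payloads are dropped.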
cloudera_namenode, mysql_connector, oozie_database, hue, opentsdb, grafana
Kite - http://kitesdk.org/docs/current/ - a dataset API for Hadoop.
Impala - https://impala.incubator.apache.org/ - analytic SQL queries on Hadoop.
Spark - http://spark.apache.org/ - fast, general engine for large-scale data processing.
Oozie - http://oozie.apache.org/ - workflow scheduler for Hadoop jobs.
YARN - https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html - Hadoop resource manager.
Hive - https://hive.apache.org/ - SQL access to data in HDFS.
HBase - https://hbase.apache.org/ - Bigtable-style store on Hadoop HDFS.