Timothy Spann (tspannhw): Unstructured Data, Vector Database, Cloud, AI, Edge, Streaming, SQL
@ijokarumawak
ijokarumawak / Merge_XML_Records.xml
Created November 29, 2018 01:27
A NiFi example template to illustrate how to merge multiple XML files.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<template encoding-version="1.2">
<description></description>
<groupId>39379f66-0167-1000-9951-3cf7c004e310</groupId>
<name>Merge XML Records</name>
<snippet>
<controllerServices>
<id>36c4d83a-ff47-38e2-0000-000000000000</id>
<parentGroupId>376efa9a-48fc-3e3d-0000-000000000000</parentGroupId>
<bundle>
@ijokarumawak
ijokarumawak / AddTimestamp.xml
Created November 19, 2018 06:13
Example NiFi template to add a new timestamp column to CSV records with UpdateRecord
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<template encoding-version="1.2">
<description></description>
<groupId>29cd7683-0167-1000-0886-c9dc91c022a5</groupId>
<name>AddTimestamp</name>
<snippet>
<connections>
<id>09785868-682a-3058-0000-000000000000</id>
<parentGroupId>818236f9-e91f-324b-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
@abajwa-hw
abajwa-hw / deploy_hdf33.sh
Last active April 13, 2019 00:37
Automation to deploy HDF 3.3 on RHEL 7
#!/usr/bin/env bash
# Launch a CentOS/RHEL 7 VM with at least 4 cores / 16 GB memory / 60 GB disk
# Then run:
# curl -sSL https://gist.github.com/abajwa-hw/b5565d7e7f9beffd8dd57a970dc54266/raw | sudo -E sh
export ambari_password=${ambari_password:-StrongPassword}
export db_password=${db_password:-StrongPassword}
export nifi_password=${nifi_password:-StrongPassword}
export ambari_services="ZOOKEEPER STREAMLINE NIFI KAFKA STORM REGISTRY NIFI_REGISTRY KNOX AMBARI_METRICS"
export cluster_name=${cluster_name:-hdf}
@abajwa-hw
abajwa-hw / deploy_hdp30hdf33.sh
Last active December 5, 2018 15:44
Deploy HDP 3.0 and HDF 3.3
#!/usr/bin/env bash
# Launch a CentOS/RHEL 7 VM with at least 8 vCPUs / 32 GB+ memory / 100 GB disk
# Then run:
# curl -sSL https://gist.github.com/abajwa-hw/66c62bc860a47dfb0de53dfe5cbb4415/raw | sudo -E sh
export create_image=${create_image:-false}
export ambari_version=2.7.1.0
export mpack_url="http://public-repo-1.hortonworks.com/HDF/amazonlinux2/3.x/updates/3.3.0.0/tars/hdf_ambari_mp/hdf-ambari-mpack-3.3.0.0-165.tar.gz"
export hdf_vdf="http://s3.amazonaws.com/public-repo-1.hortonworks.com/HDF/centos7/3.x/updates/3.3.0.0/HDF-3.3.0.0-165.xml"
@ijokarumawak
ijokarumawak / 0.InvokeHTTP_Attributes.md
Last active February 15, 2022 11:00
This NiFi flow template illustrates how incoming FlowFile attributes are carried to the InvokeHTTP output FlowFile.

@tspannhw
tspannhw / MDD.xml
Created September 15, 2018 22:15 — forked from pvillard31/MDD.xml
Template for Monitoring Driven Development in NiFi
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<template encoding-version="1.2">
<description></description>
<groupId>8927f4c0-0160-1000-597a-ea764ccd81a7</groupId>
<name>MDD</name>
<snippet>
<connections>
<id>a2098494-cce9-3fa4-0000-000000000000</id>
<parentGroupId>a8352767-434f-3321-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
@pvillard31
pvillard31 / MDD.xml
Created August 29, 2018 16:15
Template for Monitoring Driven Development in NiFi
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<template encoding-version="1.2">
<description></description>
<groupId>8927f4c0-0160-1000-597a-ea764ccd81a7</groupId>
<name>MDD</name>
<snippet>
<connections>
<id>a2098494-cce9-3fa4-0000-000000000000</id>
<parentGroupId>a8352767-434f-3321-0000-000000000000</parentGroupId>
<backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold>
@mduhan
mduhan / SetKafkaConsumergroupOffset.java
Created August 18, 2018 14:09
Java code to update Consumer group offset
package com.operative.pipelinetracker.controller;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
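The preview above stops at the imports. A minimal, self-contained sketch of the idea, committing an explicit offset for a consumer group via KafkaConsumer.commitSync (the broker address, group, topic, partition, and offset values are illustrative assumptions, and the group must have no active members for the commit to succeed):

import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SetKafkaConsumerGroupOffset {

    // Commit an explicit offset for the given consumer group, topic and partition.
    public static void setOffset(String bootstrapServers, String groupId,
                                 String topic, int partition, long offset) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false"); // we commit manually below

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition(topic, partition);
            Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
            offsets.put(tp, new OffsetAndMetadata(offset));
            // Synchronously commit the new offset for this group;
            // fails if the group currently has active members.
            consumer.commitSync(offsets);
        }
    }

    public static void main(String[] args) {
        // Illustrative values only.
        setOffset("localhost:9092", "my-group", "my-topic", 0, 42L);
    }
}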

What is Apache NiFi?

Apache NiFi is a data flow tool focused on moving data between systems.
NiFi's focus is on capabilities such as visual command and control, data filtering, data enrichment, data provenance, and security, to name a few. With NiFi you aren't writing code and deploying it as a job; you are building a living data flow through the UI, where each action takes effect immediately.
A data flow tool like this is often complementary to stream-processing platforms such as SAS ESP, managing the flow of data from the sources into them.

What is SAS Event Stream Processing (a.k.a. SAS ESP)?

Event stream processing applications typically perform real-time analytics on streams of events. These streams are continuously published into an event stream processing engine. Typical use cases for event stream processing include but are not limited to the following:

  • sensor data monitoring and management
  • operational systems monitoring and management
@ijokarumawak
ijokarumawak / 00_README.md
Last active February 13, 2019 03:18
NiFi example that ingests a set of files only when the complete set is ready.

This example flow can be used to process files with the following requirements (a sketch of the gating logic follows the list):

  • A group of files can only be processed when every file for that group is ready

  • Each filename contains a group ID (e.g. 123_456) and a type name (e.g. ab, cd or ef)

  • Example set of files for group '123_456':

    • file_123_456_ab.ex1
    • file_123_456_cd.ex1
    • file_123_456_ef.ex1
    • file_123_456.ex2
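
Outside of NiFi, the gating logic reduces to: bucket incoming filenames by group ID and release the group only when every expected type has arrived. A minimal Java sketch under the naming convention above (the class FileSetChecker, the regex, and the required-type set are illustrative assumptions, not part of the gist):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileSetChecker {

    // Matches file_<groupId>_<type>.ex1, per the example filenames above.
    private static final Pattern NAME =
            Pattern.compile("file_(\\d+_\\d+)_(ab|cd|ef)\\.ex1");
    private static final Set<String> REQUIRED =
            new HashSet<>(Arrays.asList("ab", "cd", "ef"));

    // Returns true when all required .ex1 types and the .ex2 file are present.
    public static boolean isComplete(String groupId, Set<String> filenames) {
        Set<String> seen = new HashSet<>();
        for (String name : filenames) {
            Matcher m = NAME.matcher(name);
            if (m.matches() && m.group(1).equals(groupId)) {
                seen.add(m.group(2)); // record which type arrived
            }
        }
        return seen.containsAll(REQUIRED)
                && filenames.contains("file_" + groupId + ".ex2");
    }

    public static void main(String[] args) {
        Set<String> files = new HashSet<>(Arrays.asList(
                "file_123_456_ab.ex1", "file_123_456_cd.ex1",
                "file_123_456_ef.ex1", "file_123_456.ex2"));
        System.out.println(isComplete("123_456", files)); // prints true
    }
}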