package com.example.parquet.writing

// Imports for a Parquet writer over Hadoop: ParquetWriter, compression codec, and Hadoop Path/Configuration.
import java.lang.Exception
import java.util
import java.util.{Date, UUID}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetWriter
import org.apache.parquet.hadoop.metadata.CompressionCodecName
From AWS s05 (Singapore):

Datacenter: ap-southeast
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns  Host ID                               Rack
UN  10.252.2.191   80.38 GiB  256     ?     d38f1bf8-f77e-454a-93d6-e0eddecffa06  1a
UN  10.252.12.212  76.63 GiB  256     ?     e3d9201e-799f-4abe-bd53-d22585556bd4  1b
UN  10.252.2.160   76.33 GiB  256     ?     1ee587ca-e929-4057-85a0-94c2e5af5ab5  1a
@johntbush
johntbush / check_s3.py
Last active November 7, 2020 07:25
s3_inventory
import json
import gzip
import pandas as pd
import dateutil.parser

# Load one S3 inventory CSV report into a DataFrame with the standard inventory columns.
def load_df(data_file):
    names = ['Bucket', 'Key', 'Size', 'LastModifiedDate', 'ETag', 'StorageClass', 'IsMultipartUploaded']
    return pd.read_csv(data_file, names=names)

def old_files(df, year):
    # assumed completion: keep objects whose LastModifiedDate falls before the given year
    last_modified = pd.to_datetime(df['LastModifiedDate'])
    return df[last_modified.dt.year < year]
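A minimal usage sketch for the helpers above, assuming the inventory report is a gzip-compressed CSV; the filename and cutoff year below are hypothetical:

with gzip.open('inventory-report.csv.gz', 'rt') as f:  # hypothetical report filename
    inventory = load_df(f)
# objects last modified before 2018, using the assumed old_files completion above
print(old_files(inventory, 2018)[['Bucket', 'Key', 'Size']].head())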
curl -s "http://sheets.s03.filex.com/2726daf0-7dbf-5dae-bd4d-944d5313944a?format=json&sort=recordid:desc&filter=audit_created_on:2017-10-31&size=-1" | jq -r '.data[] | .primary_key'
curl -s -XDELETE http://sheets.s03.filex.com/2726daf0-7dbf-5dae-bd4d-944d5313944a/40b42240-be7e-11e7-aa09-0e18d10715a6
curl -s -XDELETE http://sheets.s03.filex.com/2726daf0-7dbf-5dae-bd4d-944d5313944a/9844e706-be7e-11e7-8909-1284c881d488
curl -s -XDELETE http://sheets.s03.filex.com/2726daf0-7dbf-5dae-bd4d-944d5313944a/40fa68f4-be7e-11e7-aa09-0e18d10715a6
curl -s -XDELETE http://sheets.s03.filex.com/2726daf0-7dbf-5dae-bd4d-944d5313944a/9886100a-be7e-11e7-8909-1284c881d488
curl -s -XDELETE http://sheets.s03.filex.com/2726daf0-7dbf-5dae-bd4d-944d5313944a/98a6beae-be7e-11e7-8909-1284c881d488
curl -s -XDELETE http://sheets.s03.filex.com/2726daf0-7dbf-5dae-bd4d-944d5313944a/4160559c-be7e-11e7-aa09-0e18d10715a6
curl -s -XDELETE http://sheets.s03.filex.com/2726daf0-7dbf-5dae-bd4d-944d5313944a/4181268c-be7e-11e7-aa09-0e18d10715a6
curl -s
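The same list-then-delete pattern can be scripted end to end. A minimal sketch using Python requests; the host and sheet id are taken from the commands above, while the response shape (beyond the primary_key field the jq filter reads) and error handling are assumptions:

import requests

BASE = "http://sheets.s03.filex.com/2726daf0-7dbf-5dae-bd4d-944d5313944a"

# list the primary keys of records created on 2017-10-31
resp = requests.get(BASE, params={
    "format": "json",
    "sort": "recordid:desc",
    "filter": "audit_created_on:2017-10-31",
    "size": -1,
})
resp.raise_for_status()

# delete each matching record by its primary key
for record in resp.json()["data"]:
    requests.delete(f"{BASE}/{record['primary_key']}").raise_for_status()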
@johntbush
johntbush / cassandra_tables_examples.sql
Last active September 18, 2017 05:32
cassandra tables examples
create table suid (
    suid UUID PRIMARY KEY,
    context TEXT,
    ownerkey TEXT,
    label TEXT,
    environment UUID,
    created TIMESTAMP,
    createdby TEXT,
    modified TIMESTAMP,
    modifiedby TEXT,
@johntbush
johntbush / sample_router_response.json
Created August 29, 2017 22:19
sample_router_response.json
{
  "src_url": "smb://filex.com/comm/filerouter_virt",
  "src_subfolder": "",
  "route_id": 8205,
  "sha256": "62b6ddddef3b34b9840275d7dc898c6949b2a4775b88e0cd0a4559531b2e79f8",
  "dst": "",
  "dst_url": "",
  "subfolder": "",
  "path": "\\\\filex.com\\comm\\filerouter_virt",
  "x12_version": "",
@johntbush
johntbush / ics_schema.sql
Last active August 29, 2017 22:35
ICS Schema
CREATE TABLE [dbo].[Issues]
(
    [seq_num] [int] NOT NULL,
    [create date] [smalldatetime] NULL,
    [customer] [nvarchar](50) NULL,
    [issue_name] [varchar](MAX) NULL,
    [submitter_name] [nvarchar](75) NULL,
    [email_address] [varchar](MAX) NULL,
    [category] [nvarchar](64) NULL,
    [priority] [int] NULL,
@johntbush
johntbush / getDateSource.scala
Created August 29, 2017 16:50
getDataSource
// Builds a HikariCP connection pool for a SQL Server database (via the jTDS driver) described by ConnectionName.
def getDataSource(db: ConnectionName, write: Boolean, userName: String = user, pwd: String = password,
                  isDomainLogon: Boolean = true, sendStringParametersAsUnicode: Option[Boolean] = None): HikariDataSource = {
  val hconfig = new HikariConfig()
  val url = baseUrl + db.sqlDns + "/" + db.databaseName
  hconfig.setPoolName(db.connectionName + "_" + Utils.newUUID)
  hconfig.setMaximumPoolSize(poolSize)
  hconfig.setMinimumIdle(1)
  hconfig.setJdbcUrl(url)
  hconfig.setDriverClassName("net.sourceforge.jtds.jdbc.Driver")
  hconfig.addDataSourceProperty("serverName", db.sqlDns)
@johntbush
johntbush / ratemetrics.py
Last active August 7, 2017 20:27
storm rate metrics logic
if (type = source or type is null) and queue = rate-metrics and msg is not from CDC:
    write log in cassandra
    call rates API to apply patterns
    write doc in elasticsearch
    catch failure:
        if retries not exhausted:
            modify message to set retry flag and update num of retries
            publish back on rate-metrics queue
ack message
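A plain-Python sketch of that retry flow, with hypothetical stand-ins for the external calls (cassandra_log, rates_api, es_index, publish, ack and MAX_RETRIES are assumptions, not real APIs):

# Hypothetical stand-ins for the real clients; the gist only describes the flow, not the APIs.
def cassandra_log(msg): ...
def rates_api(msg): ...
def es_index(msg, patterns): ...
def publish(queue, msg): ...
def ack(msg): ...

MAX_RETRIES = 3  # assumption; the gist does not state a retry limit

def handle(msg, queue_name):
    """Process one rate-metrics message, retrying on failure by republishing."""
    # only source-typed (or untyped) messages on the rate-metrics queue, excluding CDC traffic
    if msg.get("type") in ("source", None) and queue_name == "rate-metrics" and not msg.get("from_cdc"):
        try:
            cassandra_log(msg)             # write log in cassandra
            patterns = rates_api(msg)      # call rates API to apply patterns
            es_index(msg, patterns)        # write doc in elasticsearch
        except Exception:
            retries = msg.get("retries", 0)
            if retries < MAX_RETRIES:      # retries not exhausted
                msg["retry"] = True
                msg["retries"] = retries + 1
                publish("rate-metrics", msg)  # publish back on rate-metrics queue
    ack(msg)                               # always ack the original message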
{
  "type": "update",
  "data": {
    "FbId": "FBLL1000681002182280595",
    // or
    "FbIds": ["FBLL1000681002182280595", "FBLL1000681002182280596"]
  }
}
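Since an update can carry either a single FbId or a list of FbIds, a consumer has to handle both shapes. A small normalization sketch (the fb_ids helper name is an assumption):

import json

def fb_ids(message):
    # accept either the single-id or the multi-id form of an "update" message
    data = message.get("data", {})
    if "FbIds" in data:
        return list(data["FbIds"])
    if "FbId" in data:
        return [data["FbId"]]
    return []

msg = json.loads('{"type": "update", "data": {"FbId": "FBLL1000681002182280595"}}')
print(fb_ids(msg))  # ['FBLL1000681002182280595']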