Vaquar Khan vaquarkhan

@vaquarkhan
vaquarkhan / GlueLastJobDuration.py
Created April 4, 2022 04:59 — forked from Lydon-01/GlueLastJobDuration.py
Script to get a specific AWS Glue Job and tell you the duration of the last run.
## Python 2.7
## GlueLastRunDuration.py
## Version 1
## by Lydon Carter October 2018
## USE
# Script to get a specific AWS Glue Job and tell you the duration of
# the last run.
# Notes:
# -- The script will use the location you setup for your Glue Context in the "Needed stuff"
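The duration calculation this script describes can be sketched without any AWS call. This is a minimal Python 3 sketch (the gist itself targets Python 2.7), assuming the run records look like the `JobRuns` entries returned by boto3's Glue `get_job_runs`, with `StartedOn` and `CompletedOn` datetimes. The function name `last_run_duration` and the sample records are hypothetical, not taken from the gist.

```python
from datetime import datetime, timedelta
from typing import Optional


def last_run_duration(job_runs: list) -> Optional[timedelta]:
    """Return the duration of the most recently started, completed run.

    job_runs is assumed to be a list of dicts shaped like the "JobRuns"
    entries from boto3's glue.get_job_runs (hypothetical input here).
    """
    # Keep only runs that have both timestamps; a run still in progress
    # has no CompletedOn yet.
    finished = [r for r in job_runs if r.get("StartedOn") and r.get("CompletedOn")]
    if not finished:
        return None
    # Do not rely on the order of the list; pick the latest start time.
    latest = max(finished, key=lambda r: r["StartedOn"])
    return latest["CompletedOn"] - latest["StartedOn"]


# Hypothetical sample data for illustration only.
sample_runs = [
    {"StartedOn": datetime(2022, 4, 1, 10, 0, 0), "CompletedOn": datetime(2022, 4, 1, 10, 5, 0)},
    {"StartedOn": datetime(2022, 4, 3, 9, 0, 0), "CompletedOn": datetime(2022, 4, 3, 9, 12, 30)},
    {"StartedOn": datetime(2022, 4, 4, 8, 0, 0)},  # still running
]

duration = last_run_duration(sample_runs)
print("Last run took", duration.total_seconds(), "seconds")
```

With real data, the list would come from something like `boto3.client("glue").get_job_runs(JobName=...)["JobRuns"]`, where the job name is yours to supply.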
@vaquarkhan
vaquarkhan / aws_signature_v4.py
Last active March 25, 2022 16:57 — forked from nivertech/aws_signature_v4.py
Sign an AWS Lambda HTTP API request using AWS Signature Version 4
# AWS Version 4 signing example
# taken from:
# http://docs.aws.amazon.com/general/latest/gr/sigv4-signed-request-examples.html
# https://docs.aws.amazon.com/general/latest/gr/sigv4_signing.html
# https://www.javaquery.com/2016/01/aws-version-4-signing-process-complete.html
# Lambda API (InvokeAsync)
# http://docs.aws.amazon.com/lambda/latest/dg/API_InvokeAsync.html
# See: http://docs.aws.amazon.com/general/latest/gr/sigv4_signing.html
# This version makes a POST request and passes request parameters
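The core of Signature Version 4 is the signing key, which is derived by chaining HMAC-SHA256 over the date, region, and service. The sketch below shows only that derivation step using the standard library; it is not the gist's code. The function names and the placeholder credentials are assumptions for illustration, and a full request also needs the canonical request and string to sign described in the linked AWS documentation.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step, returning the raw digest."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key.

    date_stamp is the request date as YYYYMMDD. Each step uses the
    previous digest as the HMAC key.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Final signature: hex HMAC-SHA256 of the string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder values, not real credentials.
key = derive_signing_key("EXAMPLE-SECRET-KEY", "20220325", "us-east-1", "lambda")
signature = sign(key, "placeholder string to sign")
print(signature)
```

Because the key depends on the date, region, and service, a signature computed for one region or day is not valid for another.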
@vaquarkhan
vaquarkhan / getDashboardUrl.js
Created March 24, 2022 05:20 — forked from fideliocc/getDashboardUrl.js
Lambda function hooked to an API Gateway GET endpoint that handles role assumption, QuickSight user registration, and dashboard URL resolution for the web app embedding feature
'use strict'
// IMPORTANT: Replace environment variables with your current values
const aws = require('aws-sdk')
aws.config.region = process.env.REGION
const sts = new aws.STS({apiVersion: '2011-06-15'})
module.exports.handler = (event, context, callback) => {
console.log('User email', event.queryStringParameters.email)
@vaquarkhan
vaquarkhan / access_log_parser.py
Created March 14, 2022 05:32 — forked from wolf0403/access_log_parser.py
Python re for Nginx access_log
# Modified by adding names from https://github.com/richardasaurus/nginx-access-log-parser/blob/master/main.py
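import re

# Every piece is a raw string so the backslashes reach the regex engine unchanged.
pat = (
    r'(?P<ip>\d+\.\d+\.\d+\.\d+)\s-\s-\s'  # IP address (dots escaped so they match only ".")
    r'\[(?P<time>.+)\]\s'  # datetime
    r'"(?P<method>GET|POST)\s(?P<uri>.+)\s(?P<ver>\w+/.+)"\s(?P<status>\d+)\s'  # requested file
    r'(?P<content_length>\d+)\s"(?P<referrer>.+)"\s'  # referrer
    r'"(?P<user_agent>.+)"'  # user agent
)
# logline is assumed to hold one line of an Nginx access log
r = re.match(pat, logline)
r.groupdict()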
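As a usage sketch, the same pattern can be wrapped in a small function and run against one log line. The sample line, the `parse_line` name, and the `None` return for non-matching lines are assumptions added for illustration; they are not part of the gist.

```python
import re

# Same named groups as the pattern above, compiled once for reuse.
LOG_PATTERN = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+)\s-\s-\s'
    r'\[(?P<time>.+)\]\s'
    r'"(?P<method>GET|POST)\s(?P<uri>.+)\s(?P<ver>\w+/.+)"\s(?P<status>\d+)\s'
    r'(?P<content_length>\d+)\s"(?P<referrer>.+)"\s'
    r'"(?P<user_agent>.+)"'
)


def parse_line(logline):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(logline)
    return match.groupdict() if match else None


# Hypothetical log line in the shape the pattern expects.
SAMPLE = (
    '203.0.113.7 - - [14/Mar/2022:05:32:10 +0000] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0"'
)

fields = parse_line(SAMPLE)
print(fields["ip"], fields["method"], fields["uri"], fields["status"])
```

Note that the pattern only accepts GET and POST requests and expects `- -` after the IP address, so lines with other methods or a logged remote user return `None` here.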

Databricks Delta Lake - A Friendly Intro

This article introduces Databricks Delta Lake, a revolutionary storage layer that brings reliability and improved performance to data lakes using Apache Spark.

First, we'll go through the dry parts, which explain what Apache Spark and data lakes are and the issues data lakes face. Then we'll look at Delta Lake and how it solves these issues, with a practical, easy-to-apply tutorial.

Introduction to Apache Spark

In case you don't know it, Apache Spark is a large-scale data processing and unified analytics engine for big data and machine learning. It was originally developed at UC Berkeley in 2009. Apache Spark is 100% open source, hosted at the vendor-independent Apache Software Foundation.

@vaquarkhan
vaquarkhan / lambda_function.py
Created March 5, 2022 01:35 — forked from umihico/lambda_function.py
Publish any AWS QuickSight dashboard publicly with Lambda and API Gateway
import json
import boto3
"""
API Gateway URL example. In your QuickSight domain settings, you have to allow access from amazonaws.com, including subdomains.
https://xxxxxxxx.execute-api.ap-northeast-1.amazonaws.com/xxx-gateway-stage-xxxx/your-lambda-func-name?dashboard_id=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxx
"""
def lambda_handler(event, context):
dashboard_id=event["queryStringParameters"]['dashboard_id']
AWSLambdaBasicExecutionPolicy
---
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
@vaquarkhan
vaquarkhan / avro_rw.py
Created February 13, 2022 15:20 — forked from gamame/avro_rw.py
Python Avro Data Read Write
# Import the schema, datafile and io submodules
# from avro (easy_install avro)
from avro import schema, datafile, io
OUTFILE_NAME = 'sample.avro'
SCHEMA_STR = """{
"type": "record",
"name": "sampleAvro",
"namespace": "AVRO",
@vaquarkhan
vaquarkhan / console.py
Created February 7, 2022 05:29 — forked from weavenet/console.py
Python script to assume STS role and generate AWS console URL.
#!/usr/bin/env python
import getpass
import json
import requests
import sys
import urllib
import boto3
@vaquarkhan
vaquarkhan / bootstrap.json
Created January 30, 2022 02:03 — forked from TimurFayruzov/bootstrap.json
Setup for running a Flink application on EMR
[
{
"Name": "Ship Flink runtime to cluster",
"Path": "s3://<your_bucket>/flink/ship_flink_runtime.sh"
},
{
"Name": "Ship application to cluster",
"Path": "s3://<your_bucket>/flink/ship_app.sh"
}
]