Marcelo Rabello Rossi holypriest

@holypriest
holypriest / moving_threads.py
Created December 10, 2020 03:43
Threads won't get you far...
from multiprocessing import Pool

import boto3


def move_to_glacier(f):
    sess = boto3.session.Session(region_name='us-east-1')
    s3res = sess.resource('s3')
    copy_source = {
holypriest / moving_slowly.py
Created December 10, 2020 02:13
Don't even start...
import boto3

sess = boto3.session.Session(region_name='us-east-1')
s3res = sess.resource('s3')
gazillion_files = [('my-source-bucket', 'file-000000000001'), ...]
for f in gazillion_files:
    copy_source = {
        'Bucket': f[0],
holypriest / moving_files.py
Created December 10, 2020 02:00
How to move files to Glacier leveraging Spark distributed capabilities
def move_file_to_glacier(list_of_rows):
    sess = boto3.session.Session(region_name='us-east-1')
    s3res = sess.resource('s3')
    for row in list_of_rows:
        copy_source = {
            'Bucket': row[0],
            'Key': row[1]
        }
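The preview above stops before the copy call itself. As a rough sketch of how a per-partition function like this might be completed, the version below copies each object onto itself with the `GLACIER` storage class. The function name `move_partition_to_glacier`, the `client` parameter (added only so the logic can be exercised with a stub), and the choice of `copy_object` are assumptions for illustration, not the gist's actual code.

```python
def move_partition_to_glacier(rows, client=None):
    """Copy every (bucket, key) row onto itself using the GLACIER storage class.

    `client` is a hypothetical injection point for testing. When it is omitted,
    an S3 client is created inside the function, so each Spark partition builds
    its own client on the executor instead of receiving one from the driver.
    """
    if client is None:
        import boto3  # imported lazily so the function can be tested without AWS access
        client = boto3.session.Session(region_name='us-east-1').client('s3')

    moved = 0
    for row in rows:
        bucket, key = row[0], row[1]
        # A single copy_object call handles objects up to 5 GB;
        # larger objects would need a multipart copy instead.
        client.copy_object(
            CopySource={'Bucket': bucket, 'Key': key},
            Bucket=bucket,
            Key=key,
            StorageClass='GLACIER',
        )
        moved += 1
    return moved
```

With a hypothetical DataFrame `df` whose first two columns hold the bucket and the key, this could be applied as `df.rdd.foreachPartition(move_partition_to_glacier)`, which runs the function once per partition and ignores its return value.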
holypriest / namespace.yaml
Created July 20, 2020 04:22
Declaration of the airflow-on-k8s namespace
kind: Namespace
apiVersion: v1
metadata:
  name: airflow-on-k8s
holypriest / serviceaccount-pods.yaml
Last active July 20, 2020 04:06
Example of a ServiceAccount for the pods, associating them with an AWS IAM role
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tasks-sva
  namespace: airflow-on-k8s
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<your-aws-account-id>:role/AirflowK8SRole
holypriest / assume-role-policy.json
Last active July 20, 2020 04:10
AssumeRole policy to allow Kubernetes ServiceAccounts to access AWS resources
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<your-aws-account>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/<oidc-of-your-eks-cluster>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
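The AssumeRole policy preview in assume-role-policy.json is cut off after the `Action` line. As a hedged illustration of how such a trust policy is often finished for EKS service accounts, the sketch below builds the document in Python and adds a `Condition` that restricts the role to a single ServiceAccount. The helper name `build_irsa_trust_policy`, its parameters, and the `Condition` block are assumptions for illustration. The truncated part of the gist may differ.

```python
import json


def build_irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Return a trust policy dict for a Kubernetes ServiceAccount (illustrative helper).

    `oidc_provider` is the cluster's OIDC issuer without the https:// prefix,
    for example "oidc.eks.us-east-1.amazonaws.com/id/<oidc-of-your-eks-cluster>".
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                # Assumed addition: only tokens issued for this ServiceAccount may assume the role.
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_irsa_trust_policy(
        account_id="<your-aws-account>",
        oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/<oidc-of-your-eks-cluster>",
        namespace="airflow-on-k8s",
        service_account="tasks-sva",
    )
    print(json.dumps(policy, indent=2))
```

The namespace and ServiceAccount name in the usage example are the ones that appear in the other gists on this page. The account ID and OIDC identifier are left as placeholders.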
holypriest / serviceaccount-scheduler.yaml
Created July 17, 2020 02:32
ServiceAccount configuration for Airflow scheduler
kind: ServiceAccount
apiVersion: v1
metadata:
  name: scheduler-sva
  namespace: airflow-on-k8s
holypriest / service-webserver.yaml
Created July 17, 2020 02:29
Service configuration for Airflow webserver
kind: Service
apiVersion: v1
metadata:
  name: webserver-svc
  namespace: airflow-on-k8s
spec:
  type: ClusterIP
  selector:
    tier: airflow
    component: webserver
holypriest / deployment-scheduler.yaml
Last active May 26, 2021 10:03
Deployment configuration for the Airflow scheduler
kind: Deployment
apiVersion: apps/v1
metadata:
  name: airflow-scheduler
  namespace: airflow-on-k8s
spec:
  replicas: 1
  selector:
    matchLabels:
      tier: airflow
holypriest / efs-sc.yaml
Last active July 20, 2020 04:36
StorageClass configuration for an AWS EFS-based storage
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com