- Go to https://developer.apple.com/downloads/index.action, search for "Command Line Tools", and choose the one for your version of Mac OS X
- Go to http://brew.sh/ and enter the one-liner into the Terminal; you now have `brew` installed (a better MacPorts)
- Install transmission-daemon with `brew install transmission`
- Copy the startup config for launchctl with `ln -sfv /usr/local/opt/transmission/*.plist ~/Library/LaunchAgents`
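To actually start the daemon, load the agent you just linked (a suggested extra step, not part of the original notes; the exact plist name can vary with the Homebrew formula version):

- Load it with `launchctl load ~/Library/LaunchAgents/homebrew.mxcl.transmission.plist`, or use `brew services start transmission` on newer Homebrew installs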
version: '2'
services:
  minio:
    restart: always
    image: docker.io/bitnami/minio:2021
    ports:
      - '9000:9000'
    environment:
      - MINIO_ROOT_USER=miniokey
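Once the container is up, a quick smoke test against the S3 endpoint can be done with a few lines of boto3. The compose snippet above is cut off before the root password, so `miniosecret` below is an assumed value:

```python
# Minimal smoke test against the MinIO container above.
# Assumptions: the service runs on localhost:9000 and MINIO_ROOT_PASSWORD
# was set to "miniosecret" (not shown in the truncated compose file).
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:9000',  # port published by the compose file
    aws_access_key_id='miniokey',          # MINIO_ROOT_USER from the compose file
    aws_secret_access_key='miniosecret',   # assumed MINIO_ROOT_PASSWORD
)

s3.create_bucket(Bucket='test-bucket')
print(s3.list_buckets()['Buckets'])
```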
# suppose my data file name has the following format "datatfile_YYYY_MM_DD.csv"; this file arrives in S3 every day.
file_suffix = "{{ execution_date.strftime('%Y-%m-%d') }}"
bucket_key_template = 's3://[bucket_name]/datatfile_{}.csv'.format(file_suffix)

file_sensor = S3KeySensor(
    task_id='s3_key_sensor_task',
    poke_interval=60 * 30,  # (seconds); checking file every half an hour
    timeout=60 * 60 * 12,   # timeout in 12 hours
    bucket_key=bucket_key_template,
    bucket_name=None,
    wildcard_match=False,
from airflow import DAG
from airflow.operators.sensors import S3KeySensor
from airflow.operators import BashOperator
from datetime import datetime, timedelta

yday = datetime.combine(datetime.today() - timedelta(1),
                        datetime.min.time())

default_args = {
    'owner': 'msumit',
with DAG(**dag_config) as dag:
    # Declare pipeline start and end task
    start_task = DummyOperator(task_id='pipeline_start')
    end_task = DummyOperator(task_id='pipeline_end')

    for account_details in pipeline_config['task_details']['accounts']:
        # Declare account start and end task
        if account_details['runable']:
            acct_start_task = DummyOperator(task_id=account_details['account_id'] + '_start')
            acct_start_task.set_upstream(start_task)
Setup parquet-tools:
brew install parquet-tools

Help:
parquet-tools -h

Row count:
parquet-tools rowcount part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

Peek at the first record:
parquet-tools head -n 1 part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

Show file metadata:
parquet-tools meta part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet
Context: The Integration team has deployed a cron job that dumps a CSV file containing all the new Shopify configurations daily at 2 AM UTC. The task is to build a daily pipeline that will:
- download the CSV file from https://alg-data-public.s3.amazonaws.com/[YYYY-MM-DD].csv
- filter out each row with an empty application_id
- add a has_specific_prefix column set to true if the value of index_prefix differs from shopify_, else to false
- load the valid rows into a PostgreSQL instance

The pipeline should process files from 2019-04-01 to 2019-04-07; a sketch of such a DAG follows.
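A minimal sketch of that pipeline as an Airflow DAG, using pandas for the transformation. The connection id `shopify_db` and table `shopify_configurations` are made-up names, not part of the original brief, and the imports follow the Airflow 1.x paths used elsewhere in these notes:

```python
# Sketch only: connection id and target table are assumptions.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.python_operator import PythonOperator

BASE_URL = 'https://alg-data-public.s3.amazonaws.com/{}.csv'


def process_file(ds, **kwargs):
    # ds is the execution date as YYYY-MM-DD, matching the file naming scheme
    df = pd.read_csv(BASE_URL.format(ds))
    # Drop rows with an empty application_id
    df = df[df['application_id'].notna()]
    # Flag rows whose index_prefix differs from the default "shopify_"
    df['has_specific_prefix'] = df['index_prefix'] != 'shopify_'
    # Load the valid rows into PostgreSQL (connection id and table are assumed)
    hook = PostgresHook(postgres_conn_id='shopify_db')
    hook.insert_rows(table='shopify_configurations',
                     rows=df.values.tolist(),
                     target_fields=list(df.columns))


default_args = {
    'owner': 'data-eng',
    'start_date': datetime(2019, 4, 1),
    'end_date': datetime(2019, 4, 7),
}

with DAG('shopify_configurations',
         default_args=default_args,
         schedule_interval='0 2 * * *',  # daily at 2 AM UTC, after the dump lands
         catchup=True) as dag:           # backfill 2019-04-01 through 2019-04-07
    PythonOperator(task_id='process_csv',
                   python_callable=process_file,
                   provide_context=True)
```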
play.modules.enabled += "com.samklr.KamonModule"

kamon {
  environment {
    service = "my-svc"
  }

  jaeger {
A running example of the code from:
- http://marcio.io/2015/07/handling-1-million-requests-per-minute-with-golang
- http://nesv.github.io/golang/2014/02/25/worker-queues-in-go.html

This gist creates a working example from the blog posts, plus an alternate example using a simple worker pool.

TL;DR: if you want simple and controlled concurrency, use a worker pool.
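The referenced posts are written in Go; purely as an illustration of the same idea, here is a minimal worker-pool sketch in Python (pool size and names are arbitrary): a fixed set of workers drains a shared job queue instead of spawning a new goroutine/thread per request.

```python
# Minimal worker-pool sketch (illustrative, not taken from the referenced posts):
# a fixed number of worker threads drain a shared job queue.
import queue
import threading

NUM_WORKERS = 4          # arbitrary pool size
jobs = queue.Queue()


def handle(job):
    print('processed', job)  # stand-in for the real work


def worker():
    while True:
        job = jobs.get()
        if job is None:      # sentinel value tells the worker to exit
            break
        handle(job)
        jobs.task_done()


threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for i in range(20):          # enqueue incoming "requests"
    jobs.put(i)

jobs.join()                  # wait until every queued job is handled
for _ in threads:
    jobs.put(None)           # shut the workers down
for t in threads:
    t.join()
```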
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *