Recommendations from others are noted in (parentheses). The rest are my personal recommendations.
- The Pragmatic Programmer - Hunt & Thomas
#http://geekgirl.io/concurrent-http-requests-with-python3-and-asyncio/
Concurrent HTTP Requests with Python3 and asyncio
My friend, a data scientist, had whipped up a script that made lots of queries (over 27K) to the Google Places API. The problem was that it was synchronous and thus took over 2.5 hours to complete.
Given that I'm currently attending Hacker School and get to spend all day working on any coding problem that interests me, I decided to try optimising it.
I'm new to Python, so I had to do a bit of groundwork first to determine which course of action was best.
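The core asyncio idea behind speeding up a script like this is to keep many requests in flight at once, with a cap so the API is not flooded. What follows is a minimal sketch of that pattern, not the author's script. `fetch_place` is a hypothetical placeholder that sleeps instead of calling the Google Places API, so the sketch runs offline. The limit of 10 concurrent requests is an arbitrary assumption.

```python
import asyncio
import time


async def fetch_place(place_id: int) -> dict:
    """Hypothetical stand-in for one Google Places API call.

    A real version would issue an HTTP request (e.g. with aiohttp);
    asyncio.sleep simulates network latency so the sketch runs offline.
    """
    await asyncio.sleep(0.1)
    return {"place_id": place_id, "status": "OK"}


async def fetch_all(place_ids, max_concurrency: int = 10) -> list:
    # The semaphore caps how many requests are in flight at once,
    # which keeps the script within the API's rate limits.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(place_id):
        async with semaphore:
            return await fetch_place(place_id)

    # gather() runs the coroutines concurrently and returns results
    # in the same order as the inputs.
    return await asyncio.gather(*(bounded_fetch(p) for p in place_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fetch_all(range(50), max_concurrency=10))
    elapsed = time.perf_counter() - start
    # 50 simulated calls of 0.1 s each, 10 at a time: roughly 0.5 s
    # instead of roughly 5 s when run one after another.
    print(f"{len(results)} results in {elapsed:.2f}s")
```

Raising `max_concurrency` shortens the run further, but the real ceiling is whatever request rate the API tolerates.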
# The MIT License (MIT)
# Copyright (c) 2016 Vladimir Ignatev
#
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the Software
# is furnished to do so, subject to the following conditions:
#
from celery import Task, shared_task
from django.db import models
from my_app.models import FailedTask


class LogErrorsTask(Task):
    # Custom base class; its body is cut off in this preview.
    pass


@shared_task(base=LogErrorsTask)
def some_task():
    result = None  # placeholder: the task's real work is not shown
    return result
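Celery calls a task's `on_failure(exc, task_id, args, kwargs, einfo)` handler when the task raises, which is the usual hook for a base class like `LogErrorsTask`. What this particular class does is not shown, so the following is only a framework-free sketch of the pattern. `BaseTask`, `task()` and `FAILED_TASKS` are hypothetical stand-ins for `celery.Task`, the `@task` decorator and the `FailedTask` model, so the code runs without a broker or Django.

```python
import traceback


class BaseTask:
    """Tiny stand-in for celery.Task, only to illustrate the hook pattern."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __call__(self, *args, **kwargs):
        try:
            return self.func(*args, **kwargs)
        except Exception as exc:
            # Celery calls on_failure(exc, task_id, args, kwargs, einfo) when a
            # task raises; here einfo is a traceback string, not ExceptionInfo.
            self.on_failure(exc, None, args, kwargs, traceback.format_exc())
            raise

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        pass


FAILED_TASKS = []  # hypothetical stand-in for the FailedTask model


class LogErrorsTask(BaseTask):
    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # With Django this would become a FailedTask.objects.create(...) call;
        # the model's fields are not shown, so a dict is recorded instead.
        FAILED_TASKS.append({
            "name": self.name,
            "exception": repr(exc),
            "args": args,
            "kwargs": kwargs,
            "traceback": einfo,
        })
        super().on_failure(exc, task_id, args, kwargs, einfo)


def task(base):
    """Minimal decorator mimicking @task(base=...)."""
    def wrap(func):
        return base(func)
    return wrap


@task(base=LogErrorsTask)
def divide(a, b):
    return a / b
```

Calling `divide(1, 0)` still raises `ZeroDivisionError`, but the failure is recorded first, so errors are logged without being swallowed.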
http {
    log_format bodylog '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       '"$http_referer" "$http_user_agent" $request_time '
                       '<"$request_body" >"$resp_body"';
    lua_need_request_body on;
    set $resp_body "";
    body_filter_by_lua '
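The `bodylog` format writes each request as a single line, with the request and response bodies at the end. Below is a Python sketch for pulling the fields back out of such a line. The regex, the `parse_bodylog` name and the sample line are assumptions for illustration, not part of the original config. The regex also assumes nginx's default log escaping, which writes a double quote inside a variable as `\x22`, so no field contains a bare `"`.

```python
import re

# Regex mirroring the bodylog log_format field by field.
BODYLOG_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'(?P<request_time>\d+\.\d+) '
    r'<"(?P<request_body>[^"]*)" >"(?P<resp_body>[^"]*)"$'
)


def parse_bodylog(line: str) -> dict:
    """Split one bodylog line into named fields; raise ValueError on mismatch."""
    match = BODYLOG_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not a bodylog line: {line!r}")
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["body_bytes_sent"] = int(fields["body_bytes_sent"])
    fields["request_time"] = float(fields["request_time"])
    return fields


# Hypothetical sample line in the bodylog format.
sample = (
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "POST /api/items HTTP/1.1" '
    '201 11 "-" "curl/8.0.1" 0.004 <"name=widget" >"created: 42"'
)
print(parse_bodylog(sample)["resp_body"])  # created: 42
```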
## Simple Python module to upload files to Google Drive
# Needs a file 'client_secrets.json' in the directory
# The file can be obtained from https://console.developers.google.com/
# under APIs&Auth/Credentials/Create Client ID for native application
# To test usage:
# import google_drive_util
# google_drive_util.login()
# google_drive_util.test()
There are a number of ways to install supervisord and run it automatically on Ubuntu; this is what worked for me (on multiple installations):
sudo bash < <(curl https://gist.githubusercontent.com/alexhayes/814fd0d0f7020e918a95/raw/full-install.sh)
Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
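This sketch assumes the positional arguments to this `ProducerPerformance` tool are topic, number of records, record size in bytes and target records per second, with -1 meaning unthrottled. Under that reading, a quick Python calculation shows the sizes involved in the run. The constant and helper names are made up. Any elapsed time passed to `throughput()` would have to come from an actual run.

```python
# Back-of-the-envelope numbers for the single-thread ProducerPerformance run.
NUM_RECORDS = 50_000_000       # 2nd positional argument
RECORD_SIZE_BYTES = 100        # 3rd positional argument
BUFFER_MEMORY = 67_108_864     # buffer.memory
BATCH_SIZE = 8_196             # batch.size, as given in the command


def total_payload_bytes(num_records: int, record_size: int) -> int:
    """Record payload bytes produced (ignores keys, headers, protocol overhead)."""
    return num_records * record_size


def throughput(num_records: int, record_size: int, elapsed_s: float) -> tuple:
    """Records/s and MB/s (1 MB = 1024 * 1024 bytes) for a run of elapsed_s seconds."""
    records_per_s = num_records / elapsed_s
    mb_per_s = records_per_s * record_size / (1024 * 1024)
    return records_per_s, mb_per_s


payload = total_payload_bytes(NUM_RECORDS, RECORD_SIZE_BYTES)
print(payload)                          # 5000000000 bytes, about 4.66 GiB
print(BUFFER_MEMORY // (1024 * 1024))   # 64 (MiB of producer buffer)
# Upper bound on 100-byte values per batch, before per-record overhead.
print(BATCH_SIZE // RECORD_SIZE_BYTES)  # 81
```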