Alan Williams alanwill

I'm now working on big data processing with Pandas at scale, as a lightweight alternative to Spark. Fortunately, the Apache Arrow project brings with it an excellent and very fast Parquet reader and writer.

With the current push to ARM in both personal computers and the data center, I was curious to check the performance of my code on ARM, specifically on AWS's homegrown Graviton2 processor. The c6g instance types are 20% cheaper than the equivalent Intel-based c5 instances, while promising faster performance. If that's the future, why not start getting ready now?
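Before comparing numbers across instance types, it helps to confirm which architecture the interpreter is actually running on. The following is a minimal sketch using only the standard library; the helper name `is_arm64` and the set `ARM64_NAMES` are my own illustrative choices, not part of any library.

```python
import platform

# Machine strings commonly reported by 64-bit ARM systems:
# "aarch64" on Linux (e.g. Graviton2), "arm64" on macOS.
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine=None):
    """Return True if the given (or current) machine string denotes 64-bit ARM."""
    if machine is None:
        machine = platform.machine()
    return machine.lower() in ARM64_NAMES


# Report the architecture of the current interpreter.
print(platform.machine(), is_arm64())
```

Passing the machine string explicitly, as in `is_arm64("x86_64")`, makes the check easy to exercise on any host.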

While there are already Python wheels for NumPy and Pandas, there is no official build yet for PyArrow. There's a pull request in the works,

tmux shortcuts & cheatsheet

start new:

tmux

start new with session name:

tmux new -s myname

	# An example to get the remaining rate limit using the GitHub GraphQL API.

	import requests

	headers = {"Authorization": "Bearer YOUR API KEY"}


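	def run_query(query):
	    # A simple function to use requests.post to make the API call. Note the json= section.
	    request = requests.post('https://api.github.com/graphql', json={'query': query}, headers=headers)
	    if request.status_code == 200:
	        return request.json()
	    # Assumed handling for the truncated branch: fail loudly on any non-200 response.
	    raise Exception("Query failed with status code {}".format(request.status_code))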
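The snippet above is cut off before the query itself, so here is a hedged sketch of what a rate-limit query and the parsing of its response could look like. It relies on the `rateLimit` object of the GitHub GraphQL API; the helper `remaining_from_response` and the `sample` payload are hypothetical, and the sample values are illustrative rather than real output.

```python
# GraphQL query asking GitHub for the caller's current rate-limit status.
RATE_LIMIT_QUERY = """
{
  rateLimit {
    limit
    cost
    remaining
    resetAt
  }
}
"""


def remaining_from_response(payload):
    """Return the `remaining` count from a decoded GraphQL JSON response."""
    # GraphQL can report failures in an "errors" list inside the JSON body.
    if payload.get("errors"):
        raise RuntimeError("GraphQL errors: {}".format(payload["errors"]))
    return payload["data"]["rateLimit"]["remaining"]


# Illustrative payload only; real values depend on the account and token.
sample = {
    "data": {
        "rateLimit": {
            "limit": 5000,
            "cost": 1,
            "remaining": 4999,
            "resetAt": "2024-01-01T00:00:00Z",
        }
    }
}
print(remaining_from_response(sample))
```

Keeping the parsing separate from the HTTP call means it can be checked offline, without spending any of the rate limit it is meant to report.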

	import boto3
	import base64

	if __name__ == '__main__':
	    session = boto3.session.Session()

	    kms = session.client('kms')

	    encrypted_password = 'AQECAHjgTiiE7TYRGp5Irf8jQ3HzlaQaHGYgsUJDaavnHcFm0gAAAGswaQYJKoZIhvcNAQcGoFwwWgIBADBVBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEEDDwxVQuG0oVwpkU7nQIBEIAoVGk1/wpserb+GVUOzE7PiL/Nr9fTDFKZfpKpF0ip2ct4B2q0Wn6ZZw=='
	    binary_data = base64.b64decode(encrypted_password)
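The snippet stops right after the base64 decoding. Assuming the next step is a call to the KMS `decrypt` operation, the flow could be sketched as below. The helper `decrypt_secret` and the `FakeKMS` stand-in are hypothetical names of mine; with a real boto3 client, `decrypt(CiphertextBlob=...)` returns a dict whose `Plaintext` entry holds the decrypted bytes.

```python
import base64


def decrypt_secret(kms_client, encoded_ciphertext):
    """Base64-decode a ciphertext and decrypt it with the given KMS client."""
    ciphertext = base64.b64decode(encoded_ciphertext)
    response = kms_client.decrypt(CiphertextBlob=ciphertext)
    # KMS returns the plaintext as bytes; decode assuming it is UTF-8 text.
    return response["Plaintext"].decode("utf-8")


class FakeKMS:
    """Stand-in for a boto3 KMS client, for offline testing only."""

    def decrypt(self, CiphertextBlob):
        # Pretend that decryption simply reverses the bytes.
        return {"Plaintext": CiphertextBlob[::-1]}


encoded = base64.b64encode(b"terces").decode("ascii")
print(decrypt_secret(FakeKMS(), encoded))
```

Passing the client in as an argument lets the same function run against `session.client('kms')` in production and against a stub in tests, where no AWS credentials are available.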