gfranxman’s gists

gfranxman / parallel_execution_serial_return_demo.py

Last active April 30, 2024 17:12

Parallel execution of tasks with results in submitted order. Execution model for STT and TTS

	import asyncio
	import random


	tasks = []

	DEBUG=False
	async def mock_event_generator():
	    """
	    Mock event generator for parallel processing.
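
One way to get "parallel execution, results in submitted order" with `asyncio` is to start every task up front and then await the task objects in the order they were submitted. The sketch below is an illustration under that assumption, not the gist's own code; `work` and `run_in_submitted_order` are hypothetical names.

```python
import asyncio
import random


async def work(item: int) -> int:
    """Hypothetical task: sleeps for a random time, then returns its input doubled."""
    await asyncio.sleep(random.uniform(0.0, 0.05))
    return item * 2


async def run_in_submitted_order(items):
    # create_task schedules every coroutine right away, so they run concurrently.
    pending = [asyncio.create_task(work(item)) for item in items]
    results = []
    # Awaiting the tasks in list order yields results in submission order,
    # regardless of which task happens to finish first.
    for task in pending:
        results.append(await task)
    return results


if __name__ == "__main__":
    print(asyncio.run(run_in_submitted_order(range(5))))
```

A consumer that awaits the tasks one at a time can hand each result downstream as soon as the earliest outstanding task completes, while later tasks keep running in the background.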

gfranxman / prepare-commit-msg

Last active August 18, 2023 21:14

Git hook that uses llm to prepare commit messages as release notes.

	#!/bin/sh
	# https://gist.github.com/gfranxman/e9d4a523397535c6dd82d1445c246b8d/edit
	# 2023-08-18

	COMMIT_MSG_FILE=$1
	COMMIT_SOURCE=$2
	SHA1=$3

	REL_NOTES_RAW=`git diff --staged | llm -s "release notes" 2>/dev/null`
	REL_NOTES_RAW=$(echo "$REL_NOTES_RAW" | sed 's/^#/* /')

gfranxman / README

Created July 12, 2023 13:40

Airflow: Fix for DAG not found in serialized_dag table

	While rapidly starting, stopping, and changing DAGs during development, you may run into errors like this for one or more of the DAGs:

	dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:54,767] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-incremental' not found in serialized_dag table
	dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:55,800] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-full' not found in serialized_dag table
	dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:55,802] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-incremental' not found in serialized_dag table
	dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:56,577] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-incremental' not found in serialized_dag table
	dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:56,579] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-full' not found in serialized_dag table

	This

gfranxman / clone_objects_example.py

Created March 1, 2023 21:26

Cloning Django objects pattern

	def model_to_dict(instance, exclude: list = None, modify: dict = None):
	    excluded_fields = ["id", "pk"]
	    if exclude:
	        excluded_fields.extend(exclude)

	    defaults = dict(
	        [
	            (fld.name, getattr(instance, fld.name))
	            for fld in instance._meta.fields
	            if fld.name not in excluded_fields

gfranxman / gist:109f3e1df0916c155a6b0ce49c848a6a

Created June 17, 2022 19:24

skip all airflow catchup runs.



	def abort_on_catchup(**context):
	    """
	    This function determines whether to continue to the `next_task` or skip to 'end'

	    using the "next" schedule interval.
	    """
	    # "Catchups" during this window are allowed.
	    # This is just to cover for late-starting jobs.

gfranxman / credset

Last active September 9, 2021 19:55

AWS cred juggling, credset command I've been using for years and awsenv which keeps everything off the filesystem and only in memory

gfranxman / sec_policy_middleware.py

Last active February 12, 2021 15:41

POC suggestion for coarse grained view security policies -- BAST Pructise?

	import re
	from logging import getLogger

	from django.conf import settings
	from django.http.response import HttpResponseForbidden

	logger = getLogger(__file__)


	def is_authenticated(r):

gfranxman / pyspark_tricks.py

Created October 16, 2020 17:46

Pyspark / DataBricks DataFrame size estimation

	from pyspark.serializers import PickleSerializer, AutoBatchedSerializer
	def _to_java_object_rdd(rdd):
	    """ Return a JavaRDD of Object by unpickling
	    It will convert each Python object into Java object by Pyrolite, whenever the
	    RDD is serialized in batch or not.
	    """
	    rdd = rdd._reserialize(AutoBatchedSerializer(PickleSerializer()))
	    return rdd.ctx._jvm.org.apache.spark.mllib.api.python.SerDe.pythonToJava(rdd._jrdd, True)

	def estimate_df_size(df):

gfranxman / request_as_curl.py

Created July 23, 2020 17:27

	def request_as_curl(request):
	    """
	    construct a curl command from a (failed) request.
	    """
	    url = request.url
	    headers = request.headers
	    data = request.body.decode("utf-8")
	    method = request.method

	    command = "curl -v -H {headers} {data} -X {method} {uri}"
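
The preview above is cut off before the command is assembled. As a rough illustration of the idea, and not a reconstruction of the gist's code, the sketch below builds a shell-safe curl command with `shlex.quote`. The function name `request_as_curl_sketch` and the `SimpleNamespace` stand-in are hypothetical; the only assumption about the request object is that it exposes `url`, `headers`, `body`, and `method`, which are the attributes the preview reads.

```python
import shlex
from types import SimpleNamespace


def request_as_curl_sketch(request) -> str:
    """Build a copy-pasteable curl command from a request-like object.

    Assumes `request` exposes .url, .headers (a dict), .body (bytes, str,
    or None), and .method.
    """
    parts = ["curl", "-v", "-X", shlex.quote(request.method)]
    # Emit one -H flag per header, quoted so spaces and colons survive the shell.
    for name, value in request.headers.items():
        parts += ["-H", shlex.quote(f"{name}: {value}")]
    body = request.body
    if body:
        if isinstance(body, bytes):
            body = body.decode("utf-8")
        parts += ["--data", shlex.quote(body)]
    parts.append(shlex.quote(request.url))
    return " ".join(parts)


# Stand-in for a real request object, for illustration only.
demo = SimpleNamespace(
    url="https://example.com/api?q=1",
    headers={"Content-Type": "application/json"},
    body=b'{"a": 1}',
    method="POST",
)
print(request_as_curl_sketch(demo))
```

Quoting each argument separately matters here because header values, JSON bodies, and query strings routinely contain characters the shell would otherwise interpret.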

gfranxman / safepath.py

Last active February 18, 2022 23:28

Python safepath to replace os.path.join when you don't want the path components to step outside a root path, preventing path traversal.

	import os


	def safepath_join(head, *tail):
	    """
	    combines path parts like os.path.join, but ensures the resultant directory
	    doesn't step outside of the path given as the root.
	    """
	    root = os.path.abspath(head)
	    # Join onto the absolute root so a relative head is compared like-for-like.
	    p = os.path.normpath(os.path.join(root, *tail))
	    if not p.startswith(root + os.path.sep):
	        raise ValueError(f"{p} steps outside {root}")
	    return p
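
A self-contained usage sketch of the same check may help show which inputs pass and which are rejected. `safe_join_sketch` is a hypothetical variant, not the gist's function: it assumes the parts are joined onto the absolute root, so a relative `head` is handled as well.

```python
import os


def safe_join_sketch(head, *tail):
    """Join path parts under `head`, refusing results that leave it."""
    root = os.path.abspath(head)
    candidate = os.path.normpath(os.path.join(root, *tail))
    # Comparing against root + separator avoids accepting sibling
    # directories that merely share a name prefix with the root.
    if not candidate.startswith(root + os.path.sep):
        raise ValueError(f"{candidate} steps outside {root}")
    return candidate


# A path that stays inside the root is returned in normalized, absolute form.
print(safe_join_sketch("uploads", "user", "avatar.png"))

# A traversal attempt is rejected.
try:
    safe_join_sketch("uploads", "..", "etc", "passwd")
except ValueError as exc:
    print("rejected:", exc)
```

Note that `os.path.abspath` and `os.path.normpath` work on the path string only and do not resolve symbolic links, so this check does not cover a symlink inside the root that points elsewhere.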

Glenn Franxman gfranxman