nathairtras / callback_retry_clear_subdag.py
Last active May 28, 2021 15:47
Callback to clear Airflow SubDag on retry
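import logging
from airflow.models import DagBag


def callback_subdag_clear(context):
    """Clears a subdag's tasks on retry."""
    # SubDAG ids follow Airflow's "<parent_dag_id>.<task_id>" convention.
    dag_id = "{}.{}".format(
        context['dag'].dag_id,
        context['ti'].task_id,
    )
    execution_date = context['execution_date']
    # The original snippet is truncated here; the lines below are an assumed
    # completion that looks up the subdag and clears its tasks for this run.
    logging.info("Clearing subdag %s for %s", dag_id, execution_date)
    subdag = DagBag().get_dag(dag_id)
    subdag.clear(start_date=execution_date, end_date=execution_date)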
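The id logic in `callback_subdag_clear` relies on Airflow's SubDAG naming convention, where a subdag's `dag_id` is `<parent_dag_id>.<task_id>`. The sketch below pulls that step out into a hypothetical helper, `subdag_id_from_context`, and exercises it with `SimpleNamespace` stand-ins for the `dag` and `ti` context entries (the ids `parent_dag` and `load_subdag` are made up), so it runs without Airflow installed.

```python
from types import SimpleNamespace


def subdag_id_from_context(context):
    """Hypothetical helper: build the subdag id as '<parent_dag_id>.<task_id>'."""
    return "{}.{}".format(context["dag"].dag_id, context["ti"].task_id)


# Stand-ins for the objects Airflow places in the callback context.
context = {
    "dag": SimpleNamespace(dag_id="parent_dag"),
    "ti": SimpleNamespace(task_id="load_subdag"),
}
print(subdag_id_from_context(context))  # parent_dag.load_subdag
```

To attach the callback itself, the `SubDagOperator` would typically receive it through the operator's `on_retry_callback` argument.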

Introduction to Installing PySpark & Jupyter Notebooks on Mac OS X

Spark is used for large-scale distributed data processing, and it has become a go-to standard for many companies in the technology industry. The framework computes at high speed, largely by keeping intermediate data in memory, and it processes massive datasets as resilient distributed datasets (RDDs) whose work is spread across a cluster of machines.
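The core idea behind Spark's distributed processing is to split data into partitions, transform each partition independently, and then combine the partial results. The toy, single-process sketch below illustrates that partition, map, and reduce pattern with a word count; it is plain Python, not the PySpark API, and the `partition` and `count_words` helpers and the sample lines are assumptions for illustration only.

```python
from collections import Counter
from functools import reduce


def partition(items, n):
    """Split items into n interleaved chunks (a stand-in for Spark partitions)."""
    return [items[i::n] for i in range(n)]


def count_words(chunk):
    """Map step: count words within one partition, independently of the others."""
    return Counter(word for line in chunk for word in line.split())


lines = ["spark is fast", "spark is distributed", "jupyter is interactive"]
# Spark would run these per-partition tasks in parallel on different executors.
partials = [count_words(chunk) for chunk in partition(lines, 2)]
# Reduce step: merge the partial counts into one result.
totals = reduce(lambda a, b: a + b, partials, Counter())
print(totals["spark"], totals["is"])  # 2 3
```

In PySpark itself, a similar job could be expressed on an RDD, for example `sc.parallelize(lines).flatMap(lambda l: l.split()).countByValue()`, with Spark handling the partitioning and merging.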

Jupyter Notebook, commonly called "Jupyter", has been a popular application in the data science community for many years. It lets you edit, run, and share Python code in a web browser, executing it step by step, cell by cell, so you can share exactly the parts of your code that matter for a piece of data analysis. This makes Jupyter a great tool for prototyping, and one that every data-centric company should use.

Why use PySpark in a Jupyter Notebook?

Many data engineers argue that Spark's Scala API is more performant than its Python API, and for many workloads it is. Howev

# Credit for this: Nicholas Swift
# as found at https://medium.com/@nicholas.w.swift/easy-a-star-pathfinding-7e6689c7f7b2
from warnings import warn
import heapq
class Node:
    """
    A node class for A* Pathfinding
    """
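The snippet stops at the `Node` class, so here is a compact, self-contained A* sketch that uses `heapq` as the open list. It is in the spirit of the credited tutorial but is not its code: instead of `Node` objects it stores `(f, g, position)` tuples, and it assumes a grid encoded as 0 = walkable and 1 = wall, 4-directional unit-cost moves, and a Manhattan-distance heuristic. The function name `astar` is illustrative.

```python
import heapq


def astar(grid, start, end):
    """Return the shortest path from start to end as a list of (row, col), or None.

    Assumes grid cells are 0 (walkable) or 1 (wall) and moves are
    4-directional with unit cost.
    """
    def h(pos):
        # Manhattan distance: admissible and consistent for unit-cost 4-way moves.
        return abs(pos[0] - end[0]) + abs(pos[1] - end[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, position)
    came_from = {}
    g_score = {start: 0}

    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == end:
            # Walk parent links back to the start, then reverse.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if g > g_score.get(current, float("inf")):
            continue  # stale heap entry: a cheaper route was already found
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = current[0] + dr, current[1] + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue  # off the grid or blocked
            neighbor = (nr, nc)
            tentative = g + 1
            if tentative < g_score.get(neighbor, float("inf")):
                g_score[neighbor] = tentative
                came_from[neighbor] = current
                heapq.heappush(open_heap, (tentative + h(neighbor), tentative, neighbor))
    return None  # open list exhausted: no path exists


maze = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(astar(maze, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]
```

Pushing duplicate entries and skipping stale ones on pop avoids a decrease-key operation, which `heapq` does not provide.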
# -*- coding: utf-8 -*-
""" Deletes all tweets below a certain retweet threshold.
"""
import tweepy
from datetime import datetime
# Constants
CONSUMER_KEY = ''
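The script is cut off at the credentials, so the core filtering step is sketched below as a pure function, assuming each tweet object exposes `id` and `retweet_count` attributes, as tweepy's `Status` objects do. `select_low_retweet_tweets`, the threshold value, and the sample tweets are hypothetical. In a live script, the candidates might come from `tweepy.Cursor(api.user_timeline).items()` and each be deleted with `api.destroy_status(tweet.id)`, assuming an authenticated tweepy 3.x or 4.x `API` object.

```python
from types import SimpleNamespace


def select_low_retweet_tweets(tweets, threshold):
    """Return the ids of tweets whose retweet count is strictly below threshold.

    Hypothetical helper: works on any objects with `id` and `retweet_count`
    attributes, such as tweepy Status objects.
    """
    return [tweet.id for tweet in tweets if tweet.retweet_count < threshold]


# Offline stand-ins for tweets fetched from the API (made-up sample data).
sample = [
    SimpleNamespace(id=1, retweet_count=0),
    SimpleNamespace(id=2, retweet_count=12),
    SimpleNamespace(id=3, retweet_count=4),
]
print(select_low_retweet_tweets(sample, threshold=5))  # [1, 3]
```

Keeping the selection separate from the API calls makes it easy to do a dry run and review which tweets would be deleted before anything is removed.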