Adam Ahmed aahmd

  • Machine Learning/Data Engineer
  • Washington D.C.
@aahmd
aahmd / gist:87389b2fe5b5cae6f43226748fad5988
Last active February 23, 2024 00:26
pyspark queries
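# For records grouped by colA, colB, and colC, return a df of the colD values from groups with more than one record:
import pyspark.sql.functions as fn
# filter() applies to the aggregated DataFrame, not to the count column; use fn.collect_set if distinct colD values are wanted.
grouped = df.groupBy('colA', 'colB', 'colC').agg(
    fn.collect_list('colD').alias('newColD'), fn.count('colD').alias('count')
).filter(fn.col('count') > 1)
grouped.select(fn.explode('newColD').alias('colDUniques')).show()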
# given a subset of columns, return a dataframe where duplicates exist for these columns:
aahmd / gist:fa3ca1c7acb62ddcf20b9f517ddea414
Last active September 3, 2018 16:17
pass_request_to_new_method
from scrapy.http import Request
def parse_final_page(self, response):
    # do scraping here:
def get_next_page(self, response, url_append):
    new_url = response.url + url_append
    req = Request(
        url=new_url,
from scrapy.spiders import CrawlSpider
from scrapy.loader.processors import Identity, TakeFirst
import logging
logger = logging.getLogger(__name__)
aahmd / file_scan.py
Last active February 24, 2019 03:17
comparison of search techniques
import os
from glob import glob
from pathlib import Path
import time
start_path = Path(os.path.expanduser("~"))
"""Glob."""
def list_comp_and_glob():
    return [i for i in glob(str(start_path) + '/**/*.mkv', recursive=True)]
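
The preview shows only the glob variant of the search. Below is a minimal sketch of how such a comparison might continue. It assumes the other techniques are `Path.rglob` and `os.walk`, which the preview does not show. The function names (`find_with_glob`, `find_with_rglob`, `find_with_walk`, `time_search`) and the `root` and `suffix` parameters are hypothetical, not taken from the gist. The sketch searches a caller-supplied root rather than the home directory so that a trial run stays short.

```python
import os
import time
from glob import glob
from pathlib import Path


def find_with_glob(root, suffix=".mkv"):
    """Recursive glob; '**' with recursive=True matches any directory depth."""
    pattern = os.path.join(str(root), "**", "*" + suffix)
    return sorted(glob(pattern, recursive=True))


def find_with_rglob(root, suffix=".mkv"):
    """pathlib counterpart of the recursive glob above."""
    return sorted(str(p) for p in Path(root).rglob("*" + suffix))


def find_with_walk(root, suffix=".mkv"):
    """Manual traversal with os.walk, filtering file names by suffix."""
    return sorted(
        os.path.join(dirpath, name)
        for dirpath, _dirs, files in os.walk(str(root))
        for name in files
        if name.endswith(suffix)
    )


def time_search(func, root):
    """Return (elapsed_seconds, matches) for one search function."""
    start = time.perf_counter()
    matches = func(root)
    return time.perf_counter() - start, matches
```

A comparison could then loop over the three functions and print `time_search(func, start_path)[0]` for each. The three approaches may not return identical results on every tree: for example, `glob` skips names that begin with a dot by default, while `os.walk` visits them.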
aahmd / player.py
Created February 24, 2019 06:00
video player using tkinter + vlc python bindings
#! /usr/bin/python
# -*- coding: utf-8 -*-
"""vlc media player; based off example in vlc repo:
`http://git.videolan.org/?p=vlc/bindings/python.git;a=commit;h=HEAD`
See also:
`http://infohost.nmt.edu/tcc/help/pubs/tkinter/web/menu.html`
`http://infohost.nmt.edu/tcc/help/pubs/tkinter/web/menu-coptions.html`