squito’s gists

squito / classpath_checks.scala

Created April 11, 2025 20:30

Check for duplicate Jars on your classpath

	import scala.util.matching.Regex
	import collection.mutable.{HashMap, MultiMap, Set, HashSet}

	/**
	* Helper function, just to see if I got the regex correct
	*/
	def checkMatch(p: String, t: String): Option[Seq[String]] = {
	  // Return the capture groups of the first match of pattern p in text t, if any
	  p.r.findFirstMatchIn(t).map { m =>
	    (1 to m.groupCount).map { m.group(_) }
	  }
	}
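
The preview above shows only the regex helper, so the rest of the duplicate check is not visible here. As a rough sketch of the idea in Python, assuming jar names follow an `<artifact>-<version>.jar` convention (the pattern, `check_match`, and `find_duplicate_jars` are hypothetical names for illustration, not taken from the gist):

```python
import re
from collections import defaultdict

# Hypothetical pattern: "<artifact>-<version>.jar", where the version starts with a digit.
JAR_PATTERN = r"^(.+?)-(\d[^/]*)\.jar$"


def check_match(pattern, text):
    """Return the capture groups of the first match, or None (same idea as checkMatch)."""
    m = re.search(pattern, text)
    return list(m.groups()) if m else None


def find_duplicate_jars(classpath, sep=":"):
    """Group jar file names by artifact; keep artifacts seen with more than one version."""
    versions = defaultdict(set)
    for entry in classpath.split(sep):
        name = entry.rsplit("/", 1)[-1]  # drop the directory part
        groups = check_match(JAR_PATTERN, name)
        if groups:
            artifact, version = groups
            versions[artifact].add(version)
    return {a: sorted(v) for a, v in versions.items() if len(v) > 1}
```

Calling `find_duplicate_jars` on a `:`-separated classpath string returns a dict from artifact name to the conflicting versions; entries that do not match the assumed naming convention are skipped.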

squito / test_loop.py

Created January 31, 2023 20:42

Pytest test looper

	#!/usr/bin/env python

	## Makes it easy to run tests in a loop
	## Just a small bit of automation around something like
	## ls quanta_cache.py test/test_quanta_cache.py | entr -r python -m pytest test/test_quanta_cache.py
	## but that is just complex enough I would never remember

	## If you want to test `jira.py` in a loop, run:
	##
	## test/test_loop.py --test jira.py
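
The script body is not shown in this preview. A minimal sketch of the pipeline the comments describe, assuming the test for `foo.py` lives at `test/test_foo.py` as in the `quanta_cache.py` example (`loop_command` and `test_dir` are hypothetical names, not from the gist):

```python
import shlex
from pathlib import Path


def loop_command(source, test_dir="test"):
    """Build the shell pipeline that reruns pytest when the source or its test changes."""
    # Assumed convention: foo.py is tested by <test_dir>/test_foo.py
    test_file = f"{test_dir}/test_{Path(source).name}"
    watched = " ".join(shlex.quote(p) for p in (source, test_file))
    # entr -r restarts the command each time one of the listed files changes
    return f"ls {watched} | entr -r python -m pytest {shlex.quote(test_file)}"


print(loop_command("quanta_cache.py"))
```

The returned string could be handed to a shell, for example with `subprocess.run(cmd, shell=True)`; building it as a string keeps the sketch easy to inspect before running anything.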

squito / AccessControlCheck.scala

Last active May 1, 2021 03:27

Nested UGIs, doAs, proxy users

	import java.security.PrivilegedExceptionAction
	import org.apache.hadoop.security.UserGroupInformation
	import org.apache.hadoop.fs.Path
	import org.apache.hadoop.fs.FileSystem
	import org.apache.hadoop.conf.Configuration

	object AccessControlCheck {

	  val privilegedPath = "/some/path/with/limited/access"

squito / Test.java

Last active June 24, 2019 15:35

Interrupts, joins, OOMs, and uncaught exception handlers

	import java.util.ArrayList;

	public class Test implements Runnable {

	  public static class OOMer implements Runnable {
	    public void run() {
	      System.out.println("Starting oomer");
	      ArrayList<byte[]> stuff = new ArrayList<>();
	      while (true) {
	        stuff.add(new byte[100000000]);

squito / shuffle_corrupt_test.scala

Last active April 23, 2019 14:21

	// run with "--conf spark.cleaner.referenceTracking=false"
	// spin up our full set of executors
	sc.parallelize(1 to 100, 100).map { x => Thread.sleep(1000); x }.collect()

	def getLocalDirs(): Array[String] = {
	  val clz = Class.forName("org.apache.spark.util.Utils")
	  val conf = org.apache.spark.SparkEnv.get.conf
	  val method = clz.getMethod("getConfiguredLocalDirs", conf.getClass())
	  method.setAccessible(true)
	  method.invoke(null, conf).asInstanceOf[Array[String]]

squito / SlowIterator.scala

Last active September 14, 2018 20:41

SlowIterationLogger

	// This is an example iterator that runs slowly, to demonstrate how SlowLoggingIterator works.
	// It just iterates over a range of ints, but puts in occasional delays, to simulate an iterator that is
	// actually doing something more complex, e.g. fetching records from a DB which is occasionally slow.

	class SlowIterator(start: Int, end: Int, delay: Long, every: Int) extends java.util.Iterator[Integer] {
	  val underlying = (start until end).toIterator

	  def hasNext(): Boolean = underlying.hasNext

	  def next(): Integer = {
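
The body of `next()` is cut off in this preview. One way the described behaviour could look, sketched in Python under two assumptions that are not confirmed by the gist: the pause happens on values divisible by `every`, and `delay` is given in seconds. The `sleep` parameter is a hypothetical addition so the pause can be stubbed out.

```python
import time


class SlowIterator:
    """Iterate over range(start, end), pausing before some of the values."""

    def __init__(self, start, end, delay, every, sleep=time.sleep):
        self._underlying = iter(range(start, end))
        self._delay = delay
        self._every = every
        self._sleep = sleep  # injectable, so callers can replace the real pause

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._underlying)  # raises StopIteration when exhausted
        # Assumed rule: pause whenever the value is divisible by `every`
        if value % self._every == 0:
            self._sleep(self._delay)
        return value
```

For example, `list(SlowIterator(0, 10, 0.5, 4))` yields the ints 0 through 9 and pauses before returning 0, 4, and 8.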

squito / tl.out

Last active May 1, 2018 16:13

InheritableThreadLocals

	creating a new thread pool thread 0 : tl = null
	creating a new thread pool thread 1 : tl = null
	creating a new thread pool thread 2 : tl = null
	creating a new thread pool thread 3 : tl = null
	creating a new thread pool thread 4 : tl = null
	creating a new thread pool thread 5 : tl = null
	creating a new thread pool thread 6 : tl = null
	creating a new thread pool thread 7 : tl = null
	creating a new thread pool thread 8 : tl = null
	creating a new thread pool thread 9 : tl = null

squito / gist:de73fbd0b9c00961377068b91283e04c

Created January 16, 2018 19:00

SPARK-23044 session

	# filters out "apachespark" choice now
	# notice what happens with
	# * bad input ("floop")
	# * real user that can't be assigned a jira ("fakeimran")
	# * selection from list ("imran")
	# * arbitrary user that can be assigned ("vanzin")


	In [1]: from merge_spark_pr import *

squito / gist:ccd56fefefe4dfef808dc21196a89385

Created August 28, 2017 18:05

random example of exploring spark internals w/ reflection while debugging cluster config

	// paste in
	// https://gist.githubusercontent.com/squito/329d9cd82a21f645d592/raw/ca9217708293d6fe69ed6638f4feeb3038f8fd9c/reflector.scala

	val xCat = spark.sessionState.catalog.externalCatalog

	val catClient = get(xCat, "client")
	catClient.reflectMethod("getConf", Seq("hive.metastore.uris", ""))
	import org.apache.hadoop.fs.{FileSystem, Path}
	val fs = FileSystem.get(sc.hadoopConfiguration)

squito / LA_output.txt

Created August 24, 2017 16:47

Java timestamp mechanics

	> scala -Duser.timezone=America/Los_Angeles timestamp.scala
	Defaul TZ: America/Los_Angeles
	hours in UTC: 8
	TZ offset in hours: -8
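
`timestamp.scala` itself is not shown, so what exactly it measures is unclear. Assuming the two numbers are a local midnight expressed in UTC and the zone's standard offset, the same arithmetic can be sketched in Python. The fixed UTC-8 zone below stands in for America/Los_Angeles standard time, and the date is an arbitrary illustrative choice.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for America/Los_Angeles standard time.
LA_STANDARD = timezone(timedelta(hours=-8), "PST")

# Hypothetical sample instant: local midnight on an arbitrary winter day.
local_midnight = datetime(2017, 1, 1, 0, 0, tzinfo=LA_STANDARD)

# The same instant viewed in UTC, and the zone offset converted to hours.
hours_in_utc = local_midnight.astimezone(timezone.utc).hour
offset_hours = local_midnight.utcoffset() / timedelta(hours=1)

print("hours in UTC:", hours_in_utc)
print("TZ offset in hours:", int(offset_hours))
```

A fixed offset ignores daylight saving time; a real zone lookup would report -7 for part of the year.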

Imran Rashid squito