CMCDragonkai

Batch size and Image size optimisation for CNNs

Suppose we start with a square image and shrink its side length by side_p percent.

By what percentage does its area drop?

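The formula is as below, where x is the original side length (it cancels out):

area_p = 100 * [x^2 - x^2 * (1 - side_p / 100)^2] / x^2 = side_p * (2 - side_p / 100)

For example, shrinking the side by 10% shrinks the area by 10 * (2 - 0.1) = 19%.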
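A small Python sketch of the formula, handy for checking numbers before resizing a dataset. `area_drop_percent` and `max_batch_scale` are hypothetical helper names, and the batch-size helper assumes that per-image memory scales roughly linearly with pixel count, which is only an approximation for real CNNs.

```python
def area_drop_percent(side_p: float) -> float:
    """Percentage of area lost when a square's side shrinks by side_p percent."""
    if not 0.0 <= side_p <= 100.0:
        raise ValueError("side_p must be a percentage in [0, 100]")
    # Closed form of 100 * (1 - (1 - side_p / 100) ** 2)
    return side_p * (2.0 - side_p / 100.0)


def max_batch_scale(side_p: float) -> float:
    """Rough factor by which the batch size could grow after shrinking the side.

    ASSUMPTION: per-image memory is proportional to pixel count, so the
    batch can grow by 1 / (remaining area fraction).
    """
    remaining = 1.0 - area_drop_percent(side_p) / 100.0
    if remaining == 0.0:
        raise ValueError("a 100% side drop leaves no image")
    return 1.0 / remaining


print(f"{area_drop_percent(10):.1f}")  # 19.0
print(f"{area_drop_percent(50):.1f}")  # 75.0
print(f"{max_batch_scale(50):.1f}")    # 4.0
```

Halving the side removes three quarters of the pixels, which under the linear-memory assumption would allow roughly four times the batch size.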

ROC Curves

Receiver Operating Characteristic (ROC) curves are useful for measuring the effectiveness of a binary classifier.

To calculate an ROC curve, you first need to prepare a labelled dataset and pass it through the classifier.

You will have these 2 vectors:

truth vector: the true label of each example, 0 (negative class) or 1 (positive class).
score vector: the classifier's score for each example, where smaller (or negative) values point towards the negative class and larger (or positive) values point towards the positive class.
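A minimal pure-Python sketch of turning these two vectors into ROC points: sweep a threshold from the highest score downwards, treat every example scoring at or above it as positive, and record the false positive rate (FPR) and true positive rate (TPR) at each step. The function names are illustrative; libraries such as scikit-learn's `roc_curve` do the same job and may additionally drop collinear points.

```python
from typing import List, Sequence, Tuple


def roc_curve(
    truth: Sequence[int], scores: Sequence[float]
) -> Tuple[List[float], List[float], List[float]]:
    """Return (fpr, tpr, thresholds), one point per distinct score."""
    if len(truth) != len(scores):
        raise ValueError("truth and scores must have the same length")
    positives = sum(1 for t in truth if t == 1)
    negatives = len(truth) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    # Highest scores first: lowering the threshold admits them one by one.
    pairs = sorted(zip(scores, truth), key=lambda pair: pair[0], reverse=True)
    fpr, tpr, thresholds = [0.0], [0.0], [float("inf")]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores cross the threshold together, so consume them as one step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / negatives)
        tpr.append(tp / positives)
        thresholds.append(threshold)
    return fpr, tpr, thresholds


def auc(fpr: Sequence[float], tpr: Sequence[float]) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr))
    )


truth = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, thresholds = roc_curve(truth, scores)
print(fpr)            # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)            # [0.0, 0.5, 0.5, 1.0, 1.0]
print(auc(fpr, tpr))  # 0.75
```

An AUC of 0.5 corresponds to random scoring and 1.0 to a perfect ranking of positives above negatives.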

Acquiring the Nix hash of a large file

Normally you could use nix-prefetch-url. However, it doesn't work when the file doesn't fit into memory.

For those cases, you should use:

wget https://url-to-large-file.com/file.zip
nix-hash --flat --base32 --type sha256 ./file.zip
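If `nix-hash` isn't at hand, the same flat sha256 hash can be sketched in Python: stream the file through `hashlib` in fixed-size chunks (so it never has to fit into memory), then encode the digest in Nix's own base32 variant, whose alphabet omits e, o, t and u and which prints the digest as a little-endian number, most significant digit first. This is a reimplementation of Nix's encoding, not an official tool, so compare it against the `nix-hash` output above before relying on it.

```python
import hashlib
from typing import BinaryIO

# Nix's base32 alphabet (no e, o, t or u)
NIX_BASE32_CHARS = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(digest: bytes) -> str:
    """Encode bytes the way Nix prints base32 hashes."""
    length = (len(digest) * 8 - 1) // 5 + 1  # 52 characters for sha256
    chars = []
    for n in range(length - 1, -1, -1):
        # The n-th 5-bit group starts at bit n * 5 (little-endian bit order).
        i, j = divmod(n * 5, 8)
        c = digest[i] >> j
        if i + 1 < len(digest):
            c |= digest[i + 1] << (8 - j)
        chars.append(NIX_BASE32_CHARS[c & 0x1F])
    return "".join(chars)


def nix_hash_flat_sha256(f: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """sha256 of the raw file contents, read chunk by chunk, in Nix base32."""
    h = hashlib.sha256()
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)
    return nix_base32(h.digest())
```

For example, `with open("file.zip", "rb") as f: print(nix_hash_flat_sha256(f))` should print the same 52-character string as `nix-hash --flat --base32 --type sha256 ./file.zip`, provided the encoding sketch is faithful.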

Self-Referential/Recursive Dictionary in Strict Languages

The problem with a self-referential (recursive) dictionary in a strict language is that every value referring back to the dictionary needs an explicit lazy annotation, such as wrapping it in a function.

The other problem is that this won't be very fast in a strict language.

In a strict language there is probably no caching of those function calls.

This means that, compared to a static dictionary, a recursive dictionary is going to be slower depending on how many function calls each lookup makes. In fact, the slowdown may be linear in the number of self-referential function calls.
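A minimal Python sketch of both problems, using a hypothetical three-entry config. Each value that refers back to the dictionary has to be wrapped in a `lambda` (the lazy annotation), and without caching every lookup re-runs the whole chain of calls. Wrapping each thunk in `functools.lru_cache` is one way to add the caching that call-by-need evaluation in a lazy language provides automatically; it is an illustration, not the only fix.

```python
from functools import lru_cache
from typing import Callable, Dict, List

Thunk = Callable[[], int]


def make_recursive_dict(memoize: bool, counter: List[int]) -> Dict[str, Thunk]:
    """Build a self-referential dict whose values are zero-argument thunks."""
    d: Dict[str, Thunk] = {}

    def entry(thunk: Thunk) -> Thunk:
        def counted() -> int:
            counter[0] += 1  # count every real evaluation of a value
            return thunk()
        # lru_cache on a zero-argument function stores its single result
        return lru_cache(maxsize=None)(counted) if memoize else counted

    # The lambdas look up d at call time, after the dict has been filled in.
    d["base"] = entry(lambda: 10)
    d["double"] = entry(lambda: d["base"]() * 2)
    d["quad"] = entry(lambda: d["double"]() * 2)
    return d


static = {"base": 10, "double": 20, "quad": 40}  # plain dict: lookups only

plain_calls = [0]
plain = make_recursive_dict(memoize=False, counter=plain_calls)
for _ in range(3):
    assert plain["quad"]() == static["quad"]
print(plain_calls[0])   # 9: quad -> double -> base on every lookup

cached_calls = [0]
cached = make_recursive_dict(memoize=True, counter=cached_calls)
for _ in range(3):
    assert cached["quad"]() == static["quad"]
print(cached_calls[0])  # 3: each value is evaluated once, then cached
```

The uncached cost per lookup grows with the length of the reference chain, matching the linear slowdown described above.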

Singleton List Syntax

Sometimes you want either a singleton list or an empty list, depending on a condition.

In Python:

[1] * (True) # [1]
[1] * (False) # []
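This works because `bool` is a subclass of `int` (`True == 1`, `False == 0`). A small sketch of where it is handy, building a hypothetical command line where optional flags appear only when enabled:

```python
from typing import List


def build_args(verbose: bool, dry_run: bool) -> List[str]:
    # [x] * flag is [x] when flag is True and [] when flag is False.
    return ["prog"] + ["--verbose"] * verbose + ["--dry-run"] * dry_run


print(build_args(True, False))  # ['prog', '--verbose']
print(build_args(False, True))  # ['prog', '--dry-run']
```

The condition must be a real `bool` (or 0/1): `[1] * 2` gives `[1, 1]` and `[1] * "yes"` raises `TypeError`, so wrap other truthy values in `bool(...)`, or fall back to `[1] if cond else []`.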

	import io
	import numpy as np
	from PIL import Image
	from typing import IO


	def load_image(image_f: IO[bytes]) -> np.ndarray:
		peek = image_f.peek(6)
		if image_f.closed:
			# if peek caused the underlying stream to be closed
			...

	import zipfile
	import numpy as np
	from PIL import Image
	from pycocotools import coco

	# coco releases annotations and images in zip files
	# you can keep the images in the zip archives because they allow random access
	# however the annotations should be extracted and processed as is

	c = coco.COCO("./annotations/instances_train2017.json")
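
A sketch of the random-access idea using only the standard `zipfile` module: `ZipFile.read` seeks to a single member via the archive's central directory, so one image can be pulled out without extracting the rest. The member layout `train2017/<file_name>` is an assumption about how the COCO zip is organised, and `coco_image_bytes` is a hypothetical helper.

```python
import zipfile
from typing import Any, Mapping


def coco_image_bytes(
    archive: zipfile.ZipFile, img_info: Mapping[str, Any], prefix: str = "train2017/"
) -> bytes:
    """Read one COCO image's raw bytes straight out of an open zip archive.

    ASSUMPTION: members are stored as "<prefix><file_name>", where
    file_name comes from the COCO image record.
    """
    return archive.read(prefix + img_info["file_name"])
```

With the `COCO` object above, `img_info = c.loadImgs(c.getImgIds()[0])[0]` gives an image record, and `np.asarray(Image.open(io.BytesIO(coco_image_bytes(archive, img_info))))` decodes it into an array (this needs `import io`). Keeping one `ZipFile` open for the whole loop means the central directory is parsed only once.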

	import numpy as np


	# these functions reify shannon information and shannon entropy
	# the results are in units of "bits" because we are using log with base 2
	# prob has to have the [0.0, 1.0] range
	# probs should be the array of all probabilities of all possible outcomes of a random variable
	# it is assumed that the random variable is discrete
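
The functions themselves are not shown above; a sketch of what the comments describe could look like this. It uses the standard `math` module rather than NumPy so it has no dependencies, and the names `information` and `entropy` are assumptions. By convention an outcome with probability 0 contributes 0 to the entropy, while its own information is infinite.

```python
import math
from typing import Sequence


def information(prob: float) -> float:
    """Shannon information (surprisal) of one outcome, in bits."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be in [0.0, 1.0]")
    if prob == 0.0:
        return math.inf  # an impossible outcome would be infinitely surprising
    return -math.log2(prob)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a discrete random variable, in bits."""
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("every probability must be in [0.0, 1.0]")
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities of all outcomes must sum to 1")
    # Expected information; p == 0 terms are skipped since p * log2(p) -> 0.
    return sum(p * information(p) for p in probs if p > 0.0)


print(information(0.125))   # 3.0
print(entropy([0.5, 0.5]))  # 1.0
print(entropy([0.25] * 4))  # 2.0
```

A fair coin carries 1 bit and a fair four-sided die 2 bits, which is a quick sanity check for the base-2 units.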

	#!/usr/bin/env python3

	import argparse
	import hashlib


	# see https://docs.aws.amazon.com/cli/latest/topic/s3-config.html
	# for default multipart_threshold and multipart_chunksize
	def md5sum(
		file_like, multipart_threshold=8 * 1024 * 1024, multipart_chunksize=8 * 1024 * 1024
	):
		...
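
The preview stops at the signature. Judging by the AWS CLI link, the function is probably meant to reproduce the ETag S3 assigns to an upload; the hypothetical `s3_etag` below is a sketch of that idea, not the gist's own body. It hashes each `multipart_chunksize` part and, when the total size reaches `multipart_threshold`, returns the MD5 of the concatenated part digests followed by `-<part count>`. It assumes the CLI switches to multipart at sizes greater than or equal to the threshold and keeps the configured chunk size (the CLI can enlarge parts for very large files), so compare against a real ETag before trusting it; objects encrypted with SSE-KMS do not have MD5-based ETags at all.

```python
import hashlib
from typing import BinaryIO, List


def s3_etag(
    file_like: BinaryIO,
    multipart_threshold: int = 8 * 1024 * 1024,
    multipart_chunksize: int = 8 * 1024 * 1024,
) -> str:
    """Sketch of an S3-style ETag for the data read from file_like."""
    whole = hashlib.md5()           # plain MD5, used for single-part uploads
    part_digests: List[bytes] = []  # MD5 of each multipart_chunksize part
    size = 0
    # Assumes read(n) returns full n-byte chunks until EOF, as buffered files do.
    for chunk in iter(lambda: file_like.read(multipart_chunksize), b""):
        size += len(chunk)
        whole.update(chunk)
        part_digests.append(hashlib.md5(chunk).digest())
    if size < multipart_threshold:
        return whole.hexdigest()
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Usage would be along the lines of `with open(path, "rb") as f: print(s3_etag(f))`, passing the same threshold and chunk size that the upload used.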

	#!/usr/bin/env bash

	commit_1="$1"
	commit_2="$2"
	branch="$(git rev-parse --abbrev-ref HEAD)"

	if git merge-base --is-ancestor "$commit_1" "$commit_2"; then
		echo "$commit_2"
		exit 0
	fi