Sammwich

PostgreSQL is Enough

Background and Cron Jobs

You probably don't know how to do Prompt Engineering

(This post could also be titled "Features missing from most LLM front-ends that should exist")

Apologies for the snarky title, but there has been a huge amount of discussion around so-called "Prompt Engineering" these past few months on all kinds of platforms. Much of it comes from individuals who are peddling an awful lot of "Prompting" and very little "Engineering".

Most of these discussions are little more than users finding that writing more creative and complicated prompts can help them solve a task that a simpler prompt could not. I claim this is not Prompt Engineering. This is not to say that crafting good prompts is easy, but it does not involve any sophisticated modification of the general "template" of a prompt.

Others, who I think do deserve to call themselves "Prompt Engineers" (and an awful lot more than that), have been writing about and utilizing the rich new ecosystem

Exploring Tokenizers from Hugging Face

Hugging Face (HF) has made NLP (Natural Language Processing) a breeze. In this post, we take a hands-on look at tokenization with the help of the Tokenizers library. We will load a real-world dataset containing 10-K filings of public firms and see how to train a tokenizer from scratch based on the BERT tokenization scheme. In the process, we will cover tokenization in detail and some gotchas to keep an eye out for.
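
To make the "BERT tokenization scheme" a little more concrete: BERT uses WordPiece, which splits a word into subword pieces by greedy longest-match against a vocabulary and marks continuation pieces with a "##" prefix. The following is a minimal pure-Python sketch of that matching step only. It is not the Tokenizers library's implementation and it does not train anything; the function name wordpiece_tokenize and the tiny vocabulary are made up for illustration.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into WordPiece-style tokens (greedy longest-match-first)."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking from the right.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry matches: the whole word maps to unknown.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary; a trained tokenizer would learn its own.
vocab = {"token", "##ization", "##izer", "fil", "##ing", "##s", "[UNK]"}

print(wordpiece_tokenize("tokenization", vocab))
print(wordpiece_tokenize("filings", vocab))
print(wordpiece_tokenize("xyz", vocab))
```

One gotcha this sketch makes visible: a word with no matching pieces collapses to a single unknown token, which is one reason to train the vocabulary on text from the target domain.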

Background on NLP (Optional)

If you already have an understanding of the NLP pipeline, you can safely skip this section.

For any NLP task, one of the first steps is pre-processing the data so that it can be fed into our NLP models. For those new to NLP, the general pipeline for any NLP task (text classification, question answering, etc.) is as follows:

High-Performance Matrix Multiplication

This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).

Intro

Matrix multiplication is a mathematical operation that defines the product of

If You've Never Used Sklearn's Pipeline Constructor...You're Doing It Wrong

How To Use sklearn Pipelines, FeatureUnions, and GridSearchCV With Your Own Transformers

By Emily Gill and Amber Rivera

What's a Pipeline and Why Use One?

The Pipeline constructor from sklearn allows you to chain transformers and estimators together into a sequence that functions as one cohesive unit. For example, if your model involves feature selection, standardization, and then regression, those three steps, each as its own class, could be encapsulated together via Pipeline.
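
The chaining idea can be sketched without sklearn itself. The toy classes below (KeepColumns, Standardize, SimpleLinearRegression, ToyPipeline) are all hypothetical stand-ins written for this illustration, not sklearn's classes or its implementation. They only mirror the convention that intermediate steps expose fit and transform while the final step exposes fit and predict.

```python
from statistics import mean, pstdev


class KeepColumns:
    """Toy feature selector: keep a fixed set of column indices."""
    def __init__(self, indices):
        self.indices = indices

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[row[i] for i in self.indices] for row in X]


class Standardize:
    """Toy scaler: rescale each column to zero mean and unit variance."""
    def fit(self, X, y=None):
        cols = list(zip(*X))
        self.means_ = [mean(c) for c in cols]
        self.stds_ = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero
        return self

    def transform(self, X):
        return [[(v - m) / s for v, m, s in zip(row, self.means_, self.stds_)]
                for row in X]


class SimpleLinearRegression:
    """Toy estimator: least-squares line on the first (only) column."""
    def fit(self, X, y):
        xs = [row[0] for row in X]
        x_bar, y_bar = mean(xs), mean(y)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (t - y_bar) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]


class ToyPipeline:
    """Chain (name, step) pairs; every step but the last must transform."""
    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1][1].predict(X)


# Made-up data: y = 2 * x0 + 1; the second column is redundant.
X = [[1, 10], [2, 20], [3, 30], [4, 40]]
y = [3, 5, 7, 9]

pipe = ToyPipeline([
    ("select", KeepColumns([0])),
    ("scale", Standardize()),
    ("regress", SimpleLinearRegression()),
])
pipe.fit(X, y)
prediction = pipe.predict([[5, 50]])[0]
print(round(prediction, 6))
```

The point of the design is that the caller sees a single fit and a single predict, and the scaling statistics learned during fit are reused automatically at prediction time.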

	// ==UserScript==
	// @name Prevent link mangling on Google
	// @namespace LordBusiness.LMG
	// @match https://www.google.com/search
	// @grant none
	// @version 1.1
	// @author radiantly
	// @description Prevent google from mangling the link when copying or clicking the link on Firefox
	// ==/UserScript==

	###########################################################
	# How to NEVER use lambdas. An inneficient and yet educa- #
	# tonal [sic] guide to the proper misuse of the lambda #
	# construct in Python 3.x. [DO NOT USE ANY OF THIS EVER] #
	# original by (and apologies to): e000 (13/6/11) #
	# now in Python 3 courtesy of: khuxkm (17/9/20) #
	###########################################################

	## Part 1. Basic LAMBDA Introduction ##
	# If you're reading this, you've probably already read e000's

	""" Shows how to use flask and matplotlib together.

	Shows SVG, and png.
	The SVG is easier to style with CSS, and hook JS events to in browser.

	python3 -m venv venv
	. ./venv/bin/activate
	pip install flask matplotlib
	python flask_matplotlib.py
	"""