Laurian Gridinoc Laurian

ChatGPT Resources

Context

ChatGPT appeared like an explosion on all my social media timelines in early December 2022. While I keep up with machine learning as an industry, I wasn't focused so much on this particular corner, and all the screenshots seemed like they came out of nowhere. What was this model? How did the chat prompting work? What was the context of OpenAI doing this work and collecting my prompts for training data?

I decided to do a quick investigation. Here's all the information I've found so far. I'm aggregating and synthesizing it as I go, so it's currently changing pretty frequently.

Model Architecture

Circumventing Deep Packet Inspection with Socat and rot13

I have a Linux virtual machine inside a customer's private network. For security, this VM is reachable only via VPN + Citrix + Windows + a Windows SSH client (e.g. PuTTY). I am tasked with ensuring this Citrix design is secure, and that users cannot access their Linux VMs or other resources on the internal private network in any way other than through Citrix.

The VM can access the internet. This task should be easy. The VM's internet gateway allows it to connect anywhere on the internet to TCP ports 80, 443, and 8090 only. Connecting to an internet bastion box on one of these ports works and I can send and receive clear text data using netcat. I plan to use good old SSH, listening on tcp/8090 on the bastion, with a reverse port forward configured to expose sshd on the VM to the public, to show their Citrix gateway can be circumvented.

Rejected by Deep Packet Inspection

I hit an immediate snag. The moment I try to establish an SSH or SSL connection over o

Hacking the Rectangular Starlink Dishy Cable

These are some notes I put together on butchering the rectangular dishy cable.

FOLLOW THESE GUIDELINES AT YOUR OWN RISK. I TAKE NO RESPONSIBILITY FOR ANY DAMAGE OR INJURY YOU SUSTAIN FROM FOLLOWING OR NOT FOLLOWING THESE GUIDELINES.

Risk Assessment of GitHub Copilot

0xabad1dea, July 2021

this is a rough draft and may be updated with more examples

GitHub was kind enough to grant me swift access to the Copilot test phase despite me @'ing them several hundred times about ICE. I would like to examine it not in terms of productivity, but security. How risky is it to allow an AI to write some or all of your code?

Ultimately, a human being must take responsibility for every line of code that is committed. AI should not be used for "responsibility washing." However, Copilot is a tool, and workers need their tools to be reliable. A carpenter doesn't have to

GraphQL

GraphQL Schema

type Customer {
  id: ID!
  email: String!
}

Machine learning models installed on macOS as part of Vision.framework

I found these while poking around at the list of open files for photoanalysisd in Activity Monitor.

% ls -lahS /System/Library/Frameworks/Vision.framework/Versions/A/Resources
total 465616
-rw-r--r--    1 root  wheel    40M Dec 13 16:32 landmarks_v2.bin
-rw-r--r--    1 root  wheel    31M Dec 13 16:32 scenenet_sc2.4_sa1.4_ae1.4_r9_opt_int8.espresso.weights
-rw-r--r--    1 root  wheel    29M Dec 13 16:32 scenenet_sc2.4_sa1.4_ae1.6_r13.4_opt_int8_asymetric.espresso.weights

How to calculate the alignment between BERT and spaCy tokens effectively and robustly

site: https://tamuhey.github.io/tokenizations/

Natural Language Processing (NLP) has made great progress in recent years because of neural networks, which allow us to solve various tasks with end-to-end architectures. However, many NLP systems still require language-specific pre- and post-processing, especially in tokenization. In this article, I describe an algorithm that simplifies one such process: calculating the correspondence between tokens (e.g. BERT vs. spaCy). I also introduce Python and Rust libraries that implement this algorithm. Here are the library and the demo site links:

repo: https://github.com/tamuhey/tokenizations
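
To make the idea of token correspondence concrete, here is a minimal Python sketch. It is not the algorithm from the article or the library; it substitutes the standard library's `difflib.SequenceMatcher` as a stand-in character-level diff, and the function names `char_owners` and `align` are hypothetical. It assumes lowercasing is the only normalization needed and leaves wordpiece markers such as `##` in place as unmatched characters.

```python
from difflib import SequenceMatcher


def char_owners(tokens):
    """For each character of the concatenated tokens, record its token index."""
    owners = []
    for index, token in enumerate(tokens):
        owners.extend([index] * len(token))
    return owners


def align(tokens_a, tokens_b):
    """Return, for each token in tokens_a, the indices of overlapping tokens in tokens_b."""
    # Assumption: lowercasing is enough to make the two texts comparable.
    norm_a = [token.lower() for token in tokens_a]
    norm_b = [token.lower() for token in tokens_b]
    owner_a = char_owners(norm_a)
    owner_b = char_owners(norm_b)

    a2b = [[] for _ in tokens_a]
    matcher = SequenceMatcher(None, "".join(norm_a), "".join(norm_b), autojunk=False)
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            i = owner_a[block.a + offset]
            j = owner_b[block.b + offset]
            # Matched characters arrive in order, so checking the last entry avoids duplicates.
            if not a2b[i] or a2b[i][-1] != j:
                a2b[i].append(j)
    return a2b


spacy_like = ["John", "Johanson", "'s", "house"]
bert_like = ["john", "johan", "##son", "'", "s", "house"]
print(align(spacy_like, bert_like))
```

Each inner list holds the tokens on the other side that share at least one matched character, so a word split into several wordpieces maps to several indices.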

	/* eslint-disable no-bitwise, no-param-reassign, operator-assignment */

	// Mulberry32 - 32-bit seeded pseudo-random number generator
	// Source: https://github.com/bryc/code/blob/master/jshash/PRNGs.md#mulberry32

	/**
	* Function is used to get the same random value every time to ensure
	* data is the same in unit tests and screenshot tests for storybook
	* @param seed
	* @returns random number based on input seed
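
The snippet is cut off before the function body, so for reference here is a Python sketch of the Mulberry32 generator named in the comment above. It is a port of the commonly published JavaScript version, not the code from this file; the closure shape and the name `next_float` are my own choices, and the explicit masks stand in for JavaScript's 32-bit integer arithmetic.

```python
MASK32 = 0xFFFFFFFF


def mulberry32(seed):
    """Return a function that yields deterministic floats in [0, 1) for a given seed."""
    state = seed & MASK32

    def next_float():
        nonlocal state
        # Advance the 32-bit state by the Mulberry32 increment.
        state = (state + 0x6D2B79F5) & MASK32
        # Two multiply-xorshift mixing rounds, truncated to 32 bits like Math.imul.
        t = ((state ^ (state >> 15)) * (1 | state)) & MASK32
        t = ((t + (((t ^ (t >> 7)) * (61 | t)) & MASK32)) & MASK32) ^ t
        # Scale the final 32-bit value into [0, 1).
        return ((t ^ (t >> 14)) & MASK32) / 4294967296

    return next_float


rand = mulberry32(42)
print([rand() for _ in range(3)])
```

Two generators created with the same seed produce the same sequence, which is the property the comment relies on for stable unit and screenshot tests.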

	MIT License

	Copyright (c) 2021 Daniel Ethridge

	Permission is hereby granted, free of charge, to any person obtaining a copy
	of this software and associated documentation files (the "Software"), to deal
	in the Software without restriction, including without limitation the rights
	to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
	copies of the Software, and to permit persons to whom the Software is
	furnished to do so, subject to the following conditions:

	#!/usr/bin/env bash
	START_TIME=$SECONDS
	set -e

	echo "-----START GENERATING HLS STREAM-----"
	# Usage create-vod-hls.sh SOURCE_FILE [OUTPUT_NAME]
	[[ ! "${1}" ]] && echo "Usage: create-vod-hls.sh SOURCE_FILE [OUTPUT_NAME]" && exit 1

	# comment/add lines here to control which renditions would be created
	renditions=(