Olve Hansen olvesh

This worked on 14/May/23. The instructions will probably require updating in the future.

LLaMA is a text prediction model similar to GPT-2 and to the version of GPT-3 that has not been fine-tuned yet. It should also be possible to run fine-tuned versions with this, such as Alpaca or Vicuna (I think). Those versions are more focused on answering questions.

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.

It is now possible to run LLaMA 13B on a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work that has gone into llama.cpp! The latest change is CUDA/cuBLAS support, which allows you to pick an arbitrary number of the transformer layers to run on the GPU. This is perfect for low VRAM.
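Picking the layer count comes down to a VRAM budget. The Go sketch below is a rough estimate, not llama.cpp's own logic. It assumes every layer costs about the model file size divided by the layer count, and that you keep some VRAM in reserve for scratch buffers. All sizes are hypothetical inputs you would measure yourself. The result is the kind of number you would hand to llama.cpp's GPU layer option (--n-gpu-layers, assuming your build has it).

```go
package main

import "fmt"

// layersThatFit estimates how many transformer layers fit in VRAM.
// Assumption: each layer is roughly modelBytes/totalLayers in size, and
// reserveBytes of VRAM stays free for scratch buffers and the driver.
// All inputs are hypothetical; measure them for your own model and card.
func layersThatFit(vramBytes, reserveBytes, modelBytes uint64, totalLayers int) int {
	if totalLayers <= 0 || modelBytes == 0 || vramBytes <= reserveBytes {
		return 0
	}
	perLayer := modelBytes / uint64(totalLayers)
	if perLayer == 0 {
		return totalLayers // tiny model: everything fits
	}
	n := int((vramBytes - reserveBytes) / perLayer)
	if n > totalLayers {
		n = totalLayers // cannot offload more layers than the model has
	}
	return n
}

func main() {
	const gib = 1 << 30
	// Hypothetical numbers: 6 GiB card, 1 GiB reserved, 8 GiB model file, 40 layers.
	fmt.Println(layersThatFit(6*gib, 1*gib, 8*gib, 40))
}
```

If the estimate still runs out of memory, lowering the count a few layers at a time would be a reasonable next step.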

Clone llama.cpp from git. I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d.

How to push tagged Docker releases to Google Artifact Registry with a GitHub Action

Here's how I configured a GitHub Action so that a new version issued by GitHub's release interface will build a Dockerfile, tag it with the version number and upload it to Google Artifact Registry.
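The pushed image has to follow Artifact Registry's LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG layout, with the tag taken from the release. As a minimal Go sketch, assuming the release tag arrives as GITHUB_REF in the form refs/tags/v1.2.3 (which is what release-triggered workflows receive), the reference can be derived like this. The location, project, repository and image names are placeholders.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"strings"
)

// imageRef builds an Artifact Registry Docker image reference of the form
// LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG from a tag ref such
// as "refs/tags/v1.2.3". All names passed in are placeholders.
func imageRef(location, project, repo, image, gitRef string) (string, error) {
	const prefix = "refs/tags/"
	if !strings.HasPrefix(gitRef, prefix) || len(gitRef) == len(prefix) {
		return "", fmt.Errorf("not a tag ref: %q", gitRef)
	}
	tag := strings.TrimPrefix(gitRef, prefix)
	return fmt.Sprintf("%s-docker.pkg.dev/%s/%s/%s:%s", location, project, repo, image, tag), nil
}

func main() {
	// In a workflow the runner sets GITHUB_REF; the fallback is for local runs.
	ref := os.Getenv("GITHUB_REF")
	if ref == "" {
		ref = "refs/tags/v0.1.0"
	}
	out, err := imageRef("us-central1", "my-project", "my-repo", "my-app", ref)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out)
}
```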

Before you attempt the steps below, you need the following:

A GitHub repository that contains a working Dockerfile
The Google Cloud SDK tool gcloud installed and authenticated

Create a Workload Identity Federation

Google Cloud service account to AWS Role federation

inspired by https://github.com/shrikant0013/gcp-aws-webidentityfederation

create an AWS Role configured for Web Identity federation using Cognito or any OpenID provider
select Google as the Identity provider in the wizard
set the audience to a dummy value and do not add any additional conditions in the setup wizard. We will edit the trust policy after completing the wizard.
assign any permissions needed to the role
read up on "Available keys for AWS web identity federation" at

Monitoring

Some thoughts on monitoring.

Source Documents:

Configure Git to use a proxy

In Brief

You may need to configure a proxy server if you're having trouble cloning or fetching from a remote repository, or if you get an error like: unable to access '...' Couldn't resolve host '...'.

Consider something like:

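	# based on the "patch deployment" strategy in this comment:
	# https://github.com/kubernetes/kubernetes/issues/13488#issuecomment-372532659
	# requires jq

	# $1 is a valid namespace
	function refresh-all-pods() {
	  echo
	  DEPLOYMENT_LIST=$(kubectl -n "$1" get deployment -o json | jq -r '.items[].metadata.name')
	  echo "Refreshing pods in all Deployments"
	  for deployment_name in $DEPLOYMENT_LIST ; do
	    # assumed loop body: bump a pod-template label so each Deployment rolls out new pods
	    kubectl -n "$1" patch deployment "$deployment_name" \
	      -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"$(date +%s)\"}}}}}"
	  done
	}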

	options:
	  docker: true
	pipelines:
	  branches:
	    master:
	      - step:
	          image: google/cloud-sdk:latest
	          name: Deploy to production
	          deployment: production
	          caches:

	If your Redis initial replication fails with an error like
	"5101:M 20 Feb 18:14:29.130 # Client id=4500196 addr=71.459.815.760:43872 fd=533 name= age=127 idle=127 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=13997 oll=1227 omem=192281275 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits."

	it means the replica (slave) output buffer is too small, and you should increase it (on the master!) with a command like
	redis-cli config set client-output-buffer-limit "slave 836870912 836870912 0"

	More info: https://redis.io/topics/clients
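To check whether a new limit would have covered the failure, you can compare the omem field (output buffer memory, in bytes) from the log line with the hard limit you plan to set. A small Go sketch, assuming the log format shown above. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// omemFromLog extracts the omem= field (output buffer memory in bytes)
// from a Redis client log line in the format shown above.
func omemFromLog(line string) (uint64, bool) {
	for _, field := range strings.Fields(line) {
		if strings.HasPrefix(field, "omem=") {
			n, err := strconv.ParseUint(strings.TrimPrefix(field, "omem="), 10, 64)
			return n, err == nil
		}
	}
	return 0, false
}

// limitArg formats a value for CONFIG SET client-output-buffer-limit:
// class, hard limit (bytes), soft limit (bytes), soft-limit seconds.
func limitArg(class string, hard, soft uint64, seconds int) string {
	return fmt.Sprintf("%s %d %d %d", class, hard, soft, seconds)
}

func main() {
	// Excerpt of the log line above.
	line := "flags=S db=0 obl=13997 oll=1227 omem=192281275 events=rw cmd=psync"
	const hard = 836870912 // the value used in the command above
	if omem, ok := omemFromLog(line); ok {
		fmt.Printf("omem=%d bytes, below new hard limit: %v\n", omem, omem < hard)
	}
	fmt.Println(limitArg("slave", hard, hard, 0))
}
```

Newer Redis versions also accept replica as the class name in place of slave.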

	package main

	import (
		"context"
		"log"
		"net/http"
		"os"
		"os/signal"
		"time"
	)

	package main

	import (
		"encoding/json"
		"fmt"
		"log"
		"net/http"
		"os"
	)