translation -> English model -> backtranslation
The Opus Marian model Helsinki-NLP/opus-mt-pl-en works for the Polish-to-English step.
you can
Assume ChatGPT is used 30 days a month, with 100 queries per day.
The mean tokens per query is an upper bound (we estimated it from a sample of queries); in practice it would be lower, especially since the output can be bounded by asking ChatGPT to answer in a fixed number of sentences.
api_cost_per_token = 2e-6

from rust_functions import tokenize_python_code
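The cost arithmetic above can be sketched as follows; the value of 500 mean tokens per query is an assumed placeholder, since the actual estimate is not given in the note.

```python
# Rough monthly cost: days * queries/day * tokens/query * $/token.
days_per_month = 30
queries_per_day = 100
mean_tokens_per_query = 500  # assumed placeholder; treat as an upper bound
api_cost_per_token = 2e-6

monthly_cost = (days_per_month * queries_per_day
                * mean_tokens_per_query * api_cost_per_token)
print(f"${monthly_cost:.2f} per month")  # $3.00 per month
```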
def test_simple_input():
    example_code = """
def foo():
    return x + 1
"""
    expected_tokens = ["def", "foo", "(", ")", "return", "x", "+", "1"]
    assert tokenize_python_code(example_code.strip()) == expected_tokens
| """ | |
| Query: | |
| Write a gradio app that shows results of searching in a list of texts: | |
| - tokenizing texts with nltk | |
| - using rank_bm25 library | |
| - displaying results as dataframe under the search box | |
| Response: | |
| Here is an example of a gradio app that uses the rank_bm25 library to search through a list of texts, tokenizes the texts and queries using nltk, and shows the results as a dataframe under the search box: | |
| """ |
sudo apt-get install -y libgl1-mesa-glx libegl1-mesa libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6 icu-devtools libicu-dev
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
sudo apt-get install npm
sudo npm i elasticdump
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
import ktrain
zsl = ktrain.text.ZeroShotClassifier('huggingface/prunebert-base-uncased-6-finepruned-w-distil-mnli')
from tensorflow.keras import layers

def build_segmentation_model(
    input_shape,
    n_classes,
    base_block_size=BASE_BLOCK_SIZE,
    base_dropout_rate=BASE_DROPOUT_RATE,
    activation=ACTIVATION,
):
    # Build a U-Net segmentation model; the BASE_* and ACTIVATION
    # constants are defined elsewhere in the file.
    inputs = layers.Input(input_shape)
    # Shift pixel values from [0, 1] to [-0.5, 0.5]
    s = layers.Lambda(lambda x: x - 0.5)(inputs)
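The builder above cuts off after the input rescaling. A minimal sketch of how it might continue is below; `conv_block`, the layer widths, and the default constants are assumptions, and the down/up-sampling path of a full U-Net is omitted.

```python
# Hedged sketch of a tiny segmentation-model builder in the same style.
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, dropout_rate, activation):
    # Two 3x3 convolutions with dropout in between (assumed block structure).
    x = layers.Conv2D(filters, 3, padding="same", activation=activation)(x)
    x = layers.Dropout(dropout_rate)(x)
    x = layers.Conv2D(filters, 3, padding="same", activation=activation)(x)
    return x

def build_segmentation_model(input_shape, n_classes,
                             base_block_size=16, base_dropout_rate=0.1,
                             activation="relu"):
    inputs = layers.Input(input_shape)
    s = layers.Lambda(lambda x: x - 0.5)(inputs)  # center pixel values
    x = conv_block(s, base_block_size, base_dropout_rate, activation)
    # 1x1 convolution maps features to per-pixel class probabilities.
    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_segmentation_model((64, 64, 1), n_classes=2)
```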
apt-get install vim git wget
# Example NeoMutt config file for the index-color feature.
# Entire index line
color index white black '.*'
# Author name, %A %a %F %L %n
# Give the author column a dark grey background
color index_author default color234 '.*'
# Highlight a particular from (~f)
color index_author brightyellow color234 '~fRay Charles'
# Message flags, %S %Z
import torch
import ot
from sklearn import metrics

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout (or leave in train mode to finetune)

def get_roberta_features(text):
    # Encode the text and return the final-layer token features
    tokens = roberta.encode(text)
    return roberta.extract_features(tokens)