Pipeline: translate the input to English -> run the English-only model -> backtranslate the output.
The Opus Marian model Helsinki-NLP/opus-mt-pl-en works well for the Polish-to-English step.
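A minimal sketch of the data flow in this pipeline. The three callables are stand-ins: in practice `to_english` and `from_english` would be Marian models (e.g. Helsinki-NLP/opus-mt-pl-en and Helsinki-NLP/opus-mt-en-pl loaded via the transformers library), and `english_model` whatever English-only model is being wrapped.

```python
def backtranslation_pipeline(text, to_english, english_model, from_english):
    """Run a non-English input through an English-only model."""
    english_input = to_english(text)          # e.g. opus-mt-pl-en
    english_output = english_model(english_input)
    return from_english(english_output)       # e.g. opus-mt-en-pl

# Dummy callables just to show the composition order:
result = backtranslation_pipeline(
    "czesc",
    to_english=lambda t: f"en({t})",
    english_model=lambda t: f"model({t})",
    from_english=lambda t: f"pl({t})",
)
# result == "pl(model(en(czesc)))"
```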
Assume ChatGPT is used 30 days a month, with 100 queries per day.
The mean tokens per query is an upper bound (estimated from a sample of queries); in practice it would be lower, especially since responses can be bounded by asking ChatGPT to answer in a fixed number of sentences.
api_cost_per_token = 2e-6
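The estimate above works out as below. `mean_tokens_per_query` is an assumed upper bound plugged in for illustration, not a measured value.

```python
days_per_month = 30
queries_per_day = 100
mean_tokens_per_query = 1000   # assumed upper bound, not measured
api_cost_per_token = 2e-6      # dollars per token, from the note above

monthly_cost = (days_per_month * queries_per_day
                * mean_tokens_per_query * api_cost_per_token)
print(monthly_cost)  # 6.0 dollars per month at these assumptions
```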
from rust_functions import tokenize_python_code

def test_simple_input():
    example_code = """
def foo():
    return x + 1
"""
    expected_tokens = ["def", "foo", "(", ")", "return", "x", "+", "1"]
    assert tokenize_python_code(example_code.strip()) == expected_tokens
""" | |
Query: | |
Write a gradio app that shows results of searching in a list of texts: | |
- tokenizing texts with nltk | |
- using rank_bm25 library | |
- displaying results as dataframe under the search box | |
Response: | |
Here is an example of a gradio app that uses the rank_bm25 library to search through a list of texts, tokenizes the texts and queries using nltk, and shows the results as a dataframe under the search box: | |
""" |
sudo apt-get install -y libgl1-mesa-glx libegl1-mesa libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6 icu-devtools libicu-dev
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
sudo apt-get install npm
sudo npm i elasticdump
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
import ktrain
zsl = ktrain.text.ZeroShotClassifier('huggingface/prunebert-base-uncased-6-finepruned-w-distil-mnli')
from tensorflow.keras import layers

def build_segmentation_model(
    input_shape,
    n_classes,
    base_block_size=BASE_BLOCK_SIZE,
    base_dropout_rate=BASE_DROPOUT_RATE,
    activation=ACTIVATION,
):
    # Build a U-Net segmentation model
    inputs = layers.Input(input_shape)
    # Center the inputs around zero before the encoder blocks
    s = layers.Lambda(lambda x: x - 0.5)(inputs)
apt-get install vim git wget
# Example NeoMutt config file for the index-color feature.
# Entire index line
color index white black '.*'
# Author name, %A %a %F %L %n
# Give the author column a dark grey background
color index_author default color234 '.*'
# Highlight a particular from (~f)
color index_author brightyellow color234 '~fRay Charles'
# Message flags, %S %Z
import torch
import ot
from sklearn import metrics

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout (or leave in train mode to finetune)

def get_roberta_features(text):
    tokens = roberta.encode(text)
    # Last-layer features from the fairseq hub model, shape (1, seq_len, 1024)
    return roberta.extract_features(tokens)