translation -> English model -> backtranslation
The Opus Marian model Helsinki-NLP/opus-mt-pl-en works for the Polish-to-English step.
you can
Assume ChatGPT is used 30 days a month, with 100 queries per day.
The mean tokens per query is an upper bound (we estimated it from a sample of queries); in practice it would be lower, especially since the output can be bounded by asking ChatGPT to answer in a fixed number of sentences.
api_cost_per_token = 2e-6

from rust_functions import tokenize_python_code
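The cost arithmetic above can be sketched as follows; the value of 500 mean tokens per query is an assumed placeholder, since the actual estimate is not given in the note.

```python
# Rough monthly cost: days * queries/day * tokens/query * $/token.
days_per_month = 30
queries_per_day = 100
mean_tokens_per_query = 500  # assumed placeholder; treat as an upper bound
api_cost_per_token = 2e-6

monthly_cost = (days_per_month * queries_per_day
                * mean_tokens_per_query * api_cost_per_token)
print(f"${monthly_cost:.2f} per month")  # $3.00 per month
```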
def test_simple_input():
    example_code = """
def foo():
    return x + 1
"""
    expected_tokens = ["def", "foo", "(", ")", "return", "x", "+", "1"]
    assert tokenize_python_code(example_code.strip()) == expected_tokens
| """ | |
| Query: | |
| Write a gradio app that shows results of searching in a list of texts: | |
| - tokenizing texts with nltk | |
| - using rank_bm25 library | |
| - displaying results as dataframe under the search box | |
| Response: | |
| Here is an example of a gradio app that uses the rank_bm25 library to search through a list of texts, tokenizes the texts and queries using nltk, and shows the results as a dataframe under the search box: | |
| """ |
sudo apt-get install -y libgl1-mesa-glx libegl1-mesa libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6 icu-devtools libicu-dev
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
sudo apt-get install npm
sudo npm i elasticdump
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
import ktrain
zsl = ktrain.text.ZeroShotClassifier('huggingface/prunebert-base-uncased-6-finepruned-w-distil-mnli')
from tensorflow.keras import layers

def build_segmentation_model(
    input_shape,
    n_classes,
    base_block_size=BASE_BLOCK_SIZE,
    base_dropout_rate=BASE_DROPOUT_RATE,
    activation=ACTIVATION,
):
    # Build a U-Net segmentation model; the BASE_* and ACTIVATION
    # constants are defined elsewhere in the file.
    inputs = layers.Input(input_shape)
    # Shift pixel values from [0, 1] to [-0.5, 0.5]
    s = layers.Lambda(lambda x: x - 0.5)(inputs)
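The builder above cuts off after the input rescaling. A minimal sketch of how it might continue is below; `conv_block`, the layer widths, and the default constants are assumptions, and the down/up-sampling path of a full U-Net is omitted.

```python
# Hedged sketch of a tiny segmentation-model builder in the same style.
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters, dropout_rate, activation):
    # Two 3x3 convolutions with dropout in between (assumed block structure).
    x = layers.Conv2D(filters, 3, padding="same", activation=activation)(x)
    x = layers.Dropout(dropout_rate)(x)
    x = layers.Conv2D(filters, 3, padding="same", activation=activation)(x)
    return x

def build_segmentation_model(input_shape, n_classes,
                             base_block_size=16, base_dropout_rate=0.1,
                             activation="relu"):
    inputs = layers.Input(input_shape)
    s = layers.Lambda(lambda x: x - 0.5)(inputs)  # center pixel values
    x = conv_block(s, base_block_size, base_dropout_rate, activation)
    # 1x1 convolution maps features to per-pixel class probabilities.
    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_segmentation_model((64, 64, 1), n_classes=2)
```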
apt-get install vim git wget
# Example NeoMutt config file for the index-color feature.
# Entire index line
color index white black '.*'
# Author name, %A %a %F %L %n
# Give the author column a dark grey background
color index_author default color234 '.*'
# Highlight a particular from (~f)
color index_author brightyellow color234 '~fRay Charles'
# Message flags, %S %Z
import torch
import ot
from sklearn import metrics

roberta = torch.hub.load('pytorch/fairseq', 'roberta.large')
roberta.eval()  # disable dropout (or leave in train mode to finetune)

def get_roberta_features(text):
    # Encode the text and return the final-layer token features
    tokens = roberta.encode(text)
    return roberta.extract_features(tokens)