| import re | |
| # The build_tokenizer method of the _VectorizerMixin mixin, which HashingVectorizer, CountVectorizer and | |
| # TfidfVectorizer (through CountVectorizer) inherit from. | |
| # It splits a string into a sequence of tokens (used only if analyzer == 'word'). | |
| def build_tokenizer(token_pattern: str = r"(?u)\b\w\w+\b"): | |
| """ | |
| Return a function that splits a string into a sequence of tokens. | |
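The preview above stops inside the docstring, so the body is not shown. A minimal sketch of such a tokenizer factory, assuming it simply compiles `token_pattern` and returns the compiled pattern's `findall` method (the return type annotation and the `tokenize` name are illustrative additions, not taken from the gist):

```python
import re
from typing import Callable, List


def build_tokenizer(token_pattern: str = r"(?u)\b\w\w+\b") -> Callable[[str], List[str]]:
    """Return a function that splits a string into a sequence of tokens."""
    # Assumption: the tokenizer is just the compiled pattern's findall method.
    compiled_pattern = re.compile(token_pattern)
    return compiled_pattern.findall


tokenize = build_tokenizer()
tokens = tokenize("This is the first document.")
print(tokens)  # ['This', 'is', 'the', 'first', 'document']
```

With the default pattern, a token needs at least two word characters, so single-character words such as "a" are dropped and punctuation acts only as a separator.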
| import numpy as np | |
| from tqdm import trange | |
| from collections import defaultdict | |
| from typing import Dict, Tuple, DefaultDict | |
| def get_matrix_idx_to_value_dict( | |
| matrix: np.ndarray, | |
| verbose: bool = True, | |
| ) -> DefaultDict[Tuple[int, int], int]: |
| - repo: local | |
| hooks: | |
| - id: unittest | |
| name: unittest | |
| entry: python -m unittest discover | |
| language: python | |
| always_run: true | |
| pass_filenames: false |
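With this hook, `python -m unittest discover` runs on every commit; by default, discovery starts in the current directory and collects files matching `test*.py`. A minimal sketch of a test module it would pick up, assuming a hypothetical file named `test_example.py` in the repository root (the class and method names are illustrative):

```python
import unittest


class TestExample(unittest.TestCase):
    """Hypothetical test case; discovery finds it because the file name starts with 'test'."""

    def test_addition(self):
        # A failing assertion here makes the hook exit non-zero and blocks the commit.
        self.assertEqual(1 + 1, 2)
```

Since `pass_filenames` is false and `always_run` is true, the full test suite runs regardless of which files are staged.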
| FROM cr.msk.sbercloud.ru/aicloud-jupyter/jupyter-cuda10.1-tf2.2.0-mlspace:latest | |
| LABEL maintainer="Dani El-Ayyass <[email protected]>" | |
| USER root | |
| # Docker | |
| # Set up the repository | |
| RUN apt-get update && apt-get -y install apt-transport-https ca-certificates curl gnupg lsb-release |
| def humanize_bytes(bytes: int, suffix: str = "B") -> str: | |
| """ | |
| Convert a number of bytes to a human-readable format. | |
| :param int bytes: number of bytes. | |
| :param str suffix: bytes suffix. | |
| :return: human-readable size. | |
| :rtype: str | |
| """ |
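The preview ends at the docstring, so the conversion logic is not shown. One way to implement it, as a sketch under stated assumptions: binary steps of 1024, two decimal places, and the unit prefixes K through E are all choices made here, not taken from the gist. The parameter is also renamed to `num_bytes` in this sketch to avoid shadowing the built-in `bytes`.

```python
def humanize_bytes(num_bytes: int, suffix: str = "B") -> str:
    """Convert a number of bytes to a human-readable string (illustrative sketch)."""
    value = float(num_bytes)
    # Assumption: divide by 1024 per step and format with two decimals.
    for unit in ("", "K", "M", "G", "T", "P"):
        if abs(value) < 1024:
            return f"{value:.2f} {unit}{suffix}"
        value /= 1024
    # Anything that survives six divisions is reported in exa-units.
    return f"{value:.2f} E{suffix}"


print(humanize_bytes(1536))       # 1.50 KB
print(humanize_bytes(1024 ** 3))  # 1.00 GB
```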
| import pymorphy2 | |
| class Lemmatizer: | |
| """ | |
| Pymorphy2 lemmatizer class. | |
| """ | |
| def __init__(self): | |
| """ |
| from sklearn.feature_extraction.text import TfidfVectorizer | |
| # data | |
| corpus = [ | |
| 'This is the first document.', | |
| 'This document is the second document.', | |
| 'And this is the third one.', | |
| 'Is this the first document?', | |
| ] |
| from sklearn.feature_extraction.text import TfidfVectorizer | |
| # pymorphy2 lemmatizer | |
| import pymorphy2 | |
| class Lemmatizer: | |
| def __init__(self): | |
| self.morph = pymorphy2.MorphAnalyzer() | |
| def __call__(self, x: str) -> str: |
| from itertools import permutations | |
| import numpy as np | |
| from sklearn.metrics import accuracy_score | |
| np.random.seed(42) | |
| y_true = np.random.randint(low=0, high=3, size=100) | |
| noize_mapper = {0: 1, 1: 2, 2: 0} |
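The preview stops after `noize_mapper`, so the rest of the snippet is unknown. A plausible continuation, sketched here with the standard library only, is to relabel the true classes through `noize_mapper` and then search all label permutations for the one that maximizes accuracy. This is an assumption about the gist's intent; the `accuracy` helper, `y_pred`, `best_mapping`, and the use of `random` in place of `np.random` and `accuracy_score` are illustrative substitutions.

```python
import random
from itertools import permutations

random.seed(42)
labels = [0, 1, 2]
y_true = [random.randrange(3) for _ in range(100)]

# Same mapping as in the snippet: every label is shifted to a different one.
noize_mapper = {0: 1, 1: 2, 2: 0}
y_pred = [noize_mapper[label] for label in y_true]


def accuracy(expected, predicted):
    """Fraction of positions where the two label sequences agree."""
    return sum(e == p for e, p in zip(expected, predicted)) / len(expected)


# Try every relabeling of the predictions and keep the best-scoring one.
best_accuracy, best_mapping = -1.0, None
for perm in permutations(labels):
    mapping = dict(zip(labels, perm))
    score = accuracy(y_true, [mapping[label] for label in y_pred])
    if score > best_accuracy:
        best_accuracy, best_mapping = score, mapping

print(accuracy(y_true, y_pred))  # 0.0, since no label maps to itself
print(best_accuracy)             # 1.0, the inverse mapping recovers y_true
```

Exhaustive search is fine for three labels (six permutations) but grows factorially with the number of classes.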