Dani El-Ayyass (dayyass)

🚀
Rocket Science
View GitHub Profile
@dayyass
dayyass / muse_tokenize.ipynb
Last active September 5, 2023 08:19
How to get and use tokenizer from "universal-sentence-encoder-multilingual".
@dayyass
dayyass / sklearn_tokenizer.py
Created June 17, 2021 13:54
sklearn tokenizer used in HashingVectorizer, CountVectorizer and TfidfVectorizer.
import re
# Method build_tokenizer from _VectorizerMixin mixin from which classes HashingVectorizer, CountVectorizer and
# TfidfVectorizer (through CountVectorizer) are partially inherited.
# It is used to split a string into a sequence of tokens (only if analyzer == 'word').
def build_tokenizer(token_pattern: str = r"(?u)\b\w\w+\b"):
    """
    Return a function that splits a string into a sequence of tokens.
    """
    # mirrors sklearn's _VectorizerMixin.build_tokenizer
    return re.compile(token_pattern).findall
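The default `token_pattern` matches runs of two or more word characters, so single-character tokens and punctuation are dropped. A minimal sketch of the pattern in action, using only the regex itself:

```python
import re

# sklearn's default word-tokenizer regex: runs of 2+ word characters
token_pattern = re.compile(r"(?u)\b\w\w+\b")

print(token_pattern.findall("Is this the first document?"))
# ['Is', 'this', 'the', 'first', 'document']
```

Note that "a" or "I" would be silently discarded, which is usually what you want for bag-of-words features.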
@dayyass
dayyass / matrix_to_dict.py
Created June 17, 2021 18:29
Convert matrix into a dictionary whose keys are the row and column indices of the matrix and values correspond to the matrix values for given key indices.
import numpy as np
from tqdm import trange
from collections import defaultdict
from typing import Dict, Tuple, DefaultDict
def get_matrix_idx_to_value_dict(
matrix: np.ndarray,
verbose: bool = True,
) -> DefaultDict[Tuple[int, int], int]:
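The preview cuts off at the signature. A minimal standalone sketch of the idea from the description — mapping each `(row, col)` index pair to the matrix value — using a plain dict comprehension over a nested list (in place of the gist's `np.ndarray` and `tqdm` progress bar):

```python
# 2x3 "matrix" as a nested list, standing in for np.ndarray
matrix = [[0, 1, 2],
          [3, 4, 5]]

# keys are (row, col) index tuples, values are the matrix entries
idx_to_value = {
    (i, j): value
    for i, row in enumerate(matrix)
    for j, value in enumerate(row)
}

print(idx_to_value[(1, 2)])  # 5
```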
- repo: local
  hooks:
    - id: unittest
      name: unittest
      entry: python -m unittest discover
      language: python
      always_run: true
      pass_filenames: false
@dayyass
dayyass / Dockerfile
Last active July 19, 2021 10:06
jupyter-cuda10.1-tf2.2.0-docker-mlspace
FROM cr.msk.sbercloud.ru/aicloud-jupyter/jupyter-cuda10.1-tf2.2.0-mlspace:latest
MAINTAINER Dani El-Ayyass <[email protected]>
USER root
# Docker
# Set up the repository
RUN apt-get update && \
    apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release
@dayyass
dayyass / humanize_bytes.py
Created July 25, 2021 08:25
Convert bytes to human readable format.
def humanize_bytes(bytes: int, suffix: str = "B") -> str:
"""
Convert bytes to human readable format.
:param int bytes: number of bytes.
:param str suffix: bytes suffix.
:return: human readable size.
:rtype: str
"""
@dayyass
dayyass / lemmatized.py
Last active May 26, 2022 11:26
Pymorphy2 lemmatizer class.
import pymorphy2
class Lemmatizer:
"""
Pymorphy2 lemmatizer class.
"""
    def __init__(self):
        self.morph = pymorphy2.MorphAnalyzer()
@dayyass
dayyass / tfidf_token2idf.py
Last active September 29, 2021 12:25
Extract token2idf mapper from TfidfVectorizer.
from sklearn.feature_extraction.text import TfidfVectorizer
# data
corpus = [
'This is the first document.',
'This document is the second document.',
'And this is the third one.',
'Is this the first document?',
]
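One way to get the token-to-idf mapping without touching `TfidfVectorizer` internals is to recompute the smoothed idf that sklearn uses by default, `idf = ln((1 + n) / (1 + df)) + 1`. A sketch over the corpus above, tokenizing with sklearn's default lowercasing and token pattern (an illustration of the formula, not the gist's code):

```python
import math
import re

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

# lowercase + sklearn's default token pattern (2+ word characters)
docs = [set(re.findall(r"(?u)\b\w\w+\b", doc.lower())) for doc in corpus]
n = len(docs)

# smoothed idf, matching TfidfVectorizer(smooth_idf=True) defaults
token2idf = {
    token: math.log((1 + n) / (1 + sum(token in doc for doc in docs))) + 1
    for token in sorted(set().union(*docs))
}

print(round(token2idf["document"], 4))  # df=3 -> ln(5/4) + 1 = 1.2231
```

A token present in every document ("this") gets idf exactly 1.0, the floor of the smoothed formula.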
@dayyass
dayyass / tfidf_lemmatization.py
Created September 29, 2021 09:20
How to use sklearn TfidfVectorizer with lemmatizer.
from sklearn.feature_extraction.text import TfidfVectorizer
# pymorphy2 lemmatizer
import pymorphy2
class Lemmatizer:
    def __init__(self):
        self.morph = pymorphy2.MorphAnalyzer()
    def __call__(self, x: str) -> str:
        # body reconstructed: lemmatize each whitespace-separated token
        return " ".join(self.morph.parse(token)[0].normal_form for token in x.split())
@dayyass
dayyass / permutation_accuracy.py
Created October 9, 2021 14:01
Find a labels mapper with the highest accuracy.
from itertools import permutations
import numpy as np
from sklearn.metrics import accuracy_score
np.random.seed(42)
y_true = np.random.randint(low=0, high=3, size=100)
noize_mapper = {0: 1, 1: 2, 2: 0}
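The idea is a brute-force search over all label permutations, keeping the one that maximizes accuracy. A minimal pure-Python sketch (hypothetical helper name, without the gist's numpy/sklearn setup):

```python
from itertools import permutations

def best_label_mapping(y_true, y_pred, labels=(0, 1, 2)):
    # try every permutation of the label set as a pred -> true remapping
    best_acc, best_map = -1.0, None
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        acc = sum(t == mapping[p] for t, p in zip(y_true, y_pred)) / len(y_true)
        if acc > best_acc:
            best_acc, best_map = acc, mapping
    return best_map, best_acc

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [1, 2, 0, 1, 2, 0]  # labels shifted by the mapper {0: 1, 1: 2, 2: 0}

mapping, acc = best_label_mapping(y_true, y_pred)
print(mapping, acc)  # {0: 2, 1: 0, 2: 1} 1.0
```

Brute force is fine for a handful of classes; the search space grows as `k!`, so for many labels the Hungarian algorithm over the confusion matrix is the standard alternative.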