Dani El-Ayyass dayyass
@dayyass
dayyass / pytorch_onnx_global_pooling.py
Created November 25, 2020 16:49
ONNX doesn't support PyTorch Adaptive Pooling (or Global Pooling, its special case with output_size=1). Here is an implementation of Global Pooling that is compatible with ONNX.
import numpy as np
import torch
import torch.nn as nn
import onnx
import onnxruntime
##### INIT 1d, 2d, 3d GLOBAL POOLING MODULES #####
dayyass / pytorch_cross_entropy_loss_for_binary_classification.py
Last active June 17, 2021 15:37
PyTorch nn.BCELoss and nn.CrossEntropyLoss equivalence for binary classification.
"""
In a binary classification problem, a neural network usually returns a vector of logits of shape [batch_size],
while in a multiclass classification problem, logits are represented as a matrix of shape [batch_size, n_classes].
For these tasks, different loss functions are used, and, therefore, the network training pipelines are also different,
which is not convenient when you need to test hypotheses for both problem statements (binary/multiclass).
Pipeline schemes:
- binary classification:
    logits (of shape [batch_size]) -> BCEWithLogitsLoss
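The equivalence named in the description can be checked numerically without PyTorch. The following is a minimal sketch, not the gist's own code: it assumes a binary logit z is mapped to the two-class logit pair [0, z], and the helper names `bce_with_logits`, `cross_entropy` and `binary_logit_to_two_class` are hypothetical single-example stand-ins for `nn.BCEWithLogitsLoss` and `nn.CrossEntropyLoss`.

```python
import math
from typing import List


def bce_with_logits(logit: float, target: int) -> float:
    """Binary cross-entropy on one raw logit (target is 0 or 1)."""
    # softplus(z) = log(1 + exp(z)), written in a numerically stable form.
    softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
    # -[t * log(sigmoid(z)) + (1 - t) * log(1 - sigmoid(z))] = softplus(z) - t * z
    return softplus - target * logit


def cross_entropy(logits: List[float], target: int) -> float:
    """Multiclass cross-entropy on one row of logits (target is a class index)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def binary_logit_to_two_class(logit: float) -> List[float]:
    """Assumed mapping: class 0 gets logit 0, class 1 gets the binary logit."""
    return [0.0, logit]


for z in (-3.0, 0.0, 2.5):
    for t in (0, 1):
        binary_loss = bce_with_logits(z, t)
        multiclass_loss = cross_entropy(binary_logit_to_two_class(z), t)
        print(z, t, round(binary_loss, 6), round(multiclass_loss, 6))
```

The two losses agree because softmax over [0, z] assigns class 1 the probability sigmoid(z), so a single multiclass pipeline can cover the binary case under this mapping.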
dayyass / pytorch_set_global_seed.py
Created December 20, 2020 10:16
Set global seed for reproducibility.
import torch
import random
import numpy as np
def set_global_seed(seed: int):
    """
    Set global seed for reproducibility.
    """
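The preview stops before the function body. The following is a sketch of what such a function commonly does, not the gist's actual implementation. The name `set_global_seed_sketch` is hypothetical, and importing NumPy and PyTorch lazily so that the sketch also runs where they are not installed is an assumption of this example.

```python
import random


def set_global_seed_sketch(seed: int) -> None:
    """Seed the common sources of randomness (hypothetical sketch)."""
    # Python's built-in generator.
    random.seed(seed)

    # NumPy's legacy global generator, if NumPy is available.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch CPU and CUDA generators, if PyTorch is available.
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # These flags trade some speed for deterministic cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


set_global_seed_sketch(42)
first = random.random()
set_global_seed_sketch(42)
second = random.random()
print(first == second)  # True
```

Seeding alone does not guarantee bit-identical results across hardware or library versions, so it is best treated as a necessary step rather than a complete one.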
dayyass / pytorch_pack_padded_sequence.ipynb
Last active January 12, 2023 12:43
Comparison of RNN inference time with and without pack_padded_sequence.
dayyass / attention.ipynb
Last active June 17, 2021 15:37
My own implementation of Multihead Attention.
dayyass / muse_tokenize.ipynb
Last active September 5, 2023 08:19
How to get and use tokenizer from "universal-sentence-encoder-multilingual".
dayyass / sklearn_tokenizer.py
Created June 17, 2021 13:54
sklearn tokenizer used in HashingVectorizer, CountVectorizer and TfidfVectorizer.
import re
# build_tokenizer is a method of the _VectorizerMixin mixin, from which HashingVectorizer, CountVectorizer and
# TfidfVectorizer (through CountVectorizer) partially inherit.
# It splits a string into a sequence of tokens (used only when analyzer == 'word').
def build_tokenizer(token_pattern: str = r"(?u)\b\w\w+\b"):
    """
    Return a function that splits a string into a sequence of tokens.
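The preview is cut off inside the docstring. A minimal sketch of how such a tokenizer factory can be completed with only the standard library is shown here. It assumes the function compiles the pattern and returns the `findall` method of the compiled pattern, and the sample sentence is made up for illustration.

```python
import re
from typing import Callable, List


def build_tokenizer(token_pattern: str = r"(?u)\b\w\w+\b") -> Callable[[str], List[str]]:
    """Return a function that splits a string into a sequence of tokens."""
    compiled = re.compile(token_pattern)
    # findall returns every non-overlapping match of the pattern, in order.
    return compiled.findall


tokenize = build_tokenizer()
tokens = tokenize("Hello, World! I am a test.")
print(tokens)  # ['Hello', 'World', 'am', 'test']
```

With the default pattern, tokens need at least two word characters, so "I" and "a" are dropped and punctuation acts only as a separator. Lowercasing is not part of this step.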
dayyass / matrix_to_dict.py
Created June 17, 2021 18:29
Convert a matrix into a dictionary whose keys are (row, column) index pairs and whose values are the matrix entries at those indices.
import numpy as np
from tqdm import trange
from collections import defaultdict
from typing import Dict, Tuple, DefaultDict
def get_matrix_idx_to_value_dict(
    matrix: np.ndarray,
    verbose: bool = True,
) -> DefaultDict[Tuple[int, int], int]:
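The function body is not shown in the preview. As a dependency-free sketch of the conversion the description names, the version below accepts any nested sequence instead of `np.ndarray` and leaves out the `verbose` progress bar. The name `matrix_to_dict_sketch` is hypothetical, and using `defaultdict(int)` is an assumption based on the return annotation.

```python
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple


def matrix_to_dict_sketch(
    matrix: Sequence[Sequence[int]],
) -> DefaultDict[Tuple[int, int], int]:
    """Map each (row, column) index pair to the value stored there."""
    idx_to_value: DefaultDict[Tuple[int, int], int] = defaultdict(int)
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            idx_to_value[(i, j)] = value
    return idx_to_value


example = matrix_to_dict_sketch([[1, 2], [3, 4]])
print(dict(example))  # {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
```

Because the result is a `defaultdict(int)`, looking up an index pair that is not in the matrix returns 0 and inserts that key, which may or may not be desirable depending on the use case.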
- repo: local
  hooks:
    - id: unittest
      name: unittest
      entry: python -m unittest discover
      language: python
      always_run: true
      pass_filenames: false
dayyass / Dockerfile
Last active July 19, 2021 10:06
jupyter-cuda10.1-tf2.2.0-docker-mlspace
FROM cr.msk.sbercloud.ru/aicloud-jupyter/jupyter-cuda10.1-tf2.2.0-mlspace:latest
# MAINTAINER is deprecated; a LABEL carries the same information.
LABEL maintainer="Dani El-Ayyass <[email protected]>"
USER root
# Docker
# Set up the repository
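# Update and install in one layer so a cached package index is never reused with a changed package list.
RUN apt-get update && \
    apt-get -y install apt-transport-https ca-certificates curl gnupg lsb-release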