Skip to content

Instantly share code, notes, and snippets.

@snakers4
snakers4 / modeling.py
Created March 1, 2019 09:14
Best pretraining for Russian language - embedding bag interfaces
class BertEmbeddingBag(nn.Module):
"""Construct the embeddings from word, position and token_type embeddings.
"""
def __init__(self, config):
super(BertEmbeddingBag, self).__init__()
# self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size)
ngram_matrix=np.load(config.ngram_matrix_path)
self.old_bag = config.old_bag
@asanakoy
asanakoy / Scrape with proxy
Created October 12, 2018 19:11
How to scrape data fwom awebsite through proxies
#Proxy list graper
# https://github.com/abdallahelsokary/Proxy-Collector-/blob/master/Proxy_Collector.py
import urllib.request
import urllib.error
import time
def proxy_list():
try:
@zeyademam
zeyademam / Troubleshoot-dcnn.md
Last active January 22, 2024 05:54
Troubleshooting Convolutional Neural Nets

Troubleshooting Convolutional Neural Networks

Intro

This is a list of hacks gathered primarily from prior experiences as well as online sources (most notably Stanford's CS231n course notes) on how to troubleshoot the performance of a convolutional neural network . We will focus mainly on supervised learning using deep neural networks. While this guide assumes the user is coding in Python3.6 using tensorflow (TF), it can still be helpful as a language agnostic guide.

Suppose we are given a convolutional neural network to train and evaluate and assume the evaluation results are worse than expected. The following are steps to troubleshoot and potentially improve performance. The first section corresponds to must-do's and generally good practices before you start troubleshooting. Every subsequent section header corresponds to a problem and the section is devoted to solving it. The sections are ordered to reflect "more common" issues first and under each header the "most-eas

@rzykov
rzykov / XgBoostRankSparkScala.scala
Last active March 31, 2022 22:43
XGboost Spark - ranking problem
import _root_.ml.dmlc.xgboost4j.scala.spark.XGBoost
import org.apache.spark.ml.feature.LabeledPoint
def encodeFeaturesToLabeledPoint(features: RDD[Feature], relevance: Option[RDD[Relevance]], workers: Int)
(implicit parallel: Int): (RDD[LabeledPoint], Seq[String], Seq[Seq[Int]]) = {
val missingValue = Double.NaN
val names = features
.map { _.name }