@thomwolf
thomwolf / loading_wikipedia.py
Last active January 18, 2024 14:04
Load full English Wikipedia dataset in HuggingFace nlp library
import os
import psutil
import timeit
from datasets import load_dataset
mem_before = psutil.Process(os.getpid()).memory_info().rss >> 20
wiki = load_dataset("wikipedia", "20200501.en", split='train')
mem_after = psutil.Process(os.getpid()).memory_info().rss >> 20
print(f"RAM memory used: {(mem_after - mem_before)} MB")
s = """batch_size = 1000
for i in range(0, len(wiki), batch_size):
version: '3'
services:
  portainer:
    image: portainer/portainer:latest
    container_name: portainer
    restart: unless-stopped
    security_opt:
      - no-new-privileges:true
    networks:
@kingspp
kingspp / numpy_transfer_over_kafka.py
Created October 4, 2018 05:01
Efficient Transfer of Numpy Arrays over kafka
"""
Requirements
1. Numpy
2. Pympler or a recursive sys.getsizeof()
3. PIL
"""
import numpy as np
from pympler.asizeof import asizeof
import json
@johnmcfarlane
johnmcfarlane / begin(C++).md
Last active November 9, 2024 17:26
Resources for C++ beginners
@W4ngatang
W4ngatang / download_glue_data.py
Last active October 31, 2024 02:08
Script for downloading data of the GLUE benchmark (gluebenchmark.com)
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
@lukasnellen
lukasnellen / 00-docker-shorewall.md
Last active September 23, 2024 04:19
Set up shorewall for docker networking beyond the default bridge network, e.g., for docker-compose

Docker(-compose) with shorewall

The shorewall documentation at http://shorewall.org/Docker.html explains how to configure shorewall for use with docker. The problem with that configuration is that it only allows connections from the host to the main bridge, docker0. Connections to other networks on dynamically created bridges, whose names start with br- by default, are blocked. Instead of the recommended contents of /etc/shorewall/interfaces, use wildcard interface names as follows:

#ZONE	INTERFACE	OPTIONS
#dock	docker0		bridge     # disabled default recommendation
dock 	docker0		physical=docker+,routeback=1
dock 	br		physical=br-+,routeback=1
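
To illustrate which interfaces the wildcard entries above would cover: a trailing + in an interface name acts as a prefix wildcard, as it does in iptables. The Python sketch below only mimics that matching rule for illustration; it is not part of shorewall, and the function name matches_interface and the sample bridge names are hypothetical.

```python
def matches_interface(pattern: str, ifname: str) -> bool:
    """Return True if ifname matches an iptables-style interface pattern.

    A trailing '+' means "any interface whose name starts with this prefix";
    without it, the names must be identical.
    """
    if pattern.endswith("+"):
        return ifname.startswith(pattern[:-1])
    return pattern == ifname


# Hypothetical interface names, for illustration only.
interfaces = ["docker0", "br-3f9a1c2b7d10", "eth0"]
# The physical= patterns from the interfaces file above.
patterns = ["docker+", "br-+"]

for ifname in interfaces:
    covered = any(matches_interface(p, ifname) for p in patterns)
    print(f"{ifname}: {'dock zone' if covered else 'not covered'}")
```

Under this rule, both docker0 and any dynamically created br-... bridge fall into the dock zone, while unrelated interfaces such as eth0 do not.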
anonymous
anonymous / config
Created January 2, 2018 16:35
ssh config
Host *+*
ProxyCommand ssh $(echo %h | sed 's/+[^+]*$//;s/\([^+%%]*\)%%\([^+]*\)$/\2 -l \1/;s/:/ -p /') nc -w1 $(echo %h | sed 's/^.*+//;/:/!s/$/ %p/;s/:/ /')
@changkun
changkun / sendmail.py
Created April 11, 2017 13:59
Python Email Sender for QQ Mail
from email.mime.text import MIMEText
from email.header import Header
from smtplib import SMTP_SSL
# qq mail sending server
host_server = 'smtp.qq.com'
sender_mail = 'SENDER_MAIL'
sender_passcode = 'PASS_CODE'
# receiver mail