Leo liaocs2008

veekaybee / chatgpt.md
Last active October 30, 2024 08:38
Everything I understand about chatgpt

ChatGPT Resources

Context

ChatGPT appeared like an explosion on all my social media timelines in early December 2022. While I keep up with machine learning as an industry, I wasn't focused so much on this particular corner, and all the screenshots seemed like they came out of nowhere. What was this model? How did the chat prompting work? What was the context of OpenAI doing this work and collecting my prompts for training data?

I decided to do a quick investigation. Here's all the information I've found so far. I'm aggregating and synthesizing it as I go, so it's currently changing pretty frequently.

Model Architecture

vlasenkoalexey / download_glue_data.py
Created May 19, 2021 22:44
Fixed W4ngatang/download_glue_data.py script to download GLUE data
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
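The docstring's two shell commands can also be driven from Python. This is a minimal sketch that assumes the `cabextract` binary is installed and on `PATH`. The function names are hypothetical and not part of the script, and renaming and placing the extracted files is left to the instructions in the original script:

```python
import os
import shutil
import subprocess


def cabextract_command(msi_path, out_dir):
    """Build the argument list for `cabextract <msi> -d <dir>`."""
    return ["cabextract", msi_path, "-d", out_dir]


def extract_mrpc(msi_path="MSRParaphraseCorpus.msi", out_dir="MRPC"):
    """Equivalent of `mkdir MRPC` followed by `cabextract MSRParaphraseCorpus.msi -d MRPC`."""
    if shutil.which("cabextract") is None:
        raise FileNotFoundError("cabextract is not installed or not on PATH")
    os.makedirs(out_dir, exist_ok=True)  # like mkdir, but no error if the folder already exists
    # check=True raises CalledProcessError if cabextract exits with a non-zero status
    subprocess.run(cabextract_command(msi_path, out_dir), check=True)
    return out_dir
```

Passing the arguments as a list rather than as a single shell string avoids quoting problems if a path contains spaces.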
#!/usr/bin/env python
# coding=utf-8
# Copyright 2020 The HuggingFace Team All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
bonlime / get_Imagenet.sh
Last active November 2, 2024 21:41 — forked from BIGBALLON/extract_ILSVRC.sh
Script for extracting the ImageNet dataset.
#!/bin/bash
#
# script to fully prepare ImageNet dataset
## 1. Download the data
# get ILSVRC2012_img_val.tar (about 6.3 GB). MD5: 29b22e2961454d5413ddabcf34fc5622
# wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
# get ILSVRC2012_img_train.tar (about 138 GB). MD5: 1d675b47d978889d74fa0da5fadfb00e
# wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar
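The script's comments list an MD5 checksum for each tarball. The following minimal Python sketch checks a finished download against those values. The helper names and the `EXPECTED_MD5` table are hypothetical, but the checksums are copied from the comments above. It hashes the file in fixed-size chunks so that the ~138 GB training archive is never read into memory at once:

```python
import hashlib

# Checksums copied from the comments in get_Imagenet.sh.
EXPECTED_MD5 = {
    "ILSVRC2012_img_val.tar": "29b22e2961454d5413ddabcf34fc5622",
    "ILSVRC2012_img_train.tar": "1d675b47d978889d74fa0da5fadfb00e",
}


def md5_of_chunks(chunks):
    """Feed an iterable of byte chunks into a single MD5 digest."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file 1 MiB at a time instead of loading it whole."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps reading until f.read returns b"" at EOF
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def verify(path, name):
    """Return True if the file at `path` matches the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("ILSVRC2012_img_val.tar", "ILSVRC2012_img_val.tar")` returns `True` only if the local file is intact. On Linux, `md5sum` does the same job from the shell.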
Windsooon / leetcode_retag.md
Last active October 7, 2024 04:20
Retag the most popular LeetCode problems

osjobs

海外兔

website

popcornell / gf2elim.py
Last active August 5, 2024 17:49
Gaussian elimination for binary matrices (all elements in GF(2)), implemented with Numba-compiled Python and NumPy for efficiency.
import numpy as np
import numba
@numba.jit(nopython=True, parallel=True)  # parallel=True only speeds up computation on very large matrices
# M is an m x n binary matrix
# all elements in M should be uint8
def gf2elim(M):
    m, n = M.shape
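The preview above stops before the elimination loop. As a rough illustration of the technique, here is a plain NumPy sketch that reduces a binary matrix to reduced row echelon form over GF(2). It uses no numba, it is not the gist's own implementation, and the name `gf2_rref` is hypothetical. Because addition in GF(2) is XOR, each step XORs the pivot row into every other row that has a 1 in the pivot column, and no division is ever needed:

```python
import numpy as np


def gf2_rref(M):
    """Return the reduced row echelon form of a binary matrix over GF(2)."""
    A = np.array(M, dtype=np.uint8) % 2  # work on a copy; the caller's matrix is untouched
    m, n = A.shape
    pivot_row = 0
    for col in range(n):
        if pivot_row >= m:
            break
        # look for a row at or below pivot_row with a 1 in this column
        candidates = np.nonzero(A[pivot_row:, col])[0]
        if candidates.size == 0:
            continue  # no pivot in this column
        r = pivot_row + candidates[0]
        # swap the pivot row into place
        A[[pivot_row, r]] = A[[r, pivot_row]]
        # clear this column in every other row (above and below) by XOR-ing in the pivot row
        rows = np.nonzero(A[:, col])[0]
        rows = rows[rows != pivot_row]
        A[rows] ^= A[pivot_row]
        pivot_row += 1
    return A
```

For example, `gf2_rref([[1, 1, 0], [1, 0, 1], [0, 1, 1]])` returns `[[1, 0, 1], [0, 1, 1], [0, 0, 0]]`. The third row is the XOR of the first two, so it reduces to zeros, and the rank over GF(2) is 2.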
W4ngatang / download_glue_data.py
Last active October 31, 2024 02:08
Script for downloading data of the GLUE benchmark (gluebenchmark.com)
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
jinyu121 / README.md
Last active November 28, 2018 07:24
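Yahei PHP Probe (雅黑PHP探针), streamlined edition for personal use

Yahei PHP Probe (雅黑PHP探针), streamlined edition for personal use

Based on the original version 0.4.7, keeping only the AJAX part, which displays real-time server information.

Notice

This edition is for personal use.

Version history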

wassname / torch_summarize_with_df.py
Last active May 22, 2020 06:09
Summarize a PyTorch model like Keras's model.summary(), showing parameters and output shapes
# summarize model
from collections import OrderedDict
import pandas as pd
import torch
from torch import nn
from torch.autograd import Variable
class TorchSummarizeDf(object):
def __init__(self, model, weights=False, input_shape=True, nb_trainable=False, debug=False):
simonw / recover_source_code.md
Last active September 28, 2024 08:10
How to recover lost Python source code if it's still resident in-memory

How to recover lost Python source code if it's still resident in-memory

I screwed up using git ("git checkout --" on the wrong file) and managed to delete the code I had just written... but it was still running in a process in a docker container. Here's how I got it back, using https://pypi.python.org/pypi/pyrasite/ and https://pypi.python.org/pypi/uncompyle6

Attach a shell to the docker container

Install GDB (needed by pyrasite)

apt-get update && apt-get install gdb
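The recovery works because a running process still holds the compiled code object of every function it has loaded, even after the `.py` file is gone. The following minimal sketch illustrates that idea using only the standard library. It simulates the "deleted file" with `exec`, and the names `SOURCE` and `greet` are hypothetical. In the real scenario, this code would run inside the target process, which is what pyrasite makes possible:

```python
import dis
import marshal
import types

# Simulate code that only exists in memory: compile it from a string with a
# fake filename, so there is no source file on disk to read back.
SOURCE = "def greet(name):\n    return 'hello, ' + name\n"
namespace = {}
exec(compile(SOURCE, "<deleted_file.py>", "exec"), namespace)
greet = namespace["greet"]

# The live function object still carries its compiled code object.
code = greet.__code__

# marshal serializes code objects to bytes, so they can be written out of the
# process and handed to a decompiler later.
blob = marshal.dumps(code)
restored = marshal.loads(blob)

# dis prints the recovered bytecode instructions.
dis.dis(restored)

# Rebuild a callable from the recovered code object to show nothing was lost.
rebuilt = types.FunctionType(restored, {})
```

In the original write-up, uncompyle6 handles the step of decompiling a code object like `restored` back into source. Both marshal data and bytecode are specific to the Python version that produced them, so the decompiler has to support that version.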