Hieu Nguyen Van behitek

@behitek
behitek / Sparse Table.cpp
Created November 17, 2018 02:13 — forked from jacky860226/Sparse Table.cpp
Sparse Table
#define MAXN 100000
#define MAX_LOG 17
int n,s[MAXN+5];
int st[MAX_LOG+1][MAXN+5];
inline void init(){/* assumes the range is [0, n-1] */
for(int i=0;i<n;++i)st[0][i]=s[i];
for(int j=1;(1<<j)<=n;++j)
for(int i=0;i+(1<<j)<=n;++i)
st[j][i]=min(st[j-1][i],st[j-1][i+(1<<(j-1))]);
}
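The preview above only builds the table, where `st[j][i]` holds the minimum of the `2^j` elements starting at index `i`. As a rough sketch of how such a table is usually queried (written in Python rather than the gist's C++, and with the names `build_sparse_table` and `query_min` being assumptions, not part of the gist), a range minimum can be answered from two overlapping power-of-two blocks:

```python
def build_sparse_table(s):
    """Build st so that st[j][i] == min(s[i : i + 2**j])."""
    n = len(s)
    st = [list(s)]  # level 0: blocks of length 1
    j = 1
    while (1 << j) <= n:
        prev = st[j - 1]
        half = 1 << (j - 1)
        # A block of length 2**j is two adjacent blocks of length 2**(j-1).
        st.append([min(prev[i], prev[i + half])
                   for i in range(n - (1 << j) + 1)])
        j += 1
    return st


def query_min(st, l, r):
    """Minimum of s[l..r], inclusive and 0-based, assuming l <= r."""
    k = (r - l + 1).bit_length() - 1  # largest k with 2**k <= range length
    # The two blocks may overlap, which is harmless for min.
    return min(st[k][l], st[k][r - (1 << k) + 1])
```

Building takes O(n log n) time and each query takes O(1). The overlap trick only works for idempotent operations such as min, max, or gcd.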
@behitek
behitek / KMP.cpp
Created November 17, 2018 02:13 — forked from shihongzhi/KMP.cpp
KMP
//shihongzhi -- 2012.3.9
#include <stdio.h>
#include <string.h>
void KMP(char *T, char *P, int *pi)
{
int tLen = strlen(T);
int pLen = strlen(P);
int k = 0;
for (int i=0; i<tLen; ++i)
@behitek
behitek / kmp.cpp
Created November 17, 2018 02:13 — forked from osjayaprakash/kmp.cpp
KMP
#include <iostream>
#include <cstring>
using namespace std;
int buildlps (char * pat, int m, int *lps){
lps[0] = lps[1] = 0;
for(int i=2; i<=m; i++){
int j = lps[i-1];
while(1){
@behitek
behitek / tfidf-self-implement.ipynb
Last active January 16, 2019 06:06
Example of computing tf-idf with Python
@behitek
behitek / all-vietnamese-syllables.txt(Gõ dấu kiểu cũ)
Last active March 28, 2022 15:51
Dictionary of Vietnamese single-syllable words
a
ai
am
an
ang
anh
ao
au
ay
ba
@behitek
behitek / 1.Xóa_dấu_tiếng_Việt_trong_Python.py
Last active September 27, 2024 08:32
Remove Vietnamese diacritics in Python
import re
def no_accent_vietnamese(s):
s = s.lower()
s = re.sub('[áàảãạăắằẳẵặâấầẩẫậ]', 'a', s)
s = re.sub('[éèẻẽẹêếềểễệ]', 'e', s)
s = re.sub('[óòỏõọôốồổỗộơớờởỡợ]', 'o', s)
s = re.sub('[íìỉĩị]', 'i', s)
s = re.sub('[úùủũụưứừửữự]', 'u', s)
s = re.sub('[ýỳỷỹỵ]', 'y', s)
s = re.sub('đ', 'd', s)
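The preview above maps each accented letter through an explicit character class and lowercases the input first. As an alternative sketch (not the gist's method; the function name `strip_vietnamese_accents` is an assumption), the standard `unicodedata` module can decompose each letter and drop the combining marks, which also preserves case:

```python
import unicodedata


def strip_vietnamese_accents(text):
    """Remove Vietnamese diacritics while keeping the original case."""
    # 'đ' and 'Đ' have no canonical decomposition, so map them by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits e.g. 'ế' into 'e' + combining circumflex + combining acute.
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have a non-zero canonical combining class.
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)
```

For example, `strip_vietnamese_accents("Tiếng Việt")` returns `"Tieng Viet"`. Unlike the regex version, this also strips accents from non-Vietnamese letters, which may or may not be wanted.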
@behitek
behitek / regex.md
Last active May 3, 2019 03:42 — forked from vitorbritto/regex.md
Regex Cheat Sheet

Regular Expressions

Basic Syntax

  • /.../: Start and end regex delimiters
  • |: Alternation
  • (): Grouping
@behitek
behitek / num2str.cpp
Last active May 7, 2019 04:23
Convert written numbers into their spoken (word) form using C++/Python
// https://daynhauhoc.com/t/share-code-doc-so-thanh-chu-so-lon-bao-nhieu-cung-can-tat/62701/
#include <iostream>
#include <string>
#include <algorithm>
#include <exception>
#include <cassert>
#ifdef __unix__
#include <clocale>
#elif defined _WIN32 || defined _WIN64
#include <fcntl.h> //_O_WTEXT
@behitek
behitek / NlpUtils.java
Last active November 20, 2022 12:30
Normalize Vietnamese tone-mark placement to the old typing style (Python + Java version)
# -*- coding: utf-8 -*-
import regex as re
uniChars = "àáảãạâầấẩẫậăằắẳẵặèéẻẽẹêềếểễệđìíỉĩịòóỏõọôồốổỗộơờớởỡợùúủũụưừứửữựỳýỷỹỵÀÁẢÃẠÂẦẤẨẪẬĂẰẮẲẴẶÈÉẺẼẸÊỀẾỂỄỆĐÌÍỈĨỊÒÓỎÕỌÔỒỐỔỖỘƠỜỚỞỠỢÙÚỦŨỤƯỪỨỬỮỰỲÝỶỸỴÂĂĐÔƠƯ"
unsignChars = "aaaaaaaaaaaaaaaaaeeeeeeeeeeediiiiiooooooooooooooooouuuuuuuuuuuyyyyyAAAAAAAAAAAAAAAAAEEEEEEEEEEEDIIIIIOOOOOOOOOOOOOOOOOUUUUUUUUUUUYYYYYAADOOU"
def loaddicchar():
dic = {}
@behitek
behitek / command.md
Last active May 30, 2019 04:36
Useful Linux commands for data engineers

Remove duplicate lines in a text file

awk '!seen[$0]++' filename > output.txt
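
The one-liner works because `seen[$0]++` evaluates to 0 the first time a line appears, so the negated pattern is true and awk prints the line; on later occurrences the counter is non-zero and the line is skipped. The same order-preserving logic can be sketched in Python when the step has to live inside a script (the helper name `unique_lines` is an assumption for illustration):

```python
def unique_lines(lines):
    """Yield each line the first time it appears, keeping input order."""
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line
```

It accepts any iterable of strings, for example an open file object, and like the awk version it keeps every distinct line in memory.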

Merge files

cat file1 file2 > merge.txt

Page through a file

less filename

Show head

head -n 1000 filename

Show tail

tail -n 1000 filename