Andriy Makukha amakukha

  • Toronto
@amakukha
amakukha / generate_random_words.py
Created February 21, 2023 01:27
Generate unique random strings or identifiers
#!/usr/bin/env python3
'''Generate unique random strings or identifiers'''
import random, sys
# How many new unique identifiers to generate?
NUMBER = 100 if len(sys.argv) <= 1 else int(sys.argv[1])
# How many words to combine into a new generated string?
LENGTH = 3 if len(sys.argv) <= 2 else int(sys.argv[2])
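The preview above stops after the two parameters. A minimal sketch of how such identifiers could be generated follows; it is not the gist's actual code. The word list `WORDS`, the separator, and the function name `generate_unique` are all assumptions made for illustration.

```python
import random

# Hypothetical word list; the real script's vocabulary is not shown in the preview.
WORDS = ["amber", "birch", "cedar", "delta", "ember", "flint", "grove", "haven"]

def generate_unique(number, length, words=WORDS, sep="-", rng=None):
    """Return `number` distinct identifiers, each made of `length` random words."""
    rng = rng or random.Random()
    # There are only len(words) ** length possible combinations.
    if number > len(words) ** length:
        raise ValueError("not enough word combinations for the requested count")
    seen = set()
    # Keep drawing until enough distinct identifiers have been collected;
    # the set silently discards repeats.
    while len(seen) < number:
        seen.add(sep.join(rng.choice(words) for _ in range(length)))
    return sorted(seen)

if __name__ == "__main__":
    for identifier in generate_unique(5, 3):
        print(identifier)
```

Collecting into a set is the simplest way to guarantee uniqueness, at the cost of extra draws when `number` approaches the number of possible combinations.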
amakukha / code_duplication.py
Created December 29, 2022 19:51
code_duplication.py
#!/usr/bin/env python3
'''
Code duplication assessment tool. Runs in linear time.
Usage:
Put your packages into a single directory and run this script:
python3 code_duplication.py > report.txt
Then sort files by similarity:
cat report.txt | awk 'NF > 1' | sort -rn | less
'''
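The `awk 'NF > 1' | sort -rn` step in the usage notes keeps only lines with more than one field and sorts them by their leading number, largest first. A rough Python equivalent is sketched below. It assumes each report line starts with a numeric score, which the preview does not confirm; the function name `rank_report` and the sample lines are hypothetical.

```python
def rank_report(lines):
    """Keep lines with more than one field and sort by leading number, descending."""
    # Same filter as awk 'NF > 1': at least two whitespace-separated fields.
    rows = [line for line in lines if len(line.split()) > 1]

    def leading_number(line):
        # Like sort -n, treat a non-numeric first field as zero.
        try:
            return float(line.split()[0])
        except ValueError:
            return 0.0

    return sorted(rows, key=leading_number, reverse=True)

if __name__ == "__main__":
    sample = ["12.5 a.py b.py", "header", "80 c.py d.py", "", "3 e.py f.py"]
    for row in rank_report(sample):
        print(row)
```

Tie-breaking between equal scores may differ from `sort -rn`, so treat the output order of equal-scored lines as unspecified.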
amakukha / fast_strlcpy.c
Created December 17, 2021 08:29
fast_strlcpy
/* Copyright (c) 1998, 2015 Todd C. Miller <[email protected]>
* (c) 2021 Andrii Makukha
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
amakukha / djb2_32.html
Last active July 27, 2021 13:03
DJB2 hash in JavaScript (32-bit version)
<html>
<body>
<h2>32-bit DJB2 hash in JavaScript demo</h2>
<input type="file" id="file-input" />
<h3 id="result">Select file to calculate DJB2 hash</h3>
</body>
<script>
// DJB2 hash function. Takes ArrayBuffer
function hash_buf_djb2_32(buf) {
Weights←{
⍝ 2020 APL Problem Solving Competition Phase II
⍝ Problem 9, Task 1 - Weights
⍝ 1) Read the diagram of a mobile from the file as a vector of lines (M).
⍝ 2) Find lines which exactly repeat the preceding lines and contain only
⍝ vertical bars (│) and spaces. Such lines don't bring any useful
⍝ information. (This filtering step makes it possible to process files that are
⍝ very deep without running out of memory. For example, 10K characters
⍝ wide, 100K lines deep. Without filtering, such a file would be
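Step 2 of the comment above (dropping lines that repeat the preceding line and contain only vertical bars and spaces) can be illustrated outside APL. The Python sketch below is an illustration of that filtering idea only, not a translation of the author's solution; the function name `drop_redundant_lines` and the sample diagram are assumptions.

```python
def drop_redundant_lines(lines):
    """Remove lines that repeat the previous line and hold only '│' and spaces."""
    kept = []
    previous = None
    for line in lines:
        is_repeat = line == previous
        only_bars = set(line) <= {"│", " "}
        # A repeated bars-only line adds no structural information, so skip it.
        if not (is_repeat and only_bars):
            kept.append(line)
        previous = line
    return kept

if __name__ == "__main__":
    mobile = ["┌─┴─┐", "│   │", "│   │", "│   │", "A   B"]
    for line in drop_redundant_lines(mobile):
        print(line)
```

Because each line is compared only with its immediate predecessor, the filter can run in a single streaming pass over a very deep file.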
amakukha / predicting_movie_reviews_with_bert_on_tf_hub.py
Created May 4, 2020 10:48
BERT (2018) script for movie comment sentiment analysis
#!/usr/bin/env python
# Copyright 2019 Google Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
amakukha / .config
Created February 20, 2020 21:46
Config for compiling toybox on MacOS
# .config file for compiling toybox on MacOS
# 1) make defconfig
# 2) replace the .config file
# 3) make
# Also: don't forget to install gsed (GNU sed)
# Adding CPUS=1 in scripts/make.sh might also be needed
#
# -----------------------------------------------
#
# Automatically generated make config: don't edit
amakukha / giganovel.py
Last active November 3, 2023 23:19
Generates a huge artificial text
#!/usr/bin/env python3
'''
A script to generate text files that look like a novel in TXT form.
Words are completely made up, but vaguely resemble the Finnish language.
The resulting text uses ASCII encoding with only printable characters.
Distribution of words follows Zipf's law.
Standard parameters generate a 1 GB text with 148391 distinct words.
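The docstring says word frequencies follow Zipf's law, meaning the word of rank r appears with probability roughly proportional to 1/r. The sketch below shows one simple way to draw words with that distribution. It is not the gist's implementation: the vocabulary `w1..w100`, the exponent parameter `s`, and the function name `zipf_sample` are assumptions chosen for illustration.

```python
import random
from collections import Counter

def zipf_sample(vocab, n, rng, s=1.0):
    """Draw n words from vocab, where the word at rank r has weight 1 / r**s."""
    weights = [1.0 / (rank ** s) for rank in range(1, len(vocab) + 1)]
    # random.Random.choices normalizes the weights internally.
    return rng.choices(vocab, weights=weights, k=n)

if __name__ == "__main__":
    rng = random.Random(42)
    vocab = [f"w{i}" for i in range(1, 101)]  # hypothetical placeholder words
    counts = Counter(zipf_sample(vocab, 10_000, rng))
    # The most frequent words should be the lowest-ranked ones.
    print(counts.most_common(5))
```

For a vocabulary as large as the one the docstring mentions, precomputing cumulative weights once and reusing them would likely be preferable to passing `weights` on every call.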
amakukha / mostfreq.cpp
Last active August 7, 2022 18:01
Find k most frequent words in a file (fast implementation)
/*
* Solution to Jon Bentley's k most frequent words problem using a prefix tree and
* a heap of size k. Worst case time complexity is O(N log k), where N is the total
* number of words.
*
* The problem is formulated in the Communications of the ACM 29,5 (May 1986), 364-369:
* "Given a text file and an integer k, you are to print the k
* most common words in the file (and the number of their
* occurrences) in decreasing frequency."
*
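The counting-plus-bounded-heap idea described in the comment can be sketched in a few lines of Python. Note the substitution: this sketch counts words with a hash map (`collections.Counter`) rather than the prefix tree the C++ gist uses, and its tokenization rule (lowercased runs of ASCII letters) and the function name `k_most_frequent` are assumptions.

```python
import heapq
import re
from collections import Counter

def k_most_frequent(text, k):
    """Return the k most common words as (word, count), most frequent first."""
    if k <= 0:
        return []
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    heap = []  # min-heap of (count, word), never larger than k entries
    for word, count in counts.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, word))
        elif (count, word) > heap[0]:
            # The new entry beats the weakest of the current top k: swap it in.
            heapq.heapreplace(heap, (count, word))
    return [(word, count) for count, word in sorted(heap, reverse=True)]

if __name__ == "__main__":
    print(k_most_frequent("the cat and the dog and the bird", 2))
```

Keeping the heap at size k means each heap operation costs O(log k), so memory for the selection step stays proportional to k rather than to the number of distinct words.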
amakukha / hash_djb2.py
Last active March 19, 2023 08:55 — forked from mengzhuo/hash_djb2.py
DJB2 Hash in Python
# Proper (fast) Python implementations of Dan Bernstein's DJB2 32-bit hashing function
#
# DJB2 has terrible avalanching performance, though.
# For example, it returns the same hash values for these strings: "xy", "yX", "z7".
# I recommend using Murmur3 hash. Or, at least, FNV-1a or SDBM hashes below.
# Each function expects a bytes-like object (iterating it must yield integers), e.g. djb2(b"xy").
import functools
djb2 = lambda x: functools.reduce(lambda x,c: 0xFFFFFFFF & (x*33 + c), x, 5381)
sdbm = lambda x: functools.reduce(lambda x,c: 0xFFFFFFFF & (x*65599 + c), x, 0)
fnv1a_32 = lambda x: functools.reduce(lambda x,c: 0xFFFFFFFF & ((x^c)*0x1000193), x, 0x811c9dc5)
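The collision claim in the comments is easy to check by hand: for a two-byte input, DJB2 reduces to 5381*33*33 + 33*c1 + c2, and all three strings give 33*c1 + c2 = 4081. The sketch below restates two of the hashes as plain functions (same arithmetic as the lambdas above, rewritten only for readability) and confirms the collision.

```python
import functools

def djb2(data):
    """32-bit DJB2 over a bytes-like object: h = h * 33 + byte, starting at 5381."""
    return functools.reduce(lambda h, c: 0xFFFFFFFF & (h * 33 + c), data, 5381)

def fnv1a_32(data):
    """32-bit FNV-1a over a bytes-like object."""
    return functools.reduce(
        lambda h, c: 0xFFFFFFFF & ((h ^ c) * 0x1000193), data, 0x811C9DC5
    )

if __name__ == "__main__":
    for s in (b"xy", b"yX", b"z7"):
        # All three inputs print the same DJB2 value: 5863990.
        print(s, djb2(s), fnv1a_32(s))
```

Passing a `str` instead of `bytes` raises a `TypeError`, because iterating a string yields characters rather than integers; encode text first, for example with `"xy".encode()`.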