Vadim Markovtsev vmarkovtsev

@vmarkovtsev
vmarkovtsev / draw_clones.py
Last active August 29, 2015 14:07
clonedigger CPD XML visualization
import os
import sys
import matplotlib
matplotlib.use('cairo')
from matplotlib import pyplot
from matplotlib.colors import LinearSegmentedColormap
import numpy
from scipy.cluster.hierarchy import linkage, leaves_list
import xmltodict
A large collection of free icon web fonts and their generators.
https://www.google.com/design/icons/
(License: CC BY 4.0)
https://linearicons.com/free
(License: Custom)
https://octicons.github.com/
vmarkovtsev / manual.md
Last active December 23, 2021 09:05
Installing VCMI on MacOSX El Capitan

Installing Xcode

App Store -> Xcode. Launch it after the installation to agree with its license terms.

Installing Homebrew

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
#!/bin/bash
# License: Public Domain.
echo "export PYSPARK_PYTHON=python3" | tee -a /etc/profile.d/spark_config.sh /etc/*bashrc /usr/lib/spark/conf/spark-env.sh
echo "export PYTHONHASHSEED=0" | tee -a /etc/profile.d/spark_config.sh /etc/*bashrc /usr/lib/spark/conf/spark-env.sh
echo "spark.executorEnv.PYTHONHASHSEED=0" >> /etc/spark/conf/spark-defaults.conf
# Only run on the master node
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == 'Master' ]]; then
vmarkovtsev / notebook.ipynb
Created March 10, 2017 10:40
lapjv blog post

Recently, GitHub changed how ATX headers are parsed in Markdown files.

##Wrong

## Correct

While this change follows the spec, it breaks many existing repositories. I took the README dataset which we created at source{d} and ran a simple
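
A check of this kind can be sketched in a few lines of Python. The following is a minimal illustration, not the script that was run on the README dataset: the regular expression and the name `find_broken_headers` are assumptions made here, and the pattern is deliberately naive (it also flags lines such as `#hashtag` or shebangs, and it does not skip code blocks).

```python
import re

# Assumption: a "broken" ATX header is 1-6 '#' characters (after at most
# three spaces of indentation) immediately followed by a character that is
# neither whitespace nor another '#'.
BROKEN_ATX_RE = re.compile(r"^ {0,3}#{1,6}[^#\s]")


def find_broken_headers(markdown_text):
    """Return the lines that look like ATX headers missing the space after '#'."""
    return [
        line
        for line in markdown_text.splitlines()
        if BROKEN_ATX_RE.match(line)
    ]


if __name__ == "__main__":
    sample = "##Wrong\n\n## Correct\n"
    print(find_broken_headers(sample))
```

For the two-line example above, only `##Wrong` is reported; `## Correct` has the required space and is left alone.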

vmarkovtsev / identifier_split.py
Created May 26, 2017 10:39
Identifier splitting algorithm from the paper "Topic modeling of public repositories at scale using names in source code"
import re

NAME_BREAKUP_RE = re.compile(r"[^a-zA-Z]+")


def extract_names(token):
    token = token.strip()
    prev_p = [""]

    def ret(name):
        r = name.lower()
        if len(name) >= 3:
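
The gist preview above is cut off, so the following is only a simplified sketch of the general idea: split on non-letter characters and case changes, lowercase the parts, and keep those with at least three letters. It is not the algorithm from the paper. `split_identifier`, `CAMEL_RE`, and the decision to simply drop short parts are assumptions made for illustration; only `NAME_BREAKUP_RE` is taken from the gist.

```python
import re

# Same pattern as in the gist: runs of non-letters act as separators.
NAME_BREAKUP_RE = re.compile(r"[^a-zA-Z]+")
# Hypothetical helper (not from the gist): matches either an all-caps run
# such as "HTTP" or a word with an optional leading capital such as "Response".
CAMEL_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def split_identifier(token, min_len=3):
    """Split an identifier into lowercase parts of at least min_len letters."""
    parts = []
    for chunk in NAME_BREAKUP_RE.split(token.strip()):
        for piece in CAMEL_RE.findall(chunk):
            # Assumption: parts shorter than min_len are dropped outright.
            if len(piece) >= min_len:
                parts.append(piece.lower())
    return parts


if __name__ == "__main__":
    print(split_identifier("parseHTTPResponse"))
    print(split_identifier("read_file2Buffer"))
```

With these assumptions, `split_identifier("parseHTTPResponse")` returns `["parse", "http", "response"]`.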
vmarkovtsev / ml_sapi_usecases.md
Last active August 21, 2017 17:34
ML Spark API usecases

Domains

First of all, ML has two quite different activity domains:

  1. Running something on many repositories.
  2. Running something on a single repository.

Depending on the size of (2), it may or may not make sense to launch Spark. For example, consider the topic model application scenario:

vmarkovtsev / id2vec_legacy.ipynb
Created October 6, 2017 14:11
Source code identifier embeddings - legacy demonstration
vmarkovtsev / hercules.go
Created September 27, 2018 08:08
Imports from internal packages
package hercules

import (
	"gopkg.in/src-d/go-git.v4"
	"gopkg.in/src-d/go-git.v4/plumbing/object"
	"gopkg.in/src-d/hercules.v4/internal/core"
	"gopkg.in/src-d/hercules.v4/internal/plumbing"
	"gopkg.in/src-d/hercules.v4/internal/plumbing/identity"
	"gopkg.in/src-d/hercules.v4/internal/plumbing/uast"
	"gopkg.in/src-d/hercules.v4/internal/yaml"