Simon Hengchen drvenabili
vagrant@lamachine:~/TICCL$ sudo perl TICCLops.PICCL.pl TICCL.Black.config
TICCL_OPTSin: abcmdef TXT /home/vagrant/TICCL/ticclops /home/vagrant/TICCL/data/int/nld/nld.aspell.dict.c20.d2.confusion empty.txt xml 100000000 /home/vagrant/TICCL/data/int/nld/nld.aspell.dict.lc.chars /home/vagrant/vooruit_preprocessed /home/vagrant/TICCL/data/int/nld/nuTICCL.OldandINLlexandINLNamesAspell.v2.COL1.tsv 2 /home/vagrant/OUT TESTTWO 3 nld /usr/bin/ 30 5 50
TICCL_OPTSin2: MODE: abcmdef TEXTTYPE: TXT ROOTDIR: /home/vagrant/TICCL/ticclops CHARCONFUS: /home/vagrant/TICCL/data/int/nld/nld.aspell.dict.c20.d2.confusion KHC: empty.txt EXT: xml$ ARTIFRQ: 100000000 ALPH: /home/vagrant/TICCL/data/int/nld/nld.aspell.dict.lc.chars INPUTDIR: /home/vagrant/vooruit_preprocessed DIR: LEX: /home/vagrant/TICCL/data/int/nld/nuTICCL.OldandINLlexandINLNamesAspell.v2.COL1.tsv LD: 2 OUTPUTDIR: /home/vagrant/OUT PREFIX: TESTTWO RANK: 3 LANG: nld TOOLDIR: /usr/bin/ THREADS: 30 MINLENGTH: 5 MAXLENGTH: 50
OUT1:
OUT2: /home/vagrant/OUT/zzz/TICCL/TES
vagrant@lamachine:~$ sudo lamachine-update.sh ticcl
=====================================================================
, LaMachine - NLP Software distribution
~) (http://proycon.github.io/LaMachine)
(----í Language Machines research group
/| |\ & Centre of Language and Speech Technology
/ / /| Radboud University Nijmegen
=====================================================================
Bootstrapping Virtual Machine or Docker image....
Warning: JAVA_HOME environment variable is not set.
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Mallet Topic-Modeling-Tool GUI 0.99-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-resources-plugin:2.6:copy-resources (copy-resources) @ TopicModelingTool ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
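# Write every 20th line of the input file "fichier" (lines 0, 20, 40, ...) to sample.txt.
with open("fichier", "r") as f, open("sample.txt", "w") as sample:
    for x, line in enumerate(f):
        if x % 20 == 0:
            # Lines read from a file keep their "\n"; strip it and add exactly one back
            # so the sample has no blank line after every entry.
            sample.write(line.rstrip("\n") + "\n")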
\begin{figure}[H]
\centering
\caption{\todo{write a caption}}
\label{fig:distrib3}
\begin{tikzpicture}[scale=1]
\begin{axis}[
area style,
xtick=data,
tick label style={font=\small},
xticklabel interval boundaries,
2017-09-18 15:14:46,932 INFO - Processing file '/home/sigmund/work/hartlib/OneDrive/lemmatization/cleaned_EN_input/9D_17_55_cleaned.xml' .
org.jdom2.input.JDOMParseException: Error on line 1: The reference to entity "c" must end with the ';' delimiter.
	at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:232)
	at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303)
	at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196)
	at edu.northwestern.at.morphadorner.corpuslinguistics.inputter.XMLTextInputter.doLoadText(XMLTextInputter.java:328)
	at edu.northwestern.at.morphadorner.corpuslinguistics.inputter.XMLTextInputter.loadText(XMLTextInputter.java:415)
	at edu.northwestern.at.morphadorner.MorphAdorner.adornXML(MorphAdorner.java:817)
	at edu.northwestern.at.morphadorner.MorphAdorner.processInputFiles(MorphAdorner.java:718)
	at edu.northwestern.at.morphadorner.MorphAdorner.main(MorphAdorner.java:2610)
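The parser stops on line 1 of the input because of an unescaped `&`: in XML, `&c` is read as the start of an entity reference named `c`, which must end with `;`. Presumably the text contains the abbreviation `&c.` ("etc."), common in early-modern material such as the Hartlib papers; that is an assumption about the input, not something the log states. A minimal pre-processing sketch, assuming the files are otherwise well-formed XML and UTF-8 encoded; `escape_bare_ampersands` and `fix_file` are hypothetical helper names, not part of MorphAdorner:

```python
import re

# Matches an "&" that does NOT start a well-formed entity or character reference
# such as &amp;, &lt;, &#38; or &#x26;.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace bare '&' characters with '&amp;' so an XML parser accepts them."""
    return BARE_AMP.sub("&amp;", text)


def fix_file(src, dst):
    """Hypothetical helper: copy src to dst with bare ampersands escaped (assumes UTF-8)."""
    with open(src, encoding="utf-8") as f_in, open(dst, "w", encoding="utf-8") as f_out:
        f_out.write(escape_bare_ampersands(f_in.read()))


sample = "<p>and so forth &c. &amp; &#38; &lt;</p>"
print(escape_bare_ampersands(sample))
# <p>and so forth &amp;c. &amp; &#38; &lt;</p>
```

Well-formed references are left alone, so running it twice is harmless. Named references that XML does not define (for example HTML's `&nbsp;`) are also left untouched and would still need separate handling.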
import numpy as np
import matplotlib.pyplot as plt
valeurs = {"p1":[0.4, 0.55, 0.05, 0.0], "p2":[0.2, 0.3, 0.5, 0.0], "p3":[0.4, 0.2, 0.2, 0.2], "p4":[0.2, 0.2, 0.2, 0.4], "p5":[0.4, 0.55, 0.05, 0.0], "p6":[0.4, 0.55, 0.05, 0.0], "p7":[0.4, 0.55, 0.05, 0.0]}
colours = ['b','g','r','c','m','y','k']
valeurs2 = dict()
for key in valeurs.keys():
    #print(key)
hengchen@LM7-HUMTDK-02  ~/git/seed-semantic-change/src/dynamic-senses   master ●  ./aest_non_aesth.sh
./greek_input/aesth_non_aesth/all_corpora/aesth_113388.txt
./greek_input/aesth_non_aesth/targets_113388.txt
5
28884 bytes successfully written to file
correct case for test likelihood
open : no such file or directory
invalid argument
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x10dc0de]
drvenabili / gensim_word2vec_procrustes_align.py
Last active October 26, 2019 12:21 — forked from quadrismegistus/gensim_word2vec_procrustes_align.py
Code for aligning two gensim word2vec models using Procrustes matrix alignment. Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton.
## This is a copy for backup purposes. The original GitHub gist by Ryan Heuser is available at https://gist.github.com/quadrismegistus/09a93e219a6ffc4f216fb85235535faf .
def smart_procrustes_align_gensim(base_embed, other_embed, words=None):
    """Procrustes align two gensim word2vec models (to allow for comparison of the same word across models).
    Code ported from HistWords <https://github.com/williamleif/histwords> by William Hamilton.
    (With help from William. Thank you!)
    First, intersect the vocabularies (see `intersection_align_gensim` documentation).
    Then do the alignment on the other_embed model.
    Replace the other_embed model's syn0 and syn0norm numpy matrices with the aligned version.
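The docstring describes two steps: restrict both models to their shared vocabulary, then rotate `other_embed` onto `base_embed`. The sketch below shows that linear algebra on plain NumPy arrays rather than gensim objects. `intersect_vocab` and `procrustes_rotation` are hypothetical names, the `{word: count}` dictionaries stand in for the models' vocabularies, and sorting shared words by combined frequency is one reasonable choice, not necessarily the gist's. The rotation itself is the standard closed-form orthogonal Procrustes solution: take the SVD of `other_vecs.T @ base_vecs` and use `u @ vh`.

```python
import numpy as np


def intersect_vocab(counts_a, counts_b):
    """Words shared by both models, most frequent (summed counts) first.

    counts_a / counts_b: hypothetical {word: corpus_count} dicts standing in
    for each model's vocabulary.
    """
    common = set(counts_a) & set(counts_b)
    return sorted(common, key=lambda w: counts_a[w] + counts_b[w], reverse=True)


def procrustes_rotation(base_vecs, other_vecs):
    """Orthogonal R minimising ||other_vecs @ R - base_vecs||_F.

    Row i of both matrices must hold the vector of the same word.
    """
    m = other_vecs.T @ base_vecs      # d x d cross-product matrix
    u, _, vh = np.linalg.svd(m)       # m = u @ diag(s) @ vh
    return u @ vh                     # closed-form orthogonal Procrustes solution


# Toy check 1: only "a" and "c" are shared; "c" has the larger combined count.
words = intersect_vocab({"a": 3, "b": 1, "c": 2}, {"a": 1, "c": 5, "d": 9})
print(words)                          # ['c', 'a']

# Toy check 2: "other" is the base space under a random rotation; alignment should undo it.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 4))
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # random orthogonal 4 x 4 matrix
other = base @ q.T
r = procrustes_rotation(base, other)
aligned = other @ r
print(np.allclose(aligned, base))     # True
```

Because `r` is orthogonal, it preserves lengths and cosine similarities within `other_embed`; only the coordinate system changes, which is what makes vectors of the same word comparable across the two models.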
drvenabili / diff.py
Created October 11, 2018 13:23
diff 70-60
Difference between 60 and 70: {'6.297', 'candles', 'third-rate', 'completer', 'axpb', '174.evolution', 'incongruously', 'soaked', '21.0..1.1', '6.546', 'surmountable', '8.97', '351.velian', 'self-benefactors', 'geographical', 'swell', '482.to', "v'-s'v", '5.373', '469.every', '3.142', 'imposition', 'correctorium', '2.611', '~k+h', 'glee', 'x-ae', '.á.a', 'hessians', 'symbolon', 'durations', '149.according', '340.the', 'stecheology', 'signatures', '300,000,000', '7.472', 'subjectivist', '590.if', 'kantianism', '57.the', '631.the', 'buddhisto-christian', '5.430', '261.eighth', '2.88', '4.436', 'leader-writer', 'racer', '518.pragmaticism', 'apagoge', '6.22', 'non-success', '819,539', 'engendered', '529', 'decreasing', 'semi-clerical', '324.it', '..1-a2..1-b2c+a..1-c2', '123.the', 'illuminates', 'inconsiderate', 'avogadro', '3.from', 'willkŸhrliche', '187.in', '0.50', '-ii', 'photometrics', 'christiana', '0.1111', 'filason', '73.', '3.406', 'buys', '168.however', 'barest', '8.227', '1.662', 'spade-suit', 'permiss
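The preview only shows the printed result. Assuming `diff 70-60` means "words present in the 70 vocabulary but absent from the 60 one", the computation is a plain set difference. In the sketch below, the two vocabularies are made-up stand-ins (a few words borrowed from the output above), not the real data:

```python
# Hypothetical stand-ins for the two vocabularies being compared.
vocab_60 = {"the", "of", "and"}
vocab_70 = {"the", "of", "and", "candles", "glee", "kantianism"}

# Set difference: words in vocab_70 that are missing from vocab_60.
only_in_70 = vocab_70 - vocab_60

# Sets are unordered, so sort before printing for a stable, readable listing.
print("Difference between 60 and 70:", sorted(only_in_70))
```

Sorting is only for display; the set itself is what a later comparison would use.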