@clemsos
clemsos / color_pages_pdf.sh
Created July 29, 2014 10:09
Count color and B&W pages in a PDF
#!/bin/bash
file="$1"
colorpages=0
# count all pages
totalpages=$(gs -q -dNODISPLAY -c "($file) (r) file runpdfbegin pdfpagecount = quit")
echo "Total pages : $totalpages"
# find pages with colors
for page in $(identify -density 12 -format '%p ' "$file") ; do
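
The preview is cut off before the loop body, so the test the script applies to each page is not shown. One common way to decide whether a page is colored is to rasterize it and look for any pixel whose red, green and blue channels differ. The following is a minimal Python sketch of that idea, not the gist's own code. It assumes each page has already been rasterized into (r, g, b) tuples by some other tool; the names `is_color_pixel`, `color_pages` and the `tolerance` parameter are hypothetical.

```python
def is_color_pixel(rgb, tolerance=0):
    """A pixel is gray when its three channels are (nearly) equal."""
    r, g, b = rgb
    return max(r, g, b) - min(r, g, b) > tolerance


def color_pages(pages, tolerance=0):
    """Return the 1-based numbers of pages containing at least one color pixel.

    `pages` is an iterable of pages, each an iterable of (r, g, b) tuples.
    How the pages are rasterized is outside the scope of this sketch.
    """
    return [
        number
        for number, pixels in enumerate(pages, start=1)
        if any(is_color_pixel(pixel, tolerance) for pixel in pixels)
    ]


# Three tiny made-up "pages": only the second one has a non-gray pixel.
sample_pages = [
    [(0, 0, 0), (255, 255, 255)],
    [(10, 10, 10), (200, 30, 30)],
    [(128, 128, 128)],
]
found = color_pages(sample_pages)
print("Color pages:", found, "B&W pages:", len(sample_pages) - len(found))
```

A small non-zero `tolerance` may help with scans, where gray pixels are rarely perfectly neutral.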
clemsos / citations2tex.py
Last active August 29, 2015 14:02
Convert scientific citations in plain text to LaTeX
#!/usr/bin/python
# convert citations into LaTeX format
#
# (Nivre et al., 2007)
# (Sagae and Tsujii 2007)
# Nivre (2007)
# (Chen et al., 2007; Dredze et al., 2007).
#
# \cite{Nivre2007}
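
The preview stops after the first target form, so the conversion rules themselves are not visible. The following is a minimal regex-based sketch covering the four input forms listed in the comment, not the gist's own code. It assumes the cite key is the first author's surname followed by the year, as in `\cite{Nivre2007}`, and that the narrative form `Nivre (2007)` keeps the author name in the text. The function and pattern names are hypothetical.

```python
import re

# One entry inside parentheses: "Nivre et al., 2007" or "Sagae and Tsujii 2007".
ENTRY = re.compile(
    r"\s*([A-Z][A-Za-z-]+)(?:\s+et al\.|\s+and\s+[A-Z][A-Za-z-]+)?,?\s+(\d{4})\s*"
)
# Any parenthesized span without nested parentheses.
GROUP = re.compile(r"\(([^()]+)\)")
# Narrative form: "Nivre (2007)" or "Nivre et al. (2007)".
NARRATIVE = re.compile(r"\b([A-Z][A-Za-z-]+)((?:\s+et al\.)?)\s+\((\d{4})\)")


def _group_to_cite(match):
    """Turn "(A, 2007; B, 2008)" into "\\cite{A2007,B2008}" if every part parses."""
    keys = []
    for part in match.group(1).split(";"):
        entry = ENTRY.fullmatch(part)
        if entry is None:
            return match.group(0)  # not a citation: leave the parentheses alone
        keys.append(entry.group(1) + entry.group(2))
    return "\\cite{" + ",".join(keys) + "}"


def citations_to_latex(text):
    # Handle "Nivre (2007)" first so its bare "(2007)" is not seen as a group.
    text = NARRATIVE.sub(
        lambda m: f"{m.group(1)}{m.group(2)} \\cite{{{m.group(1)}{m.group(3)}}}",
        text,
    )
    return GROUP.sub(_group_to_cite, text)


print(citations_to_latex("Parsing improved (Chen et al., 2007; Dredze et al., 2007)."))
```

Parenthesized text that does not parse as a citation, such as "(see below)", is left unchanged. Surnames with particles or accented letters would need a broader pattern.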
clemsos / csv_to_elastic_search_bulk_insert.py
Last active February 27, 2024 10:15
Elasticsearch: index large CSV files with Python Pandas
from pyelasticsearch import ElasticSearch
import pandas as pd
from time import time
root_path="/home/clemsos/Dev/mitras/"
raw_data_path=root_path+"data/"
csv_filename="week10.csv"
t0=time()
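
The preview ends before any indexing happens. The usual approach for a large CSV is to read it in fixed-size chunks and send each chunk as one bulk request, so the whole file never sits in memory. The following is a minimal sketch of that pattern, not the gist's own code. To stay self-contained it uses the standard `csv` module in place of pandas (where `read_csv(..., chunksize=...)` plays the same role) and only builds the newline-delimited body that the Elasticsearch bulk API expects, without sending it. The helper names, the index name and the sample data are hypothetical.

```python
import csv
import io
import json
from itertools import islice


def iter_chunks(rows, size):
    """Yield lists of at most `size` rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def bulk_body(docs, index_name):
    """Build a bulk request body: one action line, then one source line, per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


# Made-up in-memory CSV standing in for a large file on disk.
sample = io.StringIO("id,text\n1,a\n2,b\n3,c\n")
bodies = [
    bulk_body(chunk, "demo-index")
    for chunk in iter_chunks(csv.DictReader(sample), 2)
]
print(len(bodies), "bulk requests prepared")
```

Each string in `bodies` could then be posted to the cluster's `_bulk` endpoint by whichever client is in use. The chunk size trades memory and request size against the number of round trips.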
clemsos / gensim_workflow.py
Last active February 22, 2022 11:09
How to calculate TF-IDF similarity matrix of a complete corpus with Gensim
#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
This script shows the basic workflow to compute a TF-IDF similarity matrix with Gensim
OUTPUT :
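
The preview ends here, before the workflow itself. To make the idea of a TF-IDF similarity matrix concrete, the following is a minimal pure-Python sketch that does not use Gensim and is not the gist's own code. It assumes documents are already tokenized, weights each term by its raw count times log2(N / document frequency), scales each vector to unit length, and takes cosine similarity between every pair of documents. The function names and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one unit-length {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: c * math.log2(n_docs / doc_freq[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Terms present in every document get weight 0 and are dropped.
        vectors.append({t: w / norm for t, w in weights.items() if w} if norm else {})
    return vectors


def similarity_matrix(docs):
    """Cosine similarity between every pair of documents (dot product of unit vectors)."""
    vectors = tfidf_vectors(docs)
    return [
        [sum(w * other.get(t, 0.0) for t, w in vec.items()) for other in vectors]
        for vec in vectors
    ]


corpus = [["cat", "sat"], ["cat", "ran"], ["dog", "barked"]]
for row in similarity_matrix(corpus):
    print(["%.2f" % value for value in row])
```

The result is a symmetric N x N matrix with ones on the diagonal for any document that has at least one weighted term. For a real corpus, a library with sparse vectors is the practical choice, since this sketch compares all pairs with plain dictionaries.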