🤟
South of France

Thomas Decaux ebuildy

<?php
/**
* A simple PHP script to remove duplicate content from Elasticsearch (> 2.0)
* Usage:
* php remove_dup.php HOST INDEX(/TYPE) QUERY_STRING FIELD REFRESH_TIMEOUT
*
* This script uses a terms aggregation with a top_hits sub-aggregation to find duplicates, then bulk-deletes the documents by id.
*
* <!> Very slow, because we have to wait refresh_interval seconds before each new search <!>
{
"size": 0,
"query": {
"query_string": {
"query": "blacklist"
}
},
"aggs": {
"dup": {
"terms": {
echo "auto_prepend_file=/opt/www/proxy.php" >> /etc/php5/cli/php.ini
@ebuildy
ebuildy / shell
Last active December 21, 2016 19:43
Set resolv.conf for Docker4Mac virtual machine
docker run --rm -v /etc/resolv.conf:/r debian:8 bash -c 'echo "nameserver X.X.X.X" > /r'
@ebuildy
ebuildy / gist:8117d3c1748dc1f26d39d02fe1b8c4a7
Created January 24, 2017 19:43
Extract protocol, host and port from URL like "PROTOCOL://HOST:PORT"
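# Everything before the first ':' is the protocol.
PROXY_PROTOCOL="${HTTP_PROXY%%:*}"
# Drop the trailing ":PORT", then everything up to the last '/'.
PROXY_HOST="${HTTP_PROXY%:*}"
PROXY_HOST="${PROXY_HOST##*/}"
# Everything after the last ':' is the port.
PROXY_PORT="${HTTP_PROXY##*:}"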
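A worked example may make the parameter expansions easier to follow. The sketch below wraps the same expansions in a function and runs them on a sample value; the function name `parse_proxy_url`, the variables `proto`, `host` and `port`, and the URL itself are hypothetical and not part of the original gist.

```shell
#!/bin/sh
# Hypothetical helper: splits "PROTOCOL://HOST:PORT" into three variables.
# Assumes the URL always has an explicit port and no trailing path or slash.
parse_proxy_url() {
  proto="${1%%:*}"   # remove the longest suffix starting at ':'  -> protocol
  host="${1%:*}"     # remove the shortest suffix starting at ':' -> PROTOCOL://HOST
  host="${host##*/}" # remove the longest prefix ending at '/'    -> HOST
  port="${1##*:}"    # remove the longest prefix ending at ':'    -> PORT
}

# Sample value (an assumption, standing in for $HTTP_PROXY).
parse_proxy_url "http://proxy.example.com:3128"
echo "$proto $host $port"   # prints "http proxy.example.com 3128"
```

Note that a URL without a port, or with a trailing slash or path, would not split correctly with this approach; such inputs need extra handling.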
@ebuildy
ebuildy / ContainerImageRepository.php
Created February 1, 2017 10:43
Retrieve GitLab projects with Docker container registry tags.
<?php
namespace AppBundle\Service;
class ContainerImageRepository
{
/**
* URL to container registry.
*
* @var string
@ebuildy
ebuildy / gist:b758341b89bcf50eff454d6fc8179e76
Created March 9, 2017 19:13
Elasticsearch hot threads dump
::: {Silverclaw}{_MpS6u3AT1W9ZsGumNZrwg}{151.80.57.192}{151.80.57.192:9300}
Hot threads at 2017-03-09T19:11:56.390Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
41.3% (206.6ms out of 500ms) cpu usage by thread 'elasticsearch[Silverclaw][search][T#4]'
3/10 snapshots sharing following 25 elements
org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer$2.get(Lucene54DocValuesProducer.java:502)
org.apache.lucene.util.LongValues.get(LongValues.java:45)
org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer$11.setDocument(Lucene54DocValuesProducer.java:1009)
org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$2.collect(GlobalOrdinalsStringTermsAggregator.java:126)
org.elasticsearch.search.aggregations.LeafBucketCollector$3.collect(LeafBucketCollector.java:73)
@ebuildy
ebuildy / intro.md
Last active March 17, 2017 12:39
Restrict Docker network address

Currently, Docker only lets you customize the IP range of the docker0 bridge (via the --bip option), so if you run docker-compose, which creates its own networks, you are out of luck.

Solution in progress

A pull request has been submitted (https://github.com/docker/docker/pull/29376/files), but it is not yet merged.

Solution for now

You can create a network manually:
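As a rough sketch of what manual creation with a fixed address range can look like (this preview does not show the author's own command): `docker network create` accepts `--driver` and `--subnet`. The function name `create_network`, the `DOCKER` override variable, the network name and the subnet below are all hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around "docker network create".
# $1 = network name, $2 = subnet in CIDR notation.
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the arguments.
create_network() {
  ${DOCKER:-docker} network create --driver bridge --subnet "$2" "$1"
}

# Dry run: prints the arguments instead of calling docker.
# Name and subnet are example values, not taken from the original.
DOCKER=echo create_network restricted_net 10.123.0.0/24
```

Dropping the `DOCKER=echo` prefix runs the real command. A compose file can then attach services to the pre-created network by declaring it as external, so docker-compose does not allocate a range of its own.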

@ebuildy
ebuildy / get_memory.scala
Created April 7, 2017 08:42
Get total memory used by all executors on Apache Spark
sc.getExecutorMemoryStatus.map(a => (a._2._1 - a._2._2)/(1024.0*1024*1024)).sum
@ebuildy
ebuildy / flatten.java
Last active December 22, 2024 03:01
Flatten Spark data frame fields structure, via SQL in Java
class Toto
{
    public void Main()
    {
        final DataFrame source = GetDataFrame();
        // Build the SELECT list that flattens nested struct fields.
        final String querySelectSQL = flattenSchema(source.schema(), null);
        source.registerTempTable("source");
        final DataFrame flattenData = sqlContext.sql("SELECT " + querySelectSQL + " FROM source");