Anderson Banihirwe andersy005

@jessfraz
jessfraz / Dockerfile
Created December 28, 2018 22:54
Scrape best papers site
FROM python:2-alpine
RUN pip install \
    beautifulsoup4 \
    requests
COPY papers.py /usr/local/bin/
RUN chmod +x /usr/local/bin/papers.py
WORKDIR /root
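Since papers.py is copied onto the PATH and marked executable, the image would typically be used along the lines of docker build -t papers . followed by docker run papers papers.py (assuming the script carries a #!/usr/bin/env python shebang, which the chmod +x step suggests).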
@zonca
zonca / spawner.py
Last active May 14, 2020 06:15
Batchspawner configuration to launch Jupyter Notebooks on Comet computing nodes
import batchspawner
# The port for this process
c.JupyterHub.hub_port = 8081
# The ip for this process
c.JupyterHub.hub_ip = '127.0.0.1'
class SlurmSpawnerNoLocalUsers(batchspawner.SlurmSpawner):
    """Slurm Spawner that does not need local Unix users on the Hub server"""
@seraku24
seraku24 / 149909-playlist_youtube-vlc3patch.lua
Created May 16, 2018 09:56
VLC 3.x compatibility patch for 149909-playlist_youtube.lua
--[[
Youtube playlist importer for VLC media player 1.1 and 2.0
Copyright 2012 Guillaume Le Maout
Authors: Guillaume Le Maout
Contact: http://addons.videolan.org/messages/?action=newmessage&username=exebetche
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
@betatim
betatim / Kubernetes cluster monitoring (binder-prod)-1516879011895.json
Created January 25, 2018 11:19
State of the grafana.mybinder.org panels.
{
  "__inputs": [],
  "__requires": [
    {
      "type": "grafana",
      "id": "grafana",
      "name": "Grafana",
      "version": "4.6.3"
    },
    {
@andyvanee
andyvanee / .ssh_config
Last active November 30, 2023 04:19
Fix unix_listener too long for Unix domain socket
Host *
    ControlPath ~/.ssh/control/%C
    ControlMaster auto
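Note that ssh does not create the ControlPath directory for you; it must exist before connection multiplexing can work, so create it first with mkdir -p ~/.ssh/control.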
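# Imports for a script that exercises both the Dask and PySpark dataframe APIs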
from threading import Thread
from time import sleep
import uuid
from dask.distributed import LocalCluster, Client
import dask.dataframe as dd
import pandas as pd
import pyspark
@crawles
crawles / Spark Dataframe Cheat Sheet.py
Last active April 26, 2022 03:09 — forked from evenv/Spark Dataframe Cheat Sheet.py
Cheat sheet for Spark Dataframes (using Python)
# A simple cheat sheet of Spark Dataframe syntax
# Current for Spark 1.6.1
# import statements
from pyspark.sql import SQLContext
from pyspark.sql.types import *
from pyspark.sql.functions import *
# creating dataframes
df = sqlContext.createDataFrame([(1, 4), (2, 5), (3, 6)], ["A", "B"]) # from manual data
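A couple of follow-on basics in the same style, as an illustrative sketch (these lines are additions, not part of the original preview):

df = df.withColumn("C", df.A + df.B)          # derive a new column from existing ones
df.filter(df.A > 1).select("A", "C").show()   # filter rows, project columns, print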
@4rzael
4rzael / main.md
Last active April 25, 2024 04:41
GIS with pySpark.
NOTE: Take a look at the comments below!

GIS with pySpark: A not-so-easy journey

Why would you do that?

Today, much of our data is geolocated (meaning it has a position in space). This is known as GIS data.

We often need to run operations on such data, aggregations for example, and many optimisations exist for doing so.
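To make this concrete, here is a minimal point-in-polygon filter, an illustrative sketch that assumes pySpark and shapely are installed (the coordinates and dataset are made up for the example):

from pyspark.sql import SparkSession
from shapely.geometry import Point, Polygon

spark = SparkSession.builder.appName("gis-example").getOrCreate()

# A hypothetical dataset of (id, longitude, latitude) records
points = spark.createDataFrame(
    [(1, 2.35, 48.85), (2, -0.13, 51.51), (3, 2.29, 48.86)],
    ["id", "lon", "lat"],
)

# A region of interest, here a rough bounding box around Paris
region = Polygon([(2.2, 48.8), (2.5, 48.8), (2.5, 48.95), (2.2, 48.95)])

# Point-in-polygon test pushed to the executors via an RDD filter;
# the shapely polygon is shipped to the workers inside the closure
inside = points.rdd.filter(
    lambda row: region.contains(Point(row["lon"], row["lat"]))
)
print(inside.collect())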

@zshaheen
zshaheen / travis_to_conda.md
Last active May 28, 2021 18:15
How to Setup Automatic Uploads to Anaconda from Travis CI in 15 minutes

How to Setup Automatic Uploads to Anaconda from Travis CI in 15 minutes

TL;DR: Edit .travis.yml to install Anaconda and to run conda_upload.sh after testing. Edit meta.yaml to take in the environment variables $VERSION and $CONDA_BLD_PATH. Create conda_upload.sh, which sets the needed environment variables, builds the tar archive, and uploads it to Anaconda. Finally, configure your Anaconda and Travis CI accounts so the two services can talk to each other.

Intro

The following steps detail how to automatically trigger Anaconda builds and uploads from Travis CI. Only successful builds from the master branch are uploaded, and if there are multiple commits in a single day, only the latest one is kept. Both of these settings can easily be changed.

Edit .travis.yml

First, edit .travis.yml so that it installs Anaconda.

install:
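  # The lines below are an illustrative Miniconda bootstrap, not necessarily
  # this guide's verbatim steps; the URL and package names are assumptions
  - wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
  - bash miniconda.sh -b -p $HOME/miniconda
  - export PATH="$HOME/miniconda/bin:$PATH"
  # Tools needed to build the package and upload it to Anaconda
  - conda install -y conda-build anaconda-client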