Albert alpoza

Common Options

-#, --progress-bar Make curl display a simple progress bar instead of the more informational standard meter.

-b, --cookie <name=data> Supply cookie with request. If no =, then specifies the cookie file to use (see -c).

-c, --cookie-jar <file name> File to save response cookies to.

The pyspark documentation doesn't include an example for the aggregateByKey RDD method. I didn't find any nice examples online, so I wrote my own.

Here's what the documentation does say:

aggregateByKey(self, zeroValue, seqFunc, combFunc, numPartitions=None)

Aggregate the values of each key, using given combine functions and a neutral "zero value". This function can return a different result type, U, than the type of the values in this RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U's. The former operation is used for merging values within a partition, and the latter is used for merging values between partitions. To avoid memory allocation, both of these functions are allowed to modify and return their first argument instead of creating a new U.

reduceByKey and aggregateByKey combine values within each partition before any data is shuffled, so they are much more efficient than groupByKey and should be used for aggregations as much as possible.
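To make the two-function contract concrete, here is a minimal pure-Python sketch of the semantics described above. It is not PySpark's implementation: the helper `aggregate_by_key`, the hand-built `partitions` list, and the (sum, count) accumulator are all assumptions chosen for illustration. `seq_func` folds one value V into an accumulator U inside a partition, and `comb_func` merges two accumulators coming from different partitions.

```python
import copy


def aggregate_by_key(partitions, zero_value, seq_func, comb_func):
    """Hypothetical model of aggregateByKey over a list of partitions.

    Each partition is a list of (key, value) pairs.
    """
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            # Every key starts from its own copy of the zero value.
            if key not in acc:
                acc[key] = copy.deepcopy(zero_value)
            # Merge a V into a U, within one partition.
            acc[key] = seq_func(acc[key], value)
        per_partition.append(acc)

    merged = {}
    for acc in per_partition:
        for key, u in acc.items():
            # Merge two U's, between partitions.
            merged[key] = comb_func(merged[key], u) if key in merged else u
    return merged


# Accumulator U is a (sum, count) tuple; values V are plain numbers.
zero = (0, 0)
seq_func = lambda acc, v: (acc[0] + v, acc[1] + 1)
comb_func = lambda a, b: (a[0] + b[0], a[1] + b[1])

# Two made-up partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 4)],
    [("a", 3), ("b", 6), ("a", 5)],
]

totals = aggregate_by_key(partitions, zero, seq_func, comb_func)
averages = {k: s / n for k, (s, n) in totals.items()}
print(averages)  # {'a': 3.0, 'b': 5.0}
```

With a live SparkContext (assumed here to be named `sc`), the same zero value and functions would be passed as `sc.parallelize(pairs).aggregateByKey(zero, seq_func, comb_func)`. The result type (a tuple) differs from the value type (a number), which is the case aggregateByKey exists for and reduceByKey cannot express directly.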

	#!/usr/bin/env python
	import os, requests, json, re

	## requirements.txt:
	#
	# httplib2==0.8
	# requests==2.0.1
	# wsgiref==0.1.2

	# hard coded

	curl -s -u admin:admin -XPOST "localhost:9000/api/organizations/enable_support" | python -m json.tool
	curl -s -u admin:admin -XPOST "localhost:9000/api/organizations/create?name=myorg&key=myorg" | python -m json.tool
	curl -s -u admin:admin -XPOST "localhost:9000/api/qualityprofiles/create?name=myprofile&language=java&organization=myorg" | python -m json.tool
	curl -s -u admin:admin -XPOST "localhost:9000/api/qualityprofiles/search" | python -m json.tool
	curl -s -u admin:admin -XPOST "localhost:9000/api/qualityprofiles/search?organization=myorg" | python -m json.tool
	curl -s -u admin:admin -XPOST "localhost:9000/api/qualityprofiles/search?defaults=true" | python -m json.tool
	curl -s -u admin:admin -XPOST "localhost:9000/api/projects/create?project=myproject&name=myproject" | python -m json.tool
	curl -s -u admin:admin -XPOST "localhost:9000/api/qualityprofiles/search?project=myproject" | python -m json.tool

	#delete

	ORIGINAL_JENKINS_SERVER=
	ORIGINAL_SERVER_USER=

	NEW_JENKINS_SERVER=
	NEW_SERVER_USER=

	# ON THE ORIGINAL JENKINS SERVER
	ssh $ORIGINAL_SERVER_USER@$ORIGINAL_JENKINS_SERVER
	cd /var/lib/jenkins/
	for i in `ls jobs`; do echo "jobs/$i/config.xml";done > config.totar

	#!/usr/bin/env python
	import sys
	import pandas as pd
	import pymongo
	import json



	def import_content(filepath):
	    mng_client = pymongo.MongoClient('localhost', 27017)

	package groovy.csv

	/**
	* CSV slurper which parses text or reader content into a data structure of lists and maps.
	* <p>
	* Example usage:
	* <code><pre>
	* def slurper = new CsvSlurper()
	* def result = slurper.parseText('''
	* name, age

	library(RCurl)
	val <- getURL('ldap://ldap.domain.net/DC=domain,DC=net?sAMAccountName?sub?(employeeID=0123456)',
	.opts=list(userpwd = "DOMAIN\\domainid:password"))

	import groovy.json.JsonOutput

	/**
	* A simple CSV file to Json converter
	*
	* The CSV file is expected to have a header row to identify the columns. These
	* columns will be used to generate the corresponding Json field.
	*
	* @author Marco Pas
	*/

	import pandas as pd
	import numpy as np
	from sklearn.feature_extraction import DictVectorizer

	def encode_onehot(df, cols):
	    """
	    One-hot encoding is applied to columns specified in a pandas DataFrame.

	    Modified from: https://gist.github.com/kljensen/5452382