### Tested with:
- Spark 2.0.0 pre-built for Hadoop 2.7
- Mac OS X 10.11
- Python 3.5.2
Use S3 within PySpark with minimal hassle.
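As a minimal sketch of what that can look like (the bucket name and credentials below are placeholders, and it assumes the `hadoop-aws` package matching your Hadoop build is on the classpath):

```python
# Minimal sketch: read a file from S3 via the s3a:// connector in PySpark.
# Assumes hadoop-aws (and its AWS SDK dependency) is available, e.g. started with:
#   pyspark --packages org.apache.hadoop:hadoop-aws:2.7.3
# The bucket name and credentials are placeholders.
from pyspark import SparkContext

sc = SparkContext(appName="s3-example")

hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
hadoop_conf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

rdd = sc.textFile("s3a://my-bucket/some/key.txt")
print(rdd.take(5))
```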
The way I could do it was by using the Docker API, which I accessed through the docker-py package. The API exposes a labels dictionary for each container, and the keys `com.docker.compose.container-number`, `com.docker.compose.project` and `com.docker.compose.service` provided everything needed to build the hostname.
The code below is a simplified version of the code I am now using. You can find my more advanced code, with caching and other fancy stuff, on GitHub at luckydonald/pbft/dockerus.ServiceInfos (backup at gist.github.com).
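The original snippet is not reproduced here; as a rough stand-in, a minimal sketch of the approach could look like the following. It uses the current Docker SDK for Python, and the `<project>_<service>_<number>` hostname scheme is my assumption (matching docker-compose's default container names), not necessarily the author's exact code:

```python
# Sketch: list compose-managed containers and build their hostnames from the
# com.docker.compose.* labels. Assumes the Docker SDK for Python
# (pip install docker) and access to the local Docker socket. The
# "<project>_<service>_<number>" naming scheme is an assumption based on
# docker-compose's default container names.
import docker

client = docker.from_env()

for container in client.containers.list():
    labels = container.labels  # dict of all labels on the container
    project = labels.get("com.docker.compose.project")
    service = labels.get("com.docker.compose.service")
    number = labels.get("com.docker.compose.container-number")
    if project and service and number:
        hostname = "{}_{}_{}".format(project, service, number)
        print(hostname)
```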
```html
<!DOCTYPE html>
<html>
  <head>
    <title>AWS S3 File Upload</title>
    <script src="https://sdk.amazonaws.com/js/aws-sdk-2.1.12.min.js"></script>
  </head>
  <body>
    <input type="file" id="file-chooser" />
    <!-- The upload handler script is not included in this snippet. -->
  </body>
</html>
```
```text
PASSWORD1                       # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass         # replace with 'examplePass' instead
PASSWORD3==>                    # replace with the empty string
regex:password=\w+==>password=  # Replace, using a regex
regex:\r(\n)==>$1               # Replace Windows newlines with Unix newlines
```
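These lines follow the format of BFG Repo-Cleaner's `--replace-text` expressions file; if that is the intended tool, the file would be applied to a mirror clone with something like `bfg --replace-text passwords.txt my-repo.git` (the command and file names here are illustrative).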
```solidity
pragma solidity ^0.4.7;

contract Factory {
    bytes32[] Names;
    address[] newContracts;

    function createContract (bytes32 name) {
        // "Contract" is assumed to be defined elsewhere; it is not shown in this snippet.
        address newContract = new Contract(name);
        newContracts.push(newContract);
    }
}
```
Consider repartitioning after a `flatMap`, especially if the following operation will result in high memory usage. The `flatMap` op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of `flatMap` to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the expected memory usage of the resulting rows.
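A rough PySpark sketch of that pattern is below; the input data, the expansion function, and the target partition count are illustrative assumptions rather than values from the original:

```python
# Sketch: repartition after flatMap so per-partition memory stays reasonable.
# The expansion function and the target partition count (2000) are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("flatmap-repartition").getOrCreate()
sc = spark.sparkContext

# A small RDD of indices; each index expands into many values via flatMap,
# but the number of partitions stays the same as the input's.
indices = sc.parallelize(range(1000), numSlices=8)
expanded = indices.flatMap(lambda i: range(i))

# Spread the now much larger data over more partitions before any
# memory-heavy downstream operation (e.g. building large vectors per row).
expanded = expanded.repartition(2000)
print(expanded.getNumPartitions())
```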

```python
from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # canonical import path in Airflow 1.x
from datetime import datetime
import os
import sys

args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 1, 27),
    'provide_context': True,
}
```
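The snippet stops at the default arguments; a typical continuation (the DAG id, schedule interval, and bash command here are illustrative assumptions) wires them into a DAG and a task:

```python
# Sketch of how the default args above are usually wired into a DAG with a
# single BashOperator task; dag_id, schedule_interval and bash_command are
# illustrative assumptions.
dag = DAG(
    dag_id='example_bash_dag',
    default_args=args,
    schedule_interval='@daily',
)

print_date = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag,
)
```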
ec2-54-152-134-146.compute-1.amazonaws.com.