Nar Kumar Chhantyal chhantyal

@chhantyal
chhantyal / spark_rdd_to_pandas_distributed.py
Last active April 27, 2023 23:53
Convert a Spark RDD to Pandas DataFrames inside Spark executors and build a Spark DataFrame from the resulting RDD. The work stays distributed, i.e. the RDD is never collected to the driver.
"""
Spark DataFrames are distributed, but they lack many features that Pandas offers.
If you want to use Pandas, you can't simply convert a Spark DataFrame to Pandas, because that collects all the data to the driver.
That can be slow, or fail outright, when the data is large.
So the only way to use Pandas on such data is to create small DataFrames inside the executors.
This gist shows how to create Pandas DataFrames from an RDD inside Spark executors and build a Spark DataFrame from the final output.
"""
# Convert function to use in mapPartitions
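The preview above is cut off before the convert function itself, so what follows is only a minimal sketch of the idea the docstring describes, not the gist's actual code. The names `partition_to_pandas` and `build_spark_dataframe`, the `value` column, and the running-total step are all hypothetical. It assumes each RDD element is a dict and that `pandas` is installed on the executors.

```python
import pandas as pd


def partition_to_pandas(rows):
    """Run pandas logic on one partition; `rows` is an iterator of dicts."""
    pdf = pd.DataFrame(list(rows))
    if pdf.empty:
        return  # empty partition: yield nothing
    # Hypothetical pandas-only step. It is computed per partition,
    # not across the whole dataset.
    pdf["running_total"] = pdf["value"].cumsum()
    # Hand plain dicts back so Spark can rebuild rows from them.
    for record in pdf.to_dict(orient="records"):
        yield record


def build_spark_dataframe(spark, rdd):
    """Apply the function on the executors and build a Spark DataFrame."""
    from pyspark.sql import Row  # imported lazily; only needed with Spark

    result_rdd = rdd.mapPartitions(partition_to_pandas)
    return spark.createDataFrame(result_rdd.map(lambda d: Row(**d)))
```

Because `mapPartitions` hands the function one partition at a time, each executor only ever holds a partition-sized pandas DataFrame in memory. Depending on the pandas version, the yielded values may be NumPy scalars rather than native Python types, in which case an explicit cast may be needed before Spark can infer the schema.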
@chhantyal
chhantyal / amqp_client.py
Last active November 17, 2017 12:34
Python AMQP client to send messages to Azure EventHub (see line 2 & 3 to run it)
from __future__ import print_function, unicode_literals
"""
To run this script (works on both Python 2 & 3), follow these steps:
1. pip install python-qpid-proton
2. python amqp_client.py
"""
import optparse
from proton import Message
from proton.handlers import MessagingHandler
from proton.reactor import Container
@chhantyal
chhantyal / lambda-dynamodb.py
Last active July 20, 2017 14:50
AWS Lambda to DynamoDB integration (can be used with Alexa)
import boto3
dynamodb = boto3.resource('dynamodb', region_name='eu-central-1')
table = dynamodb.Table('roomtemp')
def set_temperature(current_temp, session):
    table.put_item(Item={'currentTemperature': current_temp, 'userId': session['user']['userId']})
    return {"currentTemp": current_temp}
def get_temperature(session):
@chhantyal
chhantyal / hive-table-csv.sql
Last active February 11, 2019 11:31
Hive create external table from CSV file with semicolon as delimiter
/* The semicolon (;) is the statement terminator in Hive */
/* Thus, using TERMINATED BY ";" will not work. Using the Unicode escape "\u003B" is a workaround for that limitation */
CREATE EXTERNAL TABLE tablename
(`col1` string, `col2` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\u003B" STORED AS TEXTFILE
LOCATION "<location>"
TBLPROPERTIES("skip.header.line.count"="1");
@chhantyal
chhantyal / install_documentdb.sh
Last active April 21, 2017 11:31
Install pyDocumentDB on Azure HDInsight
# Install pyDocumentDB on HDInsight Python 3
/usr/bin/anaconda/envs/py35/bin/pip install pyDocumentDB
@chhantyal
chhantyal / install_azure.sh
Last active August 31, 2017 13:23
Install azure on HDInsight Python 3
# Install azure on HDInsight Python 3
/usr/bin/anaconda/envs/py35/bin/pip install plotly
@chhantyal
chhantyal / install_pyodbc.sh
Last active November 29, 2017 15:40
Install PyODBC on an Azure Spark cluster to connect to SQL Server and other databases. It is not installed by default.
#!/usr/bin/env bash
# To use, submit this script url to Script Actions in Azure Portal
/usr/bin/anaconda/envs/py35/bin/pip install pyodbc
# Just /usr/bin/anaconda/bin/conda for Python 2.
@chhantyal
chhantyal / sleep.js
Created February 22, 2017 08:30
JavaScript/Node.js sleep function. JavaScript has no built-in sleep function, so this busy-wait loop is a workaround.
function sleep(milliSeconds) {
  var startTime = new Date().getTime();
  while (new Date().getTime() < startTime + milliSeconds);
}
@chhantyal
chhantyal / hosting-deploying-python-web-apps.md
Last active March 31, 2023 08:36
Easiest way to host and deploy Python web applications (Django) using Apache and Mod_WSGI (Mod_WSGI-express)

Apache virtual host to proxy requests to another port

Name it project_name.conf and symlink to /etc/apache2/sites-enabled/

<VirtualHost *:80>
       ServerName example.com
       ProxyPass / http://localhost:8000/
       ProxyPassReverse / http://localhost:8000/
       RequestHeader set X-Forwarded-Port 80
@chhantyal
chhantyal / runserver.sh
Last active August 25, 2016 20:11
Script to run Django development server.
#!/bin/bash
# script to run development server
# Instead of typing `python manage.py runserver` all the time
# just use ./runserver.sh
# you might need to do: chmod +x runserver.sh
DJANGO_SETTINGS_MODULE=project.settings.local venv/bin/python manage.py runserver 0.0.0.0:8000