Skip to content

Instantly share code, notes, and snippets.

@fyears
fyears / note.md
Last active February 6, 2024 09:59
how to install scipy numpy matplotlib ipython in virtualenv

if you are using linux, unix, os x:

pip install -U setuptools
pip install -U pip

pip install numpy
pip install scipy
pip install matplotlib
#pip install PySide

Chapter 2. Downloading and Getting Started

In this chapter we will walk through the process of downloading and running Spark in local mode on a single computer. This chapter was written for anybody that is new to Spark, including both Data Scientists and Engineers.

Spark can be used from Python, Java or Scala. To benefit from this book, you don’t need to be an expert programmer, but we do assume that you are comfortable with the basic syntax of at least one of these languages. We will include examples in all languages wherever possible.

Spark itself is written in Scala, and runs on the Java Virtual Machine (JVM). To run Spark on either your laptop or a cluster, all you need is an installation of Java 6 (or newer). If you wish to use the Python API you will also need a Python interpreter (version 2.6 or newer) . Spark does not yet work with Python 3.

@dapangmao
dapangmao / ssn.md
Last active August 29, 2015 14:10
Spark practice (3): clean and sort Social Security numbers

Sample.txt

Requirements:
1. separate valid SSN and invalid SSN
2. count the number of valid SSN
402-94-7709 
283-90-3049 
@mangecoeur
mangecoeur / description.md
Last active March 30, 2021 21:34
Pandas PostgresSQL support for loading to DB using fast COPY FROM method

This small subclass of the Pandas sqlalchemy-based SQL support for reading/storing tables uses the Postgres-specific "COPY FROM" method to insert large amounts of data to the database. It is much faster that using INSERT. To acheive this, the table is created in the normal way using sqlalchemy but no data is inserted. Instead the data is saved to a temporary CSV file (using Pandas' mature CSV support) then read back to Postgres using Psychopg2 support for COPY FROM STDIN.