Javier Arturo Porras Luraschi javierluraschi

@javierluraschi
javierluraschi / build-arrow-appveyor-env.md
Last active February 27, 2019 04:26
Building Apache Arrow in AppVeyor Environment
pacman -S --noconfirm $MINGW_PACKAGE_PREFIX-boost
pacman -S --noconfirm $MINGW_PACKAGE_PREFIX-brotli
pacman -S --noconfirm $MINGW_PACKAGE_PREFIX-cmake
pacman -S --noconfirm $MINGW_PACKAGE_PREFIX-flatbuffers
pacman -S --noconfirm $MINGW_PACKAGE_PREFIX-gcc
pacman -S --noconfirm $MINGW_PACKAGE_PREFIX-gobject-introspection
pacman -S --noconfirm $MINGW_PACKAGE_PREFIX-gtk-doc
pacman -S --noconfirm $MINGW_PACKAGE_PREFIX-lz4
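The repeated `pacman` calls above could be collapsed into a single invocation. The following is a minimal sketch, not the gist's own script: the `qualified_packages` helper is a hypothetical name, the package list contains only the entries visible above (the original list may continue), and `--needed` is added on the assumption that skipping already-installed packages is acceptable.

```shell
# Packages taken from the commands above; the original list may be longer.
packages="boost brotli cmake flatbuffers gcc gobject-introspection gtk-doc lz4"

# Hypothetical helper: print one fully qualified package name per line,
# e.g. "mingw-w64-x86_64-boost" for the prefix "mingw-w64-x86_64".
qualified_packages() {
  prefix="$1"
  for pkg in $packages; do
    printf '%s-%s\n' "$prefix" "$pkg"
  done
}

# Only call pacman inside an MSYS2 shell where the prefix is defined.
if command -v pacman >/dev/null 2>&1 && [ -n "${MINGW_PACKAGE_PREFIX:-}" ]; then
  # Word splitting of the command substitution is intentional here.
  pacman -S --noconfirm --needed $(qualified_packages "$MINGW_PACKAGE_PREFIX")
fi
```

Installing everything in one transaction lets `pacman` resolve dependencies once instead of eight times.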
@javierluraschi
javierluraschi / analysing-twitter-stream-using-spark-and-r.md
Last active April 14, 2021 15:49
Analyzing Twitter Stream using Spark and R
https://kafka.apache.org/quickstart

wget http://apache.claz.org/kafka/2.1.0/kafka_2.12-2.1.0.tgz
tar -xzf kafka_2.12-2.1.0.tgz
cd kafka_2.12-2.1.0

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
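Both start scripts above run in the foreground, so running the commands one after another in a single shell would block at the first one. A possible workaround, sketched here under assumptions, is to start both services with the scripts' `-daemon` option and retry topic creation until the broker is ready. The `retry` helper is hypothetical, the attempt count and delay are arbitrary, and the snippet assumes it runs from the extracted Kafka directory.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 otherwise.
retry() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while ! "$@"; do
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Only run when the Kafka scripts are present in the current directory.
if [ -x bin/kafka-topics.sh ]; then
  bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
  bin/kafka-server-start.sh -daemon config/server.properties
  # Topic creation fails until ZooKeeper and the broker are up, so retry.
  retry 10 3 bin/kafka-topics.sh --create --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 --topic test
fi
```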
@javierluraschi
javierluraschi / building-arrow-r-bindings-in-windows.md
Created December 17, 2018 21:11
Building Arrow R Bindings in Windows

This doc captures results from investigating how to build the Arrow R bindings in Windows. The following options were explored:

  • Building Arrow in MSys and bindings in RTools.
  • Building Arrow and bindings in RTools.

The most promising long-term solution is to write a CMake generator that can be run from RTools; short term, we can continue making progress compiling Arrow from MSys.

This document explains other approaches considered and provides additional details.

@javierluraschi
javierluraschi / building-rwinlib-arrow-0.11.md
Last active December 7, 2018 01:22
Building rwinlib/arrow 0.11
  • Create EC2 Windows Machine and connect.
  • Download and install MSys2.
  • Launch c:\msys64\mingw64 and run:
pacman -S base-devel
pacman -S msys2-devel
pacman -S mingw-w64-i686-toolchain
pacman -S mingw-w64-x86_64-toolchain
pacman -S mingw-w64-x86_64-cmake
@javierluraschi
javierluraschi / building-rwinlib-arrow-0.9.md
Created December 6, 2018 05:24
Building rwinlib/arrow 0.9
  • Create EC2 Windows Machine and connect.
  • Download and install MSys2.
  • Launch c:\msys64\mingw64 and run:
pacman -S base-devel
pacman -S msys2-devel
pacman -S mingw-w64-i686-toolchain
pacman -S mingw-w64-x86_64-toolchain
pacman -S mingw-w64-x86_64-cmake
@javierluraschi
javierluraschi / experiments-pyarrow-emr.py
Last active November 30, 2018 02:02
Experiments with pyarrow in EMR
sudo pip install PyArrow
./pyspark --master yarn --num-executors 2
from pyspark.sql.functions import rand
df = spark.range(1 << 22).toDF("id").withColumn("x", rand())
from pyspark.sql.functions import udf
@udf('double')
@javierluraschi
javierluraschi / installing-arrow-emr.md
Last active April 19, 2020 23:15
Install Apache Arrow in Amazon EMR

Automated Install

EMR Configuration, replace <a-github-pat> with a valid PAT:

[{
  "configurations":[{
    "classification":"export",
    "properties":{"GITHUB_PAT":"<a-github-pat>"}
  }],
@javierluraschi
javierluraschi / rstudio-1.2-package-improvements.Rmd
Last active August 9, 2018 01:10
RStudio 1.2: Package Improvements
RStudio 1.2 comes with improvements for managing **package repos** and adds better support for packages that provide **testing infrastructure** and **database connectivity**.
## Package Repos
Some organizations set up private CRAN repos to share internal packages; one way of accomplishing this is by
creating a private CRAN repo as described in the [R Admin Guide](https://cran.r-project.org/doc/manuals/R-admin.html#Setting-up-a-package-repository). Other projects rely on `drat` to provide CRAN-compatible repos. With RStudio 1.2 you can easily configure and prioritize private, primary, or secondary CRAN repos through the preferences pane. For instance, at the
time of this writing, the `limer` package from the `cloudyr` project was not available on CRAN; however, the `cloudyr`
project provides a `drat` repo, `http://cloudyr.github.io/drat`, that we can easily add as a secondary repo:
`IMAGE`
@javierluraschi
javierluraschi / install-rstudio-server-suse.Rmd
Created May 29, 2018 21:03
Install RStudio Server on SUSE
Verified on an Amazon EC2 SUSE 12 machine.
See [http://download.opensuse.org/repositories/devel:/languages:/R:/patched/](http://download.opensuse.org/repositories/devel:/languages:/R:/patched/), then from [https://forums.opensuse.org/showthread.php/517620-Installing-R](https://forums.opensuse.org/showthread.php/517620-Installing-R):
```bash
zypper ar -f http://download.opensuse.org/repositories/devel:/languages:/R:/patched/openSUSE_12.3/ R-patched
zypper --gpg-auto-import-keys ref
zypper in R-patched R-patched-devel
```
https://api.github.com/search/repositories?q=language:R+stars:%3C10000&page=10&sort=stars
...
https://api.github.com/search/repositories?q=language:R+stars:%3C1&page=10&sort=stars