As you all know, R users tend to install packages from CRAN using “install.packages”. Making DR available there would greatly help adoption.
- Source package
- Binary package
- Hybrid
import jinja2
import json
import logging
import os
import requests
import tempfile
import pykube.config
import pykube.http
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", context="talk")
# Set up the matplotlib figure
f, (ax1) = plt.subplots(1, 1, figsize=(10, 6), sharex=True)
# Specify data
FROM centos:centos7
# install required packages
RUN yum -y install vim openssh-server sudo glibc tar openssh-clients initscripts
# create user
RUN useradd --create-home jorgem
RUN mkdir -p /home/jorgem/.ssh/
ADD id_rsa.pub /home/jorgem/.ssh/id_rsa.pub
ADD id_rsa /home/jorgem/.ssh/id_rsa
sudo apt-get install libXt-dev
sudo apt-get install texinfo
sudo apt-get install texlive-latex-base
sudo apt-get install texlive-fonts-extra
Debugging workers and executors is hard because they are started automatically. One workaround is to have them sleep for a few seconds at startup. This gives us time to attach a debugger before the programs do anything.
One option is to create 2 files: /tmp/r_executor_startup_sleep_secs and /tmp/r_executor_startup_sleep_secs. The first thing the workers and executors do is check whether that file exists. If it does, the process sleeps for the number of seconds specified in the file:
$ cat /tmp/r_executor_startup_sleep_secs
30
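The startup check described above can be sketched roughly as follows. This is only an illustration in Python, not the actual worker or executor code: the function names `read_startup_sleep_secs` and `startup_sleep` are hypothetical, and treating a missing or unparsable file as "no sleep" is an assumption.

```python
import os
import time


def read_startup_sleep_secs(path):
    """Return the seconds stored in `path`, or 0.0 if the file is absent or invalid."""
    if not os.path.isfile(path):
        return 0.0
    try:
        with open(path) as fh:
            secs = float(fh.read().strip())
    except (OSError, ValueError):
        # Assumption: an unreadable or malformed file means "do not sleep".
        return 0.0
    return max(secs, 0.0)


def startup_sleep(path="/tmp/r_executor_startup_sleep_secs", sleep=time.sleep):
    """Sleep for the configured time so a debugger can attach; return the seconds slept."""
    secs = read_startup_sleep_secs(path)
    if secs > 0:
        # Printing the pid makes it easier to attach to the right process.
        print(f"pid {os.getpid()}: sleeping {secs:g}s, attach a debugger now")
        sleep(secs)
    return secs
```

Calling `startup_sleep()` as the first statement of the program gives the behaviour described: with the file containing 30, the process pauses for 30 seconds, and with no file it starts immediately.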
This is one possible flow for backporting fixes to old branches.
Let's say we want to backport commit f54200217d57c64bdeac93192aa3ff9fc53d5890 to branch DistR-1_0_x.
First we create a local DistR-1_0_x branch:
$ git checkout -b DistR-1_0_x remotes/origin/DistR-1_0_x
Then we backport the commit with git cherry-pick:
I've been studying the memory usage, especially for serialize. For my tests I'm creating a data frame with 50M rows of doubles that occupies 400MB. I'm using /usr/bin/time -v to gauge memory usage. (In my tests R always has an overhead of about 20MB, which is why 420MB is reported instead of 400MB.)
jorgem@ubuntu:~$ cat df.R
di <- data.frame(runif(50e6,1,10))
jorgem@ubuntu:~$ /usr/bin/time -v Rscript df.R 2>&1|grep resident|grep Max
Maximum resident set size (kbytes): 421332
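The 400MB and 420MB figures can be checked with a little arithmetic. The sketch below assumes 8-byte doubles and, as the text does, reads one "kbyte" as 1000 bytes; if the tool reports 1024-byte units, the overhead comes out somewhat larger.

```python
BYTES_PER_DOUBLE = 8  # R numeric vectors store 8-byte doubles

rows = 50_000_000
data_mb = rows * BYTES_PER_DOUBLE / 1e6   # size of the column in MB (10^6 bytes)

reported_kbytes = 421332                  # figure printed by /usr/bin/time -v
reported_mb = reported_kbytes / 1e3       # assumption: 1 kbyte = 1000 bytes

print(f"data: {data_mb:.0f} MB, reported: {reported_mb:.0f} MB, "
      f"overhead: {reported_mb - data_mb:.0f} MB")
```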
If we add serialization the memory peak is 1.2GB:
In some compiler versions (e.g. GCC 4.6.4 in Ubuntu), when compiling with -rdynamic, two functions with the same name (e.g. dataptr, defined both in routines.h and barrier.cpp) are placed in the dynamic symbol table. Subsequently, when we do R_GetCCallable("Rcpp", "dataptr") we get the wrong function at runtime.
When Rcpp registers dataptr it means this function (in barrier.cpp):
// [[Rcpp::register]]
void* dataptr(SEXP x){
    return DATAPTR(x);
}