A simple experiment showed that a seccomp-filtered syscall is roughly 5 times slower than a vanilla one.
Calling the `write` syscall directly:
```c
const unsigned count = UINT_MAX / 10000;
unsigned i = 0;
/* Loop reconstructed: issue `count` write syscalls; fd -1 fails fast (EBADF). */
for (i = 0; i < count; ++i)
    syscall(SYS_write, -1, (void *)0, 0);
```
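A self-contained version of that measurement might look like the sketch below. This is an illustration rather than the original benchmark: the always-allow BPF filter, the bogus file descriptor, and the `clock_gettime` timing are my assumptions.

```c
/* Time ~430k write syscalls before and after installing a seccomp BPF
 * filter that allows everything, so any slowdown is pure filter overhead.
 * Build: gcc -O2 bench.c (add -lrt on older glibc). Linux only. */
#include <limits.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static void install_allow_all_filter(void) {
    struct sock_filter filter[] = {
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW), /* allow every syscall */
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0); /* lets unprivileged processes install filters */
    prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

static double bench(void) {
    const unsigned count = UINT_MAX / 10000;
    struct timespec start, end;
    unsigned i;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < count; ++i)
        syscall(SYS_write, -1, (void *)0, 0); /* EBADF, but still a full kernel entry */
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

int main(void) {
    printf("vanilla: %f s\n", bench());
    install_allow_all_filter();
    printf("seccomp: %f s\n", bench());
    return 0;
}
```

The strace-parsing script below belongs to the same experiment: it flags syscalls outside a small whitelist, which tells you whether a binary could run under such a filter at all. The original listing breaks off after the setup lines, so everything past them is reconstruction.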
```python
#!/usr/bin/env python
import sys
import re

def main():
    regexp = re.compile(r'^(\S+)\((.*)\)\s+=\s+(\d+)$')
    whitelist = ['read', 'write', 'fstat', 'lseek', 'fcntl']
    opened_fd = {}
    # Reconstructed from here on: parse strace output on stdin, remember
    # which path each fd returned by open() refers to, and report every
    # syscall that falls outside the whitelist.
    for line in sys.stdin:
        match = regexp.match(line.strip())
        if not match:
            continue
        name, args, ret = match.groups()
        if name == 'open':
            opened_fd[ret] = args.split(',')[0]
        elif name not in whitelist:
            print('%s(%s) = %s' % (name, args, ret))

if __name__ == '__main__':
    main()
```
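A hypothetical invocation (script and binary names are placeholders): `strace -o trace.log ./some-binary`, then `./check_syscalls.py < trace.log`.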
# Gnu Pth as thread library for impalad

In short, it's impossible to use the Gnu Pth library with `impalad` "as is", i.e. without modification.

Gnu Pth:

* Gnu Pth can't fully replace `pthreads`: it lacks some functions and other entities.
* It doesn't provide versioned symbols.

Some `*.so` libraries (system and third-party) come precompiled and are linked against versioned symbols. Be prepared to recompile them, replace them, or work around it somehow. Example:
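For illustration, checking what a prebuilt library expects: `objdump -T prebuilt.so | grep GLIBC` (filename is a placeholder) lists the versioned symbol references; Gnu Pth exports only unversioned symbols, so it can't satisfy them.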
## Git repo

Find modified impala [here](https://github.com/rampage644/impala-cut). First, have a look at [this](https://github.com/rampage644/impala-cut/blob/executor/README.md) *README* file.
## Task description

The original task was to prune `impalad` down to some sort of *executor* binary which only executes part of a query. Two approaches were suggested: top-down and bottom-up. I used the bottom-up approach.

My intention was to write a unit test that will actually test the behavior we need. So, look at `be/src/runtime/plan-fragment-executor-test.cc`. It contains all possible tests (that is, actual code snippets) to run part of a query with or without data. Doing so helped me a lot to understand the impalad codebase as it relates to query execution.
Just collecting information about unikernels, KVM, and friends. A little OSv source code digging, with no actual result. Discussions.
Tried to get `plan-fragment-executor-test` to run under OSv with `tcmallocstatic`. First, OSv doesn't support `sbrk`-based memory management, so one has to tune `tcmallocstatic` not to use `SbrkMemoryAllocator` at all (comment out `#undef HAVE_SBRK` in `config.h.in`). Second, it still fails with an invalid opcode exception.

Installing Impala from the Cloudera CDH5 repo:

```sh
yum-config-manager --add-repo http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
yum install impala-server impala-catalog impala-state-store impala-shell
ln -sf /usr/lib/hbase/lib/hbase-client.jar /usr/lib/impala/lib
ln -sf /usr/lib/hbase/lib/hbase-common.jar /usr/lib/impala/lib
ln -sf /usr/lib/hbase/lib/hbase-protocol.jar /usr/lib/impala/lib
```
```scala
import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.sql.{SaveMode, Row, SQLContext}
import com.databricks.spark.csv.CsvSchemaRDD
import org.apache.spark.sql.functions._
```
This document describes a sample process of implementing part of the existing `Dim_Instance` ETL. I took only the Cloud Block Storage source to simplify and speed up the process. I also ignored creation of the extended tables (specific to this particular ETL process). Below are code and final thoughts about possible Spark usage as a primary ETL tool.
Basic ETL implementation is really straightforward. The only real problem (I mean, a real problem) is finding a correct and comprehensive Mapping document (a description of which source fields go where).
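Using the imports above, the core of such a job might look like the sketch below. It is illustrative only: the paths, column names, and date stamping are placeholders, not the real `Dim_Instance` mapping.

```scala
// Hypothetical sketch, not the production ETL: read one source extract,
// project the dimension columns, stamp a load date, write back as CSV.
object DimInstanceEtl {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("dim-instance-etl"))
    val sqlContext = new SQLContext(sc)

    // Source path and field names are made up for illustration.
    val source = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .load("hdfs:///staging/cbs/instances.csv")

    val loadDate = new SimpleDateFormat("yyyy-MM-dd").format(new Date)
    val dim = source
      .select(col("instance_id"), col("volume_type"))
      .withColumn("load_date", lit(loadDate))

    // saveAsCsvFile is provided by the CsvSchemaRDD implicit from spark-csv.
    dim.saveAsCsvFile("hdfs:///dim/dim_instance")
  }
}
```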