@seahrh
seahrh / easypipe.py
Created May 1, 2018 03:42 — forked from dannguyen/easypipe.py
Using scikit-learn to classify NYT columnists
# some convenience functions here, nothing new
'''
# usage:
from easypipe import easy_pipeline
from easypipe import print_metrics
data_folder = "data-hold/20news"
p = easy_pipeline()
print_metrics(p, data_folder)
'''
# dump messages to stdout, uses old consumer api!
kafka-console-consumer --zookeeper localhost:2181 --topic my_topic --from-beginning
# describe topic (partitions, replicas, config overrides)
kafka-topics --zookeeper localhost:2181 --describe --topic my_topic
# alter topic
from datetime import datetime, timedelta
from airflow import DAG
from airflow import utils
from airflow.operators import BashOperator, EmailOperator, DummyOperator
default_args = {
'owner': 'myowner',
'depends_on_past': False,
'start_date': datetime(year=2017, month=10, day=18, hour=0, minute=0),
'email': ['owner@example.com'],  # placeholder: the original address was obfuscated by the page
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
default_args = {
'owner': 'myowner',
'depends_on_past': False,
'start_date': datetime(year=2017, month=10, day=18, hour=0, minute=0),
'email': ['owner@example.com'],  # placeholder: the original address was obfuscated by the page
'email_on_failure': True,
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>versions-maven-plugin</artifactId>
<version>2.3</version>
<configuration>
<rulesUri>file:///${project.basedir}/versions-maven-rules.xml</rulesUri>
</configuration>
<executions>
<execution>
<phase>compile</phase>
<?xml version="1.0" encoding="UTF-8"?>
<ruleset xmlns="http://mojo.codehaus.org/versions-maven-plugin/rule/2.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" comparisonMethod="maven" xsi:schemaLocation="http://mojo.codehaus.org/versions-maven-plugin/rule/2.0.0 http://mojo.codehaus.org/versions-maven-plugin/xsd/rule-2.0.0.xsd">
<ignoreVersions>
<!-- Ignore Alpha's, Beta's, release candidates and milestones -->
<ignoreVersion type="regex">(?i).*Alpha(?:-?\d+)?</ignoreVersion>
<ignoreVersion type="regex">(?i).*Beta(?:-?\d+)?</ignoreVersion>
<ignoreVersion type="regex">(?i).*-B(?:-?\d+)?</ignoreVersion>
<ignoreVersion type="regex">(?i).*RC(?:-?\d+)?</ignoreVersion>
<ignoreVersion type="regex">(?i).*CR(?:-?\d+)?</ignoreVersion>
<ignoreVersion type="regex">(?i).*M(?:-?\d+)?</ignoreVersion>
#!/usr/bin/env bash
export SPARK_MAJOR_VERSION=2
/usr/hdp/current/spark2-client/bin/spark-submit --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--files /path/to/log4j.properties \
--conf spark.yarn.executor.memoryOverhead=1024 \
--conf spark.port.maxRetries=64 \
--conf spark.driver.extraJavaOptions='-Dlog4j.debug -Dlog4j.configuration=file:/path/to/log4j.properties -Da=a1' \
--conf spark.executor.extraJavaOptions='-Dlog4j.debug -Dlog4j.configuration=log4j.properties' \
--master yarn \
use mydb;
set @s='pqrs';
set @d=11.11;
set @pk=15605;
INSERT INTO t1 (s,d,_fk) SELECT * FROM (SELECT @s, @d, @pk) AS tmp
WHERE NOT EXISTS (SELECT s FROM t1 WHERE s=@s and _fk=@pk) LIMIT 1;
commit;
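
The statement above inserts a row only when no row with the same s and _fk already exists. Below is a minimal sketch of the same pattern driven from Python. It assumes SQLite through the standard-library sqlite3 module rather than the MySQL session the snippet targets, and the helper name insert_if_absent and the in-memory t1 table are hypothetical, introduced only for illustration.

```python
import sqlite3


def insert_if_absent(conn, s, d, fk):
    """Insert (s, d, fk) into t1 unless a row with the same s and _fk exists.

    Returns True if a row was inserted, False if a matching row was found.
    """
    cur = conn.execute(
        "INSERT INTO t1 (s, d, _fk) "
        "SELECT ?, ?, ? "
        "WHERE NOT EXISTS (SELECT 1 FROM t1 WHERE s = ? AND _fk = ?)",
        (s, d, fk, s, fk),
    )
    conn.commit()
    # rowcount is 1 when the SELECT produced a row to insert, 0 otherwise
    return cur.rowcount == 1


# Hypothetical in-memory table mirroring the columns used in the snippet
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (s TEXT, d REAL, _fk INTEGER)")

first = insert_if_absent(conn, "pqrs", 11.11, 15605)   # no match yet, so the row is inserted
second = insert_if_absent(conn, "pqrs", 11.11, 15605)  # same s and _fk, so nothing is inserted
row_count = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
```

Bound parameters stand in for the session variables @s, @d and @pk. Under concurrent writers this check-then-insert is not a substitute for a unique constraint on (s, _fk).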
@seahrh
seahrh / README.md
Created January 28, 2018 01:22 — forked from phillipgreenii/README.md
Running NPM Scripts through maven

I am in the process of introducing single-page applications where I work. For development, Node-based build tools are much easier to use for single-page applications. However, our organization's build process is based on Maven. Our solution started with the Maven plugin frontend-maven-plugin. It worked well at first, but then we ran into a situation that I couldn't make work with it.

As stated above, our organization has an older ecosystem, Maven, and a newer one, Node. Our goal was to keep the hacking to a minimum, so we put all of the hacks into a single Node-based "super" build script. That script is what Maven calls, and it is the reason frontend-maven-plugin wasn't sufficient: the super build script calls all of the other build scripts by spawning npm run, and try as I might, I could not figure out how to make the spawn work. frontend-maven-plugin downloads npm