Makoto YUI myui

%
% As used by Kilpatrick, D. & Cameron-Jones, M. (1998). Numeric prediction
% using instance-based learning with encoding length selection. In Progress
% in Connectionist-Based Information Systems. Singapore: Springer-Verlag.
%
% Deleted "vendor" attribute to make data consistent with what we
% used in the data mining book.
%
@relation 'cpu'
@attribute MYCT real
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
rowid weight specific_heat reflectance
1 69.613 129.070 52.111
2 70.670 128.161 52.446
3 72.303 128.450 52.853
4 73.759 127.522 51.786
5 74.085 129.067 53.352
6 74.561 134.031 50.992
7 74.911 134.944 50.744
8 75.205 129.162 52.800
9 75.395 129.711 52.844
/*
* Hivemall: Hive scalable Machine Learning Library
*
* Copyright (C) 2015 Makoto YUI
* Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
/*******************************************************************************
* Copyright (c) 2010 Haifeng Li
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
myui / kaggle_dataget.sh
Last active October 28, 2015 01:54
Kaggle data get script
# pup is required
# https://github.com/ericchiang/pup
# KAGGLE_USER and KAGGLE_PASSWD must be set in the environment
curl -s https://www.kaggle.com/c/titanic/data | pup 'tbody tr td a attr{href}' | awk '{print "https://kaggle.com" $1}' > urls.txt
# read one URL per line; quote every expansion so '?' and '&' are not interpreted by the shell
while IFS= read -r f;
do
echo "downloading file: ${f}"
wget -O "$(basename "${f}")" "https://www.kaggle.com/account/login?ReturnUrl=${f}" --post-data "username=${KAGGLE_USER}&password=${KAGGLE_PASSWD}"
echo
done < urls.txt
Unable to execute method public boolean hivemall.fm.FMPredictUDAF$Evaluator.iterate(org.apache.hadoop.hive.serde2.io.DoubleWritable,java.util.List,org.apache.hadoop.hive.serde2.io.DoubleWritable) throws org.apache.hadoop.hive.ql.metadata.HiveException on object hivemall.fm.FMPredictUDAF$Evaluator@3272e5d6 of class hivemall.fm.FMPredictUDAF$Evaluator with arguments {1.3495872020721436:org.apache.hadoop.hive.serde2.io.DoubleWritable, []:java.util.ArrayList, 1.0:org.apache.hadoop.hive.serde2.io.DoubleWritable} of size 3
at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1253)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.iterate(GenericUDAFBridge.java:170)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:184)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641)
at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838)
Hive history file=/mnt/hive/tmp/1/hive_job_log_eb8582f1-0cc1-4fdc-99fc-52b4896e5f74_1231952821.txt
**
** WARNING: time index filtering is not set!
** This query could be very slow as a result.
** If you used 'unix_timestmap' please modify your query to use TD_SCHEDULED_TIME instead
** or rewrite the condition using TD_TIME_RANGE
** Please see http://docs.treasure-data.com/articles/performance-tuning#leveraging-time-based-partitioning
**
Caused by: java.lang.NumberFormatException: empty String
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1020)
at java.lang.Double.parseDouble(Double.java:540)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDouble(PrimitiveObjectInspectorUtils.java:744)
at hivemall.ftvec.trans.QuantitativeFeaturesUDF.evaluate(QuantitativeFeaturesUDF.java:91)
at hivemall.ftvec.trans.QuantitativeFeaturesUDF.evaluate(QuantitativeFeaturesUDF.java:41)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:166)
at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:77)
at hivemall.tools.array.ConcatArrayUDF.evaluate(ConcatArrayUDF.java:86)
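
The `NumberFormatException: empty String` in the trace above is what `Double.parseDouble` raises when it is handed a zero-length string, which is consistent with an empty column value reaching the UDF. The sketch below reproduces that behaviour in plain Java and shows one possible guard. It is only an illustration and is not Hivemall's code: the class `SafeDoubleParse`, the method `parseOrDefault`, and the choice of a fallback value are all hypothetical.

```java
public class SafeDoubleParse {

    // Hypothetical helper (not part of Hivemall or Hive): returns `fallback`
    // for null or blank input instead of letting Double.parseDouble throw.
    public static double parseOrDefault(String s, double fallback) {
        if (s == null || s.trim().isEmpty()) {
            return fallback;
        }
        // Non-numeric text such as "abc" still throws NumberFormatException.
        return Double.parseDouble(s.trim());
    }

    public static void main(String[] args) {
        // Reproduce the failure mode from the stack trace.
        try {
            Double.parseDouble("");
        } catch (NumberFormatException e) {
            System.out.println("caught: " + e.getMessage());
        }

        // With the guard, empty input maps to the fallback value.
        System.out.println(parseOrDefault("", 0.0));
        System.out.println(parseOrDefault("1.5", 0.0));
    }
}
```

Whether an empty field should become a default value, be skipped, or fail the query depends on the data; filtering or cleaning empty values before they reach the feature function is an equally reasonable alternative.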