Takeshi Yamamuro maropu

🌴

On vacation

OSS engineer@R&D, Ph.D. in CS (Database Systems) - Apache Spark PMC&committer, Apache Hivemall PPMC, PostgreSQL enthusiast - LLVM/C/C++11/Java/Scala/Rust/Python

270 followers · 0 following


maropu / On indexing relations vectorized with ComplEx (with Vertex AI Matching Engine)

Created October 3, 2022 06:49

	- Complex Embeddings for Simple Link Prediction, https://arxiv.org/abs/1606.06357
	- Vertex AI Matching Engine: https://cloud.google.com/vertex-ai/docs/matching-engine
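
The gist body is not shown in this listing, so the following is only a minimal sketch of the idea the title suggests. It assumes the ComplEx score from the paper above, Re(Σ w_r · e_s · conj(e_o)), and shows that it equals a plain dot product between two real vectors of length 2d, which is the form a dot-product nearest-neighbor index can serve. The helper names (`complex_score`, `to_query_vector`, `to_index_vector`) and the toy embeddings are hypothetical.

```python
def complex_score(e_s, w_r, e_o):
    """ComplEx score: Re(sum_k w_r[k] * e_s[k] * conj(e_o[k]))."""
    return sum((w * s * o.conjugate()).real for s, w, o in zip(e_s, w_r, e_o))


def to_query_vector(e_s, w_r):
    """Fold subject and relation into one real query vector of length 2d."""
    q = [w * s for s, w in zip(e_s, w_r)]
    return [z.real for z in q] + [z.imag for z in q]


def to_index_vector(e_o):
    """Real vector of length 2d to store in the index for a candidate object."""
    return [z.real for z in e_o] + [z.imag for z in e_o]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy embeddings (made up for illustration), d = 2
e_s = [1 + 2j, 0.5 - 1j]
w_r = [0.3 + 0.1j, -1 + 0.5j]
e_o = [2 - 1j, 1 + 1j]

score = complex_score(e_s, w_r, e_o)
dot_score = dot(to_query_vector(e_s, w_r), to_index_vector(e_o))
```

The identity used is Re(q · conj(o)) = Re(q)·Re(o) + Im(q)·Im(o), so ranking candidate objects by `dot_score` gives the same order as ranking by the ComplEx score.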

maropu / memorize failed and canceled tests in scalatest

Last active August 24, 2021 04:31

	$ ./build/mvn clean test -DmemoryFiles=rerun.txt
	$ cat rerun.txt
	TestFailed Some(org.apache.spark.api.python.RepairSuite) org.apache.spark.api.python.RepairSuite None
	TestFailed Some(org.apache.spark.api.python.DepGraphSuite) org.apache.spark.api.python.DepGraphSuite Some(computeFunctionalDepMap)

	$ ./build/mvn clean test -DtestsFiles=rerun.txt
	Run starting. Expected test count is: 5
	DepGraphSuite:
	13:29:58.598 WARN org.apache.spark.util.Utils: Your hostname, maropus-MacBook-Pro.local resolves to a loopback address: 127.0.0.1; using 192.168.3.2 instead (on interface en0)
	13:29:58.599 WARN org.apache.spark.util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address

maropu / spark-janino-v3.1.3

Last active November 4, 2022 11:15

	Spark session available as 'spark'.
	Welcome to
	      ____              __
	     / __/__  ___ _____/ /__
	    _\ \/ _ \/ _ `/ __/  '_/
	   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
	      /_/

	Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
	Type in expressions to have them evaluated.

maropu / Collects the elapsed time of internal function calls in pandas_udf

Created January 27, 2021 03:08

	import time
	from collections import Counter

	from pyspark.accumulators import AccumulatorParam
	from pyspark.sql.functions import col, pandas_udf, PandasUDFType

	class UdfMetricAccumulatorParam(AccumulatorParam):
	    def zero(self, value):
	        # dict.update() returns None, so update first and return the dict itself
	        init_value = {}
	        init_value.update(value)
	        return init_value
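
The preview stops before the merge step. As a rough sketch of what a dict-valued accumulator needs, the class below mirrors the `zero` / `addInPlace` shape of PySpark's `AccumulatorParam` without importing Spark, so it runs standalone. `DictMergeParam` and the metric names are hypothetical, and summing elapsed times per key is an assumption about what the gist collects.

```python
from collections import Counter


class DictMergeParam:
    """Hypothetical stand-in with the same method shape as AccumulatorParam."""

    def zero(self, value):
        # Copy the initial mapping into a fresh Counter
        merged = Counter()
        merged.update(value)
        return merged

    def addInPlace(self, v1, v2):
        # Counter.update adds values key by key instead of overwriting them
        v1.update(v2)
        return v1


param = DictMergeParam()
total = param.zero({})
total = param.addInPlace(total, {"parse": 0.5})
total = param.addInPlace(total, {"parse": 0.25, "predict": 1.0})
```

With real Spark, a class like this would subclass `AccumulatorParam` and be passed to `SparkContext.accumulator` together with an initial value.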

maropu / show_elapsed_time.py

Created January 26, 2021 23:43 — forked from harupy/show_elapsed_time.py

	import time

	import pytest

	@pytest.hookimpl(hookwrapper=True)
	def pytest_report_teststatus(report, config):
	    outcome = yield
	    res = outcome.get_result()

	    attr_name = "___TIME___"
	    if report.when == "setup":
	        # HACK: store the start time in `config`
	        setattr(config, attr_name, time.time())
	    elif report.when == "call":

maropu / Dump TPCDS data stats (SPARK-32564)

Last active October 19, 2020 06:08

	// export SPARK_HOME=<YOUR_SPARK_V3_0>
	$ git clone https://github.com/maropu/spark-tpcds-datagen.git
	$ cd spark-tpcds-datagen
	$ ./bin/datagen --master=local[*] --conf spark.driver.memory=8g --scale-factor 10 --output-location /tmp/tpcds-sf-10
	scala> :paste
	import org.apache.spark.sql.catalyst.catalog.CatalogColumnStat
	import org.apache.spark.sql.execution.datasources.LogicalRelation
	import org.apache.spark.sql.types.DataType

	sql("SET spark.sql.cbo.enabled=true")

maropu / Markov Logic Network

Created February 7, 2020 00:01

	# https://qiita.com/9_ties/items/3bdb177384937ddc88df
	# https://homes.cs.washington.edu/~pedrod/papers/mlj05.pdf
	import pandas as pd
	import numpy as np
	from scipy.special import logsumexp
	from itertools import product

	const = ['A', 'B']
	preds = [('Smokes', 1), ('Cancer', 1), ('Friends', 2)] # Predicate and arity
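
The preview ends at the predicate declarations. A usual next step for a Markov logic network is to ground each predicate over the constants; the sketch below does that with `itertools.product`. The `ground_atoms` helper is hypothetical and is not taken from the gist.

```python
from itertools import product

const = ['A', 'B']
preds = [('Smokes', 1), ('Cancer', 1), ('Friends', 2)]  # Predicate and arity


def ground_atoms(constants, predicates):
    """Enumerate every (predicate, argument tuple) pair over the constants."""
    return [
        (name, args)
        for name, arity in predicates
        for args in product(constants, repeat=arity)
    ]


atoms = ground_atoms(const, preds)
# Each ground atom is either true or false in a possible world
num_worlds = 2 ** len(atoms)
```

With two constants this gives 2 + 2 + 4 = 8 ground atoms, hence 256 possible worlds, which is small enough to enumerate exhaustively.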

maropu / Scala Reflection API

Last active April 12, 2018 08:01

	///////// Invocation of Scala collection object methods /////////

	---
	scala> import scala.reflect.runtime.universe._

	scala> val mapClazz = scala.collection.immutable.Map.getClass
	mapClazz: Class[_ <: scala.collection.immutable.Map.type] = class scala.collection.immutable.Map$

	scala> val mirror = runtimeMirror(mapClazz.getClassLoader)
	mirror: reflect.runtime.universe.Mirror = JavaMirror with ...

maropu / Snappy+BitShuffle

Last active October 19, 2020 06:17
