Ryan Williams ryan-williams

ryan-williams / Show.scala
Last active August 7, 2018 04:21
Sketch of a pattern for making type-class "templates" that allow downstream users to easily define implementations of the type-class, and configure which types get default instances on the implicit-search path (in the companion object)
// Instead of defining a specific Show type-class, a library can define a template for a Show and its companion:
object ShowTemplates {
  // implement custom `Show` type-classes as sub-types of this
  trait ShowTemplate[T] {
    def apply(t: T): String
  }
  // mix this in to your `Show`'s companion object, bringing whichever optional instances you want into the default implicit-search path
  trait ShowCompanionTemplate[Show[T] <: ShowTemplate[T]] {
    def make[T](fn: T ⇒ String): Show[T]
    implicit val string: Show[String] = make(identity) // String instance always available
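The preview above is cut off, so here is a minimal, self-contained sketch of how a downstream user might use such templates. `ShowTemplate`, `ShowCompanionTemplate`, `make`, and `string` follow the preview; `IntInstance`, `MyShow`, and `show` are hypothetical names introduced only for illustration, and the rest of the real gist may differ.

```scala
object ShowSketch {
  // Re-declared from the preview so the sketch compiles on its own
  trait ShowTemplate[T] {
    def apply(t: T): String
  }

  trait ShowCompanionTemplate[Show[T] <: ShowTemplate[T]] {
    def make[T](fn: T => String): Show[T]
    implicit val string: Show[String] = make[String](identity) // always available
  }

  // Hypothetical optional instance: only present if the companion mixes it in
  trait IntInstance[Show[T] <: ShowTemplate[T]] { self: ShowCompanionTemplate[Show] =>
    implicit val int: Show[Int] = make[Int](_.toString)
  }

  // A downstream type-class built from the templates (hypothetical)
  trait MyShow[T] extends ShowTemplate[T]
  object MyShow extends ShowCompanionTemplate[MyShow] with IntInstance[MyShow] {
    def make[T](fn: T => String): MyShow[T] =
      new MyShow[T] { def apply(t: T): String = fn(t) }
  }

  // Instances are found via MyShow's companion object, with no imports needed
  def show[T](t: T)(implicit s: MyShow[T]): String = s(t)
}
```

Under these assumptions, `ShowSketch.show("hi")` and `ShowSketch.show(42)` both resolve their instances from the companion; dropping `with IntInstance[MyShow]` would remove the `Int` instance from the default implicit-search path.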
ryan-williams / puzzler.scala
Last active August 6, 2018 23:03
hazards of lower-case first-letters in `case object` names in scala
{
  sealed trait obj
  case object one extends obj
  case object two extends obj
  def bad(o: obj) =
    o match {
      case one ⇒ 1
      case two ⇒ 2
    }
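The hazard in the preview above comes from a Scala pattern-matching rule: a bare identifier that starts with a lower-case letter is a variable pattern, which binds any value instead of comparing against the object of that name. The sketch below re-declares the same types and adds a hypothetical `good` method that uses backticks to force a comparison; the compiler is expected to warn that the second case of `bad` is unreachable.

```scala
object PuzzlerSketch {
  sealed trait obj
  case object one extends obj
  case object two extends obj

  // `one` here is a fresh variable that matches everything,
  // so the second case can never be reached.
  def bad(o: obj): Int =
    o match {
      case one => 1
      case two => 2
    }

  // Backticks turn the identifiers into stable-identifier patterns,
  // so each case compares against the corresponding case object.
  def good(o: obj): Int =
    o match {
      case `one` => 1
      case `two` => 2
    }
}
```

With these definitions `bad(two)` returns 1 while `good(two)` returns 2. Naming the objects `One` and `Two` would avoid the problem without backticks.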
ryan-williams / cell-vs-gene-rows.md
Last active July 20, 2018 19:56
Writeup of pros and cons of using cells vs genes as "rows" in distributed matrices of single-cell expression data

(carried over from humancellatlas/table-testing#8)

Caveats

  • I am by no means a single-cell (SC) domain expert; this is just my guess about how things will shake out
  • there may well be situations where a person will have a dataset where elements ("rows") correspond to genes, which each include a list of per-cell metrics
    • doing a transpose of a distributed matrix is possible and will be supported
    • the thesis is just that, in "99%" of cases, "rows as cells" will map domain needs to infrastructure-{assumptions,conventions} better than "columns as cells"

Features

ryan-williams / http4s.scala
Created July 2, 2018 02:54
Minimal web servers in scala, supporting a simple GET / HEAD endpoint
import $ivy.`org.http4s::http4s-dsl:0.18.13`
import $ivy.`org.http4s::http4s-blaze-server:0.18.13`
import cats.effect._
import org.http4s._
import org.http4s.dsl.io._
import org.http4s.server.blaze._
val response = "This is the response"
val port = 8000
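The http4s preview above is cut off before the server is defined. As a point of comparison only, here is a sketch of the same GET / HEAD endpoint using the JDK's built-in `com.sun.net.httpserver` package instead of http4s. This is a swapped-in technique, not the gist's code; `JdkServerSketch` and `start` are hypothetical names.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
import java.net.InetSocketAddress
import java.nio.charset.StandardCharsets.UTF_8

object JdkServerSketch {
  val response = "This is the response"

  // Starts a server on the given port (0 picks a free port) and returns it
  def start(port: Int): HttpServer = {
    val server = HttpServer.create(new InetSocketAddress(port), 0)
    server.createContext("/", new HttpHandler {
      def handle(ex: HttpExchange): Unit = {
        val body = response.getBytes(UTF_8)
        ex.getRequestMethod match {
          case "GET" =>
            ex.sendResponseHeaders(200, body.length.toLong)
            ex.getResponseBody.write(body)
          case "HEAD" =>
            ex.sendResponseHeaders(200, -1) // -1: send headers only, no body
          case _ =>
            ex.sendResponseHeaders(405, -1) // method not allowed
        }
        ex.close()
      }
    })
    server.start()
    server
  }
}
```

Calling `JdkServerSketch.start(8000)` would serve the response string on port 8000; `stop(0)` on the returned server shuts it down.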
ryan-williams / looms.md
Last active July 1, 2018 05:28
Public Loom datasets, taken from http://loom.linnarssonlab.org/, with actual size in bytes of the underlying files added
| Title | Description | Date | Cells | Size (Bytes) | Size |
|---|---|---|---|---|---|
| L5_All.loom | | 2018/04/13 09:33:55 | 160796 | 19129203189 | 17.8G |
| L6_Neurons.loom | | 2018/04/04 09:06:19 | 74539 | 6194902783 | 5.8G |
| L6_Cns_neurons.loom | | 2018/04/04 09:06:08 | 70968 | 5624800899 | 5.2G |
| L6_Glia.loom | | 2018/04/04 09:06:17 | 66656 | 3410506583 | 3.2G |
| L6_Cns_glia.loom | | 2018/04/04 09:06:12 | 52539 | 1881357834 | 1.8G |
| L6_Telencephalon_projecting_neurons.loom | | 2018/04/04 09:06:13 | 28858 | 929955780 | 887M |
| L1_Medulla.loom | | 2018/04/05 23:20:39 | 65179 | 480827497 | 459M |
| L1_Pons.loom | | 2018/04/05 23:20:27 | 62635 | 455401509 | 434M |
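The two size columns are related by powers of 1024. The sketch below shows one rounding rule that happens to reproduce the rows listed here (one decimal place below 100, whole numbers otherwise). The rule and the name `humanSize` are assumptions made for illustration, not a statement of how the original column was generated.

```scala
import java.util.Locale

object SizeSketch {
  private val units = Vector("B", "K", "M", "G", "T")

  // Converts a byte count to a short binary-unit string, e.g. 19129203189 to "17.8G"
  def humanSize(bytes: Long): String = {
    var value = bytes.toDouble
    var i = 0
    // Divide by 1024 until the value fits the current unit
    while (value >= 1024 && i < units.size - 1) {
      value /= 1024
      i += 1
    }
    // Assumed rule: one decimal below 100, none otherwise
    val number =
      if (value < 100) "%.1f".formatLocal(Locale.ROOT, value)
      else "%.0f".formatLocal(Locale.ROOT, value)
    number + units(i)
  }
}
```

For example, `SizeSketch.humanSize(929955780L)` divides twice by 1024 to get roughly 886.9 and formats it as `887M`.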
ryan-williams / fastavro_it.md
Created June 26, 2018 03:29
Benchmarks and info about WIP Beam fastavro integration test

Notes on the integration test, fastavro_it_test

Benchmarks

I set it to write 10MM (10 million) synthetic records with both fastavro and avro, read them back in (each side reading what it wrote), and then verify that the two read PCollections are equal (via a CoGroupByKey).
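The equality check can be pictured with plain in-memory collections. The sketch below is an analogy in Scala rather than Beam code: `coGroup` and `sameRecords` are hypothetical helpers that group both sides by key and then compare the values per key as multisets, which is roughly what a co-grouping step makes possible.

```scala
object CoGroupSketch {
  // Groups two keyed collections by key, pairing up the values from each side
  def coGroup[K, V](left: Seq[(K, V)], right: Seq[(K, V)]): Map[K, (Seq[V], Seq[V])] = {
    val l = left.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
    val r = right.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }
    (l.keySet ++ r.keySet).map { k =>
      (k, (l.getOrElse(k, Seq.empty), r.getOrElse(k, Seq.empty)))
    }.toMap
  }

  // True when every key carries the same values on both sides, ignoring order
  def sameRecords[K, V](left: Seq[(K, V)], right: Seq[(K, V)]): Boolean =
    coGroup(left, right).values.forall { case (ls, rs) =>
      ls.size == rs.size && ls.diff(rs).isEmpty
    }
}
```

A key that appears on only one side ends up paired with an empty sequence, so the size comparison fails and the mismatch is reported.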

"Write" pipeline: 10MM records

The fastavro side is 3.5x faster:

ryan-williams / spire-version-issue.md
Last active June 13, 2018 22:55
Symptom of a bug resulting from Spark depending on (old) org.spire-math::spire vs library code that uses (new) org.typelevel::spire

While doing some dependency upgrades I saw tests failing like:

[error] Uncaught exception when running BugTest: java.lang.IncompatibleClassChangeError: Class spire.math.IntIsNumeric does not implement the requested interface cats.kernel.Eq
[error] sbt.ForkMain$ForkError: java.lang.IncompatibleClassChangeError: Class spire.math.IntIsNumeric does not implement the requested interface cats.kernel.Eq

It turns out that Spark MLLib depends transitively on org.spire-math:spire_2.11:0.13.0 via breeze, and when I had an unrelated exclusion in my build.sbt:

ryan-williams / build.sc
Last active June 9, 2018 18:10
mill 0.2.2 dependency on cats-core 1.0.1 crashes console
import mill._, mill.scalalib._
object test1 extends ScalaModule {
  def scalaVersion = "2.11.12"
}
object test2 extends ScalaModule {
  def scalaVersion = "2.11.12"
  def ivyDeps = Agg(ivy"org.typelevel::cats-core:1.0.1")
}
object test3 extends ScalaModule {
  def scalaVersion = "2.12.6"
ryan-williams / Plugin.scala
Last active April 22, 2018 14:22
SBT plugin exposing module-name property which is the artifact-name as Ivy or Maven would see it: "foo_2.12"
import sbt.Keys._
import sbt._
import sbt.PluginTrigger.AllRequirements
import sbt.librarymanagement.CrossVersion
// Put this in project/Plugin.scala to expose `modName` setting which is your project's name and scala-binary-version (e.g. "foo_2.12").
object Plugin
  extends AutoPlugin {
  override def trigger = AllRequirements
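The plugin preview ends before `modName` is defined. The naming convention it refers to can be sketched in plain Scala without sbt. `binaryVersion` and `modName` below are hypothetical helpers, and the sketch assumes a Scala 2.x release version such as "2.12.6"; sbt's own handling of pre-release and Scala 3 versions is more involved and is not reproduced here.

```scala
object ModNameSketch {
  // Assumption: for Scala 2.x releases the binary version is "major.minor"
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // Joins project name and binary version the way a cross-built artifact is named
  def modName(name: String, scalaVersion: String): String =
    s"${name}_${binaryVersion(scalaVersion)}"
}
```

For a project named `foo` built against Scala 2.12.6, `ModNameSketch.modName("foo", "2.12.6")` gives `foo_2.12`.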