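This article aims to show readers why type-safety matters. The concepts are not tied to any particular language, although the examples are written in Scala.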

A high-level definition
Motivation
Returning the "Current" Type
- The problem
- The solution
Finally, I hope readers who did not see why type-safety matters will get this far and admit: it really is nice!

We first answer two basic questions:

What is type-safety?
Why do we need type-safety?

Then, using the concrete problem below, we briefly introduce one way to achieve type-safety:

I have a type hierarchy … how do I declare a super-type method(polymorphic method) that returns the “current” type?

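Type theory is a deep rabbit hole, and both the definition of type-safety and the ways to achieve it are fairly complex, so many details are omitted here.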

A high-level definition

A static type is something you know about a value in our program at compile time.

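Note the emphasis on static type.

Every program uses memory, so every programming language has its own type system. Take Java and Scala: besides static types they also have reified types (static types that survive into runtime). Every value is associated with a class or interface at runtime, so we can do type-casting and type-checking at runtime. Dynamic languages such as Python and JavaScript have no static types, only dynamic types: all type information exists only at runtime.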
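A minimal sketch of the difference between a static type and a reified type in Scala. The object name `StaticVsRuntimeType` and the helper `describe` are hypothetical, introduced only for illustration.

```scala
object StaticVsRuntimeType {
  // Static type: Any. This is all the compiler knows about `value`.
  val value: Any = "frog"

  // Reified type: at runtime the value still carries its class (String),
  // so a runtime check can recover it. `describe` is a hypothetical helper.
  def describe(x: Any): String = x match {
    case s: String => s"String of length ${s.length}"
    case i: Int    => s"Int $i"
    case _         => "something else"
  }

  // value.length would not compile: `length` is not a member of Any.
}
```

The compiler rejects `value.length` because it only sees `Any`, while `describe(value)` can still find the `String` at runtime.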

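What we usually mean by "static language" and "dynamic language" is muddled. We need to separate two concepts:

Whether the types themselves are static or dynamic;
Whether type-checking happens at compile-time or at runtime.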

Type-safety is making use of what we know of our values at compile-time to minimize the consequences of most mistakes.

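Safety here refers only to static types and static type-checking.

The definition of type-safety differs by context. How safe is type-safety, really? It depends mainly on two variables:

Its upper and lower bounds (the range of safety levels) are set by the type system. As noted above, every language has a type system, so every language has its own definition of safety;
Which level we pick within that range is up to us programmers: how far do we want to reduce "the consequences of most mistakes"? The rules, styles, best practices, design patterns, and frameworks we follow while coding determine the actual safety level.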

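A rough example using Scala:

Scala's type system provides classes, traits, type inference, type bounds, context bounds, higher-kinded types, implicits, and more. These give Scala a fairly high floor for type-safety and a very high ceiling.
Following the Scalazzi principles while coding gives a very high level of type-safety: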
- ~~null~~
- ~~exceptions~~
- ~~type-casing (isInstanceOf)~~
- ~~type-casting (asInstanceOf)~~
- ~~side-effects~~
- ~~object.(equals/toString/hashCode)~~
- ~~object.(notify/wait)~~
- ~~classOf/object.getClass~~
- General recursion
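As a small illustration of the first two items in the list (null and exceptions), here is a sketch that returns absence and failure as ordinary values instead. The names `SafeSubsetSketch`, `findAge`, and `parseAge` are hypothetical and not part of Scalazzi itself.

```scala
object SafeSubsetSketch {
  // Instead of returning null when a key is missing, return an Option.
  // (Hypothetical helper, for illustration only.)
  def findAge(ages: Map[String, Int], name: String): Option[Int] =
    ages.get(name)

  // Instead of throwing on bad input, return the failure as a value.
  // (Hypothetical helper, for illustration only.)
  def parseAge(raw: String): Either[String, Int] =
    raw.toIntOption match {
      case Some(n) if n >= 0 => Right(n)
      case Some(n)           => Left(s"negative age: $n")
      case None              => Left(s"not a number: $raw")
    }
}
```

Because the failure case is part of the return type, a caller cannot reach the `Int` without deciding what to do about `None` or `Left`.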

Motivation

Very often (though not always) we want to push the limits of type-safety as hard as possible. Let's start from the beginning and explain why.

Complexity is the enemy of "scalability"

Let's first look, from a functional programming perspective, at how complex programs get written.

A function that converts an Int to a String:

f: Int -> String

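The Int type is the function's domain and the String type is its codomain. The definition of f guarantees that every i in the Int domain corresponds to an s in the String codomain: a meaningful mapping.

The basic way programs scale is function composition. Composing two functions:

Suppose there is also g: String -> Boolean

Then we necessarily get a new function h: Int -> Boolean, where h(x) = g(f(x))

Everything is a function: everything at the semantic level of a program can be composed. A complex program is many functions composed in some order. But composition cannot be arbitrary; the mapping between the new domain and codomain must be meaningful. In other words, the codomain of f must lie within the domain of g.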
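The composition of f and g can be sketched in Scala as follows. The bodies of `f` and `g` are arbitrary choices made for illustration; only their types come from the text.

```scala
object CompositionSketch {
  // f: Int -> String (the body is an arbitrary example)
  val f: Int => String = i => i.toString
  // g: String -> Boolean (the body is an arbitrary example)
  val g: String => Boolean = s => s.length > 1

  // h: Int -> Boolean, built by composition. It type-checks because
  // the codomain of f (String) matches the domain of g (String).
  val h: Int => Boolean = f.andThen(g)

  // g.andThen(f) would be rejected at compile time:
  // the codomain of g (Boolean) is not the domain of f (Int).
}
```

Here `h(x)` evaluates to the same result as `g(f(x))` for every `x`.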

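As a program grows more complex, the number of composition relationships among its functions grows exponentially. Manually checking that every composition is meaningful becomes impossible. We also want production to be as stable and error-free as possible, so the process of "guaranteeing that compositions are meaningful" should ideally finish at compile time. For these two reasons, we need a smart, automatic, static tool.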

Static type-checking is our savior

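That tool is the type system.

A type system can be compared to a protocol in a distributed system: it is the protocol that function composition follows. The compiler's type-checking is the protocol's driver: we use the type system to design a protocol, and the compiler automatically checks at compile time whether the program conforms to it.

Side note: a type system has an awesome by-product: static code analysis. IDEs use static code analysis to give us linting, auto-completion, refactoring, and so on, which greatly boosts productivity.

Take Scala and Java: just following the basic syntax and designing a sensible class diagram already achieves a certain degree of type-safety. Nice! 👏

But is it really that nice? Not quite!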

Be more specific(stronger) in type systems

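There is a gap between the business logic we actually want and the semantics of the program we write by hand. Of course we want the two to be identical, but reality is harsh, and manual checking usually takes a lot of effort. This is where the type system comes in. Could a feature-rich type system let us write some of the critical low-level business logic into types, so that the compiler checks its correctness statically? In other words, we want the type system to be more specific (when we say "strong" typing, specific is what we mean).

The following example shows how to make a type more specific step by step: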

// Future is a Type used for asynchronous operations.
// Subtypes for Future a. Success b. Failure 
// Either is a Type used for capturing Errors or Valid results.
// Subtypes for Either a. Left b. Right (Left for Error, Right for (valid) Result)
// ErrorType is a custom type
// NonEmptyList is a Type used to prevent creation of an Empty List
1. Future[Result]
2. Future[Either[String, Result]]
3. Future[Either[ErrorType, Result]]
4. Future[Either[NonEmptyList[ErrorType], Result]]
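To make step 4 of the list above concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the error cases `EmptyName` and `NegativeAge`, the shape of `Result`, the `validate` function, and the hand-rolled `NonEmptyList` (libraries such as cats ship a fuller one).

```scala
import scala.concurrent.Future

object SpecificTypesSketch {
  // Hypothetical domain types, not from the original text.
  sealed trait ErrorType
  case object EmptyName extends ErrorType
  case object NegativeAge extends ErrorType

  final case class Result(name: String, age: Int)

  // Minimal stand-in for a non-empty list: the head is mandatory,
  // so an empty list of errors cannot be constructed.
  final case class NonEmptyList[A](head: A, tail: List[A])

  // A failure always carries at least one typed error.
  def validate(name: String, age: Int): Future[Either[NonEmptyList[ErrorType], Result]] = {
    val errors: List[ErrorType] =
      (if (name.isEmpty) List(EmptyName) else Nil) ++
        (if (age < 0) List(NegativeAge) else Nil)
    val outcome: Either[NonEmptyList[ErrorType], Result] = errors match {
      case head :: tail => Left(NonEmptyList(head, tail))
      case Nil          => Right(Result(name, age))
    }
    Future.successful(outcome)
  }
}
```

With `Future[Result]` alone, a caller learns nothing about how the call can fail. With this signature, the compiler forces the caller to handle a closed set of errors, and a `Left` with no errors in it is unrepresentable.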

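Second, we want to follow the DRY principle and write highly abstract, generic code. We need a powerful type system that balances type-safety and genericity, even though the two are often at odds. (This is a complex topic; to dig deeper, search for the keywords "category theory", "shapeless", and "cats scala".)

In addition, programming languages often let us bypass static type-checking to do potentially dangerous things, such as reflection, type casting, throwing exceptions, and using null. They feel great in the moment, but we are really digging pits for ourselves (and for our innocent colleagues).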
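A small sketch of one such escape hatch, type casting. The names `EscapeHatchSketch`, `castAttempt`, and `checked` are hypothetical; `Try` is used here only to capture the runtime failure so that it can be inspected.

```scala
import scala.util.Try

object EscapeHatchSketch {
  val value: Any = "not a number"

  // The cast compiles, because asInstanceOf tells the compiler to stop checking.
  // The mistake only surfaces at runtime, as a ClassCastException.
  val castAttempt: Try[Int] = Try(value.asInstanceOf[Int])

  // A checked alternative: the "not an Int" case becomes part of the type.
  val checked: Option[Int] = value match {
    case i: Int => Some(i)
    case _      => None
  }
}
```

The cast and the match read the same value, but only the second forces the caller to deal with the case where it is not an `Int`.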

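These three points directly drive up the cost of achieving strong type-safety. Making the type system more specific requires extra thought, demands more skill from programmers, and means considerably more code.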

Facing the trade-offs

Precise function signatures make it harder to fuck up. Both the function implementer and the function caller benefit from this information, but coming up with the right type involves deeper thought than just picking the first thing that comes to mind.

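So the guiding principle is: the more complex the project (code size, number of modules, dependencies), the more colleagues collaborate on it, and the more frequently it iterates, the higher the level of type-safety we should follow. Pursuing very strong type-safety on a simple project is a waste of time, whereas on a complex project, paying the extra cost for stronger type-safety pays off in engineering quality and efficiency in the end.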

To sum up

Type-safety is a key principle for coping with ever-growing program complexity.

Returning the "Current" Type

The problem

Here is a pet type (or interface, trait, whatever):

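import scala.concurrent.duration._ // Duration, plus the .days / .second syntax used below

trait Pet {
  def name: String
  def age: Duration
}

// The field names and types must match the trait's members for Frog to implement Pet.
case class Frog(name: String, age: Duration) extends Pet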

We want to define an abstract prolongLife(d: Duration) method. As programmers used to Java style, we might hack out the following lines:

trait Pet {
  def name: String
  def age: Duration
  def prolongLife(d: Duration): Pet
}

case class Frog(name: String, age: Duration) extends Pet {
  override def prolongLife(d: Duration): Frog = Frog(name, age + d)
}

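If we call prolongLife on a frog instance:

val frog = Frog("mogician", (92 * 365).days)
frog.prolongLife(1.second) // static type: Frog
val pet: Pet = Frog("mofashi", (92 * 365).days)
pet.prolongLife(1.second) // static type: Pet, even though the value is a Frog

The first call is fine, but the second is what we usually see in real code. pet.prolongLife returns the type Pet, not the "current" type Frog. To get a Frog, we must check the type at runtime: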

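// Java-style type-casting
if (pet.isInstanceOf[Frog]) pet.asInstanceOf[Frog]
else throw new ClassCastException("boom!")

// Scala-style pattern match. Making the trait sealed should increase type-safety,
// but the downcast is still not checked statically by the compiler.
sealed trait Pet { /* same members as above */ }
pet match {
  case frog: Frog => frog
  case _ => throw new ClassCastException("boom!")
}

Code like this bypasses static type-checking. First, the compiler will not stop obviously wrong logic like this: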

case class Cat(name: String, age: Duration) extends Pet {
  override def prolongLife(d: Duration): Frog = Frog(name, age + d)
}

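Second, although simple type-casting logic like this is not hard to get right at first, as the program goes through iteration after iteration and the code grows more complex, these type casts become the bugs most likely to be overlooked, and they blow the program up.

The seasoned among you may have already noticed that this problem can be solved with parametric polymorphism (generics, in Java). So we hack out the following code: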

trait Prolongable[P] {
  def prolongLife(d: Duration): P
}
case class Frog(name: String, age: Duration) extends Prolongable[Frog] {
  override def prolongLife(d: Duration): Frog = Frog(name, age + d)
}

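This does solve the "current" type problem, but it introduces another: it gets in the way of writing high-level generic code. Consider the following example:

// Changing a single p: this generic method looks fine at first glance
def prolong[P <: Prolongable[P]](d: Duration, p: P): P = p.prolongLife(d)
// Raise the level of genericity to a list of Ps, and the problem becomes apparent
def prolongAll[P <: Prolongable[P]](d: Duration, pList: List[P]): List[P] = ???

If we pass in an empty list, we cannot get hold of a Prolongable instance, so we cannot call Prolongable.prolongLife either. Of course we could special-case this inside prolongAll, but that breaks the abstraction: if what comes in is not a list but a map, set, tree, and so on, each needs its own special case, and the method is no longer generic at all.

The essential problem is that the code we want to be generic and the code we do not want to be generic live in the same type. We want to be generic only over Prolongable, not over P: Prolongable is what is common, while P is what varies. If P inherits from Prolongable, we can only operate on Prolongable and P as a bundle.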

The solution

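In Scala and Haskell, the best solution (balancing "strong" type-safety and polymorphism) is to use a typeclass. For an introduction to typeclasses, see the separate introductory post. Using Scala as the example: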

// ADTs
sealed trait Pet {
  def name: String
  def age: Duration
}
case class Frog(name: String, age: Duration) extends Pet
// typeclass
trait Prolongable[P] {
  def prolongLife(pet: P, d: Duration): P
}
object Prolongable {
  implicit val frogPet: Prolongable[Frog] = new Prolongable[Frog] {
    override def prolongLife(pet: Frog, d: Duration): Frog = pet.copy(age = pet.age + d)
  } 
}
// caller
def worship[P: Prolongable](pet: P, d: Duration): P = implicitly[Prolongable[P]].prolongLife(pet, d)

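We call it like this:

val frog = Frog("mogician", (92 * 365).days)
worship(frog, 1.second) // static type: Frog
val pet: Pet = Frog("mofashi", (92 * 365).days)
worship(pet, 1.second) // does not compile: there is no Prolongable[Pet] instance

This way the returned type can never be Pet; it can only be the "current" type. Looks great, finally!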
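Returning to the earlier prolongAll concern, here is a sketch of how a typeclass-based version might look. It redefines Frog and Prolongable so that the block stands alone. The body of `prolongAll`, the wrapper object `TypeclassGenericSketch`, and the instance name `frogProlongable` are assumptions rather than the author's code.

```scala
import scala.concurrent.duration._

object TypeclassGenericSketch {
  final case class Frog(name: String, age: Duration)

  // The typeclass: behaviour lives outside the data type.
  trait Prolongable[P] {
    def prolongLife(pet: P, d: Duration): P
  }

  object Prolongable {
    // Instance for Frog (hypothetical name), found through the companion's implicit scope.
    implicit val frogProlongable: Prolongable[Frog] = new Prolongable[Frog] {
      override def prolongLife(pet: Frog, d: Duration): Frog = pet.copy(age = pet.age + d)
    }
  }

  // Works for any P with a Prolongable instance, including an empty list:
  // the instance comes from implicit scope, not from an element of the list.
  def prolongAll[P](d: Duration, pets: List[P])(implicit ev: Prolongable[P]): List[P] =
    pets.map(pet => ev.prolongLife(pet, d))
}
```

Since the instance no longer has to be extracted from a value of type P, the same pattern should carry over to other containers without special cases.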

Finally, I hope readers who did not see why type-safety matters will get this far and admit: it really is nice!

SabaPing/type-safety-101.md