@tanishiking
Last active August 5, 2024 11:16

Wasm-Native String

I attempted to implement a String representation in our Wasm backend using an i16 array with WasmGC. However, we cannot simply switch the String representation from a JS String to an i16 array, as we still rely on JS Strings for JS interoperation. The plan is to allow these two String representations to coexist, using i16 arrays wherever possible while retaining JS Strings where necessary. This way, we should be able to keep the test suites passing, start using i16 arrays, and gradually remove JS interoperation.

To achieve this, we need to convert between JS Strings and i16 array Strings. The main ideas are to:

  • convert an i16 array String to a JS String when upcasting a String to Any,
  • convert a JS String to an i16 array when downcasting from Any to String,
  • convert between JS Strings and i16 arrays when passing Strings between non-JS and JS classes (because we keep using JS Strings inside JS classes), and
  • apply the same conversion for a Select on a JS class from a non-JS class, and vice versa.
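The conversion rules above can be modelled in plain Scala. This is only a sketch under stated assumptions: `JSStr` is a hypothetical stand-in for a JS string held as an `anyref`, `Array[Char]` stands in for the WasmGC i16 array, and the helper bodies are illustrative rather than the actual Wasm helpers (only their names are borrowed from the prototype).

```scala
object StringReprModel {
  // Hypothetical stand-in for a JS string held as `anyref`.
  final case class JSStr(value: String)

  // Stand-in for the WasmGC i16 array: one UTF-16 code unit per element.
  type WasmStr = Array[Char]

  // Names borrowed from the prototype; the bodies are assumptions.
  def createJSStringFromArray(arr: WasmStr): JSStr = JSStr(new String(arr))
  def createArrayFromJSString(s: JSStr): WasmStr = s.value.toCharArray

  // Upcast String -> Any: leave the i16 array world, hand out a JS string.
  def upcastToAny(s: WasmStr): Any = createJSStringFromArray(s)

  // Downcast Any -> String: accept either representation.
  def downcastToString(x: Any): WasmStr = x match {
    case arr: Array[Char] => arr // already an i16 array
    case js: JSStr        => createArrayFromJSString(js)
    case other            => throw new ClassCastException(s"$other is not a String")
  }
}
```

In this model, an upcast followed by a downcast yields an array with the same code units, which is the property the coexistence scheme relies on.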

I tried to implement this approach, and most test suites passed, but I couldn't get all of them to pass.

Here's what I did and what isn't working:

complete diff: https://github.com/tanishiking/scala-js/pull/2/files/d8602d851243c243ee4fb68fbe64708c3521ce49..3c27517045a277fcf89e50cd4645af0ec05b2048


Change in TypeTransformer

First of all, we have two different TypeTransformers:

  • JSTypeTransformer, which uses JSString ((null)? any) for Strings.
  • WasmTypeTransformer, which uses (ref (null)? (array (mut i16))) for Strings.
// TypeTransformer.scala
object JSTypeTransformer extends TypeTransformer {
  override val useWasmString: Boolean = false
  override val stringType: Types.Type = watpe.RefType.any
  override val boxedStringType: Types.Type = watpe.RefType.anyref
}

object WasmTypeTransformer extends TypeTransformer {
  override val useWasmString: Boolean = true
  override val stringType: Types.Type = watpe.RefType(genTypeID.i16Array)
  override val boxedStringType: Types.Type = watpe.RefType.nullable(genTypeID.i16Array)
}

We use JSTypeTransformer in JSClasses, and WasmTypeTransformer for others.

// ClassEmitter.scala
val typeTransformer =
  if (clazz.kind.isJSType) TypeTransformer.JSTypeTransformer
  else TypeTransformer.WasmTypeTransformer

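The transformType method maps StringType and ClassType(BoxedStringClass) to the transformer's stringType and boxedStringType: any and anyref in JSTypeTransformer, and the non-nullable and nullable i16 array reference types in WasmTypeTransformer.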

def transformType(tpe: Type)(implicit ctx: WasmContext): watpe.Type = {
  tpe match {
    case AnyType => watpe.RefType.anyref
    case ClassType(className) if className == BoxedStringClass => boxedStringType
    case ClassType(className) => transformClassType(className)
    case StringType => stringType
    case UndefType => watpe.RefType.any
    // ...

Convert when passing String between JS and non-JS Class

When passing a String from a non-JS class to a JS class, we need to convert the i16 array String into a JS String, and vice versa. We also need to convert a returned String back into the caller's representation, for example from a JS String to an i16 array String when a JS class returns a String to a non-JS class.

I defined the methods genAdaptArgString and genAdaptResultString. These methods generate the necessary transformations between JS Strings and i16 arrays. We generate these conversions on the caller side.

  private def genAdaptArgString(
    paramType: Type,
    callerUsesWasmStr: Boolean,
    calleeUsesWasmStr: Boolean,
    argNullable: Boolean = true // set to false when the argument is known to be non-null
  ): Unit = {
    if (paramType == StringType || paramType == ClassType(BoxedStringClass)) {
      val nullable = (paramType == ClassType(BoxedStringClass) && argNullable)

      // if caller uses Wasm string (i16 array), and callee doesn't,
      // transform i16 array to JS string
      if (callerUsesWasmStr && !calleeUsesWasmStr) {
        if (nullable) fb += wa.Call(genFunctionID.createJSStringFromArrayNullable)
        else fb += wa.Call(genFunctionID.createJSStringFromArray)
      // if caller doesn't use Wasm string (i16 array), and callee does,
      // transform JS string into i16 array
      } else if (!callerUsesWasmStr && calleeUsesWasmStr) {
        if (nullable) fb += wa.Call(genFunctionID.createArrayFromJSStringNullable)
        else fb += wa.Call(genFunctionID.createArrayFromJSString)
      }
    }
  }

// genAdaptResultString is almost the same; it performs the same conversions in the opposite direction
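To make the decision logic easier to check in isolation, here is a minimal pure-Scala model of it. It returns a description of the conversion instead of emitting Wasm instructions. `ParamKind`, `Conversion`, `adaptArg`, and `adaptResult` are hypothetical names introduced for this sketch, and `adaptResult` reflects one reading of "the opposite direction": the value flows from callee to caller, so the two flags swap roles.

```scala
object AdaptModel {
  sealed trait ParamKind
  case object PrimitiveString extends ParamKind // models StringType
  case object BoxedString extends ParamKind     // models ClassType(BoxedStringClass)
  case object Other extends ParamKind           // any non-string type

  sealed trait Conversion
  case object NoConversion extends Conversion
  final case class ArrayToJSString(nullable: Boolean) extends Conversion
  final case class JSStringToArray(nullable: Boolean) extends Conversion

  // Mirrors the branching of genAdaptArgString.
  def adaptArg(
      param: ParamKind,
      callerUsesWasmStr: Boolean,
      calleeUsesWasmStr: Boolean,
      argNullable: Boolean = true
  ): Conversion = {
    if (param == Other) NoConversion
    else {
      val nullable = param == BoxedString && argNullable
      if (callerUsesWasmStr && !calleeUsesWasmStr) ArrayToJSString(nullable)
      else if (!callerUsesWasmStr && calleeUsesWasmStr) JSStringToArray(nullable)
      else NoConversion
    }
  }

  // Assumption: a result travels from callee to caller, so swap the flags.
  def adaptResult(
      result: ParamKind,
      callerUsesWasmStr: Boolean,
      calleeUsesWasmStr: Boolean
  ): Conversion =
    adaptArg(result, callerUsesWasmStr = calleeUsesWasmStr, calleeUsesWasmStr = callerUsesWasmStr)
}
```

When both sides use the same representation, the model yields `NoConversion`, which corresponds to emitting nothing.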

For example, genArgs and genReceiverNotNull use it as follows:

private def genArgs(args: List[Tree], methodName: MethodName, receiverClassKind: ClassKind)(
    implicit typeTransformer: TypeTransformer): Unit = {
  for ((arg, paramTypeRef) <- args.zip(methodName.paramTypeRefs)) {
    val paramType = ctx.inferTypeFromTypeRef(paramTypeRef)
    genTree(arg, paramType)
    genAdaptArgString(
      paramType,
      // the caller uses i16 array strings iff `typeTransformer.useWasmString` is true
      callerUsesWasmStr = typeTransformer.useWasmString,
      // the callee uses i16 array strings iff the receiver's class kind is NOT a JS type
      calleeUsesWasmStr = !receiverClassKind.isJSType
    )
  }
}

def genReceiverNotNull(): Unit = {
  genTreeAuto(receiver)
  fb += wa.RefAsNonNull
  genAdaptArgString(
    receiver.tpe,
    typeTransformer.useWasmString,
    !receiverClassInfo.kind.isJSType,
    argNullable = false
  )
}

We execute genAdaptResultString after Call (or CallRef).

Convert to JSString on upcast String to Any

We convert the i16 array string to a JavaScript string when upcasting a String to AnyType, only when the surrounding class is not a JavaScript class. Inside a JavaScript class, the string is already a JavaScript string, so upcasting is a no-op.

I added the conversion when the generatedType is CharSequence. In this case, the stack should contain either an i16 array or an instance of CharSequence. For the former, a conversion is necessary; for the latter, no conversion is needed, as I assume the underlying content will be handled elsewhere (I'm not sure about this).

// genAdapt
case (ClassType(CharSequenceClass), AnyType) if typeTransformer.useWasmString =>
  // should be either an instance of `CharSequence` or `i16Array`
  // if it's i16Array -> convert to JS string
  // if it's an instance of `CharSequence` -> no-op
  val receiver = addSyntheticLocal(watpe.RefType.anyref)
  fb += wa.LocalSet(receiver)
  fb.block(watpe.RefType.anyref) { labelDone =>
    fb.block(watpe.RefType.anyref) { labelNotOurObject =>
      fb += wa.LocalGet(receiver)
      fb += wa.BrOnCastFail(
        labelNotOurObject,
        watpe.RefType.anyref,
        watpe.RefType(genTypeID.ObjectStruct)
      )
      fb += wa.Br(labelDone)
    } // end of labelNotOurObject
    // otherwise, it should be i16Array
    fb += wa.RefCast(watpe.RefType.nullable(genTypeID.i16Array))
    fb += wa.Call(genFunctionID.createJSStringFromArrayNullable)
  }

Convert to i16Array on downcast Any to String

In genAsInstanceOf:

val isDownCastAnyToString: Boolean =
  (sourceWasmType, targetWasmType) match {
    case (watpe.RefType(_, sourceHeapType),
          watpe.RefType(_, targetHeapType))
        if sourceHeapType == watpe.HeapType.Any &&
           targetHeapType == watpe.HeapType(genTypeID.i16Array) =>
      true
    case _ => false
  }

// ...

} else if (isDownCastAnyToString && typeTransformer.useWasmString) {
  fb.block(targetWasmType) { labelDone =>
    genTreeAuto(expr)
    // If the value is already an i16Array, skip the conversion
    fb += wa.BrOnCast(
      labelDone,
      watpe.RefType.anyref,
      watpe.RefType.nullable(genTypeID.i16Array)
    )
    // Otherwise it is a JS string: convert it to an i16Array
    fb += wa.Call(genFunctionID.createArrayFromJSStringNullable)
  }
  fb += wa.RefCast(watpe.RefType.nullable(genTypeID.i16Array))

Also in genUnbox, if the surrounding class uses i16 array Strings (though I'm not sure this is right):

case StringType =>
  fb += wa.RefAsNonNull
  fb += wa.Call(genFunctionID.jsValueToString) // for `undefined`
  if (typeTransformer.useWasmString) {
    fb += wa.Call(genFunctionID.createArrayFromJSString)
  }

and in ArraySelect

/* If it is a reference array type whose element type does not translate
 * to `anyref`, we must cast down the result.
 */
// ...        
case refType @ watpe.RefType(nullable, heapType) if
    typeTransformer.useWasmString &&
    heapType == watpe.HeapType(genTypeID.i16Array) =>
  if (nullable) fb += wa.Call(genFunctionID.createArrayFromJSStringNullable)
  else fb += wa.Call(genFunctionID.createArrayFromJSString)
case refType: watpe.RefType =>
  fb += wa.RefCast(refType)
// ...

Change in genIsInstanceOf

It now also checks whether the value is an i16 array.

Change in genToStringForConcat

Added a block around notOurObject because, if the receiver is not an instance of j.l.Object, it can be either a JS value or an i16 array.

Other changes

length, charAt

// length
if (typeTransformer.useWasmString) {
  fb += wa.ArrayLen
} else {
  fb += wa.Call(genFunctionID.stringLength)
}

// charAt
if (typeTransformer.useWasmString) {
  fb += wa.ArrayGetU(genTypeID.i16Array)
} else {
  fb += wa.Call(genFunctionID.stringCharAt)
}

With i16 arrays, these are just array.len and array.get_u.
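The choice of array.get_u over array.get_s matters because UTF-16 code units above 0x7FFF must be zero-extended, not sign-extended, when widened to i32. The following sketch models that with a JVM `Array[Short]` as a stand-in for the packed i16 array; `arrayLen`, `arrayGetU`, and `arrayGetS` are hypothetical helper names that mimic the Wasm instructions.

```scala
object PackedArrayModel {
  // Models array.len: the number of code units.
  def arrayLen(arr: Array[Short]): Int = arr.length

  // Models array.get_u: zero-extend the 16-bit element to a 32-bit value.
  def arrayGetU(arr: Array[Short], i: Int): Int = arr(i) & 0xFFFF

  // Models array.get_s: sign-extend instead (wrong for code units >= 0x8000).
  def arrayGetS(arr: Array[Short], i: Int): Int = arr(i).toInt
}
```

For a code unit such as 0xFFFF, the unsigned read gives 65535 while the signed read gives -1, so only the unsigned read matches what charAt should return.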

Equality

// genEq
if (typeTransformer.useWasmString) {
  fb += wa.Call(genFunctionID.equals)
} else {
  fb += wa.Call(genFunctionID.is)
}

I defined an equals function that takes two anyref values and:

  • if both are i16 arrays, checks the deep equality of the two arrays,
  • if neither is an i16 array, calls is,
  • otherwise, returns false.
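The three cases can be sketched in plain Scala as follows. `Array[Char]` stands in for the i16 array, and `is` is a simplified stand-in for the existing `is` helper (here it just delegates to `==`, which is an assumption and does not reproduce every edge case of the real helper). `stringAwareEquals` is a hypothetical name.

```scala
object StringEqualsModel {
  // Simplified stand-in for the existing `is` helper (assumption).
  def is(a: Any, b: Any): Boolean = a == b

  def stringAwareEquals(a: Any, b: Any): Boolean = (a, b) match {
    // Both are i16 arrays: compare element by element.
    case (x: Array[Char], y: Array[Char]) => java.util.Arrays.equals(x, y)
    // Exactly one is an i16 array: never equal.
    case (_: Array[Char], _) | (_, _: Array[Char]) => false
    // Neither is an i16 array: fall back to `is`.
    case _ => is(a, b)
  }
}
```

Note the consequence of the last bullet: an i16 array and a JS string with the same contents compare as unequal, so both operands need to be in the same representation before the comparison.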

concat

if (typeTransformer.useWasmString) fb += wa.Call(genFunctionID.wasmStringConcat)
else fb += wa.Call(genFunctionID.stringConcat)

The result of jsValueToString is converted to an i16 array.

toString

if (typeTransformer.useWasmString && receiverClassName == CharSequenceClass) {
  // no string conversion needed, only a cast (???)
  fb += wa.RefCast(watpe.RefType(genTypeID.i16Array))
} else {
  fb += wa.Call(genFunctionID.jsValueToString)
  if (typeTransformer.useWasmString) fb += wa.Call(genFunctionID.createArrayFromJSString)
}

If the receiver class is CharSequence and the runtime value is not one of our Objects, it should be an i16Array, so we just ref.cast it.

genLiteral

case StringLiteral(v) =>
  fb ++= ctx.stringPool.getConstantStringInstr(v)
  if (typeTransformer.useWasmString) fb += wa.Call(genFunctionID.createArrayFromJSString)

(This is wasteful: getConstantStringInstr creates an i16Array and converts it to a JS string, and createArrayFromJSString then converts it back to an array. I don't care about this in this prototype.)


What doesn't work

NestedJSClass

class ScalaClassContainerWithObject(xxx: String) {
  object InnerJSObject extends js.Object with TestInterface {
    val zzz: String = xxx + "zzz"

    def foo(a: String): String = xxx + "zzz" + a
  }

  def makeLocalJSObject(yyy: String): TestInterface = {
    object LocalJSObject extends js.Object with TestInterface {
      val zzz: String = xxx + yyy

      def foo(a: String): String = xxx + yyy + a
    }

    LocalJSObject
  }
}

In def makeLocalJSObject(yyy: String): TestInterface, yyy will be an i16 array because it is a parameter of a method of ScalaClassContainerWithObject, which is a non-JS class. On the other hand, when yyy is referenced from foo in LocalJSObject, it is expected to be a JS string. We need to find a way to convert between the i16 array and a JS string when accessing captured values.
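One possible direction, shown here only as a plain-Scala model and not as something the prototype does, is to convert captured Strings once at the point where the JS object is created, so that code inside the JS class only ever sees JS strings. `JSStr` is a hypothetical stand-in for a JS string, `Array[Char]` stands in for the i16 array, and the class shape is simplified.

```scala
object CaptureModel {
  // Hypothetical stand-in for a JS string.
  final case class JSStr(value: String)

  // JS-class side: captures and parameters are all JS strings.
  final class LocalJSObject(xxx: JSStr, yyy: JSStr) {
    def foo(a: JSStr): JSStr = JSStr(xxx.value + yyy.value + a.value)
  }

  // Non-JS side: xxx and yyy are i16 arrays, converted when they are captured.
  def makeLocalJSObject(xxx: Array[Char], yyy: Array[Char]): LocalJSObject =
    new LocalJSObject(JSStr(new String(xxx)), JSStr(new String(yyy)))
}
```

Whether the conversion belongs at capture time or at each access is an open design question; this sketch only illustrates the former.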

split

I'm not sure what the root cause is, but the following check fails:

val a1 = Array[String]("a", "s", "d", "f")
val a2 = "asdf".split("")
java.util.Arrays.deepEquals(a1, a2) // -> false (should be true)

It seems that the elements of a1 become JS strings while the elements of a2 are i16 arrays. Maybe we should somehow suppress the conversion to JS strings when constructing the Array?
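The suspected mismatch can be reproduced on the JVM with a small model, assuming the diagnosis above is right. `JSStr` is a hypothetical stand-in for a JS string element and `Array[Char]` stands in for an i16 array element; `normalize` is an illustrative helper, not part of the prototype.

```scala
object SplitMismatchModel {
  // Hypothetical stand-in for a JS string stored as an array element.
  final case class JSStr(value: String)

  // Models a1: elements ended up as JS strings.
  val a1: Array[AnyRef] = Array(JSStr("a"), JSStr("s"), JSStr("d"), JSStr("f"))

  // Models a2: elements are i16 arrays.
  val a2: Array[AnyRef] =
    Array("a".toCharArray, "s".toCharArray, "d".toCharArray, "f".toCharArray)

  private def toWasm(x: AnyRef): AnyRef = x match {
    case JSStr(v) => v.toCharArray
    case other    => other
  }

  // Bring every element into the i16 array representation.
  def normalize(xs: Array[AnyRef]): Array[AnyRef] = xs.map(toWasm)

  def mixedEqual: Boolean = java.util.Arrays.deepEquals(a1, a2)
  def normalizedEqual: Boolean = java.util.Arrays.deepEquals(normalize(a1), normalize(a2))
}
```

In this model the mixed comparison is false and the normalized one is true, which is consistent with the idea that array elements need a single representation.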