jhugman/Code generation pattern proposal.md

Created July 28, 2021 18:55


Tool specific Intermediate Representation

The Intermediate Representation (IR) resolves to a tree of Descriptors, e.g.:

EnumDescriptor, which has EnumVariantDescriptors, which may have FieldDescriptors.
RecordDescriptor, which has FieldDescriptors.
ObjectDescriptor, which has MethodDescriptors, which may have ArgDescriptors.

These represent the concrete types and syntactic structures within those types.

struct EnumDescriptor {
	name: String,
	variants: Vec<EnumVariantDescriptor>,
}

Some of these descriptors will point to other concrete types, e.g.:

struct FieldDescriptor {
	field_name: String,
	type_: TypeIdentifier,
	default: Option<Value>,
}

These descriptors are shared between all backends and serialize to the IR.
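To make the tree shape concrete, here is a minimal sketch of how the descriptors might nest. The variants of `TypeIdentifier`, the `Value` enum, and the `referenced_types` and `example_shape` helpers are assumptions made for illustration; the proposal leaves their exact shape open.

```rust
/// Hypothetical serializable label for a type (assumption: an enum).
#[derive(Debug, Clone, PartialEq)]
pub enum TypeIdentifier {
    Int32,
    String,
    Enum(String),
}

/// Hypothetical literal value, used here only for field defaults.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
}

#[derive(Debug, Clone)]
pub struct FieldDescriptor {
    pub field_name: String,
    pub type_: TypeIdentifier,
    pub default: Option<Value>,
}

#[derive(Debug, Clone)]
pub struct EnumVariantDescriptor {
    pub name: String,
    pub fields: Vec<FieldDescriptor>,
}

#[derive(Debug, Clone)]
pub struct EnumDescriptor {
    pub name: String,
    pub variants: Vec<EnumVariantDescriptor>,
}

impl EnumDescriptor {
    /// Walks the tree and collects every TypeIdentifier a field points to.
    pub fn referenced_types(&self) -> Vec<&TypeIdentifier> {
        self.variants
            .iter()
            .flat_map(|v| v.fields.iter())
            .map(|f| &f.type_)
            .collect()
    }
}

/// Builds a small tree: an enum `Shape` with a field-less variant `Point`
/// and a variant `Circle` carrying one field, `radius: Int32 = 1`.
pub fn example_shape() -> EnumDescriptor {
    EnumDescriptor {
        name: "Shape".to_string(),
        variants: vec![
            EnumVariantDescriptor { name: "Point".to_string(), fields: vec![] },
            EnumVariantDescriptor {
                name: "Circle".to_string(),
                fields: vec![FieldDescriptor {
                    field_name: "radius".to_string(),
                    type_: TypeIdentifier::Int32,
                    default: Some(Value::Int(1)),
                }],
            },
        ],
    }
}
```

Fields refer to other types only through a `TypeIdentifier` label, never through a direct reference, which is what would let the tree serialize cleanly.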

Backend specific type wraps type descriptors

For the descriptors that represent types (e.g. Object, Enum etc), there exists a struct that wraps the descriptor, and gives access to the sub-descriptor.

struct KotlinEnum {
	inner: EnumDescriptor,
}

impl KotlinEnum {
	// Borrow the sub-descriptors from the wrapped descriptor.
	fn variants(&self) -> &[EnumVariantDescriptor] {
		&self.inner.variants
	}
}

These structs implement the trait CodeType.

CodeType is a trait that emits foreign language code for specific tasks, mostly identifiers, and expressions (e.g. function calls into its own machinery).

Precisely what is needed depends on which tool this generator is part of, e.g. uniffi needs lift and lower machinery.

It also knows how to generate all the code for its inner Descriptor with the fn render_declaration(&self).

impl CodeType for KotlinEnum {
	fn name(&self) -> String {
		self.inner.name().to_camel_case()
	}

	fn internals(&self) -> String {
		format!("Uniffi{}Internals", self.name())
	}

	fn literal(&self, v: Value) -> String {
		…
	}
	fn lower_into(&self, value: String, buffer: String) -> String {
		format!("{}.lowerInto({}, {})", self.internals(), value, buffer)
	}
	
	fn render_declaration(&self) -> Result<String> {
		EnumDecl(&self).render()
	}
}

A type_oracle knows how to map TypeIdentifiers to CodeTypes.

i.e. if a render_declaration() or another CodeType has a TypeIdentifier and the type_oracle, it can look up the CodeType and then be able to reference and manipulate it in the foreign language.
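As a rough sketch of that lookup, the oracle below is a single match from type labels to boxed `CodeType` values. Everything here is an assumption for illustration: the cut-down `CodeType` trait, the `KotlinTypeOracle` name, the variants of `TypeIdentifier`, and the Kotlin expression emitted for primitives. Names are assumed to already be in the foreign language's casing.

```rust
/// Hypothetical type label standing in for the proposal's TypeIdentifier.
#[derive(Debug, Clone, PartialEq)]
pub enum TypeIdentifier {
    Int32,
    Enum(String),
}

/// Cut-down CodeType: only the pieces this sketch needs.
pub trait CodeType {
    /// Name of the type in the foreign language.
    fn name(&self) -> String;
    /// Foreign-language expression that writes `value` into `buffer`.
    fn lower_into(&self, value: &str, buffer: &str) -> String;
}

struct KotlinInt32;

impl CodeType for KotlinInt32 {
    fn name(&self) -> String {
        "Int".to_string()
    }
    fn lower_into(&self, value: &str, buffer: &str) -> String {
        // Illustrative output only; the real buffer API is not specified here.
        format!("{}.putInt({})", buffer, value)
    }
}

struct KotlinEnum {
    name: String,
}

impl CodeType for KotlinEnum {
    fn name(&self) -> String {
        self.name.clone()
    }
    fn lower_into(&self, value: &str, buffer: &str) -> String {
        // Mirrors the `Uniffi{}Internals` naming used in the proposal.
        format!("Uniffi{}Internals.lowerInto({}, {})", self.name, value, buffer)
    }
}

/// Maps type labels to the code type that knows how to render them.
pub struct KotlinTypeOracle;

impl KotlinTypeOracle {
    pub fn find(&self, id: &TypeIdentifier) -> Box<dyn CodeType> {
        match id {
            TypeIdentifier::Int32 => Box::new(KotlinInt32),
            TypeIdentifier::Enum(name) => Box::new(KotlinEnum { name: name.clone() }),
        }
    }
}
```

A template that holds only a `TypeIdentifier` can then call `oracle.find(&id).lower_into("v", "buf")` without knowing which concrete code type it got back.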

Aside: how far can we take these CodeTypes? Can CodeType contain TypeIdentifiers?

Since we can generate a declaration, and ways to call into it, I suspect we can:

support compound code types (for Option<T> and Array<T>, Map<String, T>)
support the TransformTowers proposal
support the external types proposal.
support code types for primitives (though a Rust macro may be needed for this).
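One way the compound case might look: a code type that stores the `TypeIdentifier` of its inner type and resolves it through the oracle whenever it renders. This is a sketch under assumptions; the `Optional` and `Sequence` variants, the `Primitive` helper, and passing the oracle as an argument are all hypothetical choices.

```rust
/// Hypothetical type label that can nest, so compound types can be expressed.
#[derive(Debug, Clone, PartialEq)]
pub enum TypeIdentifier {
    Int32,
    String,
    Optional(Box<TypeIdentifier>),
    Sequence(Box<TypeIdentifier>),
}

/// Cut-down CodeType that receives the oracle so it can resolve inner types.
pub trait CodeType {
    fn name(&self, oracle: &TypeOracle) -> String;
}

pub struct TypeOracle;

impl TypeOracle {
    pub fn find(&self, id: &TypeIdentifier) -> Box<dyn CodeType> {
        match id {
            TypeIdentifier::Int32 => Box::new(Primitive("Int")),
            TypeIdentifier::String => Box::new(Primitive("String")),
            TypeIdentifier::Optional(inner) => {
                Box::new(OptionalCodeType { inner: (**inner).clone() })
            }
            TypeIdentifier::Sequence(inner) => {
                Box::new(SequenceCodeType { inner: (**inner).clone() })
            }
        }
    }
}

/// A primitive is fully described by its Kotlin name.
struct Primitive(&'static str);

impl CodeType for Primitive {
    fn name(&self, _oracle: &TypeOracle) -> String {
        self.0.to_string()
    }
}

/// Holds a TypeIdentifier rather than another CodeType.
struct OptionalCodeType {
    inner: TypeIdentifier,
}

impl CodeType for OptionalCodeType {
    fn name(&self, oracle: &TypeOracle) -> String {
        // Kotlin spells an optional type as `T?`.
        format!("{}?", oracle.find(&self.inner).name(oracle))
    }
}

struct SequenceCodeType {
    inner: TypeIdentifier,
}

impl CodeType for SequenceCodeType {
    fn name(&self, oracle: &TypeOracle) -> String {
        format!("List<{}>", oracle.find(&self.inner).name(oracle))
    }
}
```

Because each compound code type only keeps a label, nesting such as an optional list of strings falls out of repeated oracle lookups.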

Generating the declaration with the main templates

The render_declaration method is almost certainly calling into a template, which contains all the code needed to define the type, plus whatever internal/private machinery is required.

The template has access to the type_oracle.

We can stop here, and do all the above with askama. In askama land, we have one template per struct, so in this proposal this would be one template file per struct that implements CodeType.

However, rfk asked me for dreamcode.

I've used an rsx! macro and #[component] syntax, taken directly from the render crate, which itself implements something like JSX. I added the backticks.

Components (in JSX speak) are templates that take a set of arguments and render a representation of those arguments using strings or other components to render themselves.

#[component]
fn EnumDecl(type_: &KotlinEnum) -> Result<String> {
	let type_name = type_.name();
	rsx! ```
		public sealed class {{ type_name }} {
			<EnumVariants type_={{type_}} />
		}
		
		internal class {{ type_.internals() }} {
			static fun downOne(v: {{ type_name }}): Int = …
			static fun upOne(v: Int): {{ type_name }} = …
			static fun lowerInto(v: {{ type_name }}, buffer: RustBuffer) {
				…
			}
		}
	```
}

#[component]
fn EnumVariants(type_: &KotlinEnum) -> Result<String> {
	let type_name = type_.name();
	type_.variants().iter().map(|v| {
		if v.fields().is_empty() {
			rsx! ```
				public object {{ v.name() }} : {{ type_name }}
			```
		} else {
			rsx! ```
				public class {{ v.name() }}(
					<FieldsDecl fields={{ v.fields() }} />
				) : {{ type_name }}
			```
		}
	}).join("\n")
}

#[component]
fn FieldsDecl(fields: &Vec<FieldDescriptor>) -> Result<String> {
	fields.iter().map(|f| {
		let name = f.name();
		let type_ = type_oracle.find(f.type_id())?;
		if let Some(default) = f.default_value() {
			rsx! ```
				val {{ name }}: {{ type_.name() }} = {{ type_.literal(default) }}
			```
		} else {
			rsx! ```
				val {{ name }}: {{ type_.name() }}
			```
		}
	}).join(",\n")
}

The interesting bits here are:

templates are composed of text and other templates.
Intra-template logic is Rust, instead of additional templating logic.
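Both points can be approximated on stable Rust today with plain functions and `format!`, without any rsx! macro. The sketch below is not the dreamcode above: the `Enum`, `Variant` and `Field` structs are simplified stand-ins for the descriptors, with names and defaults already rendered as Kotlin strings.

```rust
/// Simplified stand-ins for the descriptors (assumption: names, types and
/// defaults are already rendered as Kotlin source strings).
pub struct Field {
    pub name: String,
    pub type_name: String,
    pub default: Option<String>,
}

pub struct Variant {
    pub name: String,
    pub fields: Vec<Field>,
}

pub struct Enum {
    pub name: String,
    pub variants: Vec<Variant>,
}

/// Component: renders a constructor parameter list.
fn fields_decl(fields: &[Field]) -> String {
    fields
        .iter()
        .map(|f| match &f.default {
            Some(d) => format!("val {}: {} = {}", f.name, f.type_name, d),
            None => format!("val {}: {}", f.name, f.type_name),
        })
        .collect::<Vec<_>>()
        .join(", ")
}

/// Component: renders one nested declaration per variant.
fn enum_variants(e: &Enum) -> String {
    e.variants
        .iter()
        .map(|v| {
            // The branching is ordinary Rust, not template syntax.
            if v.fields.is_empty() {
                format!("    object {} : {}()", v.name, e.name)
            } else {
                format!("    class {}({}) : {}()", v.name, fields_decl(&v.fields), e.name)
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}

/// Top-level component: text composed with the output of other components.
pub fn enum_decl(e: &Enum) -> String {
    format!("sealed class {} {{\n{}\n}}", e.name, enum_variants(e))
}
```

What this loses compared with the dreamcode is readability of the embedded Kotlin: the braces need escaping and the indentation is managed by hand.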

We did use macros in askama, but these are a somewhat more ergonomic unit of template re-use.

I don't know if render can be persuaded to do this, or if we have to write something ourselves, perhaps based on syn-rsx.

Wishlist aside, if we were to build our own rsx:

works on Rust Stable
Markdown triple backticks FTW
Trimming the indent so it matches the indent of the rsx! token, or kotlin's trimIndent
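One possible fallback for the last wishlist item is to trim the indent at runtime, on the rendered string, rather than inside the macro. The function below is a sketch modelled loosely on Kotlin's trimIndent; the name `trim_indent` and the choice to count only spaces and tabs as indentation are assumptions.

```rust
/// Removes the common leading indentation (spaces and tabs) from every
/// non-blank line, and drops the first and last lines if they are blank.
pub fn trim_indent(text: &str) -> String {
    let lines: Vec<&str> = text.lines().collect();

    // Width in bytes of the leading run of spaces and tabs.
    let indent_of =
        |l: &str| l.len() - l.trim_start_matches(|c: char| c == ' ' || c == '\t').len();

    // Smallest indent among the non-blank lines; 0 if there are none.
    let min_indent = lines
        .iter()
        .filter(|l| !l.trim().is_empty())
        .map(|l| indent_of(l))
        .min()
        .unwrap_or(0);

    // Blank lines become empty; the rest lose `min_indent` bytes, which are
    // guaranteed to be ASCII spaces or tabs, so the slice is always valid.
    let mut out: Vec<&str> = lines
        .iter()
        .map(|&l| if l.trim().is_empty() { "" } else { &l[min_indent..] })
        .collect();

    if out.first() == Some(&"") {
        out.remove(0);
    }
    if out.last() == Some(&"") {
        out.pop();
    }
    out.join("\n")
}
```

This keeps relative indentation inside the template but cannot line the output up with the indent of the calling site, which only the macro could know.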

rfk commented Jul 29, 2021

Without wanting to get too stuck on the practicalities too quickly...

Trimming the indent so it matches the indent of the rsx! token, or kotlin's trimIndent

From experience, managing indentation when using Rust macros is really hard and requires unstable Rust features - basically, Rust tokenizes away the whitespace before the macro even gets to see the input, so if you want to know the details of the whitespace you need to introspect the Span containing the tokenized code, find out e.g. its line and column information, and work backwards from there to reconstruct the whitespace. It's doable (e.g. I'm aware of a project that lets you embed Python code as a Rust macro in this way, significant whitespace and all) but it's a lot of unstable messing about.

Author

jhugman commented Jul 29, 2021

templates within templates.

You're right that askama does allow templates within templates. JSX templates resemble functions and render the arguments passed to them, whereas askama templates are attached to structs.

  public sealed class {{ type_name }} {
  	<EnumVariants type_={{type_}} />
  }

I'm sure a template specific struct could work at the declaration site (although a little ugly):

#[derive(Template)]
#[template(source = r#"
{% for v in type_.variants() %}
{% if v.fields().len() == 0 %}
public object {{ v.name() }} : {{ type_.name() }}
{% else %}
…
{% endif %}
{% endfor %}
"#, ext = "txt")]
struct EnumVariants { type_: EnumDescriptor }

though not sure how that works from within the calling template. I haven't tried this, but if askama supported this, it would be a major step up to what we have already.

  public sealed class {{ type_.name() }} {
  	{{ EnumVariants { type_: type_ } }}
  }

rfk commented Jul 30, 2021

it would be a major step up to what we have already

Yep, syntax details aside, I think we're in strong agreement that this "component"-based model of rendering would be a significant improvement.

rfk commented Jul 30, 2021

I'm curious if you have a concrete idea of what TypeIdentifier would be here in practice - is it an integer id, or perhaps an opaque type that is actually the existing Type::* enum under the hood? I feel like maybe the point of this proposal is that it doesn't matter too much what a TypeIdentifier actually is, so long as you can map it to a CodeType implementing the necessary rendering functions, but I want to check my understanding.

rfk commented Jul 30, 2021 • edited

For the descriptors that represent types (e.g. Object, Enum etc), there exists a struct that wraps the descriptor,
and gives access to the sub-descriptor.

It's not obvious to me what the KotlinEnum wrapper is for. Would it work if we implemented a trait directly on the underlying struct? Something like:

impl KotlinCodeType for EnumDescriptor {
	fn name(&self) -> String {
		self.name().to_camel_case()
	}

	fn internals(&self) -> String {
		format!("Uniffi{}Internals", self.name())
	}

	fn literal(&self, v: Value) -> String {
		…
	}
	fn lower_into(&self, value: String, buffer: String) -> String {
		format!("{}.lowerInto({}, {})", self.internals(), value, buffer)
	}
	
	fn render_declaration(&self) -> Result<String> {
		EnumDecl(&self).render()
	}
}

Edit: Actually I guess one advantage of the wrapper struct is that we wouldn't have conflicts between a trait fn named name and the inherent impl named name...

Author

jhugman commented Jul 30, 2021

It's not obvious to me what the KotlinEnum wrapper is for. Would it work if we implemented a trait directly on the underlying struct?

I'm not stuck on a KotlinEnum wrapper that implements a CodeType, but I ended up there because of function name collisions from the SwiftCodeType and PythonCodeType traits.
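A small sketch of the collision, under assumptions: the `list_name` method and the trimmed-down traits are hypothetical, chosen only because Kotlin and Swift spell list types differently. With both per-language traits implemented on the descriptor, a plain `desc.list_name()` is ambiguous whenever both traits are in scope, so every call must name the trait. With one wrapper per backend, a single `CodeType` trait is enough.

```rust
pub struct EnumDescriptor {
    pub name: String,
}

// Option A: one trait per backend, implemented directly on the descriptor.
pub trait KotlinCodeType {
    fn list_name(&self) -> String;
}

pub trait SwiftCodeType {
    fn list_name(&self) -> String;
}

impl KotlinCodeType for EnumDescriptor {
    fn list_name(&self) -> String {
        format!("List<{}>", self.name)
    }
}

impl SwiftCodeType for EnumDescriptor {
    fn list_name(&self) -> String {
        format!("[{}]", self.name)
    }
}

// Option B: one shared trait, implemented on a wrapper per backend.
pub trait CodeType {
    fn list_name(&self) -> String;
}

pub struct KotlinEnum<'a> {
    pub inner: &'a EnumDescriptor,
}

pub struct SwiftEnum<'a> {
    pub inner: &'a EnumDescriptor,
}

impl CodeType for KotlinEnum<'_> {
    fn list_name(&self) -> String {
        format!("List<{}>", self.inner.name)
    }
}

impl CodeType for SwiftEnum<'_> {
    fn list_name(&self) -> String {
        format!("[{}]", self.inner.name)
    }
}
```

With option A the call sites read `KotlinCodeType::list_name(&desc)`; with option B they read `KotlinEnum { inner: &desc }.list_name()`, and the backend is chosen once, when the wrapper is built.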

Author

jhugman commented Jul 30, 2021

I feel like maybe the point of this proposal is that it doesn't matter too much what a TypeIdentifier actually is, so long as you can map it to a CodeType implementing the necessary rendering functions, but I want to check my understanding.

Yes, I think you're right; it doesn't matter.

My thinking was that the TypeIdentifier was a serializable type label, used to label the types of args, properties, return types, etc. That points to an enum not unlike Type that can represent compound type labels.

For the purpose of this proposal, uniffi's type_oracle could well be a mega-match expression on Type::*. By my count, this would reduce the number of match expressions on Type::* from 6 to 1 per binding.

I think this would be a net positive. Adding new types to an existing backend becomes almost straightforward.

For the purposes of the External Types and Transform Towers proposals, the type_oracle then becomes quite an interesting place to start.
