Type generation migration

This note proposes a roadmap for migrating Crux and the apps to a new type generation system, based on rustdoc JSON.

Why?

The current system has limitations, is quite fiddly to set up, and the generated code is not great.

The new system removes many of these limitations, but to get the most benefit we will want to generate code which is fundamentally incompatible with the original output. Given that this output is used by existing shell implementations, we need to provide a smooth upgrade path.

At the same time, due to the limitations, we know of Crux users who opted out of the typegen and use something like 1Password's Typeshare. They may not want to continue doing so, and we should enable a migration path from this to the new typegen as well.

To coordinate this transition, this RFC aims to provide a roadmap we can follow, one which a majority of people are happy with and have had a chance to comment on.

Current system

The current system is based on Serde. It's a combination of two related crates: serde-reflection and serde-generate. Serde-reflection uses the derived Deserialize implementations on types to discover their shape and captures it as data. This data is used by serde-generate to generate equivalent types in TypeScript, Swift, Kotlin and some other languages, alongside an implementation of serialization on the "foreign" side.

This system has limitations:

Only supports types which are Deserialize (this is not a big problem in practice).
Only supports types expressed using the Serde data model. This means it fundamentally can't capture certain kinds of metadata, e.g. use of generic types and their trait bounds, any additional decorations on the foreign side (implemented protocols in Swift, etc.), any other information not provided to the deserializer.
We have no control over the generated code. This is especially problematic in TypeScript and the representation of enums.

Why not use Typeshare?

Typeshare is great, but has its own set of limitations. The key one is that it analyses Rust source code directly, without the compiler. It only operates on types annotated with the #[typeshare] proc macro and doesn't see into other crates. It also doesn't understand macro-generated code, which is a problem for discovering the relevant types when some of the code is generated by derive macros (especially the Effect type cluster).

Rustdoc based system

The system we've been working on is based on rustdoc, specifically its ability to output the type metadata as JSON. This covers all the bases:

It sees all the types in all the used crates (including core and std)
It can see macro generated code
It has all the relevant metadata - generic arguments, trait bounds, implementors of traits, etc.

It is only a data set however. We need to do the work of finding the types, understanding them, and generating the foreign equivalents.
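As a rough illustration of where that data set comes from, the sketch below builds (but does not run) a cargo invocation that asks rustdoc for JSON output. At the time of writing, JSON output is an unstable rustdoc feature and needs a nightly toolchain. The function name `rustdoc_json_command` and the package name `shared` are assumptions made for this example, not part of the proposed tool.

```rust
use std::process::Command;

/// Builds the cargo command that asks rustdoc to emit JSON for one package.
/// The `+nightly` selector assumes `cargo` is the rustup proxy.
fn rustdoc_json_command(package: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args([
        "+nightly",
        "rustdoc",
        "--package",
        package,
        "--",
        // Everything after `--` is passed to rustdoc itself.
        "-Z",
        "unstable-options",
        "--output-format",
        "json",
    ]);
    cmd
}
```

Calling `rustdoc_json_command("shared").status()` would run the command. The resulting JSON file is the raw input for the phases described next.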

This broadly happens in three phases.

1. Discover entry points – this is a Crux-specific part, which finds implementations of crux_core::App and their associated types, and uses them as entry points for the type discovery.
2. Walk the type graph from the entry points down to primitives – this is the key component of the work, which translates the raw type information into an "intermediate representation" (IR) that can be used in step 3.
3. Generation – converting the IR into equivalent types in the selected foreign language.
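The second phase can be pictured as a plain graph traversal. The sketch below is a minimal illustration under simplifying assumptions, not the actual implementation: `TypeDef`, `discover`, `example_registry` and all the type names in it are hypothetical, and the registry stands in for the data rustdoc JSON would provide.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical, simplified shape of a type as it might appear in the IR.
#[derive(Debug, Clone, PartialEq)]
enum TypeDef {
    /// A leaf such as `bool` or `String`; the walk stops here.
    Primitive,
    /// A struct with named fields, each referring to another type by name.
    Struct(Vec<(&'static str, &'static str)>),
    /// An enum whose variants each carry zero or more payload types.
    Enum(Vec<(&'static str, Vec<&'static str>)>),
}

impl TypeDef {
    /// Names of the types this definition refers to directly.
    fn references(&self) -> Vec<&'static str> {
        match self {
            TypeDef::Primitive => vec![],
            TypeDef::Struct(fields) => fields.iter().map(|(_, ty)| *ty).collect(),
            TypeDef::Enum(variants) => variants
                .iter()
                .flat_map(|(_, payload)| payload.iter().copied())
                .collect(),
        }
    }
}

/// Breadth-first walk from the entry points, returning every reachable type.
/// Names missing from the registry are skipped; the `seen` set guards against cycles.
fn discover(
    registry: &BTreeMap<&'static str, TypeDef>,
    entry_points: &[&'static str],
) -> BTreeSet<&'static str> {
    let mut seen = BTreeSet::new();
    let mut queue: VecDeque<&'static str> = entry_points.iter().copied().collect();
    while let Some(name) = queue.pop_front() {
        let Some(def) = registry.get(name) else { continue };
        if !seen.insert(name) {
            continue; // already visited
        }
        queue.extend(def.references());
    }
    seen
}

/// A tiny made-up registry: `Orphan` is not reachable from the entry points.
fn example_registry() -> BTreeMap<&'static str, TypeDef> {
    BTreeMap::from([
        ("bool", TypeDef::Primitive),
        ("u32", TypeDef::Primitive),
        ("String", TypeDef::Primitive),
        ("Event", TypeDef::Enum(vec![("Increment", vec![]), ("SetName", vec!["String"])])),
        ("ViewModel", TypeDef::Struct(vec![("count", "u32"), ("user", "User")])),
        ("User", TypeDef::Struct(vec![("name", "String"), ("active", "bool")])),
        ("Orphan", TypeDef::Struct(vec![("flag", "bool")])),
    ])
}
```

Starting from `Event` and `ViewModel`, the walk reaches `User` and the three primitives but never visits `Orphan`, which is the behaviour that removes the need to register types by hand.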

Key benefits and features to enable

Because we can discover the apps and find entry points from there, we no longer need the separate shared_types crate, where developers can register extra types and direct the typegen.

We can support a wider feature set:

Better generated code, including "decorations" (e.g. Swift protocols)
Generic types support (in languages which support them)
Additional foreign code extensions:
- Different serialisation format
- Data based interior mutability support (to support fine-grained updates of the view model in the future)
- Custom code extensions

The goal of this work is to make the type generation completely transparent to the developers most of the time. It should Just Work ™️.

Reducing boilerplate

The other change we'll want to make is to how the typegen is executed. Instead of having a separate crate with a build.rs file (which was necessary in order to see the code of the core crate while running build.rs), the new type generation will come as a separate CLI tool, produced as an additional target on the core crate, similar to the uniffi-bindgen tool today. In fact, as part of this, we will try to subsume the uniffi-bindgen tool into the same CLI tool, and move from writing a .udl file for the FFI interface to using annotations (likely Crux ones), so that we can take control of how we generate FFI for different platforms (UniFFI, WebAssembly and others).

Ultimately, creating the full FFI interface for the Crux core should become a single build command.

Ways to migrate which we want to support

There are a few migrations involved in this transition:

Migrating the core from original type generation to the new one
Migrating the shells from original generated types to the new ones
Migrating from another type generation system to Crux typegen

We want to support all of them independently, and, crucially, gradually, type by type. For the second and third migration above, it MUST be possible to mix and match approaches at all times.

From old typegen

The initial implementation of the new type generation will aim for parity with the original, while removing all the configuration boilerplate in the process.

It is likely that this stage will remain experimental for some time, because the type discovery mechanism works quite differently to the Serde based one. We will invite community feedback on how well the system is working on their existing code and collect examples where it doesn't work too well.

The second stage of the migration, which is likely to happen concurrently, is the implementation of various improvements to the generated code. To facilitate this being opt-in by type, we're likely to use an annotation driven optionality, along the lines of (the names and specific format may of course change):

// Generate legacy output at definition site
#[cruxgen(legacy = true)]
struct MyType(bool);

// Enable future output at reference site
struct SomeType {
    other: cruxgen! { future = true, OtherType },
}

This should allow apps to migrate slowly, one type at a time, making necessary changes to the consuming code on the shell side.
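One way to picture the per-type opt-in is as a small resolution function that combines a type's annotation with a project-wide default. This is only a sketch: `OutputMode`, `Annotation` and `resolve_mode` are hypothetical names, and the real annotation format may differ.

```rust
/// The two output flavours discussed in this RFC.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputMode {
    Legacy,
    Future,
}

/// Hypothetical parsed form of an annotation such as `#[cruxgen(legacy = true)]`.
/// `None` means the key was not present on the type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct Annotation {
    legacy: Option<bool>,
    future: Option<bool>,
}

/// Picks the output mode for one type.
/// An explicit `true` wins over the default; `false` or a missing key falls back to it.
fn resolve_mode(annotation: Annotation, default: OutputMode) -> Result<OutputMode, String> {
    match (annotation.legacy, annotation.future) {
        (Some(true), Some(true)) => {
            Err("a type cannot request both legacy and future output".to_string())
        }
        (Some(true), _) => Ok(OutputMode::Legacy),
        (_, Some(true)) => Ok(OutputMode::Future),
        _ => Ok(default),
    }
}
```

In this model, unannotated types follow the default, so changing the default from `Legacy` to `Future` later only affects types that never opted in either way.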

From a different typegen (e.g. Typeshare)

The strategy for migrating from a different type generation system is similar, but uses a full opt-out of the Crux type generation:

// Skip at definition site
#[cruxgen(skip)]
struct MyType(bool);

// Skip at reference site
struct SomeType {
    other: cruxgen! { skip = true, OtherType },
}

In order to cover discrepancies in feature set, we will also do our best to support custom code generation extensions quite early on, but the strategy for specifying them is not yet very clear.

Migration roadmap

The migration needs somewhat careful orchestration, so that big step changes are not required for Crux users to adopt. It should go something like this:

Phase 1 - develop the frontend and IR

In this phase we still use serde-generate as the backend, and focus on getting the frontend right: the type discovery and the developer interface.

1 - serde-generate feature parity and start validating

Gets us to a working, reliable type generation front end, able to discover all the relevant types and capture the metadata. This is likely to require a period of testing with real-world codebases.

2 - enable annotation controlled feature selection

Support annotations to skip types, ignore fields and similar basic things which previously relied on serde annotations. Both ways should work for the time being, but with future mode enabled, the serde annotations should start being ignored. This is the start of the two modes diverging.

If possible, annotations should be allowed on both definition sites and reference sites. We need to think about how conflict resolution works in this case, when multiple sites are annotated with conflicting directives.
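Since conflict resolution is still an open question, the following is only one possible policy, sketched to make the problem concrete: collect every directive given for a type, accept them if they all agree, and report an error otherwise. `Directive`, `Conflict` and `resolve_directive` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Hypothetical directives an annotation could carry.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Directive {
    Skip,
    Legacy,
    Future,
}

/// Reported when the annotation sites for one type disagree.
#[derive(Debug, PartialEq, Eq)]
struct Conflict {
    type_name: String,
    directives: Vec<Directive>,
}

/// Strict policy: all annotated sites must agree.
/// Returns `Ok(None)` when no site is annotated at all.
fn resolve_directive(
    type_name: &str,
    definition_site: Option<Directive>,
    reference_sites: &[Directive],
) -> Result<Option<Directive>, Conflict> {
    // Deduplicate the directives from the definition site and every reference site.
    let distinct: BTreeSet<Directive> = definition_site
        .into_iter()
        .chain(reference_sites.iter().copied())
        .collect();
    let directives: Vec<Directive> = distinct.into_iter().collect();
    match directives.len() {
        0 => Ok(None),
        1 => Ok(Some(directives[0])),
        _ => Err(Conflict {
            type_name: type_name.to_string(),
            directives,
        }),
    }
}
```

A more lenient alternative would let the reference site override the definition site, but a type is generated only once, so two reference sites that disagree would still need some rule like the one above.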

Phase 2 - replace the backend, stabilise the IR

In this phase, we replace the serde-generate backend and gradually change what the output looks like. At the same time we gain features.

1 - take over generation of the code to parity

Replace or vendor in serde-generate in order to support outputting all the original code in the supported languages. At this point, we can retire the serde implementation fully, so long as we're confident in the replacement.

We should also introduce the legacy switch which forces backwards compatibility.

2 - change future output to be more idiomatic

Make changes to the generated code to better represent the types idiomatically to the language.

3 - stabilise the IR

To enable extension points on the backend side, we'll need to stabilise the intermediate representation of the discovered types and their relationships.

4 - enable custom extensions

Allow users to add extensions to the generated code, given the IR. This is almost like a derive macro but for the foreign language(s). The exact mechanism is to be decided, but it should be possible to make them language specific (e.g. Swift-only).
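The exact mechanism is undecided, but a trait-based shape is one option worth sketching. Everything below is an assumption for illustration: `Language`, `IrType`, `Extension`, `SwiftEquatable` and `run_extensions` are hypothetical, and the IR node is reduced to just a name.

```rust
/// Target languages an extension may apply to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Swift,
    Kotlin,
    TypeScript,
}

/// Hypothetical, heavily simplified IR node: only the type's name.
struct IrType {
    name: String,
}

/// A user-supplied extension: given a type from the IR, emit extra foreign code.
trait Extension {
    /// Whether this extension should run for the given target language.
    fn applies_to(&self, language: Language) -> bool;
    /// The extra source code to append for this type.
    fn emit(&self, ty: &IrType) -> String;
}

/// Example Swift-only extension that declares an `Equatable` conformance.
struct SwiftEquatable;

impl Extension for SwiftEquatable {
    fn applies_to(&self, language: Language) -> bool {
        language == Language::Swift
    }

    fn emit(&self, ty: &IrType) -> String {
        format!("extension {}: Equatable {{}}", ty.name)
    }
}

/// Runs every extension that applies to `language` against one type.
fn run_extensions(
    extensions: &[Box<dyn Extension>],
    language: Language,
    ty: &IrType,
) -> Vec<String> {
    extensions
        .iter()
        .filter(|ext| ext.applies_to(language))
        .map(|ext| ext.emit(ty))
        .collect()
}
```

Because an extension only sees the IR, it depends on the IR being stable, which is why that step comes first in this phase.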

5 - support optionality, especially in serialisation

The goal of the output is to be idiomatic to the target codebase, which will likely require some optionality (e.g. which serialisation library to use in Kotlin). We should do our best to pick sensible defaults, and delegate as much as possible to custom extensions, otherwise we risk an explosion in the features we need to support.

One specific optionality we should enable support for is the serialisation format over the FFI boundary.
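As a sketch of what "sensible defaults plus an override" could look like for the serialisation format, the example below parses an optional setting and falls back to a default. The names `WireFormat`, `GeneratorOptions` and `options_from_flag` are hypothetical, and the choice of formats and of Bincode as the default is made purely for illustration.

```rust
use std::str::FromStr;

/// Hypothetical set of serialisation formats usable over the FFI boundary.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum WireFormat {
    /// Default chosen for this example only.
    #[default]
    Bincode,
    Json,
}

impl FromStr for WireFormat {
    type Err = String;

    /// Case-insensitive parsing of a user-provided format name.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "bincode" => Ok(WireFormat::Bincode),
            "json" => Ok(WireFormat::Json),
            other => Err(format!("unsupported wire format: {other}")),
        }
    }
}

/// Options for one generator run; more fields would be added over time.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
struct GeneratorOptions {
    wire_format: WireFormat,
}

/// Builds the options from an optional user setting, using the default when absent.
fn options_from_flag(flag: Option<&str>) -> Result<GeneratorOptions, String> {
    let wire_format = match flag {
        Some(value) => value.parse::<WireFormat>()?,
        None => WireFormat::default(),
    };
    Ok(GeneratorOptions { wire_format })
}
```

Keeping the set of built-in options this small, and rejecting unknown values outright, is one way to avoid the explosion in supported features mentioned above.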

Phase 3 - enable as default

At some point when the base future output stops evolving too much, we can make it default (when neither future nor legacy is specified).

Further down the line, we retire the legacy support as the final step.
