# Protocol Buffers

Anki uses several implementations of Protocol Buffers, each with its own peculiarities. This document highlights some aspects relevant to Anki and hopefully helps you avoid some common pitfalls.

For information about Protobuf's types and syntax, please see the official language guide.

## General Notes

### Names

Generated code follows the naming conventions of the target language. So to access the message field `foo_bar`, you need to use `fooBar` in TypeScript, and the namespace created by the message `FooBar` is called `foo_bar` in Rust.

### Optional Values

In Python and TypeScript, unset optional values will contain the type's default value rather than `None`, `null`, or `undefined`. Here's an example:

```protobuf
message Foo {
  optional string name = 1;
  optional int32 number = 2;
}
```

```python
message = Foo()
assert message.number == 0
assert message.name == ""
```

In Python, we can use the message's `HasField()` method to check whether a field is actually set:

```python
message = Foo(name="")
assert message.HasField("name")
assert not message.HasField("number")
```

In TypeScript, this is even less ergonomic, and it can be easier to avoid relying on default values in active fields. E.g. the `CsvMetadata` message uses 1-based indices instead of optional 0-based ones, to avoid ambiguity when an index is 0.

### Oneofs

All fields in a oneof are implicitly optional, so the caveats above apply just as much to a message like this:

```protobuf
message Foo {
  oneof bar {
    string name = 1;
    int32 number = 2;
  }
}
```

In addition to `HasField()`, `WhichOneof()` can be used to get the name of the set field:

```python
message = Foo(name="")
assert message.WhichOneof("bar") == "name"
```
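The same semantics can be observed with any generated message; this runnable sketch uses the stock Python protobuf package's well-known `Value` type (whose fields live in a oneof called `kind`) as a stand-in for Anki's own messages:

```python
from google.protobuf.struct_pb2 import Value

# Setting a oneof member marks it as set, even when set to its default.
v = Value(string_value="")
assert v.WhichOneof("kind") == "string_value"
assert v.HasField("string_value")

# Assigning a different variant clears the previous one.
v.number_value = 3.0
assert v.WhichOneof("kind") == "number_value"
assert not v.HasField("string_value")

# An untouched message has no variant set at all.
assert Value().WhichOneof("kind") is None
```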

### Backwards Compatibility

The official language guide devotes a lot of attention to backwards compatibility, but as Anki usually doesn't use Protobuf to communicate between different clients, things like shuffling around field numbers are usually not a concern.

However, there are some messages, like `Deck`, which get stored in the database. If these are modified in an incompatible way, clients speaking a different protocol version can run into serious issues when trying to read them. Such modifications are only safe to make as part of a schema upgrade, because schema 11 (the schema targeted when choosing Downgrade) does not make use of Protobuf messages.

### Field Numbers

Field numbers larger than 15 need an additional byte to encode, so repeated fields should preferably be assigned a number between 1 and 15. If a message contains reserved fields, this is usually to accommodate potential future repeated fields.
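The one-byte boundary comes from the tag encoding: the field number and the 3-bit wire type share a single varint, leaving 4 bits (numbers 1–15) in the first byte. A small sketch of the tag encoding (a hypothetical helper for illustration, not part of Anki's code):

```python
def encode_tag(field_number: int, wire_type: int = 0) -> bytes:
    """Encode a protobuf field tag (field number + wire type) as a varint."""
    key = (field_number << 3) | wire_type
    out = bytearray()
    while True:
        byte = key & 0x7F
        key >>= 7
        if key:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

assert len(encode_tag(15)) == 1  # field numbers 1-15 fit in one byte
assert len(encode_tag(16)) == 2  # 16 and above need a second byte
```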

## Implementation-Specific Notes

### Python

Protobuf has an official Python implementation with an extensive reference.

- Every message used in `aqt` or `pylib` must be added to the respective `.pylintrc` to avoid failing type checks. The unqualified protobuf message name must be used, not an alias from `collection.py`, for example. Take this into account when choosing a message name, to avoid a Python class of the same name being skipped by the type checker.

### TypeScript

Anki uses protobuf-es, which offers some documentation.

### Rust

Anki uses the prost crate. Its documentation has some useful hints, but for working with the generated code, there is a better option: from within `anki/rslib`, run `cargo doc --open --document-private-items`. Inside the `pb` module you will find all generated Rust types and their implementations.

- Given an enum field `Foo foo = 1;`, `message.foo` is an `i32`. Use the accessor `message.foo()` instead to avoid having to manually convert to a `Foo`.
- Protobuf does not guarantee that a oneof field is set, or that an enum field contains a valid variant, so the Rust code needs to deal with a lot of `Option`s. As we don't expect other parts of Anki to send invalid messages, using an `InvalidInput` error or `unwrap_or_default()` is usually fine.