* Document Protocol Buffers used in Anki * Fix typos * Correct notes about Rust crate
4.5 KiB
Protocol Buffers
Anki uses different implementations of Protocol Buffers and each has its own pecularities. This document highlights some aspects relevant to Anki and hopefully helps to avoid some common pitfalls.
For information about Protobuf's types and syntax, please see the official language guide.
General Notes
Names
Generated code follows the naming conventions of the targeted language. So to access
the message field foo_bar
you need to use fooBar
in Typescript and the
namespace created by the message FooBar
is called foo_bar
in Rust.
Optional Values
In Python and Typescript, unset optional values will contain the type's default
value rather than None
, null
or undefined
. Here's an example:
message Foo {
optional string name = 1;
optional int32 number = 2;
}
message = Foo()
assert message.number == 0
assert message name == ""
In Python, we can use the message's HasField()
method to check whether a field is
actually set:
message = Foo(name="")
assert message.HasField("name")
assert not message.HasField("number")
In Typescript, this is even less ergonomic and it can be easier to avoid using
the default values in active fields. E.g. the CsvMetadata
message uses 1-based
indices instead of optional 0-based ones to avoid ambiguity when an index is 0
.
Oneofs
All fields in a oneof are implicitly optional, so the caveats above apply just as much to a message like this:
message Foo {
oneof bar {
string name = 1;
int32 number = 2;
}
}
In addition to HasField()
, WhichOneof()
can be used to get the name of the set
field:
message = Foo(name="")
assert message.WhichOneof("bar") == "name"
Backwards Compatibility
The official language guide makes a lot of notes about backwards compatibility, but as Anki usually doesn't use Protobuf to communicate between different clients, things like shuffling around field numbers are usually not a concern.
However, there are some messages, like Deck
, which get stored in the database.
If these are modified in an incompatible way, this can lead to serious issues if
clients with a different protocol try to read them. Such modifications are only
safe to make as part of a schema upgrade, because schema 11 (the targeted schema
when choosing Downgrade), does not make use of Protobuf messages.
Field Numbers
Field numbers larger than 15 need an additional byte to encode, so repeated
fields
should preferrably be assigned a number between 1 and 15. If a message contains
reserved
fields, this is usually to accomodate potential future repeated
fields.
Implementation-Specific Notes
Python
Protobuf has an official Python implementation with an extensive reference.
- Every message used in aqt or pylib must be added to the respective
.pylintrc
to avoid failing type checks. The unqualified protobuf message's name must be used, not an alias fromcollection.py
for example. This should be taken into account when choosing a message name in order to prevent skipping typechecking a Python class of the same name.
Typescript
Anki uses protobuf.js, which offers some documentation.
- If using a message
Foo
as a type, make sure not to use the generated interfaceIFoo
instead. Their definitions are very similar, but the interface requires null checks for every field.
Rust
Anki uses the prost crate.
Its documentation has some useful hints, but for working with the generated code,
there is a better option: From within anki/rslib
run cargo doc --open --document-private-items
.
Inside the pb
module you will find all generated Rust types and their implementations.
- Given an enum field
Foo foo = 1;
,message.foo
is ani32
. Use the accessormessage.foo()
instead to avoid having to manually convert to aFoo
. - Protobuf does not guarantee any oneof field to be set or an enum field to contain
a valid variant, so the Rust code needs to deal with a lot of
Option
s. As we don't expect other parts of Anki to send invalid messages, using anInvalidInput
error orunwrap_or_default()
is usually fine.