401 lines
16 KiB
Markdown
401 lines
16 KiB
Markdown
|
# How To Implement Field Presence for Proto3
|
||
|
|
||
|
Protobuf release 3.12 adds experimental support for `optional` fields in
|
||
|
proto3. Proto3 optional fields track presence like in proto2. For background
|
||
|
information about what presence tracking means, please see
|
||
|
[docs/field_presence](field_presence.md).
|
||
|
|
||
|
## Document Summary
|
||
|
|
||
|
This document is targeted at developers who own or maintain protobuf code
|
||
|
generators. All code generators will need to be updated to support proto3
|
||
|
optional fields. First-party code generators developed by Google are being
|
||
|
updated already. However third-party code generators will need to be updated
|
||
|
independently by their authors. This includes:
|
||
|
|
||
|
- implementations of Protocol Buffers for other languages.
|
||
|
- alternate implementations of Protocol Buffers that target specialized use
|
||
|
cases.
|
||
|
- RPC code generators that create generated APIs for service calls.
|
||
|
- code generators that implement some utility code on top of protobuf generated
|
||
|
classes.
|
||
|
|
||
|
While this document speaks in terms of "code generators", these same principles
|
||
|
apply to implementations that dynamically generate a protocol buffer API "on the
|
||
|
fly", directly from a descriptor, in languages that support this kind of usage.
|
||
|
|
||
|
## Background
|
||
|
|
||
|
Presence tracking was added to proto3 in response to user feedback, both from
|
||
|
inside Google and [from open-source
|
||
|
users](https://github.com/protocolbuffers/protobuf/issues/1606). The [proto3
|
||
|
wrapper
|
||
|
types](https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto)
|
||
|
were previously the only supported presence mechanism for proto3. Users have
|
||
|
pointed to both efficiency and usability issues with the wrapper types.
|
||
|
|
||
|
Presence in proto3 uses exactly the same syntax and semantics as in proto2.
|
||
|
Proto3 Fields marked `optional` will track presence like proto2, while fields
|
||
|
without any label (known as "singular fields"), will continue to omit presence
|
||
|
information. The `optional` keyword was chosen to minimize differences with
|
||
|
proto2.
|
||
|
|
||
|
Unfortunately, for the current descriptor protos and `Descriptor` API (as of
|
||
|
3.11.4) it is not possible to use the same representation as proto2. Proto3
|
||
|
descriptors already use `LABEL_OPTIONAL` for proto3 singular fields, which do
|
||
|
not track presence. There is a lot of existing code that reflects over proto3
|
||
|
protos and assumes that `LABEL_OPTIONAL` in proto3 means "no presence." Changing
|
||
|
the semantics now would be risky, since old software would likely drop proto3
|
||
|
presence information, which would be a data loss bug.
|
||
|
|
||
|
To minimize this risk we chose a descriptor representation that is semantically
|
||
|
compatible with existing proto3 reflection. Every proto3 optional field is
|
||
|
placed into a one-field `oneof`. We call this a "synthetic" oneof, as it was not
|
||
|
present in the source `.proto` file.
|
||
|
|
||
|
Since oneof fields in proto3 already track presence, existing proto3
|
||
|
reflection-based algorithms should correctly preserve presence for proto3
|
||
|
optional fields with no code changes. For example, the JSON and TextFormat
|
||
|
parsers/serializers in C++ and Java did not require any changes to support
|
||
|
proto3 presence. This is the major benefit of synthetic oneofs.
|
||
|
|
||
|
This design does leave some cruft in descriptors. Synthetic oneofs are a
|
||
|
compatibility measure that we can hopefully clean up in the future. For now
|
||
|
though, it is important to preserve them across different descriptor formats and
|
||
|
APIs. It is never safe to drop synthetic oneofs from a proto schema. Code
|
||
|
generators can (and should) skip synthetic oneofs when generating a user-facing
|
||
|
API or user-facing documentation. But for any schema representation that is
|
||
|
consumed programmatically, it is important to keep the synthetic oneofs around.
|
||
|
|
||
|
In APIs it can be helpful to offer separate accessors that refer to "real"
|
||
|
oneofs (see [API Changes](#api-changes) below). This is a convenient way to omit
|
||
|
synthetic oneofs in code generators.
|
||
|
|
||
|
## Updating a Code Generator
|
||
|
|
||
|
When a user adds an `optional` field to proto3, this is internally rewritten as
|
||
|
a one-field oneof, for backward-compatibility with reflection-based algorithms:
|
||
|
|
||
|
```protobuf
|
||
|
syntax = "proto3";
|
||
|
|
||
|
message Foo {
|
||
|
// Experimental feature, not generally supported yet!
|
||
|
optional int32 foo = 1;
|
||
|
|
||
|
// Internally rewritten to:
|
||
|
// oneof _foo {
|
||
|
// int32 foo = 1 [proto3_optional=true];
|
||
|
// }
|
||
|
//
|
||
|
// We call _foo a "synthetic" oneof, since it was not created by the user.
|
||
|
}
|
||
|
```
|
||
|
|
||
|
As a result, the main two goals when updating a code generator are:
|
||
|
|
||
|
1. Give `optional` fields like `foo` normal field presence, as described in
|
||
|
[docs/field_presence](field_presence.md) If your implementation already
|
||
|
supports proto2, a proto3 `optional` field should use exactly the same API
|
||
|
and internal implementation as proto2 `optional`.
|
||
|
2. Avoid generating any oneof-based accessors for the synthetic oneof. Its only
|
||
|
purpose is to make reflection-based algorithms work properly if they are
|
||
|
not aware of proto3 presence. The synthetic oneof should not appear anywhere
|
||
|
in the generated API.
|
||
|
|
||
|
### Satisfying the Experimental Check
|
||
|
|
||
|
If you try to run `protoc` on a file with proto3 `optional` fields, you will get
|
||
|
an error because the feature is still experimental:
|
||
|
|
||
|
```
|
||
|
$ cat test.proto
|
||
|
syntax = "proto3";
|
||
|
|
||
|
message Foo {
|
||
|
// Experimental feature, not generally supported yet!
|
||
|
optional int32 a = 1;
|
||
|
}
|
||
|
$ protoc --cpp_out=. test.proto
|
||
|
test.proto: This file contains proto3 optional fields, but --experimental_allow_proto3_optional was not set.
|
||
|
```
|
||
|
|
||
|
There are two options for getting around this error:
|
||
|
|
||
|
1. Pass `--experimental_allow_proto3_optional` to protoc.
|
||
|
2. Make your filename (or a directory name) contain the string
|
||
|
`test_proto3_optional`. This indicates that the proto file is specifically
|
||
|
for testing proto3 optional support, so the check is suppressed.
|
||
|
|
||
|
These options are demonstrated below:
|
||
|
|
||
|
```
|
||
|
# One option:
|
||
|
$ ./src/protoc test.proto --cpp_out=. --experimental_allow_proto3_optional
|
||
|
|
||
|
# Another option:
|
||
|
$ cp test.proto test_proto3_optional.proto
|
||
|
$ ./src/protoc test_proto3_optional.proto --cpp_out=.
|
||
|
$
|
||
|
```
|
||
|
|
||
|
The experimental check will be removed in a future release, once we are ready
|
||
|
to make this feature generally available. Ideally this will happen for the 3.13
|
||
|
release of protobuf, sometime in mid-2020, but there is not a specific date set
|
||
|
for this yet. Some of the timing will depend on feedback we get from the
|
||
|
community, so if you have questions or concerns please get in touch via a
|
||
|
GitHub issue.
|
||
|
|
||
|
### Signaling That Your Code Generator Supports Proto3 Optional
|
||
|
|
||
|
If you now try to invoke your own code generator with the test proto, you will
|
||
|
run into a different error:
|
||
|
|
||
|
```
|
||
|
$ ./src/protoc test_proto3_optional.proto --my_codegen_out=.
|
||
|
test_proto3_optional.proto: is a proto3 file that contains optional fields, but
|
||
|
code generator --my_codegen_out hasn't been updated to support optional fields in
|
||
|
proto3. Please ask the owner of this code generator to support proto3 optional.
|
||
|
```
|
||
|
|
||
|
This check exists to make sure that code generators get a chance to update
|
||
|
before they are used with proto3 `optional` fields. Without this check an old
|
||
|
code generator might emit obsolete generated APIs (like accessors for a
|
||
|
synthetic oneof) and users could start depending on these. That would create
|
||
|
a legacy migration burden once a code generator actually implements the feature.
|
||
|
|
||
|
To signal that your code generator supports `optional` fields in proto3, you
|
||
|
need to tell `protoc` what features you support. The method for doing this
|
||
|
depends on whether you are using the C++
|
||
|
`google::protobuf::compiler::CodeGenerator`
|
||
|
framework or not.
|
||
|
|
||
|
If you are using the CodeGenerator framework:
|
||
|
|
||
|
```c++
|
||
|
class MyCodeGenerator : public google::protobuf::compiler::CodeGenerator {
|
||
|
// Add this method.
|
||
|
uint64_t GetSupportedFeatures() const override {
|
||
|
// Indicate that this code generator supports proto3 optional fields.
|
||
|
// (Note: don't release your code generator with this flag set until you
|
||
|
// have actually added and tested your proto3 support!)
|
||
|
return FEATURE_PROTO3_OPTIONAL;
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
If you are generating code using raw `CodeGeneratorRequest` and
|
||
|
`CodeGeneratorResponse` messages from `plugin.proto`, the change will be very
|
||
|
similar:
|
||
|
|
||
|
```c++
|
||
|
void GenerateResponse() {
|
||
|
CodeGeneratorResponse response;
|
||
|
response.set_supported_features(CodeGeneratorResponse::FEATURE_PROTO3_OPTIONAL);
|
||
|
|
||
|
// Generate code...
|
||
|
}
|
||
|
```
|
||
|
|
||
|
Once you have added this, you should now be able to successfully use your code
|
||
|
generator to generate a file containing proto3 optional fields:
|
||
|
|
||
|
```
|
||
|
$ ./src/protoc test_proto3_optional.proto --my_codegen_out=.
|
||
|
```
|
||
|
|
||
|
### Updating Your Code Generator
|
||
|
|
||
|
Now to actually add support for proto3 optional to your code generator. The goal
|
||
|
is to recognize proto3 optional fields as optional, and suppress any output from
|
||
|
synthetic oneofs.
|
||
|
|
||
|
If your code generator does not currently support proto2, you will need to
|
||
|
design an API and implementation for supporting presence in scalar fields.
|
||
|
Generally this means:
|
||
|
|
||
|
- allocating a bit inside the generated class to represent whether a given field
|
||
|
is present or not.
|
||
|
- exposing a `has_foo()` method for each field to return the value of this bit.
|
||
|
- make the parser set this bit when a value is parsed from the wire.
|
||
|
- make the serializer test this bit to decide whether to serialize.
|
||
|
|
||
|
If your code generator already supports proto2, then most of your work is
|
||
|
already done. All you need to do is make sure that proto3 optional fields have
|
||
|
exactly the same API and behave in exactly the same way as proto2 optional
|
||
|
fields.
|
||
|
|
||
|
From experience updating several of Google's code generators, most of the
|
||
|
updates that are required fall into one of several patterns. Here we will show
|
||
|
the patterns in terms of the C++ CodeGenerator framework. If you are using
|
||
|
`CodeGeneratorRequest` and `CodeGeneratorReply` directly, you can translate the
|
||
|
C++ examples to your own language, referencing the C++ implementation of these
|
||
|
methods where required.
|
||
|
|
||
|
#### To test whether a field should have presence
|
||
|
|
||
|
Old:
|
||
|
|
||
|
```c++
|
||
|
bool MessageHasPresence(const google::protobuf::Descriptor* message) {
|
||
|
return message->file()->syntax() ==
|
||
|
google::protobuf::FileDescriptor::SYNTAX_PROTO2;
|
||
|
}
|
||
|
```
|
||
|
|
||
|
New:
|
||
|
|
||
|
```c++
|
||
|
// Presence is no longer a property of a message, it's a property of individual
|
||
|
// fields.
|
||
|
bool FieldHasPresence(const google::protobuf::FieldDescriptor* field) {
|
||
|
return field->has_presence();
|
||
|
// Note, the above will return true for fields in a oneof.
|
||
|
// If you want to filter out oneof fields, write this instead:
|
||
|
// return field->has_presence && !field->real_containing_oneof()
|
||
|
}
|
||
|
```
|
||
|
|
||
|
#### To test whether a field is a member of a oneof
|
||
|
|
||
|
Old:
|
||
|
|
||
|
```c++
|
||
|
bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) {
|
||
|
return field->containing_oneof() != nullptr;
|
||
|
}
|
||
|
```
|
||
|
|
||
|
New:
|
||
|
|
||
|
```c++
|
||
|
bool FieldIsInOneof(const google::protobuf::FieldDescriptor* field) {
|
||
|
// real_containing_oneof() returns nullptr for synthetic oneofs.
|
||
|
return field->real_containing_oneof() != nullptr;
|
||
|
}
|
||
|
```
|
||
|
|
||
|
#### To iterate over all oneofs
|
||
|
|
||
|
Old:
|
||
|
|
||
|
```c++
|
||
|
bool IterateOverOneofs(const google::protobuf::Descriptor* message) {
|
||
|
for (int i = 0; i < message->oneof_decl_count(); i++) {
|
||
|
const google::protobuf::OneofDescriptor* oneof = message->oneof(i);
|
||
|
// ...
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
New:
|
||
|
|
||
|
```c++
|
||
|
bool IterateOverOneofs(const google::protobuf::Descriptor* message) {
|
||
|
// Real oneofs are always first, and real_oneof_decl_count() will return the
|
||
|
// total number of oneofs, excluding synthetic oneofs.
|
||
|
for (int i = 0; i < message->real_oneof_decl_count(); i++) {
|
||
|
const google::protobuf::OneofDescriptor* oneof = message->oneof(i);
|
||
|
// ...
|
||
|
}
|
||
|
}
|
||
|
```
|
||
|
|
||
|
## Updating Reflection
|
||
|
|
||
|
If your implementation offers reflection, there are a few other changes to make:
|
||
|
|
||
|
### API Changes
|
||
|
|
||
|
The API for reflecting over fields and oneofs should make the following changes.
|
||
|
These match the changes implemented in C++ reflection.
|
||
|
|
||
|
1. Add a `FieldDescriptor::has_presence()` method returning `bool`
|
||
|
(adjusted to your language's naming convention). This should return true
|
||
|
for all fields that have explicit presence, as documented in
|
||
|
[docs/field_presence](field_presence.md). In particular, this includes
|
||
|
fields in a oneof, proto2 scalar fields, and proto3 `optional` fields.
|
||
|
This accessor will allow users to query what fields have presence without
|
||
|
thinking about the difference between proto2 and proto3.
|
||
|
2. As a corollary of (1), please do *not* expose an accessor for the
|
||
|
`FieldDescriptorProto.proto3_optional` field. We want to avoid having
|
||
|
users implement any proto2/proto3-specific logic. Users should use the
|
||
|
`has_presence()` function instead.
|
||
|
3. You may also wish to add a `FieldDescriptor::has_optional_keyword()` method
|
||
|
returning `bool`, which indicates whether the `optional` keyword is present.
|
||
|
Message fields will always return `true` for `has_presence()`, so this method
|
||
|
can allow a user to know whether the user wrote `optional` or not. It can
|
||
|
occasionally be useful to have this information, even though it does not
|
||
|
change the presence semantics of the field.
|
||
|
4. If your reflection API may be used for a code generator, you may wish to
|
||
|
implement methods to help users tell the difference between real and
|
||
|
synthetic oneofs. In particular:
|
||
|
- `OneofDescriptor::is_synthetic()`: returns true if this is a synthetic
|
||
|
oneof.
|
||
|
- `FieldDescriptor::real_containing_oneof()`: like `containing_oneof()`,
|
||
|
but returns `nullptr` if the oneof is synthetic.
|
||
|
- `Descriptor::real_oneof_decl_count()`: like `oneof_decl_count()`, but
|
||
|
returns the number of real oneofs only.
|
||
|
|
||
|
### Implementation Changes
|
||
|
|
||
|
Proto3 `optional` fields and synthetic oneofs must work correctly when
|
||
|
reflected on. Specifically:
|
||
|
|
||
|
1. Reflection for synthetic oneofs should work properly. Even though synthetic
|
||
|
oneofs do not really exist in the message, you can still make reflection work
|
||
|
as if they did. In particular, you can make a method like
|
||
|
`Reflection::HasOneof()` or `Reflection::GetOneofFieldDescriptor()` look at
|
||
|
the hasbit to determine if the oneof is present or not.
|
||
|
2. Reflection for proto3 optional fields should work properly. For example, a
|
||
|
method like `Reflection::HasField()` should know to look for the hasbit for a
|
||
|
proto3 `optional` field. It should not be fooled by the synthetic oneof into
|
||
|
thinking that there is a `case` member for the oneof.
|
||
|
|
||
|
Once you have updated reflection to work properly with proto3 `optional` and
|
||
|
synthetic oneofs, any code that *uses* your reflection interface should work
|
||
|
properly with no changes. This is the benefit of using synthetic oneofs.
|
||
|
|
||
|
In particular, if you have a reflection-based implementation of protobuf text
|
||
|
format or JSON, it should properly support proto3 optional fields without any
|
||
|
changes to the code. The fields will look like they all belong to a one-field
|
||
|
oneof, and existing proto3 reflection code should know how to test presence for
|
||
|
fields in a oneof.
|
||
|
|
||
|
So the best way to test your reflection changes is to try round-tripping a
|
||
|
message through text format, JSON, or some other reflection-based parser and
|
||
|
serializer, if you have one.
|
||
|
|
||
|
### Validating Descriptors
|
||
|
|
||
|
If your reflection implementation supports loading descriptors at runtime,
|
||
|
you must verify that all synthetic oneofs are ordered after all "real" oneofs.
|
||
|
|
||
|
Here is the code that implements this validation step in C++, for inspiration:
|
||
|
|
||
|
```c++
|
||
|
// Validation that runs for each message.
|
||
|
// Synthetic oneofs must be last.
|
||
|
int first_synthetic = -1;
|
||
|
for (int i = 0; i < message->oneof_decl_count(); i++) {
|
||
|
const OneofDescriptor* oneof = message->oneof_decl(i);
|
||
|
if (oneof->is_synthetic()) {
|
||
|
if (first_synthetic == -1) {
|
||
|
first_synthetic = i;
|
||
|
}
|
||
|
} else {
|
||
|
if (first_synthetic != -1) {
|
||
|
AddError(message->full_name(), proto.oneof_decl(i),
|
||
|
DescriptorPool::ErrorCollector::OTHER,
|
||
|
"Synthetic oneofs must be after all other oneofs");
|
||
|
}
|
||
|
}
|
||
|
}
|
||
|
|
||
|
if (first_synthetic == -1) {
|
||
|
message->real_oneof_decl_count_ = message->oneof_decl_count_;
|
||
|
} else {
|
||
|
message->real_oneof_decl_count_ = first_synthetic;
|
||
|
}
|
||
|
```
|