CloudState serialization convention

There are several cases in CloudState where user provided values need to be serialized. Given CloudState’s dependence on gRPC, protobuf is an obvious serialization format, however, it’s not always convenient. For example, if protobufs were the only supported format, it would not be possible to use strings as keys in an ORMap, instead users would have to create a protobuf message that wrapped a string, and use that instead.

User function libraries are therefore encouraged to support a variety of primitive types, according to what’s available in their language. JSON is also a popular and convenient serialization format in some situations, and this may supported too.

User provided values in CloudState are serialized to protobuf Any values. These contain a type_url string field that identify the protobuf type, along with a value bytes value for the serialized bytes. While there’s nothing technically stopping anything being put in either of the fields, such as a UTF8 encoded string or JSON, the bytes are meant to be a valid protobuf, and the type_url is meant to refer to a protobuf type. To support the serialization of primitive and JSON values in a way that is as consistent as possible with the intentions behind Any, CloudState has defined convention for achieving this.

Primitive values

Serialized primitive values have a type_url prefix of p.cloudstate.io. The path of the URL is the protobuf name of the primitive. So, for example, the type_url of a serialized string is p.cloudstate.io/string, while for a serialized boolean, it is p.cloudstate.io/bool.

The value field contains a message that is serialized using the following schema:

syntax = "proto3";
message Primitive {
  <type> value = 1;
}

Where <type> is the type of the primitive, eg string. Implementations may define messages to represent these, but in most cases this is not necessary, serializing and deserializing the above message can be done manually using the libraries provided by the protobuf support in a given language.

Primitive value stability

Since these values are often used as keys in maps and elements in sets, the stability of the encoded form is important. While the protobuf spec does not make any guarantees on serialization stability, protobuf implementations are free to guarantee stability of their serialized form, and so it’s important that the implementation used to encode the above messages is stable. This means, no additional fields may be included in the message, and if the primitive value is the default value for that type (for example, for string, the empty string, or for any number, zero), the encoded message must be an empty byte string.

JSON values

Serialized JSON values have a type_url prefix of json.cloudstate.io, and a path determined using a language specific mechanism, for example, in statically typed languages, it may be a fully qualified class name, in dynamically type languages, it may come from a field in the object, such as type. Typically, the type will be important even in dynamically typed languages when serializing event sourced events, to determine which event handler should be invoked to handle a particular event.

The value field contains a message that is serialized using the following schema:

syntax = "proto3";
message Json {
  bytes json = 1;
}

The UTF8 encoded serialized JSON is placed in the json field of the message above. This encoding is intentionally identical to using a string typed json field instead, both will yield exactly the same bytes for the same JSON. It is also identical to the primitive encoding for string and bytes, and so implementations may simply reuse that.

Serialized JSON value stability

As with primitive values, the stability of the encoded form is important. Libraries are encouraged to use JSON encoders that produce a stable output for logically equal inputs. CloudState does not put any requirements on how to achieve stable serialization, however many languages have libraries that do produce a canonical JSON representation of values. In the absence of such a library, the following considerations need to be remembered:

  • The order of fields must not change for two equivalent objects - lexicographically sorting them is one way to achieve this.
  • The whitespace in the representation must be consistent (typically, no whitespace is best).
  • The serialization of equal numbers must be consistent (eg, if a number is equal to 1, it should either encode to 1, or 1.0, but not both).
The source code for this page can be found here.