We need to rethink the use of CRUD in Serverless. CRUD, in the general sense, means unconstrained database access, and is too broad and open-ended to be used effectively in Serverless environments (or any general Cloud development for that matter).
Unconstrained database access means that the user function itself needs to manage the nitty-gritty details about data access and storage, and you are thereby moving all the operational concerns from the Serverless framework into the user function. Now it’s hard for the framework to know the intention of each access. For example:
Is the operation a read, or a write?
Can it be cached?
Can consistency be relaxed, or is strong consistency needed?
Can operations proceed during partial failure?
Instead, if we understand these properties then we can make better decisions automatically. For example:
Write ops are fast and read ops are slow: add more memory to DB
Reading immutable values: add caching
Writes must be serializable: add sharding, single writers per entity.
We all know that constraints can be liberating and this holds true for Serverless as much as anything else. As a fact, one of the reasons for the success of Serverless is that it has such a constrained developer experience, which allows you as a developer to focus on the essence: the business logic for the function. For example, Serverless has a great model for abstracting over communication where all communication is translated to receiving and emitting events.
The question we asked ourselves was: can we abstract over state in the same way? Provide a clean and uniform abstraction of state in and state out for the function.
This would allow the framework to manage durable state on behalf of the function, to monitor and manage it holistically across the whole system, and make more intelligent decisions.
Unconstrained CRUD does not work in this model since we can’t pass the entire data set in and out of the function. What we need are data storage patterns that have constrained input/output protocols. Patterns that fall into this category are Key-Value, Event Sourcing, and CRDTs.
In Event Sourcing, state in is the event log while state out is any newly persisted events as a result of handling a command.
In CRDTs, state in is a stream of deltas and/or state updates, and state out is a stream of deltas and/or state updates.
In Key-Value, the state out is the key and state in the value.
While most developers have worked with Key-Value stores, Event Sourcing and CRDTs might be a bit unfamiliar. What’s interesting is that they fit an event-driven model very well while being on opposite sides of the state consistency spectrum, with the former providing strong (ACID) consistency (through event logging) and the latter providing eventual/causal consistency. Taken together, they give us a truly wide range of options for managing distributed state in a consistent fashion by allowing you to choose the optimal model for your specific use-case and data set. Akka Multi-DC log replication is a great example of combining the two techniques to do amazing things.
Next, let’s see how Cloudstate addresses the needs of a stateful Serverless platform: The Cloudstate solution.