Glossary
A review of Sodacan components and major features in alphabetical order.
Actor
Think of an Actor as representing a domain entity such as a person, a department, an account, an invoice or an IOT device.
Logically, an individual Actor is a free-standing entity. The only way it has of reacting to changes in the world is through messages and the only way it can affect the world outside the Actor is through Messages.
Conceptually, all actors execute in parallel and independent of each other. This is at the core of the Sodacan architecture: Scale is achieved through parallel execution.
Large applications are most likely to have long strings of execution. The traditional approach is to leave currency management to the database. And that works well for a smaller number of concurrent tasks. But, to really scale up an application to take advantage hundreds of CPU cores, the burden of managing concurrency at the DB level becomes a problem. With actors, each business entity (not just type of entity) could be executing in parallel. That's concurrency on a massive scale.
A good example of this is a patient record in a healthcare application. Each patient is an actor. At any point in time, some patients will be effectively inactive, some will be active with a few simple events such as lab tests and visit notes. And some patients will have chronic conditions which could involve tens-of-thousands of events. If an actor is expected to reason over patients, the processing time could vary greatly. It could be a problem if a simple patient would have to queue up behind a complex patient process.
The individual events also vary in complexity. So, the design should probably include a number of “Child” actors for different areas of the patient records such as pharmacy, which could involve drug interaction calculations and a problem (like diagnosis) list which might generate additional workflow affecting the patient.
All of these child actors can report summary information back to the patient actor, but the child actors are free to operate in parallel and, as with public health issues, a completely different hierarchy such as a case rather than a patient with both interacting with the same objective summary data, concurrently.
An Actor should reflect an Entity in the target domain. Formally, an Actor is an instance of an ActorType (a class). For example, a lamp or switch are Actors. A specific switch is an instance of the type of Actor called Switch.
Regardless of how many replicas of a specific Actor (ActorId) only one replica is ever active at any point in time.
In many application environments, time-consuming actions such as network, database and file IO are often performed asynchronously. Sodacan takes the opposite approach. Since Actors have already decomposed an application into probably the smallest possible elements, Sodacan piles everything possible into the Actor's thread of execution. This is not a limitation of Sodacan; Actors are free to spawn more threads as they see fit. Rather, this is a design choice: To keep any given Actor instance as single-threaded and therefore as simple as possible: Less contention over shared resources means the more linear a system can scale.
ActorGroup
Sodacan groups Actors into ActorGroups. An ActorGroup is not intended to address any business problem. As such, an individual Actor need not be concerned with the existence of its containing ActorGroup. ActorGroups are the smallest unit from an system perspective. For example, when a new Host is added to a Sodacan cluster, the Host will be able to off-load some number of ActorGroups from the existing Hosts. Actors are usually assigned to an ActorGroup randomly, though other schemes are possible.
ActorGroups have a Mode of operation. For each ActorGroup there will be exactly one that is considered to be in Live Mode by Sodacan. All Actors in that ActorGroup will be the official Actor receiving and sending messages. Other replicas of that ActorGroup will be in one of the following Modes on other Hosts. All of the replicas have the same ActorGroupId as the Live ActorGroup but they will have a different mode.
-
Hot Mode means that the Live actor sends its state to one or more Hot ActorGroups containing that actor. This is a fairly heavyweight mode because it keeps the in-memory as well as persistent state current.
-
Warm Mode means that the persistent state is maintained but the in-memory Actor(s) are not active. Thus, a transition from Warm to Live would be slightly slower. Warm Mode has a light footprint which is useful when a Host is running some other ActorGroup(s) in Live Mode. Thus the live ActorGroup(s) are using most of the memory and cpu resources while the one(s) in Warm Mode will be using very little resources.
-
Cold Mode means that the ActorGroup existed on this Host in the past and probably has stale state. A transition from Cold to any higher mode requires that the ActorGroup needs to bring the state of its Actors up to date using Live, Hot or Warm replicas of the ActorGroup. This process could take time.
-
Off-line mode means that an ActorGroup has never existed on this Host, or was deleted from the Host.
ActorId
An Actor has a permanent ActorId, unique in the Sodacan network. The assignment is usually random, but permanent, unique and usually meaningless.
Actor as Finite State Machine
Some Actors will behave as a Finite State Machine. This is handled easily by the Actor itself. Note: The pure definition of Actor Model actors says that when an Actor processes a message, it determines the behavior applied to the next message received. Sodacan doesn't intrinsically support this type of behavior although it is easy to emulate it using state transitions.
Actor Persistence
In other words, the logic that protects and uses data is kept together with the data in an Actor. Sodacan, by its nature, doesn't need a database though you are not prevented from using one.
Config
Two different configurations are maintained by Sodacan although for the most part, they are accessed in the same way and through the same route.
Local Configuration describes an attribute particular to a given Host. A good example is the Host number which obviously must be different for each Host.
Global Configuration describes attributes that apply globally and are available in all Hosts.
Another dimension of configuration is dynamic vs static settings. Static setting must be resolved at the time the configuration is created. For example, the Host number is static because it can't change dynamically.
Many of the configuration settings allow functions that act as factories for various aspects of the system. Notably, each actor type can specify a custom factory function.
Also, even simple numeric configuration values can be dynamic. For example, the backpressure limit for the message load can change dynamically, perhaps based on time-of-day or other possible loads on the system.
The Configuration is available to most areas of Sodacan. Each Host passes the Configuration to ActorGroups which pass it to Actors. For the most part, the configuration is read-only once a Host starts up. For dynamic settings. Access is serialized making the configuration thread-safe.
The simplest possible configuration, all settings defaulted:
Config config = new SimpleBuilder()
.build();
A configuration that registers all annotated actors in a package, registers single actor and specifies a non-default host number.
Config config = new SimpleBuilder()
.registerActorsInPackage("net.sodacan.sample.actor")
.registerActorType(TestActor.NAME, TestActor.class)
.hostNumber(5)
.build();
Coordinator
The Coordinator for a Host maintains a list of each ActorGroup known to the system. Each ActorGroup is then dynamically assigned to available Hosts depending on the current topology of the network as described by the Coordinator.
While each Host has a Coordinator, you can think of the Coordinator as a single network-wide resource. That is, each Coordinator is always trying to keep synchronized with all others.
The Coordinator for a Host also facilitates heartbeats among the Coordinators and the re-arrangement of ActorGroups as Hosts go up and down.
Also, dynamic configuration settings are maintained by the Coordinator. These setting must be the same on all Hosts.
Sodacan is a peer-to-peer network providing no single point of failure. It is common to operate at least two hot replicas for any given ActorGroup. Additional “cold” replicas can also be configured. Cold replicas require less compute but take longer to come on-line.
See ActorGroup for a more detailed explanation of the Modes of an ActorGroup.
Having only a single copy of any particular ActorGroup is not durable or fault-tolerant. A minimum of three ActorGroup replicas, each on a different Host, are required for durability. But the number of replicas can be much higher as needed.
The Coordinator dynamically assigns ActorGroups to Hosts. While the Coordinator knows which ActorGroups should be active on a Host, it is the Host that actually manages the lifecycle of ActorGroups on that Host.
When a Host comes on-line, it reports its availability to the Coordinator. The host continues to report its availability through a heartbeat function. If that heartbeat fails to be sent, such as when that host is shutdown unexpectedly, the remaining Hosts negotiate to determine which Hosts will handle the lost ActorGroup(s).
If the Coordinators cannot establish a quorum, then the system is considered down. No messages will be processed.
When an external system sends a new message into Sodacan, it can be sent to any active host. The Host will route the message to the correct Host as needed. In effect, it asks the Coordinator where the current Host for the target ActorGroup is running and sends it there.
The coordinator's job is to maintain an ideal state of the system, that is: What each server should be doing. The Host maintains a list of what it is actually doing (which actors are live, etc).
The state of the system can change dramatically from time to time. For example, when a node goes off-line, how are the remaining nodes reconfigured to take up the slack. Likewise, when a new or existing node is brought on-line, how is work balanced to take advantage of the new resource.
Sodacan's coordinators are pluggable. The default configuration specifies a simple, one-host coordinator.
Foreign Data
In a pure actor model, if an Actor “owns” some data, that data stays with the Actor that owns the data. For example, the customer's name and address are owned by that customer's Actor. The customer Actor is the “source of truth” for that customer's information. If any other actor, such as a shopping cart actor, needs the customer's name and address, that is considered “foreign data” in the shopping cart actor.
How can the shopping cart actor get the customer's name and address? The shopping cart Actor can't just look into the customer's Actor while processing a Shopping Cart event, such as to finalize the shopping cart before submission. And, we're not going to hand this problem off to a database. That leaves us with few alternatives to access foreign data when an event (message) is processed by a shopping cart:
- Before the shopping cart event via subscription, or
- With the shopping cart event, as a prerequisite, or
- With the shopping cart event, as part of the event flow
Which approach to use will depend on the requirements of the business flow:
In the subscription approach, the messages that keep the shopping cart updated are not associated with any particular shopping cart event. Yet, the shopping cart actor can, for example, remove the subscription in order to freeze the customer info after the shopping cart transitions to the submitted state. So, the subscription approach means that two different processes are defined: one for collecting customer info into the shopping cart and the other the shopping cart events. In any case, the foreign data will be persisted in the shopping cart actor.
The other two approaches don't interact with the customer except when a shopping cart event is processed. And these approaches don't require that the customer info be persisted as foreign data in the shopping cart actor because it is part of the message. The prerequisite approach means that the Shopping cart decides if and when customer data is needed and reroutes in incoming event to the customer actor to have the customer info added to the message. The event flow approach means that the Actor originating the event determines that collecting customer info is a step in the message flow.
Host
Each instance of Sodacan in a cluster is called a Host. Typically one Host per physical machine. Hosts are numbered. The number doesn't mean anything but must be unique in the system. In a multi-host configuration, a host also has a unique URL. There is no limit to the number of hosts. Typically, a Host assumes some portion of the workload of the system. Any particular numbered host may be up or down. If a particular host goes off-line permanently, its best to retire the Host number. More on the role of a Host shortly.
A Sodacan Host temporarily hosts one or more ActorGroups. ActorGroups can (and should) be replicated for durability. The configuration can specify the number of ActorGroup replicas to maintain including what it takes to maintain a quorum.
Host-Bound Actor
A HostBound actor's lifecycle is different from normal Actors: Normal Actors can come and go at the whim of Sodacan. That wouldn't work for this kind of actor. So, this and other HostBound Actors stay put; A host bound actor controls its own lifecycle.
Jug
A Jug is a Message in transit between two Actors. It contains a target ActorId and a ByteArray containing a serialized Sodacan Message.
The idea of a Jug is a trade-off between security and performance/flexibility. (Security wins.) The default configuration has no security features per se. However, the message is serialized into a byte array within the sending Actor's thread and deserialized just before being processed by the receiving Actor by a plugable Serializer. This approach limits the exposure of a message to man-in-the-middle attack.
In any case, routing behavior between the Actor sending the message and the Actor receiving the message is restricted to knowing just the destination ActorId. Nothing else along the way needs to peek at the message body. This makes, for example, message encryption easy to implement.
Of course, on the downside, this approach means that, with the exception of the target ActorId, a message has no ability to have special routing behavior along the way.
In summary, a Jug is kept simple; It has no behavior. Any encoding, format, or encryption information is contained within the byte array and is dealt with during serialization and deserialization.
Message
A message is a means of communicating between Actors. An Actor processes one inbound message at a time. An Actor can send zero or more messages as a result of processing an inbound message.
A Message has a unique MessageId. The default MessageId is time based so that messages could be sorted or ordered by time. However, Sodacan does not take time order into consideration when routing and delivering messages. The messageId can double as a correlation id when a message is split and then combined. When a message is split, each message has the same messageId.
While a message can be used for a single Actor to Actor communication, it can also represent a larger process. For example, a new message could be created by a source actor, such as an HTTP request with the goal of getting some information back from a specific target Actor. At first blush, it appears as a simple, synchronous call and return action semantic. But Sodacan doesn't allow that. Only fire-and-forget message delivery semantics is allowed in Sodacan without breaking the actor model semantics.
Also, it turns out that in this example the target actor is careful about who can request its data. We'll keep this example simple: The target actor is not going to release the data until it knows that the user requesting the data has permission to do so. To do this we introduce a third-party actor: The user actor. In this scenario we have a choice: The source actor could anticipate that the target actor will need user data to complete the transaction. Or, we can just send our request to the target actor and let it decide what it needs to complete the request. Either way, the user is going to be involved in the message flow.
One more detail and then we will get back to describing the structure of a message. The flow we've described will ultimately need to end up back at the request (Actor). But Sodacan messages do not specify a specific “reply-to”.
Enter the Message Route stack. When a message is sent, the ActorId at the top-of-stack is the destination. When received by the target Actor, the top-of-stack is popped. An Actor along the way can push a new destination onto the stack thereby extending the message's journey.
So, if the Actor originating the Message wants to see the resulting message when it has completed its journey, it just pushes its own ActorId onto the stack first.
A Message carries a payload that some Actors will contribute data to and others will access data from. The payload is organized as key-value pairs. The key is composed of the ActorId that created the value and the name of the field within that Actor. This approach allows multiple actors to contribute to the payload without name conflicts.
Sodacan defines a single Message class for all messages. While the Message class can be customized, is not customized at the Actor by Actor level. Rather, the intent of the message is conveyed as a target ActorId and a Verb which in Sodacan are called a Route. It is these Routes that comprise the Route stack in a message.
A message should be created by the message factory in the Sodacan config object. The message factory will also assign the correct MessageId.
A message also carries a route history, a list of all of the actorId/Verb combinations visited during the lifetime of the message.
Messages often start and end in a Host-bound Actor.
A message is serialized between Actors. See Jug.
Sodacan insures that an Actor cannot prematurely stop a message in its journey before it has made its way to the bottom of the route stack.
MessageId
A MessageId is used to make a message unique in the system. The MessageId is plugable in the Sodacan configuration. The default factory uses a Java instant plus a random number. A MessageId is not used to convey the source or destination of a message. Sodacan does not try discern any meaning from the id other than equality between messages.
Sodacan has a plugable Clock and Random number generator. For testing and debugging, there are alternate implementations which allow control of the clock and/or random number generator.
Payload
Persister
Quorum
A minimum configured number of Hosts must be on-line and agree on certain configuration details. If the quorum cannot be established, the Sodacan cluster is considered off-line. Quorum is ignored in a single Host configuration. Quorum is managed by the Sodacan Coordinator.
Receiver
A Receiver is a delegate of a host and is responsible for accepting inbound Jugs (Messages) addressed to this host. In general, when a message is received it is immediately queued for processing within Sodacan. The receiver dispatches Jugs received to the inbound queue of the appropriate actor group. Implementation Note: Because the path from a received Jug to the target ActorGroup is very short and fast, the server need not bother creating threads to process requests. Details of the steps at this point are:
Upon receiving an inbound Jug, the Jug is added to the inbound queue of the target ActorGroup. (The ActorGroup will then distribute the Jug to the appropriate Actor's inbound queue.) The Receiver can immediately return to listening for the next request. If an HTTP server is used then a response is returned saying, in effect, Jug received and queued.
The Receiver must determine which ActorGroup to queue the Jug. Once this is done, the Receiver's job is done. The Receiver does not maintain any Jug state.
Scheduler
Sender
A Sender is a delegate of a Host tasked with sending messages to other hosts. When a message must be sent from one host to another, the Sender facilitates the movement. A Sender does not operate directly between Actors but rather between Hosts. The message flow is as follows:
The Sender, on behalf of this Host, sends messages to another Host. The message will have already been serialized in a Jug.
A Jug is given to the ActorGroup containing that actor. Control is still in the source Actor's thread. The ActorGroup then hands the Jug(s) off to the Host containing the ActorGroup. The Host, while still in the Actor's thread, makes a quick routing decision. The Host decides if the Jug is to stay local, in which case, the Jug is added to the target (including the same) ActorGroup's Jug Queue. Thus ends the source Actor's thread involvement in the process. If the Host decides the Jug must go to another Host, either because the target actor is running on another Host or because the local Jug must be replicated to another Host, it will be queued, perhaps more than once, to the hosts Sender. As with local Jugs, at this point, control returns to the Actor where the Actor will likely go on to process the next inbound message (Jug). Sodacan Senders are plugable with implementations such as Jetty or Netty clients.
Stage
When a Sodacan Actor is called to process a message, the return value must be a Stage. of has a stage which holds zero or more outbound messages. The Stage, and hence its name, holds all external effects from processing one inbound message. This includes the Actor's persistent data. The Actor's Stage is what supports transaction semantics. The messages in the stage will be sent upon a commit within the Actor, or rolled back as needed. The default Actor implementation processes message out of the stage in the Actor's thread meaning that the Actor cannot go on to the next inbound message until the stage has been processed.
Route
Serializer
Verb
A message Verb is usually defined as an enum
…
Wakeup Message
An actor should never call a sleep function. Instead, if the actor needs to do something in the future, it should send a “wakeup message” to a special wakeup actor which will then send the wakeup message back to the requesting actor. Normally,the wakeup should not carry specific information but rather the wakeup message just activates the acto so that it will check (again) the condition leading to the need for the wakeup. For example, say had an invoice actor that needed to send a payment reminder in 30 days. When the actor wakes up, one of the actor's state handlers will notice that a reminder is due and send a reminder. However, if the balance due had been paid in the meantime, then the reminder will wake up the actor, the actor will notice that the payment was made, and no reminder will be sent to the customer.