Hello, Service Fabric!

  • 8/31/2016

Service Fabric concepts

In this section, you first briefly review the architecture of Service Fabric. Then, you learn about some of the key concepts of Service Fabric in preparation for service development.


An overview of Service Fabric architecture is shown in Figure 1-1. As you can see, Service Fabric is a comprehensive PaaS with quite a few subsystems in play. The discussion here gives you a high-level overview of these subsystems. We’ll go into details of each of the subsystems throughout this book, so don’t worry if you are not familiar with some of the terms.


FIGURE 1-1 Service Fabric architecture

The subsystems shown in Figure 1-1 are as follows:

  • Transport subsystem The transport subsystem is a Service Fabric internal subsystem that provides secured point-to-point communication channels within a Service Fabric cluster and between a Service Fabric cluster and clients.

  • Federation subsystem The federation subsystem provides failure detection, leader election, and consistent routing, which form the foundation of a unified cluster. We’ll examine these terms in upcoming chapters.

  • Reliability subsystem The reliability subsystem manages state replication, failover, and load balancing, all of which a highly available and reliable system needs.

  • Management subsystem The management subsystem provides full application lifetime management, including services such as managing application binaries; deploying, updating, and deprovisioning applications; and monitoring application health.

  • Hosting subsystem The hosting subsystem is responsible for managing application life cycles on a cluster node.

  • Communication subsystem The primary task of the communication subsystem is service discovery. With complete separation of workloads and infrastructure, service instances may migrate from host to host. The communication subsystem provides a naming service for clients to discover and connect to service instances.

  • Testability subsystem The idea of testing in production was popularized by the Netflix Chaos Monkey (and later the Netflix Simian Army). The testability subsystem can simulate various failure scenarios to help developers shake out design and implementation flaws in the system.
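
The service discovery role of the communication subsystem can be illustrated with a toy registry. The following Python sketch is not the Service Fabric naming service API; the NamingRegistry class, its methods, the service name, and the addresses are all hypothetical. It only shows how a client can keep resolving one stable name while the instance behind it moves from host to host.

```python
class NamingRegistry:
    """Toy in-memory registry mapping a stable service name to its current address."""

    def __init__(self):
        self._addresses = {}

    def register(self, service_name, address):
        # Called when an instance starts (or restarts) on a host.
        self._addresses[service_name] = address

    def resolve(self, service_name):
        # Clients look up the current address by name instead of hard-coding a host.
        try:
            return self._addresses[service_name]
        except KeyError:
            raise LookupError(f"no instance registered for {service_name}") from None


registry = NamingRegistry()
registry.register("fabric:/Shop/CartService", "10.0.0.4:8080")  # hypothetical name and address
first = registry.resolve("fabric:/Shop/CartService")

# The instance migrates to another host; the name the client uses stays the same.
registry.register("fabric:/Shop/CartService", "10.0.0.9:8080")
second = registry.resolve("fabric:/Shop/CartService")
```

The client code never changes when the instance moves; only the registry entry does.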

Nodes and clusters

To understand Service Fabric clusters, you need to know about two concepts: node and cluster.

  • Node Technically, a node is just a Service Fabric runtime process. In a typical Service Fabric deployment, there’s one node per machine, so you can think of a node as a machine (physical or virtual). A Service Fabric cluster allows heterogeneous node types with different capacities and configurations.

  • Cluster A cluster is a set of nodes that are connected to form a highly available and reliable environment for running applications and services. A Service Fabric cluster can have thousands of nodes.

Figure 1-2 is a simple illustration of a Service Fabric cluster. Notice that all nodes are equal peers; there are no master nodes or subordinate nodes. Also notice that although in the diagram the nodes are arranged in a ring, all the nodes can communicate directly with each other via the transport subsystem.


FIGURE 1-2 A Service Fabric cluster

A Service Fabric cluster provides an abstraction layer between your workloads and the underlying infrastructure. Because you can run Service Fabric clusters on both physical machines and virtual machines, either on-premises or in the cloud, you can run your Service Fabric applications without modifications in a variety of environments such as on-premises datacenters and Microsoft Azure.

Applications and services

A Service Fabric application is a collection of services. A service is a complete functional unit that delivers a specific piece of functionality.

You author a Service Fabric application by defining the Application Type and associated Service Types. When the application is deployed to a Service Fabric cluster, these types are instantiated into application instances and service instances, respectively.

An application defines an isolation unit in Service Fabric. You can deploy and manage multiple applications independently on the same cluster. Service Fabric keeps their code, configuration, and data isolated from one another. You can deploy multiple versions of an application on the same cluster.
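
The relationship between types and instances can be pictured with a small Python model. This is a sketch only: the class names, the deploy function, and the instance names are hypothetical and do not correspond to Service Fabric manifests or APIs. It assumes that each deployment creates one service instance per service type.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceType:
    name: str


@dataclass(frozen=True)
class ApplicationType:
    name: str
    version: str
    service_types: tuple


@dataclass
class ApplicationInstance:
    app_type: ApplicationType
    instance_name: str
    services: list = field(default_factory=list)


def deploy(app_type, instance_name):
    """Instantiate an application type, creating one service instance per service type."""
    app = ApplicationInstance(app_type, instance_name)
    for service_type in app_type.service_types:
        app.services.append(f"{instance_name}/{service_type.name}")
    return app


shop_type = ApplicationType("ShopType", "1.0.0", (ServiceType("Cart"), ServiceType("Catalog")))

# Two independent application instances created from the same type.
shop_a = deploy(shop_type, "fabric:/ShopA")
shop_b = deploy(shop_type, "fabric:/ShopB")
```

Each instance carries its own list of service instances, which mirrors the idea that applications are deployed and managed independently of one another.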

Partitions and replicas

A service can have one or more partitions. Service Fabric uses partitions as the scaling mechanism to distribute workloads to different service instances.
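
As a rough illustration of partitioning, the Python sketch below maps a key to one of a fixed number of partitions by hashing it. This is a generic hash-based scheme chosen for the example, not necessarily how Service Fabric assigns keys to partitions; the function name and the sample keys are hypothetical.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition index in the range [0, partition_count)."""
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    # A stable hash keeps the mapping identical across processes and restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Every request for the same key lands on the same partition.
assignments = {key: partition_for(key, 4) for key in ["user-1", "user-2", "user-3"]}
```

Because the mapping depends only on the key and the partition count, the workload spreads across partitions while each key stays with a single one.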

A partition can have one or more replicas. Service Fabric uses replicas as the availability mechanism. A partition has one primary replica and may have multiple secondary replicas. The states of the replicas are synchronized automatically. When a primary replica fails, a secondary replica is automatically promoted to primary to maintain service availability, and the number of secondary replicas is brought back to the desired level to preserve redundancy.
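
The promote-and-replenish behavior described above can be sketched in a few lines of Python. The Partition class, its method names, and the replica identifiers are hypothetical, and state synchronization is left out entirely. The sketch assumes the first secondary in the list is the one promoted, which is a simplification made for the example.

```python
class Partition:
    """Toy model of one partition with a primary and a target number of secondaries."""

    def __init__(self, replica_ids, target_secondaries):
        self.primary = replica_ids[0]
        self.secondaries = list(replica_ids[1:])
        self.target_secondaries = target_secondaries
        self._created = 0

    def _new_replica(self):
        self._created += 1
        return f"new-{self._created}"

    def fail_primary(self):
        if not self.secondaries:
            raise RuntimeError("no secondary available to promote")
        # Promote a secondary so the partition keeps serving requests.
        self.primary = self.secondaries.pop(0)
        # Rebuild secondaries until the desired redundancy level is restored.
        while len(self.secondaries) < self.target_secondaries:
            self.secondaries.append(self._new_replica())


partition = Partition(["r1", "r2", "r3"], target_secondaries=2)
partition.fail_primary()
# partition.primary is now "r2"; partition.secondaries is ["r3", "new-1"]
```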

We’ll introduce partitions and replicas in more detail in Chapter 2, Chapter 3, and Chapter 7.

Programming modes

Service Fabric provides two high-level frameworks to build applications: the Reliable Service APIs and the Reliable Actor APIs.

  • The Reliable Service APIs provide direct access to Service Fabric constructs such as reliable collections and communication stacks.

  • The Reliable Actor APIs provide a high-level abstraction layer so that you can model your applications as a number of interacting actors.

With Reliable Service APIs, you can add either stateless services or stateful services to a Service Fabric application. The key difference between the two service types is whether service state is saved locally on the hosting node.

Stateless vs. stateful

Some services don’t need to maintain any state across requests. Let’s say there’s a calculator service that provides both an Add operation and a Subtract operation. For each service call, the service takes in two operands and generates a result. The service doesn’t need to maintain any contextual information between calls because every call can be carried out based solely on the given parameters. The service behavior is not affected by any contextual information; that is, adding 5 and 3 always yields 8, and subtracting 6 from 9 always yields 3.
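
The calculator example translates directly into code. The Python sketch below uses plain functions to stand in for the service operations; the function names are illustrative and no Service Fabric API is involved. Each result depends only on the arguments, so nothing has to be remembered between calls.

```python
def add(a, b):
    """Return the sum of the two operands."""
    return a + b


def subtract(a, b):
    """Return a minus b."""
    return a - b


# Repeating a call with the same operands always gives the same answer.
results = [add(5, 3), add(5, 3), subtract(9, 6)]  # [8, 8, 3]
```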

The majority of services, in contrast, need to keep some sort of state. A typical example of such a service is a shopping cart service. As a user adds items to the cart, the state of the cart needs to be maintained across different requests so that the user doesn’t lose what she has put in the cart.

Services that don’t need to maintain state or don’t save state locally are called stateless services. Services that keep local state are called stateful services. The only distinction between a stateful service and a stateless service is whether the state is saved locally. Continuing with the previous shopping cart example, the service can be implemented as a stateless service that saves shopping cart state in external data storage or as a stateful service that saves shopping cart state locally on the node.
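
The two ways of building the shopping cart can be put side by side in a short Python sketch. All class names here are hypothetical; ExternalStore is an in-memory stand-in for an external database or cache, and neither cart class uses a Service Fabric API.

```python
class ExternalStore:
    """Stands in for external data storage shared by all service instances."""

    def __init__(self):
        self._data = {}

    def load(self, user):
        return list(self._data.get(user, []))

    def save(self, user, items):
        self._data[user] = list(items)


class StatelessCart:
    """Keeps nothing locally; every call reads from and writes to the external store."""

    def __init__(self, store):
        self._store = store

    def add_item(self, user, item):
        items = self._store.load(user)
        items.append(item)
        self._store.save(user, items)
        return items


class StatefulCart:
    """Keeps cart contents in the memory of the instance that handles the call."""

    def __init__(self):
        self._carts = {}

    def add_item(self, user, item):
        self._carts.setdefault(user, []).append(item)
        return list(self._carts[user])


stateless = StatelessCart(ExternalStore())
stateless.add_item("ann", "book")
stateless_items = stateless.add_item("ann", "pen")

stateful = StatefulCart()
stateful.add_item("ann", "book")
stateful_items = stateful.add_item("ann", "pen")
```

With a single instance, both versions return the same cart. They differ only in where the cart contents live.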

A stateful service can cause some problems. When a service is scaled out, multiple instances share the total workload. For a stateless service, requests can be distributed among the instances because it doesn’t matter which instance handles the specific request. For a stateful service, because each service instance records its own state locally, a user session needs to be routed to the same instance to ensure a consistent experience for the user. Another problem with a stateful service is reliability. When a service instance goes down, it takes all its state with it, which causes service interruptions for all the users who are being served by the instance.
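
The routing difference can be shown with a small Python sketch. The instance names and both routing functions are hypothetical, and hashing the session ID is just one simple way to pin a session to an instance. It is used here only for illustration.

```python
import itertools
import zlib

INSTANCES = ["instance-0", "instance-1", "instance-2"]
_round_robin = itertools.cycle(INSTANCES)


def route_stateless(_session_id):
    """Any instance can serve a stateless request, so rotate through them."""
    return next(_round_robin)


def route_sticky(session_id):
    """Pin a session to one instance so it always finds its local state."""
    index = zlib.crc32(session_id.encode("utf-8")) % len(INSTANCES)
    return INSTANCES[index]


# Three requests from the same session under each routing policy.
stateless_targets = [route_stateless("ann") for _ in range(3)]
sticky_targets = {route_sticky("ann") for _ in range(3)}
```

The stateless requests visit all three instances in turn, whereas the sticky requests always reach the same one. Sticky routing addresses consistency but not reliability: if the pinned instance goes down, the session's local state goes with it.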

To solve these problems, a stateful service can be transformed into a stateless service by externalizing the state. However, this means every service call will incur additional calls to an external data source, increasing system latency. Fortunately, Service Fabric provides a way to escape this dilemma, which we’ll discuss in Chapter 3.