Introducing CQRS
- 9/10/2014
The query stack
Let’s delve a bit deeper into the two pipelines that make up the CQRS architecture. In doing so, another key aspect that drives the adoption of CQRS in some highly collaborative systems will emerge clearly—the necessity of dealing with stale data.
The read domain model
A model that deals only with queries is much easier to arrange than a model that has to deal with both queries and commands. For example, a prickly problem we hinted at in Chapter 8, “Introducing the domain model,” is brilliantly and definitively solved with the introduction of a read-only domain model.
Why you need distinct models
The problem was summarized as follows. The Order class has an Items property that exposes the list of ordered products. The property holds inherently enumerable content, but which actual type should you use for the Items property? The first option that probably comes to mind is IList<T>. It might work, but it’s not perfect. So let’s put ourselves in a Domain Model scenario and assume we want to have a single model for the entire domain that is used to support both queries and commands. Also, let’s say we use a plain list for the Items property:
public IList<OrderItem> Items { get; private set; }
The private setter is good, but it only prevents users of an Order from replacing the list. Any code that gets an instance of Order can still add elements to the list or remove them from it. This might or might not be a legitimate operation; it depends on the use-case. If the use-case is managing the order, exposing order items through a list is just fine. If the use-case is showing the last 10 orders, a list is potentially dangerous because no changes to the order are expected.
On the other hand, if you expose the list as a plain enumeration of order items, you have no way to create an order and add items to it. In addition, individual items are still modifiable through direct access:
public IEnumerable<OrderItem> Items { get; private set; }
Things don’t change even if you use ReadOnlyCollection<T> instead of IEnumerable<T>. A Microsoft .NET Framework read-only collection is read-only only in the sense that it doesn’t allow changes to the structure of the collection; the elements themselves remain modifiable. Furthermore, if the read-only collection is created as a wrapper for a regular list, changes to the underlying list are reflected in the read-only wrapper. Here’s an example where order items are exposed as a read-only collection but methods still make it possible to populate the collection:
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Order
{
    private readonly IList<OrderItem> _items;

    public Order()
    {
        _items = new List<OrderItem>();
    }

    // Read-only view over the private list
    public ReadOnlyCollection<OrderItem> Items
    {
        get { return new ReadOnlyCollection<OrderItem>(_items); }
    }

    public void Add(int id, int quantity)
    {
        _items.Add(new OrderItem(id, quantity));
    }
}

public class OrderItem
{
    public OrderItem(int id, int quantity)
    {
        Quantity = quantity;
        ProductId = id;
    }

    public int Quantity { get; /*private*/ set; }
    public int ProductId { get; /*private*/ set; }
}
However, direct access to elements in the collection is still possible—whether it is gained during a for-each loop, out of a LINQ query, or by index:
foreach (var i in order.Items)
{
    i.Quantity++;
    Console.WriteLine(i);
}
To prevent changes to the data within the collection, you have to make the setters of the OrderItem properties private.
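The following is a minimal sketch of how a read-only wrapper behaves once the OrderItem setters are private. The ReadOnlyDemo class and its method are hypothetical names introduced only for illustration; they are not part of the sample domain model.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class OrderItem
{
    public OrderItem(int id, int quantity)
    {
        ProductId = id;
        Quantity = quantity;
    }

    // Private setters: callers can read an item but not modify it.
    public int Quantity { get; private set; }
    public int ProductId { get; private set; }
}

// Hypothetical helper, for illustration only.
public static class ReadOnlyDemo
{
    // Returns the wrapper's Count before and after the underlying list grows.
    public static Tuple<int, int> CountBeforeAndAfterAdd()
    {
        var items = new List<OrderItem> { new OrderItem(1, 2) };
        var readOnly = new ReadOnlyCollection<OrderItem>(items);

        int before = readOnly.Count;
        items.Add(new OrderItem(2, 5));   // change the underlying list
        int after = readOnly.Count;       // the wrapper sees the new item

        // readOnly[0].Quantity++;        // would not compile: the setter is private
        return Tuple.Create(before, after);
    }
}
```

The wrapper is a live view over the list rather than a snapshot, so structural changes still flow through any method that owns the underlying list, such as Add on the Order class.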
This would work beautifully if it weren’t for yet another possible issue. Is it worthwhile to turn the OrderItem entity of the domain model into an immutable object?
Classes in the domain model are modified and made more and more complex because they can be used interchangeably in both query and command scenarios. Using the read-only wrapper, ultimately, is the first step toward making a read version of the Order entity.
From a domain model to a read model
When your goal is simply creating a domain model for read-only operations, everything comes easier and classes are simpler overall. Let’s look at a few varying points.
The notion of aggregates becomes less central, and with it the entire notion of the domain model as explained in Chapter 8. You probably still need to understand how entities aggregate in the model, but there’s no need to make this knowledge explicit through interfaces.
The overall structure of classes is more similar to data-transfer objects, and properties tend to be much more numerous than methods. Ideally, all you have are DTOs that map one-to-one with each screen in the application. Does that mean that the model becomes anemic? Well, the model is 100 percent anemic when made of just data. An Order class, for example, will no longer have an AddItem method.
Again, there’s no issue with CQRS having a 100 percent anemic read model. Methods on such classes can still be useful, but only as long as they query the object and provide a quick way for the presentation or application layer to work. For example, a method IsPending on an Order class can still be defined as follows:
public bool IsPending() { return State == OrderState.Pending; }
This method is useful because it makes the code that uses the Order class easier to read and, more importantly, closer to the ubiquitous language.
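As a rough sketch, a read-side Order might reduce to data plus query-only helpers. The property names and the OrderState members other than Pending are assumptions made for this example.

```csharp
using System;

// Only Pending appears in the text; the other members are hypothetical.
public enum OrderState { Pending, Shipped, Canceled }

// Read-side Order: data and query helpers only, no AddItem.
public class Order
{
    public Order(int id, DateTime placedOn, OrderState state, decimal total)
    {
        Id = id;
        PlacedOn = placedOn;
        State = state;
        Total = total;
    }

    public int Id { get; private set; }
    public DateTime PlacedOn { get; private set; }
    public OrderState State { get; private set; }
    public decimal Total { get; private set; }

    // Query helper that mirrors the ubiquitous language.
    public bool IsPending()
    {
        return State == OrderState.Pending;
    }
}
```

Calling code can then read as order.IsPending() instead of comparing states inline.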
Designing a read-model façade
The query stack might still need domain services to extract data from storage and serve it up to the application and presentation layers. In this case, domain services, and specifically repositories, should be retargeted to allow only read operations on the storage.
Restricting the database context
In the read stack, therefore, you don’t strictly need to have classic repositories with all CRUD methods and you don’t even need to expose all the power of the DbContext class, assuming you’re in an Entity Framework Code-First scenario, as described in Chapter 9, “Implementing the domain model,” and as it will be used in future chapters.
In Chapter 9, we had a class wrapping the Entity Framework DbContext and called it DomainModelFacade. The structure of the class is shown here:
public class DomainModelFacade : DbContext
{
    public DomainModelFacade() : base("naa4e-09")
    {
        Products = base.Set<Product>();
        Customers = base.Set<Customer>();
        Orders = base.Set<Order>();
    }

    public DbSet<Order> Orders { get; private set; }
    public DbSet<Customer> Customers { get; private set; }
    public DbSet<Product> Products { get; private set; }
    ...
}
The DbSet class provides full access to the underlying database and can be used to set up queries and update operations via LINQ-to-Entities. The fundamental step toward a query pipeline is limiting the access to the database to queries only. Here are some changes:
public class ReadModelFacade : DbContext
{
    // DbSet instances stay private; only IQueryable views are exposed.
    private readonly DbSet<Product> _products;
    private readonly DbSet<Customer> _customers;
    private readonly DbSet<Order> _orders;

    public ReadModelFacade() : base("naa4e-09")
    {
        _products = base.Set<Product>();
        _customers = base.Set<Customer>();
        _orders = base.Set<Order>();
    }

    public IQueryable<Customer> Customers { get { return _customers; } }
    public IQueryable<Order> Orders { get { return _orders; } }
    public IQueryable<Product> Products { get { return _products; } }
    ...
}
The collections that the business logic queries are now exposed via IQueryable<T> interfaces. We said that the notion of aggregates loses focus in a read model. However, queryable data in the read-model façade mostly corresponds to aggregates in a full domain model.
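To see the effect of exposing IQueryable<T> without setting up Entity Framework, here is a small in-memory stand-in. InMemoryReadModelFacade, FacadeDemo, and the Product properties are hypothetical names used only for this sketch; a real façade would wrap DbContext as shown above.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int UnitsInStock { get; set; }
}

// In-memory stand-in for the read façade (assumption: no database involved).
public class InMemoryReadModelFacade
{
    private readonly List<Product> _products;

    public InMemoryReadModelFacade(IEnumerable<Product> products)
    {
        _products = products.ToList();
    }

    // IQueryable<T> has no Add or Remove members, so callers can only query.
    public IQueryable<Product> Products
    {
        get { return _products.AsQueryable(); }
    }
}

public static class FacadeDemo
{
    // Names of products that are in stock, in alphabetical order.
    public static List<string> NamesInStock(InMemoryReadModelFacade db)
    {
        return db.Products
                 .Where(p => p.UnitsInStock > 0)
                 .OrderBy(p => p.Name)
                 .Select(p => p.Name)
                 .ToList();
    }
}
```

Code that receives the façade can filter, sort, and project, but the type it sees offers no way to insert or delete rows.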
Adjusting repositories
With a read-model façade, any attempt to access the database starts with an IQueryable object. You can still have a set of repository classes, populate them with a bunch of FindXxx methods, and use them from domain services and the application layer.
In doing so, you’ll certainly run into simple situations such as just needing to query all orders that have not been processed two weeks after they were placed. The FindXxx method can return a collection of Order objects:
IEnumerable<Order> FindPendingOrderAfter(TimeSpan timespan);
But there are also situations in which you need to get all orders whose total exceeds a threshold. In this case, you need to report order details (like ID, date of creation, state, payment details) as well as customer details (at least the name and membership status). And, above all, you need to report the total of the order. There’s no such type in the domain; you need to create it. OK, no big deal: it’s just a classic DTO type:
IEnumerable<OrderSummary> FindOrdersBeyond(decimal threshold);
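One possible shape for such a method is sketched below. The entity and DTO members are assumptions for illustration (payment details are left out for brevity), and the repository runs over any IQueryable<Order>, so the sketch can be exercised with an in-memory list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Name { get; set; }
    public bool IsMember { get; set; }
}

public class OrderItem
{
    public decimal UnitPrice { get; set; }
    public int Quantity { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public DateTime CreatedOn { get; set; }
    public string State { get; set; }
    public Customer Customer { get; set; }
    public List<OrderItem> Items { get; set; }
}

// Hypothetical DTO: order details, customer details, and the computed total.
public class OrderSummary
{
    public int OrderId { get; set; }
    public DateTime CreatedOn { get; set; }
    public string State { get; set; }
    public string CustomerName { get; set; }
    public bool CustomerIsMember { get; set; }
    public decimal Total { get; set; }
}

public class OrderRepository
{
    private readonly IQueryable<Order> _orders;

    public OrderRepository(IQueryable<Order> orders)
    {
        _orders = orders;
    }

    public IEnumerable<OrderSummary> FindOrdersBeyond(decimal threshold)
    {
        // Filter on the computed total first, then project into the DTO.
        return _orders
            .Where(o => o.Items.Sum(i => i.UnitPrice * i.Quantity) > threshold)
            .Select(o => new OrderSummary
            {
                OrderId = o.Id,
                CreatedOn = o.CreatedOn,
                State = o.State,
                CustomerName = o.Customer.Name,
                CustomerIsMember = o.Customer.IsMember,
                Total = o.Items.Sum(i => i.UnitPrice * i.Quantity)
            })
            .ToList();
    }
}
```

Every new report shaped differently from OrderSummary would need another DTO and another method like this one.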
All is good if the OrderSummary DTO is general enough to be used in several repository queries. If it is not, you end up with too many DTO classes that are also too similar, which ultimately also poses a problem with names. But beyond the name and quantity of DTOs, there’s another underlying issue here: the number of repository methods and their names and implementation. Readability and maintainability are at stake.
A common way out is leaving only common queries as methods in the repositories that return common DTOs and handling all other cases through predicates:
public IEnumerable<T> Find(Expression<Func<T, Boolean>> predicate)
In this case, though, you’re stuck with using type T, and it might not be easy to massage any queried data into a generic DTO within a single method.
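A minimal sketch of a predicate-based repository makes the limitation visible. ReadRepository<T> and the simplified Order class are hypothetical; the source is any IQueryable<T>, such as a property of the read façade.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Order
{
    public int Id { get; set; }
    public decimal Total { get; set; }
}

// Hypothetical generic read repository.
public class ReadRepository<T>
{
    private readonly IQueryable<T> _source;

    public ReadRepository(IQueryable<T> source)
    {
        _source = source;
    }

    // The predicate is an expression tree, so a LINQ provider can translate it.
    // The result is always a sequence of T; no projection happens here.
    public IEnumerable<T> Find(Expression<Func<T, bool>> predicate)
    {
        return _source.Where(predicate).ToList();
    }
}
```

A call such as repository.Find(o => o.Total > 100m) returns full Order objects. Any reshaping into a DTO then happens in memory, after all the columns of each matching row have already been loaded.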
Layered expression trees
Over the past 20 years of developing software, we have seen a recurring pattern: when a common-use solution gets overwhelmingly complex and less and less manageable over time, it’s probably because it doesn’t address the problem well. At that point, it might be worth investigating a different approach to the problem. The different approach we suggest here to reduce the complexity of repositories and DTOs in a read model leverages the power of LINQ and expression trees.
Realistic scenarios
Let’s focus first on a few realistic scenarios where you need to query data in many different ways that are heavily dependent on business rules:
- Online store: Given the profile of the user, the home page of the online store will present the three products that match the profile with the highest inventory level. It results in two conceptual queries: getting all products available for sale, and getting the three products with the highest inventory level that might be interesting to the user. The first query is common and belongs to some domain service. The second query is application specific and belongs to the application layer.
- ERP: Retrieve all invoices of a business unit that haven’t been paid 30 days after their due payment terms. There are three conceptual queries here: getting all invoices, getting all invoices for the business unit, and getting all invoices for the business unit that are unpaid 30 days later. The first two queries are common and belong to some domain services. The third query sounds more application specific.
- CMS: Retrieve all articles that have been published and, among them, pick those that match whatever search parameters have been specified. Again, it’s two conceptual queries: one domain-specific and one application-specific.
Why did we use the term conceptual query?
If you look at it conceptually, you see distinct queries. If you look at it from an implementation perspective, you just don’t want to have distinct queries. Use-cases often require queries that can be expressed in terms of filters applied over some large sets of data. Each filter expresses a business rule; rules can be composed and reused in different use-cases.
To get this, you have two approaches:
- Hide all filters in a repository method, build a single super-optimized query, run it, and return results. Each result is likely a different DTO. In doing this, you’re going to have nearly one method for each scenario and new or modified methods when something changes. The problem is not facing change; the problem is minimizing the effort (and risk of regression) when change occurs. Touching the repository interface is a lot of work because it might have an impact on upper layers. If you can make changes only at the application level, it would be much easier to handle and less invasive.
- Try LINQ and expression trees.
Let’s see what it takes to use layered expression trees (LET).
Using IQueryable as your currency
The idea behind LET is enabling the application layer to receive IQueryable<T> objects wherever possible. In this way, the required query emerges through the composition of filters and the actual projection of data is specified at the last minute, right in the application layer where data is being used to generate the view model for the presentation to render.
With this idea in mind, you might not need repositories in a read model at all, except perhaps as a container of common queries that return direct and immediately usable data that likely will not be filtered any further. A good example of a method you might still want to have in a separate repository class is FindById.
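For illustration, such a method could be as small as the following sketch. ProductRepository and the Product members are assumed names; the source would typically be a property of the read façade.

```csharp
using System.Linq;

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical repository that holds only direct, ready-to-use queries.
public class ProductRepository
{
    private readonly IQueryable<Product> _products;

    public ProductRepository(IQueryable<Product> products)
    {
        _products = products;
    }

    // Returns the matching product, or null when the ID is unknown.
    public Product FindById(int id)
    {
        return _products.SingleOrDefault(p => p.Id == id);
    }
}
```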
You can use the public properties of the aforementioned read façade as the starting point to compose your queries. Or, if necessary, you can use ad hoc components for the same purpose. In this way, in fact, you encapsulate the read-model façade—still a point of contact with persistence technology—in such components. Here’s what the query to retrieve three products to feature on the home page might look like. This code ideally belongs to the application layer:
var queryProducts = (from p in CatalogServices.GetProductsAvailableForSale()
                     orderby p.UnitsInStock descending
                     select new ProductDescriptor
                     {
                         Id = p.Id,
                         Name = p.Name,
                         UnitPrice = p.UnitPrice,
                         UnitsInStock = p.UnitsInStock,
                     }).Take(3);
Here’s another example that uses the recommended async version of LINQ methods:
var userName = _securityService.GetUserName();
var currentEmployee = await _database
    .Employees
    .AsNoTracking()
    .WhereEmployeeIsCurrentUser(userName)
    .Select(employee => new CurrentEmployeeDTO
    {
        EmployeeId = employee.Id,
        FirstName = employee.PersonalInformation.FirstName,
        LastName = employee.PersonalInformation.LastName,
        Email = employee.PersonalInformation.Email,
        Identifier = employee.PersonalInformation.Identifier,
        JobTitle = employee.JobTitle,
        IsManager = employee.IsTeamManager,
        TeamId = employee.TeamId,
    })
    .SingleOrDefaultAsync();

// SingleOrDefaultAsync yields null when no employee matches.
if (currentEmployee != null)
{
    currentEmployee.PictureUrl = Url.Link("EmployeePicture", new { employeeId = currentEmployee.EmployeeId });
}
As you might have noticed, the first code snippet doesn’t end with a call to ToList, First, or similar methods. So it is crucial to clarify what it means to work with IQueryable objects.
The IQueryable interface allows you to define a query against a LINQ provider, such as a database. The query, however, has deferred execution and consequently can be built in multiple steps. No database access is performed until you call an execution method such as ToList. For example, when you query all products on sale, you’re not retrieving all 200,000 records that match those criteria. When you add Take(3), you’re just refining the query. The query executes when the following code is invoked:
var featuredProducts = queryProducts.ToList();
The SQL code that hits the database has the following template:
SELECT TOP 3 ... WHERE ...
In the end, you pass the IQueryable object through the layers and each layer can add filters along the way, making the query more precise. You typically resolve the query in the application layer and get just the subset of data you need in that particular use-case.
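The layering can be sketched with an in-memory source standing in for the database. Here CatalogServices is written as an instance class that receives its IQueryable<Product>, which is an assumption made for testability, and HomePageService and the Discontinued property are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public string Name { get; set; }
    public int UnitsInStock { get; set; }
    public bool Discontinued { get; set; }
}

public class ProductDescriptor
{
    public string Name { get; set; }
    public int UnitsInStock { get; set; }
}

// Domain-level service: owns the common filter.
public class CatalogServices
{
    private readonly IQueryable<Product> _products;

    public CatalogServices(IQueryable<Product> products)
    {
        _products = products;
    }

    public IQueryable<Product> GetProductsAvailableForSale()
    {
        return _products.Where(p => !p.Discontinued && p.UnitsInStock > 0);
    }
}

// Application layer: refines the query and chooses the projection.
public static class HomePageService
{
    // Only defines the query; nothing is executed here.
    public static IQueryable<ProductDescriptor> BuildFeaturedQuery(CatalogServices catalog)
    {
        return catalog.GetProductsAvailableForSale()
                      .OrderByDescending(p => p.UnitsInStock)
                      .Select(p => new ProductDescriptor
                      {
                          Name = p.Name,
                          UnitsInStock = p.UnitsInStock
                      })
                      .Take(3);
    }

    // Execution happens at the ToList call.
    public static List<ProductDescriptor> GetFeaturedProducts(CatalogServices catalog)
    {
        return BuildFeaturedQuery(catalog).ToList();
    }
}
```

Because execution is deferred, a product added to the underlying source after BuildFeaturedQuery returns, but before ToList is called, still shows up in the results. Against a database, the same composition is what lets the provider emit a single statement for the final, fully refined query.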
Isn’t LET the same as an in-memory list?
No, LET is not the same as having an in-memory list and querying it via LINQ-to-Objects. If you load all products in memory and then use LINQ to extract a subset, you’re discarding tons of data you pulled out of the database.
LET still performs a database access using the best query that the underlying LINQ provider can generate. However, IQueryable works transparently on any LINQ provider. So if the aforementioned method GetProductsAvailableForSale internally uses a static list of preloaded Product instances, the LET approach still works, except that it leverages LINQ-to-Objects instead of the LINQ dialect supported by the underlying database access layer.
Using LET is not the same as having a static list, but that doesn’t mean having a static list is a bad thing. If you see benefits in keeping, say, all products in memory, a static list is probably a good approach. LET is a better approach if the displayed data is read from some database every time.
Upsides of LET
The use of LET has several benefits. The most remarkable benefit is that you need almost no DTOs. More precisely, you don’t need DTOs to carry data across layers. If you let queries reach the application layer, all you do is fetch data directly in the view model classes. On the other hand, a view model is unavoidable because you still need to pass data to the user interface in some way.
Another benefit is that the code you write feels natural. It’s really like you’re using the database directly, except that the language is much easier to learn and use than plain-old T-SQL.
Queries are DDD-friendly because their logic closely follows the ubiquitous language, and sometimes it seems that domain experts wrote the queries. Among other things, DDD-friendly queries are also helpful when a customer calls to report a bug. You look into the section of the code that produces unexpected results and read the query. You can almost read your code to the customer and quickly figure out whether the reason unexpected data is showing up on the screen is logical (you wrote the wrong query) or technical (the implementation is broken). Have a look at the following code:
var db = new ReadModelFacade();
var model = from i in db.IncomingInvoices
                        .ForBusinessUnit(buId)
                        .Expired()
            orderby i.PaymentDueDate
            select new SummaryViewModel.Invoice
            {
                Id = i.ID,
                SupplierName = i.Party.Name,
                PaymentDueDate = i.PaymentDueDate.Value,
                TotalAmount = i.TotalPrice,
                Notes = i.Notes
            };
The code filters all invoices to retrieve those charged to a given business unit that haven’t been paid yet. Methods like ForBusinessUnit and Expired are (optional) extension methods on the IQueryable type. All they do is add a WHERE clause to the final query:
public static IQueryable<Invoice> ForBusinessUnit(this IQueryable<Invoice> query, int buId)
{
    var invoices = from i in query
                   where i.BusinessUnit.OrganizationID == buId
                   select i;
    return invoices;
}
Last but certainly not least, LET fetches all data in a single step. The resulting query might be complex, but it is not necessarily too slow for the application. Here we can’t help quoting the timeless wisdom of Donald Knuth: “Premature optimization is the root of all evil.” As Andrea repeats in every class he teaches, three things are really important in the assessment of enterprise architecture: measure, measure, and measure. We’re not here to say that LET will always outperform any other solution, but before looking for alternative solutions and better SQL code, first make sure you have concrete evidence that LET doesn’t work for you.
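The Expired filter is not shown in the text. One plausible sketch follows, under stated assumptions: the Invoice class has a hypothetical IsPaid flag, and the method takes the reference date as a parameter (unlike the parameterless call above) so that it can be exercised deterministically.

```csharp
using System;
using System.Linq;

// Simplified invoice; IsPaid is a hypothetical property.
public class Invoice
{
    public int ID { get; set; }
    public DateTime? PaymentDueDate { get; set; }
    public bool IsPaid { get; set; }
}

public static class InvoiceQueryExtensions
{
    // Adds a filter for unpaid invoices whose due date is before asOf.
    // Invoices without a due date are never considered expired.
    public static IQueryable<Invoice> Expired(this IQueryable<Invoice> query, DateTime asOf)
    {
        return query.Where(i => !i.IsPaid
                                && i.PaymentDueDate != null
                                && i.PaymentDueDate < asOf);
    }
}
```

Like ForBusinessUnit, the method returns an IQueryable<Invoice>, so further filters, ordering, and the final projection can still be chained onto it before anything executes.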
Downsides of LET
Overall, LET is a solution you should always consider, but like anything else it is not a silver bullet. Let’s see which factors might make it less than appealing.
The first point to consider is that LET works beautifully on top of SQL Server and Entity Framework, but there’s no guarantee it can do the same when other databases and, more importantly, other LINQ providers are used.
LET sits in between the application layer and persistence in much the same way repositories do. So is LET a general abstraction mechanism? The IQueryable interface is, in effect, an abstraction layer. However, it strictly depends on the underlying LINQ provider, how it maps expression trees to SQL commands, and how it performs. We can attest that things always worked well on top of Entity Framework and SQL Server. Conversely, we experienced trouble using LET on top of the LINQ provider you find in NHibernate. Overall, the argument that LET is a leaky abstraction over persistence is acceptable in theory.
In practice, though, not all applications are really concerned about switching the data-access engine. Most applications just choose one engine and stick to that. If the engine is SQL Server and you use Entity Framework, the LET abstraction is not leaky. But we agree that if you’re building a framework that can be installed on top of your database of choice, repositories and DTOs are probably a better abstraction to use.
Finally, LET doesn’t work over tiers. Is this a problem? Tiers are expensive, and we suggest you always find a way to avoid them. Yet sometimes tiers provide more scalability. However, as far as scalability is concerned, let us reiterate a point we made in a past chapter: if scalability is your major concern, you should also consider scaling out by keeping the entire stack on a single tier and running more instances of it on a cloud host such as Microsoft Azure.