Simple Architectures for Complex Enterprises: A Case Study in Complexity
- 5/7/2008
- Overview of NPfIT
- Current Status of NPfIT
- The SIP Approach
- Summary
The SIP Approach
Clearly, NPfIT is a very expensive project in very deep trouble. But could SIP have helped? Let’s look at how the SIP process would have likely played out with NPfIT.
Let’s start in Phase 1. The first deliverable of Phase 1 is an audit of organizational readiness. Such an audit would have revealed deep distrust between the NHS IT organization and the business units (health care providers). This would have been an immediate sign of concern.
Also in Phase 1 we would have delivered extensive training in the nature of complexity. We would have spent considerable time discussing how important it was that complexity, especially on such a massive undertaking as NPfIT, be managed as the absolute highest priority.
In Phase 2, we would have been working on the partitioning. In the case of NPfIT, considerable work had already gone into partitioning; Figure 6-1 could be viewed as an ABC diagram of NPfIT. The question is, does that diagram represent good partitioning? Is it even a partitioning (in the mathematical sense) at all?
Figure 6-1 does not give us enough information to answer this question. We need to understand not only how the organization is being decomposed into sets of functionality, but what the type relationships are between those sets.
So let’s tackle this. Figure 6-3 shows an ABC diagram of the clinical information part of NPfIT (the part that owns 80 percent of the NPfIT budget), focusing on types, implementations, and deployments. Compare this figure to Figure 6-1.
Figure 6-3. ABC diagram of NPfIT regional CIS.
In Figure 6-3, the central problem of NPfIT sticks out like a sore thumb (a $10 billion sore thumb). They implemented the various regional clinical information systems as siblings (in SIP talk) rather than as clones. In other words, they created five different implementations of the same system. The same very complex system.
Interestingly, NHS did this on purpose. Now why, you might ask, would anybody take a highly complex system that they would be lucky to implement properly once and tempt the fates with five completely different implementations created by five completely different vendors?
The reason NHS gave for the multiple implementations was that it didn’t want to be dependent on any one vendor. This example illustrates a common reason that so many projects become so complex so quickly: poor communication between the business and IT units.
Somebody in the business group decides on some business requirement—say, X. In this case, X can be stated as, “There must be no dependency on any one vendor for the regional CIS portion of NPfIT.” X gets passed around. It sounds reasonable. Who wants to be dependent on one vendor? X is accepted as a business requirement. It drives a series of technical requirements. In this case, the technical requirement is that there must be five independent implementations of the regional CIS.
Everything seems reasonable. A reasonable business requirement driving the necessary technical requirements. So what would have been done differently using SIP?
A SIP process would have encouraged this business requirement to have been measured against the complexity it would introduce. Complexity, in the SIP world, trumps almost everything. The diagram in Figure 6-3 would have been a warning sign that we have a huge amount of unnecessary complexity. Because both the business and technical folks would have already been through the SIP training, they would understand the frightening implications of complexity. On a project of this scope, the project motto should be, “Our Top Three Concerns: Complexity, Complexity, Complexity.”
Given a common conditioned response to complexity, it would have been easy to discuss the importance of this particular business requirement relative to its cost. We would have asked some pointed questions. Is it really necessary to be vendor independent? Is the multibillion dollar cost worth vendor independence? Is meeting this requirement going to put the project at more risk than if we dropped this requirement? Is it even possible to be vendor independent? Are multiple implementations the only way to achieve vendor independence? Would parallel implementations, with one chosen in a final shootout, be a better approach to achieving vendor independence?
I don’t know which solution would have been chosen in a SIP approach. But I know one solution that would not have been chosen: five independent implementations of the same type. This is an extreme case of an unpartitioned architecture. And an unpartitioned architecture, in a SIP analysis, is unacceptable. It is not unacceptable because one person or another doesn’t like the diagrams it produces. It is unacceptable because it fails to satisfy the mathematical models that predict whether or not the architecture can be successful.
So by the end of Phase 2, we would have dropped four of the five proposed implementations for regional clinical information systems. Expected complexity reduction: 80 percent.
But we aren’t done yet. Next we enter Phase 3, the phase in which we simplify our partition. I’ll continue my focus on the regional CIS portion of NPfIT.
Of course, we have already done quite a bit to simplify the regional CIS portion. We have eliminated 80 percent of the work, but we are still left with a highly complex system. What is the best way to simplify a highly complex system? If you have been following the SIP discussion, the answer should be obvious: partitioning. The most effective way to tame the regional CIS monster is to partition it into four or five subsets, each with synergistic functionality, and each with functionality that is autonomous with respect to the functionality in the other subsets.
One possible partition of subsets might include, for example, patient registration, appointment booking, prescriptions, patient records, and lab and radiology tests.
To explain this in SIP terminology, we have taken an autonomous business capability (ABC) that includes the regional CIS and decomposed it into five lower level ABCs. Figure 6-4 shows the regional CIS ABC before and after this process.
Figure 6-4. Decomposition of regional CIS.
At this point, we check the resulting partition against the five Laws of Partitions. The First Law says that all the original functionality of the regional CIS must end up in one and only one of the subsets. The Second Law says that the subsets must make sense from an organizational perspective. The Third Law says that there should be a reasonable number of subsets in the partition. The Fourth Law says that subsets must be roughly equal in size and stature. The Fifth Law says that subset interactions must be minimal and well regulated. The first four laws can be checked relatively easily. The fifth law needs to be revisited after we have more details about the technical architecture.
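As a rough illustration (not part of SIP itself), here is a sketch of how the mechanically checkable laws might be verified for this partition. The function names, subset assignments, and numeric thresholds are invented placeholders; only the structure of the check matters.

```python
# Sketch: checking a proposed partition of the regional CIS against the Laws
# of Partitions that can be tested mechanically. All data here is hypothetical.

REGIONAL_CIS_FUNCTIONS = {
    "register patient", "update demographics",        # registration
    "search clinic slots", "book appointment",        # booking
    "write prescription", "renew prescription",       # prescriptions
    "view patient record", "update patient record",   # records
    "order lab test", "report lab result",            # lab tests
}

PROPOSED_PARTITION = {
    "Patient Registration": {"register patient", "update demographics"},
    "Appointment Booking":  {"search clinic slots", "book appointment"},
    "Prescriptions":        {"write prescription", "renew prescription"},
    "Patient Records":      {"view patient record", "update patient record"},
    "Lab Tests":            {"order lab test", "report lab result"},
}

def check_first_law(functions, partition):
    """Every function must end up in one and only one subset."""
    seen = [f for subset in partition.values() for f in subset]
    return set(seen) == functions and len(seen) == len(set(seen))

def check_third_law(partition, low=3, high=10):
    """There should be a reasonable number of subsets (bounds are illustrative)."""
    return low <= len(partition) <= high

def check_fourth_law(partition, max_ratio=3.0):
    """Subsets should be roughly equal in size (ratio threshold is illustrative)."""
    sizes = [len(s) for s in partition.values()]
    return max(sizes) / min(sizes) <= max_ratio

print("First Law :", check_first_law(REGIONAL_CIS_FUNCTIONS, PROPOSED_PARTITION))
print("Third Law :", check_third_law(PROPOSED_PARTITION))
print("Fourth Law:", check_fourth_law(PROPOSED_PARTITION))
```

With this hypothetical data the First, Third, and Fourth Laws all pass; the Second Law (organizational sense) and the Fifth Law (minimal, well-regulated interactions) still require human judgment and the technical architecture.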
The partitioning of the regional CIS ABC will likely result in a huge further reduction in complexity. How much? The mathematical models predict possible reductions of more than 99.99 percent. These are based on theoretical numbers, not real-world numbers, but as I discussed in Chapter 3, 90-percent reductions in the real world are likely. And remember, we have already removed 80 percent of the complexity, so now we are removing 90 percent of the 20 percent that is left. This means that realistically we are now down to perhaps 2 percent of the complexity with which we started.
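Spelled out as arithmetic, the two reductions compound multiplicatively:

\[
(1 - 0.80) \times (1 - 0.90) = 0.20 \times 0.10 = 0.02
\]

that is, roughly 2 percent of the original complexity remains.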
And there is yet more we can do to reduce complexity. We can look at reducing both the functionality footprint (the amount of functionality in the final system) and the implementation footprint (the impact on the IT staff).
Reducing the functionality footprint means re-examining all the business and technical requirements and confirming that, first of all, every business requirement is absolutely necessary, and second of all, that every technical requirement can be traced back to a business requirement. Remember that we have already found one business requirement (vendor independence) that is either unnecessary or highly suspect.
Reducing the implementation footprint means looking for opportunities to consolidate or outsource subsets. The type information we have generated on the ABCs will be a great help in our efforts to reduce the implementation footprint.
The next phase is Phase 4, in which we prioritize the subsets making up the partition. Again, I will focus on the regional CIS portion of NPfIT.
In Phase 3, we identified five subsets of the regional CIS that together form a partition:
Patient Registration
Appointment Booking
Prescriptions
Patient Records
Lab Tests
In the actual NHS plan, this functionality was delivered en masse. In the SIP approach, we want to deliver this functionality iteratively. In Phase 4, we decide on the order of iteration.
Iteration order should be based on risk, cost, and benefit. The basic rule of thumb is to go for the low-hanging fruit first. In the SIP world, low-hanging fruit is defined as ABCs that are highly visible, low cost, and low risk. These criteria are sometimes at odds with each other (although, in my experience, less often than people think). The best way to sort this out is with the Value Graph Analysis that I described in Chapter 5. If we were using Value Graph Analysis in this project, we would have standardized the analysis back in Phase 1 of the project.
What usually makes an ABC “high visibility” is its association with organizational pain points. Let’s say, for example, that NHS was notorious for the length of time it took to book appointments. This factor would tend to move Appointment Booking ahead in the priority list. Lab Tests, on the other hand, might be something that is already handled reasonably well. Lab Tests might still be worth doing, say, because it can reduce the cost of processing lab tests, but without high visibility, it doesn’t rate as a high priority.
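Purely as an illustration of how visibility, cost, and risk might be traded off, here is a crude weighted-score sketch. It is not the actual Value Graph Analysis of Chapter 5, and every score in it is invented.

```python
# Illustrative only: rank candidate ABCs by visibility, cost, and risk using a
# simple weighted score. The scores and weights are invented; the real
# prioritization would come out of the Value Graph Analysis.

candidates = {
    # ABC name:            (visibility, cost, risk), each scored 1 (low) to 5 (high)
    "Appointment Booking":  (5, 2, 2),
    "Patient Registration": (4, 2, 2),
    "Prescriptions":        (3, 2, 2),
    "Patient Records":      (3, 3, 3),
    "Lab Tests":            (1, 3, 2),
}

def priority(scores, weights=(1.0, 1.0, 1.0)):
    """Higher visibility raises priority; higher cost and risk lower it."""
    visibility, cost, risk = scores
    w_vis, w_cost, w_risk = weights
    return w_vis * visibility - w_cost * cost - w_risk * risk

ranked = sorted(candidates, key=lambda abc: priority(candidates[abc]), reverse=True)
for abc in ranked:
    print(f"{abc:22s} priority = {priority(candidates[abc]):+.1f}")
```

With these made-up scores, the ranking happens to reproduce the iteration order chosen below.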
Let’s say that at the end of Phase 4 we have decided on the following order of iterations:
Appointment Booking
Patient Registration
Prescriptions
Patient Records
Lab Tests
Next is Phase 5, the iterative phase. As I have said, Phase 5 is the one in which we have the fewest opinions, other than that the candidate ABCs be implemented in an iterative fashion and that the order follow the priority laid out in Phase 4. The implementation of an ABC is effectively a solution architecture and implementation issue, and I’m assuming that an organization already has processes in place to create and implement a solution architecture. You might, for example, use The Open Group Architecture Framework (TOGAF), with its emphasis on process and current and future architectures. You might use some of the Federal Enterprise Architecture (FEA) characterizations of functionality given in its Reference Models. You might use Zachman’s taxonomy to ensure that you are considering each important perspective on each important capability of the system. You might use IBM’s Rational Unified Process or the Microsoft Solutions Framework to guide the implementation process. These are all outside the scope of SIP.
But the iterative approach is not outside the scope of SIP. I believe that the first ABC should be implemented, tested, approved, deployed, and embraced before the next one is started. Such an approach allows you to learn your lessons as cheaply as possible and apply them as broadly as possible. It also helps you build enthusiasm for the overall project. Nothing succeeds, as they say, like success. Success attracts success. Let’s see how such an approach might have benefited NPfIT. We could look at any number of issues plaguing NPfIT. Let’s consider one that I haven’t discussed yet: risk management.
LORENZO was an existing product developed by iSOFT, and NHS was impressed with LORENZO’s user-friendly screens and broad CIS functionality. For this reason, NHS encouraged its use as a core component for all of its regional CIS systems.
Accenture seemed similarly impressed with LORENZO. In June 2004, Accenture/iSOFT released a joint press release saying,
A set of information processing tools promoting governance, quality, efficiency, and consent in healthcare, LORENZO facilitates the free flow of information among the entire healthcare community, including general practitioners, hospitals and patients. As Accenture deploys LORENZO across the two regions, the software’s unified architecture will form the basis of solutions tailored to meet local requirements and information needs of healthcare professionals.39
But there was a hidden time bomb in LORENZO. This time bomb can be summed up in two words: client/server.
According to a performance audit of LORENZO conducted in April 2006 by Health Industry Insights and commissioned by iSOFT, the LORENZO architecture as it existed in 2004 was “based on a fat client/server model.”40 Accenture was either blissfully unaware that LORENZO was a client/server system or ignorant of the issues one faces with client/server architectures.
What is the problem with client/server models? The client/server architecture is based on a two-machine configuration. One machine (the “client”) contains the user-interface code and the business logic. The other machine (the “server”) contains the code that manages data in the database.
The two machines are “connected” by database connections. A database connection is created when a client machine requests access rights to the database owned by the server. The database looks at the credentials of the requesting machine, and, if it is satisfied, creates a database connection. A database connection is technically a block of data that the client presents to the server machine when making data access requests. When the client machine is ready to shut down for the day, it releases its database connection by letting the server machine know that it will no longer require the services of the database.
There are several reasons that client/server architectures are so popular. For one, they are very fast. They are fast because the client machine requests the database connection (a highly expensive request) only once, at the beginning of the day, when the client machine is first started.
Client/server systems are also easy to implement because the code that presents the data (the “user interface logic”) is located in the same process as the code that manipulates the data (the “business logic”). This makes it easy to mingle the presentation logic and the business logic, with the result of lightning-fast data presentation and manipulation.
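To make that intermingling concrete, here is an invented fat-client fragment; the table, fields, and business rules are hypothetical, and SQLite merely stands in for the real database driver. Data access, business rules, and screen formatting all live in one function on the client, sharing the client’s one dedicated connection.

```python
# Hypothetical fat-client code: presentation logic and business logic mingled
# in one function on the client machine. Table, fields, and rules are invented.
import sqlite3

conn = sqlite3.connect(":memory:")   # the client's one dedicated connection
conn.execute("CREATE TABLE prescriptions (patient_id, drug, dose_mg, allergy_flag)")
conn.execute("INSERT INTO prescriptions VALUES (1, 'amoxicillin', 250, 0)")
conn.execute("INSERT INTO prescriptions VALUES (1, 'methotrexate', 600, 1)")

def show_patient_prescriptions(patient_id):
    rows = conn.execute(
        "SELECT drug, dose_mg, allergy_flag FROM prescriptions WHERE patient_id = ?",
        (patient_id,),
    ).fetchall()
    for drug, dose_mg, allergy_flag in rows:
        warning = " *** ALLERGY RISK ***" if allergy_flag else ""   # business rule...
        if dose_mg > 500:                                           # ...and another one,
            warning += " (dose above usual maximum)"                # buried in UI code
        print(f"{drug:15s} {dose_mg:5d} mg{warning}")               # presentation

show_patient_prescriptions(1)
```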
So back to my original question. What is wrong with a client/server architecture? Actually, there is only one problem with client/server systems. They do not scale. Although they work great for small numbers of users (measured, say, in the dozens), they do not work at all well for large numbers of users (measured, say, in the thousands). And the user requirements of NPfIT were measured in the tens of thousands.
The reason client/server architectures do not scale well is that each client machine requires a dedicated database connection. Databases are limited in the number of database connections they can support. When each client requires a dedicated database connection, the number of client machines is limited by the number of database connections supported by the database. And because client machines are in a one-to-one relationship to users, this limits the number of users who can use the system at any one time.
So a client/server architecture, with its extreme limitation on numbers of clients, is a problem for NPfIT. A big problem.
To address the scalability limitations of client/server architectures, a new style of technical architecture was developed in the early 1990s and was quite mature by the end of that decade. This new style of technical architecture is known as three-tier.
In a three-tier architecture, one machine runs the database, as it had in the client/server architecture. But now the user-interface logic and the business logic are separated. The user-interface logic lives on the machine before which the human being sits. But the business logic lives on another machine. This machine is often referred to as the “middle tier” because it conceptually lives in between the user interface machine and the database machine.
It is the middle tier machine that owns the database connections. This arrangement allows a pooling of those very expensive database connections so that when a database connection is not being used by one client, it can be used by another.
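Here is a minimal sketch of the pooling idea, using invented classes rather than any particular middle-tier product. A fixed number of expensive connections is created once; each incoming request borrows one and returns it, so thousands of client requests can share a few dozen connections.

```python
# Minimal sketch of middle-tier connection pooling (invented classes, not a
# real product): a fixed set of expensive connections is created once and
# shared among many clients, instead of one dedicated connection per client.
import queue

class ExpensiveConnection:
    def query(self, sql):
        return f"results of {sql!r}"        # stands in for a real database call

class ConnectionPool:
    def __init__(self, size):
        self._pool = queue.Queue()
        for _ in range(size):               # pay the connection cost only 'size' times
            self._pool.put(ExpensiveConnection())

    def execute(self, sql):
        conn = self._pool.get()             # borrow a connection (blocks if all busy)
        try:
            return conn.query(sql)
        finally:
            self._pool.put(conn)            # return it so another client can use it

# 10,000 client requests served through only 25 shared connections.
pool = ConnectionPool(size=25)
for request_number in range(10_000):
    pool.execute("SELECT * FROM appointments WHERE clinic_id = 42")
print("served 10,000 requests with 25 connections")
```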
So the obvious issue that iSOFT faced with its LORENZO product, back in 2004, was how to take a product based on a fundamentally nonscalable architecture and turn it into a scalable system. There is really only one answer to this problem. The company had to rearchitect LORENZO from a client/server architecture to some variation of a three-tier architecture.
This, according to that previously quoted audit, is exactly what iSOFT did. In fact, the company decided that it would go one better. It would bypass the three-tier architecture and move directly to an even more advanced architecture known as service-oriented architecture (SOA). An SOA is essentially an architecture in which the middle tier has been split further apart, with business functionality distributed over a number of middle-tier-like machines, each using industry-standard service-oriented messages as a communications protocol.
As the audit stated,
this new [LORENZO] architecture... utilizes a service oriented architecture (SOA) ... making iSOFT the first major CIS vendor worldwide to base its overall architecture principally on SOA. This architecture will serve as the foundation for the entire line of LORENZO solutions, allowing different subsets or combinations of existing and planned functional capabilities to be delivered on a common technical platform. For both iSOFT and its clients, this strategy will facilitate the ability to cost-effectively configure and scale CIS applications to meet a wide range of organizational models and functional demands...because the client machine is almost entirely focused on working with the human client.
Although this transformation from client/server to SOA was absolutely necessary from a scalability perspective, it was also something else: highly risky.
Many organizations have “ported” three-tier architectures to SOAs. This process is usually straightforward because the two architectures are so similar. However, LORENZO, remember, was not a three-tier architecture. It was a client/server architecture.
The transformation from client/server to either three-tier or SOA is rarely straightforward. Either process requires massive changes to the underlying programs. All of that nicely intermingled user-interface and business logic needs to be painstakingly located and laboriously separated. More often than not, it is less expensive to re-implement the system from scratch than to try to make (and debug) the necessary changes. So while LORENZO might have been a wonderful product, it was a product that would have to be rewritten from the ground up to meet the needs of NPfIT. And further, it would need to be rewritten by a group that had no previous experience in either three-tier architectures or SOAs, both of which are highly specialized areas.
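Continuing the invented prescriptions fragment from the client/server sketch above, this is what the separation looks like once the work is done: the middle tier owns the data access and the business rules, and the client keeps only the presentation. Getting from the mingled version to this one, across an entire CIS, is the painstaking part.

```python
# The same hypothetical prescriptions example, teased apart. Names and rules
# remain invented.

# --- middle tier: data access + business logic --------------------------------
def get_prescription_warnings(rows):
    """Apply business rules to raw rows; no formatting, no printing."""
    results = []
    for drug, dose_mg, allergy_flag in rows:
        warnings = []
        if allergy_flag:
            warnings.append("allergy risk")
        if dose_mg > 500:
            warnings.append("dose above usual maximum")
        results.append({"drug": drug, "dose_mg": dose_mg, "warnings": warnings})
    return results

# --- client: presentation only -------------------------------------------------
def render_prescriptions(prescriptions):
    for p in prescriptions:
        flags = f"  [{'; '.join(p['warnings'])}]" if p["warnings"] else ""
        print(f"{p['drug']:15s} {p['dose_mg']:5d} mg{flags}")

rows = [("amoxicillin", 250, 0), ("methotrexate", 600, 1)]   # fetched by the middle tier
render_prescriptions(get_prescription_warnings(rows))
```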
There is no way to know if Accenture knew about this high-risk factor back in 2004. It should have. Any reasonably competent architect could have looked at the LORENZO code and recognized the unmistakable fingerprint of a client/server architecture. But there was no indication in its joint press release that this issue was understood or that the risk factor had been addressed.
The indications are that by the time the limitations of LORENZO’s architecture were understood, three of the five regional clusters of NPfIT were in serious trouble and Accenture was so deeply in over its head that it was ready to jump from the sinking ship.
An iterative approach to delivering the regions would not have made the iSOFT architectural limitations any less real. But it would have made them obvious much earlier in the project. While it might have been too late to save all three regions that had bet on LORENZO, at least two of the regions could have learned from the painful lessons of the first. Billions of dollars would likely have been saved overall.
Iterative delivery is a key strategy in managing high-risk factors. Unfortunately, it is a strategy that was not used by NPfIT.
There is yet another problem facing NPfIT besides risky architectures, and this is low user confidence. Let’s see how this played out in NPfIT and how iteration could have helped.
Regardless of how good or bad NPfIT ends up, its ultimate success or failure is in the hands of its users. The support of the hundreds of thousands of health care workers and patients will determine the final judgment of this project. As with most large IT projects, user perception is reality. If users think the project is a success, it is a success. If users think the project is a failure, it is a failure, regardless of how much the project owners believe otherwise.
Iterative delivery can be a great help here. If the early deliveries are a failure, their failures are limited in scope and in visibility. If they are a success, the enthusiasm of the initial users becomes contagious. Everybody wants to be the next owner of the new toy!
As I pointed out earlier in this chapter, NPfIT suffers a major credibility gap with health care workers, patients, and the IT community. It seems that nobody other than NHS management believes that this multibillion dollar investment is going to pay off.
Could it have been different? Suppose NHS had chosen the highest visibility ABC from the list of candidates, the Appointment Booking ABC. Imagine that NHS had endured years of criticism for difficulties in its current booking procedures and then rolled out this new automated booking system. Suppose it first showed prototypes to the health care professionals. Say they loved the interface but had a few suggestions. NHS then incorporated those suggestions and rolled out the Appointment Booking to one region.
Very quickly, booking in that region would go from six-month waiting lists to four-day waiting lists. Appointments that used to require hours of standing in line would now take a few minutes on a Web browser or on a phone. Other regions would be clamoring to be the next one in line for deployment.
As Appointment Booking was deployed across the UK, the entire health care system would have appeared to have been transformed. Even though only one small part of the overall health care process, appointments, had been affected, that impact would have been felt in a positive way by every constituent group.
As NHS started work on its next ABC, Patient Registration, it would be basking in the success of its previous work. It would be facing a world that supported its efforts, believed its promises, and eagerly awaited its next delivery.
This is the way it could have been had NHS used an iterative delivery model based on SIP. But it didn’t. And instead, it faces a world that ridicules its efforts, laughs at its promises, and dreads its next delivery. The world believes that NPfIT will be a failure. In the eyes of the world, failure is all NHS has delivered. Why should the future be any different?
Ironically, even if NPfIT does manage to deliver any successes, it will be hard pressed to prove it. Why? Because at the start of this multibillion dollar project, nobody bothered to document what success would look like in any measurable fashion. Let me show you what I mean.
The NPfIT business plan of 200541 gave these success indicators for patients:
Patients will have a greater opportunity to influence the way they are treated by the NHS.
Patients will be able to discuss their treatment options and experience a more personalised health service.
Patients will experience greater convenience and certainty, which will reduce the stress of referral.
Patients will have a choice of time and place, which will enable them to fit their treatment in with their life, not the other way round.
For health care providers, the business plan promised these benefits:
General practitioners and their practice staff will have much greater access to their patients’ care management plans, ensuring that the correct appointments are made.
General practitioners and practice staff will see a reduction in the amount of time spent on the paper chase and bureaucracy associated with existing referral processes.
Consultants and booking staff will see a reduction in the administrative burden of chasing hospital appointments on behalf of patients.
The volume of Did Not Attends (DNAs) will reduce, because patients will agree on their date, and consultants will have a more secure referral audit trail.
What do all of these deliverables have in common? None have any yardstick that can be used to measure success or failure. None are attached to any dollar amount that can help justify the project. In fact, one could argue that all of these “success factors” could have been met by simply replacing the manual pencil sharpeners with electric ones!
I made the assertion in Chapter 5 that while many organizations claim to use an ROI (return on investment) yardstick to justify new projects, few really do. NPfIT is an excellent example of this. There is not a single ROI measurable included in the so-called success factors.
SIP is dogmatic about the need for measurable success factors tied to dollar amounts. It is a critical part of the prioritization activity of Phase 4 and is made concrete in the Value Graph Analysis. What would SIP-mandated measurable success factors have looked like? Here are some possible examples:
A reduction by 50 percent in the personnel hours spent managing patient booking. This will save 140 million person hours per year at a savings of approximately $1.56 billion annually.
A reduction by 50 percent of the DNAs (Did Not Attends), for a savings of $780 million annually.
A reduction by 75 percent of the cost of managing patient records, for a savings of $3.50 billion annually.
Do these specific measurables make sense? They are at least consistent with the NHS released data. I have no way to know if they are accurate or not, but these are the kind of measurements that would have served two purposes. First, they would have allowed the NHS to determine if it had, in fact, met its goals once (or if) NPfIT is ever completed. Second, they could have been used to convince a skeptical public that the project was worth undertaking in the first place.
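As a quick sanity check of the first of these figures (the per-hour cost is my inference, not an NHS number):

\[
\frac{\$1.56 \text{ billion per year}}{140 \text{ million person hours per year}} \approx \$11 \text{ per person hour}
\]

Whether $11 is a realistic fully loaded cost for an hour of booking-related staff time is precisely the kind of question a measurable success factor forces into the open.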
NHS is in the process of learning a very expensive, very painful lesson. Complexity is your enemy. Control it, or it will control you.