- By Tarik Soulami
Windows Developer Interface
Developers can extend the services that ship with the Windows operating system by building extensions in the form of kernel drivers or standalone user-mode Windows applications. This section examines some of the key layers and APIs that make building these extensions possible.
Developer Documentation Resources
Microsoft documents several APIs that developers can use when building their applications. The difference between these published interfaces and the internal (private) implementation details of the platform is that Microsoft has committed over the past two decades to building an ecosystem in which the public interfaces are carried forward with new releases of Windows, affording application developers the confidence that the applications they build today will continue to work on future OS versions. It's fair to say that this engineering discipline is one of the reasons Windows has been so successful with developers and end users alike.
Microsoft documents all of the interfaces and APIs it publishes on the Microsoft Developer Network (MSDN) website at http://www.microsoft.com/msdn. When writing your software, you should use only officially supported public APIs so that your software doesn't break when new versions of the operating system are released by Microsoft. Undocumented APIs can often disappear or get renamed, even in service pack releases of the same OS version, so you should never target them in your software unless you are prepared to deal with the wrath of your customers when your software ominously stops working after a new Windows update is installed. That's not to say you shouldn't be interested in internal implementation details, of course. In fact, this book proves that knowing some of those details is often important when debugging your programs and can help you analyze their behaviors more proficiently, which in turn also helps you write better and more reliable software in the long run.
In addition to documenting the public APIs written by Microsoft for developers, the MSDN website also contains a wealth of articles describing features or areas at a higher level. In particular, it hosts a special category of articles, called Knowledge Base articles (KB articles for short) that are published by Microsoft's customer support team to document workarounds for known issues. Generally speaking, the MSDN website should be your first stop when looking up the documented behavior of the APIs you use in your code or when trying to learn how to use a new feature in the OS.
WDM, KMDF, and UMDF
Developers can run their code with kernel-mode privileges and extend the functionality of the OS by implementing a kernel-mode driver. Though the vast majority of developers will never need to write a kernel driver, it's still useful to understand the layered plug-in models used by Windows to support these kernel extensions because this knowledge sometimes can help you make sense of kernel call stacks during kernel-mode debugging investigations.
Driver extensions are often needed to handle communication with hardware devices that aren't already supported because, as mentioned earlier, user-mode code isn't allowed to access I/O ports directly. In addition, drivers are sometimes used to implement or extend system features. For example, many tools in the Sysinternals suite internally install drivers when they're executed to implement their functionality. One such tool is Process Monitor, which installs a filter driver to monitor access to system resources.
There are many ways to write drivers, but the main model used to implement them in Windows is the Windows Driver Model (WDM). This model asks a lot of driver developers, who must handle all the interactions with the I/O manager and the rest of the operating system, and it often results in boilerplate code that is duplicated across drivers. The kernel-mode driver framework (KMDF) was introduced to simplify the task of writing kernel-mode drivers. Keep in mind, however, that KMDF doesn't replace WDM; rather, it's a framework that helps you more easily write drivers that comply with WDM's requirements. Generally speaking, you should write your drivers using KMDF unless you find a good reason not to do so, such as when you need to write non-WDM drivers. This is the case, for instance, for network, SCSI, or video drivers, which have their own world, so to speak, and require you to write what is called a “miniport” driver to plug into their respective port drivers.
A subset of hardware drivers also can be executed completely in user mode (though without direct access to kernel memory space or I/O ports). These drivers can be developed using another framework shipped by Microsoft called the user-mode driver framework, or UMDF. For more details on the different driver models and their architectures, you can find a wealth of resources on the MSDN website at http://msdn.microsoft.com. The OSR website at http://www.osronline.com is also worth a visit if you ever need to write or debug drivers in Windows.
The NTDLL and USER32 Layers
As mentioned earlier in this chapter, the NTDLL and USER32 layers contain the entry points to the executive service routines and kernel-mode portion of the Win32 subsystem (win32k.sys), respectively.
There are hundreds of executive service stubs in the NTDLL module (ntdll!NtSetEvent, ntdll!NtReadFile, and many others). The majority of these service stubs are undocumented in the MSDN, but a few stub entry points were deemed generally useful to third-party system software and are documented. The NTDLL.DLL system module also hosts several low-level OS features, such as the module loader (ntdll!Ldr* routines), the Win32 subsystem process communication functions (ntdll!Csr* routines), and several run-time library functions (ntdll!Rtl* routines) that expose features such as the Windows heap manager and Win32 critical section implementations.
The NTDLL module is used by many other DLLs in the Win32 API to transition into kernel mode and call executive service routines. Similarly, the USER32 DLL is also used by the Windows graphics architectural stack (DirectX, GDI32, and so on) as the gateway to transition into kernel mode so that it can communicate with the graphics processing unit (GPU) hardware.
The Win32 API Layer
The Win32 API layer is probably the most important layer to learn for developers who are new to Windows because it's the official public interface to the services exposed by the operating system. All of the Win32 API functions are documented in the MSDN, along with their expected parameters and potential return codes. Even if you are writing your software using a higher-level development framework or API set, as most of us do these days, being aware of the capabilities exposed at this layer will help you get a much better feel for a framework's advantages and limitations relative to other choices, as well as the raw capabilities exposed by the Win32 API and Windows executive.
The Win32 API layer covers a large set of functionality, going from basic services like creating threads/processes or drawing shapes on the screen to higher-level areas such as cryptography. The most basic services at the bottom of the Win32 API's architectural stack are exposed in the kernel32.dll module. Other widely used Win32 DLL modules are advapi32.dll (general utility functions), user32.dll (Windows and user object functions), and gdi32.dll (graphics functions). In Windows 7, the Win32 DLL modules are now layered so that lower-level base functions aren't allowed to call up to higher-level modules in the hierarchical stack. This layering engineering discipline helps prevent circular dependencies between modules and also minimizes the performance impact of bringing a new DLL dependency from the Win32 API set into your process address space. This is why you will see that many of the public APIs exported in the kernel32.dll module now simply forward their calls to the implementation defined in the lower-level kernelbase.dll DLL module, which is useful to know when trying to set system breakpoints in the debugger. This layered hierarchy is demonstrated in Figure 1-6.
Figure 1-6 Low-level Win32 DLL modules.
The COM Layer
The Component Object Model (COM) was introduced by Microsoft in the mid-90s as a user-mode framework to enable developers to write reusable object-oriented components in different programming languages. If you are new to COM, the best resource to use to get started and gain an understanding of its model is the “Component Object Model Specification” document, which you can still find online. Though this is an old (circa 1995) and relatively long document, it provides a great overview of the COM binary specification, and much of what it describes still holds true to this day.
Over time, the use of the term “COM” grew to cover a number of different but related technologies, including the following:
- The object model itself, which the label “COM” was technically designed to describe at first. Key parts of this object model are the standard IUnknown and IClassFactory interfaces, the idea of separation of class interfaces (the public contract) from the internal COM class implementation, and the ability to query a server object for an implementation of the contracts that it supports (IUnknown::QueryInterface). This model, in its pure form, is one of the most elegant contributions by Microsoft to the developer ecosystem, and it has had a deep impact that still reverberates to this day in the form of various derivative technologies based on that model.
- The interprocess communication protocol and registration that allows components to communicate with each other without the client application having to know where the server component lives. This enables COM clients to transparently use servers implemented locally in-process (DLLs) or out-of-process (EXEs), as well as servers that are hosted on a remote computer. The distinction between COM and Distributed COM (DCOM, or COM across machines) is often only theoretical, and most of the internal building blocks are shared between the two technologies.
- The protocol specifications built on top of the COM object model to enable hosts to communicate with objects written in multiple languages. This includes the OLE Automation and ActiveX technologies, in particular.
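The object model described in the first item above can be modeled in a few lines. The following is a minimal, portable sketch in Python rather than real COM code: it imitates the two duties of IUnknown (interface negotiation and reference counting) with ordinary objects. The IUnknown interface ID is the well-known one, but the Greeter class, the IGreeter contract, and its interface ID are hypothetical names invented for illustration.

```python
import uuid

# Well-known interface ID of IUnknown.
IID_IUNKNOWN = uuid.UUID("00000000-0000-0000-C000-000000000046")
# Hypothetical interface ID invented for this sketch.
IID_IGREETER = uuid.UUID("11111111-2222-3333-4444-555555555555")

S_OK = 0x00000000
E_NOINTERFACE = 0x80004002


class Greeter:
    """Toy object that follows the IUnknown rules (not a real COM object)."""

    def __init__(self):
        self.ref_count = 1            # the creator holds the first reference
        self.destroyed = False
        self._supported = {IID_IUNKNOWN, IID_IGREETER}

    def add_ref(self):
        self.ref_count += 1
        return self.ref_count

    def release(self):
        self.ref_count -= 1
        if self.ref_count == 0:
            self.destroyed = True     # a real COM object would free itself here
        return self.ref_count

    def query_interface(self, iid):
        """Return (hresult, pointer); a successful query hands out a new reference."""
        if iid in self._supported:
            self.add_ref()
            return S_OK, self
        return E_NOINTERFACE, None

    def greet(self, name):
        """The single method of the hypothetical IGreeter contract."""
        return f"Hello, {name}"
```

A client that receives S_OK from query_interface owns one more reference and is expected to call release when it is done with the pointer; asking for a contract the object doesn't implement yields E_NOINTERFACE instead of a crash.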
COM in the Windows Operating System
COM is omnipresent in the Windows operating system. Although the Microsoft .NET Framework has largely superseded COM as the technology of choice for Windows application-level development, COM has had a lasting impact on the Windows development landscape, and it's far from a dead technology as it continues to be the foundation for many components (for example, the Windows shell UI uses COM extensively) and even some of the newest technologies in the Windows development landscape. Even on a Windows installation with no other additional applications, you will still find thousands of COM class identifiers (CLSID for short) describing existing COM classes (servers) in the COM registry catalog, as shown in Figure 1-7.
Figure 1-7 COM CLSID hive in the Windows registry.
More recently, the new model unveiled by Microsoft for developing touch-capable, Windows runtime (WinRT) applications in its upcoming version of Windows also uses COM for its core binary compatibility layer. In many ways, COM remains required knowledge if you really want to master the intricacies of Windows as a development platform. In addition, its design patterns and programming model have an educational value of their own. Even if you are never going to write a COM object (server), the programs and scripts you write often use existing COM objects, either explicitly or indirectly via API calls. So, knowing how COM works is particularly useful in debugging situations that involve such calls.
COM developers usually interact primarily with language-level features and tools when writing or consuming COM servers. However, debugging COM failures also requires knowledge of how the system implements the COM specification. In that sense, there are a few different aspects to the COM landscape that can all come into play during COM debugging investigations:
- The COM “Library” This is essentially Microsoft's system implementation of the COM binary specification. The COM library consists primarily of the COM run-time code and Win32 APIs (CoInitialize, CoCreateInstance, and others) that ship as part of the ole32.dll system module. This run-time code also uses support provided by two Windows services collectively referred to as the COM service control manager (COM SCM). These services are the RpcSs service, which runs with NetworkService privileges, and the DComLaunch service, which runs with LocalSystem privileges.
- COM language tools These are the source-level compilers and tools to support COM's binary specification. This includes the interface definition language (IDL) compiler (MIDL), which allows COM classes and interfaces to be consumed inside C/C++ programs, and the binary type library importer and exporter tools, which enable cross-language interoperability in COM. Note that these tools do not ship with the OS, but come as part of developer tools such as the Windows SDK.
- COM frameworks These are frameworks that make it easier for developers to write COM components that conform to the COM binary specification. An example is the Microsoft C++ Active Template Library (ATL).
Writing COM Servers
The COM specification places several requirements on developers writing COM objects and their hosting modules. Writing a COM object in C++ entails, at the very least, declaring its published interfaces in an IDL file and implementing the standard IUnknown COM interface that allows reference counting of the object and enables clients to negotiate contracts with the server. A class factory object—a COM-style support object that implements the IClassFactory COM interface but doesn't need to be published to the registry—must also be written for each CLSID to create its object instances.
On top of all of this, the developer is also required to implement the hosting module (DLL or EXE) so that it also conforms to all the other requirements of the COM specification. For example, in the case of a DLL module, a C-style function (DllGetClassObject) that returns a pointer to the class factory of CLSIDs hosted by the module must be exported for use by the COM library. For an executable module, the COM library can't simply call an exported function, so ole32!CoRegisterClassObject must be called by the server executable itself when it starts up in order to publish its hosted COM class factories and make COM aware of them. Yet another requirement for DLL COM modules is to implement reference counting of their active objects and export a C-style function (DllCanUnloadNow) so that the COM library knows when it's safe to unload the module in question.
Microsoft realized that this is a lot to ask of C++ COM developers and introduced the Active Template Library (ATL) to help simplify writing COM server modules and objects in the C++ language. Although the majority of C++ COM developers in Windows use ATL to implement their COM servers, keep in mind that you can also write COM objects and their hosting modules without it (if you feel inclined to do so). In fact, ATL ships with the source code of its template classes, so you can study those implementation headers and see how ATL implements its functionality on top of the COM services provided by the COM library in the operating system. As you'll see after looking at that source code, ATL takes care of much of the heavy lifting and boilerplate code for writing COM servers so that you don't have to. This allows you to focus on writing the code that implements your business logic without much of the burden imposed by the necessary COM model plumbing.
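To make the DLL hosting requirements described above more concrete, here is a simplified Python model of what a hosting module has to track. It is a sketch under stated assumptions, not real COM code: the real DllGetClassObject also takes an interface ID and an output pointer, and a real DllCanUnloadNow typically accounts for server locks as well as live objects. The Counter class and its CLSID are hypothetical.

```python
import uuid

S_OK = 0x00000000
S_FALSE = 0x00000001
CLASS_E_CLASSNOTAVAILABLE = 0x80040111

# Hypothetical CLSID invented for this sketch.
CLSID_COUNTER = uuid.UUID("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee")

_live_objects = 0    # module-wide count of objects still referenced by clients


class Counter:
    """Toy server object that reports its lifetime to the hosting module."""

    def __init__(self):
        global _live_objects
        _live_objects += 1
        self.ref_count = 1

    def release(self):
        global _live_objects
        self.ref_count -= 1
        if self.ref_count == 0:
            _live_objects -= 1
        return self.ref_count


class CounterFactory:
    """Plays the role of the IClassFactory implementation for CLSID_COUNTER."""

    def create_instance(self):
        return S_OK, Counter()


_FACTORIES = {CLSID_COUNTER: CounterFactory()}


def dll_get_class_object(clsid):
    """Counterpart of the exported DllGetClassObject: hand out the class factory."""
    factory = _FACTORIES.get(clsid)
    if factory is None:
        return CLASS_E_CLASSNOTAVAILABLE, None
    return S_OK, factory


def dll_can_unload_now():
    """Counterpart of DllCanUnloadNow: S_OK only when no objects remain alive."""
    return S_OK if _live_objects == 0 else S_FALSE
```

In this model the module answers S_FALSE for as long as a client still holds an object, which is the signal the COM library relies on before unloading a DLL.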
Consuming COM Objects
Communication between COM clients and servers is a two-step process:
- COM activation This is the first step in the communication, where the COM library locates the class factory of the requested CLSID. This is done by first consulting the COM registry catalog to find the module that hosts the implementation of the COM server. The COM client code initiates this step with a call to either the ole32!CoCreateInstance or ole32!CoGetClassObject Win32 API call. Note that ole32!CoCreateInstance is simply a convenient wrapper that first calls ole32!CoGetClassObject to obtain the class factory, and then invokes the IClassFactory::CreateInstance method implemented by that class factory object to finally create a new instance of the target COM server.
- Method invocations After the COM activation step retrieves a proxy or direct pointer to the COM class object, the client can then query the interfaces exposed by the object and directly invoke the exposed methods.
A key point to understand when consuming COM objects in your code is the potential involvement of the COM SCM behind the scenes to instantiate the hosting module for the COM object and its corresponding class factory during the COM activation step. This is done because COM clients sometimes need to activate out-of-process servers in different contexts, such as when the COM server object needs to run in processes with higher privileges or in different user sessions, which requires the participation of a broker process that runs with higher privileges (the DComLaunch service). Another reason is that the RpcSs Windows service also handles cross-machine COM activation requests (the DCOM case) and implements the communication channel in a way that's completely transparent to both COM clients and servers. It's especially important to understand this involvement during debugging investigations of COM activation failures. Once the COM activation sequence retrieves the requested class factory, however, the COM client is then able to directly invoke COM methods published by the server class without any involvement on the part of the COM SCM. Figure 1-8 summarizes these steps and the key components involved during the COM activation sequence.
Figure 1-8 COM activation components.
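The relationship between the two activation APIs mentioned above can be sketched as follows. This is an illustrative Python model only: a dictionary stands in for the COM registry catalog, the COM SCM is left out entirely, and the function names merely mirror the real Win32 APIs. The Clock class and its CLSID are hypothetical.

```python
import uuid

S_OK = 0x00000000
REGDB_E_CLASSNOTREG = 0x80040154

# Hypothetical CLSID invented for this sketch.
CLSID_CLOCK = uuid.UUID("12345678-1234-1234-1234-123456789abc")


class Clock:
    """Toy server object."""

    def tick(self):
        return "tick"


class ClockFactory:
    """Stand-in for an IClassFactory implementation."""

    def create_instance(self):
        return S_OK, Clock()


# Stand-in for the COM registry catalog, which maps a CLSID to its hosting module.
# Here the "module" is reduced to the class factory it would expose.
REGISTRY_CATALOG = {CLSID_CLOCK: ClockFactory()}


def co_get_class_object(clsid):
    """Activation step: locate the class factory for the requested CLSID."""
    factory = REGISTRY_CATALOG.get(clsid)
    if factory is None:
        return REGDB_E_CLASSNOTREG, None
    return S_OK, factory


def co_create_instance(clsid):
    """Convenience wrapper: obtain the factory, then ask it for a new instance."""
    hr, factory = co_get_class_object(clsid)
    if hr != S_OK:
        return hr, None
    return factory.create_instance()
```

Calling co_get_class_object directly is worthwhile when a client needs many instances of the same class, because the factory lookup is then paid only once.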
The nice thing about the COM model, as mentioned earlier in this section, is that the client doesn't need to know where the COM server implementation lives, or even the language (C++, Microsoft Visual Basic, Delphi, or other) in which it was written (provided it has access to a type library or a direct virtual table layout that describes the COM types it wants to consume). The only thing that the client needs to know is the CLSID of the COM object (a GUID), after which it can query for the supported interfaces and invoke the desired methods. COM in the OS provides all the necessary “glue” for the client/server communication, provided the COM server was written to conform to the COM model. In particular, COM supports accessing the following COM server types using the same consistent programmatic model:
- In-process COM servers The hosting DLL module is loaded into the client process address space, and the object is invoked through a pointer returned by the COM activation sequence. This pointer can be either a direct virtual pointer or sometimes a proxy, depending on whether the COM runtime needs to be invoked to provide additional safeguards (such as thread safety) before invoking the methods of the COM server.
- Local/remote out-of-process COM servers For local out-of-process COM servers, local RPC is used as the underlying interprocess communication protocol, with ALPC as the actual low-level foundation. For remote COM servers (DCOM), the RPC communication protocol is used for the intermachine communication. In both cases, the proxy memory pointer that is returned to the client application from the COM activation sequence takes care of everything that's required to accomplish COM's promise of transparent remoting. Figure 1-9 illustrates this aspect.
Figure 1-9 Out-of-process COM method invocations.
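The proxy idea behind transparent remoting can be illustrated with a small Python model. This is a sketch of the general proxy/stub pattern, not of COM's actual wire format: JSON stands in for the marshaling that the COM runtime performs, and a direct function call stands in for the local RPC or network RPC channel. All class names here are hypothetical.

```python
import json


class CalculatorServer:
    """Toy server object that would live in another process or machine."""

    def add(self, a, b):
        return a + b


class Stub:
    """Server-side half: unpacks a request, calls the real object, packs the reply."""

    def __init__(self, target):
        self._target = target

    def dispatch(self, request: bytes) -> bytes:
        call = json.loads(request)
        result = getattr(self._target, call["method"])(*call["args"])
        return json.dumps({"result": result}).encode()


class Proxy:
    """Client-side half: looks like the object but forwards every call over a channel."""

    def __init__(self, channel):
        self._channel = channel       # any callable that carries bytes to the stub and back

    def __getattr__(self, method):
        def remote_call(*args):
            request = json.dumps({"method": method, "args": args}).encode()
            reply = self._channel(request)
            return json.loads(reply)["result"]
        return remote_call


# The channel is a plain function call here; only the channel would change
# if the server moved to another process or another machine.
calculator = Proxy(Stub(CalculatorServer()).dispatch)
```

The client code calls calculator.add(2, 3) exactly as it would on a local object, which is the property the proxy pointer provides to COM clients.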
The CLR (.NET) Layer
Like COM, the .NET Framework is also a user-mode, object-oriented platform that enables developers to write their programs in their language of choice (C#, Microsoft Visual Basic .NET, C++/CLI, or other). However, .NET takes another leap and has those programs run under the control of an execution engine environment that provides additional useful features, such as type safety and automatic memory management using garbage collection. This execution engine environment is called the Common Language Runtime (CLR) and is in many ways the core of the .NET platform. Understanding this layer is often helpful when debugging applications built on top of the various .NET class libraries and technologies (ASP.NET, WCF, WinForms, WPF, Silverlight, and so on).
The CLR is implemented as a set of native DLLs that get loaded into the address space of every .NET executable, though the core execution engine DLL decides when to load the other DLL dependencies. Because of their reliance on this execution environment, .NET modules (also called assemblies) are said to be managed, as opposed to the unmanaged native modules that execute in the regular user-mode environment. The same user-mode process can host both managed and unmanaged modules interoperating with each other, as will be explained shortly in this section.
Programs in .NET are not compiled directly into native assembly code, but rather into a platform-agnostic language called the Microsoft .NET Intermediate Language (usually referred to as MSIL, or simply IL). This IL is then lazily (methods are compiled on first use) turned into assembly instructions by a component of the execution engine called the Just-in-Time .NET compiler, or JIT.
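The compile-on-first-use behavior described above can be modeled in a few lines of Python. This is a toy sketch and not how the CLR is implemented: Python's built-in compile() stands in for the IL-to-native translation, and the LazyJit class and its method table are hypothetical.

```python
class LazyJit:
    """Toy model of lazy compilation: each method is translated once, then reused."""

    def __init__(self, il_methods):
        self._il = il_methods      # method name -> "IL" (here, a Python expression in x)
        self._native = {}          # method name -> compiled callable
        self.compile_count = 0     # how many translations have happened so far

    def _compile(self, name):
        self.compile_count += 1
        # compile() stands in for the IL-to-native translation step.
        code = compile(self._il[name], f"<il:{name}>", "eval")
        return lambda x: eval(code, {"x": x})

    def call(self, name, arg):
        if name not in self._native:               # first use: translate and cache
            self._native[name] = self._compile(name)
        return self._native[name](arg)             # later uses: run the cached code
```

With a table such as {"square": "x * x", "negate": "-x"}, calling square twice triggers a single translation, and negate is never translated unless it is actually invoked.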
.NET Side-by-Side Versioning
One of the issues that plagued software development in Windows prior to the introduction of the .NET Framework was the fact that new DLL versions sometimes introduced new behaviors, breaking existing software often through no fault of the application developer. There was no standard way to strongly bind applications to the version of the DLLs that they were tested against before they got released. This is known as the “DLL hell” issue. COM made the situation better by at least ensuring binary compatibility: instead of C-style exported DLL functions simply disappearing or altering their signatures from underneath their consumers (resulting in crashes!), COM servers were able to clearly version their interfaces, allowing COM clients to query the interfaces that they were tested against and providing the safety of this extra level of indirection.
The .NET Framework takes the idea of strong binding and versioning one step further by ensuring that .NET programs are always run against the version of the CLR that they were compiled to target or, alternatively, the version specified in the application's configuration file. So, new versions of the .NET Framework are installed side by side with older ones instead of replacing them. The exception to this is a small shim DLL called mscoree.dll that's installed to the system32 directory (on 64-bit Windows, a 32-bit version is also installed to the SysWow64 directory) and that always matches the newest .NET Framework version present on the machine. This works because newer versions of mscoree.dll are backward compatible with previous versions of the CLR. For example, if both .NET versions 4.0 and 2.0 are installed on the machine, the mscoree.dll module in the system32 directory will be the one that was installed with the CLR 4.0 release.
.NET Executable Programs Load Sequence
The IL assemblies produced by the various .NET compilers also follow the standard Windows PE (Portable Executable) format and are just special-case native images from the perspective of the OS loader, except they have a marker in their PE header to indicate they are managed code binaries that should be handled by the .NET CLR. When the Windows module loader code in ntdll.dll detects the existence of a CLR header in the executable PE image it is about to launch, control is immediately transferred to the native CLR entry point (mscoree!_CorExeMain), which then takes care of finding and invoking the managed IL entry point in the image.
Note that the OS module loader doesn't really know which version of the CLR should be loaded for the managed image. This is the role of the mscoree.dll native shim DLL, which determines the correct version of the CLR to load. For CLR v2 programs, for example, the execution engine DLL loaded by mscoree.dll is mscorwks.dll, while for CLR v4 the execution engine implementation resides inside the clr.dll module. Once the CLR execution engine DLL for the target version is loaded, it pretty much takes control and becomes responsible for the runtime execution environment, ensuring type safety and automatic memory management (garbage collection), invoking the JIT compiler to convert IL into native assembly code on-demand, performing security checks, hosting the native implementation for several key threading and run-time services, and so on. The CLR is also responsible for loading the managed assemblies hosting the object types referenced by the application's IL, including the core library that implements some of the most basic managed types (mscorlib.dll .NET assembly). Figure 1-10 illustrates these steps.
Figure 1-10 Loading steps for .NET executable programs.
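The first step of this load sequence, recognizing a managed image by its CLR header, can be approximated outside the loader. The sketch below is an assumption about how such a check might look, based on the published PE layout, and it does not reproduce the loader's actual code: it tests whether data directory entry 14 (the COM descriptor entry that points at the CLR header) is populated. The build_minimal_image helper is hypothetical and fabricates only the header bytes needed to exercise the check.

```python
import struct

CLR_DIRECTORY_INDEX = 14            # IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR
PE32_MAGIC, PE32_PLUS_MAGIC = 0x10B, 0x20B


def has_clr_header(image: bytes) -> bool:
    """Return True if the PE image declares a non-empty CLR header directory."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)    # e_lfanew field
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        return False
    optional = pe_offset + 4 + 20        # skip PE signature and COFF file header
    if len(image) < optional + 2:
        return False
    (magic,) = struct.unpack_from("<H", image, optional)
    if magic == PE32_MAGIC:
        dirs = optional + 96             # data directories start here in PE32
    elif magic == PE32_PLUS_MAGIC:
        dirs = optional + 112            # and here in PE32+ (64-bit images)
    else:
        return False
    entry = dirs + 8 * CLR_DIRECTORY_INDEX
    if len(image) < entry + 8:
        return False
    (count,) = struct.unpack_from("<I", image, dirs - 4)    # NumberOfRvaAndSizes
    if count <= CLR_DIRECTORY_INDEX:
        return False
    rva, size = struct.unpack_from("<II", image, entry)
    return rva != 0 and size != 0


def build_minimal_image(magic: int, clr_rva: int, clr_size: int) -> bytes:
    """Hypothetical helper: fabricate just enough header bytes to test the check."""
    image = bytearray(0x200)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)               # e_lfanew -> PE signature
    image[0x80:0x84] = b"PE\0\0"
    optional = 0x80 + 4 + 20
    struct.pack_into("<H", image, optional, magic)
    dirs = optional + (96 if magic == PE32_MAGIC else 112)
    struct.pack_into("<I", image, dirs - 4, 16)             # NumberOfRvaAndSizes
    struct.pack_into("<II", image, dirs + 8 * CLR_DIRECTORY_INDEX, clr_rva, clr_size)
    return bytes(image)
```

To try it on a real file, read the file's bytes and pass them to has_clr_header; a managed assembly should report True and a purely native binary False.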
Interoperability Between Managed and Native Code
When the first version of the .NET Framework was introduced back in early 2002, it had to be able to consume existing native code (both C-style Win32 APIs and COM objects) to ensure that the transition to managed code programming was seamless for Windows developers. Fortunately, interoperability between managed and unmanaged code is possible, thanks in large part to two CLR mechanisms: P/Invoke and COM Interop. P/Invoke (the DllImport .NET method attribute) is used to invoke C-style unmanaged APIs such as those from the Win32 API set, and COM Interop (the ComImport .NET class attribute) can be used to invoke any existing classic COM object implemented in unmanaged code.
Managed/native code interoperability presents a few technical challenges, however. The biggest challenge is that the garbage collection scheme that the CLR uses for automatic memory management requires it to manage objects in the managed heap so that it is able to periodically clear defunct (nonrooted) objects, and also so that it can reduce heap fragmentation when dead objects are collected. However, when transitioning to run native code as part of the same process address space, it's often necessary to share managed memory with the native call (in the form of function parameters, for instance). Because the CLR garbage collector is free to move these managed objects around when it decides to perform a garbage collection, and because the garbage collector is completely unaware of the operations that the native code might attempt, it could end up moving the shared managed objects around, causing the native code to access invalid memory. The CLR execution engine takes care of these technical intricacies by marshaling function parameters and pinning managed objects in memory during the managed to unmanaged transitions.
Conversely, you can also consume code that you develop using .NET from your native applications using the COM Interop facilities provided by the CLR. The C/C++ native code is able to consume the types published by a .NET assembly (by means of the ComVisible .NET attribute) using their type library, in the same fashion that native COM languages are able to consume types from different languages. The .NET Framework ships with a tool called regasm.exe, which can be used to easily generate type libraries for the COM types in a .NET assembly. The .NET Framework and Windows SDKs also include a development tool, called tlbexp.exe, that's able to do the same thing.
The CLR shim DLL (mscoree.dll) again plays a key role in this reverse COM Interop scenario because it's the first native entry point to the CLR during the COM activation. This shim then loads the right CLR execution engine version, which then loads the managed COM types as they get invoked by the native code. This extends the functionality provided by the COM library in the OS without it having to know about the intricacies of managed code. During the COM activation sequence that the native application initiates, the COM library ends up invoking the standard DllGetClassObject method exported from mscoree.dll. If you used regasm.exe to generate the type library for the C# COM types, mscoree.dll also would've been added to the registry as the InprocServer32 for all the managed COM classes hosted by the .NET DLL assembly. The CLR shim DLL then forwards the call to the CLR execution engine, which takes care of implementing the class factory and standard native COM interfaces on behalf of the managed COM types.