The Managed Heap and Garbage Collection in the CLR

By Jeffrey Richter
11/15/2012

Contents

Managed Heap Basics
Generations: Improving Performance
Working with Types Requiring Special Cleanup
Monitoring and Controlling the Lifetime of Objects Manually

Monitoring and Controlling the Lifetime of Objects Manually

The CLR provides each AppDomain with a GC handle table. This table allows an application to monitor the lifetime of an object or manually control the lifetime of an object. When an AppDomain is created, the table is empty. Each entry on the table consists of a reference to an object on the managed heap and a flag indicating how you want to monitor or control the object. An application adds and removes entries from the table via the System.Runtime.InteropServices.GCHandle type, as follows.

// This type is defined in the System.Runtime.InteropServices namespace
public struct GCHandle {
   // Static methods that create an entry in the table
   public static GCHandle Alloc(object value);
   public static GCHandle Alloc(object value, GCHandleType type);

   // Static methods that convert a GCHandle to an IntPtr
   public static explicit operator IntPtr(GCHandle value);
   public static IntPtr ToIntPtr(GCHandle value);

   // Static methods that convert an IntPtr to a GCHandle
   public static explicit operator GCHandle(IntPtr value);
   public static GCHandle FromIntPtr(IntPtr value);

   // Static methods that compare two GCHandles
   public static Boolean operator ==(GCHandle a, GCHandle b);
   public static Boolean operator !=(GCHandle a, GCHandle b);

   // Instance method to free the entry in the table (index is set to 0)
   public void Free();

   // Instance property to get/set the entry's object reference
   public object Target { get; set; }

   // Instance property that returns true if index is not 0
   public Boolean IsAllocated { get; }

   // For a pinned entry, this returns the address of the object
   public IntPtr AddrOfPinnedObject();

   public override Int32 GetHashCode();
   public override Boolean Equals(object o);
}

Basically, to control or monitor an object’s lifetime, you call GCHandle’s static Alloc method, passing a reference to the object that you want to monitor/control, and a GCHandleType, which is a flag indicating how you want to monitor/control the object. The GCHandleType type is an enumerated type defined as follows.

public enum GCHandleType {
   Weak = 0,                  // Used for monitoring an object's existence
   WeakTrackResurrection = 1, // Used for monitoring an object's existence
   Normal = 2,                // Used for controlling an object's lifetime
   Pinned = 3                 // Used for controlling an object's lifetime
}

Now, here’s what each flag means:

Weak This flag allows you to monitor the lifetime of an object. Specifically, you can detect when the garbage collector has determined this object to be unreachable from application code. Note that the object’s Finalize method may or may not have executed yet and therefore, the object may still be in memory.
WeakTrackResurrection This flag allows you to monitor the lifetime of an object. Specifically, you can detect when the garbage collector has determined that this object is unreachable from application code. Note that the object’s Finalize method (if it exists) has definitely executed, and the object’s memory has been reclaimed.
Normal This flag allows you to control the lifetime of an object. Specifically, you are telling the garbage collector that this object must remain in memory even though there may be no roots in the application that refer to this object. When a garbage collection runs, the memory for this object can be compacted (moved). The Alloc method that doesn’t take a GCHandleType flag assumes that GCHandleType.Normal is specified.
Pinned This flag allows you to control the lifetime of an object. Specifically, you are telling the garbage collector that this object must remain in memory even though there might be no roots in the application that refer to this object. When a garbage collection runs, the memory for this object cannot be compacted. This is typically useful when you want to hand the address of the memory out to native code. The native code can write to this memory in the managed heap knowing that a GC will not move the object.

When you call GCHandle’s static Alloc method, it scans the AppDomain’s GC handle table, looking for an available entry where the reference of the object you passed to Alloc is stored, and a flag is set to whatever you passed for the GCHandleType argument. Then, Alloc returns a GCHandle instance back to you. A GCHandle is a lightweight value type that contains a single instance field, an IntPtr, which refers to the index of the entry in the table. When you want to free this entry in the GC handle table, you take the GCHandle instance and call the Free method (which also invalidates the GCHandle instance by setting its IntPtr field to zero).

Here’s how the garbage collector uses the GC handle table. When a garbage collection occurs:

The garbage collector marks all of the reachable objects (as described at the beginning of this chapter). Then, the garbage collector scans the GC handle table; all Normal or Pinned objects are considered roots, and these objects are marked as well (including any objects that these objects refer to via their fields).
The garbage collector scans the GC handle table looking for all of the Weak entries. If a Weak entry refers to an object that isn’t marked, the reference identifies an unreachable object (garbage), and the entry has its reference value changed to null.
The garbage collector scans the finalization list. If a reference in the list refers to an unmarked object, the reference identifies an unreachable object, and the reference is moved from the finalization list to the freachable queue. At this point, the object is marked because the object is now considered reachable.
The garbage collector scans the GC handle table looking for all of the WeakTrackResurrection entries. If a WeakTrackResurrection entry refers to an object that isn’t marked (which now is an object referenced by an entry in the freachable queue), the reference identifies an unreachable object (garbage), and the entry has its reference value changed to null.
The garbage collector compacts the memory, squeezing out the holes left by the unreachable objects. Pinned objects are not compacted (moved); the garbage collector will move other objects around them.

Now that you have an understanding of the mechanism, let’s take a look at when you’d use them. The easiest flags to understand are the Normal and Pinned flags, so let’s start with these two. Both of these flags are typically used when interoperating with native code.

The Normal flag is used when you need to hand a pointer to a managed object to native code because, at some point in the future, the native code is going to call back into managed code, passing it the pointer. You can’t actually pass a pointer to a managed object out to native code, because if a garbage collection occurs, the object could move in memory, invalidating the pointer. So to work around this, you would call GCHandle’s Alloc method, passing in a reference to the object and the Normal flag. Then you’d cast the returned GCHandle instance to an IntPtr and pass the IntPtr into the native code. When the native code calls back into managed code, the managed code would cast the passed IntPtr back to a GCHandle and then query the Target property to get the reference (or current address) of the managed object. When the native code no longer needs the reference, you’d call GCHandle’s Free method, which allows a future garbage collection to free the object (assuming no other root exists to this object).

Notice that in this scenario, the native code is not actually using the managed object itself; the native code wants a way just to reference the object. In some scenarios, the native code needs to actually use the managed object. In these scenarios, the managed object must be pinned. Pinning prevents the garbage collector from moving/compacting the object. A common example is when you want to pass a managed String object to a Win32 function. In this case, the String object must be pinned because you can’t pass the reference of a managed object to native code and then have the garbage collector move the object in memory. If the String object were moved, the native code would either be reading or writing to memory that no longer contained the String object’s characters—this will surely cause the application to run unpredictably.

When you use the CLR’s P/Invoke mechanism to call a method, the CLR pins the arguments for you automatically and unpins them when the native method returns. So, in most cases, you never have to use the GCHandle type to explicitly pin any managed objects yourself. You do have to use the GCHandle type explicitly when you need to pass the pointer to a managed object to native code; then the native function returns, but native code might still need to use the object later. The most common example of this is when performing asynchronous I/O operations.

Let’s say that you allocate a byte array that should be filled as data comes in from a socket. Then, you would call GCHandle’s Alloc method, passing in a reference to the array object and the Pinned flag. Then, using the returned GCHandle instance, you call the AddrOfPinnedObject method. This returns an IntPtr that is the actual address of the pinned object in the managed heap; you’d then pass this address into the native function, which will return back to managed code immediately. While the data is coming from the socket, this byte array buffer should not move in memory; preventing this buffer from moving is accomplished by using the Pinned flag. When the asynchronous I/O operation has completed, you’d call GCHandle’s Free method, which will allow a future garbage collection to move the buffer. Your managed code should still have a reference to the buffer so that you can access the data, and this reference will prevent a garbage collection from freeing the buffer from memory completely.

It is also worth mentioning that C# offers a fixed statement that effectively pins an object over a block of code. Here is some code that demonstrates its use.

unsafe public static void Go() {
   // Allocate a bunch of objects that immediately become garbage
   for (Int32 x = 0; x < 10000; x++) new Object();

   IntPtr  originalMemoryAddress;
   Byte[] bytes = new Byte[1000];   // Allocate this array after the garbage objects

   // Get the address in memory of the Byte[]
   fixed (Byte* pbytes = bytes) { originalMemoryAddress = (IntPtr) pbytes; }

   // Force a collection; the garbage objects will go away & the Byte[] might be compacted
   GC.Collect();

   // Get the address in memory of the Byte[] now & compare it to the first address
   fixed (Byte* pbytes = bytes) {
      Console.WriteLine("The Byte[] did{0} move during the GC",
         (originalMemoryAddress == (IntPtr) pbytes) ? " not" : null);
   }
}

Using C#’s fixed statement is more efficient than allocating a pinned GC handle. What happens is that the C# compiler emits a special “pinned” flag on the pbytes local variable. During a garbage collection, the GC examines the contents of this root, and if the root is not null, it knows not to move the object referred to by the variable during the compaction phase. The C# compiler emits IL to initialize the pbytes local variable to the address of the object at the start of a fixed block, and the compiler emits an IL instruction to set the pbytes local variable back to null at the end of the fixed block so that the variable doesn’t refer to any object, allowing the object to move when the next garbage collection occurs.

Now, let’s talk about the next two flags, Weak and WeakTrackResurrection. These two flags can be used in scenarios when interoperating with native code, but they can also be used in scenarios that use only managed code. The Weak flag lets you know when an object has been determined to be garbage but the object’s memory is not guaranteed to be reclaimed yet. The WeakTrackResurrection flag lets you know when an object’s memory has been reclaimed. Of the two flags, the Weak flag is much more commonly used than the WeakTrackResurrection flag. In fact, I’ve never seen anyone use the WeakTrackResurrection flag in a real application.

Let’s say that Object-A periodically calls a method on Object-B. However, the fact that Object-A has a reference to Object-B forbids Object-B from being garbage collected, and in some rare scenarios, this may not be desired; instead, we might want Object-A to call Object-B’s method if Object-B is still alive in the managed heap. To accomplish this scenario, Object-A would call GCHandle’s Alloc method, passing in the reference to Object-B and the Weak flag. Object-A would now just save the returned GCHandle instance instead of the reference to Object-B.

At this point, Object-B can be garbage collected if no other roots are keeping it alive. When Object-A wants to call Object-B’s method, it would query GCHandle’s read-only Target property. If this property returns a non-null value, then Object-B is still alive. Object-A’s code would then cast the returned reference to Object-B’s type and call the method. If the Target property returns null, then Object-B has been collected (but not necessarily finalized) and Object-A would not attempt to call the method. At this point, Object-A’s code would probably also call GCHandle’s Free method to relinquish the GCHandle instance.

Because working with the GCHandle type can be a bit cumbersome and because it requires elevated security to keep or pin an object in memory, the System namespace includes a WeakReference<T> class to help you.

public sealed class WeakReference<T> : ISerializable where T : class {
   public WeakReference(T target);
   public WeakReference(T target, Boolean trackResurrection);
   public void SetTarget(T target);
   public Boolean TryGetTarget(out T target);
}

This class is really just an object-oriented wrapper around a GCHandle instance: logically, its constructor calls GCHandle’s Alloc, its TryGetTarget method queries GCHandle’s Target property, its SetTarget method sets GCHandle’s Target property, and its Finalize method (not shown in the preceding code, because it’s protected) calls GCHandle’s Free method. In addition, no special permissions are required for code to use the WeakReference<T> class because the class supports only weak references; it doesn’t support the behavior provided by GCHandle instances allocated with a GCHandleType of Normal or Pinned. The downside of the WeakReference<T> class is that an instance of it must be allocated on the heap. So the WeakReference<T> class is a heavier-weight object than a GCHandle instance.

Important

When developers start learning about weak references, they immediately start thinking that they are useful in caching scenarios. For example, they think it would be cool to construct a bunch of objects that contain a lot of data and then to create weak references to these objects. When the program needs the data, the program checks the weak reference to see if the object that contains the data is still around, and if it is, the program just uses it; the program experiences high performance. However, if a garbage collection occurred, the objects that contained the data would be destroyed, and when the program has to re-create the data, the program experiences lower performance.

The problem with this technique is the following: garbage collections do not only occur when memory is full or close to full. Instead, garbage collections occur whenever generation 0 is full. So objects are being tossed out of memory much more frequently than desired, and your application’s performance suffers greatly.

Weak references can be used quite effectively in caching scenarios, but building a good cache algorithm that finds the right balance between memory consumption and speed is very complex. Basically, you want your cache to keep strong references to all of your objects and then, when you see that memory is getting tight, you start turning strong references into weak references. Currently, the CLR offers no mechanism to notify an application that memory is getting tight. But some people have had much success by periodically calling the Win32 GlobalMemoryStatusEx function and checking the returned MEMORYSTATUSEX structure’s dwMemoryLoad member. If this member reports a value above 80, memory is getting tight, and you can start converting strong references to weak references based on whether you want a least-recently used algorithm, a most-frequently used algorithm, a time-base algorithm, or whatever.

Developers frequently want to associate a piece of data with another entity. For example, you can associate data with a thread or with an AppDomain. It is also possible to associate data with an individual object by using the System.Runtime.CompilerServices.ConditionalWeakTable<TKey,TValue> class, which looks like this.

public sealed class ConditionalWeakTable<TKey, TValue>
   where TKey : class where TValue : class {
   public ConditionalWeakTable();
   public void    Add(TKey key, TValue value);
   public TValue  GetValue(TKey key, CreateValueCallback<TKey, TValue> createValueCa
llback);
   public Boolean TryGetValue(TKey key, out TValue value);
   public TValue  GetOrCreateValue(TKey key);
   public Boolean Remove(TKey key);

   public delegate TValue CreateValueCallback(TKey key);  // Nested delegate definition
}

If you want to associate some arbitrary data with one or more objects, you would first create an instance of this class. Then, call the Add method, passing in a reference to some object for the key parameter and the data you want to associate with the object in the value parameter. If you attempt to add a reference to the same object more than once, the Add method throws an ArgumentException; to change the value associated with an object, you must remove the key and then add it back in with the new value. Note that this class is thread-safe so multiple threads can use it concurrently, although this means that the performance of the class is not stellar; you should test the performance of this class to see how well it works for your scenario.

Of course, a table object internally stores a WeakReference to the object passed in as the key; this ensures that the table doesn’t forcibly keep the object alive. But what makes the ConditionalWeakTable class so special is that it guarantees that the value remains in memory as long as the object identified by the key is in memory. So this is more than a normal WeakReference because if it were, the value could be garbage collected even though the key object continued to live. The ConditionalWeakTable class could be used to implement the dependency property mechanism used by XAML. It can also be used internally by dynamic languages to dynamically associate data with objects.

Here is some code that demonstrates the use of the ConditionalWeakTable class. It allows you to call the GCWatch extension method on any object passing in some String tag. Then it notifies you via the console window whenever that particular object gets garbage collected.

internal static class ConditionalWeakTableDemo {
   public static void Main() {
      Object o = new Object().GCWatch("My Object created at " + DateTime.Now);
      GC.Collect();     // We will not see the GC notification here
      GC.KeepAlive(o);  // Make sure the object o refers to lives up to here
      o = null;         // The object that o refers to can die now

      GC.Collect();     // We'll see the GC notification sometime after this line
      Console.ReadLine();
   }
}

internal static class GCWatcher {
   // NOTE: Be careful with Strings due to interning and MarshalByRefObject proxy objects
   private readonly static ConditionalWeakTable<Object, NotifyWhenGCd<String>>
 s_cwt =
      new ConditionalWeakTable<Object, NotifyWhenGCd<String>>();

   private sealed class NotifyWhenGCd<T> {
      private readonly T m_value;

      internal NotifyWhenGCd(T value) { m_value = value; }
      public override string ToString() { return m_value.ToString(); }
      ~NotifyWhenGCd() { Console.WriteLine("GC'd: " + m_value); }
   }

   public static T GCWatch<T>(this T @object, String tag) where T : class {
      s_cwt.Add(@object, new NotifyWhenGCd<String>(tag));
      return @object;
   }
}

Save to your account