Programming with the Kinect for Windows Software Development Kit: Displaying Kinect Data

  • 9/15/2012

The depth display manager

The second stream you need to display is the depth stream. Each pixel in this stream is 16 bits wide: the 13 high-order bits hold the depth data, and the 3 low-order bits identify a player.

A depth value of 0 indicates that no depth data is available at that position, because the object there is either too close to the camera or too far away from it.

Important

When skeleton tracking is disabled, the three bits that identify a player are set to 0.

The DepthStreamManager class follows the same pattern as the ColorStreamManager class. Here is its code:

using System.Windows.Media.Imaging;
using Microsoft.Kinect;
using System.Windows.Media;
using System.Windows;

public class DepthStreamManager : Notifier
{
    byte[] depthFrame32;

    public WriteableBitmap Bitmap { get; private set; }

    public void Update(DepthImageFrame frame)
    {
        var pixelData = new short[frame.PixelDataLength];
        frame.CopyPixelDataTo(pixelData);

        if (depthFrame32 == null)
        {
            depthFrame32 = new byte[frame.Width * frame.Height * 4];
        }

        if (Bitmap == null)
        {
            Bitmap = new WriteableBitmap(frame.Width, frame.Height,
                                    96, 96, PixelFormats.Bgra32, null);
        }

        ConvertDepthFrame(pixelData);

        int stride = Bitmap.PixelWidth * Bitmap.Format.BitsPerPixel / 8;
        Int32Rect dirtyRect = new Int32Rect(0, 0, Bitmap.PixelWidth, Bitmap.PixelHeight);

        Bitmap.WritePixels(dirtyRect, depthFrame32, stride, 0);

        RaisePropertyChanged(() => Bitmap);
    }

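    // Converts each 16-bit depth pixel into a 32-bit BGRA pixel.
    void ConvertDepthFrame(short[] depthFrame16)
    {
        for (int i16 = 0, i32 = 0; i16 < depthFrame16.Length
                && i32 < depthFrame32.Length; i16++, i32 += 4)
        {
            // Low 3 bits: player index. High 13 bits: depth in millimeters.
            int user = depthFrame16[i16] & 0x07;
            int realDepth = (depthFrame16[i16] >> 3);

            // Near pixels are bright, far pixels are dark.
            byte intensity = (byte)(255 - (255 * realDepth / 0x1fff));

            // Bgra32 byte order is blue, green, red, alpha. Start from opaque black.
            depthFrame32[i32] = 0;
            depthFrame32[i32 + 1] = 0;
            depthFrame32[i32 + 2] = 0;
            depthFrame32[i32 + 3] = 255;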

            switch (user)
            {
                case 0: // no one
                    depthFrame32[i32] = (byte)(intensity / 2);
                    depthFrame32[i32 + 1] = (byte)(intensity / 2);
                    depthFrame32[i32 + 2] = (byte)(intensity / 2);
                    break;
                case 1:
                    depthFrame32[i32] = intensity;
                    break;
                case 2:
                    depthFrame32[i32 + 1] = intensity;
                    break;
                case 3:
                    depthFrame32[i32 + 2] = intensity;
                    break;
                case 4:
                    depthFrame32[i32] = intensity;
                    depthFrame32[i32 + 1] = intensity;
                    break;
                case 5:
                    depthFrame32[i32] = intensity;
                    depthFrame32[i32 + 2] = intensity;
                    break;
                case 6:
                    depthFrame32[i32 + 1] = intensity;
                    depthFrame32[i32 + 2] = intensity;
                    break;
                case 7:
                    depthFrame32[i32] = intensity;
                    depthFrame32[i32 + 1] = intensity;
                    depthFrame32[i32 + 2] = intensity;
                    break;
            }
        }
    }
}

The main method here is ConvertDepthFrame, where the potential user ID and the depth value (expressed in millimeters) are extracted:

int user = depthFrame16[i16] & 0x07;
int realDepth = (depthFrame16[i16] >> 3);
byte intensity = (byte)(255 - (255 * realDepth / 0x1fff));

As mentioned in Chapter 2, a few bitwise operations are enough to get the information you need out of each pixel. The user index occupies the three low-order bits, so a mask of 00000111 in binary form (0x07 in hexadecimal form) extracts it. To get the depth value, you discard those three low-order bits by shifting the pixel to the right with the >> operator.
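
To make the bit layout concrete, here is a minimal sketch that packs and unpacks a single depth pixel. The DepthPixel class and its member names are hypothetical and are not part of the Kinect SDK. The sketch also assumes an unsigned 16-bit value (ushort) so that the right shift can never propagate a sign bit, whereas the DepthStreamManager listing works on the short values that CopyPixelDataTo fills in.

```csharp
using System;

// Hypothetical helper (not part of the Kinect SDK) that illustrates the bit layout only.
public static class DepthPixel
{
    public const int PlayerIndexMask = 0x07; // binary 00000111
    public const int PlayerIndexBits = 3;

    // Builds a packed 16-bit pixel from a depth in millimeters and a player index (0 to 7).
    public static ushort Pack(int depthMm, int playerIndex)
    {
        return (ushort)((depthMm << PlayerIndexBits) | (playerIndex & PlayerIndexMask));
    }

    // Keeps only the three low-order bits.
    public static int PlayerIndex(ushort packed)
    {
        return packed & PlayerIndexMask;
    }

    // Drops the three low-order bits, leaving the 13-bit depth value.
    public static int DepthMm(ushort packed)
    {
        return packed >> PlayerIndexBits;
    }
}
```

For example, a pixel at 1200 millimeters that belongs to player 2 is stored as (1200 << 3) | 2 = 9602, or 0x2582. Masking 9602 with 0x07 gives back 2, and shifting it right by 3 gives back 1200.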

The intensity is derived from the ratio of the current depth value to the maximum depth value (0x1fff, the largest value that fits in 13 bits). The ratio is scaled to the range 0 to 255, because color components are expressed as bytes, and then subtracted from 255 so that closer objects appear brighter.
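
The following sketch isolates that computation so you can try it on a few values. The DepthIntensity class and the FromDepth method are hypothetical names, and the clamping step is an added assumption that keeps the result in range for unexpected input. It is not present in the DepthStreamManager listing.

```csharp
using System;

// Hypothetical helper that reproduces the intensity formula from ConvertDepthFrame.
public static class DepthIntensity
{
    public const int MaxDepth = 0x1fff; // 8191, the largest 13-bit value

    public static byte FromDepth(int depthMm)
    {
        // Assumption: clamp to the 13-bit range (the original listing does not clamp).
        int clamped = Math.Max(0, Math.Min(MaxDepth, depthMm));

        // Integer division truncates, exactly as in the original expression.
        return (byte)(255 - (255 * clamped / MaxDepth));
    }
}
```

With this formula, a depth of 800 millimeters gives 231, a depth of 4000 millimeters gives 131, and the maximum value 0x1fff gives 0. Note that a depth of 0, which means that no depth data is available, gives 255, so the formula on its own does not distinguish unknown pixels from very close ones.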

The remainder of the method generates a grayscale pixel (with the intensity related to the depth) when no user is detected, as shown in Figure 3-2, and a specific color when a user is detected, as shown in Figure 3-3. (The blue color shown in Figure 3-3 appears as gray to readers of the print book.)

Figure 3-2 The depth stream display without a user detected.

Figure 3-3 The depth stream display with a user detected. (A specific color is used where the user is detected, but this appears as light gray to readers of the print book.)
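
The color selection in the switch statement can also be expressed as a lookup table. The sketch below is one possible alternative, not the approach used in the DepthStreamManager listing, and the PlayerColors class and WritePixel method are hypothetical names. It relies on the Bgra32 byte order (blue, green, red, alpha) and assumes a player index between 0 and 7.

```csharp
using System;

// Hypothetical table-driven equivalent of the switch statement in ConvertDepthFrame.
public static class PlayerColors
{
    // One row per player index: which of the blue, green, and red channels receive the intensity.
    static readonly bool[,] Channels =
    {
        // blue, green, red
        { true,  true,  true  }, // 0: no player, gray at half intensity
        { true,  false, false }, // 1: blue
        { false, true,  false }, // 2: green
        { false, false, true  }, // 3: red
        { true,  true,  false }, // 4: blue + green
        { true,  false, true  }, // 5: blue + red
        { false, true,  true  }, // 6: green + red
        { true,  true,  true  }, // 7: gray at full intensity
    };

    // Writes one opaque BGRA pixel into target, starting at offset.
    public static void WritePixel(byte[] target, int offset, int player, byte intensity)
    {
        // Pixels that belong to no player are dimmed, as in the original listing.
        byte value = player == 0 ? (byte)(intensity / 2) : intensity;

        for (int c = 0; c < 3; c++)
        {
            target[offset + c] = Channels[player, c] ? value : (byte)0;
        }

        target[offset + 3] = 255; // alpha
    }
}
```

For an intensity of 200, player 1 produces the bytes 200, 0, 0, 255 (blue), and a pixel with no player produces 100, 100, 100, 255 (gray). A table keeps the color assignments in one place, whereas the switch statement makes each case explicit. Either form produces the same pixels.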

The DepthStreamManager supports near mode and standard mode in the same way. The only difference is that in near mode the depth values are available from 40 cm, whereas in standard mode they are available only from 80 cm, as shown in Figure 3-4.

Figure 3-4 Hand depth values out of range in standard mode are shown at left, and hand depth values in range in near mode are shown at right.
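
As a rough illustration of the difference between the two modes, the sketch below checks a distance against the minimum for each mode. The DepthRangeCheck class and its members are hypothetical and are not part of the Kinect SDK. The sketch models only the lower limits stated above (40 cm and 80 cm) and deliberately ignores the upper limits, which are not covered here.

```csharp
using System;

// Hypothetical helper that models only the minimum distance of each mode.
public static class DepthRangeCheck
{
    public const int NearModeMinMm = 400;     // 40 cm
    public const int StandardModeMinMm = 800; // 80 cm

    // Returns true when the distance is at or beyond the minimum for the selected mode.
    public static bool IsBeyondMinimum(int distanceMm, bool nearMode)
    {
        int minimum = nearMode ? NearModeMinMm : StandardModeMinMm;
        return distanceMm >= minimum;
    }
}
```

A hand held 500 millimeters from the sensor passes the check in near mode but fails it in standard mode, which corresponds to the situation shown in Figure 3-4.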

To connect your DepthStreamManager class with the kinectDisplay image control, use the following code for your kinectSensor_DepthFrameReady event handler:

readonly DepthStreamManager depthManager = new DepthStreamManager();
void kinectSensor_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using (var frame = e.OpenDepthImageFrame())
    {
        if (frame == null)
            return;

        depthManager.Update(frame);
    }
}

Then add this code to your initialization event handler:

kinectDisplay.DataContext = depthManager;

The DepthStreamManager provides an excellent way to give users visual feedback, because they can detect when and where the Kinect sensor sees them by referring to the colors in the visual display.