Programming with the Kinect for Windows Software Development Kit: Algorithmic Gestures and Postures

  • 9/15/2012
Contents
  1. Defining a gesture with an algorithm
  2. Defining a posture with an algorithm
In this chapter from Programming with the Kinect for Windows Software Development Kit, you will learn how to detect postures and gestures using an algorithmic approach.

Kinect is a wonderful tool for communicating with a computer. And one of the most obvious ways to accomplish this communication is by using gestures. A gesture is the movement of a part of your body through time, such as when you move your hand from right to left to simulate a swipe.

Posture is similar to gesture, but it involves the entire body: a posture is the relative position of all parts of your body at a given time.

Postures and gestures captured by the Kinect sensor can be used to send orders to the computer (a specific posture can start an action, and gestures can manipulate the user interface, for instance).

In this chapter, you will learn how to detect postures and gestures using an algorithmic approach. Chapter 7 will demonstrate how to use a different technique to detect more complex gestures and postures. Chapter 8 will then show you how to use gestures and postures in a real application.

Defining a gesture with an algorithm

With gestures, it is all about movement. Detecting a gesture can therefore be defined as the process of detecting a given movement.

This approach can be applied to detect linear movements, such as a hand swipe from left to right, as shown in Figure 6-1.

Figure 6-1 A gesture can be as simple as a hand swipe from left to right.

The global principle behind capturing a gesture for use as input is simple: you have to capture the last n positions of a joint and apply an algorithm to them to detect a potential gesture.

Creating a base class for gesture detection

First you must create an abstract base class for gesture detection classes. This class provides common services such as:

  • Capturing the positions of a tracked joint

  • Drawing the captured positions for debugging purposes, as shown in Figure 6-2

  • Providing an event for signaling detected gestures

  • Providing a mechanism to prevent detecting “overlapping” gestures (with a minimal delay between two gestures)

Figure 6-2 Drawing captured joint positions, shown in red (for readers of the print book, the captured joint positions are indicated by the semicircle of dots to the right of the skeleton).

To store joint positions, you must create the following class:

using System;
using System.Windows.Shapes;

namespace Kinect.Toolbox
{
    public class Entry
    {
        public DateTime Time { get; set; }
        public Vector3 Position { get; set; }
        public Ellipse DisplayEllipse { get; set; }
    }
}

This class contains the position of the joint as well as the time of capture and an ellipse to draw it.
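The Position property uses the toolbox's Vector3 type, and entries are created from a SkeletonPoint via a ToVector3 extension (used later in the Add method). The toolbox implementation is not reproduced here, but a minimal sketch could look like the following (the class name SkeletonPointExtensions is illustrative):

using Microsoft.Kinect;

namespace Kinect.Toolbox
{
    public static class SkeletonPointExtensions
    {
        // Copies the three skeleton-space coordinates (in meters)
        // into the toolbox's Vector3 type.
        public static Vector3 ToVector3(this SkeletonPoint point)
        {
            return new Vector3(point.X, point.Y, point.Z);
        }
    }
}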

The base class for gesture detection starts with the following declarations:

using System;
using System.Collections.Generic;
using System.Windows;
using System.Windows.Media;
using System.Windows.Shapes;
using System.Windows.Controls;
using Microsoft.Kinect;

namespace Kinect.Toolbox
{
    public abstract class GestureDetector
    {
        public int MinimalPeriodBetweenGestures { get; set; }

        readonly List<Entry> entries = new List<Entry>();

        public event Action<string> OnGestureDetected;

        DateTime lastGestureDate = DateTime.Now;

        readonly int windowSize; // Number of recorded positions

        // For drawing
        public Canvas DisplayCanvas
        {
            get;
            set;
        }

        public Color DisplayColor
        {
            get;
            set;
        }

        protected GestureDetector(int windowSize = 20)
        {
            this.windowSize = windowSize;
            MinimalPeriodBetweenGestures = 0;
            DisplayColor = Colors.Red;
        }
    }
}

This class contains a list of captured entries (Entries), a property for defining the minimal delay between two gestures (MinimalPeriodBetweenGestures), and an event for signaling detected gestures (OnGestureDetected).

If you want to debug your gestures, you can use the DisplayCanvas and DisplayColor properties to draw the current captured positions on a XAML canvas (as shown in Figure 6-2).
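Wiring the debug overlay takes only a couple of lines. Here is a minimal sketch, assuming a WPF window whose XAML declares a Canvas named skeletonCanvas overlaid on the video display, and detector being any concrete GestureDetector (such as the SwipeGestureDetector built later in this chapter):

// skeletonCanvas is assumed to be declared in XAML, on top of the video stream;
// detector is any concrete GestureDetector instance.
detector.DisplayCanvas = skeletonCanvas;
detector.DisplayColor = Colors.Yellow; // override the default red if desired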

The complete class also provides a method to add entries:

public virtual void Add(SkeletonPoint position, KinectSensor sensor)
{
    Entry newEntry = new Entry { Position = position.ToVector3(), Time = DateTime.Now };
    Entries.Add(newEntry); // Entries and WindowSize are defined in the complete class below

    // Drawing
    if (DisplayCanvas != null)
    {
        newEntry.DisplayEllipse = new Ellipse
        {
            Width = 4,
            Height = 4,
            HorizontalAlignment = HorizontalAlignment.Left,
            VerticalAlignment = VerticalAlignment.Top,
            StrokeThickness = 2.0,
            Stroke = new SolidColorBrush(DisplayColor),
            StrokeLineJoin = PenLineJoin.Round
        };

        Vector2 vector2 = Tools.Convert(sensor, position);

        float x = (float)(vector2.X * DisplayCanvas.ActualWidth);
        float y = (float)(vector2.Y * DisplayCanvas.ActualHeight);

        Canvas.SetLeft(newEntry.DisplayEllipse, x - newEntry.DisplayEllipse.Width / 2);
        Canvas.SetTop(newEntry.DisplayEllipse, y - newEntry.DisplayEllipse.Height / 2);

        DisplayCanvas.Children.Add(newEntry.DisplayEllipse);
    }

    // Remove positions that are too old
    if (Entries.Count > WindowSize)
    {
        Entry entryToRemove = Entries[0];

        if (DisplayCanvas != null)
        {
            DisplayCanvas.Children.Remove(entryToRemove.DisplayEllipse);
        }

        Entries.Remove(entryToRemove);
    }

    // Look for gestures
    LookForGesture();
}

protected abstract void LookForGesture();

This method adds the new entry, draws the associated ellipse if a display canvas is set, trims the list so the number of recorded entries stays within the window size, and finally calls an abstract method (which must be provided by child classes) to look for gestures.
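Note that Add relies on the toolbox helper Tools.Convert to project a skeleton-space point into normalized 2D coordinates before scaling by the canvas size. A plausible sketch, assuming the SDK 1.x mapping API and the 640×480 color format (the actual toolbox implementation may differ):

using Microsoft.Kinect;

namespace Kinect.Toolbox
{
    public static class Tools
    {
        // Maps a skeleton-space point to normalized [0, 1] 2D coordinates
        // relative to the 640x480 color image, so callers can scale by the
        // actual width and height of the display surface.
        public static Vector2 Convert(KinectSensor sensor, SkeletonPoint position)
        {
            ColorImagePoint colorPoint = sensor.MapSkeletonPointToColor(
                position, ColorImageFormat.RgbResolution640x480Fps30);

            return new Vector2(colorPoint.X / 640f, colorPoint.Y / 480f);
        }
    }
}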

One last method is required:

protected void RaiseGestureDetected(string gesture)
{
    // Gesture too close to the previous one?
    if (DateTime.Now.Subtract(lastGestureDate).TotalMilliseconds > MinimalPeriodBetweenGestures)
    {
        if (OnGestureDetected != null)
            OnGestureDetected(gesture);

        lastGestureDate = DateTime.Now;
    }

    Entries.ForEach(e =>
    {
        if (DisplayCanvas != null)
            DisplayCanvas.Children.Remove(e.DisplayEllipse);
    });
    Entries.Clear();
}

This method raises the event if enough time has elapsed since the previously detected gesture, and then clears the recorded entries (removing their debug ellipses, if any).
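Consuming the event is then straightforward. A minimal sketch, assuming detector is any concrete GestureDetector (the gesture name is the string passed to RaiseGestureDetected):

detector.MinimalPeriodBetweenGestures = 500; // ignore gestures less than 500 ms apart

detector.OnGestureDetected += gestureName =>
{
    if (gestureName == "SwipeToRight")
    {
        // React to the gesture, e.g., move to the next slide
    }
};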

The complete class is defined as follows:

using System;
using System.Collections.Generic;
using System.Windows;
using System.Windows.Media;
using System.Windows.Shapes;
using System.Windows.Controls;
using Microsoft.Kinect;

namespace Kinect.Toolbox
{
    public abstract class GestureDetector
    {
        public int MinimalPeriodBetweenGestures { get; set; }

        readonly List<Entry> entries = new List<Entry>();

        public event Action<string> OnGestureDetected;

        DateTime lastGestureDate = DateTime.Now;

        readonly int windowSize; // Number of recorded positions

        // For drawing
        public Canvas DisplayCanvas
        {
            get;
            set;
        }

        public Color DisplayColor
        {
            get;
            set;
        }

        protected GestureDetector(int windowSize = 20)
        {
            this.windowSize = windowSize;
            MinimalPeriodBetweenGestures = 0;
            DisplayColor = Colors.Red;
        }

        protected List<Entry> Entries
        {
            get { return entries; }
        }

        public int WindowSize
        {
            get { return windowSize; }
        }

        public virtual void Add(SkeletonPoint position, KinectSensor sensor)
        {
            Entry newEntry = new Entry {Position = position.ToVector3(), Time = DateTime.Now};
            Entries.Add(newEntry);

            // Drawing
            if (DisplayCanvas != null)
            {
                newEntry.DisplayEllipse = new Ellipse
                {
                    Width = 4,
                    Height = 4,
                    HorizontalAlignment = HorizontalAlignment.Left,
                    VerticalAlignment = VerticalAlignment.Top,
                    StrokeThickness = 2.0,
                    Stroke = new SolidColorBrush(DisplayColor),
                    StrokeLineJoin = PenLineJoin.Round
                };

                Vector2 vector2 = Tools.Convert(sensor, position);

                float x = (float)(vector2.X * DisplayCanvas.ActualWidth);
                float y = (float)(vector2.Y * DisplayCanvas.ActualHeight);

                Canvas.SetLeft(newEntry.DisplayEllipse, x - newEntry.DisplayEllipse.Width / 2);
                Canvas.SetTop(newEntry.DisplayEllipse, y - newEntry.DisplayEllipse.Height / 2);

                DisplayCanvas.Children.Add(newEntry.DisplayEllipse);
            }

            // Remove positions that are too old
            if (Entries.Count > WindowSize)
            {
                Entry entryToRemove = Entries[0];

                if (DisplayCanvas != null)
                {
                    DisplayCanvas.Children.Remove(entryToRemove.DisplayEllipse);
                }

                Entries.Remove(entryToRemove);
            }

            // Look for gestures
            LookForGesture();
        }

        protected abstract void LookForGesture();

        protected void RaiseGestureDetected(string gesture)
        {
            // Too close?
            if (DateTime.Now.Subtract(lastGestureDate).TotalMilliseconds > MinimalPeriodBetweenGestures)
            {
                if (OnGestureDetected != null)
                    OnGestureDetected(gesture);

                lastGestureDate = DateTime.Now;
            }

            Entries.ForEach(e =>
            {
                if (DisplayCanvas != null)
                    DisplayCanvas.Children.Remove(e.DisplayEllipse);
            });
            Entries.Clear();
        }
    }
}
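In a real application, you feed the detector by passing a tracked joint position to Add on every skeleton frame. The following is a minimal sketch, assuming an initialized KinectSensor (sensor) with its skeleton stream enabled and detector being a concrete instance that tracks the right hand:

Skeleton[] skeletons = null;

sensor.SkeletonFrameReady += (sender, e) =>
{
    using (SkeletonFrame frame = e.OpenSkeletonFrame())
    {
        if (frame == null)
            return;

        // (Re)allocate the buffer and copy the current skeleton data into it
        if (skeletons == null || skeletons.Length != frame.SkeletonArrayLength)
            skeletons = new Skeleton[frame.SkeletonArrayLength];
        frame.CopySkeletonDataTo(skeletons);

        foreach (Skeleton skeleton in skeletons)
        {
            if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
                continue;

            // Feed the right hand position to the detector on every frame
            detector.Add(skeleton.Joints[JointType.HandRight].Position, sensor);
        }
    }
};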

Detecting linear gestures

By inheriting from the GestureDetector class, you can create a class that scans the recorded positions to determine whether all the points follow a given path. For example, to detect a swipe to the right, you must do the following:

  • Check that all points progress to the right (along the x axis).

  • Check that no point strays too far from the first one on the y and z axes.

  • Check that the first and last points are a sufficient distance apart.

  • Check that the first and last points were created within a given period of time.

To check these constraints, you can write the following method:

protected bool ScanPositions(Func<Vector3, Vector3, bool> heightFunction,
    Func<Vector3, Vector3, bool> directionFunction,
    Func<Vector3, Vector3, bool> lengthFunction, int minTime, int maxTime)
{
    int start = 0;

    for (int index = 1; index < Entries.Count - 1; index++)
    {
        if (!heightFunction(Entries[0].Position, Entries[index].Position) ||
            !directionFunction(Entries[index].Position, Entries[index + 1].Position))
        {
            start = index;
        }

        if (lengthFunction(Entries[index].Position, Entries[start].Position))
        {
            double totalMilliseconds = (Entries[index].Time - Entries[start].Time).TotalMilliseconds;
            if (totalMilliseconds >= minTime && totalMilliseconds <= maxTime)
            {
                return true;
            }
        }
    }

    return false;
}

This method is a generic way to check all of these constraints. Using Func parameters, it iterates over the entries and verifies that each one satisfies heightFunction and directionFunction; whenever a constraint fails, the start index is reset so that only the most recent unbroken run of positions is considered. It then checks the traveled distance with lengthFunction, and finally validates the overall duration against the range defined by minTime and maxTime.

To use this function for a hand swipe, you can call it this way:

if (ScanPositions((p1, p2) => Math.Abs(p2.Y - p1.Y) < SwipeMaximalHeight, // Height
    (p1, p2) => p2.X - p1.X > -0.01f, // Progression to right
    (p1, p2) => Math.Abs(p2.X - p1.X) > SwipeMinimalLength, // Length
    SwipeMininalDuration, SwipeMaximalDuration)) // Duration
{
    RaiseGestureDetected("SwipeToRight");
    return;
}

So the final SwipeGestureDetector looks like this:

using System;
using Microsoft.Kinect;

namespace Kinect.Toolbox
{
    public class SwipeGestureDetector : GestureDetector
    {
        public float SwipeMinimalLength {get;set;}
        public float SwipeMaximalHeight {get;set;}
        public int SwipeMininalDuration {get;set;}
        public int SwipeMaximalDuration {get;set;}

        public SwipeGestureDetector(int windowSize = 20)
            : base(windowSize)
        {
            SwipeMinimalLength = 0.4f;
            SwipeMaximalHeight = 0.2f;
            SwipeMininalDuration = 250;
            SwipeMaximalDuration = 1500;
        }

        protected bool ScanPositions(Func<Vector3, Vector3, bool> heightFunction,
            Func<Vector3, Vector3, bool> directionFunction,
            Func<Vector3, Vector3, bool> lengthFunction, int minTime, int maxTime)
        {
            int start = 0;

            for (int index = 1; index < Entries.Count - 1; index++)
            {
                if (!heightFunction(Entries[0].Position, Entries[index].Position) ||
                    !directionFunction(Entries[index].Position, Entries[index + 1].Position))
                {
                    start = index;
                }

                if (lengthFunction(Entries[index].Position, Entries[start].Position))
                {
                    double totalMilliseconds = (Entries[index].Time - Entries[start].Time).TotalMilliseconds;
                    if (totalMilliseconds >= minTime && totalMilliseconds <= maxTime)
                    {
                        return true;
                    }
                }
            }

            return false;
        }

        protected override void LookForGesture()
        {
            // Swipe to right
            if (ScanPositions((p1, p2) => Math.Abs(p2.Y - p1.Y) < SwipeMaximalHeight, // Height
                (p1, p2) => p2.X - p1.X > -0.01f, // Progression to right
                (p1, p2) => Math.Abs(p2.X - p1.X) > SwipeMinimalLength, // Length
                SwipeMininalDuration, SwipeMaximalDuration)) // Duration
            {
                RaiseGestureDetected("SwipeToRight");
                return;
            }

            // Swipe to left
            if (ScanPositions((p1, p2) => Math.Abs(p2.Y - p1.Y) < SwipeMaximalHeight, // Height
                (p1, p2) => p2.X - p1.X < 0.01f, // Progression to left
                (p1, p2) => Math.Abs(p2.X - p1.X) > SwipeMinimalLength, // Length
                SwipeMininalDuration, SwipeMaximalDuration)) // Duration
            {
                RaiseGestureDetected("SwipeToLeft");
                return;
            }
        }
    }
}
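The same ScanPositions method can express other linear gestures simply by swapping the constraint functions. As a hypothetical example (not part of the toolbox), a vertical swipe detector could reuse the existing thresholds by inheriting from SwipeGestureDetector; note that in skeleton space the y axis points up:

using System;

namespace Kinect.Toolbox
{
    // Hypothetical detector: reuses SwipeGestureDetector's thresholds and
    // its protected ScanPositions method to detect an upward hand swipe.
    public class VerticalSwipeGestureDetector : SwipeGestureDetector
    {
        public VerticalSwipeGestureDetector(int windowSize = 20)
            : base(windowSize)
        {
        }

        protected override void LookForGesture()
        {
            // Swipe up
            if (ScanPositions((p1, p2) => Math.Abs(p2.X - p1.X) < SwipeMaximalHeight, // Lateral drift
                (p1, p2) => p2.Y - p1.Y > -0.01f, // Progression upward
                (p1, p2) => Math.Abs(p2.Y - p1.Y) > SwipeMinimalLength, // Length
                SwipeMininalDuration, SwipeMaximalDuration)) // Duration
            {
                RaiseGestureDetected("SwipeUp");
            }
        }
    }
}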