NtKinect: Kinect V2 C++ Programming with OpenCV on Windows10

How to recognize human skeleton with Kinect V2

2016.07.16: created by

Prerequisite knowledge

NtKinect: How to get RGB camera image with Kinect V2 (Fundamental Settings)

Recognizing Human Skeleton

To recognize the skeleton is to obtain the joint positions of a human. The joint type is "Joint", which is defined in Kinect for Windows SDK 2.0 as follows.

NtKinect

Definition of Joint in Kinect V2 SDK

Data about a joint is represented by the "Joint" type, a struct with 3 member variables.

JointType : Type of joint,
Position : 3D coordinates representing the position of the joint,
TrackingState : A value indicating the joint tracking state ("Tracked", "Inferred", "Not Tracked")

The set of Joint data for one person forms that person's skeleton information.

Quoted from Kinect.h of Kinect for Windows SDK 2.0

enum _JointType {
    JointType_SpineBase= 0,
    JointType_SpineMid= 1,
    JointType_Neck= 2,
    JointType_Head= 3,
    JointType_ShoulderLeft= 4,
    JointType_ElbowLeft= 5,
    JointType_WristLeft= 6,
    JointType_HandLeft= 7,
    JointType_ShoulderRight= 8,
    JointType_ElbowRight= 9,
    JointType_WristRight= 10,
    JointType_HandRight= 11,
    JointType_HipLeft= 12,
    JointType_KneeLeft= 13,
    JointType_AnkleLeft= 14,
    JointType_FootLeft= 15,
    JointType_HipRight= 16,
    JointType_KneeRight= 17,
    JointType_AnkleRight= 18,
    JointType_FootRight= 19,
    JointType_SpineShoulder= 20,
    JointType_HandTipLeft= 21,
    JointType_ThumbLeft= 22,
    JointType_HandTipRight= 23,
    JointType_ThumbRight= 24,
    JointType_Count= ( JointType_ThumbRight + 1 ) 
};

enum _TrackingState {
    TrackingState_NotTracked= 0,
    TrackingState_Inferred= 1,
    TrackingState_Tracked= 2
};

typedef struct _Joint {
    JointType JointType;
    CameraSpacePoint Position;
    TrackingState TrackingState;
} Joint;
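
As a small illustration of the definitions above, the following sketch counts how many joints of one skeleton carry a usable position (Tracked or Inferred). The stand-in types are reduced copies of the Kinect.h declarations so that the sketch compiles without the SDK; in a real project you would include Kinect.h instead. The helper names trackingStateName and countUsableJoints are hypothetical and are not part of the SDK or NtKinect.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in declarations mirroring Kinect.h (assumption: present only so this
// sketch compiles without the SDK; include Kinect.h in a real project).
enum TrackingState {
  TrackingState_NotTracked = 0,
  TrackingState_Inferred = 1,
  TrackingState_Tracked = 2
};
struct CameraSpacePoint { float X; float Y; float Z; };
struct Joint {
  int JointType;                   // a JointType_* value in the SDK
  CameraSpacePoint Position;
  ::TrackingState TrackingState;
};

// Hypothetical helper: human-readable name of a tracking state.
std::string trackingStateName(TrackingState s) {
  switch (s) {
    case TrackingState_Tracked:  return "Tracked";
    case TrackingState_Inferred: return "Inferred";
    default:                     return "NotTracked";
  }
}

// Hypothetical helper: number of joints whose position can be used.
int countUsableJoints(const std::vector<Joint>& person) {
  int n = 0;
  for (const auto& joint : person) {
    if (joint.TrackingState != TrackingState_NotTracked) ++n;
  }
  return n;
}
```

With NtKinect, a call such as countUsableJoints(kinect.skeleton[0]) would report how many of the 25 joints of the first recognized person are currently usable.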

NtKinect's functions for skeleton

type of return value	function name	descriptions
void	setSkeleton()	Recognize skeleton and set the data to the following member variables.

type	variable name	descriptions
vector<vector<Joint>>	skeleton	skeleton information
vector<int>	skeletonId	bodyIndex of skeleton
vector<UINT64>	skeletonTrackingId	trackingId of skeleton
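
One possible use of skeletonTrackingId is to follow the same person from frame to frame, since the position of a person inside the skeleton vector may differ between frames. The sketch below looks up the index that currently belongs to a given trackingId. The helper name findSkeletonIndex is hypothetical, and the UINT64 alias is a stand-in for the Windows type so that the sketch compiles without Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the Windows UINT64 type (assumption: lets the sketch compile
// without Windows headers; remove this alias when NtKinect.h is included).
using UINT64 = std::uint64_t;

// Hypothetical helper: returns the index i with trackingIds[i] == id,
// or -1 if no skeleton with that trackingId is present.
int findSkeletonIndex(const std::vector<UINT64>& trackingIds, UINT64 id) {
  for (std::size_t i = 0; i < trackingIds.size(); ++i) {
    if (trackingIds[i] == id) return static_cast<int>(i);
  }
  return -1;
}
```

Called as findSkeletonIndex(kinect.skeletonTrackingId, id) after kinect.setSkeleton(), a non-negative result i would identify kinect.skeleton[i] as the skeleton of the person with that trackingId.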

NtKinect's member variables for skeleton

type	variable name	descriptions
vector<vector<Joint>>	skeleton	Vector of skeleton information. The set of one person's joints is a vector<Joint>, which is that person's skeleton information. To handle multiple people, these are held in a vector, that is, vector<vector<Joint>>. Joint coordinates are positions in the CameraSpace coordinate system. Number of recognized people: skeleton.size(). Skeleton information of the index-th person: skeleton[index]. Number of joints of the index-th person: skeleton[index].size(). Information on joint jointType of the index-th person: skeleton[index][jointType]. Coordinates of that joint: skeleton[index][jointType].Position. Tracking state of that joint: skeleton[index][jointType].TrackingState. Since skeleton[index][jointType].JointType == jointType holds, you can directly access the information of a specified joint of a specified skeleton.
vector<int>	skeletonId	A vector of bodyIndex corresponding to the skeletons. The bodyIndex of skeleton[index] is held at skeletonId[index].
vector<UINT64>	skeletonTrackingId	A vector of trackingId corresponding to the skeletons. The trackingId of skeleton[index] is held at skeletonTrackingId[index].
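
The access pattern skeleton[index][jointType].Position can be sketched as follows: the helper computes the straight-line distance between two joints of one person. CameraSpace coordinates in the Kinect SDK are measured in meters, so the result is in meters. The stand-in types are reduced copies of the Kinect.h declarations (only two joint names are listed), and the names Skeletons and jointDistance are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in declarations mirroring Kinect.h (assumption: reduced copies so the
// sketch compiles without the SDK; include Kinect.h in a real project).
enum JointType { JointType_Neck = 2, JointType_Head = 3, JointType_Count = 25 };
struct CameraSpacePoint { float X; float Y; float Z; };
struct Joint {
  ::JointType JointType;
  CameraSpacePoint Position;
};

// Same shape as NtKinect's "skeleton" member variable.
using Skeletons = std::vector<std::vector<Joint>>;

// Hypothetical helper: distance between joints a and b of the index-th person.
float jointDistance(const Skeletons& skeleton, std::size_t index,
                    JointType a, JointType b) {
  assert(index < skeleton.size());
  const std::vector<Joint>& person = skeleton[index];
  assert(person.size() == static_cast<std::size_t>(JointType_Count));
  const CameraSpacePoint& p = person[a].Position;  // direct access by jointType
  const CameraSpacePoint& q = person[b].Position;
  const float dx = p.X - q.X;
  const float dy = p.Y - q.Y;
  const float dz = p.Z - q.Z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```

For example, jointDistance(kinect.skeleton, 0, JointType_Neck, JointType_Head) would give the neck-to-head distance of the first recognized person. A real program would also check the TrackingState of both joints before trusting the result.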

3 types of coordinate system of Kinect V2

Since the position and resolution of each sensor are different, each sensor's data is expressed in that sensor's own coordinate system. When using data obtained from different sensors at the same time, it is necessary to convert the coordinates so that they match.

Kinect V2 has 3 coordinate systems, ColorSpace, DepthSpace, and CameraSpace. There are 3 data types ColorSpacePoint, DepthSpacePoint, and CameraSpacePoint representing coordinates in each coordinate system.

Quoted from Kinect.h of Kinect for Windows SDK 2.0

typedef struct _ColorSpacePoint {
    float X;
    float Y;
} ColorSpacePoint;

typedef struct _DepthSpacePoint {
    float X;
    float Y;
} DepthSpacePoint;

typedef struct _CameraSpacePoint {
    float X;
    float Y;
    float Z;
} CameraSpacePoint;

Coordinate systems and data types of Kinect V2

The RGB image, the depth image, and the skeleton information each use a different coordinate system. The coordinate system of the RGB image is ColorSpace, that of the depth image is DepthSpace, and that of the skeleton information is CameraSpace.

Coordinate system	type of coordinates	Captured Data
ColorSpace	ColorSpacePoint	RGB image
DepthSpace	DepthSpacePoint	depth image, bodyIndex image, infrared image
CameraSpace	CameraSpacePoint	skeleton information

CameraSpace coordinate system representing skeleton position

The CameraSpace is a 3-dimensional coordinate system with the following features.

Kinect V2 is located at the origin of the coordinate system.
The direction in which the camera lens points is the positive direction of the z-axis.
The vertically upward direction is the positive direction of the y-axis.
The coordinate system is right-handed.

That is, in all 3 coordinate systems, CameraSpace, ColorSpace, and DepthSpace, "the horizontal direction from left to right as seen by a user facing Kinect V2" is the positive direction of the x-axis. In other words, data is acquired and displayed as if the user facing Kinect V2 were looking at a mirror image.
(2016/11/12 figure changed, and description added).
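
Because the y-axis of CameraSpace points vertically upward, simple posture tests reduce to comparing Y values. The sketch below checks whether the right hand is held higher than the head. The stand-in types are reduced copies of the Kinect.h declarations, and the helper name isRightHandAboveHead is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in declarations mirroring Kinect.h (assumption: reduced copies so the
// sketch compiles without the SDK; include Kinect.h in a real project).
enum JointType { JointType_Head = 3, JointType_HandRight = 11, JointType_Count = 25 };
enum TrackingState {
  TrackingState_NotTracked = 0,
  TrackingState_Inferred = 1,
  TrackingState_Tracked = 2
};
struct CameraSpacePoint { float X; float Y; float Z; };
struct Joint {
  ::JointType JointType;
  CameraSpacePoint Position;
  ::TrackingState TrackingState;
};

// Hypothetical helper: true if the right hand is higher than the head.
// Returns false when either joint has no usable position.
bool isRightHandAboveHead(const std::vector<Joint>& person) {
  assert(person.size() > static_cast<std::size_t>(JointType_HandRight));
  const Joint& head = person[JointType_Head];
  const Joint& hand = person[JointType_HandRight];
  if (head.TrackingState == TrackingState_NotTracked ||
      hand.TrackingState == TrackingState_NotTracked) {
    return false;
  }
  // The y-axis points up, so a larger Y means a higher position.
  return hand.Position.Y > head.Position.Y;
}
```

In the same way, Position.Z gives the distance of a joint from the sensor along the direction the camera faces.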

Kinect V2's functions for mapping coordinate systems

The coordinate system conversion functions provided by the ICoordinateMapper class of Kinect V2 are as follows.

type of return value	function name	descriptions
HRESULT	MapCameraPointToColorSpace( CameraSpacePoint sp , ColorSpacePoint *cp )	Convert the coordinates sp in the CameraSpace to the coordinates cp in the ColorSpace. Return value is S_OK or error code.
HRESULT	MapCameraPointToDepthSpace( CameraSpacePoint sp , DepthSpacePoint *dp )	Convert the coordinates sp in the CameraSpace to the coordinates dp in DepthSpace. Return value is S_OK or error code.
HRESULT	MapDepthPointToColorSpace( DepthSpacePoint dp , UINT16 depth , ColorSpacePoint *cp )	Convert the coordinates dp in DepthSpace and distance depth to the coordinates cp in ColorSpace. Return value is S_OK or error code.
HRESULT	MapDepthPointToCameraSpace( DepthSpacePoint dp , UINT16 depth , CameraSpacePoint *sp )	Convert the coordinates dp in DepthSpace and distance depth to the coordinates sp in CameraSpace. Return value is S_OK or error code.
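
The return value only tells whether the mapping call itself succeeded. A mapped point can still lie outside the image, for example when a joint is outside the field of view of the RGB camera, so it can be worth validating the result before using it as a pixel position. The following is a minimal sketch of such a check, assuming the 1920x1080 color image of Kinect V2; the helper name isInsideImage is hypothetical, and ColorSpacePoint is a stand-in copy of the Kinect.h declaration. The test for non-finite values is a defensive assumption, not documented behavior quoted from this article.

```cpp
#include <cassert>
#include <cmath>

// Stand-in declaration mirroring Kinect.h (assumption: present only so this
// sketch compiles without the SDK; include Kinect.h in a real project).
struct ColorSpacePoint { float X; float Y; };

// Hypothetical helper: true if cp is a finite point inside an image of the
// given size, so that it is safe to use as a pixel position.
bool isInsideImage(const ColorSpacePoint& cp, int width, int height) {
  if (!std::isfinite(cp.X) || !std::isfinite(cp.Y)) return false;
  return cp.X >= 0.0f && cp.X < static_cast<float>(width) &&
         cp.Y >= 0.0f && cp.Y < static_cast<float>(height);
}
```

A typical use would be to call MapCameraPointToColorSpace, continue only if the returned HRESULT is S_OK, and then draw only if isInsideImage(cp, 1920, 1080) is true.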

NtKinect's member variable for mapping coordinate system

An instance of ICoordinateMapper class used for mapping coordinate systems in Kinect V2 is held in NtKinect's member variable "coordinateMapper".

type	variable name	descriptions
CComPtr<ICoordinateMapper>	coordinateMapper	An instance of ICoordinateMapper used for mapping coordinate systems.

How to write program

Start with the Visual Studio project KinectV2.zip from "NtKinect: How to get RGB camera image with Kinect V2 (Fundamental Settings)".
Change the contents of "main.cpp" as follows.

Call the kinect.setSkeleton() function to store the skeleton information in kinect.skeleton. Since the joint positions are expressed in the CameraSpace coordinate system, we convert them to ColorSpace coordinates and draw rectangles on the RGB image.

A Joint carries meaningful data only when its "TrackingState" member variable is "TrackingState_Tracked" or "TrackingState_Inferred". If it is "TrackingState_NotTracked", drawing the rectangle for that joint is skipped.

main.cpp

#include <iostream>
#include <sstream>

#include "NtKinect.h"

using namespace std;

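void doJob() {
  NtKinect kinect;
  while (1) {
    kinect.setRGB();       // update kinect.rgbImage
    kinect.setSkeleton();  // update kinect.skeleton
    for (const auto& person : kinect.skeleton) {
      for (const auto& joint : person) {
        // Skip joints that have no usable position.
        if (joint.TrackingState == TrackingState_NotTracked) continue;
        // Convert the joint position from CameraSpace to ColorSpace.
        ColorSpacePoint cp;
        HRESULT hr = kinect.coordinateMapper->MapCameraPointToColorSpace(joint.Position,&cp);
        if (FAILED(hr)) continue;
        // Draw a 10x10 red rectangle centered on the joint.
        cv::rectangle(kinect.rgbImage, cv::Rect((int)cp.X-5, (int)cp.Y-5,10,10), cv::Scalar(0,0,255),2);
      }
    }
    cv::imshow("rgb", kinect.rgbImage);
    auto key = cv::waitKey(1);
    if (key == 'q') break;
  }
  cv::destroyAllWindows();
}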

int main(int argc, char** argv) {
  try {
    doJob();
  } catch (exception &ex) {
    cout << ex.what() << endl;
    string s;
    cin >> s;
  }
  return 0;
}

When you run the program, RGB images are displayed. Exit with 'q' key.

On the RGB image, the recognized joints are indicated by small red rectangles. Note that it is usual to express colors in BGR order in OpenCV, so cv::Scalar(0,0,255) means (blue, green, red) = (0, 0, 255), that is, red.

The sample project for this article is KinectV2_skeleton.zip.

Since the above zip file may not include the latest "NtKinect.h", download the latest version and replace the old one with it.

http://nw.tsuda.ac.jp/