NtKinect: Kinect V2 C++ Programming with OpenCV on Windows10

How to recognize human face with Kinect V2 in DepthSpace coordinate system


2016.08.12: created by
topics for NtKinect.h version 1.3 or later.

Prerequisite knowledge


Recognizing Human Face in DepthSpace coordinate system

In NtKinect version 1.2 or earlier, the positions of the face parts could only be acquired as coordinates in the ColorSpace. In version 1.3 or later, they can also be acquired as coordinates in the DepthSpace.

NtKinect

NtKinect's Functions for Face Recognition

type of return value  function name  descriptions
void  setFace()
    version 1.2 or earlier.
    After calling setSkeleton(), this function can be called to recognize human faces.
    Values are set to the following member variables.
        type  variable name  descriptions
        vector<vector<PointF>>  facePoint  Face part positions
        vector<cv::Rect>  faceRect  BoundingBox of the face
        vector<cv::Vec3f>  faceDirection  Face direction
        vector<vector<DetectionResult>>  faceProperty  Face states
void  setFace(bool isColorSpace = true)
    version 1.3 or later.
    After calling setSkeleton(), this function can be called to recognize human faces.
    Values are set to the following member variables.
        type  variable name  descriptions
        vector<vector<PointF>>  facePoint  Face part positions
        vector<cv::Rect>  faceRect  BoundingBox of the face
        vector<cv::Vec3f>  faceDirection  Face direction
        vector<vector<DetectionResult>>  faceProperty  Face states
    Calling this function with no argument, or with "true" as the first argument, sets the positions as coordinates in the ColorSpace coordinate system; calling it with "false" sets them as coordinates in the DepthSpace coordinate system.
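
For example, a minimal sketch of requesting face information in the DepthSpace coordinate system looks like the following (it only shows the calling order; drawing and error handling are omitted, and it assumes the same setup as the sample program later in this article):

#define USE_FACE
#include "NtKinect.h"

// Minimal sketch: get face information in DepthSpace coordinates.
// setSkeleton() must be called before setFace().
void detectFacesInDepthSpace(NtKinect& kinect) {
  kinect.setDepth();      // capture the depth frame
  kinect.setSkeleton();   // face recognition is based on the skeleton data
  kinect.setFace(false);  // "false" -> facePoint/faceRect are DepthSpace coordinates
  // kinect.setFace();    // no argument (or "true") -> ColorSpace coordinates
}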
NtKinect

NtKinect's member variable for Face Recognition

type  variable name  descriptions
vector<vector<PointF>>  facePoint  Face part positions.
    The positions of one person's "left eye, right eye, nose, left end of mouth, right end of mouth" are represented with vector<PointF>.
    To handle multiple people's data, the type is vector<vector<PointF>>.
    (version 1.2 or earlier) coordinates in ColorSpace.
    (version 1.3 or later) coordinates in ColorSpace or DepthSpace.
vector<cv::Rect>  faceRect  Vector of BoundingBoxes of faces.
    (version 1.2 or earlier) coordinates in ColorSpace.
    (version 1.3 or later) coordinates in ColorSpace or DepthSpace.
vector<cv::Vec3f>  faceDirection  Vector of face directions (pitch, yaw, roll).
vector<vector<DetectionResult>>  faceProperty  Face states.
    The states of one person's "happy, engaged, wearing glasses, left eye closed, right eye closed, mouth open, mouth moved, looking away" are represented with vector<DetectionResult>.
    To handle multiple people's data, the type is vector<vector<DetectionResult>>.
vector<UINT64>  faceTrackingId  version 1.4 or later.
    Vector of trackingIds.
    The trackingId corresponding to the face information faceRect[index] is faceTrackingId[index].
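
The i-th element of each of these vectors is assumed to describe the same person, so the data can be walked over roughly as in the following sketch (variable names are only illustrative):

// Sketch: walking over the face data set by setFace().
// The i-th entry of each vector describes the same person.
for (int i = 0; i < (int) kinect.faceRect.size(); i++) {
  cv::Rect r = kinect.faceRect[i];                        // BoundingBox of the face
  vector<PointF> parts = kinect.facePoint[i];             // left eye, right eye, nose, mouth left, mouth right
  cv::Vec3f dir = kinect.faceDirection[i];                // (pitch, yaw, roll)
  vector<DetectionResult> props = kinect.faceProperty[i]; // happy, engaged, ... (Unknown/No/Maybe/Yes)
  // UINT64 id = kinect.faceTrackingId[i];                // version 1.4 or later
}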

NtKinect

3 types of coordinate system of Kinect V2

Since the position and resolution of each sensor are different, the data is obtained as values expressed in the coordinate system of each sensor. When using data obtained from different sensors at the same time, the coordinates must be converted into a common coordinate system.

Kinect V2 has 3 coordinate systems: ColorSpace, DepthSpace, and CameraSpace. There are 3 data types, ColorSpacePoint, DepthSpacePoint, and CameraSpacePoint, representing coordinates in each coordinate system.

Quoted from Kinect.h of Kinect for Windows SDK 2.0
typedef struct _ColorSpacePoint {
    float X;
    float Y;
} ColorSpacePoint;

typedef struct _DepthSpacePoint {
    float X;
    float Y;
} DepthSpacePoint;

typedef struct _CameraSpacePoint {
    float X;
    float Y;
    float Z;
} CameraSpacePoint;
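
ColorSpacePoint and DepthSpacePoint hold pixel coordinates in the color image (1920 x 1080) and the depth image (512 x 424) respectively, while CameraSpacePoint holds a 3-dimensional position measured in meters. The concrete values below are only an illustration:

#include <Kinect.h>

// Illustration only: the three coordinate types and their units.
ColorSpacePoint  cp = { 960.0f, 540.0f };    // pixel position in the 1920x1080 color image
DepthSpacePoint  dp = { 256.0f, 212.0f };    // pixel position in the 512x424 depth image
CameraSpacePoint sp = { 0.0f, 0.0f, 2.0f };  // 3D position in meters (2 m in front of the sensor)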
NtKinect

Coordinate systems and data types of Kinect V2

For the RGB image, Depth image, and skeleton information, the coordinate system is different. The coordinate system of the RGB image is ColorSpace, that of the Depth image is DepthSpace, and that of the skeleton information is CameraSpace.

Coordinate system  type of coordinates  Captured Data
ColorSpace   ColorSpacePoint   RGB image
DepthSpace   DepthSpacePoint   depth image, bodyIndex image, infrared image
CameraSpace  CameraSpacePoint  skeleton information
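
In NtKinect, these captured data are held in member variables after the corresponding set*() call. The sketch below summarizes them; setRGB() and rgbImage do not appear in this article's sample program and are assumed here from the rest of the NtKinect documentation:

// Sketch: NtKinect members holding the captured data.
void captureAll(NtKinect& kinect) {
  kinect.setRGB();       // kinect.rgbImage   : RGB image in ColorSpace (1920 x 1080)
  kinect.setDepth();     // kinect.depthImage : depth image in DepthSpace (512 x 424), CV_16UC1
  kinect.setSkeleton();  // kinect.skeleton   : vector<vector<Joint>>, positions in CameraSpace
}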



CameraSpace coordinate system representing skeleton position

The CameraSpace is a 3-dimensional coordinate system with the following features.

  • Kinect V2 is located at the origin of the coordinate system.
  • The direction of the camera lens is the positive direction of the z-axis.
  • Vertical upward direction is positive direction of y-axis.
  • Right-handed.
That is, in all 3 coordinate systems, CameraSpace, ColorSpace, and DepthSpace, "the horizontal direction from left to right as seen by a user facing Kinect V2" is the positive direction of the x-axis. In other words, the data is acquired and displayed as if the user facing Kinect V2 were looking at their own image in a mirror.
(2016/11/12 figure changed, and description added).
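
As a quick sanity check of this description (a standalone sketch, not part of the sample program), the cross products below confirm that with +y pointing up, +z pointing from the sensor toward the scene, and a right-handed system, the +x axis is the "left to right" direction as seen by the user:

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
  cv::Vec3f up(0, 1, 0);                   // +y : vertical upward direction
  cv::Vec3f forwardSensor(0, 0, 1);        // +z : from the sensor toward the user
  // Right-handed system: x = y x z
  std::cout << "+x axis      : " << up.cross(forwardSensor) << std::endl;   // [1, 0, 0]
  // A user facing the sensor looks along -z with up +y;
  // their "right" direction is forward x up = (-z) x y = +x.
  cv::Vec3f forwardUser(0, 0, -1);         // the user's viewing direction
  std::cout << "user's right : " << forwardUser.cross(up) << std::endl;     // [1, 0, 0]
  return 0;
}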

NtKinect

Kinect V2's functions for mapping coordinate systems

The coordinate conversion functions provided by the ICoordinateMapper class of Kinect V2 are as follows.

type of return value function name descriptions
HRESULT MapCameraPointToColorSpace(
    CameraSpacePoint sp ,
    ColorSpacePoint *cp )
Convert the coordinates sp in the CameraSpace to the coordinates cp in the ColorSpace. Return value is S_OK or error code.
HRESULT MapCameraPointToDepthSpace(
  CameraSpacePoint sp ,
  DepthSpacePoint *dp )
Convert the coordinates sp in the CameraSpace to the coordinates dp in DepthSpace. Return value is S_OK or error code.
HRESULT MapDepthPointToColorSpace(
  DepthSpacePoint dp ,
  UINT16 depth ,
  ColorSpacePoint *cp )
Convert the coordinates dp in DepthSpace and distance depth to the coordinates cp in ColorSpace. Return value is S_OK or error code.
HRESULT MapDepthPointToCameraSpace(
  DepthSpacePoint dp ,
  UINT16 depth ,
  CameraSpacePoint *sp )
Convert the coordinates dp in DepthSpace and distance depth to the coordinates sp in CameraSpace. Return value is S_OK or error code.
NtKinect

NtKinect's member variable for mapping coordinate system

An instance of ICoordinateMapper class used for mapping coordinate systems in Kinect V2 is held in NtKinect's member variable "coordinateMapper".

type variable name descriptions
CComPtr<ICoordinateMapper> coordinateMapper An instance of ICoordinateMapper used for mapping coordinate systems.
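
For example, a joint position obtained in CameraSpace can be mapped into DepthSpace (as the sample program below does), and a depth pixel can be mapped into ColorSpace if its distance value (in millimeters) is known. A minimal sketch, assuming "joint" is one Joint taken from kinect.skeleton:

#include "NtKinect.h"

// Sketch: coordinate mapping via kinect.coordinateMapper.
void mapJoint(NtKinect& kinect, const Joint& joint) {
  // CameraSpace -> DepthSpace
  DepthSpacePoint dp;
  kinect.coordinateMapper->MapCameraPointToDepthSpace(joint.Position, &dp);

  // DepthSpace -> ColorSpace: also needs the measured distance (mm) at that pixel,
  // read here from the CV_16UC1 depth image.
  UINT16 depth = kinect.depthImage.at<UINT16>((int)dp.Y, (int)dp.X);
  ColorSpacePoint cp;
  kinect.coordinateMapper->MapDepthPointToColorSpace(dp, depth, &cp);
}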

How to write program

  1. Start with the Visual Studio project KinectV2_face.zip from "How to recognize human face with Kinect V2 in ColorSpace coordinate system".
  2. This project is configured as follows.

  3. Change the contents of main.cpp as follows.
  4. Get the Depth image.

    Since the value of each pixel of the Depth image is of type CV_16UC1 (one unsigned 16-bit integer), it is converted into an image that can be drawn on in color. Each pixel is first converted to CV_8UC1 (one unsigned 8-bit integer) so that a distance of 4500 mm maps to the maximum luminance (= 255), and then converted to CV_8UC3 (three unsigned 8-bit integers, BGR format).

    The positions of the face parts are then drawn on top of the Depth image.

    main.cpp
    #include <iostream>
    #include <sstream>
    
    #define USE_FACE
    #include "NtKinect.h"
    
    using namespace std;
    
    string faceProp[] = {
      "happy", "engaged", "glasses", "leftEyeClosed",
      "rightEyeClosed", "mouthOpen", "mouthMoved", "lookingAway"
    };
    string dstate[] = { "unknown", "no", "maybe", "yes" };
    
    void doJob() {
      NtKinect kinect;
      cv::Mat img8;
      cv::Mat img;
      while (1) {
        kinect.setDepth();
        kinect.depthImage.convertTo(img8,CV_8UC1,255.0/4500);
        cv::cvtColor(img8,img,cv::COLOR_GRAY2BGR);
        kinect.setSkeleton();
        for (auto person : kinect.skeleton) {
          for (auto joint : person) {
            if (joint.TrackingState == TrackingState_NotTracked) continue;
            DepthSpacePoint dp;
            kinect.coordinateMapper->MapCameraPointToDepthSpace(joint.Position,&dp);
            cv::rectangle(img, cv::Rect((int)dp.X-5, (int)dp.Y-5,10,10), cv::Scalar(0,0,255),2);
          }
        }
        kinect.setFace(false);
        for (cv::Rect r : kinect.faceRect) {
          cv::rectangle(img, r, cv::Scalar(255, 255, 0), 2);
        }
    
        for (vector<PointF> vf : kinect.facePoint) {
          for (PointF p : vf) {
            cv::rectangle(img, cv::Rect((int)p.X-3, (int)p.Y-3, 6, 6), cv::Scalar(0, 255, 255), 2);
          }
        }
        for (int p = 0; p < kinect.faceDirection.size(); p++) {
          cv::Vec3f dir = kinect.faceDirection[p];
          cv::putText(img, "pitch : " + to_string(dir[0]), cv::Point(200 * p + 50, 30), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0,255,255), 1, CV_AA);
    // for OpenCV 3 and later, replace CV_AA with cv::LINE_AA
          cv::putText(img, "yaw : " + to_string(dir[1]), cv::Point(200 * p + 50, 60), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0,255,255), 1, CV_AA);
          cv::putText(img, "roll : " + to_string(dir[2]), cv::Point(200 * p + 50, 90), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0,255,255), 1, CV_AA);
        }
        for (int p = 0; p < kinect.faceProperty.size(); p++) {
          for (int k = 0; k < FaceProperty_Count; k++) {
            int v = kinect.faceProperty[p][k];
            cv::putText(img, faceProp[k] +" : "+ dstate[v], cv::Point(200 * p + 50, 30 * k + 120), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0,255,255), 1, CV_AA);
    
          }
        }
        cv::imshow("depth", img);
        auto key = cv::waitKey(1);
        if (key == 'q') break;
      }
      cv::destroyAllWindows();
    }
    
    int main(int argc, char** argv) {
      try {
        doJob();
      } catch (exception &ex) {
        cout << ex.what() << endl;
        string s;
        cin >> s;
      }
      return 0;
    }
    
  5. When you run the program, the Depth image is displayed. Exit with the 'q' key.
  6. The recognized positions of the face parts are drawn as yellow rectangles on the Depth image.

    In this example, each person's face orientation and states are displayed at the top of the screen.




    At the upper left corner of the window, face orientation (pitch, yaw, roll) and recognized face properties are displayed.

  7. Please click here for this sample project KinectV2_face3.zip.
  8. Since the above zip file may not include the latest "NtKinect.h", download the latest version from here and replace the old one with it.



http://nw.tsuda.ac.jp/