NtKinect: Kinect V2 C++ Programming with OpenCV on Windows10

How to recognize both Face and HDFace with Kinect V2

2016.11.29: created by
Topics for NtKinect.h version1.4 or later.

Prerequisite knowledge

Recognizing both Face and HDFace with Kinect V2

If you define the USE_FACE constant before including NtKinect.h, the functions and variables of NtKinect for Face and HDFace Recognition become available.

In HDFace Recognition, the recognized positions of face parts may be slightly misaligned while the individual's face model has not yet been created. In this state, even if parts of the face are cut out of the RGB image based on the HDFace information, the correct partial images cannot be obtained.

On the other hand, "normal Face Recognition" (not HDFace) gives only the positions of the left eye, right eye, nose, left corner of the mouth, and right corner of the mouth, but these positions appear to be quite accurate. This suggests the idea of correcting the HDFace Recognition data with the normal Face Recognition data.

In this article, we explain how to correct the HDFace data with normal Face Recognition data.

We show a program that enlarges the eyes twice vertically and horizontally, that is, four times by area. When cutting out each eye image, the eye size from the HDFace data is used as it is, but the eye position is corrected with the data obtained by Face Recognition.
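The correction idea can be sketched without the Kinect libraries. The following minimal C++ sketch assumes that the HDFace eye vertices have already been mapped to ColorSpace pixel coordinates; the types Pt and Box and the functions boundingBox, recenter and expand are hypothetical names for illustration, not part of NtKinect or the SDK. The steps follow bigEye() in the program of this article: take the bounding box of the HDFace eye points, move it so that its center coincides with the Face Recognition eye position, and enlarge it by a 50% margin.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal types for this sketch; the real program uses
// cv::Rect and the SDK's PointF instead.
struct Pt { double x, y; };
struct Box { double x, y, w, h; };

// Axis-aligned bounding box of eye points that are assumed to be
// already mapped into ColorSpace (pixel) coordinates. pts must be non-empty.
Box boundingBox(const std::vector<Pt>& pts) {
  assert(!pts.empty());
  double minX = pts[0].x, maxX = pts[0].x, minY = pts[0].y, maxY = pts[0].y;
  for (const Pt& p : pts) {
    minX = std::min(minX, p.x); maxX = std::max(maxX, p.x);
    minY = std::min(minY, p.y); maxY = std::max(maxY, p.y);
  }
  return {minX, minY, maxX - minX, maxY - minY};
}

// Keep the HDFace box size (eye size), but move the box so that its center
// coincides with the eye position reported by normal Face Recognition.
Box recenter(const Box& hd, const Pt& faceEye) {
  double cx = hd.x + hd.w / 2, cy = hd.y + hd.h / 2;
  return {hd.x + (faceEye.x - cx), hd.y + (faceEye.y - cy), hd.w, hd.h};
}

// Grow the box by `margin` (0.5 means 50%) of its size, keeping the center fixed.
Box expand(const Box& b, double margin) {
  double mw = b.w * margin, mh = b.h * margin;
  return {b.x - mw / 2, b.y - mh / 2, b.w + mw, b.h + mh};
}
```

For example, an HDFace eye box of 20x10 pixels centered at (110, 50) combined with a Face Recognition eye point at (114, 52) is moved to start at (104, 47) while keeping its size.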

[Notice] To use "Face Recognition" data and "HDFace Recognition" data together, you must confirm that both belong to the same person by checking

faceTrackingId[i] == hdfaceTrackingId[j] 
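The check in the notice above can be sketched as a small stand-alone C++ function. This is an illustration under assumptions: TrackingId stands in for UINT64 so that the code compiles without the Windows headers, and matchFaces is a hypothetical helper, not an NtKinect function. The program in this article performs the same lookup one person at a time with getFaceIndex().

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for the Windows UINT64 type (assumption so the sketch compiles
// without <windows.h>).
using TrackingId = std::uint64_t;

// For every HDFace entry j, find the Face entry i with the same trackingId.
// Returns (i, j) index pairs; HDFace entries without a matching Face entry
// are skipped, just like "if (idx < 0) continue;" in the program.
std::vector<std::pair<int, int>> matchFaces(const std::vector<TrackingId>& faceTrackingId,
                                            const std::vector<TrackingId>& hdfaceTrackingId) {
  std::vector<std::pair<int, int>> pairs;
  for (int j = 0; j < (int)hdfaceTrackingId.size(); j++) {
    for (int i = 0; i < (int)faceTrackingId.size(); i++) {
      if (faceTrackingId[i] == hdfaceTrackingId[j]) {
        pairs.push_back({i, j});  // same person in both recognitions
        break;
      }
    }
  }
  return pairs;
}
```

Only the pairs returned here should be used together, for example facePoint[i] with hdfaceVertices[j].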

Face Recognition

In Kinect for Windows SDK 2.0, the types used for face recognition are defined as follows.

Quoted from Kinect.Face.h of Kinect for Windows SDK 2.0
enum _FacePointType {
    FacePointType_None= -1,
    FacePointType_EyeLeft= 0,
    FacePointType_EyeRight= 1,
    FacePointType_Nose= 2,
    FacePointType_MouthCornerLeft= 3,
    FacePointType_MouthCornerRight= 4,
    FacePointType_Count= ( FacePointType_MouthCornerRight + 1 )
};

enum _FaceProperty {
    FaceProperty_Happy= 0,
    FaceProperty_Engaged= 1,
    FaceProperty_WearingGlasses= 2,
    FaceProperty_LeftEyeClosed= 3,
    FaceProperty_RightEyeClosed= 4,
    FaceProperty_MouthOpen= 5,
    FaceProperty_MouthMoved= 6,
    FaceProperty_LookingAway= 7,
    FaceProperty_Count= ( FaceProperty_LookingAway + 1 )
};

Quoted from Kinect.h of Kinect for Windows SDK 2.0
enum _DetectionResult {
    DetectionResult_Unknown= 0,
    DetectionResult_No= 1,
    DetectionResult_Maybe= 2,
    DetectionResult_Yes= 3
};

Quoted from Kinect.h of Kinect for Windows SDK 2.0
typedef struct _PointF {
    float X;
    float Y;
} PointF;

After calling the setSkeleton() function to recognize skeletons, call the setFace() function to recognize faces.

NtKinect's Functions for Face Recognition

type of return value function name descriptions
void setFace() version1.2 or earlier.
After calling setSkeleton(), this function can be called to recognize human faces.
Values are set to the following member variables.
type variable name descriptions
vector<vector<PointF>> facePoint Face part positions
vector<cv::Rect> faceRect Bounding box of face
vector<cv::Vec3f> faceDirection Face direction
vector<vector<DetectionResult>> faceProperty Face states
void setFace(bool isColorSpace = true) version1.3 or later.
After calling setSkeleton(), this function can be called to recognize human faces.
Values are set to the following member variables.
type variable name descriptions
vector<vector<PointF>> facePoint Face part positions
vector<cv::Rect> faceRect Bounding box of face
vector<cv::Vec3f> faceDirection Face direction
vector<vector<DetectionResult>> faceProperty Face states
When this function is called with no argument or with "true" as its argument, positions in the ColorSpace coordinate system are set in these variables; when it is called with "false", positions in the DepthSpace coordinate system are set.

NtKinect's member variable for Face Recognition

type variable name descriptions
vector<vector<PointF>> facePoint Face part positions.
The positions of one person's "left eye, right eye, nose, left corner of mouth, right corner of mouth" are represented by a vector<PointF>.
To handle multiple people's data, the type is vector<vector<PointF>>.
(version1.2 or earlier) coordinates in ColorSpace.
(version1.3 or after) coordinates in ColorSpace or DepthSpace.
vector<cv::Rect> faceRect Vector of BoundingBox of face.
(version 1.2 or earlier) coordinates in ColorSpace.
(version1.3 or after) coordinates in ColorSpace or DepthSpace.
vector<cv::Vec3f> faceDirection Vector of face direction (pitch, yaw, roll).
vector<vector<DetectionResult>> faceProperty Face States.
The states of one person's "happy, engaged, wearing glasses, left eye closed, right eye closed, mouth open, mouth moved, looking away" are held in a vector<DetectionResult>. To handle multiple people, the data type is vector<vector<DetectionResult>>.
vector<UINT64> faceTrackingId version 1.4 or later.
Vector of trackingIds.
The trackingId corresponding to the face information faceRect[index] is faceTrackingId[index].
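As a sketch of how these member variables are indexed, the following C++ code redeclares the SDK enum values quoted above so that it compiles on its own (an assumption for illustration; in a real program they come from Kinect.h and Kinect.Face.h). eyeDistanceX and leftEyeClosed are hypothetical helpers that take vectors shaped like facePoint and faceProperty.

```cpp
#include <cassert>
#include <vector>

// Copies of the SDK values quoted above, redeclared only so that this
// sketch compiles without Kinect.h / Kinect.Face.h.
enum FacePointType { FacePointType_EyeLeft = 0, FacePointType_EyeRight = 1, FacePointType_Nose = 2,
                     FacePointType_MouthCornerLeft = 3, FacePointType_MouthCornerRight = 4 };
enum FaceProperty { FaceProperty_Happy = 0, FaceProperty_Engaged = 1, FaceProperty_WearingGlasses = 2,
                    FaceProperty_LeftEyeClosed = 3, FaceProperty_RightEyeClosed = 4,
                    FaceProperty_MouthOpen = 5, FaceProperty_MouthMoved = 6,
                    FaceProperty_LookingAway = 7, FaceProperty_Count = 8 };
enum DetectionResult { DetectionResult_Unknown = 0, DetectionResult_No = 1,
                       DetectionResult_Maybe = 2, DetectionResult_Yes = 3 };
struct PointF { float X; float Y; };

// Horizontal distance between the two eyes of person `person`:
// facePoint[person] holds the five face parts in FacePointType order.
float eyeDistanceX(const std::vector<std::vector<PointF>>& facePoint, int person) {
  const PointF& l = facePoint[person][FacePointType_EyeLeft];
  const PointF& r = facePoint[person][FacePointType_EyeRight];
  return r.X > l.X ? r.X - l.X : l.X - r.X;
}

// True only when "left eye closed" is reported as DetectionResult_Yes
// (Maybe and Unknown are treated as "not closed").
bool leftEyeClosed(const std::vector<std::vector<DetectionResult>>& faceProperty, int person) {
  return faceProperty[person][FaceProperty_LeftEyeClosed] == DetectionResult_Yes;
}
```

Treating only DetectionResult_Yes as true is one possible design choice; a program that prefers fewer misses could also accept DetectionResult_Maybe.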

HDFace Recognition

Detailed face information (HDFace) can be recognized.

After calling the setSkeleton() function to recognize skeletons, call the setHDFace() function to recognize HDFace.

NtKinect's functions for Detailed Face Recognition (HDFace)

type of return value function name descriptions
void setHDFace() version1.4 or later.
After calling setSkeleton() function, this function can be called to recognize the detailed face information (HDFace).
Values are set to the following member variables.
type variable name descriptions
vector<vector<CameraSpacePoint>> hdfaceVertices Positions of face parts
vector<UINT64> hdfaceTrackingId Skeleton trackingId corresponding to the face
vector<pair<int,int>> hdfaceStatus Pair of "FaceModelBuilderCollectionStatus" and "FaceModelBuilderCaptureStatus"
pair<string,string> hdfaceStatusToString(pair<int,int>) version1.4 or later.
hdfaceStatus[index] holds the collection status of the data required to create a face model. When it is passed to this function, the corresponding pair of status strings is returned.
bool setHDFaceModelFlag(bool flag=false) version1.8 or later.
This function sets the internal flag that generates the individual's face model once enough data for creating a face model has been collected.
The default value is false, and the individual's face model is not generated. If you call the setHDFace() function repeatedly after calling this function with the argument "true", the individual's face model will be generated at an appropriate time. The individual's face model is expected to increase the precision of detailed face (HDFace) recognition.
Since the program may become unstable, this function is experimental.

NtKinect's member variables for Detailed Face Recognition (HDFace)

type variable name descriptions
vector<vector<CameraSpacePoint>> hdfaceVertices version1.4 or later.
Position of face parts in CameraSpace coordinate system.
A vector<CameraSpacePoint> holds the positions of 1347 points on one person's face.
To handle multiple people, the type of this variable is vector<vector<CameraSpacePoint>>.
vector<UINT64> hdfaceTrackingId version1.4 or later.
Vector of trackingIds.
hdfaceTrackingId[index] corresponds to hdfaceVertices[index].
vector<pair<int,int>> hdfaceStatus version1.4 or later.
state of face recognition.
The state of HDFace recognition for one person is a pair of FaceModelBuilderCollectionStatus and FaceModelBuilderCaptureStatus, and is expressed as pair<int,int> . To handle multiple people, the type of this variable is vector<pair<int,int>> .


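The first element (FaceModelBuilderCollectionStatus) is a bitwise OR of the following states.
Constant name of FaceModelBuilderCollectionStatus value
FaceModelBuilderCollectionStatus_Complete 0
FaceModelBuilderCollectionStatus_MoreFramesNeeded 0x2
FaceModelBuilderCollectionStatus_LeftViewsNeeded 0x4
FaceModelBuilderCollectionStatus_RightViewsNeeded 0x8
FaceModelBuilderCollectionStatus_TiltedUpViewsNeeded 0x10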


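The second element (FaceModelBuilderCaptureStatus) is one of the following values.
Constant name of FaceModelBuilderCaptureStatus value
FaceModelBuilderCaptureStatus_GoodFrameCapture 0
FaceModelBuilderCaptureStatus_OtherViewsNeeded 1
FaceModelBuilderCaptureStatus_LostFaceTrack 2
FaceModelBuilderCaptureStatus_FaceTooFar 3
FaceModelBuilderCaptureStatus_FaceTooNear 4
FaceModelBuilderCaptureStatus_MovingTooFast 5
FaceModelBuilderCaptureStatus_SystemError 6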
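The two elements of hdfaceStatus can be turned into readable strings with a sketch like the following. NtKinect already provides hdfaceStatusToString() for this purpose; the functions below are hypothetical stand-alone helpers that only illustrate the decoding. They use the constant values exactly as listed in the tables above, and any bit of the collection status that those tables do not cover is reported as "Other".

```cpp
#include <cassert>
#include <string>
#include <utility>

// Decode the bitwise-OR collection status (first element of hdfaceStatus).
std::string collectionStatusToString(int status) {
  if (status == 0) return "Complete";
  // Flag values as listed in the FaceModelBuilderCollectionStatus table.
  const std::pair<int, const char*> flags[] = {
      {0x2, "MoreFramesNeeded"}, {0x4, "LeftViewsNeeded"},
      {0x8, "RightViewsNeeded"}, {0x10, "TiltedUpViewsNeeded"}};
  std::string s;
  int rest = status;  // bits not yet explained by a known flag
  for (const auto& f : flags) {
    if (status & f.first) {
      if (!s.empty()) s += "|";
      s += f.second;
      rest &= ~f.first;
    }
  }
  if (rest != 0) {
    if (!s.empty()) s += "|";
    s += "Other";
  }
  return s;
}

// Decode the single-valued capture status (second element of hdfaceStatus).
std::string captureStatusToString(int status) {
  const char* names[] = {"GoodFrameCapture", "OtherViewsNeeded", "LostFaceTrack",
                         "FaceTooFar", "FaceTooNear", "MovingTooFast", "SystemError"};
  return (status >= 0 && status <= 6) ? names[status] : "Unknown";
}

// Hypothetical counterpart of hdfaceStatusToString() for one hdfaceStatus entry.
std::pair<std::string, std::string> statusToStrings(const std::pair<int, int>& st) {
  return {collectionStatusToString(st.first), captureStatusToString(st.second)};
}
```

Because the collection status is a bit set, several missing views can be reported at once, for example "LeftViewsNeeded|TiltedUpViewsNeeded".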

How to write the program

  1. Start from the Visual Studio project KinectV2_hdface.zip of "NtKinect: How to recognize detailed face information (HDFace) with Kinect V2".
  2. This project is set as follows.

  3. Change the contents of "main.cpp" as follows.
  4. The part in blue letters relates to the topic of this article, so please read it carefully and make sure you understand it. The part in green letters relates to the display of the "work" window and can be deleted once you have confirmed that the program runs.

    #include <iostream>
    #include <sstream>
    #define USE_FACE
    #include "NtKinect.h"
    using namespace std;

    // Return the index into the Face Recognition data whose trackingId matches, or -1.
    int getFaceIndex(NtKinect& kinect, UINT64 trackingId) {
      for (int i=0; i< kinect.faceTrackingId.size(); i++) {
        if (kinect.faceTrackingId[i] == trackingId) return i;
      }
      return -1;
    }

    // Copy the w x h area of src at (sx,sy) to dst at (dx,dy), clipped to both images.
    void copyRect(cv::Mat& src, cv::Mat& dst, int sx, int sy, int w, int h, int dx, int dy) {
      if (sx+w < 0 || sx >= src.cols || sy+h < 0 || sy >= src.rows) return;
      if (sx < 0) { w += sx; dx -= sx; sx=0; }
      if (sx+w > src.cols) w = src.cols - sx; 
      if (sy < 0) { h += sy; dy -= sy; sy=0; }
      if (sy+h > src.rows) h = src.rows - sy;
      if (dx+w < 0 || dx >= dst.cols || dy+h < 0 || dy >= dst.rows) return;
      if (dx < 0) { w += dx; sx -= dx; dx = 0; }
      if (dx+w > dst.cols) w = dst.cols - dx;
      if (dy < 0) { h += dy; sy -= dy; dy = 0; }
      if (dy+h > dst.rows) h = dst.rows - dy;
      if (w <= 0 || h <= 0) return;     // nothing left to copy
      cv::Mat roiSrc(src,cv::Rect(sx,sy,w,h));
      cv::Mat roiDst(dst,cv::Rect(dx,dy,w,h));
      roiSrc.copyTo(roiDst);
    }

    // Enlarge one eye: the eye size comes from HDFace, the eye position from Face Recognition.
    void bigEye(NtKinect& kinect,cv::Mat& result,cv::Mat& work,vector<CameraSpacePoint>& hdEye,PointF& fEye) {
      cv::Rect rect = kinect.boundingBoxInColorSpace(hdEye);   // HDFace eye box
      double cx = rect.x + rect.width/2, cy = rect.y + rect.height/2;
      // move the box so that its center matches the Face Recognition eye position
      double dx = fEye.X - cx, dy = fEye.Y - cy;
      cv::Rect rect2((int)(rect.x+dx), (int)(rect.y+dy), rect.width, rect.height);
      // add a 50% margin around the eye; this area is cut out
      double margin = 0.5, mw = rect2.width * margin, mh = rect2.height * margin;
      cv::Rect rect3 ((int)(rect2.x-mw/2), (int)(rect2.y-mh/2), (int)(rect2.width+mw), (int)(rect2.height+mh));
      if (rect3.x < 0 || rect3.y < 0 || rect3.x+rect3.width >= kinect.rgbImage.cols || rect3.y+rect3.height >= kinect.rgbImage.rows) {
        cerr << "rect3: " << rect3 << endl;
        return;   // the area sticks out of the RGB image, so skip this eye
      }
      cv::Mat eyeImg(kinect.rgbImage, rect3);
      double scale = 2.0;
      cv::resize(eyeImg,eyeImg,cv::Size((int)(eyeImg.cols*scale), (int)(eyeImg.rows*scale)));
      // paste the enlarged image so that its center stays where the eye was
      copyRect(eyeImg, result, 0, 0, eyeImg.cols, eyeImg.rows, (int)(rect3.x-(scale-1)*rect3.width/2), (int)(rect3.y-(scale-1)*rect3.height/2));
      cv::rectangle(work, rect, cv::Scalar(0,255,0), 2);    // green: HDFace eye box
      cv::rectangle(work, rect2, cv::Scalar(0,0,255), 2);   // red: corrected eye box
      cv::rectangle(work, rect3, cv::Scalar(255,0,0), 2);   // blue: area that is cut out
      cv::rectangle(work, cv::Rect((int)(fEye.X-2), (int)(fEye.Y-2), 4, 4), cv::Scalar(0,255,255), -1); // yellow: Face eye
      cv::rectangle(work, cv::Rect((int)(cx-2), (int)(cy-2), 4, 4), cv::Scalar(255,0,255), -1);         // magenta: HDFace eye center
    }

    void doJob() {
      NtKinect kinect;
      while (1) {
        kinect.setRGB();
        kinect.setSkeleton();
        kinect.setFace();      // positions in ColorSpace (default argument true)
        kinect.setHDFace();
        cv::Mat result = kinect.rgbImage.clone();
        cv::Mat work = kinect.rgbImage.clone();
        for (int i=0; i<kinect.hdfaceTrackingId.size(); i++) {
          // use only people recognized by both Face and HDFace
          int idx = getFaceIndex(kinect,kinect.hdfaceTrackingId[i]);
          if (idx < 0) continue;
          auto& hdFace = kinect.hdfaceVertices[i];
          vector<CameraSpacePoint> hdLeft({
            hdFace[HighDetailFacePoints_LefteyeInnercorner],
            hdFace[HighDetailFacePoints_LefteyeOutercorner],
            hdFace[HighDetailFacePoints_LefteyeMidtop],
            hdFace[HighDetailFacePoints_LefteyeMidbottom] });
          vector<CameraSpacePoint> hdRight({
            hdFace[HighDetailFacePoints_RighteyeInnercorner],
            hdFace[HighDetailFacePoints_RighteyeOutercorner],
            hdFace[HighDetailFacePoints_RighteyeMidtop],
            hdFace[HighDetailFacePoints_RighteyeMidbottom] });
          bigEye(kinect,result,work,hdLeft,kinect.facePoint[idx][0]); // left eye
          bigEye(kinect,result,work,hdRight,kinect.facePoint[idx][1]); // right eye
        }
        // draw all HDFace vertices on the "work" image
        for (int i=0; i<kinect.hdfaceVertices.size(); i++) {
          for (CameraSpacePoint sp : kinect.hdfaceVertices[i]) {
            ColorSpacePoint cp;
            kinect.coordinateMapper->MapCameraPointToColorSpace(sp,&cp);
            cv::rectangle(work, cv::Rect((int)cp.X-1, (int)cp.Y-1, 2, 2), cv::Scalar(0,192, 0), 1);
          }
        }
        cv::resize(work, work, cv::Size(work.cols/2, work.rows/2));  // "work" is only for checking
        cv::imshow("work", work);
        cv::imshow("result", result);
        auto key = cv::waitKey(1);
        if (key == 'q') break;
      }
      cv::destroyAllWindows();
    }

    int main(int argc, char** argv) {
      try {
        doJob();
      } catch (exception &ex) {
        cout << ex.what() << endl;
        string s;
        cin >> s;
      }
      return 0;
    }
  5. When you run the program, RGB images are displayed. Press the 'q' key to exit.
  6. Two windows are displayed: the one named "work" shows the recognition state, and the one named "result" shows the image with enlarged eyes.

    The "work" window shows the following information.

    Since this window is only for checking, its image is reduced in size just before display.

    The RGB image inside the blue rectangle of the "work" window is cut out, magnified twice vertically and horizontally (four times by area), and pasted back onto the original image; the "result" window shows the outcome. A more natural composite can be obtained by lowering the alpha value of the peripheral area of the enlarged eye image and blending it with the original before pasting, but this processing is omitted from the above program to keep the explanation easy to follow.

    (Figure: "work" window (part) and "result" window (part))
  7. The sample project for this article is KinectV2_hdface3.zip.
  8. Since the above zip file may not include the latest "NtKinect.h", download the latest version and replace the old one with it.
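The placement arithmetic in bigEye() and copyRect() can be checked in isolation. In the minimal sketch below, IRect stands in for cv::Rect, and scaledAroundCenter and clipToImage are hypothetical helpers: the first computes where an image cut from a rectangle and magnified by scale must be pasted so that its center stays fixed (the same formula as the copyRect() call in bigEye()), and the second clips the paste rectangle to the destination image in a simplified form of copyRect()'s destination checks.

```cpp
#include <cassert>

// Hypothetical integer rectangle standing in for cv::Rect.
struct IRect { int x, y, w, h; };

// Where to paste an area cut from `src` and magnified by `scale` so that the
// enlarged image keeps the same center (same formula as used in bigEye()).
IRect scaledAroundCenter(const IRect& src, double scale) {
  return { (int)(src.x - (scale - 1) * src.w / 2),
           (int)(src.y - (scale - 1) * src.h / 2),
           (int)(src.w * scale), (int)(src.h * scale) };
}

// Clip a paste rectangle to a destination image of size cols x rows.
// Returns false when nothing is visible. offX/offY receive how many pixels
// were cut from the left/top, i.e. where copying must start inside the patch.
bool clipToImage(IRect& r, int cols, int rows, int& offX, int& offY) {
  offX = offY = 0;
  if (r.x + r.w <= 0 || r.x >= cols || r.y + r.h <= 0 || r.y >= rows) return false;
  if (r.x < 0) { offX = -r.x; r.w += r.x; r.x = 0; }
  if (r.y < 0) { offY = -r.y; r.h += r.y; r.y = 0; }
  if (r.x + r.w > cols) r.w = cols - r.x;
  if (r.y + r.h > rows) r.h = rows - r.y;
  return true;
}
```

With scale = 2.0, a 30x16 area at (100, 40) is pasted as a 60x32 image at (85, 32), so both rectangles share the center (115, 48).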