If you define the USE_FACE constant before including NtKinect.h, the functions and variables of NtKinect for Face and HDFace Recognition become available.
In HDFace Recognition, the position of a recognized face part may be slightly misaligned while the individual's face model has not yet been created. In this state, even if parts of the face are cut out based on the HDFace information, correct partial images of the RGB image cannot be obtained.
On the other hand, normal (non-HD) Face recognition only gives the positions of the left eye, right eye, nose, left corner of the mouth, and right corner of the mouth, but the data it returns seems quite accurate. This suggests the idea of correcting the HDFace Recognition data with the normal Face Recognition data.
In this article, we explain how to correct the HDFace data with normal Face Recognition data.
We show a program that generates an image in which the eyes are doubled vertically and horizontally, that is, quadrupled in area. When cutting out the eye image, the eye size from the HDFace data is used as it is, but the eye position is corrected with the data obtained by Face recognition.
[Notice] In order to use "Face Recognition" data and "HDFace Recognition" data together, it is necessary to confirm that both refer to the same person by checking
faceTrackingId[i] == hdfaceTrackingId[j].
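The matching step above can be sketched as follows. This is a minimal illustration (the helper function name is ours, not part of NtKinect), assuming only that faceTrackingId and hdfaceTrackingId are vectors of 64-bit IDs as described later in this article:

```cpp
#include <cstdint>
#include <vector>

// Return the index j into hdfaceTrackingId whose trackingId equals
// faceTrackingId[i], or -1 when no HDFace entry belongs to the same person.
// Matching by trackingId is what guarantees that the Face data and the
// HDFace data describe the same person.
int matchHDFaceIndex(const std::vector<uint64_t>& faceTrackingId,
                     const std::vector<uint64_t>& hdfaceTrackingId,
                     int i) {
  for (int j = 0; j < (int)hdfaceTrackingId.size(); j++) {
    if (faceTrackingId[i] == hdfaceTrackingId[j]) return j;
  }
  return -1;
}
```

In the real program the returned index j is then used to read hdfaceVertices[j] for the person whose Face data is at index i.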
In Kinect for Windows SDK 2.0, face recognition is defined as follows.
Quoted from Kinect.Face.h of Kinect for Windows SDK 2.0:

```cpp
enum _FacePointType {
    FacePointType_None             = -1,
    FacePointType_EyeLeft          = 0,
    FacePointType_EyeRight         = 1,
    FacePointType_Nose             = 2,
    FacePointType_MouthCornerLeft  = 3,
    FacePointType_MouthCornerRight = 4,
    FacePointType_Count            = ( FacePointType_MouthCornerRight + 1 )
};

enum _FaceProperty {
    FaceProperty_Happy          = 0,
    FaceProperty_Engaged        = 1,
    FaceProperty_WearingGlasses = 2,
    FaceProperty_LeftEyeClosed  = 3,
    FaceProperty_RightEyeClosed = 4,
    FaceProperty_MouthOpen      = 5,
    FaceProperty_MouthMoved     = 6,
    FaceProperty_LookingAway    = 7,
    FaceProperty_Count          = ( FaceProperty_LookingAway + 1 )
};
```
Quoted from Kinect.h of Kinect for Windows SDK 2.0:

```cpp
enum _DetectionResult {
    DetectionResult_Unknown = 0,
    DetectionResult_No      = 1,
    DetectionResult_Maybe   = 2,
    DetectionResult_Yes     = 3
};
```
Quoted from Kinect.h of Kinect for Windows SDK 2.0:

```cpp
typedef struct _PointF {
    float X;
    float Y;
} PointF;
```
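As a small illustration of how the quoted enums fit together, a face property is usually considered detected only when its DetectionResult is Yes. The sketch below is self-contained: the enum values are copied from the SDK headers quoted above, and the helper function is ours, not part of NtKinect or the SDK.

```cpp
#include <vector>

// Values copied from the SDK headers quoted above (renamed without the
// leading underscore so this fragment stands alone).
enum DetectionResult { DetectionResult_Unknown = 0, DetectionResult_No = 1,
                       DetectionResult_Maybe = 2, DetectionResult_Yes = 3 };
enum FaceProperty { FaceProperty_Happy = 0, FaceProperty_LeftEyeClosed = 3 };

// A property counts as "on" only for DetectionResult_Yes; Maybe and
// Unknown are treated as not detected.
bool isPropertyOn(const std::vector<int>& property, int which) {
  return property[which] == DetectionResult_Yes;
}
```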
After calling the setSkeleton() function to recognize the skeleton, call the setFace() function to recognize faces.
type of return value | function name | descriptions
---|---|---
void | setFace() | version 1.2 or earlier. After calling setSkeleton(), this function can be called to recognize human faces. Values are set to the member variables listed below.
void | setFace(bool isColorSpace = true) | version 1.3 or later. After calling setSkeleton(), this function can be called to recognize human faces. Values are set to the member variables listed below.
type | variable name | descriptions
---|---|---
vector<vector<PointF>> | facePoint | Face part positions. The positions of one person's "left eye, right eye, nose, left corner of mouth, right corner of mouth" are represented as vector<PointF>. To handle multiple people's data, the type is vector<vector<PointF>>. (version 1.2 or earlier) coordinates in ColorSpace. (version 1.3 or later) coordinates in ColorSpace or DepthSpace.
vector<cv::Rect> | faceRect | Vector of bounding boxes of faces. (version 1.2 or earlier) coordinates in ColorSpace. (version 1.3 or later) coordinates in ColorSpace or DepthSpace.
vector<cv::Vec3f> | faceDirection | Vector of face directions (pitch, yaw, roll).
vector<vector<DetectionResult>> | faceProperty | Face states. The states of one person's "happy, engaged, wearing glasses, left eye closed, right eye closed, mouth open, mouth moved, looking away" are represented as vector<DetectionResult>. To handle multiple people, the data type is vector<vector<DetectionResult>>.
vector<UINT64> | faceTrackingId | version 1.4 or later. Vector of trackingIds. The trackingId corresponding to face information faceRect[index] is faceTrackingId[index].
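As a small example of working with facePoint data (using the PointF layout quoted above), the distance between the two eye points gives a rough scale for the face, which is useful when deciding how large a region to cut out. The helper function here is our own illustration, not an NtKinect API:

```cpp
#include <cmath>

struct PointF { float X; float Y; };  // same layout as the SDK struct

// Distance between the FacePointType_EyeLeft (index 0) and
// FacePointType_EyeRight (index 1) entries of one person's facePoint
// vector; a rough measure of face size in the image.
float eyeDistance(const PointF& eyeLeft, const PointF& eyeRight) {
  float dx = eyeRight.X - eyeLeft.X;
  float dy = eyeRight.Y - eyeLeft.Y;
  return std::sqrt(dx * dx + dy * dy);
}
```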
Detailed face information (HDFace) can be recognized.
After calling the setSkeleton() function to recognize the skeleton, call the setHDFace() function to recognize HDFace.
type of return value | function name | descriptions
---|---|---
void | setHDFace() | version 1.4 or later. After calling the setSkeleton() function, this function can be called to recognize detailed face information (HDFace). Values are set to the member variables listed below.
pair<string,string> | hdfaceStatusToString(pair<int,int>) | version 1.4 or later. hdfaceStatus[index] is the collection status of the data required to create a face model. When it is passed to this function, the corresponding pair of status strings is returned.
bool | setHDFaceModelFlag(bool flag = false) | version 1.8 or later. This function sets the internal flag to generate an individual's face model once enough data for creating a face model has been collected. The default value is false, and the individual's face model will not be generated. If you call the setHDFace() function multiple times after calling this function with argument "true", individual face models will be generated at an appropriate timing. An individual's face model is expected to increase the precision of detailed face (HDFace) recognition. Since the program may become unstable, this function is treated as experimental.
type | variable name | descriptions
---|---|---
vector<vector<CameraSpacePoint>> | hdfaceVertices | version 1.4 or later. Positions of face parts in the CameraSpace coordinate system. A vector<CameraSpacePoint> holds the positions of 1347 points on one person's face. To handle multiple people, the type of this variable is vector<vector<CameraSpacePoint>>.
vector<UINT64> | hdfaceTrackingId | version 1.4 or later. Vector of trackingIds. hdfaceTrackingId[index] corresponds to hdfaceVertices[index].
vector<pair<int,int>> | hdfaceStatus | version 1.4 or later. State of face recognition. The state of HDFace recognition for one person is a pair of FaceModelBuilderCollectionStatus and FaceModelBuilderCaptureStatus, and is expressed as pair<int,int>. To handle multiple people, the type of this variable is vector<pair<int,int>>. The FaceModelBuilderCollectionStatus value is the bitwise OR of collection-state flags; the FaceModelBuilderCaptureStatus value is a single capture-state value.
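Because the collection status is a bitwise OR of flags, decoding it means testing each bit, which is roughly what hdfaceStatusToString() is expected to do for its first component. The flag values below are our assumption based on Kinect.Face.h of Kinect for Windows SDK 2.0; the decoding helper itself is our illustration, not an NtKinect function:

```cpp
#include <string>
#include <vector>

// Flag values assumed from Kinect.Face.h of Kinect for Windows SDK 2.0.
enum FaceModelBuilderCollectionStatus {
  FaceModelBuilderCollectionStatus_Complete              = 0,
  FaceModelBuilderCollectionStatus_MoreFramesNeeded      = 0x1,
  FaceModelBuilderCollectionStatus_FrontViewFramesNeeded = 0x2,
  FaceModelBuilderCollectionStatus_LeftViewsNeeded       = 0x4,
  FaceModelBuilderCollectionStatus_RightViewsNeeded      = 0x8,
  FaceModelBuilderCollectionStatus_TiltedUpViewsNeeded   = 0x10,
};

// Decode an OR'd collection status into human-readable state names.
std::vector<std::string> decodeCollectionStatus(int status) {
  std::vector<std::string> r;
  if (status == 0) { r.push_back("Complete"); return r; }
  if (status & 0x1)  r.push_back("MoreFramesNeeded");
  if (status & 0x2)  r.push_back("FrontViewFramesNeeded");
  if (status & 0x4)  r.push_back("LeftViewsNeeded");
  if (status & 0x8)  r.push_back("RightViewsNeeded");
  if (status & 0x10) r.push_back("TiltedUpViewsNeeded");
  return r;
}
```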
This project is set as follows.
The blue-letter part is related to the contents of this article; please read and understand it well. The green-letter part is related to the display of the "work" window; it can be deleted after execution is confirmed.
main.cpp
Two windows are displayed. The one named "work" shows the recognition state, and the other, named "result", shows the image with enlarged eyes.
The "work" window shows the following information.
The RGB image in the blue rectangular area of the "work" window is cut out, magnified twice in the vertical and horizontal directions (the area is enlarged by a factor of 4), and pasted onto the original. The resulting image is displayed in the "result" window. Before pasting the image of the enlarged eyes, if you reduce the alpha value of the peripheral area and blend the images accordingly, you can get a natural composite image. However, in order to keep the explanation easy to understand, such processing is omitted in the above program.
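The geometry of this cut-and-paste step can be sketched without the Kinect API. The size of the rectangle comes from the HDFace data, its center is re-placed at the eye position reported by normal Face recognition, and the doubled rectangle is clamped to the image bounds. The Rect struct below stands in for cv::Rect so the fragment is self-contained; the helper is our illustration, not taken from the sample program:

```cpp
#include <algorithm>

struct Rect { int x, y, width, height; };  // stands in for cv::Rect

// Build the doubled eye rectangle: (cx, cy) is the eye center from normal
// Face recognition, (w, h) is the eye size from the HDFace data, and
// (imgW, imgH) are the image dimensions used for clamping.
Rect enlargedEyeRect(int cx, int cy, int w, int h, int imgW, int imgH) {
  Rect r;
  r.width  = std::min(2 * w, imgW);    // double the size, never exceed image
  r.height = std::min(2 * h, imgH);
  // Center on the Face-recognition eye position, then clamp into the image.
  r.x = std::max(0, std::min(cx - r.width  / 2, imgW - r.width));
  r.y = std::max(0, std::min(cy - r.height / 2, imgH - r.height));
  return r;
}
```

In the real program the resulting rectangle would be used as a cv::Mat ROI, resized by a factor of 2 with cv::resize, and copied back onto the RGB image.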
work window (part) | result window (part)
---|---
Since the above zip file may not include the latest "NtKinect.h", download the latest version from here and replace the old one with it.