Kinect V2 C++ Programming with OpenCV on Windows10

NtKinect: Kinect V2 C++ Programming with OpenCV on Windows10

How to extract human images using Depth and BodyIndex image with Kinect V2

2016.07.21: created by

Prerequisite knowledge

NtKinect: How to get RGB camera image with Kinect V2 (Fundamental Settings)
NtKinect: How to get BodyIndex image with Kinect V2

Extract human image from RGB image using BodyIndex image

Using the bodyIndex image, you can determine which pixels show a person (the pixels whose value is not 255). Since the bodyIndex image uses DepthSpace coordinates, they must be converted to ColorSpace coordinates to correspond to the RGB image. We also acquire the Depth image, because the conversion needs the depth information.

BodyIndex image

BodyIndex images can be acquired with a resolution of 512 x 424 as well as Depth images. Up to six people can be distinguished at the same time.

In NtKinect, each pixel of the obtained BodyIndex image is represented by uchar or cv::Vec3b.

NtKinect

NtKinect's functions for BodyIndex image

type of return value	function name	descriptions
void	setBodyIndex(bool raw = true)	Set BodyIndex image to the member variable "bodyIndexImage". If the argument raw is "true" or there is no argument, the type of each pixel is `uchar`. If the argument raw is "false", the type of each pixel is `cv::Vec3b`.

NtKinect's member variable for BodyIndex image

type	variable name	descriptions
cv::Mat	bodyIndexImage	BodyIndex image. Up to six people can be detected at the same time, and BodyIndex itself is a number from 0 to 5 assigned to each detected person. The type of each pixel of the BodyIndex image may be uchar or cv::Vec3b. The coordinates of the image are positions in the DepthSpace coordinate system.

bodyIndexImage.cols --- Resolution in the horizontal direction of the image (= 512)
bodyIndexImage.rows --- Resolution in the vertical direction of the image (= 424)
bodyIndexImage.channels() --- Returns 1 if the pixel is of type uchar, and 3 if cv::Vec3b.

In case of uchar data type

The value of each pixel is 0 to 5 in the pixel where a person is detected, and 255 in the other pixels.

bodyIndexImage.at<uchar>(y , x ) --- Access pixel at the (x , y ) coordinates of the image

    uchar pixel = bodyIndexImage.at<uchar>(y , x );
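
As a minimal sketch of how the uchar representation can be scanned, the following standalone function counts the pixels that belong to each detected person. It works on a plain row-major byte buffer instead of cv::Mat so that it compiles without OpenCV or the Kinect SDK. The name `countPersonPixels` and the buffer layout are assumptions made for illustration and are not part of NtKinect.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of NtKinect): counts how many pixels carry
// each body index (0..5). Pixels with any other value, including 255
// ("no person"), are ignored.
// `pixels` is assumed to be a row-major buffer of cols * rows bytes.
std::array<std::size_t, 6> countPersonPixels(const std::vector<std::uint8_t>& pixels,
                                             int cols, int rows) {
  assert(cols >= 0 && rows >= 0);
  assert(pixels.size() == static_cast<std::size_t>(cols) * static_cast<std::size_t>(rows));
  std::array<std::size_t, 6> counts{};  // zero-initialized
  for (int y = 0; y < rows; y++) {
    for (int x = 0; x < cols; x++) {
      // For a continuous single-channel image this is the element that
      // at<uchar>(y, x) would address.
      std::uint8_t bi = pixels[static_cast<std::size_t>(y) * cols + x];
      if (bi < counts.size()) counts[bi]++;
    }
  }
  return counts;
}
```

With a 3 x 2 buffer holding the values 255, 0, 0, 255, 5, 255, the function reports two pixels for body index 0 and one pixel for body index 5.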

In case of cv::Vec3b data type

The value of each pixel is the color assigned to the bodyIndex number, as listed in the table below.

bodyIndexImage.at<cv::Vec3b>(y , x ) --- Access pixel in the (x , y ) coordinates of the image

    cv::Vec3b pixel = bodyIndexImage.at<cv::Vec3b>(y , x );

bodyIndex	cv::Vec3b
0	255	0	0
1	0	255	0
2	0	0	255
3	255	255	0
4	255	0	255
5	0	255	255
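
The table above can be read as a simple lookup from bodyIndex to three channel values. The sketch below reproduces that lookup with std::array so that it does not depend on OpenCV. The name `bodyIndexToColor` is hypothetical, and returning black for the value 255 is an assumption of this sketch, not something stated on this page.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Color3 = std::array<std::uint8_t, 3>;  // stand-in for cv::Vec3b

// Hypothetical helper: returns the three channel values listed in the table
// for body indices 0..5.
Color3 bodyIndexToColor(std::uint8_t bodyIndex) {
  // The uchar image only contains 0..5 (person) or 255 (no person).
  assert(bodyIndex < 6 || bodyIndex == 255);
  static const std::array<Color3, 6> palette = {{
      {{255, 0, 0}},   {{0, 255, 0}},   {{0, 0, 255}},
      {{255, 255, 0}}, {{255, 0, 255}}, {{0, 255, 255}}}};
  if (bodyIndex < palette.size()) return palette[bodyIndex];
  return Color3{{0, 0, 0}};  // assumption: pixels without a person are black
}
```

For example, `bodyIndexToColor(3)` yields the triple (255, 255, 0), matching the fourth row of the table.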

Getting Depth image

Depth (distance) images can be acquired with a resolution of 512 x 424. The measurable distance range is from 500 mm to 8000 mm, but the range to recognize human beings is from 500 mm to 4500 mm.

In NtKinect, the obtained Depth image is represented by UINT16 (16 bit unsigned integer) for each pixel.

NtKinect's function for Depth image

type of return value	function name	descriptions
void	setDepth(bool raw = true)	Set the Depth image to the member variable "depthImage". When this function is called with no argument or "true" as first argument, the distance is set in mm for each pixel. When this function is called with "false", a value obtained by multiplying the distance by 65535/4500 is set for each pixel. That is, the image is mapped to the luminance of the black and white image of 0 (black) to 65535 (white) with the distance of 0 mm to 4500 mm.
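
The scaling applied when setDepth() is called with "false" can be illustrated with a few lines of integer arithmetic. This is only a sketch of the formula described above (distance multiplied by 65535/4500). The name `scaleDepthForDisplay` is hypothetical, and clamping distances beyond 4500 mm to 65535 is an assumption, since this page does not say how larger values are handled.

```cpp
#include <algorithm>
#include <cassert>  // used only by the checks in the usage note
#include <cstdint>

// Hypothetical helper: maps a distance of 0..4500 mm to a luminance of
// 0 (black)..65535 (white).
std::uint16_t scaleDepthForDisplay(std::uint16_t depthMm) {
  const std::uint32_t maxMm = 4500;
  const std::uint32_t maxValue = 65535;
  // 32-bit arithmetic: 65535 * 65535 still fits into an unsigned 32-bit value.
  std::uint32_t scaled = static_cast<std::uint32_t>(depthMm) * maxValue / maxMm;
  // Assumption of this sketch: distances above 4500 mm saturate at white.
  return static_cast<std::uint16_t>(std::min(scaled, maxValue));
}
```

A quick check such as `assert(scaleDepthForDisplay(4500) == 65535);` confirms that the far end of the range maps to white, while 0 mm stays black.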

NtKinect's member variable for Depth image

type	variable name	descriptions
cv::Mat	depthImage	Depth image. The resolution is 512 x 424 and each pixel is represented by UINT16. The coordinates of the image are positions in the DepthSpace coordinate system.

depthImage.cols --- Resolution in the horizontal direction of the image (= 512)
depthImage.rows --- Resolution in the vertical direction of the image (= 424)
depthImage.at<UINT16>(y , x ) --- Access pixel at the (x , y ) coordinates of the image

    UINT16 depth = depthImage.at<UINT16>(y , x );

3 types of coordinate system of Kinect V2

Since the position and resolution of each sensor is different, the data is obtained as a value expressed in the coordinate system of each sensor. When using data obtained from different sensors at the same time, it is necessary to convert the coordinates to match.

Kinect V2 has 3 coordinate systems, ColorSpace, DepthSpace, and CameraSpace. There are 3 data types ColorSpacePoint, DepthSpacePoint, and CameraSpacePoint representing coordinates in each coordinate system.

Quoted from Kinect.h of Kinect for Windows SDK 2.0

typedef struct _ColorSpacePoint {
    float X;
    float Y;
} ColorSpacePoint;

typedef struct _DepthSpacePoint {
    float X;
    float Y;
} DepthSpacePoint;

typedef struct _CameraSpacePoint {
    float X;
    float Y;
    float Z;
} CameraSpacePoint;

Coordinate systems and data types of Kinect V2

For the RGB image, Depth image, and skeleton information, the coordinate system is different. The coordinate system of the RGB image is ColorSpace, that of the Depth image is DepthSpace, and that of the skeleton information is CameraSpace.

Coordinate system	type of coordinates	Captured Data
ColorSpace	ColorSpacePoint	RGB image
DepthSpace	DepthSpacePoint	depth image, bodyIndex image, infrared image
CameraSpace	CameraSpacePoint	skeleton information

CameraSpace coordinate system representing skeleton position

The CameraSpace is a 3-dimensional coordinate system with the following features.

Kinect V2 is located at the origin of the coordinate system.
The direction of the camera lens is the positive direction of the z-axis.
Vertical upward direction is positive direction of y-axis.
Right-handed.

That is, in all 3 types of coordinate systems, CameraSpace, ColorSpace, and DepthSpace, "the horizontal direction from left to right seen from the user facing Kinect V2" is the positive direction of the x-axis. In other words, data is acquired and displayed as if the user facing Kinect V2 were looking at an image reflected in a mirror.
(2016/11/12 figure changed, and description added).

Kinect V2's function for mapping coordinate systems

The coordinate system conversion functions provided by ICoordinateMapper of Kinect V2 are as follows.

type of return value	function name	descriptions
HRESULT	MapCameraPointToColorSpace( CameraSpacePoint sp , ColorSpacePoint *cp )	Convert the coordinates sp in the CameraSpace to the coordinates cp in the ColorSpace. Return value is S_OK or error code.
HRESULT	MapCameraPointToDepthSpace( CameraSpacePoint sp , DepthSpacePoint *dp )	Convert the coordinates sp in the CameraSpace to the coordinates dp in DepthSpace. Return value is S_OK or error code.
HRESULT	MapDepthPointToColorSpace( DepthSpacePoint dp , UINT16 depth , ColorSpacePoint *cp )	Convert the coordinates dp in DepthSpace and distance depth to the coordinates cp in ColorSpace. Return value is S_OK or error code.
HRESULT	MapDepthPointToCameraSpace( DepthSpacePoint dp , UINT16 depth , CameraSpacePoint *sp )	Convert the coordinates dp in DepthSpace and distance depth to the coordinates sp in CameraSpace. Return value is S_OK or error code.
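
The mapped coordinates are floating point values, and they are not guaranteed to fall inside the target image. As far as I know, pixels without a usable depth value can even be reported as non-finite coordinates (the SDK samples test for negative infinity), so it seems prudent to validate a point before casting it to int. The sketch below shows one way to do that. `ColorPoint` is a stand-in with the same layout as ColorSpacePoint so that the code compiles without Kinect.h, and `toPixel` is a hypothetical helper name.

```cpp
#include <cassert>  // used only by the checks in the usage note
#include <cmath>

// Stand-in with the same members as ColorSpacePoint in Kinect.h.
struct ColorPoint {
  float X;
  float Y;
};

// Hypothetical helper: converts a mapped point to integer pixel coordinates
// and reports whether it lies inside a cols x rows image.
bool toPixel(const ColorPoint& cp, int cols, int rows, int& px, int& py) {
  // Reject NaN and infinities first; converting them to int is undefined.
  if (!std::isfinite(cp.X) || !std::isfinite(cp.Y)) return false;
  // Compare in floating point so that huge values never reach the cast.
  if (cp.X < 0.0f || cp.Y < 0.0f) return false;
  if (cp.X >= static_cast<float>(cols) || cp.Y >= static_cast<float>(rows)) return false;
  px = static_cast<int>(cp.X);  // truncates toward zero
  py = static_cast<int>(cp.Y);
  return true;
}
```

For a 1920 x 1080 image, the point (10.7, 20.2) is accepted and becomes pixel (10, 20), while a point whose X is negative infinity or 1920.0 is rejected, which `assert(!toPixel(...))` style checks can confirm.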

NtKinect's member variable for mapping coordinate system

An instance of ICoordinateMapper class used for mapping coordinate systems in Kinect V2 is held in NtKinect's member variable "coordinateMapper".

type	variable name	descriptions
CComPtr<ICoordinateMapper>	coordinateMapper	An instance of ICoordinateMapper used for mapping coordinate systems.

How to write program

Start from the Visual Studio project KinectV2.zip of "NtKinect: How to get RGB camera image with Kinect V2 (Fundamental Settings)".

In the following explanation, it is assumed that the folder is renamed to "KinectV2_bodyIndex2" after expanding the above zip file.

Place the image file in the folder where the source files of the project are located.

There is a folder called "KinectV2" in the project, and it contains "main.cpp" and "NtKinect.h".

    KinectV2_bodyIndex2\KinectV2\main.cpp
    KinectV2_bodyIndex2\KinectV2\NtKinect.h

In this example, the image cat.jpg, downloaded from a free material site, is used.

    KinectV2_bodyIndex2\KinectV2\cat.jpg

Change the contents of main.cpp.

main.cpp

#include <iostream>
#include <sstream>

#include "NtKinect.h"

using namespace std;

void copyRect(cv::Mat& src, cv::Mat& dst, int sx, int sy, int w, int h, int dx, int dy) {
  if (sx+w < 0 || sx >= src.cols || sy+h < 0 || sy >= src.rows) return;
  if (sx < 0) { w += sx; dx -= sx; sx=0; }
  if (sx+w > src.cols) w = src.cols - sx; 
  if (sy < 0) { h += sy; dy -= sy; sy=0; }
  if (sy+h > src.rows) h = src.rows - sy;

  if (dx+w < 0 || dx >= dst.cols || dy+h < 0 || dy >= dst.rows) return;
  if (dx < 0) { w += dx; sx -= dx; dx = 0; }
  if (dx+w > dst.cols) w = dst.cols - dx;
  if (dy < 0) { h += dy; sy -= dy; dy = 0; }
  if (dy+h > dst.rows) h = dst.rows - dy;

  cv::Mat roiSrc(src,cv::Rect(sx,sy,w,h));
  cv::Mat roiDst(dst,cv::Rect(dx,dy,w,h));
  roiSrc.copyTo(roiDst);
}

void doJob() {
  NtKinect kinect;
  cv::Mat cat = cv::imread("cat.jpg");
  cv::Mat bgImg;
  cv::Mat fgImg;
  while (1) {
    kinect.setRGB();
    cv::cvtColor(kinect.rgbImage,fgImg,CV_BGRA2BGR); // cv::COLOR_BGRA2BGR  (in case of opencv3 and later)
    bgImg = cat.clone();
    kinect.setDepth();
    kinect.setBodyIndex();
    for (int y=0; y<kinect.bodyIndexImage.rows; y++) {
      for (int x=0; x<kinect.bodyIndexImage.cols; x++) {
        UINT16 d = kinect.depthImage.at<UINT16>(y,x);
        uchar bi = kinect.bodyIndexImage.at<uchar>(y,x);
        if (bi == 255) continue;
        ColorSpacePoint cp;
        DepthSpacePoint dp; dp.X = x; dp.Y = y;
        kinect.coordinateMapper->MapDepthPointToColorSpace(dp, d, &cp);
        int cx = (int) cp.X, cy = (int) cp.Y;
        copyRect(fgImg,bgImg,cx-2,cy-2,4,4,cx-2,cy-2);
      }
    }
    cv::imshow("cat", bgImg);
    auto key = cv::waitKey(1);
    if (key == 'q') break;
  }
  cv::destroyAllWindows();
}

int main(int argc, char** argv) {
  try {
    doJob();
  } catch (exception &ex) {
    cout << ex.what() << endl;
    string s;
    cin >> s;
  }
  return 0;
}

First, we load the background image into the variable "cat".

Repeat the following process until 'q' is entered from the keyboard.

Get an RGB image from the camera. Since the RGB image is in "BGRA" format, convert it to "BGR" format to match the JPEG image read into the variable "cat". Then, hold it in the variable "fgImg".

We hold the synthesized image in the variable "bgImg". First, copy the contents of the variable "cat" into it as the background image.

Get the Depth image and the bodyIndex image. When we find a pixel in the bodyIndex image in which a person appears, we calculate the corresponding position in ColorSpace and paste the 4 x 4 area around that position from the RGB image "fgImg" onto the composite image "bgImg". Since the coordinates of the bodyIndex image are values in DepthSpace, they are converted to ColorSpace values using the depth information (the value of the Depth image). Since the resolution differs between the bodyIndex image (512 x 424) and the RGB image (1920 x 1080), a 4 x 4 area of the RGB image is pasted for each pixel of the bodyIndex image.
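
The compositing step described above can be sketched without OpenCV or the Kinect SDK. In the following code, `Gray` is a minimal single-channel image, `Mapper` is a hypothetical callback that plays the role of MapDepthPointToColorSpace, and `composite` is a hypothetical function name. Unlike main.cpp, the sketch assumes that the foreground and background have the same size and uses one channel instead of three, so it is only an illustration of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Minimal single-channel image used only for this sketch (not cv::Mat).
struct Gray {
  int cols;
  int rows;
  std::vector<std::uint8_t> data;  // row-major, cols * rows bytes

  Gray(int c, int r, std::uint8_t fill)
      : cols(c), rows(r),
        data(static_cast<std::size_t>(c) * static_cast<std::size_t>(r), fill) {}

  std::uint8_t& at(int y, int x) { return data[static_cast<std::size_t>(y) * cols + x]; }
  std::uint8_t at(int y, int x) const { return data[static_cast<std::size_t>(y) * cols + x]; }
};

// Hypothetical stand-in for the coordinate mapper: depth pixel (x, y) to
// color pixel (cx, cy). Returns false when the pixel cannot be mapped.
using Mapper = std::function<bool(int x, int y, int& cx, int& cy)>;

// For every person pixel in `bodyIndex`, copies a patch x patch block of `fg`
// around the mapped position into `bg`.
void composite(const Gray& bodyIndex, const Gray& fg, Gray& bg,
               const Mapper& map, int patch) {
  assert(fg.cols == bg.cols && fg.rows == bg.rows);  // assumption of this sketch
  for (int y = 0; y < bodyIndex.rows; y++) {
    for (int x = 0; x < bodyIndex.cols; x++) {
      if (bodyIndex.at(y, x) == 255) continue;  // no person at this pixel
      int cx = 0, cy = 0;
      if (!map(x, y, cx, cy)) continue;         // skip unmappable pixels
      const int top = cy - patch / 2;
      const int left = cx - patch / 2;
      for (int v = top; v < top + patch; v++) {
        for (int u = left; u < left + patch; u++) {
          if (u < 0 || u >= bg.cols || v < 0 || v >= bg.rows) continue;
          bg.at(v, u) = fg.at(v, u);  // paste the foreground pixel
        }
      }
    }
  }
}
```

With a 2 x 2 bodyIndex image containing a single person pixel at (1, 1), a mapper that multiplies both coordinates by 4, and a patch size of 4, the foreground is pasted into rows and columns 2 to 5 of an 8 x 8 background and every other pixel is left untouched.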

The copyRect(cv::Mat& src , cv::Mat& dst , int sx , int sy , int w , int h , int dx , int dy ) function defined in the program pastes a part of the src image onto the dst image. In the region on src, the upper left is (sx , sy ) and the width and height are w and h respectively. In the region on dst, the upper left is (dx , dy ). The size of the rectangular area is adjusted so that it does not extend beyond the range of either image.
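
The size adjustment in copyRect is the same computation applied once to the horizontal axis and once to the vertical axis. The sketch below isolates that computation for a single axis so that it can be checked without OpenCV. `Span` and `clipSpan` are hypothetical names, and the sketch treats an empty result as "nothing to copy" (using `<= 0` where copyRect uses `< 0`), which is a small deliberate difference.

```cpp
#include <cassert>

// One axis of a copy request: source start, destination start, and length.
struct Span {
  int s;
  int d;
  int len;
};

// Hypothetical helper: shrinks `sp` so that [s, s+len) fits into a source of
// size srcSize and [d, d+len) fits into a destination of size dstSize.
// Returns false when nothing remains to copy.
bool clipSpan(Span& sp, int srcSize, int dstSize) {
  assert(srcSize >= 0 && dstSize >= 0);
  // Clip against the source extent first.
  if (sp.s + sp.len <= 0 || sp.s >= srcSize) return false;
  if (sp.s < 0) { sp.len += sp.s; sp.d -= sp.s; sp.s = 0; }
  if (sp.s + sp.len > srcSize) sp.len = srcSize - sp.s;
  // Then clip against the destination extent.
  if (sp.d + sp.len <= 0 || sp.d >= dstSize) return false;
  if (sp.d < 0) { sp.len += sp.d; sp.s -= sp.d; sp.d = 0; }
  if (sp.d + sp.len > dstSize) sp.len = dstSize - sp.d;
  return sp.len > 0;
}
```

For example, a request starting at -2 with length 4 on both sides is reduced to start 0 with length 2, which corresponds to a patch that sticks out over the left or top edge of the image.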

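The programming code

    copyRect(fgImg,bgImg,cx-2,cy-2,4,4,cx-2,cy-2);

has the same effect as the following loop.

    for (int y=cy-2; y < cy+2; y++) {
      for (int x=cx-2; x < cx+2; x++) {
        // (x, y) must be contained in both images
        if (x >= 0 && y >= 0 && x < fgImg.cols && y < fgImg.rows && x < bgImg.cols && y < bgImg.rows) {
          bgImg.at<cv::Vec3b>(y,x) = fgImg.at<cv::Vec3b>(y,x); // copy BGR pixel at (row:y, col:x)
        }
      }
    }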

In OpenCV, an element of a cv::Mat is accessed with the "at<TYPE>(int row, int col)" member function. For a group of 4 bytes (i.e. a BGRA format pixel), the TYPE is cv::Vec4b, and for a group of 3 bytes (i.e. a BGR format pixel), the TYPE is cv::Vec3b.

When you run the program, RGB images are displayed. Exit with 'q' key.

Human images are extracted and displayed on the background image.

Please click here for this sample project KinectV2_bodyIndex2.zip.

Since the above zip file may not include the latest "NtKinect.h", download the latest version from here and replace the old one with it.

http://nw.tsuda.ac.jp/