NtKinect: Kinect V2 C++ Programming with OpenCV on Windows10

[Application] How to measure breath cycle with Kinect V2

2017.09.05: edit started by

Prerequisite knowledge

Getting RGB camera image with Kinect V2

Kinect V2 can acquire RGB camera images at a resolution of 1920 x 1080. Since OpenCV uses the BGR or BGRA format as its basis, NtKinect adopts the BGRA format.

NtKinect's member functions for RGB image

type of return value function name description
void setRGB() Get the RGB image and set it to the public member variable rgbImage.

NtKinect's member variable for RGB image

type variable name description
cv::Mat rgbImage Image of RGB camera. The resolution is 1920 x 1080 and the format is BGRA.
The coordinates of the image are the positions in the ColorSpace coordinate system.
  • rgbImage.cols --- Resolution in the horizontal direction of the image (=1920)
  • rgbImage.rows --- Resolution in the vertical direction of the image (=1080)
  • rgbImage.at<cv::Vec4b>(y , x ) --- Access the pixel in the (x , y ) coordinates of the image
  •         cv::Vec4b pixel = rgbImage.at<cv::Vec4b>(y,x);
                pixel[0] // Blue
                pixel[1] // Green
                pixel[2] // Red
                pixel[3] // Alpha

Getting Depth image

Depth (distance) images can be acquired with a resolution of 512 x 424. The measurable distance range is from 500 mm to 8000 mm, but the range to recognize human beings is from 500 mm to 4500 mm.

In Kinect20.lib, IDepthFrameSource has the "get_DepthMinReliableDistance()" and "get_DepthMaxReliableDistance()" functions, which return 500 and 4500 respectively.

In NtKinect, the obtained Depth image is represented by UINT16 (16 bit unsigned integer) for each pixel.

NtKinect's function for Depth image

type of return value function name descriptions
void setDepth(bool raw = true) Set the Depth image to the member variable "depthImage".
When this function is called with no argument or "true" as first argument, the distance is set in mm for each pixel.
When this function is called with "false", a value obtained by multiplying the distance by 65535/4500 is set for each pixel. That is, the image is mapped to the luminance of the black and white image of 0 (black) to 65535 (white) with the distance of 0 mm to 4500 mm.
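
The scaling described for setDepth(false) can be sketched in plain C++ as follows. This is only an illustration under stated assumptions, not NtKinect's actual code: the function name depthToLuminance is hypothetical, and clamping distances beyond the maximum to white is a choice made here so that the result always fits in 16 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of NtKinect): map a distance in millimeters
// to a 16-bit luminance so that 0 mm -> 0 (black) and maxMm -> 65535 (white).
std::uint16_t depthToLuminance(std::uint16_t depthMm, std::uint16_t maxMm = 4500) {
  assert(maxMm > 0);                  // avoid division by zero
  std::uint32_t d = depthMm;          // widen so the multiplication cannot overflow
  if (d > maxMm) d = maxMm;           // assumption: clamp far readings to white
  return static_cast<std::uint16_t>(d * 65535u / maxMm);
}
```

With the default maximum of 4500 mm, a reading of 2250 mm lands halfway up the gray scale, and anything at or beyond 4500 mm is rendered white.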

NtKinect's member variable for Depth image

type variable name descriptions
cv::Mat depthImage Depth image. The resolution is of 512 x 424 and each pixel is represented by UINT16.
The coordinates of the image are the position in the DepthSpace coordinate system.
  • depthImage.cols --- Resolution in the horizontal direction of the image (= 512)
  • depthImage.rows --- Resolution in the vertical direction of the image (= 424)
  • depthImage.at<UINT16>(y , x ) --- Access pixel in the (x , y ) coordinates of the image
  •         UINT16 depth = depthImage.at<UINT16>(y , x );
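
The (y , x ) argument order of at<UINT16>(y , x ) reflects row-by-row storage: for a continuous image, the pixel at column x and row y is element y * cols + x of the underlying buffer. A minimal sketch with a plain std::vector standing in for depthImage follows; the names DepthFrame and depthAt are hypothetical and do not exist in NtKinect or OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 16-bit depth image stored row by row.
struct DepthFrame {
  int cols;                         // horizontal resolution (512 on Kinect V2)
  int rows;                         // vertical resolution (424 on Kinect V2)
  std::vector<std::uint16_t> data;  // rows * cols distances in millimeters
  DepthFrame(int c, int r)
    : cols(c), rows(r), data(static_cast<std::size_t>(c) * r, 0) {}
};

// Access the pixel at coordinates (x, y). The row index y comes first,
// mirroring depthImage.at<UINT16>(y, x).
std::uint16_t& depthAt(DepthFrame& f, int y, int x) {
  assert(0 <= x && x < f.cols && 0 <= y && y < f.rows);
  return f.data[static_cast<std::size_t>(y) * f.cols + x];
}
```

Swapping the two arguments silently reads a different pixel (or goes out of range), which is why the order is worth checking whenever depth values look wrong.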

3 types of coordinate system of Kinect V2

Since the position and resolution of each sensor is different, the data is obtained as a value expressed in the coordinate system of each sensor. When using data obtained from different sensors at the same time, it is necessary to convert the coordinates to match.

Kinect V2 has 3 coordinate systems, ColorSpace, DepthSpace, and CameraSpace. There are 3 data types ColorSpacePoint, DepthSpacePoint, and CameraSpacePoint representing coordinates in each coordinate system.

Quoted from Kinect.h of Kinect for Windows SDK 2.0
typedef struct _ColorSpacePoint {
    float X;
    float Y;
} ColorSpacePoint;

typedef struct _DepthSpacePoint {
    float X;
    float Y;
} DepthSpacePoint;

typedef struct _CameraSpacePoint {
    float X;
    float Y;
    float Z;
} CameraSpacePoint;

Coordinate systems and data types of Kinect V2

For the RGB image, Depth image, and skeleton information, the coordinate system is different. The coordinate system of the RGB image is ColorSpace, that of the Depth image is DepthSpace, and that of the skeleton information is CameraSpace.

Coordinate system    type of coordinates    Captured Data
ColorSpace           ColorSpacePoint        RGB image
DepthSpace           DepthSpacePoint        depth image, bodyIndex image, infrared image
CameraSpace          CameraSpacePoint       skeleton information

CameraSpace coordinate system representing skeleton position

The CameraSpace is a 3-dimensional coordinate system with the following features.

  • Kinect V2 is located at the origin of the coordinate system.
  • The direction of the camera lens is the positive direction of the z-axis.
  • The vertical upward direction is the positive direction of the y-axis.
  • Right-handed.
That is, in all 3 coordinate systems, CameraSpace, ColorSpace, and DepthSpace, "the horizontal direction from left to right as seen by a user facing Kinect V2" is the positive direction of the x-axis. You can think of it this way: data is acquired and displayed as if the user facing Kinect V2 were looking at a reflection in a mirror.
(2016/11/12 figure changed, and description added).

Kinect V2's functions for mapping coordinate systems

The coordinate system conversion functions provided by the ICoordinateMapper class of Kinect V2 are as follows.

type of return value function name descriptions
HRESULT MapCameraPointToColorSpace(
    CameraSpacePoint sp ,
    ColorSpacePoint *cp )
Convert the coordinates sp in the CameraSpace to the coordinates cp in the ColorSpace. Return value is S_OK or error code.
HRESULT MapCameraPointToDepthSpace(
  CameraSpacePoint sp ,
  DepthSpacePoint *dp )
Convert the coordinates sp in the CameraSpace to the coordinates dp in DepthSpace. Return value is S_OK or error code.
HRESULT MapDepthPointToColorSpace(
  DepthSpacePoint dp ,
  UINT16 depth ,
  ColorSpacePoint *cp )
Convert the coordinates dp in DepthSpace and distance depth to the coordinates cp in ColorSpace. Return value is S_OK or error code.
HRESULT MapDepthPointToCameraSpace(
  DepthSpacePoint dp ,
  UINT16 depth ,
  CameraSpacePoint *sp )
Convert the coordinates dp in DepthSpace and distance depth to the coordinates sp in CameraSpace. Return value is S_OK or error code.

NtKinect's member variable for mapping coordinate system

An instance of ICoordinateMapper class used for mapping coordinate systems in Kinect V2 is held in NtKinect's member variable "coordinateMapper".

type variable name descriptions
CComPtr<ICoordinateMapper> coordinateMapper An instance of ICoordinateMapper used for mapping coordinate systems.

How to write program

  1. Start with the Visual Studio project KinectV2.zip from "NtKinect: How to get RGB camera image with Kinect V2 (Fundamental Settings)"
  2. Change the contents of main.cpp.
  3. Call kinect.setDepth() function to set depth (distance) data to kinect.depthImage. Since no argument is specified, the value of pixel is raw, that is, the distance to the object in millimeters.

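    /*
     * Copyright (c) 2017 Yoshihisa Nitta
     * Released under the MIT license
     * http://opensource.org/licenses/mit-license.php
     */
    // NOTE: lines marked "(assumed)" are missing from the listing as it appears
    // here and were filled in to make the program complete. Colors, sizes, and
    // text positions in particular are guesses.
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>
    #include <algorithm>
    #define _USE_MATH_DEFINES
    #include <cmath>
    #include "NtKinect.h"
    using namespace std;

    // Plot n values of v beginning at index start (a negative start counts from the end).
    void draw(cv::Mat& img, const vector<double>& v, int start = 0, int n = 1024) {
      stringstream ss;
      const int size = (int) v.size();
      if (start < 0) start = size + start;
      if (start < 0) start = 0;
      if (start >= size) return;
      int end = start + n;
      if (end > size) end = size;
      int m = end - start; // real data number
      if (m <= 0) return;
      int padding = 30;
      double wstep = ((double) img.cols - 2 * padding) / n;
      auto Dmin = *min_element(v.begin()+start,v.begin()+end);
      auto Dmax = *max_element(v.begin()+start,v.begin()+end);
      if (Dmin == Dmax) Dmax = Dmin + 1;
      img.setTo(cv::Scalar(255,255,255)); // (assumed) clear the canvas
      for (int i=0; i<m; i++) {
        int x = (int) (padding + i * wstep);
        int y = (int) (padding + (img.rows - 2 * padding) * (v[start+i] - Dmin) / (Dmax-Dmin));
        y = img.rows - 1 - y;
        cv::circle(img, cv::Point(x,y), 2, cv::Scalar(255,0,0), -1); // (assumed)
      }
      ss.str(""); ss << (int)Dmin;
      cv::putText(img, ss.str(), cv::Point(0,img.rows-padding), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0,0,0), 1); // (assumed)
      ss.str(""); ss << (int)Dmax;
      cv::putText(img, ss.str(), cv::Point(0,padding), cv::FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0,0,0), 1); // (assumed)
    }

    // Direct DFT of n samples of data beginning at start; ret receives the magnitudes.
    void DFT(vector<double>& data,vector<double>&ret,int start, int n) {
      const int size = (int) data.size();
      if (start < 0) start = size + start;
      if (start < 0) start = 0;
      if (start >= size) start = size;
      if (start + n > size) n = size - start;
      if (n <= 0) return;
      if ((int) ret.size() < n) ret.resize(n); // (assumed) guard against a short output vector
      vector<double> re(n), im(n);
      for (int i=0; i<n; i++) {
        re[i] = 0.0;
        im[i] = 0.0;
        double d = 2 * M_PI * i / n;
        for (int j=0; j<n; j++) {
          re[i] += data[start+j] * cos(d * j);
          im[i] -= data[start+j] * sin(d * j);
        }
      }
      for (int i=0; i<n; i++) ret[i] = sqrt(re[i]*re[i] + im[i]*im[i]);
    }

    // Mark the measured depth pixel on the RGB image and print its coordinates.
    void drawTarget(NtKinect& kinect,cv::Mat& img,int dx,int dy,UINT16 depth) {
      int scale = 4;
      DepthSpacePoint dp; dp.X = (float)dx; dp.Y = (float)dy;
      ColorSpacePoint cp;
      // (assumed) cp is read below, so the depth pixel has to be mapped to ColorSpace here
      HRESULT hr = kinect.coordinateMapper->MapDepthPointToColorSpace(dp, depth, &cp);
      if (FAILED(hr) || !std::isfinite(cp.X) || !std::isfinite(cp.Y)) return; // (assumed)
      cv::rectangle(img, cv::Rect((int)cp.X-2*scale, (int)cp.Y-2*scale, 4*scale, 4*scale), cv::Scalar(0,0,255), scale); // (assumed)
      stringstream ss;
      ss << dx << " " << dy << " " << depth << " " << (int)cp.X << " " << (int)cp.Y;
      cv::putText(img, ss.str(), cv::Point(10*scale,20*scale), cv::FONT_HERSHEY_SIMPLEX, 0.5*scale, cv::Scalar(0,0,255), scale); // (assumed)
    }

    // List the local maxima of the spectrum ret as candidate periods; dt is the time span in ms.
    void drawMsg(cv::Mat& img, vector<double>& ret,long dt) {
      stringstream ss;
      double df = 1000.0 / dt; // frequency resolution (Hz)
      int y = 100;
      ss << "resolution = " << df ;
      cv::putText(img, ss.str(), cv::Point(50,y), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0,0,255), 2); // (assumed)
      y += 40;
      for (int i=1; i<(int)ret.size()/2 -1; i++) {
        if (ret[i] > ret[i-1] && ret[i] > ret[i+1]) {
          double freq = i * df;
          if (freq > 1) continue;
          ss.str(""); // (assumed) start a new line of text
          ss << i << ": period = " << (1.0/freq) << "   " << ret[i];
          cv::putText(img, ss.str(), cv::Point(50,y), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0,0,255), 2); // (assumed)
          y += 40;
        }
      }
    }

    void doJob() {
      const int n_ave = 8;     // running average
      const long min_period = 32 * 1000; // milliseconds
      int n_dft = 512;   // sampling number (changed)
      NtKinect kinect;
      cv::Mat rgb, dImg(480,1280,CV_8UC3);
      vector<double> depth, depth_ave;
      double sum = 0;
      vector<long> vtime;
      vector<double> result(n_dft);
      long t0 = GetTickCount();
      bool init_flag = false;
      for (int count=0; ; count++) {
        kinect.setDepth(); // (assumed) raw depth in millimeters, as described in step 3
        int dx = kinect.depthImage.cols / 2;
        int dy = kinect.depthImage.rows * 2 / 3;
        UINT16 dz = kinect.depthImage.at<UINT16>(dy,dx);
        long t = GetTickCount();
        depth.push_back((double)dz); // (assumed)
        vtime.push_back(t);          // (assumed)
        sum += (double)dz;
        if (depth.size() > n_ave) sum -= depth[depth.size()-1-n_ave];
        if (depth.size() >= n_ave) {
          depth_ave.push_back(sum / n_ave); // (assumed)
        }
        kinect.setRGB(); // (assumed)
        drawTarget(kinect, kinect.rgbImage, dx, dy, dz); // (assumed)
        cv::resize(kinect.rgbImage, rgb, cv::Size(kinect.rgbImage.cols/4, kinect.rgbImage.rows/4)); // (assumed)
        cv::imshow("rgb", rgb);
        draw(dImg, depth_ave, -n_dft, n_dft);
        if (init_flag) {
          stringstream ss;
          ss << "n_dft = " << n_dft;
          cv::putText(dImg, ss.str(), cv::Point(50,60), cv::FONT_HERSHEY_SIMPLEX, 1.0, cv::Scalar(0,0,255), 2); // (assumed)
        }
        cv::imshow("moving average",dImg);
        if (t - t0 >= min_period) {
          if (init_flag == false && !depth_ave.empty()) {
            //n_dft = (int) pow(2.0, (int)ceil(log2((double) depth_ave.size())));
            n_dft = (int) depth_ave.size();
            result.resize(n_dft); // (assumed) keep the output buffer as large as n_dft
            init_flag = true;
          }
        } else {
          if (n_dft < (int) depth_ave.size()) n_dft *= 2;
        }
        if (init_flag) {
          auto Dmin = *min_element(depth_ave.end()-n_dft,depth_ave.end());
          auto Dmax = *max_element(depth_ave.end()-n_dft,depth_ave.end());
          auto Dmid = (Dmin + Dmax) / 2.0;
          for (int i=0; i<n_dft; i++) {
            result[i] = depth_ave[depth_ave.size()-n_dft+i] - Dmid;
          }
          DFT(result, result, 0, n_dft); // (assumed) transform in place
          drawMsg(dImg,result,t - vtime[vtime.size()-n_dft]);
          cv::imshow("moving average",dImg); // (assumed) show the graph again with the DFT result
        }
        if (depth_ave.size() > 10 * n_dft) {
          depth.erase(depth.begin(), depth.end()-n_dft*2);
          depth_ave.erase(depth_ave.begin(), depth_ave.end()-n_dft*2);
          vtime.erase(vtime.begin(), vtime.end()-n_dft*2);
        }
        auto key = cv::waitKey(1);
        if (key == 'q') break;
      }
    }

    int main(int argc, char** argv) {
      try {
        doJob(); // (assumed)
      } catch (exception &ex) {
        cout << ex.what() << endl;
        string s;
        cin >> s;
      }
      return 0;
    }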
  4. When you run the program, the RGB image and a moving average graph of the depth are displayed. After about 30 seconds, the result of the DFT is displayed. Exit with the 'q' key.
  5. [Caution] Run this program in Debug mode in Visual Studio 2017. In Release mode, for some reason, it may crash during execution. Programs built in Debug mode must link opencv_world330d.lib as the OpenCV library.

  6. This topic is intended to illustrate that Kinect V2's depth sensor can capture small body movements such as breathing. It is not a recommendation to calculate the breathing cycle with the Fourier transform.
  7. To tell the truth, the DFT is not an appropriate method for computing the breath period. The reason is as follows.

    In the execution example, $N_s = 165$ samples were obtained in $T = 30$ seconds, so the sampling frequency is $\displaystyle f_s = \frac{N_s}{T} = \frac{165}{30} = 5.5$ Hz, that is, sampling is performed 5.5 times per second. I ran it on a MacBook Pro, and it is pretty slow. The frequency resolution is $\displaystyle \Delta f = \frac{1}{T} = \frac{1}{30} = 0.0333\cdots$ Hz. When the discrete Fourier transform is performed on this data, it is decomposed into waves of frequencies $\Delta f$, $2 \Delta f$, $\cdots$, $\displaystyle \frac{N_s}{2} \Delta f$, that is, $\displaystyle \frac{1}{30}, \frac{2}{30}, \frac{3}{30}, \frac{4}{30}, \cdots$ Hz. The periods are the reciprocals of these frequencies, $\displaystyle \frac{30}{k}$ for $k = 1, 2, 3, \cdots$, so $\displaystyle 30, 15, 10, 7.5, 6, 5, \cdots$ seconds. If the breath has a cycle of about 2 seconds, the candidate periods around here are sparse ($\displaystyle \frac{30}{14} \approx 2.14$, $\displaystyle \frac{30}{15} = 2$, and $\displaystyle \frac{30}{16} = 1.875$ seconds), so sampling for a longer time is necessary to give meaningful values.

    Therefore, a method other than the Fourier transform is considered more appropriate for calculating the breath cycle period. I will not discuss here which method is appropriate.

  8. Please click here for this sample project KinectV2_breath.zip
  9. Since the above zip file may not include the latest "NtKinect.h", download the latest version from here and replace the old one with it.
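
The processing in the sample and the frequency-resolution argument in step 7 can be tried without a Kinect. The sketch below is a simplified illustration that uses only the standard library; it is not the sample's code. The names runningAverage, dftMagnitude, and binPeriod are hypothetical, and the input is assumed to be evenly sampled over T seconds.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Average of each window of `width` consecutive samples. The output is
// shorter than the input by width - 1 elements.
std::vector<double> runningAverage(const std::vector<double>& x, std::size_t width) {
  assert(width > 0);
  std::vector<double> out;
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); i++) {
    sum += x[i];
    if (i >= width) sum -= x[i - width];           // drop the sample leaving the window
    if (i + 1 >= width) out.push_back(sum / width);
  }
  return out;
}

// Magnitude |X_k| of the discrete Fourier transform, computed directly in O(n^2).
std::vector<double> dftMagnitude(const std::vector<double>& x) {
  const double pi = std::acos(-1.0);
  const std::size_t n = x.size();
  std::vector<double> mag(n);
  for (std::size_t k = 0; k < n; k++) {
    double re = 0.0, im = 0.0;
    for (std::size_t j = 0; j < n; j++) {
      const double phase = 2.0 * pi * k * j / n;
      re += x[j] * std::cos(phase);
      im -= x[j] * std::sin(phase);
    }
    mag[k] = std::sqrt(re * re + im * im);
  }
  return mag;
}

// Period in seconds represented by DFT bin k (k >= 1) when the samples span T seconds.
double binPeriod(std::size_t k, double T) {
  assert(k >= 1);
  return T / static_cast<double>(k);
}
```

For a 30-second recording, binPeriod(14, 30.0), binPeriod(15, 30.0), and binPeriod(16, 30.0) are the only candidate periods near 2 seconds, which is the sparseness that step 7 describes. Doubling T to 60 seconds halves the spacing between neighboring frequencies.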