NtKinect: Kinect V2 C++ Programming with OpenCV on Windows10

How to detect talking skeleton with Kinect V2

2016.07.18: created by

Prerequisite knowledge

Detecting Talking Skeleton

Calling the setSkeleton() function gives you the position of each skeleton. If you then call setAudio(true) after setSkeleton(), the speaker is identified from the voice direction and the skeleton positions.

The skeletonTrackingId of the skeleton identified as the speaker is held in the "audioTrackingId" member variable. If the speaker is unknown, it will be negative. If audioTrackingId is equal to skeletonTrackingId[index ], then skeleton[index ] is the one speaking.
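The comparison described above can be sketched without a Kinect attached. The helper below is hypothetical (it is not part of NtKinect); its parameters simply mirror the "skeletonTrackingId" and "audioTrackingId" member variables, and returning -1 for "no speaker found" is a convention chosen for this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of NtKinect.
// Returns the index i for which trackingIds[i] == audioTrackingId,
// or -1 when no tracked skeleton has that id (speaker unknown).
int findSpeakerIndex(const std::vector<std::uint64_t>& trackingIds,
                     std::uint64_t audioTrackingId) {
  for (std::size_t i = 0; i < trackingIds.size(); ++i) {
    if (trackingIds[i] == audioTrackingId) return static_cast<int>(i);
  }
  return -1;
}
```

With a real sensor, the call would presumably look like findSpeakerIndex(kinect.skeletonTrackingId, kinect.audioTrackingId), made after setSkeleton() and setAudio(true).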

NtKinect's functions for skeleton

type of return value function name descriptions
void setSkeleton() Recognize skeleton and set the data to the following member variables.
type variable name descriptions
vector<vector<Joint>> skeleton skeleton information
vector<int> skeletonId bodyIndex of skeleton
vector<UINT64> skeletonTrackingId trackingId of skeleton

NtKinect's member variables for skeleton

type variable name descriptions
vector<vector<Joint>> skeleton Vector of skeleton information.
The set of one person's joints is a vector<Joint> , and this is the skeleton information. In order to handle multiple people, it is a vector of skeleton information, that is, vector<vector<Joint>> .
The coordinates of the joint are positions in the CameraSpace coordinate system.
  • number of recognized people --- skeleton.size()
  • skeleton information of index -th human --- skeleton[index ]
  • number of joint information of index -th human --- skeleton[index ].size()
  • jointType joint information of index -th human --- skeleton[index ][jointType ]
  • coordinates of the jointType joint of index -th human --- skeleton[index ][jointType ].Position
  • tracking state of jointType joint of index -th human --- skeleton[index ][jointType ].TrackingState
Since the relation
        skeleton[index ][jointType ].JointType == jointType
holds, you can directly access the information of a specified joint of a specified skeleton.
vector<int> skeletonId A vector of bodyIndex corresponding to the skeletons.
The bodyIndex of skeleton[index ] is held at skeletonId[index ].
vector<UINT64> skeletonTrackingId A vector of trackingId corresponding to the skeletons.
The trackingId of skeleton[index ] is held at skeletonTrackingId[index ].
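To make the nested vector<vector<Joint>> layout concrete, here is a minimal sketch that walks one person's joints and counts those that are not in the NotTracked state. The types JointSketch, CameraSpacePointSketch and TrackingStateSketch are stand-ins defined here so the example compiles without Kinect.h; with the real SDK you would use Joint, CameraSpacePoint and TrackingState instead. The function name is likewise hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Stand-ins for the Kinect SDK types (assumptions for this sketch only).
enum TrackingStateSketch { NotTracked = 0, Inferred = 1, Tracked = 2 };
struct CameraSpacePointSketch { float X, Y, Z; };
struct JointSketch {
  int JointType;                        // equals the joint's index in the inner vector
  CameraSpacePointSketch Position;      // CameraSpace coordinates
  TrackingStateSketch TrackingState;
};

// Counts the joints of the index-th person whose state is not NotTracked.
// The caller must make sure that index < skeleton.size().
std::size_t countUsableJoints(
    const std::vector<std::vector<JointSketch>>& skeleton, std::size_t index) {
  std::size_t n = 0;
  for (const JointSketch& joint : skeleton[index]) {
    if (joint.TrackingState != NotTracked) ++n;
  }
  return n;
}
```

The outer vector is indexed by person and the inner vector by jointType, which is the same access pattern as skeleton[index ][jointType ] above.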

NtKinect's function for Audio

type of return value function name descriptions
void setAudio(bool flag = false)

Acquires audio and, while recording is in progress, saves it to a file.
Determines the direction of the speech and sets it in the variable "beamAngle".

When this function is called with the argument set to "true" after a setSkeleton() call, the speaker's skeletonTrackingId[i] is set in the member variable "audioTrackingId".
void drawAudioDirection(cv::Mat& image ) Draw the direction of audio beam on image.
bool isOpenedAudio() Returns whether or not recording is in progress.
void openAudio(string path ) Open path as a recording file
void closeAudio() Close the recording file.
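How setAudio(true) matches the voice direction to a skeleton is not spelled out here. The following is only a rough sketch of one way such matching could work, not NtKinect's actual implementation: it computes each person's horizontal bearing from a CameraSpace position and picks the one closest to the beam angle. The HeadPosition struct, the function name, the tolerance parameter, the use of radians, and the sign convention (positive X and positive angle on the same side) are all assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical input: horizontal offset X and depth Z of a reference joint, in metres.
struct HeadPosition { float X; float Z; };

// Returns the index of the person whose bearing atan2(X, Z) is closest to
// beamAngle (radians), or -1 if nobody lies within toleranceRad of the beam.
int closestToBeam(const std::vector<HeadPosition>& heads,
                  float beamAngle, float toleranceRad) {
  int best = -1;
  float bestDiff = toleranceRad;
  for (std::size_t i = 0; i < heads.size(); ++i) {
    float bearing = std::atan2(heads[i].X, heads[i].Z);
    float diff = std::fabs(bearing - beamAngle);
    if (diff <= bestDiff) {
      best = static_cast<int>(i);
      bestDiff = diff;
    }
  }
  return best;
}
```

In practice you do not need to do this yourself: after setAudio(true), reading "audioTrackingId" is enough.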

NtKinect's member variable for Audio

type variable name descriptions
float beamAngle Direction of audio (angle to the left and right)
float beamAngleConfidence Confidence value of beamAngle (0.0 ... 1.0)
UINT64 audioTrackingId Speaker's skeletonTrackingId
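When using these variables it is often convenient to ignore low-confidence readings and to print the angle in degrees. The sketch below assumes that beamAngle is expressed in radians (the unit is not stated in the table above, so treat it as an assumption); both helper names and the threshold parameter are hypothetical and not part of NtKinect.

```cpp
#include <cmath>

// Converts an angle from radians to degrees.
float toDegrees(float radians) {
  return radians * 180.0f / 3.14159265358979f;
}

// True when the confidence value (0.0 ... 1.0) reaches the caller-chosen threshold.
bool isDirectionReliable(float beamAngleConfidence, float threshold) {
  return beamAngleConfidence >= threshold;
}
```

A typical use would be to call isDirectionReliable(kinect.beamAngleConfidence, 0.5f) before acting on kinect.beamAngle; the value 0.5 is only an illustrative choice.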

How to write program

  1. Start with the Visual Studio project KinectV2_audio2.zip from "NtKinect: How to get audio beam direction with Kinect V2".
  2. WaveFile.h should already have been added to the project.

  3. Change the contents of main.cpp as follows.
  4. Define USE_AUDIO constant before including NtKinect.h.

    When setAudio(true) is called after setSkeleton(), audio data is acquired by the microphone array and its direction is calculated. The "skeletonTrackingId" of the skeleton whose direction matches is then set in the "audioTrackingId" variable.

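    #include <iostream>
    #include <sstream>
    #define USE_AUDIO   // must be defined before including NtKinect.h
    #include "NtKinect.h"
    using namespace std;

    void doJob() {
      NtKinect kinect;
      cv::Mat beam;
      while (1) {
        kinect.setRGB();
        kinect.setSkeleton();
        kinect.setAudio(true);   // after setSkeleton(), so that audioTrackingId is set
        for (int i=0; i<kinect.skeleton.size(); i++) {
          // default: red squares of width 10; speaker: green squares of width 20
          int w = 10;
          cv::Scalar color = cv::Scalar(0,0,255);
          if (kinect.audioTrackingId == kinect.skeletonTrackingId[i]) {
            w = 20;
            color = cv::Scalar(0,255,0);
          }
          for (auto joint: kinect.skeleton[i]) {
            if (joint.TrackingState == TrackingState_NotTracked) continue;
            ColorSpacePoint cp;
            // map the CameraSpace joint position to RGB image coordinates
            kinect.coordinateMapper->MapCameraPointToColorSpace(joint.Position,&cp);
            cv::rectangle(kinect.rgbImage,cv::Rect((int)cp.X-w/2,(int)cp.Y-w/2,w,w),color, 2);
          }
        }
        // confirmation only: draw the audio direction on a separate canvas
        // (the canvas size used here is an assumption)
        beam = cv::Mat::zeros(cv::Size(400,400), CV_8UC3);
        kinect.drawAudioDirection(beam);
        cv::imshow("rgb", kinect.rgbImage);
        cv::imshow("beam", beam);
        auto key = cv::waitKey(1);
        if (key == 'q') break;
      }
      cv::destroyAllWindows();
    }

    int main(int argc, char** argv) {
      try {
        doJob();
      } catch (exception &ex) {
        cout << ex.what() << endl;
        string s;
        cin >> s;
      }
      return 0;
    }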
  5. When you run the program, RGB images are displayed. Exit with the 'q' key.
  6. Joints are drawn as red rectangles, but the speaker's joints are drawn as larger green rectangles.

    The audio direction is displayed in a separate window only for confirmation, so that part of the code may be omitted.

  7. Please click here for this sample project KinectV2_audio3.zip
  8. Since the above zip file may not include the latest "NtKinect.h", download the latest version from here and replace the old one with it.