NtKinect: Kinect V2 C++ Programming with OpenCV on Windows10

How to detect talking skeleton with Kinect V2

2016.07.18: created by

Prerequisite knowledge

Detecting Talking Skeleton

Calling the setSkeleton() function gives you the position of each skeleton. If you then call setAudio(true) after setSkeleton(), the speaker is identified from the voice direction and the skeleton positions.

The skeletonTrackingId of the skeleton identified as the speaker is held in the member variable "audioTrackingId". If the speaker is unknown, it will be negative. If audioTrackingId is equal to skeletonTrackingId[index], then skeleton[index] is the one speaking.
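
The comparison described above can be sketched as a small lookup. This is only a minimal sketch and not part of NtKinect: findSpeakerIndex is a hypothetical helper, and std::uint64_t stands in for the Windows UINT64 type so that the snippet compiles without the Kinect SDK.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of NtKinect): returns the index of the
// skeleton whose trackingId equals audioTrackingId, or -1 if none matches.
// std::uint64_t is used here as a stand-in for UINT64.
int findSpeakerIndex(const std::vector<std::uint64_t>& skeletonTrackingId,
                     std::uint64_t audioTrackingId) {
  for (std::size_t i = 0; i < skeletonTrackingId.size(); i++) {
    if (skeletonTrackingId[i] == audioTrackingId) {
      return static_cast<int>(i);
    }
  }
  return -1;  // speaker unknown, or not among the recognized skeletons
}
```

With an NtKinect object named kinect, the call would presumably look like findSpeakerIndex(kinect.skeletonTrackingId, kinect.audioTrackingId), made after setSkeleton() and setAudio(true).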

NtKinect

NtKinect's functions for skeleton

type of return value	function name	descriptions
void	setSkeleton()	Recognize skeleton and set the data to the following member variables.

type	variable name	descriptions
vector<vector<Joint>>	skeleton	skeleton information
vector<int>	skeletonId	bodyIndex of skeleton
vector<UINT64>	skeletonTrackingId	trackingId of skeleton

NtKinect's member variables for skeleton

type	variable name	descriptions
vector<vector<Joint>>	skeleton	Vector of skeleton information. The set of one person's joints is a vector<Joint>, which is that person's skeleton information. To handle multiple people, the variable is a vector of skeleton information, that is, vector<vector<Joint>>. Joint coordinates are positions in the CameraSpace coordinate system. Number of recognized people: skeleton.size(). Skeleton information of the index-th person: skeleton[index]. Number of joints of the index-th person: skeleton[index].size(). Information of the jointType joint of the index-th person: skeleton[index][jointType]. Coordinates of that joint: skeleton[index][jointType].Position. Tracking state of that joint: skeleton[index][jointType].TrackingState. Since skeleton[index][jointType].JointType == jointType holds, you can directly access the information of a specified joint of a specified skeleton.
vector<int>	skeletonId	A vector of bodyIndex values corresponding to the skeletons. The bodyIndex of skeleton[index] is held at skeletonId[index].
vector<UINT64>	skeletonTrackingId	A vector of trackingId values corresponding to the skeletons. The trackingId of skeleton[index] is held at skeletonTrackingId[index].
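
The access patterns listed in the table can be illustrated without the Kinect SDK. In the sketch below, MockJoint, MockPoint, and MockTrackingState are stand-in types invented for this example; they only mimic the three fields named above (JointType, Position, TrackingState). The helper functions countUsableJoints and jointPosition are likewise hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Stand-in types (assumptions for illustration, not the Kinect SDK types).
enum MockTrackingState { Mock_NotTracked, Mock_Inferred, Mock_Tracked };
struct MockPoint { float X, Y, Z; };
struct MockJoint {
  int JointType;
  MockPoint Position;
  MockTrackingState TrackingState;
};
using MockSkeleton = std::vector<MockJoint>;

// Count the joints of the index-th person that are not "not tracked".
std::size_t countUsableJoints(const std::vector<MockSkeleton>& skeleton,
                              std::size_t index) {
  if (index >= skeleton.size()) return 0;
  std::size_t n = 0;
  for (const MockJoint& joint : skeleton[index]) {
    if (joint.TrackingState != Mock_NotTracked) n++;
  }
  return n;
}

// Read the position of the jointType joint of the index-th person.
// Relies on skeleton[index][jointType].JointType == jointType.
bool jointPosition(const std::vector<MockSkeleton>& skeleton,
                   std::size_t index, std::size_t jointType, MockPoint& out) {
  if (index >= skeleton.size() || jointType >= skeleton[index].size()) return false;
  const MockJoint& joint = skeleton[index][jointType];
  if (joint.TrackingState == Mock_NotTracked) return false;
  out = joint.Position;
  return true;
}
```

Checking both indices before subscripting matters because the number of recognized people changes from frame to frame.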

NtKinect's function for Audio

type of return value	function name	descriptions
void	setAudio(bool flag = false)	Acquires audio and, while recording is in progress, saves it to a file. Determines the direction of the speech and sets it to the member variable "beamAngle". Argument flag: when this function is called with the argument set to "true" after a setSkeleton() call, the speaker's skeletonTrackingId[i] is set to the member variable "audioTrackingId".
void	drawAudioDirection(cv::Mat& image)	Draws the direction of the audio beam on image.
bool	isOpenedAudio()	Returns whether or not recording is in progress.
void	openAudio(string path)	Opens path as a recording file.
void	closeAudio()	Closes the recording file.

NtKinect's member variable for Audio

type	variable name	descriptions
float	beamAngle	Direction of audio (angle to the left and right)
float	beamAngleConfidence	Confidence value of beamAngle (0.0 ... 1.0)
UINT64	audioTrackingId	Speaker's skeletonTrackingId
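
As a rough illustration of how these two values might be used together, the sketch below converts the angle to degrees and ignores low-confidence readings. It assumes that beamAngle is expressed in radians, which the table above does not state. The helper names and the 0.5 default threshold are hypothetical choices for this example.

```cpp
// Assumption: beamAngle is in radians. Converts it to degrees.
float beamAngleToDegrees(float beamAngle) {
  const float kPi = 3.14159265358979f;
  return beamAngle * 180.0f / kPi;
}

// Hypothetical gate: use the direction only when the confidence
// (0.0 ... 1.0) reaches a caller-chosen minimum.
bool isBeamUsable(float beamAngleConfidence, float minConfidence = 0.5f) {
  return beamAngleConfidence >= minConfidence;
}
```

A caller could, for example, skip drawing or logging the direction whenever isBeamUsable(kinect.beamAngleConfidence) is false.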

How to write the program

Start with the Visual Studio project KinectV2_audio2.zip from "NtKinect: How to get audio beam direction with Kinect V2".

WaveFile.h should already have been added to the project.

Change the contents of main.cpp as follows.

Define USE_AUDIO constant before including NtKinect.h.

When setAudio(true) is called after setSkeleton(), audio data is acquired by the microphone array and the direction of the sound is calculated. The "skeletonTrackingId" of the skeleton whose direction matches is then set to the "audioTrackingId" variable.
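
The text does not say how NtKinect performs this match internally. Purely to illustrate the general idea of matching a sound direction to a position, the sketch below computes the horizontal angle of each position as seen from the sensor and picks the one closest to the beam angle. Everything in it is an assumption made for this example: the Point3 stand-in type, the helper names, the use of radians, the sign convention of the angle, and the tolerance parameter.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for a CameraSpace position (assumption for illustration).
struct Point3 { float X, Y, Z; };

// Horizontal angle in radians of a point as seen from the sensor origin,
// measured from the forward (Z) axis toward the X axis.
float horizontalAngle(const Point3& p) {
  return std::atan2(p.X, p.Z);
}

// Returns the index of the position whose horizontal angle is closest to
// beamAngle, or -1 if no position lies within the given tolerance.
int closestToBeam(const std::vector<Point3>& positions,
                  float beamAngle, float tolerance) {
  int best = -1;
  float bestDiff = tolerance;
  for (std::size_t i = 0; i < positions.size(); i++) {
    float diff = std::fabs(horizontalAngle(positions[i]) - beamAngle);
    if (diff <= bestDiff) {
      bestDiff = diff;
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

In the program itself none of this is needed, since comparing audioTrackingId with skeletonTrackingId[i] is sufficient.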

main.cpp

#include <iostream>
#include <sstream>

#define USE_AUDIO
#include "NtKinect.h"

using namespace std;

void doJob() {
  NtKinect kinect;
  cv::Mat beam;
  while (1) {
    kinect.setRGB();
    kinect.setSkeleton();
    // Must be called after setSkeleton() so that audioTrackingId is set.
    kinect.setAudio(true);
    for (size_t i=0; i<kinect.skeleton.size(); i++) {
      // Default marker: small red square (OpenCV colors are in BGR order).
      int w = 10;
      cv::Scalar color = cv::Scalar(0,0,255);
      // Speaker's skeleton: larger green square.
      if (kinect.audioTrackingId == kinect.skeletonTrackingId[i]) {
        w = 20;
        color = cv::Scalar(0,255,0);
      }
      for (auto joint: kinect.skeleton[i]) {
        if (joint.TrackingState == TrackingState_NotTracked) continue;
        // Map the joint from CameraSpace to RGB image coordinates.
        ColorSpacePoint cp;
        kinect.coordinateMapper->MapCameraPointToColorSpace(joint.Position,&cp);
        cv::rectangle(kinect.rgbImage,cv::Rect((int)cp.X-w/2,(int)cp.Y-w/2,w,w),color, 2);
      }
    }
    cv::imshow("rgb", kinect.rgbImage);
    // Optional: show the audio direction in a separate window for confirmation.
    kinect.drawAudioDirection(beam);
    cv::imshow("beam", beam);
    auto key = cv::waitKey(1);
    if (key == 'q') break;
  }
  cv::destroyAllWindows();
}

int main(int argc, char** argv) {
  try {
    doJob();
  } catch (exception &ex) {
    cout << ex.what() << endl;
    string s;
    cin >> s;
  }
  return 0;
}

When you run the program, RGB images are displayed. Press the 'q' key to exit.

Joints are drawn as red rectangles, but the speaker's joints are drawn as green rectangles.

The audio direction is displayed in another window only for confirmation, so that part of the code may be omitted.

The sample project for this article is KinectV2_audio3.zip.

Since the above zip file may not include the latest "NtKinect.h", download the latest version and replace the old one with it.

http://nw.tsuda.ac.jp/