NtKinect: Kinect V2 C++ Programming with OpenCV on Windows10

How to get and record audio with Kinect V2

2016.07.18: created by

Prerequisite knowledge

Getting and Recording Audio

If you define the USE_AUDIO constant before including NtKinect.h, NtKinect's audio-related functions and variables become available.

Calling the setAudio() function acquires audio data and, if recording is in progress, saves it to the file. To end recording, call the closeAudio() function.

NtKinect's functions for Audio

Return type | Function | Description
void | setAudio(bool flag = false) | Acquires audio and, during recording, saves it to the file. Also determines the direction the speech comes from and stores it in the member variable "beamAngle". If this function is called with the argument set to "true" after a setSkeleton() call, the speaker's skeletonTrackingId[i] is stored in the member variable "audioTrackingId".
void | drawAudioDirection(cv::Mat& image) | Draws the direction of the audio beam on image.
bool | isOpenedAudio() | Returns whether recording is in progress.
void | openAudio(string path) | Opens path as the recording file.
void | closeAudio() | Closes the recording file.
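
As a small illustration of how "audioTrackingId" relates to skeletonTrackingId[i], the sketch below looks up which skeleton index belongs to the speaker. It assumes that skeletonTrackingId is an indexable list of 64-bit IDs, as the notation above suggests; findSpeakerIndex is a hypothetical helper and not part of NtKinect, and std::uint64_t stands in for the Windows UINT64 type so that the snippet compiles without the Kinect SDK.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of NtKinect).
// Returns the index i for which skeletonTrackingId[i] == audioTrackingId,
// or -1 when no tracked skeleton matches (for example, nobody is speaking).
int findSpeakerIndex(const std::vector<std::uint64_t>& skeletonTrackingId,
                     std::uint64_t audioTrackingId) {
  for (std::size_t i = 0; i < skeletonTrackingId.size(); ++i) {
    if (skeletonTrackingId[i] == audioTrackingId) return static_cast<int>(i);
  }
  return -1;
}
```

In a real program the two arguments would presumably be kinect.skeletonTrackingId and kinect.audioTrackingId, read after calling setSkeleton() and then setAudio(true).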

NtKinect's member variables for Audio

Type | Variable | Description
float | beamAngle | Direction of audio (left-right angle)
float | beamAngleConfidence | Confidence value of beamAngle (0.0 ... 1.0)
UINT64 | audioTrackingId | Speaker's skeletonTrackingId
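
The following sketch shows one way these two values might be used together. It assumes that beamAngle is expressed in radians, which is how the Kinect V2 SDK reports the beam angle; this page does not state the unit, so please verify it in your environment. Both function names and the 0.5 threshold are hypothetical choices for illustration.

```cpp
// Assumption: beamAngle is in radians (not confirmed on this page).
// Converts it to degrees for display or logging.
float beamAngleToDegrees(float beamAngle) {
  const float kPi = 3.14159265358979f;
  return beamAngle * 180.0f / kPi;
}

// Treats the direction as usable only when beamAngleConfidence reaches
// a caller-chosen threshold. The default of 0.5 is an arbitrary example.
bool isBeamReliable(float beamAngleConfidence, float minConfidence = 0.5f) {
  return beamAngleConfidence >= minConfidence;
}
```

For example, a program could call drawAudioDirection() only while isBeamReliable(kinect.beamAngleConfidence) is true, so that a low-confidence direction is not drawn.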

How to write the program

  1. Start from the Visual Studio project KinectV2.zip of "NtKinect: How to get RGB camera image with Kinect V2 (Fundamental Settings)".
  2. Add WaveFile.h to the project.
  3. Download the slightly modified WaveFile.h from Microsoft's "AudioCaptureRaw-Console C++ Sample", and place it in the folder where the other source files (such as main.cpp) are located. Then add it to the project.

  4. Change the contents of main.cpp as follows.
  5. Define the USE_AUDIO constant before including NtKinect.h.

    Call the openAudio() function to start recording, and closeAudio() to end recording. Audio is acquired with the setAudio() function and is automatically saved to the file while recording is in progress.

    The example program below creates a WAV file named after the current time, such as "2016-07-18_09-16-32.wav".

    #include <iostream>
    #include <sstream>
    #include <time.h>

    #define USE_AUDIO
    #include "NtKinect.h"

    using namespace std;

    // Returns the current local time as "YYYY-MM-DD_HH-MM-SS".
    string now() {
      char s[1024];
      time_t t = time(NULL);
      struct tm lnow;
      localtime_s(&lnow, &t);
      sprintf_s(s, "%04d-%02d-%02d_%02d-%02d-%02d", lnow.tm_year + 1900, lnow.tm_mon + 1, lnow.tm_mday,
                lnow.tm_hour, lnow.tm_min, lnow.tm_sec);
      return string(s);
    }

    void doJob() {
      NtKinect kinect;
      bool flag = false;  // true while recording is requested
      while (1) {
        kinect.setRGB();  // update kinect.rgbImage (call restored here; it is needed to display the RGB image)
        if (flag) kinect.setAudio();  // acquire audio; it is saved while the file is open
        cv::putText(kinect.rgbImage, flag ? "Recording" : "Stopped", cv::Point(50, 50),
                    cv::FONT_HERSHEY_SIMPLEX, 1.2, cv::Scalar(0, 0, 255), 1, CV_AA);
        // rename CV_AA as cv::LINE_AA (in case of opencv3 and later)
        cv::imshow("rgb", kinect.rgbImage);
        auto key = cv::waitKey(1);
        if (key == 'q') break;
        else if (key == 'r') flag = true;
        else if (key == 's') flag = false;
        // Open or close the recording file when the requested state changes.
        if (flag && !kinect.isOpenedAudio()) kinect.openAudio(now() + ".wav");
        else if (!flag && kinect.isOpenedAudio()) kinect.closeAudio();
      }
      cv::destroyAllWindows();
    }

    int main(int argc, char** argv) {
      try {
        doJob();
      } catch (exception &ex) {
        cout << ex.what() << endl;
        string s;
        cin >> s;  // keep the console open until the user enters something
      }
      return 0;
    }
  6. When you run the program, RGB images are displayed. Exit with the 'q' key.
  7. Recording starts with the 'r' key and stops with the 's' key. The recording status is displayed as "Recording" or "Stopped" at the upper left of the RGB image.

  8. Please click here for this sample project KinectV2_audio.zip
  9. Since the above zip file may not include the latest "NtKinect.h", download the latest version from here and replace the old one with it.
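
The key handling in the steps above ('r' to start, 's' to stop, 'q' to quit) can be tried without a sensor by using a stand-in object. In the sketch below, FakeRecorder and handleKey are hypothetical names introduced only for illustration; FakeRecorder merely mimics the isOpenedAudio(), openAudio() and closeAudio() calls listed earlier and does not record anything.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the recording part of NtKinect.
struct FakeRecorder {
  bool opened = false;
  std::vector<std::string> log;  // history of open/close calls
  bool isOpenedAudio() const { return opened; }
  void openAudio(const std::string& path) { opened = true; log.push_back("open " + path); }
  void closeAudio() { opened = false; log.push_back("close"); }
};

// Applies one key press to the recording state.
// Returns false when the main loop should exit ('q'), true otherwise.
bool handleKey(int key, bool& flag, FakeRecorder& rec, const std::string& path) {
  if (key == 'q') return false;
  if (key == 'r') flag = true;
  else if (key == 's') flag = false;
  // Open the file only on the transition to recording, and close it
  // only on the transition back, so repeated keys have no extra effect.
  if (flag && !rec.isOpenedAudio()) rec.openAudio(path);
  else if (!flag && rec.isOpenedAudio()) rec.closeAudio();
  return true;
}
```

Because the file is opened and closed only when the requested state and the actual state differ, pressing 'r' twice in a row opens just one file. In this sketch 'q' returns before any open or close is attempted, so a caller that quits while recording would need to call closeAudio() itself.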