iOS Programming with Swift 2

2016.06.21: created by

Face and Smile Detection with CIDetector in Swift (AVFoundation Example)

Let's combine "Using the Camera in Swift (via AVFoundation)" with CIDetector to write a Swift program that detects faces in real time.

Launch Xcode, choose "Create a new Xcode project", and create a new project from the "Single View Application" template. Here the project is named SwiftAVFCIDetector.

In Main.storyboard, place three Buttons and an Image View on the ViewController. Change the button titles to "Start", "Stop", and "Detect".

Connect the three Buttons in Main.storyboard to ViewController.swift as Actions (Touch Up Inside), and connect the Image View as an Outlet.

"Start" Button --> tapStart function
"Stop" Button --> tapStop function
"Detect" Button --> tapDetect function
Image View --> myImageView variable

Edit ViewController.swift.

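The code has three parts: the IBOutlet variable and IBAction functions added through the Main.storyboard connections (myImageView, tapStart, tapStop, tapDetect), the AVFoundation code that captures video frames (captureOutput, imageFromSampleBuffer, prepareVideo), and the CIDetector code that detects faces (detectFace and the detector setup in viewDidLoad). On the web page these are shown in blue, red, and green text respectively.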

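In the face detection part, calling featuresInImage without options detects faces only:

        let ciImage:CIImage! = CIImage(image: image)
        let features = detector.featuresInImage(ciImage)

Passing CIDetectorSmile:true as an option also makes the detector judge whether each face is smiling:

        let ciImage:CIImage! = CIImage(image: image)
        let options = [CIDetectorSmile:true]
        let features = detector.featuresInImage(ciImage, options:options)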

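Note also that UIImage places the origin at the top left, whereas the bounds property (a CGRect) of each detected face uses Core Image coordinates, with the origin at the bottom left. The code therefore converts the y coordinate.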

            var rect:CGRect = feature.bounds;
            rect.origin.y = image.size.height - rect.origin.y - rect.height;
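
The conversion above can be checked with plain numbers. The following is a minimal sketch, not part of the sample project: flipToTopLeft is a hypothetical helper name, and the rectangle and the image height of 480 are made-up values chosen for illustration.

```swift
import Foundation

// Hypothetical helper (not in the sample project): converts a rect from
// bottom-left-origin coordinates to top-left-origin coordinates.
func flipToTopLeft(_ rect: CGRect, _ imageHeight: CGFloat) -> CGRect {
    var flipped = rect
    // The top edge of the rect sits (origin.y + height) above the bottom,
    // so its distance from the top of the image is the remainder.
    flipped.origin.y = imageHeight - rect.origin.y - rect.height
    return flipped
}

// Made-up face rect: 100x50, with its bottom edge 20 points above the
// bottom of a 480-point-tall image.
let faceBounds = CGRect(x: 10, y: 20, width: 100, height: 50)
let flipped = flipToTopLeft(faceBounds, 480)
print(flipped.origin.y)   // 410.0
```

Only origin.y changes; x, width, and height are the same in both coordinate systems.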

Code to add to ViewController.swift (the AVFoundation and CIDetector parts)

import UIKit
import AVFoundation

class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    
    var detector: CIDetector!
    
    var mySession: AVCaptureSession!
    var myCamera: AVCaptureDevice!
    var myVideoInput: AVCaptureDeviceInput!
    var myVideoOutput: AVCaptureVideoDataOutput!
    var detectFlag: Bool = false

    @IBOutlet weak var myImageView: UIImageView!
    @IBAction func tapStart(sender: AnyObject) {
        mySession.startRunning()
    }
    @IBAction func tapStop(sender: AnyObject) {
        mySession.stopRunning()
    }
    @IBAction func tapDetect(sender: AnyObject) {
        detectFlag = !detectFlag
    }
    
    func captureOutput(captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, fromConnection connection: AVCaptureConnection!) {
        print("captureOutput:didOutputSampleBuffer:fromConnection:")  // Logged once per frame
        if connection.supportsVideoOrientation {
            connection.videoOrientation = AVCaptureVideoOrientation.Portrait
        }
        dispatch_async(dispatch_get_main_queue(), {
            let image = self.imageFromSampleBuffer(sampleBuffer)
            if self.detectFlag {
                self.myImageView.image = self.detectFace(image)
            } else {
                self.myImageView.image = image
            }
        })
    }
    
    func imageFromSampleBuffer(sampleBuffer: CMSampleBufferRef) -> UIImage {
        let imageBuffer: CVImageBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)!
        CVPixelBufferLockBaseAddress(imageBuffer, 0)   // Lock Base Address
        let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer)  // Base address of the non-planar BGRA pixel data
        let bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer)
        let width = CVPixelBufferGetWidth(imageBuffer)
        let height = CVPixelBufferGetHeight(imageBuffer)
        
        let colorSpace = CGColorSpaceCreateDeviceRGB()  // RGB ColorSpace
        let bitmapInfo = (CGBitmapInfo.ByteOrder32Little.rawValue | CGImageAlphaInfo.PremultipliedFirst.rawValue)
        let context = CGBitmapContextCreate(baseAddress, width, height, 8, bytesPerRow, colorSpace, bitmapInfo)
        let imageRef = CGBitmapContextCreateImage(context) // Create Quartz image
        
        CVPixelBufferUnlockBaseAddress(imageBuffer, 0)    // Unlock Base Address
        
        let resultImage: UIImage = UIImage(CGImage: imageRef!)
        
        return resultImage
    }
    
    func prepareVideo() {
        mySession = AVCaptureSession()
        mySession.sessionPreset = AVCaptureSessionPresetHigh
        let devices = AVCaptureDevice.devices()
        for device in devices {
            if (device.position == AVCaptureDevicePosition.Back) {
                myCamera = device as! AVCaptureDevice
            }
        }
        do {
            myVideoInput = try AVCaptureDeviceInput(device: myCamera)
            if (mySession.canAddInput(myVideoInput)) {
                mySession.addInput(myVideoInput)
            } else {
                print("cannot add input to session")
            }
            
            myVideoOutput = AVCaptureVideoDataOutput()
            myVideoOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey : Int(kCVPixelFormatType_32BGRA)]
            myVideoOutput.setSampleBufferDelegate(self,queue:dispatch_get_main_queue())
            myVideoOutput.alwaysDiscardsLateVideoFrames = true
            if (mySession.canAddOutput(myVideoOutput)) {
                mySession.addOutput(myVideoOutput)
            } else {
                print("cannot add output to session")
            }
            
            /* // preview background
             let myVideoLayer = AVCaptureVideoPreviewLayer(session: mySession)
             myVideoLayer.frame = view.bounds
             myVideoLayer.videoGravity = AVLayerVideoGravityResizeAspectFill
             view.layer.insertSublayer(myVideoLayer,atIndex:0)
             */
        } catch let error as NSError {
            print("cannot use camera \(error)")
        }
    }
    
    func detectFace(image: UIImage) -> UIImage {
        let ciImage:CIImage! = CIImage(image: image)
        let options = [CIDetectorSmile:true]
        let features = detector.featuresInImage(ciImage, options:options)
        UIGraphicsBeginImageContext(image.size);
        image.drawInRect(CGRectMake(0,0,image.size.width,image.size.height))
        let context: CGContextRef = UIGraphicsGetCurrentContext()!
        CGContextSetLineWidth(context, 5.0);
        for feature in features as! [CIFaceFeature] {
            if feature.hasSmile {
                CGContextSetRGBStrokeColor(context, 0.0, 1.0, 0.0, 1.0)
            } else {
                CGContextSetRGBStrokeColor(context, 1.0, 0.0, 0.0, 1.0)
            }
            var rect:CGRect = feature.bounds;
            rect.origin.y = image.size.height - rect.origin.y - rect.height;
            CGContextAddRect(context, rect);
            CGContextStrokePath(context)  // Stroke per face so each rectangle keeps its own color
        }
        let img = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
        
        return img;
    }
    
    override func viewDidLoad() {
        super.viewDidLoad()
        detector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: [CIDetectorAccuracy:CIDetectorAccuracyHigh])
        prepareVideo()
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
    }

}

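When you run the app, the three Buttons appear and the Image View stays blank white. Press the Start button to begin showing video, and use the Detect button to toggle face detection. A detected face is outlined in red, and a face detected as smiling is outlined in green. Compared with the OpenCV version, turning on face detection does not seem to slow things down much.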

The image recognized in the example above is from http://www.ashinari.com/2012/03/04-358659.php?category=48 .

The sample project is here (Xcode 7.3.1 version).

On multithreaded operation

iOS lets you program multithreaded behavior by submitting work to queues (GCD, Grand Central Dispatch). With dispatch_sync and dispatch_async, the work to run is written as a closure, which makes multithreaded programs much easier to write.

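In the example above, the queue that receives the per-frame video callbacks and the queue that converts each frame to a UIImage and runs CIDetector face detection are both the main queue, dispatch_get_main_queue(). Code that touches the GUI must run on the main queue, so the latter is unavoidable, but the per-frame video callbacks are usually delivered on a separately created queue. In that case, however, the block that converts the image and runs face detection must be submitted with dispatch_sync rather than dispatch_async. Otherwise the heavy detection work cannot keep up, blocks pile up on the main queue, and execution becomes very slow.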
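
The effect of dispatch_sync described above can be tried outside the app. The following is a minimal sketch in the same Swift 2 syntax as the rest of this page, not the sample project's code. captureQueue and uiQueue are hypothetical stand-ins for the video delegate queue and the main queue (a private serial queue replaces the main queue so that the sketch can run as a plain script), handleFrame is a made-up name, and usleep stands in for the detection work.

```swift
import Foundation

// Hypothetical stand-ins: captureQueue plays the video delegate queue,
// uiQueue plays the main queue. Both are serial queues.
let captureQueue = dispatch_queue_create("capture", nil)
let uiQueue = dispatch_queue_create("ui", nil)

var processedFrames = [Int]()   // Modified only on uiQueue

func handleFrame(frame: Int) {
    // dispatch_sync blocks the calling queue (captureQueue) until the
    // block has finished running on uiQueue.
    dispatch_sync(uiQueue) {
        usleep(10000)                   // Pretend detection takes 10 ms
        processedFrames.append(frame)
    }
}

// Deliver five "frames" on captureQueue and wait until all are handled.
let done = dispatch_group_create()
for frame in 0..<5 {
    dispatch_group_async(done, captureQueue) {
        handleFrame(frame)
    }
}
dispatch_group_wait(done, DISPATCH_TIME_FOREVER)
print(processedFrames)   // [0, 1, 2, 3, 4]
```

Because each block on captureQueue waits inside dispatch_sync, it cannot hand over the next frame until the previous one is finished, so at most one frame is pending at a time. With dispatch_async in its place, captureQueue would submit all five frames immediately, which is the backlog described above.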

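Taking these points into account, ViewController.swift would be rewritten as shown in the following changes. In my environment the version before the change already runs at a practical speed, so the sample uses that version. Try both and see which works better for you.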

Changes to ViewController.swift (output of diff -c)

*** ViewController.swift.org	2016-06-21 21:19:38.000000000 +0900
--- ViewController.swift	2016-06-21 21:19:57.000000000 +0900
***************
*** 34,40 ****
          if connection.supportsVideoOrientation {
              connection.videoOrientation = AVCaptureVideoOrientation.Portrait
          }
!         dispatch_async(dispatch_get_main_queue(), {
              let image = self.imageFromSampleBuffer(sampleBuffer)
              if self.detectFlag {
                  self.myImageView.image = self.detectFace(image)
--- 34,40 ----
          if connection.supportsVideoOrientation {
              connection.videoOrientation = AVCaptureVideoOrientation.Portrait
          }
!         dispatch_sync(dispatch_get_main_queue(), {
              let image = self.imageFromSampleBuffer(sampleBuffer)
              if self.detectFlag {
                  self.myImageView.image = self.detectFace(image)
***************
*** 83,89 ****
              
              myVideoOutput = AVCaptureVideoDataOutput()
              myVideoOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey : Int(kCVPixelFormatType_32BGRA)]
!             myVideoOutput.setSampleBufferDelegate(self,queue:dispatch_get_main_queue())
              myVideoOutput.alwaysDiscardsLateVideoFrames = true
              if (mySession.canAddOutput(myVideoOutput)) {
                  mySession.addOutput(myVideoOutput)
--- 83,89 ----
              
              myVideoOutput = AVCaptureVideoDataOutput()
              myVideoOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey : Int(kCVPixelFormatType_32BGRA)]
!             myVideoOutput.setSampleBufferDelegate(self,queue:dispatch_queue_create("myqueue",nil))
              myVideoOutput.alwaysDiscardsLateVideoFrames = true
              if (mySession.canAddOutput(myVideoOutput)) {
                  mySession.addOutput(myVideoOutput)

Detecting eye positions as well

This example detects the left eye and the right eye individually.

Edit ViewController.swift.

Changes to ViewController.swift (output of diff -c)

*** ViewController.org.swift	2016-07-06 15:48:54.000000000 +0900
--- ViewController.swift	2016-07-06 15:32:20.000000000 +0900
***************
*** 122,127 ****
--- 122,141 ----
              rect.origin.y = image.size.height - rect.origin.y - rect.height;
              CGContextAddRect(context, rect);
              CGContextStrokePath(context)
+             if (feature.hasLeftEyePosition) {
+                 var pos = feature.leftEyePosition
+                 pos.y = image.size.height - pos.y
+                 CGContextSetRGBStrokeColor(context, 1.0, 0.0, 1.0, 1.0)
+                 CGContextAddRect(context,CGRectMake(pos.x - 5, pos.y - 5, 10, 10))
+                 CGContextStrokePath(context)
+             }
+             if (feature.hasRightEyePosition) {
+                 var pos = feature.rightEyePosition
+                 pos.y = image.size.height - pos.y
+                 CGContextSetRGBStrokeColor(context, 1.0, 0.0, 1.0, 1.0)
+                 CGContextAddRect(context,CGRectMake(pos.x - 5, pos.y - 5, 10, 10))
+                 CGContextStrokePath(context)
+             }
          }
          let img = UIGraphicsGetImageFromCurrentImageContext()
          UIGraphicsEndImageContext()
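
The eye markers in the diff above flip a point rather than a rectangle, so the height term of the face conversion drops out. The following is a minimal sketch of that arithmetic, not part of the sample project: eyeMarker is a hypothetical helper name, and the eye position and the image height of 480 are made-up values.

```swift
import Foundation

// Hypothetical helper (not in the sample project): returns a square of the
// given side, centered on an eye position given in bottom-left-origin
// coordinates, expressed in top-left-origin coordinates.
func eyeMarker(_ eye: CGPoint, _ imageHeight: CGFloat, _ side: CGFloat) -> CGRect {
    // A point has no height, so only its distance from the edge is flipped.
    let flippedY = imageHeight - eye.y
    return CGRect(x: eye.x - side / 2, y: flippedY - side / 2,
                  width: side, height: side)
}

// Made-up eye position in a 480-point-tall image, marked by a 10x10 square.
let marker = eyeMarker(CGPoint(x: 120, y: 300), 480, 10)
print(marker.origin.x, marker.origin.y)   // 115.0 175.0
```

With side set to 10 this gives the same square as CGRectMake(pos.x - 5, pos.y - 5, 10, 10) in the diff.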

Run the app and press the Start button to begin showing video; the Detect button toggles face detection. Detected faces are outlined in red and smiling faces in green, and the eyes are additionally outlined in magenta.

The image recognized in the example above is from the free stock photo site https://www.pakutaso.com/person/woman/ .

The sample project is here (Xcode 7.3.1 version).

http://nw.tsuda.ac.jp