Face detection with OpenCL

March 04, 2012

I've been meaning to write about the topic of my thesis for quite some time, but didn't really get to it until now, so even though it's almost a year late, here we go.

Before I get into the technical details, here's a YouTube video where you can see the OpenCL implementation of my detector in action:

Pretty neat, right? :) What you just saw was a detector based on the WaldBoost algorithm (a variant of AdaBoost) running on a GPU. Its inputs were a classifier trained to detect frontal faces and, of course, an awesome video.

If you know anything about boosting algorithms, you'll know that one strong classifier is usually composed of lots of weak classifiers (which are usually very simple and computationally inexpensive functions). In my case there are 1000 weak classifiers, each of which uses Local Binary Patterns to extract a feature from the input texture. Unfortunately such a strong classifier is resolution dependent, so to be able to detect objects of various sizes in the input image, we need a pre-processing step.
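To make the "sum of weak classifiers" idea a bit more concrete, here is a minimal Python sketch. It is not the code from the detector: it assumes the plain 8-neighbour LBP computed on single pixels and a made-up lookup table per weak classifier, and all names (`lbp_code`, `strong_response`, the tables) are purely illustrative.

```python
# Illustrative sketch only: basic 3x3 LBP and a boosted sum of table lookups.
# The offsets, tables and names are assumptions, not taken from the detector.

# Clockwise neighbour offsets (dy, dx) around the centre pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, y, x):
    """Return the 8-bit LBP code of the pixel at (y, x)."""
    centre = image[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(NEIGHBOURS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[y + dy][x + dx] >= centre:
            code |= 1 << bit
    return code


def strong_response(image, win_y, win_x, weak_classifiers):
    """Sum the weak classifier responses for the window at (win_y, win_x).

    Each weak classifier is (dy, dx, table): a position inside the window
    and a 256-entry table mapping an LBP code to a real-valued response.
    """
    total = 0.0
    for dy, dx, table in weak_classifiers:
        total += table[lbp_code(image, win_y + dy, win_x + dx)]
    return total


# Tiny greyscale "image" and two hypothetical weak classifiers.
image = [
    [10, 10, 10, 10],
    [10, 50, 50, 10],
    [10, 50, 90, 10],
    [10, 10, 10, 10],
]
table_a = [0.0] * 256
table_a[56] = 1.5
table_b = [-0.5] * 256
table_b[0] = 2.0
weak = [(1, 1, table_a), (2, 2, table_b)]

response = strong_response(image, 0, 0, weak)
```

The point is only that each weak classifier is cheap (one feature, one table lookup), and that the strong classifier is the running sum over all of them.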

During pre-processing we create a pyramid of images by gradually down-scaling the input (oh, and we don't need colors, so we also convert it to greyscale). The detector itself can still only detect faces at a resolution of 24x24, but a mapping function tells us where a detection in any of the downscaled versions lies in the original image, and there we have a resolution-independent detector.

Interesting tidbit: it turned out that laying the downscaled images out horizontally instead of vertically (the vertical layout is the one you can see in the image below) slightly improved the detector's performance, simply because the texture cache unit had a higher hit ratio in that setup. But the pyramid texture is then approximately 3.6 times as wide as the original image, and the maximum texture size for an OpenCL image is 4096 pixels, so the detector wouldn't be able to process HD (1280x720) or Full-HD (1920x1080) videos: 1280 x 3.6 ~= 4600 already exceeds the limit. With the vertical layout, 1080 x 3.6 ~= 3900, so even Full-HD videos can be processed.
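The size arithmetic and the mapping back to the input can be sketched in a few lines of Python. This is a rough illustration, not the detector's code: the scale factor of 1.385 between levels is my assumption, picked only because it makes the level sizes add up to roughly 3.6x, and the function names are hypothetical.

```python
# Sketch: sizes of the pyramid levels and mapping a hit back to the input.
# The scale factor 1.385 is an assumption chosen so that the level sizes
# add up to roughly 3.6x, matching the figure quoted in the post.

WINDOW = 24          # detector window size in pixels (from the post)
MAX_TEXTURE = 4096   # texture size limit quoted in the post


def pyramid_levels(width, height, scale=1.385):
    """Return (width, height, factor) for each level, original first."""
    levels = []
    factor = 1.0
    while True:
        w, h = int(width / factor), int(height / factor)
        # Stop once the detector window no longer fits into the level.
        if w < WINDOW or h < WINDOW:
            break
        levels.append((w, h, factor))
        factor *= scale
    return levels


def layout_extent(levels, horizontal):
    """Total texture extent when levels are placed side by side or stacked."""
    return sum(w if horizontal else h for w, h, _ in levels)


def map_to_input(x, y, factor):
    """Map a window found at (x, y) in a level to (x, y, size) in the input."""
    return round(x * factor), round(y * factor), round(WINDOW * factor)


levels = pyramid_levels(1920, 1080)
fits_horizontally = layout_extent(levels, horizontal=True) <= MAX_TEXTURE
fits_vertically = layout_extent(levels, horizontal=False) <= MAX_TEXTURE
```

A window found in a level that was downscaled by some factor simply corresponds to a square that is that many times larger in the original frame, which is all the mapping function has to compute.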

Left - original image, right - pyramid of downscaled images (real pyramid texture also has the original on top)

Once we have our pyramid image, it's divided into small blocks that are processed by the GPU cores, and each work item (or thread, if you wish) in a block evaluates the strong classifier at a particular window position of the pyramid image. Overall we evaluate every window position; think of every pixel. In reality it's more complicated than that: the detector uses multiple kernels, and each evaluates only a part of the strong classifier. That's because WaldBoost can reject a window early, without evaluating all weak classifiers. So when a kernel finishes, it just reduces the number of candidate windows, and the next kernel continues with only the windows that survived the previous steps. This also ensures that we keep most of the work items in the work groups busy.
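The staged evaluation with early rejection can be sketched sequentially in Python. This is only an illustration of the idea, not the OpenCL kernels: the stage split, the rejection thresholds and the toy weak classifiers below are all assumptions, and each call to `run_stage` stands in for one kernel launch followed by compaction of the surviving windows.

```python
# Sketch of staged evaluation with early rejection, in the spirit of WaldBoost.
# Stage sizes, thresholds and the toy weak classifiers are assumptions.


def run_stage(windows, sums, weak_classifiers, reject_thresholds):
    """Evaluate one block of weak classifiers on the surviving windows.

    A window is dropped as soon as its running sum falls below the rejection
    threshold attached to the weak classifier that was just evaluated.
    Returns the compacted lists of survivors and their running sums.
    """
    next_windows, next_sums = [], []
    for window, total in zip(windows, sums):
        rejected = False
        for weak, threshold in zip(weak_classifiers, reject_thresholds):
            total += weak(window)
            if total < threshold:
                rejected = True
                break
        if not rejected:
            next_windows.append(window)
            next_sums.append(total)
    return next_windows, next_sums


def detect(windows, stages):
    """Run all stages; each stage only sees windows that survived so far."""
    sums = [0.0] * len(windows)
    for weak_classifiers, reject_thresholds in stages:
        windows, sums = run_stage(windows, sums, weak_classifiers,
                                  reject_thresholds)
    return list(zip(windows, sums))


# Toy example: "windows" are just integers and the weak classifiers are
# made-up functions of that integer.
stages = [
    ([lambda w: 1.0 if w % 2 == 0 else -1.0], [0.0]),
    ([lambda w: 1.0 if w % 3 == 0 else -1.0], [1.0]),
]
survivors = detect(list(range(12)), stages)
```

Because every stage gets a compacted list of candidates, the later (and rarer) evaluations are packed densely instead of leaving most work items in a group idle.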

Once the detector finishes, we have a couple of window positions in the pyramid image along with the response value of the strong classifier in each of these windows, and these are sent back to the host. The CPU can then finish the detection (by simply thresholding the response values) and map the coordinates back to the input image. If you watched the video carefully, you'll have noticed that there are multiple positive responses around a face, so this would also be a good place to do some post-processing and merge them. There's also a false detection from time to time, so again this is a good place to get rid of those.
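One possible shape for that host-side step, as a Python sketch: threshold the responses, then merge overlapping hits. Note that the merging here uses greedy non-maximum suppression, which is a common stand-in and not something the detector itself implements; the threshold values and function names are assumptions too.

```python
# Sketch of host-side post-processing: threshold, then merge overlapping hits.
# Greedy non-maximum suppression is a stand-in; thresholds are assumptions.


def iou(a, b):
    """Intersection over union of two square boxes given as (x, y, size)."""
    ax, ay, asz = a
    bx, by, bsz = b
    iw = min(ax + asz, bx + bsz) - max(ax, bx)
    ih = min(ay + asz, by + bsz) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (asz * asz + bsz * bsz - inter)


def postprocess(detections, response_threshold, overlap_threshold=0.3):
    """Keep the strongest hit of every group of overlapping detections.

    detections: list of ((x, y, size), response) in input-image coordinates.
    """
    strong = [d for d in detections if d[1] >= response_threshold]
    kept = []
    # Visit hits from the strongest response down; drop any hit that
    # overlaps too much with one that was already kept.
    for box, response in sorted(strong, key=lambda d: d[1], reverse=True):
        if all(iou(box, other) < overlap_threshold for other, _ in kept):
            kept.append((box, response))
    return kept


detections = [
    ((100, 100, 48), 5.0),
    ((104, 102, 48), 4.0),   # overlaps heavily with the first hit
    ((300, 50, 24), 3.5),
    ((10, 10, 24), 0.5),     # below the response threshold
]
faces = postprocess(detections, response_threshold=1.0)
```

Getting rid of the occasional false detection could then be as simple as requiring a minimum number of overlapping raw hits before a merged box is accepted, but that is left out of the sketch.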

You're surely asking how this compares to a pure CPU implementation. As you can imagine, having to evaluate every window position in the pyramid image is very costly, and even optimized SSE implementations can't get close to the performance of a GPU (even though you need to copy a lot of data between the host and the GPU). So here's a simple graph to answer that (note the logarithmic scale):

Processed video frames per second (CPU: Core2 Duo E8200 @ 2.66GHz; GPU: GeForce GTX 285 - driver ver 270)

So why do I talk about all this on my free software related blog? Well, of course I'm making the source available for anyone to play with, optimize further (there's still plenty of room for that) or do whatever you feel like doing with it. But I need to warn you first: the implementation is heavily optimized for NVIDIA's hardware and was never really tested on anything else. The AMD CPU implementation of OpenCL doesn't support images, and the Intel CPU implementation does support images, but not the image formats I'm using. That basically leaves only the AMD GPU implementation, and I didn't have such hardware available. I'm also making assumptions that are true only on NVIDIA's hardware, like that there are 32 work items running at a given time (which is true for NVIDIA's warp). There are even some helper methods that allowed this to run on hardware without local atomic operations (so even OpenCL 1.0 was enough), but I see now that I can no longer run it on my old GeForce 9300 with the latest NVIDIA driver (although it did work with version 270). So I don't even know if it works at all with the compiler in the latest driver... you've been warned.

Grab the code branch from Launchpad (`bzr branch lp:~mhr3/+junk/ocl-detector`), or get the tarball (the only dependencies are glib and opencv, plus libOpenCL.so somewhere the linker can find it). Run it with `./oclDetector -s CAM` (and if that doesn't seem to detect anything, try `./oclDetector -r -20 -s CAM`).

Comments

fabiand, 4 March 2012 at 19:44
Nice to see OpenCL and nice work :)
Sandeep, 20 March 2012 at 21:28
Awesome work! Have you tried using this for streaming video e.g from a webcam? I'm wondering if this can be used for face detection and login..... at some point of time in the future.
Unknown, 12 July 2012 at 20:43
Nice work: robust and fast. It's time to replace the old (overused) opencv's adaboost classifiers.
Mauricio, 30 July 2012 at 01:18
Hey, compiled everything to Windows 7 with a GeForce GT 335 on my notebook.
It works but the output is a blank and hanged windows ...
Any ideas ?

Best !
Mohiuddin Rana, 29 August 2012 at 03:13
hi,
I am wondering if you would be able to share the code?
Anonymous, 27 September 2012 at 04:15
Hi,

I am a graduate student majoring in Computer Engineering. I was determining the topic of my thesis and I am interested in facial identification. Your work has inspired me a lot and I was wondering if you could send me a copy of your paper for reference? Thank you.

Yours sincerely

Xiaoyu
Unknown, 27 September 2012 at 21:02
Cool! Are you planning to also add face recog?
Casper, 17 October 2012 at 18:47
Hi,
I would like to use some of this for my thesis. Got it up and running with a few changes :) Do you have some documentation on it or on the different methods used? In whatever language. Maybe some references ?
Thanks
Casper, 30 October 2012 at 09:30
Hi. I have looked at the references and I think I understand the theory behind the WaldBoost algorithm. I have your code working and I would like to play around with it a bit. If you have the time, I would be very happy if you could help me. Could you explain the algorithm of the program to me in English? How is the actual searching in the pyramid image done, and so on? What do the different kernels (1, 2, 3 and 4) do?
I could find out myself by inspecting the code but i want to spare some time and get a better understanding if you have time to explain these things :) My mail is cped08@student.aau.dk if the contact is easier there.
Hope you have a little spare time to explain some of the key things.
Best regards
Unknown, 14 November 2012 at 21:31
Hi, Would you be able to do contract work on OpenCL for us. We would like to build a people counter to ID and count individuals/people as they walk into a store. Thanks.
Unknown, 21 November 2012 at 14:31
Hi,
good job, post it as a library for sale on: www.myseyet.eu
Unknown, 29 December 2012 at 02:01
Hi,

I'm getting CL_INVALID_KERNEL_ARGS error on main.c:1150 when i'm trying to run.

What could be the problem?
Unknown, 12 March 2013 at 22:28
How can I generate an XML file for detecting another object?
Unknown, 24 June 2014 at 22:52
Hi, what was your detection rate for an HD video? What was the FPS? Was it less than 15?
simone9125 December 2014 at 18:44
Hi , i have a problem to compile source code , i launch command and make and show this errors :

gcc -o oclDetector -std=c99 -I ./ `pkg-config --cflags opencv glib-2.0` main.c utils.c profiling.c xml-reader.c detector_data.h kernel_bin.h `pkg-config --libs opencv glib-2.0` -lOpenCL
clang: error: cannot specify -o when generating multiple output files
make: *** [oclDetector] Error 1
Unknown, 18 January 2015 at 22:32
Hi.
That's a wonderfull project.
but which is it works program language?
Unknown, 4 February 2015 at 06:32
If someone is interested in compiling on windows.
This is the content of my Makefile:

CC=gcc
# Tested with mingw32-make 3.82.90
PROG=oclDetector
CPU_PROG=cpuTester

SOURCES=main.c utils.c profiling.c xml-reader.c detector_data.h
KERNEL_FILE=reducing_kernel.cl
KERNEL_BIN=kernel_bin.h
CPU_SOURCES=test.c utils.c detector_data.h

#use gnu99 instead c99 in order to avoid "unknow type name off_64_t" error
FLAGS=-std=gnu99 -I ./
#don't bother with pkg-config, it does not work properly in windows
FLAGS+=-IC:/gtk/include/glib-2.0
FLAGS+=-IC:/gtk/lib/glib-2.0/include
FLAGS+=-IC:/opencv/build/include
FLAGS+=-IC:/opencv/build/include/opencv
#sdk for opencl
FLAGS+=-IC:/amd_sdk/include
EXTRA_FLAGS=

#Note: The libs of opencv and the program must be built with the same mingw version,
#or else you'll have this execution error: "Cannot find entry point of procedure __gxx_personality_v0 in dynamically linked library libstdc++-6.dll"
LIBS= C:/opencv/release/lib/libopencv_core248.dll.a
LIBS+= C:/opencv/release/lib/libopencv_highgui248.dll.a
LIBS+= C:/opencv/release/lib/libopencv_imgproc248.dll.a
#gtk for windows, http://www.gtk.org/download/win32.php download the "all-in-one" bundle
LIBS+=-LC:/gtk/lib -lglib-2.0 -lintl
#sdk for opencl
LIBS+=-LC:/amd_sdk/lib/x86 -lOpenCL

all: $(PROG)

$(PROG): $(SOURCES) $(KERNEL_BIN)
	$(CC) -o $@ $(FLAGS) $(EXTRA_FLAGS) $^ $(LIBS)

cpu-test: $(CPU_PROG)

$(CPU_PROG): $(CPU_SOURCES)
	$(CC) -o $@ -O3 $(FLAGS) $(EXTRA_FLAGS) $^

$(KERNEL_BIN): bin2c $(KERNEL_FILE)
	./bin2c -z $(KERNEL_FILE) $@

bin2c: bin2c.c
	$(CC) -o $@ $^

clean:
	rm -f $(PROG) $(CPU_PROG)
Blogdelphimagic, 18 October 2015 at 11:05
The Best Delphi sample code I ever had for Face Detection.
GOD bless the good hearted Author.
http://delphimagic.blogspot.com.es/2011/08/reconocimiento-de-caras-con-delphi.html
Unknown, 17 January 2016 at 06:10
Hello.
I am interested in using Waldboost for human detection for surveillance applications (acutally intrusion detection). Do you think Waldboost works as pretty as the above video?
Unknown, 8 March 2016 at 17:47
Hi! I need to build custom application which looks close to what you did. Could you do it for me? Mike mike.sorochev@ya.ru

~mhr3
