Introduction
OpenVINO™ - Open Visual Inference and Neural Network Optimization - is an Intel®-distributed toolkit targeting the rapid development of applications and solutions that emulate human vision. This toolkit can be used across domains (e.g. retail, industry) and for many uses. Best known for computer-vision-based detection of faces, people, and body postures, OpenVINO can be applied to any other real-time data processing exercise relying on deep learning models.
The Intuiface "Face Detection with OpenVINO" Interface Asset focuses on face detection derived from this Intel demo, enabling you to create age range, gender, head pose, dwell time, and emotion-driven content for your interactive experiences. It requires using the OpenVINO Face Detection Server, which is responsible for receiving and processing camera-based information. See details in the architecture below. The camera can be of any make or model as long as it is recognized by the Windows PC to which it is attached.
All facial data is collected anonymously, with no information that can be used to identify particular people.
On GitHub, you will find the source code and releases for both the Face Detection Server and the interface asset. With the source code, you can make enhancements and changes to suit your personal needs. This article discusses out-of-the-box capability.
We have also published this OpenVINO sample experience to the Intuiface Examples catalog.
NOTE: The Face Detection with OpenVINO Interface Asset was coded in .NET and is thus only available for use in Player on Windows. However, since all code is open source, nothing prevents an Intuiface user from creating a TypeScript version of the interface asset and using it with the Face Detection Server. The current version of the Face Detection Server would still have to run on a Windows PC, although the code could be compiled to run on a Linux machine as well.
Trying It Out
To try out the face detection feature, follow these steps:
- Download the OpenVINOFaceDetectionServer.zip archive from the GitHub releases page.
- Unzip it and run OpenVINOFaceDetectionServer.exe, after optionally editing the OpenVINOFaceDetectionServer.config configuration file depending on your hardware configuration.
  - If your PC has an Intel GPU, set the "d" parameter value to "GPU". Otherwise, leave the default "CPU" value.
  - For test purposes, we suggest setting the "no_show" parameter value to "false", especially to make sure your camera is correctly set up.
- Download this OpenVINO sample experience from the Intuiface Examples catalog in Composer or Player.
  - To avoid issues with the floating-point numbers used by converters in the experience, set your PC regional settings to English-US. This is needed for this sample only.
- Play the sample experience.
Uses & Videos
Overview Video with Demos
Intuiface cohosted a webinar with Intel in which we conducted a deep-dive on the face detection feature and its use of the OpenVINO toolkit. Included is a detailed look at the integration.
Face Detection Uses
The Face Detection with OpenVINO Interface Asset can be used in various ways. Here are two principal uses.
Demographics-enhanced analytics
Computer vision-based interaction
Hardware prerequisites
Although you could use any Windows PC to run both the Face Detection Server and an experience using the interface asset, OpenVINO is optimized for use with both an Intel® CPU and GPU. Do not try to use the Face Detection Server with a dedicated GPU (NVIDIA or ATI).
See Intel OpenVINO system requirements here.
Minimum: Windows PC with at least an 8th gen i5
Recommended: Windows PC with at least 10th gen i7
The Intuiface team validated good performance for this feature using:
- Intel NUC with i5-8259U and Intel Iris Plus Graphics 655
- Intel NUC with i7-10710U and Intel UHD Graphics 620
- Intel NUC with i7-8565U, no integrated graphics and a Radeon™ 540X GPU.
  - Server was running in CPU mode only.
- Laptop with i7-8750H and both Intel UHD Graphics 620 and NVIDIA GeForce GTX 1050 Ti.
  - Server was using only the Intel GPU.
For PCs with non-Intel GPUs:
- If your PC has both an onboard Intel GPU and a dedicated GPU, make sure the Face Detection Server uses the Intel GPU. Intuiface Player can use either.
- If your PC only has a third-party GPU, we recommend using only CPU optimization for the Face Detection Server. See the server usage guide below.
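The GPU guidance above can be sketched as a small helper. This is only an illustration in Python: the function name and the idea of passing in a list of GPU names are assumptions, and detecting the installed GPUs is left to you. Only the "CPU" and "GPU" values come from the server's "d" parameter.

```python
def choose_device(gpu_names):
    """Return the value to use for the server's "d" parameter.

    gpu_names is a hypothetical list of GPU names found on the PC,
    for example ["Intel UHD Graphics 620", "NVIDIA GeForce GTX 1050 Ti"].
    """
    # The server should only target an Intel GPU. With a third-party GPU
    # alone (or no GPU at all), fall back to CPU optimization.
    has_intel_gpu = any("intel" in name.lower() for name in gpu_names)
    return "GPU" if has_intel_gpu else "CPU"
```

For a PC with both an Intel GPU and a dedicated GPU, this returns "GPU", which still requires that the server actually runs on the Intel GPU.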
Use any camera or webcam detected by a Windows PC. Intuiface tests were made using:
- Laptop webcams
- Logitech C920
- Logitech BCC950 (conference cam)
Architecture
Intuiface's OpenVINO-based face detection feature relies on two components:
- Face Detection Server: This process reads a live video feed from a webcam or camera, applies multiple deep learning models to extract face features, and exports this information using web sockets.
  Installation of the Face Detection Server is as follows:
  - Download the OpenVINOFaceDetectionServer.zip archive from the GitHub releases page.
  - Unzip it and run OpenVINOFaceDetectionServer.exe, after optionally editing the OpenVINOFaceDetectionServer.config configuration file depending on your hardware configuration.
    - If your PC has an Intel GPU, set the "d" parameter value to "GPU". Otherwise, leave the default "CPU" value.
    - For test purposes, we suggest setting the "no_show" parameter value to "false", especially to make sure your camera is correctly set up.
- Face Detection with OpenVINO Interface Asset: This .NET Interface Asset listens for web socket data, enabling it to retrieve information from the face detection server. Its properties, triggers & actions enable your experiences to use this information.
This interface asset is shipped with Composer and available anytime via the Interface Assets panel.
This architecture is flexible and enables multiple scenarios thanks to web socket communication.
Simple Scenario: One Windows device
The same Windows PC can run both the face detection server and Intuiface Player, running an experience that contains the interface asset.
Recommendation: Use a powerful PC with the latest generation of Intel® CPU to make sure both processes run fluidly.
Advanced Scenario: Separate, dedicated PC for the face detection server
An alternative to the previous solution is to split the server workload and the Intuiface experience between two different PCs. Make sure they are connected to the same local network.
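When the server runs on a separate PC, a quick reachability check can help rule out network problems before you look at the experience itself. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the host and port you pass in must be the values you entered in the interface asset's "Face detection server host" and "Face detection server port" properties. It only confirms that a TCP connection can be opened, not that the server is sending face data.

```python
import socket


def is_server_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout, or bad host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run it from the PC hosting the Intuiface experience. A False result usually points to a firewall rule, a wrong address, or the two PCs not being on the same local network.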
More Complex Scenarios: Multiple cameras and/or experiences
The OpenVINO + Intuiface architecture enables you to address more complex scenarios such as:
- N cameras to 1 experience: Merge inputs from multiple cameras into one experience, such as logging traffic information in a public area or using multiple camera angles to detect user activity
- 1 camera to N experiences: Multiple kiosk / signage screens reacting to the same camera feed, such as a camera positioned at the ceiling of a room and tracking multiple people.
- N cameras to M experiences: Any combination of the above
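For the "N cameras to 1 experience" case, one concern is that each server assigns face IDs independently, so IDs from different cameras may collide. The following sketch shows one possible way to merge per-camera face lists. The function, the camera labels, and the dictionary layout are all hypothetical and do not describe the data format actually sent by the Face Detection Server.

```python
def merge_camera_faces(faces_by_camera):
    """Merge per-camera face lists into one list with unique IDs.

    faces_by_camera maps a camera label (chosen by you) to a list of
    face dicts that each carry an "id" key.
    """
    merged = []
    for camera, faces in faces_by_camera.items():
        for face in faces:
            # Prefix the ID with the camera label so that IDs coming from
            # different servers cannot collide.
            merged.append({**face, "id": f"{camera}:{face['id']}", "camera": camera})
    return merged
```

Note that a person visible to two cameras at once would be counted twice with this approach, since nothing here matches faces across cameras.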
Face Detection Server: Usage Guide
The Face Detection Server is a modified version of the one used in this Intel OpenVINO demo. We recommend you read that demo's documentation as it explains in detail the deep learning pre-trained models used.
Here is a summary:
- face-detection-adas-0001, the primary detection framework for finding faces
- age-gender-recognition-retail-0013, executed on top of the results of the face detection model, reporting estimated age and gender for each detected face, limited to people between the ages of 18 and 75
- head-pose-estimation-adas-0001, executed on top of the results of the face detection model, reporting estimated head pose in Tait-Bryan angles
- emotions-recognition-retail-0003, executed on top of the results of the face detection model, reporting an emotion for each detected face
Intuiface decided not to use the facial landmarks model as there is no clear way to use them efficiently within an experience.
Command line options and configuration file
The Face Detection Server accepts many options as command line arguments. If your PC doesn't have an embedded Intel GPU, you should remove the -d GPU option.
NOTE: Running the OpenVINOFaceDetectionServer.exe -h command will provide you with an exhaustive list of options.
For Intuiface use, the required and/or important options are:
- -i cam: Video feed input. "cam" means the first detected camera will be used, as opposed to a video file path (used for test purposes).
- -m .\models\face-detection-adas-0001.xml: Required. Path to an .xml file with a trained Face Detection model.
- -m_ag .\models\age-gender-recognition-retail-0013.xml: Optional. Path to an .xml file with a trained Age/Gender Recognition model.
- -m_em .\models\emotions-recognition-retail-0003.xml: Optional. Path to an .xml file with a trained Emotions Recognition model.
- -m_hp .\models\head-pose-estimation-adas-0001.xml: Optional. Path to an .xml file with a trained Head Pose Estimation model.
- -d GPU: Optional. Target device for Face Detection network (the list of available devices is shown below). Default value is CPU.
- -no_show: Optional. If this parameter is added, the OpenCV visualization window is not displayed. This improves performance.
If you do not need the age/gender, emotions, or head-pose information for your specific experience, you can remove the corresponding option(s) in the command line. This will lighten the CPU usage of the server.
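As an illustration of how the options combine, here is a minimal Python sketch that assembles the argument list for the server. The function and its parameter names are hypothetical. The no_show flag is written as -no_show to match the "no_show" key of the configuration file; confirm the exact spelling with the -h option before relying on it.

```python
def build_server_command(device="CPU", age_gender=True, emotions=True,
                         head_pose=True, show_window=False):
    """Build the argument list for OpenVINOFaceDetectionServer.exe."""
    cmd = [
        "OpenVINOFaceDetectionServer.exe",
        "-i", "cam",  # use the first detected camera
        "-m", r".\models\face-detection-adas-0001.xml",  # required model
    ]
    # Each optional model is only added when its data is needed,
    # which lightens the CPU usage of the server.
    if age_gender:
        cmd += ["-m_ag", r".\models\age-gender-recognition-retail-0013.xml"]
    if emotions:
        cmd += ["-m_em", r".\models\emotions-recognition-retail-0003.xml"]
    if head_pose:
        cmd += ["-m_hp", r".\models\head-pose-estimation-adas-0001.xml"]
    cmd += ["-d", device]
    if not show_window:
        # Assumed flag spelling, see the note above.
        cmd.append("-no_show")
    return cmd
```

For example, build_server_command(device="GPU", emotions=False) produces a command that skips the emotions model. The resulting list could be passed to subprocess.run from the folder containing the executable.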
To avoid going through a command line, the executable also tries to read a configuration file named "OpenVINOFaceDetectionServer.config", which is in JSON format:
{
"m": ".\\models\\face-detection-adas-0001.xml",
"m_ag": ".\\models\\age-gender-recognition-retail-0013.xml",
"m_em": ".\\models\\emotions-recognition-retail-0003.xml",
"m_hp": ".\\models\\head-pose-estimation-adas-0001.xml",
"d": "CPU",
"no_show": "true"
}
The command-line parameters of the executable have priority over the configuration file.
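The priority rule can be pictured as a simple dictionary merge. The Python sketch below only illustrates the rule; it is not the server's actual implementation, and the function name is hypothetical.

```python
import json


def effective_options(config_text, cli_options):
    """Combine config-file options with command-line options.

    Values given on the command line override those from the file.
    """
    options = json.loads(config_text)
    options.update(cli_options)
    return options


# A shortened version of the configuration file shown above.
config_text = r'''
{
  "m": ".\\models\\face-detection-adas-0001.xml",
  "d": "CPU",
  "no_show": "true"
}
'''

# Passing "-d GPU" on the command line overrides "d": "CPU" from the file.
merged = effective_options(config_text, {"d": "GPU"})
```

Here merged["d"] ends up as "GPU", while "m" and "no_show" keep the values from the file.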
Face Detection with OpenVINO Interface Asset
After adding the interface asset to your experience, you need to call the Connect action to start receiving information in Play Mode. An easy way to do this is to add a Timer trigger at the experience level that calls the Connect action one second after the experience launches.
Explanation of property units
Most numeric properties of this Interface Asset are normalized values, meaning they are within the range of 0 to 1. This applies to the following properties:
- Face position: X, Y
- Face size: Width, Height
- Emotion confidence
Example:
A face with a width of 0.1 and a height of 0.15 means its width represents 10% of the picture frame, while its height represents 15% of the picture frame. Depending on the original camera image size in pixels, this face would have a size of:
- 64 x 72 pixels in a 640 x 480 webcam image size
- 192 x 162 pixels in a 1920 x 1080 webcam image size.
For the onscreen display of a visual representation of a detected face, multiply the relevant properties by the associated size of the camera image:
- X & Width should be multiplied by the camera image width.
- Y & Height should be multiplied by the camera image height.
All Face size-related properties are computed using the following formula: width x height x 10000, producing values from 0 to 10000.
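The conversions described in this section can be written out as follows. This is a minimal sketch in Python with hypothetical function names; in an Intuiface experience you would typically do the same arithmetic with bindings and converters.

```python
def to_pixels(x, y, width, height, image_width, image_height):
    """Convert normalized face values (0 to 1) to pixel values."""
    # X and Width scale with the image width, Y and Height with its height.
    return (round(x * image_width), round(y * image_height),
            round(width * image_width), round(height * image_height))


def face_size(width, height):
    """Face size as defined in this article: width x height x 10000."""
    return width * height * 10000
```

With the example above, a face of width 0.1 and height 0.15 in a 640 x 480 image gives a 64 x 72 pixel rectangle, and its face size is 0.1 x 0.15 x 10000 = 150.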
Properties
Read-Write properties
- Face detection server host: IP address of the Face Detection Server.
- Face detection server port: port of the Face Detection Server.
- Minimum face size: any face detected with a size below this minimum value will be discarded. Use this property to filter out "small faces", meaning faces far from the camera, in the background. Face size is defined by width x height x 10000. Default value is 100, corresponding to a face size of 0.1 x 0.1 (10% of the camera frame's width and height, or 1% of its area).
- Detection update frequency: used to reduce the framerate if needed. The default value is 100 ms, corresponding to 10 fps.
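To make the last two properties concrete, here is a small Python sketch of the filtering rule and of the relation between the update interval and the frame rate. The function names and the face dictionary layout are assumptions made for illustration only.

```python
def filter_small_faces(faces, minimum_face_size=100):
    """Keep faces whose size (width x height x 10000) meets the minimum.

    Each face is a hypothetical dict with normalized "width" and "height".
    """
    return [face for face in faces
            if face["width"] * face["height"] * 10000 >= minimum_face_size]


def updates_per_second(interval_ms):
    """Convert an update interval in milliseconds to updates per second."""
    return 1000 / interval_ms
```

A 0.2 x 0.2 face has a size of 400 and is kept with the default minimum, while a 0.05 x 0.05 face has a size of 25 and is discarded. An interval of 100 ms gives 10 updates per second.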
Read-Only properties
- Face count: number of faces currently detected.
- Main Face: face with the largest face size. Usually corresponds to the person closest to the camera, for persons with similar physical face sizes.
  - Contains child properties identical to the "All Faces" list found below.
- Is main face detected: true if at least one face is detected.
- All Faces: list of all faces currently detected.
  - ID: unique ID for each detected face.
  - X: X position of the face, between 0 and 1. Multiply this value by the scene width in pixels to position face feedback in your scene.
  - Y: Y position of the face, between 0 and 1. Multiply this value by the scene height in pixels to position face feedback in your scene.
  - Width: Width of the face, between 0 and 1. Multiply this value by the scene width in pixels to draw an accurately sized face in your scene.
  - Height: Height of the face, between 0 and 1. Multiply this value by the scene height in pixels to draw an accurately sized face in your scene.
  - Gender: "male" or "female" value.
  - Age: absolute value of age estimate.
  - Age range: defined based on the Age property.
    - Age below or equal to 16: "child"
    - Age between 17 and 30: "young adult"
    - Age between 31 and 45: "middle-aged adult"
    - Age 46 or above: "old-aged adult"
  - Dwell time: time elapsed in seconds since the face was first detected.
  - Face size: defined by Width x Height x 10000. A value of 100 corresponds to a face spanning 10% of the frame's width and 10% of its height (1% of the frame area). A value of 10000 represents the entire frame.
  - Main emotion: emotion with the highest confidence value, selected among "angry", "happy", "neutral", "sad", "surprise".
  - Main emotion confidence: confidence value for the main emotion, between 0 and 1.
  - Emotion confidence: values for each emotion, between 0 and 1.
    - Angry
    - Happy
    - Neutral
    - Sad
    - Surprised
  - Head pose: estimation of the head pose on 3 axes. Values are in degrees.
    - Pitch
    - Yaw
    - Roll
- Is connected to face detection server: true when the interface asset manages to connect to a server.
- Activity log: connection logs.
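Several of the read-only properties are derived from others. The Python sketch below shows one way these derivations could be computed from the definitions given in the list. It is not the interface asset's source code, and the function names and dictionary layouts are hypothetical.

```python
EMOTIONS = ("angry", "happy", "neutral", "sad", "surprise")


def age_range(age):
    """Map an age estimate to the age range labels listed above."""
    if age <= 16:
        return "child"
    if age <= 30:
        return "young adult"
    if age <= 45:
        return "middle-aged adult"
    # Assumption: any age above 45 falls in the last range.
    return "old-aged adult"


def main_emotion(confidences):
    """Return (emotion, confidence) for the highest confidence value."""
    emotion = max(EMOTIONS, key=lambda name: confidences[name])
    return emotion, confidences[emotion]


def main_face(faces):
    """Return the face with the largest width x height, or None if empty."""
    if not faces:
        return None
    return max(faces, key=lambda face: face["width"] * face["height"])
```

For example, an age estimate of 28 maps to "young adult", and a face whose highest confidence value is 0.7 for "happy" has "happy" as its main emotion.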
Triggers
- Face count changed: Raised when the face count has changed. It contains the following read-only property, available through binding:
  - Count
- Face detected: Raised when a new face is detected. It contains the following read-only properties, available through binding:
  - ID
  - Gender
  - Age
- Face lost: Raised when a previously detected face is lost. It contains the following read-only properties, available through binding:
  - ID
  - Gender
  - Age
  - Dwell time
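The relationship between the face list and the "Face detected" and "Face lost" triggers can be sketched as a comparison of face IDs between two consecutive updates. The class below is a hypothetical Python illustration of that idea, including dwell time, and is not taken from the interface asset's source code.

```python
class FaceTracker:
    """Tracks face IDs between updates and reports trigger-like events."""

    def __init__(self):
        # Maps each face ID to the time at which it was first detected.
        self.first_seen = {}

    def update(self, current_ids, now):
        """Return a list of (event, face_id, dwell_seconds) tuples."""
        events = []
        current = set(current_ids)
        known = set(self.first_seen)
        # IDs present now but not before: newly detected faces.
        for face_id in sorted(current - known):
            self.first_seen[face_id] = now
            events.append(("face detected", face_id, 0.0))
        # IDs present before but not now: lost faces, with their dwell time.
        for face_id in sorted(known - current):
            dwell = now - self.first_seen.pop(face_id)
            events.append(("face lost", face_id, dwell))
        return events
```

Calling update([1, 2], now=0.0) and then update([2, 3], now=5.0) reports face 3 as detected and face 1 as lost with a dwell time of 5 seconds.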
Actions
- Connect to face detection server
- Disconnect from face detection server
FAQ / Troubleshooting
Below are some frequently asked questions / issues you may encounter. If you have any other issues, please contact our support team.
- Can I display the camera feed in the experience?
  - The camera feed is read by the Face Detection Server, and therefore is not available to the Webcam Asset running in Intuiface. You may want to use third-party software to split the webcam feed across two virtual cameras, but this hasn't been tested. Video framerate could also drop with such a solution.
- Can I use other deep-learning models?
  - The command-line arguments for the Face Detection Server enable you to use other pre-trained models, although they would need to use the same model structure (input / output) for the Face Detection Server to work properly.
  - To use models with a different structure, you would have to update the server source code accordingly.
- Can I recognize things other than faces, such as objects?
  - OpenVINO can be used with a range of models, but the Server source code will need to be updated accordingly to support your models. See OpenVINO documentation for more information.
Limitations
- The face detection server will stop working if the computer has entered or returned from sleep mode.