
Introduction to SVM
Face recognition is a frequently discussed topic in computer science that has gained significant attention thanks to the exponential growth of computing power. It has widespread applications in fields such as robotics, biosecurity, and the automotive industry. At its core, it involves applying mathematical algorithms to an input image to extract features that indicate whether a face is present. The Histogram of Oriented Gradients (HOG) is a traditional algorithm for extracting image features, such as gradient directions, and can be combined with a linear Support Vector Machine (SVM) to decide whether the input image contains a face.

We will use the image below as a reference and test image:

Image Processing
Convolution
The convolution of two functions is an important mathematical operation widely used in signal processing. In computer graphics and image processing, we typically work with discrete functions (such as images) and apply a discrete form of convolution to eliminate high-frequency noise, sharpen details, or detect edges.
Convolution is a mathematical operation on two signals f and g, defined as:
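$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$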

In the image domain, we can think of convolution as a relationship between a single pixel and its neighboring pixels. This relationship is primarily exploited to detect distinctive features such as color changes, brightness differences, and periodic pixel patterns.
The following diagram illustrates the convolution filter using a small 3×3 kernel. The filter is defined as a matrix in which the center element weights the center pixel and the remaining elements define the weights of the neighboring pixels. We can also say that the radius of a 3×3 kernel is 1, since only the “one-ring” neighborhood is considered during convolution. The behavior of convolution at the image boundary must also be defined, because there the kernel maps to values outside the image, which are undefined.

The convolution of a 3×3 window of the image with a 3×3 kernel can be defined as follows:
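$$O(x, y) = \sum_{i=-1}^{1} \sum_{j=-1}^{1} K(i, j)\, I(x + i, y + j)$$

where $I$ is the input image, $K$ the kernel, and $O$ the output (our notation).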
To convolve the entire image, a sliding-window technique can be applied: starting from the first pixel, each pixel and its 8 neighbors are grouped into a square window, the input pixels inside the window are convolved with the kernel, and the resulting value is written to the output image. This step is repeated until the end of the image, as sketched below.
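As a rough illustration of the sliding-window approach, here is a plain C++ sketch (not the streaming HLS version discussed later; the function name convolve3x3, the boundary policy, and the normalization parameter are our assumptions):

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of sliding-window 2D convolution with a 3x3 kernel.
// Border pixels are simply copied (one possible boundary policy).
std::vector<uint8_t> convolve3x3(const std::vector<uint8_t> &in,
                                 int width, int height,
                                 const int kernel[3][3], int norm) {
    std::vector<uint8_t> out(in);
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int acc = 0;
            // Accumulate the weighted 3x3 neighborhood.
            for (int i = -1; i <= 1; ++i)
                for (int j = -1; j <= 1; ++j)
                    acc += kernel[i + 1][j + 1] * in[(y + i) * width + (x + j)];
            acc /= norm;            // scale by the chosen normalization factor
            if (acc < 0)   acc = 0; // clamp to the 8-bit range
            if (acc > 255) acc = 255;
            out[y * width + x] = static_cast<uint8_t>(acc);
        }
    }
    return out;
}
```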

Sobel
Edge detection is the most common method for detecting discontinuities in grayscale images. An edge is defined as a set of connected pixels lying on the boundary between two regions.
If the input image is a color image, it should be converted to a grayscale image before applying the convolution operation.
Assuming each pixel is represented by a 32-bit unsigned integer, the code for converting RGB to grayscale is as follows:
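The original listing is not reproduced here; a minimal sketch, assuming the 32-bit pixel is packed as 0xAARRGGBB and using the standard BT.601 luma weights (both assumptions on our part), could look like this:

```cpp
#include <cstdint>

// Sketch of RGB-to-grayscale conversion for a pixel packed as 0xAARRGGBB
// (the packing and the BT.601 weights 0.299/0.587/0.114 are assumptions).
uint32_t rgb_to_gray(uint32_t pixel) {
    uint8_t r = (pixel >> 16) & 0xFF;
    uint8_t g = (pixel >> 8)  & 0xFF;
    uint8_t b =  pixel        & 0xFF;
    // Integer approximation of 0.299*R + 0.587*G + 0.114*B (weights sum to 256).
    uint8_t gray = (77 * r + 150 * g + 29 * b) >> 8;
    // Replicate the gray value on all three channels.
    return (gray << 16) | (gray << 8) | gray;
}
```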
After running, the test image will look like this:

The Sobel operator is one of the most commonly used operators in edge detection. It convolves the original image with two 3×3 kernels to compute approximations of the derivatives, one for horizontal changes and one for vertical changes. If we define A as the source image, and Gx and Gy as the two images containing the approximations of the horizontal and vertical derivatives at each point, the calculations are as follows:
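$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A
\qquad
G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * A$$

where $*$ denotes the 2D convolution operation.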

Using the previous convolution function, we can calculate the output image with the following code:
where the window is the 3×3 sliding window defined earlier and the kernels are the two Sobel kernels:
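The attached code has the full version; continuing the convolve3x3 sketch from above (the names and the choice of scaling are ours):

```cpp
// Standard Sobel kernels for horizontal (x) and vertical (y) derivatives.
const int sobel_x[3][3] = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };
const int sobel_y[3][3] = { { -1, -2, -1 }, { 0, 0, 0 }, { 1, 2, 1 } };

// Derivative images of the grayscale input; convolve3x3 clamps the
// result to [0, 255], so negative responses are dropped for display
// (taking the absolute value is another common choice).
std::vector<uint8_t> gx = convolve3x3(gray, width, height, sobel_x, 1);
std::vector<uint8_t> gy = convolve3x3(gray, width, height, sobel_y, 1);
```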
The image obtained after the convolution calculation is as follows:

As seen, the vertical and horizontal details are enhanced and easier to observe. Although this helps, we need a more distinctive feature image that represents only the edges.
The next step is to combine these two images into a single image that captures changes in both directions. We can do this by calculating, for each pixel, the magnitude (or intensity) of the gradient and the direction (or angle) that links the current pixel to the next pixel along the edge line.
At each point in the image, the two approximations can be combined to give the gradient magnitude:
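$$G = \sqrt{G_x^2 + G_y^2}$$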

and the gradient angle:
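$$\Theta = \operatorname{atan2}(G_y, G_x)$$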

The square-root and atan2 functions are already implemented in the hls namespace:
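A minimal sketch using the synthesizable math functions from hls_math.h (float overloads shown; fixed-point variants also exist, and the wrapper function here is our own):

```cpp
#include "hls_math.h"

// Combine the two derivative values of a pixel into gradient
// magnitude and angle using the HLS math library.
void magnitude_angle(float gx, float gy, float &mag, float &ang) {
    mag = hls::sqrt(gx * gx + gy * gy); // gradient magnitude
    ang = hls::atan2(gy, gx);           // gradient direction in radians
}
```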
The result is:


We have obtained a more focused image of the edges. Nevertheless, the detected edges can still end up several pixels wide in places. We need a technique called non-maximum suppression to thin these spurious responses:
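The attached code holds the actual implementation; a minimal sketch of the classic scheme, which quantizes the gradient angle to one of four directions and keeps a pixel only if it is the maximum along that direction (all names here are ours):

```cpp
#include <vector>

// Non-maximum suppression sketch: keep a pixel only if its magnitude
// is the largest along the (quantized) gradient direction.
std::vector<float> nms(const std::vector<float> &mag,
                       const std::vector<float> &ang,
                       int width, int height) {
    std::vector<float> out(mag.size(), 0.0f);
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int idx = y * width + x;
            // Quantize the angle to 0, 45, 90 or 135 degrees.
            float deg = ang[idx] * 180.0f / 3.14159265f;
            if (deg < 0) deg += 180.0f;
            int dx, dy;
            if (deg < 22.5f || deg >= 157.5f) { dx = 1;  dy = 0; } //   0 deg
            else if (deg < 67.5f)             { dx = 1;  dy = 1; } //  45 deg
            else if (deg < 112.5f)            { dx = 0;  dy = 1; } //  90 deg
            else                              { dx = -1; dy = 1; } // 135 deg
            float n1 = mag[(y + dy) * width + (x + dx)];
            float n2 = mag[(y - dy) * width + (x - dx)];
            // Keep only local maxima along the gradient direction.
            if (mag[idx] >= n1 && mag[idx] >= n2)
                out[idx] = mag[idx];
        }
    }
    return out;
}
```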
Now the edges are thinner and cleaner.

Implementation
As mentioned earlier, the input image is fed in pixel by pixel as a data stream. To apply the convolution operation, we need to assemble the incoming pixels into a 3×3 window. This can be implemented with an architecture based on two line buffers, each holding as many elements as the width of our input image:

Two auxiliary functions handle shifting the line buffers and the sliding window:
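The original helpers are in the attached code; a sketch of the usual HLS line-buffer pattern (the array names, the MAX_WIDTH bound, and the call order are our assumptions):

```cpp
#include <stdint.h>

#define MAX_WIDTH 640 // assumed maximum line width

// Two line buffers hold the previous two image rows; together with the
// incoming row they feed a 3x3 sliding window.
static uint8_t line_buf[2][MAX_WIDTH];
static uint8_t window[3][3];

// Shift a new pixel into the line buffers at column x.
void shift_line_buffer(uint8_t pixel, int x) {
    line_buf[1][x] = line_buf[0][x];
    line_buf[0][x] = pixel;
}

// Slide the 3x3 window one column to the right, filling the new
// right-hand column from the line buffers and the current pixel.
// Call this BEFORE shift_line_buffer so the window sees the two
// previous rows at column x.
void shift_window(uint8_t pixel, int x) {
    for (int row = 0; row < 3; ++row) {
        window[row][0] = window[row][1];
        window[row][1] = window[row][2];
    }
    window[0][2] = line_buf[1][x];
    window[1][2] = line_buf[0][x];
    window[2][2] = pixel;
}
```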
Finally, we can package the entire process into an HLS function (see attached code).
Once the code is written, it also needs to be tested. GIMP (https://www.oschina.net/p/gimp?hmsr=aladdin1e1) has a very handy feature that exports images directly as C header files. Assuming we export the test image to a file named image.h, we can use the following code to exercise the function under test (see code at the end).
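GIMP's "C source" export produces a struct containing width, height, bytes_per_pixel, and a pixel_data array; a hedged testbench sketch built on that follows (the struct name gimp_image depends on the GIMP version, and the top-level function sobel_filter is a hypothetical stand-in for the real IP function):

```cpp
#include <cstdint>
#include <fstream>
#include <vector>
#include "image.h" // exported by GIMP; defines the gimp_image struct (assumed name)

// Hypothetical signature of the HLS top-level function under test.
void sobel_filter(const uint8_t *in, uint8_t *out, int width, int height);

int main() {
    const int w = gimp_image.width;
    const int h = gimp_image.height;
    std::vector<uint8_t> gray(w * h), out(w * h);

    // Convert the exported RGB pixels to grayscale (BT.601 weights).
    for (int i = 0; i < w * h; ++i) {
        const unsigned char *p =
            &gimp_image.pixel_data[i * gimp_image.bytes_per_pixel];
        gray[i] = (77 * p[0] + 150 * p[1] + 29 * p[2]) >> 8;
    }

    sobel_filter(gray.data(), out.data(), w, h);

    // Dump the result as a binary PGM image for visual inspection.
    std::ofstream f("out.pgm", std::ios::binary);
    f << "P5\n" << w << " " << h << "\n255\n";
    f.write(reinterpret_cast<const char *>(out.data()), out.size());
    return 0;
}
```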
Another way to validate the HLS IP is to verify it directly on the FPGA.
The first step is to create a block design and add the synthesized Sobel IP to the IP repository:

Add the implemented IP, with one DMA feeding it input data and another reading back the output:



After generating the bitstream, we can validate the functionality.
The generated image should be similar to the simulated image.
Now we need to implement an architecture that takes its input directly from the camera.
The first components are the Zynq processing system and the I2C controller used to configure the camera interface:



For the image path, a MIPI controller and a Demosaic IP are needed to convert the raw sensor stream to RGB24:



Finally, we add our image processing IP and VDMA:


HOG
Follow-up articles will cover this separately; stay tuned~
SVM – Support Vector Machine
In machine learning, Support Vector Machines (SVMs, also known as Support Vector Networks) are supervised learning models with associated learning algorithms used to analyze data for classification and regression. Given a set of training samples, each labeled as belonging to one of two categories, the SVM training algorithm builds a model that assigns new samples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling can be used to apply SVMs in a probabilistic classification setting). An SVM model represents the examples as points in space, mapped so that the examples of the two categories are separated by a gap that is as wide as possible. New examples are then mapped into the same space and predicted to belong to a category depending on which side of the gap they fall.
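For the linear case used here, the trained model reduces to a weight vector $\mathbf{w}$ and a bias $b$, and classifying a feature vector $\mathbf{x}$ is a simple sign test:

$$f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b)$$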
Code
https://github.com/cuciureansergiu/kv260_svm