Building a Simple Cat-Dog Classifier with NVIDIA Jetson Nano

This article uses the Jetson Nano to build a very simple cat-dog classifier. We use PyTorch’s ImageFolder to create the dataset, DataLoader to load it in batches, and a self-built CNN for training. Finally, we run predictions on a few test images.

Jetson Nano Remote Setup

Today we will work on the Jetson Nano development board. We have already published many introductions and applications for the Jetson Nano, so we won’t repeat them here; see https://blog.cavedu.com/tag/jetson-nano/ for reference. For the remote connection, we use Wireless Network Watcher to find the Nano’s IP address: go to Options > Advanced Options > Use the following network adapter, select Ethernet, click OK, and scan to find the Jetson Nano’s IP address.

Next, you can use MobaXterm or Jupyter Notebook for remote access. Both provide a file browser that supports direct file transfer, which is very useful! For MobaXterm, download and install it, click Session, and enter the IP address, user account, and password. For Jupyter Notebook, the Jetson Nano has already set up remote access for us, so you only need to open a browser on the PC, go to http://{your Jetson Nano’s IP address}:8888/, and enter the password to log in. This example uses Jupyter Notebook.

Below is the interface used for MobaXterm:

Below is the interface used for Jupyter Notebook. It is recommended to first create a folder as a workspace. After creating it, you can start a Terminal and Jupyter Notebook; the Terminal is used to install packages, while Jupyter is used to write and execute programs:

Kaggle Introduction

The dataset used today comes from Kaggle, a data modeling and analytics competition platform that hosts machine learning and deep learning competitions every year. Many aspiring machine learning engineers practice on its datasets again and again to improve their results.

Dataset download link: https://www.kaggle.com/c/dogs-vs-cats/overview/description

There are two methods for downloading Kaggle datasets: the first is direct download, and the second is to use the Kaggle API for downloading. Today, I will also teach you how to download using the API.

Using API to Download Kaggle Dataset

Step 1: Install Kaggle API

You can execute in Jupyter Notebook:

!pip3 install -U -q kaggle

It can also be executed in the Terminal:

pip3 install -U -q kaggle

The only difference is that in Jupyter you need to add an exclamation mark, which tells the notebook to run the line as a shell command instead of as Python code!

Step 2: Obtain the Authentication JSON File

Click on your profile picture on the upper right > My Account > Scroll down to API section and click Create New API Token to obtain the authentication JSON file.

Note the storage location!

After downloading, you will be reminded to place it in the ~/.kaggle location of the device you are using. Be particularly careful when adding it to Jetson Nano!

Step 3: Add the Authentication File to Jetson Nano

Enter the following program code, replacing {usr} and {API key} with your own Kaggle username and API key. After execution, the Kaggle authentication file will be on the Jetson Nano, and you can proceed to download directly.
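A minimal sketch of this step, assuming the token file is named kaggle.json and holds the two fields "username" and "key", as in the file Kaggle generates. The helper name write_kaggle_token and its home_dir parameter are hypothetical and only serve this illustration:

```python
import json
import os


def write_kaggle_token(username, key, home_dir=None):
    """Write ~/.kaggle/kaggle.json so the Kaggle API can authenticate."""
    home_dir = home_dir or os.path.expanduser("~")
    kaggle_dir = os.path.join(home_dir, ".kaggle")
    os.makedirs(kaggle_dir, exist_ok=True)

    token_path = os.path.join(kaggle_dir, "kaggle.json")
    with open(token_path, "w") as f:
        json.dump({"username": username, "key": key}, f)

    # Restrict the file to the current user; the key is a secret.
    os.chmod(token_path, 0o600)
    return token_path


# Usage (replace the placeholders with your own values):
# write_kaggle_token("{usr}", "{API key}")
```

Restricting the file permissions keeps other users on the device from reading the key.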

Step 4: Download Dataset to a Specific Folder

The download command can be found in the Data section of the Kaggle dataset webpage; the -p option specifies the directory to download into:
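As a rough illustration, the command can be assembled in Python before running it. The competition name dogs-vs-cats is taken from the dataset URL, while the destination ./data and the helper name kaggle_download_command are assumptions made for this sketch:

```python
def kaggle_download_command(competition, dest_dir):
    """Build the Kaggle CLI call as an argument list."""
    return ["kaggle", "competitions", "download", "-c", competition, "-p", dest_dir]


cmd = kaggle_download_command("dogs-vs-cats", "./data")
print(" ".join(cmd))  # kaggle competitions download -c dogs-vs-cats -p ./data

# To actually download (needs network access and the token from Step 3):
# import subprocess
# subprocess.run(cmd, check=True)
```

In a Jupyter cell the same command can be run directly by prefixing it with an exclamation mark.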

Step 5: Confirm Data and Unzip
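One possible way to unzip from Python is sketched here. It assumes the downloaded archive may itself contain further .zip files (for example one per data split), so it extracts those as well; the helper name unzip_all and the paths in the usage comment are hypothetical:

```python
import os
import zipfile


def unzip_all(archive_path, dest_dir):
    """Extract an archive, then extract any .zip files found at the top of dest_dir."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)

    for name in sorted(os.listdir(dest_dir)):
        inner = os.path.join(dest_dir, name)
        # Skip the outer archive itself if it lives in dest_dir.
        if os.path.abspath(inner) == os.path.abspath(archive_path):
            continue
        if name.endswith(".zip"):
            with zipfile.ZipFile(inner) as zf:
                zf.extractall(dest_dir)

    # Return the directory listing so the result can be confirmed.
    return sorted(os.listdir(dest_dir))


# Usage (paths are assumptions):
# print(unzip_all("./data/dogs-vs-cats.zip", "./data"))
```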

Data Processing

The provided data consists of two folders: train and test1. The train folder contains a total of 25,000 samples. We will display the first nine photos to take a look.

Note that the files are named {label}.{id}.jpg. Our goal is to sort the cat and dog images into separate folders, so I will first create one folder for cats and one for dogs, and then move each file according to its name.

First, import the library and declare the basic directory address:

Confirm whether the directory exists; if not, create one:

Next is the main sorting program code:
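The three steps above can be sketched as one small function. This is not the original listing: the folder names cat and dog, the helper name sort_by_label, and the path in the usage comment are assumptions chosen to match the {label}.{id}.jpg naming:

```python
import os
import shutil


def sort_by_label(train_dir, labels=("cat", "dog")):
    """Move files named {label}.{id}.jpg into train_dir/{label}/."""
    counts = {}
    for label in labels:
        # Create the class folder only if it does not exist yet.
        os.makedirs(os.path.join(train_dir, label), exist_ok=True)
        counts[label] = 0

    for name in sorted(os.listdir(train_dir)):
        src = os.path.join(train_dir, name)
        if not os.path.isfile(src):
            continue  # skip the class folders themselves
        label = name.split(".")[0]
        if label in counts:
            shutil.move(src, os.path.join(train_dir, label, name))
            counts[label] += 1

    # Number of files moved per label during this call.
    return counts


# Usage (path is an assumption):
# print(sort_by_label("./data/train"))
```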

After sorting, you can see that cats and dogs have been placed in their respective folders, each containing 12,500 photos:

Creating a Dataset with Torch

In PyTorch, it is often necessary to define your own dataset, because sometimes your data is “one photo with one label” and sometimes “one photo with multiple labels”. Therefore, we will discuss how to create your own dataset.

The Relationship Between Dataset and DataLoader

PyTorch wraps data in torch.utils.data.Dataset, which defines how a single sample is retrieved. As mentioned above, a sample can be one photo with one label or something with more complex labels. You can also apply data augmentation, such as deformation and cropping, while building the dataset to increase the variety of the data and the robustness of the neural network model. Once defined, the Dataset is passed to torch.utils.data.DataLoader, which serves the data in batches; there you choose how many photos are output at once for parallel computation. I commonly define datasets in one of two ways: if the files are already sorted into one folder per class, ImageFolder can create the dataset directly; if you need more information per sample, you write a custom Dataset.

Building Dataset with ImageFolder

This is the most common dataset organization method, provided by torchvision. We have already placed cats and dogs into their respective folders, so we will directly use this straightforward method.

The program code is as follows. While building the dataset, we will first define the transform, which is usually used for data processing and augmentation. It can perform cropping, deformation, file conversion, etc., and is a very important step:

When we take out the dataset, we can see that cats and dogs have already been categorized into 0 and 1:

Next, we notice a problem. I checked the sizes of the first five images and found that every image has a different size. We must resolve this size mismatch; otherwise, the images cannot be batched and the convolutional neural network cannot run.

Usually, the simplest and most straightforward fix is to add a resize step to the transform. There are better solutions, but we will not discuss them here. We resize all images to 224 × 224, convert them to tensors, and then normalize them.

Using DataLoader to Batch Output Data

Here, to prevent running for too long, I limited the loop to 10 iterations. You can see that each output consists of 16 images (dimensions [16, 3, 224, 224]), and from the labels, you can see that they are all shuffled.
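The batching behavior can be reproduced with the sketch below. To keep it self-contained, it uses a small synthetic TensorDataset as a stand-in; in the article's setting you would pass the ImageFolder dataset to DataLoader instead:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 32 blank "images" with alternating labels 0 and 1.
images = torch.zeros(32, 3, 224, 224)
labels = torch.tensor([0, 1] * 16)
dataset = TensorDataset(images, labels)

# batch_size=16 yields 16 images at a time; shuffle=True randomizes the order.
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for step, (batch_images, batch_labels) in enumerate(loader):
    if step >= 10:  # stop early so the loop does not run too long
        break
    print(step, batch_images.shape, batch_labels.tolist())
```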

Building a Convolutional Neural Network

The concept of convolutional neural networks will not be elaborated here. The main process is Convolution (Conv) > Pooling (MaxPool) > Flatten (view) > Fully Connected (Linear) > Output (pay attention to dimensions). Here, I added softmax to the last layer so that the two outputs sum to 1, which makes them easier to read. It is also essential to calculate how large the image is after each convolution and pooling layer, because the input dimension of the first fully connected layer must be declared before the data is flattened. For an input of width W, kernel size K, padding P, and stride S, the output width is floor((W + 2P - K) / S) + 1; the same formula applies to the height.
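The size arithmetic can be checked with a small helper. This is a sketch under assumptions not stated in the article: three blocks of a 3 × 3 convolution (stride 1, padding 1) followed by 2 × 2 max pooling (stride 2), which is one configuration consistent with a 224 × 224 input and a 28 × 28 final feature map. The name conv_output_size is hypothetical:

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output width (or height) of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
for block in range(1, 4):
    size = conv_output_size(size, kernel=3, stride=1, padding=1)  # conv keeps the size
    size = conv_output_size(size, kernel=2, stride=2)             # pooling halves it
    print("after block", block, "->", size)

# With 128 kernels in the last conv layer, the flattened length is:
print(128 * size * size)  # 100352
```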

Thus, the input to the first fully connected layer is 128 × 28 × 28, where 128 is the number of kernels in the last conv layer and 28 × 28 is the feature-map size you calculated yourself.

You can print the neural network or import a batch of data to see it.
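A network with this overall shape could be written as follows. Only the 128 × 28 × 28 flatten size, the two outputs, and the final softmax come from the article; the channel counts 32 and 64, the hidden size 64, the ReLU activations, and the class name SimpleCNN are assumptions. Note that nn.CrossEntropyLoss already applies log-softmax internally, so the explicit softmax is kept here only to match the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(128 * 28 * 28, 64)  # flatten size from the text
        self.fc2 = nn.Linear(64, 2)              # two classes: cat, dog

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 224 -> 112
        x = self.pool(F.relu(self.conv2(x)))  # 112 -> 56
        x = self.pool(F.relu(self.conv3(x)))  # 56 -> 28
        x = x.view(x.size(0), -1)             # flatten to [batch, 128*28*28]
        x = F.relu(self.fc1(x))
        return F.softmax(self.fc2(x), dim=1)  # each row sums to 1


model = SimpleCNN()
print(model)

# Feed a dummy batch of two blank images to check the output dimensions.
dummy = torch.zeros(2, 3, 224, 224)
out = model(dummy)
print(out.shape)  # torch.Size([2, 2])
```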

The final output is two-dimensional because there are two categories: cat and dog. If the value at position 0 is larger, it indicates the neural network judges it as a cat; conversely, if the value at position 1 is larger, it indicates it is judged as a dog:

Starting Training

First, set the basic configuration. Note that I train the model on the GPU. Since this is a classification problem, I use CrossEntropy as the loss function and Adam as the optimizer.

Before training, remember to move the tensors to the GPU as well. The rest of the training process is similar to our previous tutorials. I store the loss of every iteration so that it can be visualized later. Many examples online call model.train() and model.eval(). These switch layers such as BatchNorm and Dropout between their training and evaluation behavior: in evaluation mode, Dropout is disabled and BatchNorm uses its stored running statistics. Since our self-built neural network has neither layer, the calls are not strictly necessary. However, as we gradually add techniques, BatchNorm will certainly be included, so it is a good habit to write these two lines now: set the model to training mode before training starts and to evaluation mode at the end.
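The configuration and training loop described above can be sketched as follows. To stay small and runnable, the sketch trains a tiny stand-in model on random data; the learning rate of 1e-3, the function name train, and the stand-in model and data are assumptions, not the article's settings:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, epochs, device, lr=1e-3):
    """Train with CrossEntropy + Adam and return the loss of every iteration."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []

    model.train()  # training mode before training starts
    for epoch in range(epochs):
        for images, labels in loader:
            # Tensors must live on the same device as the model.
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            losses.append(loss.item())  # keep for later visualization
    model.eval()   # evaluation mode at the end
    return losses


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tiny stand-ins for the real dataset and CNN.
data = TensorDataset(torch.randn(32, 3, 8, 8), torch.randint(0, 2, (32,)))
loader = DataLoader(data, batch_size=16, shuffle=True)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))

losses = train(model, loader, epochs=2, device=device)
print(len(losses))  # 4 (2 epochs x 2 batches)
```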

Results after training: on a desktop computer with a GTX 1080, training took about 545 seconds, while on the Jetson Nano each epoch took about 1,300 to 1,350 seconds. Completing 5 epochs took about 6,500 seconds, which is nearly two hours. Although the performance gap seems significant, considering its price, size, and computing power, the Jetson Nano is actually quite impressive!

I have previously stored the loss, so I can directly call it and visualize it:

Testing Data

The test set is organized differently: the images in test1 have no class subfolders. To use ImageFolder, I created a test folder inside test1, moved the images into it, and then wrapped the result with ImageFolder.

Any changes made to the images during the training phase must also be applied during testing; otherwise, the predictions will be inaccurate:

Next, we can proceed with the prediction. Here, we take only the first batch of 16 images and feed it into the neural network model to obtain an output of shape [16, 2], which holds the two class scores for each of the 16 images. Taking the index of the larger value in each row tells us whether the image is judged to be a cat (0) or a dog (1).

The program code for the prediction is as follows:
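A minimal sketch of the prediction step, assuming the model returns one row of two scores per image and that index 0 means cat and index 1 means dog, as described above. The helper name predict_labels and the names in the usage comment are hypothetical:

```python
import torch

CLASS_NAMES = ["cat", "dog"]  # index 0 = cat, index 1 = dog


def predict_labels(model, images, device):
    """Return the predicted class name for every image in the batch."""
    model.to(device)
    model.eval()                 # evaluation mode for inference
    with torch.no_grad():        # no gradients are needed for prediction
        outputs = model(images.to(device))       # shape [batch, 2]
    indices = outputs.argmax(dim=1).cpu().tolist()  # index of the larger score
    return [CLASS_NAMES[i] for i in indices]


# Usage (test_loader and model come from the earlier steps):
# images, _ = next(iter(test_loader))   # first batch of 16 images
# print(predict_labels(model, images, device))
```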

In the final results, we can see that out of the 16 images predicted, only 2 were incorrect.
