
Google Coral USB Accelerator performance with Raspberry Pi 3B, 3A+, 4B


The Google Coral USB Accelerator is a device that can be attached to a computer to speed up the inferencing process in machine learning projects. It acts as a coprocessor and provides hardware acceleration for neural networks, making inferencing roughly 10 times faster.

This experiment measures the performance of four models of Raspberry Pi (Pi 4 4GB & 8GB, Pi 3B, Pi 3A+), with and without the Coral USB Accelerator. The same set of Python scripts (the test code) is used to perform image classification with a machine learning model (MobileNet V1) on all the boards. This is achieved by moving the same micro SD card between the different variants. The setup is shown in the picture below.

[Image: test setup - Coral USB Accelerator with Raspberry Pi]

Before we can plug the Coral USB Accelerator into the Raspberry Pi, we need to install certain dependencies. These are explained very clearly on the official Coral website: https://coral.ai/docs/accelerator/get-started.

Just follow the instructions on that page and you will be ready to use the Coral hardware with the Raspberry Pi.
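Once the dependencies are installed, a quick sanity check from Python confirms that the Edge TPU runtime can actually be loaded. This snippet is my own addition, not part of the test scripts:

from tflite_runtime.interpreter import load_delegate

# try to load the Edge TPU delegate; load_delegate raises ValueError
# if the runtime library or the device is not available
try:
    delegate = load_delegate('libedgetpu.so.1.0')
    print("Edge TPU delegate loaded successfully")
except ValueError as err:
    print("Edge TPU runtime not found:", err)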

 

Testing Coral USB Accelerator

 

The test scripts used in this experiment can be downloaded from this GitHub link.

There are two Python scripts, two model files and a label file in the folder. The two Python scripts are:

=> classify.py: It works with the model file 'mobilenet_v1_1.0_224_quant.tflite' and does not use the Coral USB Accelerator.

=> classify_coral.py: It works with the model file 'mobilenet_v1_1.0_224_quant_edgetpu.tflite' and makes use of the Coral USB Accelerator. This file is identical to 'classify.py' except for minor modifications incorporated to make it work with the Coral USB Accelerator. The modifications are as follows:

1. Import load_delegate from tflite_runtime.interpreter

2. Change the path of the model file to point to the 'edgetpu' model file

3. Instantiate the interpreter with the 'load_delegate' function.

The details are covered in the Code Walkthrough section. The basic tasks performed by both scripts are shown below.

[Flowchart: camera capture → inference → preview, with inference delegated to the Coral USB Accelerator]

'Camera capture' and 'Preview' involve getting a picture frame from the camera and displaying it in an output window with suitable annotations. There are multiple ways to do these tasks efficiently and minimise the processing time; one such method is to handle the camera-related tasks through OpenCV.
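For illustration, a minimal OpenCV capture-and-preview loop could look like the sketch below. This is only an alternative sketch; the test scripts themselves use the picamera library:

import cv2

cap = cv2.VideoCapture(0)                  # open the default camera
while True:
    ok, frame = cap.read()                 # grab one BGR frame as a numpy array
    if not ok:
        break
    cv2.imshow('Preview', frame)           # show the frame in an output window
    if cv2.waitKey(1) & 0xFF == ord('q'):  # quit on 'q'
        break
cap.release()
cv2.destroyAllWindows()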

'Inference' involves obtaining predictions from the model based on the input image. The time taken in this step depends on the model file being used and may vary from model to model, depending on its size and complexity. Without external hardware acceleration this task is performed by the CPU and consumes precious processor resources. To build applications that employ a machine learning model in a real-time use case, the inferencing time must be as low as possible to achieve the maximum FPS.
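To make this concrete, the achievable frame rate is roughly the reciprocal of the total per-frame time. With hypothetical timings it works out as follows:

# hypothetical per-frame timings in seconds
capture_s, inference_s, preview_s = 0.030, 0.015, 0.040
frame_time = capture_s + inference_s + preview_s   # 0.085 s per frame
print(round(1.0 / frame_time, 1), "FPS")           # -> 11.8 FPS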

The Python script 'classify_coral.py' delegates the inferencing part to the Coral USB Accelerator and brings down the processing time drastically. The timings observed on the various Raspberry Pi models are presented in the next section.

The Result Summary

The results obtained from running the test scripts are summarised here. While running the scripts, the time taken by the three tasks (camera capture, inference, preview) varies from frame to frame, so a snapshot of an average case is shown in the results.

Raspberry Pi 4B (4GB)

[Image: Raspberry Pi 4B board]

CPU: 64 bit quad-core @ 1.5 GHz 

RAM: 4 GB

[Screenshot: commands verifying the Raspberry Pi version]

 

Without Coral Accelerator (results of running 'classify.py')

[Screenshot: timing results without Coral Accelerator]

With Coral Accelerator (results of running 'classify_coral.py')

[Screenshot: timing results with Coral Accelerator]

 

Raspberry Pi 4B (8GB)

[Image: Raspberry Pi 4B board]

CPU: 64 bit quad-core @ 1.5 GHz 

RAM: 8 GB

[Screenshot: commands verifying the Raspberry Pi version]

 

Without Coral Accelerator (results of running 'classify.py')

[Screenshot: timing results without Coral Accelerator]

With Coral Accelerator (results of running 'classify_coral.py')

[Screenshot: timing results with Coral Accelerator]

 

Raspberry Pi 3B

[Image: Raspberry Pi 3B board]

CPU: 64 bit quad-core @ 1.2 GHz

RAM: 1 GB

[Screenshot: commands verifying the Raspberry Pi version]

 

Without Coral Accelerator (results of running 'classify.py')

[Screenshot: timing results without Coral Accelerator]

With Coral Accelerator (results of running 'classify_coral.py')

[Screenshot: timing results with Coral Accelerator]


Raspberry Pi 3A+

[Image: Raspberry Pi 3A+ board]

CPU: 64 bit quad-core @ 1.4 GHz 

RAM: 512 MB

[Screenshot: commands verifying the Raspberry Pi version]

 

Without Coral Accelerator (results of running 'classify.py')

[Screenshot: timing results without Coral Accelerator]

With Coral Accelerator (results of running 'classify_coral.py')

[Screenshot: timing results with Coral Accelerator]

In all the above cases, we can see a drastic reduction in inference time when the Coral hardware is invoked. However, 'camera capture' and 'preview' still take the same amount of time, because only the inferencing part is offloaded to the Coral hardware.

An overview of the results is provided in the graph below.

[Graph: performance comparison of Raspberry Pi models with and without the Coral USB Accelerator]

Code Walkthrough

 

Since both test scripts are nearly identical, I will cover 'classify_coral.py'. Notice the modifications which you need to incorporate in any script to make it compatible with the Coral hardware; they are pointed out below.

In the import section, load_delegate needs to be imported for the Coral hardware:

from tflite_runtime.interpreter import load_delegate
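The script loads the Linux delegate library by name. According to the Coral documentation the library name differs per platform, so a portable variant could be sketched like this:

import platform
from tflite_runtime.interpreter import load_delegate

# delegate library names per platform, as listed in the Coral docs
EDGETPU_LIB = {
    'Linux': 'libedgetpu.so.1',
    'Darwin': 'libedgetpu.1.dylib',
    'Windows': 'edgetpu.dll',
}[platform.system()]

delegate = load_delegate(EDGETPU_LIB)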

Before we get into the forever loop performing the three tasks shown in the flowchart above, we need to initialise the interpreter and load the model into it. We start by selecting the model file. Here we need to specify the model file that is compiled for the Edge TPU; the Coral hardware won't work with a model file that is not compiled for the Edge TPU.

#this path needs to be changed to 'mobilenet_v1_1.0_224_quant.tflite' if the coral hardware is not used
model_path = "mobilenet_v1_1.0_224_quant_edgetpu.tflite" 

#label file is common to both the models
label_path = "labels_mobilenet_quant_v1_224.txt"

Now we use this model file to instantiate the interpreter. Here the 'experimental_delegates' parameter indicates that we want to delegate the inferencing part to Coral hardware. 

interpreter = Interpreter(model_path=model_path,
                          experimental_delegates=[load_delegate('libedgetpu.so.1.0')])

# if not using the Coral hardware, the experimental_delegates parameter is not required
# and the above line becomes simply: interpreter = Interpreter(model_path=model_path)
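A defensive variant (my addition, not part of the original script) falls back to the CPU model when the Edge TPU is unavailable:

from tflite_runtime.interpreter import Interpreter, load_delegate

try:
    interpreter = Interpreter(model_path="mobilenet_v1_1.0_224_quant_edgetpu.tflite",
                              experimental_delegates=[load_delegate('libedgetpu.so.1.0')])
except ValueError:
    # load_delegate raises ValueError when the runtime or the device is missing
    print("Edge TPU unavailable, falling back to CPU inference")
    interpreter = Interpreter(model_path="mobilenet_v1_1.0_224_quant.tflite")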

All the modifications required for using the Coral hardware have been covered so far. Once the interpreter is created, we go on to allocate the tensors and use them for our purpose.

#allocate tensors
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

#read the label file 
with open(label_path, 'r') as f:
    labels = list(map(str.strip, f.readlines()))

# prediction threshold for triggering actions
threshold = 0.5

# the model returns one score for every class; top_k_results specifies
# how many of the highest-scoring results to keep
top_k_results = 2
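If you want to confirm what the model expects, the tensor details can be printed (my addition; for this quantized MobileNet V1 the input is a uint8 tensor of shape [1, 224, 224, 3]):

print("input shape:", input_details[0]['shape'])    # e.g. [1 224 224 3]
print("input dtype:", input_details[0]['dtype'])    # uint8 for this quantized model
print("output shape:", output_details[0]['shape'])  # one score per class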



The preview window is generated through Matplotlib. This window is updated with the current camera frame.

import numpy as np
import matplotlib.pyplot as plt

plt.ion()           # interactive mode, so the window can be updated without blocking
plt.tight_layout()

fig = plt.gcf()
fig.canvas.set_window_title('TensorFlow Lite')
fig.suptitle('Image Classification')
ax = plt.gca()
ax.set_axis_off()
tmp = np.zeros((480, 640, 3), np.uint8)   # blank placeholder frame
preview = ax.imshow(tmp)
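Initialising the window once with a blank frame and then updating it via set_data() in the preview step below avoids rebuilding the whole plot on every frame, which keeps the preview overhead low.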

After initialising the camera, the script loops continuously to perform the three tasks.

=> Camera capture

start_t1 = time.time()
stream = np.empty((480, 640, 3), dtype=np.uint8)

# capture the image directly as a numpy array
camera.capture(stream, 'rgb', use_video_port=True)
# scale_image() returns the image after trimming it to the size required by the model
img = scale_image(stream)

time_elapsed(start_t1, "camera capture")  # prints the time taken by the lines above
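The helpers scale_image() and time_elapsed() come from the test scripts; their exact implementations are in the repository, but they could look roughly like this sketch (my approximation, not the repository code):

import time

def scale_image(frame, size=224):
    # crop the frame to a centred square, then subsample every n-th pixel
    # down to the model's input resolution (a cheap alternative to resizing)
    h, w, _ = frame.shape
    side = min(h, w)
    y0, x0 = (h - side) // 2, (w - side) // 2
    square = frame[y0:y0 + side, x0:x0 + side]
    step = side // size
    return square[::step, ::step][:size, :size]

def time_elapsed(start, label):
    # print how long the given step took, in milliseconds
    print(f"{label}: {(time.time() - start) * 1000:.1f} ms")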
 

=> Inference

start_t2 = time.time()
# make the data compatible with the input tensor; this is done by adding a batch dimension
input_data = np.expand_dims(img, axis=0)
        
# feed data to input tensor 
interpreter.set_tensor(input_details[0]['index'], input_data)

#run the interpreter
interpreter.invoke()
        
# Obtain results from the interpreter; the result comprises labels and their associated probabilities.
predictions = interpreter.get_tensor(output_details[0]['index'])[0]
        
# Get indices of the top k results
top_k_indices = np.argsort(predictions)[::-1][:top_k_results]
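# note: np.argsort() sorts ascending, so [::-1] reverses the indices into
# descending order of score before slicing off the top k entries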

# get the result with the maximum score; the quantized model outputs uint8
# scores, so dividing by 255.0 converts them to probabilities in [0, 1]
pred_max = predictions[top_k_indices[0]] / 255.0
lbl_max = labels[top_k_indices[0]]
        
# take action based on the maximum prediction value: if the maximum score
# crosses the threshold, annotate the camera image; otherwise keep it blank
if pred_max >= threshold:
    percent = round(pred_max * 100)
    txt = " " + lbl_max + " (" + str(percent) + "%)"
    camera.annotate_text = txt
else:
    camera.annotate_text = "___"
                
time_elapsed(start_t2,"inference") #print the time taken in all the above steps

=> Preview

start_t3 = time.time()

# update the preview window with the latest camera frame (TkAgg backend)
preview.set_data(stream)
fig.canvas.get_tk_widget().update()

time_elapsed(start_t3, "preview")  # print the time taken in the above steps

 


Comments

  • from Alessandro Catorcini, 2 years ago

    Thank you for the experiment. There is one more parameter that would have been very interesting to measure: the overall CPU load. Running TensorFlow on the CPU is incredibly expensive (object detection on a single HD camera at 5 FPS uses about 73% CPU on a RPi 4B 4GB). A Coral USB is bound to reduce that dramatically, which also lowers the overall CPU temperature long term. It would be really interesting to see a part 2 where you evaluate that aspect as well. Thank you for the time you dedicated to this - it is really useful.

  • from Sudhanshu, 3 years ago

    Where can I get the Coral USB Accelerator in India?
