
Image Classification with TensorFlow Lite Explained


This Image Classification example runs on a Raspberry Pi without the need for any external accessory. You do not require a USB camera or PiCamera, as it makes use of images stored on the disk. The purpose of this example is to understand how the Image Classification method works with TensorFlow Lite. Most of the time we use sample code without understanding it, so I have tried to explain the important aspects of the code in finer detail.

The best way to spark your interest is to run the project code on your Raspberry Pi. You can download the code, sample images and ML models from this Github link. You need to install the tflite runtime interpreter to run the code; run the 'requirements.sh' file as per the instructions provided on the Github page.

This article will show you how tensors are used to organize and process information to give you the desired output, i.e. the name of the object present in the image. The code used in this example can also act as a building block for complex projects involving Image Classification. I will go through each line of code and describe its function. So let's begin by looking at the imports for this Python program.

The Imports

We start by importing the TensorFlow Lite interpreter. The Interpreter provides the necessary methods to interact with the ML model.

from tflite_runtime.interpreter import Interpreter

The 'numpy' library is used for array operations. It lets you perform a variety of array operations in a single line of code, and you are certain to encounter it in any program involving ML.

import numpy as np
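
As a quick aside (this snippet is only an illustration, not part of the project code), this is the kind of one-line array operation numpy makes possible:

arr = np.array([[1, 2], [3, 4]])
doubled = arr * 2 # element-wise multiplication of every element, no explicit loops needed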

Since we are dealing with images in our program, the Image module helps in modifying the input image to make it compatible with the ML model.

from PIL import Image

Time functions are used for measuring elapsed time.

import time

 

Using the Model files and Creating the Interpreter

The code folder you downloaded contains a folder named 'model_files'. This folder contains the Image Classification models and a labels.txt file (common for all models). We start by storing the paths of the model and label files in variables.

model_path = "model_files/mobilenet_v1_1.0_224_quant.tflite" 
label_path = "model_files/labels.txt"

The model file 'mobilenet_v1_1.0_224_quant.tflite' is a pre-trained Image Classification model which can classify up to 1000 different types of objects. The names of these objects are present in the 'labels.txt' file.

Now, let's create an interpreter object by simply passing the path of the model file, as shown in the code below. The methods provided by this object will allow us to interact with the model.

# Create interpreter for the specified model
interpreter = Interpreter(model_path=model_path)

The file 'labels.txt' contains a list of objects. The names of these objects are arranged in a specific order and should not be modified. We need to load the content of this file into an array so that we can dynamically access any name by specifying its index. The following code does this job.

#Read the label file and load all the values in an array
with open(label_path, 'r') as f:
    labels = list(map(str.strip, f.readlines()))

#print(labels)
print('\nPrinting value of label at index 126:', labels[126])

The print statement above prints the name of the object at index 126 in the file 'labels.txt', as shown below.

Printing value of label at index 126: hermit crab

I chose index 126 randomly, just to test whether the file was successfully loaded into the array. You can change this index to anything between 0 and 1000 to see the output and correlate it with the file content.

 

A closer look at the model

We have already created an interpreter by specifying the exact model we are going to use in this project. Let's use it to obtain the input and output details of this model, as shown in the code below.

# Obtain input and output details of the model.
print("n--------Input Details of Model-------------------n")
input_details = interpreter.get_input_details()
print(input_details)

print("n--------Output Details of Model-------------------n")
output_details = interpreter.get_output_details()
print(output_details)

The print(input_details) statement prints the input details of the model, as shown below.

[{'name': 'input', 'index': 88, 'shape': array([  1, 224, 224,   3]), 'shape_signature': array([  1, 224, 224,   3]), 'dtype': <class 'numpy.uint8'>, 'quantization': (0.0078125, 128), 'quantization_parameters': {'scales': array([0.0078125], dtype=float32), 'zero_points': array([128]), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]

By inspecting the input details, we can learn a lot about the model. Don't get overwhelmed by the information above; it's OK if you do not understand all of these parameters. Just notice the structure of the information. We can see that the information is arranged in an array whose only element is a Python dictionary. Therefore, in order to read the shape of the input, we need to write the following line.

# Obtain input size of image from input details of the model
input_shape = input_details[0]['shape']

This line means: read the element at index 0 of the array 'input_details' (which is a dictionary) and then read the value of the key 'shape' from that dictionary. Now we can see the content of 'input_shape' by printing it, and extract the size of the input image, using the following code.

print("shape of input: ",input_shape)
size = input_shape[1:3]
print("size of image: ", size) 

The output of the above two print statements is:

shape of input:  [  1 224 224   3]

size of image:  [224 224]

Here, the size of the image means its resolution. Basically, our model needs a 224x224 image as input. This input resolution can be different for different models. That's why we need to obtain this information dynamically and preprocess the image as per the input requirements of the model.

Also notice another parameter in the input details, 'index': 88. We will use it in a bit. Apart from these two parameters, you can ignore the rest.

Similarly, you can observe the output of the print(output_details) statement.
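
For instance, the shape and index can be read from the output details in exactly the same way as from the input details. A quick sketch (the values in the comments are the ones for this particular model, as discussed later in this article):

# Read shape and index from the output details, same pattern as the input details
output_shape = output_details[0]['shape'] # [1, 1001] for this model
output_index = output_details[0]['index'] # 87 for this model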

Input image preprocessing

There can be a variety of ways to capture and obtain an input image. For example, an image can be captured by a PiCamera or a USB camera, downloaded from the internet, taken with your phone, etc. Depending upon the source, these images can differ in resolution, number of channels and size. For simplicity, we are going to test the code with some sample images stored on the disk. These test images were downloaded from the internet and are present in the folder 'sample_pictures'. Notice that they all have different resolutions, so we need a mechanism in place to resize the input image to match the input requirements of the ML model.

We are going to read the image from the disk, convert it to RGB format, resize it to the required resolution and then convert it into an array. The following code does these tasks.

# Fetch image & preprocess it to match the input requirements of the model
file_path = "sample_pictures/1.jpg"
img = Image.open(file_path).convert('RGB') # read the image and convert it to RGB format
img = img.resize(size) # resize the image to 224x224
img = np.array(img) # convert the image into an array

#print(img)
print('value of pixel 145x223: ',img[145][223])

At this stage, our image is available as a two-dimensional array of 224x224 elements, where each element is a pixel with a value [R G B]. The print statement prints the value of the pixel at the randomly chosen position (145, 223). You can see the whole array by uncommenting the print(img) statement.

Now we need to add an extra dimension to the img array. Again, numpy comes in handy for achieving this.

processed_image = np.expand_dims(img, axis=0) # Add a batch dimension
print('value of pixel 145x223 in processed_image:',processed_image[0][145][223])

The variable 'processed_image' contains the final structure of the image required for performing inference. Notice that we now have to prepend an extra [0] before [145][223] to access the same element, due to the added dimension. The output of the above two print statements will be the same, as shown below.

value of pixel 145x223:  [218 206 190]

value of pixel 145x223 in processed_image: [218 206 190]
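
You can also verify the effect of the added dimension by printing the shape of the array. It should now match the model's input shape that we extracted earlier:

# The added batch dimension makes the array match the model's input shape [1 224 224 3]
print(processed_image.shape) # (1, 224, 224, 3)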

Performing Inference

Now we have come to the real meat of the code. The entire Image Classification process can be summarised in the next few lines. The TensorFlow Lite Interpreter has a method called set_tensor(), to which we pass the input data for performing inference. Similarly, there is another method called get_tensor(), which is used to obtain the results of inference. But in order to use these methods, we have to call the allocate_tensors() method first; that is how the TensorFlow Lite framework works. So let's do it and feed the 'processed_image' to the set_tensor() method. The set_tensor() method expects two arguments: one is the 'index' from the input details, which is 88 for this model (check the value of 'index' in the input details printed above), and the other is the image itself. You could replace the first argument with '88' and observe that the code still runs fine. However, the index value will change if you use any other Image Classification model, so we obtain it dynamically in order to keep the same code working for all the other models.

# Now allocate tensors so that we can use the set_tensor() method to feed the processed_image
interpreter.allocate_tensors()
#print(input_details[0]['index'])
interpreter.set_tensor(input_details[0]['index'], processed_image)

Inference is performed by a single line: interpreter.invoke(). This is what takes most of the time in the entire code. The time taken for inference depends upon the CPU specifications; a Raspberry Pi 4 will take less time than a Raspberry Pi 3 due to its better CPU.

t1=time.time()
interpreter.invoke()
t2=time.time()
time_taken=(t2-t1)*1000 #milliseconds
print("time taken for Inference: ",str(time_taken), "ms")

I ran this code on a Raspberry Pi 4 with 8 GB RAM. In my case, the time taken was printed as follows.

time taken for Inference:  133.0857276916504 ms

In those 133 ms, a lot of computation happened behind the scenes to generate the output. The result of our hard work can be collected using the get_tensor() method. This method expects only one argument, i.e. the 'index' from the output details. This value is '87' for the model being used; check it yourself by observing the 'output_details' of this model.

# Obtain results 
predictions = interpreter.get_tensor(output_details[0]['index'])[0]

Processing the Output

At this stage, the variable 'predictions' contains an array of scores, one for each class present in the model. The list of classes can be seen in the 'labels.txt' file associated with the model; here, classes means the names of the objects present in that file. Our model contains 1001 classes (as the array length printed below confirms). The following code simply prints the indices of the array 'predictions' with their corresponding scores, but only if the score is non-zero (otherwise all the indices would be printed, with most of them having a score of '0'). A score is a value between 0 and 255, where '0' means no resemblance and '255' means perfect resemblance. For example, if the input is a picture of a tiger, it will resemble a cat slightly and will result in a non-zero score for the class 'cat'. However, it will not resemble a chair at all, resulting in a zero score for the class 'chair'. So when an input image is fed for inference, it is matched against all the classes, and the scores for all of them are returned in the form of an array. In our code, the name of this array is 'predictions'.

print("length of array: ", len(predictions),"n")

for i in range(len(predictions)):
    if(predictions[i]>0):
        print("predictions["+str(i)+"]: ",predictions[i])

The output of the above code is as follows.

length of array:  1001 

 

predictions[282]:  24

predictions[283]:  65

predictions[285]:  1

predictions[286]:  150

predictions[288]:  4

predictions[770]:  1

predictions[877]:  1

predictions[905]:  5

predictions[969]:  1

You can observe that only 9 out of the 1001 classes have some degree of resemblance with the input picture, the maximum score being '150' for index '286'. You can look up this index manually in the 'labels.txt' file and discover that the name of the object is 'Egyptian cat'.
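
You can also do the same lookup with the 'labels' array we loaded at the beginning:

# Look up the label of the highest scoring index programmatically
print(labels[286]) # Egyptian cat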

So the only task left is to extract this information from the array 'predictions' programmatically. To do that, we can sort the array in descending order of scores and select the top few entries, let's say the top 5. With the help of the numpy library, we can achieve this in a single line. The following code does this task.

top_k = 5
top_k_indices = np.argsort(predictions)[::-1][0:top_k]
print("Sorted array of top indices:",top_k_indices)

At this stage, the array 'top_k_indices' holds the indices of the top 5 scores. The output of the print statement is shown below.

Sorted array of top indices: [286 283 282 905 288]

Let's print the scores and labels associated with these top indices using the following code. Since a score is a value between 0 and 255, we divide it by 255 to obtain the confidence as a fraction between 0 and 1.

for i in range(top_k):
    score=predictions[top_k_indices[i]]/255.0
    lbl=labels[top_k_indices[i]]
    print(lbl, "=", score)

The output is as follows

Egyptian cat = 0.5882352941176471

tiger cat = 0.2549019607843137

tabby = 0.09411764705882353

window screen = 0.0196078431372549

lynx = 0.01568627450980392

We can now print the score and label associated with the best match. This is the first element in the array 'top_k_indices'.

index_max_score=top_k_indices[0]
max_score=predictions[index_max_score]/255.0
max_label=labels[index_max_score]

print(max_label,": ",max_score)

That is how Image Classification works with TensorFlow Lite. I hope you gained some insight by reading this article.

What can you do next

Click photos of objects around your house with your camera, place them in the 'sample_pictures' folder and watch this script recognise them. You can also switch between the various Image Classification models to see how the results differ for the same picture. See all the intermediate results by enabling all the print statements in the script and compare the code with its output at every stage.
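
Switching models only needs a change to the model_path variable at the top of the script. The file name below is just a hypothetical placeholder, so use whichever .tflite files are actually present in your 'model_files' folder:

# Hypothetical example: point the script at a different model file
model_path = "model_files/some_other_model.tflite" # replace with an actual file name from the folder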

The job of TensorFlow is over once it has provided you the best match. It is up to you to use this output in your project to create something unique. For example, I made a speaking robot which speaks out the label of the object present in the picture using a text-to-speech converter (espeak).
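
As a minimal sketch of that idea (assuming the espeak command-line tool is installed on your Raspberry Pi, e.g. via 'sudo apt install espeak'):

import subprocess

# Speak out the best-match label using the espeak text-to-speech tool
subprocess.run(["espeak", max_label])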

 

