Aim
I wanted to learn the basics of image classification using a neural network.
What I did
The COVID-19 pandemic has infected people of many races and ages, and I fear to imagine what would happen if a disease like that emerged in crops. Crops are our main source of food, and diseases can easily wipe them out. With that inspiration, I created an AI service that predicts which disease a tomato plant has: Early Blight, Late Blight, Septoria, or Curl Virus. To do this, I used images of tomato plant leaves as the dataset for the Image Classification AI training program provided by the aiclub.world website. The dataset was made up of 5 categories (1 for each disease and 1 for healthy leaves) with 20 images per category, 100 images in total.
When I tested my AI service, I got a disappointing accuracy of 23.3%. That is only slightly better than the 20% you would expect from randomly picking 1 of the 5 categories. This happened mainly because my AI service was confusing a disease called Early Blight with the healthy leaves, since the two look very similar.
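To see which categories are being mixed up, it helps to compare the accuracy against the chance baseline (1 divided by the number of categories) and to count which true labels get predicted as which other labels. The sketch below does that in plain Python; the labels are made-up toy examples, not my real test results, and the function names are hypothetical.

```python
from collections import Counter

CLASSES = ["Early Blight", "Late Blight", "Septoria", "Curl Virus", "Healthy"]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs; pairs with t != p are mistakes."""
    return Counter(zip(y_true, y_pred))

# Chance baseline: guessing uniformly among 5 classes is right 1 time in 5.
chance = 1 / len(CLASSES)

# Toy, made-up labels for illustration only (not real results).
y_true = ["Early Blight", "Healthy", "Early Blight", "Healthy", "Septoria"]
y_pred = ["Healthy", "Early Blight", "Healthy", "Healthy", "Septoria"]

print(f"accuracy={accuracy(y_true, y_pred):.2f}, chance={chance:.2f}")
for (t, p), n in confusion_counts(y_true, y_pred).items():
    if t != p:
        print(f"{t} mistaken for {p}: {n}")
```

A pair that shows up often in the mistake list, like Early Blight and Healthy here, points to the categories the model cannot tell apart.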
I first tried changing my dataset seven times, but that didn't improve the accuracy much. After some tests, I found that Early Blight was the category being confused with the healthy leaves. I then used a process of elimination: I removed one category from the whole dataset and tested the resulting model, repeating this for each of the five categories.
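The elimination process above is a leave-one-category-out loop. Here is a minimal sketch, assuming a hypothetical `train_and_evaluate` function that retrains the model on the given categories and returns its test accuracy; the training itself happened on aiclub.world, so a dummy evaluator with invented numbers stands in for it.

```python
from typing import Callable, Dict, List

def leave_one_category_out(
    categories: List[str],
    train_and_evaluate: Callable[[List[str]], float],
) -> Dict[str, float]:
    """For each category, retrain/test without it and record the accuracy.

    `train_and_evaluate` is a hypothetical stand-in for whatever retrains the
    model on the kept categories and returns test accuracy (0.0 to 1.0).
    """
    results = {}
    for removed in categories:
        kept = [c for c in categories if c != removed]
        results[removed] = train_and_evaluate(kept)
    return results

# Dummy evaluator for demonstration only: pretends accuracy drops whenever
# Early Blight and Healthy are both present. These are not real measurements.
def fake_evaluate(kept: List[str]) -> float:
    return 0.5 if {"Early Blight", "Healthy"} <= set(kept) else 1.0

cats = ["Early Blight", "Late Blight", "Septoria", "Curl Virus", "Healthy"]
scores = leave_one_category_out(cats, fake_evaluate)
best = max(scores, key=scores.get)  # first category whose removal helps most
print(scores, "-> removing", best, "helps most")
```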
This is an Early Blight leaf that was predicted as Late Blight after the Early Blight images were put under the Late Blight category.
I found that removing Early Blight from the dataset resulted in an acceptable accuracy of 90.47%. However, I still wanted my AI to classify a leaf with Early Blight as dangerous, so I put the Early Blight images under the Late Blight category. My rough intuition for why this works: Late Blight leaves have a kind of pixel arrangement where about 70% of the leaf looks diseased, while Early Blight leaves look only about 20-30% diseased. With both in the same category, the ‘weights’ the AI learns during training have to cover a wider range of how much of the leaf looks diseased, so it gets better at recognizing leaves that look only about thirty percent diseased.
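Putting Early Blight under Late Blight amounts to relabeling before training. A minimal sketch, assuming the dataset is a list of (filename, label) pairs; the filenames and the `merge_labels` helper are hypothetical, since the real upload went through aiclub.world.

```python
# Map old labels to new ones; labels not in the map stay unchanged.
MERGE = {"Early Blight": "Late Blight"}

def merge_labels(samples, mapping):
    """Return (filename, label) pairs with labels renamed according to `mapping`."""
    return [(path, mapping.get(label, label)) for path, label in samples]

# Hypothetical filenames for illustration.
samples = [
    ("leaf_001.jpg", "Early Blight"),
    ("leaf_002.jpg", "Healthy"),
    ("leaf_003.jpg", "Late Blight"),
]
merged = merge_labels(samples, MERGE)
print(sorted({label for _, label in merged}))
```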
I learned how basic neural networks work and how images are converted into a form they can use. In general, a neural network makes its predictions using weights, numbers that are adjusted during training to fit the dataset you give it. The input is fed into the neurons of the first layer. That information is passed on to the next layer, where each neuron computes a weighted sum of its inputs plus a bias, applies an activation function, and sends the result to the neurons in the next layer, and so on. The layers between the input and the output are called hidden layers; this is where the AI does its thinking.
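The layer-by-layer flow described above can be sketched in a few lines of plain Python. This is a toy fully connected network with made-up weights, using ReLU as the activation function (an assumption; real networks may use others). It shows only the forward pass, not training.

```python
def relu(v):
    """Activation function: keep positive values, replace negatives with 0."""
    return [max(0.0, x) for x in v]

def dense(inputs, weights, biases):
    """One fully connected layer: each neuron computes a weighted sum of all
    inputs plus its bias. weights[j][i] connects input i to neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Pass x through each (weights, biases) layer, applying ReLU on hidden layers."""
    for k, (W, b) in enumerate(layers):
        x = dense(x, W, b)
        if k < len(layers) - 1:  # no activation on the final (output) layer
            x = relu(x)
    return x

# Tiny made-up network: 3 inputs -> 2 hidden neurons -> 2 outputs.
layers = [
    ([[0.5, -1.0, 0.25], [1.0, 1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[1.0, -1.0], [-0.5, 2.0]], [0.1, 0.0]),             # output layer
]
out = forward([1.0, 0.0, 2.0], layers)
print(out)
```

During training, an algorithm such as gradient descent would adjust the numbers in `layers`; here they are fixed by hand.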
At the end of the network is an output layer with one output per category, and the AI chooses the category whose output scores highest as its prediction for your input. In practice, it is very hard to understand why the AI follows a certain path of neurons from your input to the final prediction.
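A common way for the output layer to "choose the best one" is to turn the raw scores into probabilities with softmax and pick the largest. Whether aiclub.world's service does exactly this is an assumption; the scores below are made up.

```python
import math

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["Early Blight", "Late Blight", "Septoria", "Curl Virus", "Healthy"]
scores = [0.2, 2.5, -1.0, 0.0, 1.1]  # made-up raw outputs for one leaf
probs = softmax(scores)
best = max(range(len(probs)), key=probs.__getitem__)  # index of highest probability
print(CLASSES[best], round(probs[best], 3))
```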
One way to convert an image into numbers that a neural network can use as input is called flattening. Each pixel in an image stores numbers that represent its color (in a typical RGB image, red, green and blue intensities from 0 to 255). Flattening lines all of these numbers up into one long list, which is then fed into the neural network. Another way is to use a Convolutional Neural Network (CNN), which takes the whole image as a grid and slides small filters over it to detect local patterns such as edges and spots before making a prediction. My AI service is a type of CNN called a Residual Neural Network (ResNet).
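The sketch below contrasts the two approaches on tiny made-up images: `flatten` turns an H x W x 3 pixel grid into one list, while `conv2d` slides a small kernel over a grayscale grid (a "valid" cross-correlation, which is what deep learning libraries compute in their convolution layers). `residual` shows, in simplified form, the skip connection that gives ResNets their name: a block's output is added back to its input. In a real CNN the kernel values are learned; the edge-detecting kernel here is hand-picked for illustration.

```python
def flatten(image):
    """Flatten an H x W x C nested list of pixel values into one flat list."""
    return [value for row in image for pixel in row for value in pixel]

def conv2d(channel, kernel):
    """'Valid' 2D cross-correlation of one channel with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(channel) - kh + 1
    out_w = len(channel[0]) - kw + 1
    return [
        [sum(kernel[i][j] * channel[r + i][c + j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]

def residual(x, f):
    """Simplified residual (skip) connection: output = f(x) + x, element-wise."""
    return [fx + xi for fx, xi in zip(f(x), x)]

# A made-up 2 x 2 RGB image: each pixel is [red, green, blue] in 0..255.
image = [[[255, 0, 0], [0, 255, 0]],
         [[0, 0, 255], [255, 255, 255]]]
print(flatten(image))  # 2 * 2 pixels * 3 channels = 12 numbers

# A 3 x 3 grayscale patch with a vertical edge, and a 2 x 2 edge-detecting kernel.
gray = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
edge = [[-1, 1], [-1, 1]]
print(conv2d(gray, edge))  # large values where the edge is

print(residual([1.0, 2.0], lambda v: [0.5 * a for a in v]))
```

Flattening throws away which pixels were neighbors; the convolution keeps that layout, which is why CNNs usually do better on images.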
How can I make my AI better? I felt that a linear classifier might have done much better, because the pictures in my categories look very different from each other (except maybe Early Blight), so the AI would have a much simpler calculation to do and could give a more accurate prediction. I could also use a bigger dataset of around 500-1000 images per category. That way I could give Early Blight its own category again and get better overall accuracy.
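A linear classifier has no hidden layers: each category's score is just a weighted sum of the inputs plus a bias, and the highest score wins. Below is a minimal sketch with made-up weights; the two input numbers are a hypothetical stand-in for features, whereas on real images the input would be the full flattened pixel list.

```python
def linear_classify(x, W, b, classes):
    """Linear classifier: score_k = w_k . x + b_k; predict the highest score."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + bk
              for row, bk in zip(W, b)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return classes[best], scores

# Made-up weights and a made-up 2-number input, for illustration only.
classes = ["Late Blight", "Curl Virus", "Healthy"]
W = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
b = [0.0, 0.0, 1.0]
label, scores = linear_classify([0.8, 0.1], W, b, classes)
print(label, scores)
```

Because the decision boundaries are straight lines (hyperplanes), this only works well when the categories really are that easy to separate.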
Further Development
My AI service uses supervised learning, because I gave it a dataset of tomato leaves labeled with which disease each leaf had. I would really like to run my dataset through an unsupervised learning algorithm, which looks for groups in the data without using the labels. I think that would be an interesting experiment for my free time. I also want to expand my AI to identify diseases in other California cash crops such as grapes and almonds. Eventually, this could be extended to pictures of whole farms, where I could use image classification to find regions of diseased plants.
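k-means clustering is one common unsupervised algorithm (swapped in here as an example; no specific algorithm was chosen above). It groups points without ever seeing labels. The sketch below assumes each leaf has already been reduced to a small made-up feature vector; in practice these might come from flattened pixels or from a trained network.

```python
def kmeans(points, k, iters=10):
    """Plain k-means: alternately assign each point to its nearest centroid and
    move each centroid to the mean of its assigned points. No labels are used."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic init
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - c) ** 2 for a, c in zip(p, cen)) for cen in centroids]
            clusters[dists.index(min(dists))].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, clusters

# Made-up 2-D feature vectors for six leaves (not real data).
points = [[0.1, 0.2], [0.9, 0.8], [0.15, 0.1],
          [0.95, 0.9], [0.2, 0.15], [0.85, 0.95]]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

After clustering, a person would still need to look at each group to decide which disease (if any) it corresponds to.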