Updated: Mar 15
In natural disasters, people can lose their homes, become stranded alone, lose family members, or be left without food. Not all disasters are equally severe, so not everyone affected needs immediate aid; people in conditions too difficult to handle on their own will call for help right away. The goal of this AI service is to determine whether an individual needs assistance based on their disaster response message.
I used the Disaster Response Messages dataset from Kaggle. The dataset consists of 2,500 messages along with several feature labels, such as "medical help", "search and rescue", "deaths", whether the sender is in need of shelter, food, or water, and whether the disaster was caused by a flood, storm, or earthquake.
The AI service I developed predicts the necessity of aid from disaster response messages using Random Forest classification. Testing the service against the dataset produces a prediction accuracy, expressed as a percentage, that measures how often the model correctly categorizes messages as "zero value" (no aid needed) or "one value" (aid needed) based on the feature labels. After the first round of testing, I got a prediction accuracy of 80%; I was pleasantly surprised by how well my AI service performed on the first try. This was better than random guessing, but still not satisfactory, so I decided to tune the hyperparameters of the Random Forest classifier. After experimenting with num_trees (the number of trees in the forest) and max_depth (the maximum allowed depth of each tree), I was able to bring the prediction accuracy up to 92.2%.
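The training-and-tuning step described above can be sketched in scikit-learn, where the two hyperparameters correspond to n_estimators and max_depth. This is a minimal illustration, not the actual project code: the tiny inline message list stands in for the Kaggle dataset, and TF-IDF is one common (assumed) way to turn messages into features.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder messages and binary "aid needed" labels (1 = aid, 0 = no aid);
# these are illustrative only, not the Kaggle data.
texts = [
    "We are trapped and need rescue immediately",
    "Family lost home in the flood, need shelter",
    "No food or water for three days",
    "The storm has passed, everyone here is safe",
    "Roads are clear and power has been restored",
    "Thank you for the update, no help needed",
] * 10  # repeated so a train/test split is possible
labels = [1, 1, 1, 0, 0, 0] * 10

# Convert raw messages into numeric features the classifier can use.
X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42)

# n_estimators / max_depth play the role of the num_trees / max_depth
# hyperparameters tuned in the write-up (values here are arbitrary).
clf = RandomForestClassifier(n_estimators=200, max_depth=10, random_state=42)
clf.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```

On real data, the reported accuracy is what one would compare across hyperparameter settings.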
Analyze the AI
I wanted to further improve my AI to predict additional feature labels, including whether an individual is in need of water, food, or shelter, and whether the disaster was caused by a flood, storm, or earthquake. Using the same Random Forest classifier with tuned hyperparameters, my AI service achieved fairly high accuracy for each of these feature labels (water: 97.6%, food: 94.8%, shelter: 94.2%, flood: 95.6%, storm: 95.4%, earthquake: 98.4%).
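One way to run the per-label experiments above is a loop that fits one tuned Random Forest per feature label. The sketch below uses GridSearchCV over the two hyperparameters mentioned earlier; the random placeholder features and labels are assumptions standing in for the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((120, 20))  # stand-in for vectorized message features
label_names = ["water", "food", "shelter", "flood", "storm", "earthquake"]
# Placeholder 0/1 targets, one array per feature label.
y = {name: rng.integers(0, 2, size=120) for name in label_names}

# Small grid over the two hyperparameters tuned in the write-up.
param_grid = {"n_estimators": [50, 100], "max_depth": [5, None]}

best_params = {}
for name in label_names:
    # One independent binary classifier per feature label.
    search = GridSearchCV(
        RandomForestClassifier(random_state=0), param_grid, cv=3)
    search.fit(X, y[name])
    best_params[name] = search.best_params_
    print(name, search.best_params_, f"cv accuracy: {search.best_score_:.3f}")
```

With random targets the cross-validation scores hover near chance; on the real labeled messages this is where the per-label accuracies would come from.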
During the process of creating the AI service, I learned about a classification algorithm, the Random Forest classifier. Decision trees are the fundamental building blocks of a random forest: it consists of many individual decision trees operating as an ensemble. Each tree in the forest produces a class prediction, and the class with the most "votes" becomes the forest's final prediction.
The key idea behind a random forest is that its trees should be largely uncorrelated with one another, because that is what makes ensemble predictions work. An ensemble of uncorrelated models is more accurate than any of its individual members: when the trees' errors have low correlation, the mistakes made by some trees are outvoted by the rest, so adding more trees reduces the overall error. Ultimately, the random forest's majority vote yields the accurate class predictions that form the final model.
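The ensemble argument above can be checked numerically. The toy simulation below (an assumption-free illustration, not real trees) models each tree as an independent classifier that is correct 70% of the time, and shows that the majority vote of many such trees is correct far more often than any single one.

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials = 100_000   # number of simulated classification instances
p_correct = 0.7      # each individual "tree" is right 70% of the time

for n_trees in (1, 11, 101):
    # Each row is one instance; each column is one tree's vote (True = correct).
    votes = rng.random((n_trials, n_trees)) < p_correct
    # The ensemble is correct when more than half the trees are correct.
    majority_correct = (votes.sum(axis=1) > n_trees / 2).mean()
    print(f"{n_trees:>3} trees -> majority correct {majority_correct:.3f}")
```

A single tree stays near 70%, while 101 independent trees push the majority vote close to certainty, which is exactly why low correlation between trees matters: if the trees all made the same mistakes, voting would add nothing.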