Data for AI: How to make it work

Updated: Aug 30, 2021


The most important step in creating an AI is getting the data to train it. Websites like Kaggle host many public datasets, and these are a good place to start. However, sometimes you cannot find the right dataset there. For example, if you need data from a certain place or for a very specific problem, these websites may not have what you are looking for. In that case, you have to collect the data yourself.


This is what happened to us. We were looking at the problem of middle school stress. Most online datasets are about adult stress, so they were not very helpful for our problem. Therefore, we collected our own data by sending out a survey and by creating some synthetic data. Collecting our own data was a long and tedious process, and this is a shortened version of it. We hope it helps other kids who are looking for data for AI projects.


The Survey


The first method we used to gather data was a survey. When you use a survey to collect data, the questions you ask are extremely important: each answer should help train your AI. For example, when we were looking for questions that would help us predict stress, a question like “What’s your favorite food?” turned out to be irrelevant and unnecessary. A question like “How much sleep did you get?” is relevant, because sleep is a factor that can impact stress. To help us write the questions, we met with a child psychologist, who helped us put the right ones in our survey.
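To make the idea of relevant questions concrete, here is a minimal sketch of how survey answers might be turned into training rows. Everything in it is an assumption for illustration: the question names (hours_of_sleep, favorite_food, stress_level), the answer values, and the choice of stress_level as the thing to predict are made up and are not our actual survey or data.

```python
# Hypothetical survey responses; names and values are made up for illustration.
responses = [
    {"hours_of_sleep": 8.0, "favorite_food": "pizza", "stress_level": 2},
    {"hours_of_sleep": 5.5, "favorite_food": "tacos", "stress_level": 4},
    {"hours_of_sleep": 7.0, "favorite_food": "pasta", "stress_level": 3},
]

# Only questions believed to relate to stress are kept as model inputs.
# "favorite_food" is left out on purpose because it is not expected to help.
RELEVANT_QUESTIONS = ["hours_of_sleep"]

# Assumed target: a self-reported stress rating (hypothetical question).
TARGET = "stress_level"


def to_training_rows(survey_responses):
    """Split each response into (features, label), dropping irrelevant answers."""
    features = [[r[q] for q in RELEVANT_QUESTIONS] for r in survey_responses]
    labels = [r[TARGET] for r in survey_responses]
    return features, labels


features, labels = to_training_rows(responses)
print(features)
print(labels)
```

Keeping the list of relevant questions in one place makes it easy to add or remove a question later without changing the rest of the code.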


Privacy and Responsible Data Handling


Next, you must make sure that the user’s privacy is maintained, w