Data is all around us. Data Science is the set of practices and methods for understanding data and extracting valuable insights from it. Over the last several decades, our ability to gather data about all aspects of our world and lives has grown substantially. As a result, we have data on everything from our farms to our individual heart rates, monitored, recorded, and stored every day. Nearly every area of human activity is now either guided by data or has the potential to benefit from data-driven insights.
If you are new to Data Science and want to learn, here are three key things to know to get started.
It is (of course!) about the Data
Data Science starts with data. The better the data, the more likely it is that a Data Scientist can draw meaningful and correct conclusions from it. If the data is corrupted or damaged, the conclusions are likely to be as well. So the first questions to ask are: Where did this data come from? Was it gathered well? Is there enough data to trust the conclusions? Was the data gathered in an unbiased way?
For example, suppose you are gathering data about people's dietary habits, and the survey was given only to people who shop at a neighborhood health food store. That data is likely not representative of everyone in the neighborhood. The people answering the survey probably have healthier eating habits than their neighbors, which skews the results. This is the kind of thing you need to know before you even start analyzing the data.
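To make the health food store example concrete, here is a small simulation sketch. Everything in it is made up for illustration: the population size, the assumption that 20% of residents shop at the health food store, and the assumed probabilities of healthy eating (80% for shoppers, 30% for everyone else). It is not real survey data; it only shows how surveying one group can overstate the true share.

```python
import random

def simulate_survey(population_size=10_000, sample_size=500, seed=0):
    """Compare a store-only survey with a random survey (all numbers hypothetical)."""
    rng = random.Random(seed)

    # Build a made-up neighborhood: each person is (shops_at_health_store, eats_healthy).
    population = []
    for _ in range(population_size):
        shops_at_health_store = rng.random() < 0.2           # assumed 20% are shoppers
        p_healthy = 0.8 if shops_at_health_store else 0.3    # assumed habits per group
        eats_healthy = rng.random() < p_healthy
        population.append((shops_at_health_store, eats_healthy))

    def healthy_share(people):
        # Fraction of the given people who report healthy eating.
        return sum(eats for _, eats in people) / len(people)

    # Biased survey: only people found at the health food store.
    store_sample = [person for person in population if person[0]][:sample_size]
    # Unbiased survey: people drawn at random from the whole neighborhood.
    random_sample = rng.sample(population, sample_size)

    return {
        "true_share": healthy_share(population),
        "store_survey": healthy_share(store_sample),
        "random_survey": healthy_share(random_sample),
    }

results = simulate_survey()
for name, share in results.items():
    print(f"{name}: {share:.2f}")
```

Under these assumptions, the store-only survey reports a much higher share of healthy eaters than the neighborhood actually has, while the random survey lands close to the true value.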
What are the right questions?
So, say you have done your due diligence and are convinced that the data is good. Now what do you do? Data by itself is just information. The way to convert information into insights is to ask questions. You cannot get useful answers if you do not ask the right questions.
For example, you could use the survey of dietary habits to find out whether people eat at specific times of the day. Is this a useful insight? It depends. If your goal is to find out when stores should be open, then yes, it is. If your goal is to find out what produce a store should stock, it is not necessarily helpful.
Deciding on the right questions requires Domain Expertise. Domain Experts are people who understand the problem (not necessarily the data). They may not be able to tell you how to get the answer out of the data, but they can tell you which questions are worth answering. Sometimes Data Scientists are also the domain experts, and sometimes external domain experts need to be consulted.
Explaining the answer
A very important part of Data Science is the presentation (also called Data Storytelling or Data Visualization). Any good data insight has to be explained to people who can take meaningful action. For example, if the study of dietary habits indicated what people are likely to eat, this has to be presented to store owners, who can then change the products they order and display in their stores. There are many tools and techniques to describe data (scatter plots, histograms, data paragraphs, etc.). A key task of the Data Scientist is to look at the data and the domain context, and then choose how to present the results so that others can understand what the data means and what it tells them about the questions they care about.
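As a tiny illustration of one of these techniques, the sketch below draws a text-only histogram. The `meal_hours` list is hypothetical survey data invented for this example (the hour of each respondent's main meal), and `text_histogram` is just an illustrative helper name. A real project would more likely use a plotting library, but the idea is the same: turn raw answers into a picture a store owner can read at a glance.

```python
from collections import Counter

# Hypothetical survey answers: the hour (24-hour clock) of each respondent's main meal.
meal_hours = [12, 13, 12, 19, 18, 19, 12, 20, 19, 13, 19, 18]

def text_histogram(values):
    """Return one line per distinct value, with a bar of '#' showing how often it occurs."""
    counts = Counter(values)
    lines = []
    for value in sorted(counts):
        bar = "#" * counts[value]
        lines.append(f"{value:02d}:00 | {bar} ({counts[value]})")
    return lines

histogram_lines = text_histogram(meal_hours)
print("\n".join(histogram_lines))
```

Each row is an hour of the day and each `#` is one respondent, so the busiest meal times stand out without anyone having to read the raw list.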