By itself, raw data doesn’t look like much or mean much, but it has the potential to be processed for analysis.
Processed data comes from raw data. It is typically easier to understand, better displayed, and more likely to lead to actionable insights, but you have less freedom with it and less visibility into what happened to the data set along the way.
Raw data, then, is like an uncracked egg, while processed — or cooked — data is like scrambled eggs served on a dish.
See below to learn about raw data, how it gets processed, and why it matters for data-driven industries.
Raw data is data collected from a source and kept in its initial state. It has not yet been processed: cleaned, organized, or visually presented. Raw data can be written down or typed by hand, recorded, or input automatically by a machine. You can find raw data in a variety of places, including databases, files, spreadsheets, and even on source devices, such as a camera. Like stored potential energy, raw data holds value that is only released once it is processed.
Here are some examples of data in raw form:
- A list of every purchase at a store during a month but with no further structure or analysis
- Every second of footage recorded by a security camera overnight
- The grades of all of the students in a school district for a quarter
- A list of every movie being streamed by a video streaming company
- Open-ended responses to a survey question
Without organization and analysis, this data isn’t actionable. But it contains all of the information needed to create benchmarks, ask questions, process the data, and build visuals that show what’s happening in the data set.
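As a rough illustration of the store-purchase example above, the sketch below turns a raw list of sales into two simple summaries. The records, field names, and the `total_by` helper are all hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical raw purchase records: one row per sale, no further structure.
raw_purchases = [
    {"date": "2024-03-01", "item": "coffee", "amount": 4.50},
    {"date": "2024-03-01", "item": "bagel", "amount": 3.00},
    {"date": "2024-03-02", "item": "coffee", "amount": 4.50},
    {"date": "2024-03-02", "item": "coffee", "amount": 4.50},
]


def total_by(records, key):
    """Sum the 'amount' field of each record, grouped by the given key."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


# The same raw list can answer different questions depending on the grouping.
revenue_by_item = total_by(raw_purchases, "item")
revenue_by_date = total_by(raw_purchases, "date")

print(revenue_by_item)
print(revenue_by_date)
```

The raw list itself is never changed here; each summary is a separate, processed view built from it.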
Data analysts, software, and artificial intelligence (AI) all work to transform raw data into processed data.
They start by organizing and cleaning the raw data. One of the most important parts of this process is removing outliers and duplicates within the data set.
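A minimal sketch of this cleaning step might look like the following. The order records are invented, and the outlier rule (distance from the median measured in median absolute deviations, with an assumed cutoff of 5) is only one of several common choices, not a method the article prescribes. The sketch sets outliers aside for review rather than discarding them outright.

```python
import statistics

# Hypothetical raw rows of (order_id, amount). The third row repeats the
# first, and 950.00 looks like a data-entry error.
raw_rows = [
    ("A100", 12.00),
    ("A101", 15.50),
    ("A100", 12.00),
    ("A102", 14.00),
    ("A103", 950.00),
    ("A104", 13.25),
]


def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def split_outliers(rows, cutoff=5.0):
    """Separate rows whose amount lies more than `cutoff` median absolute
    deviations from the median (an assumed rule for this sketch)."""
    amounts = [amount for _, amount in rows]
    median = statistics.median(amounts)
    mad = statistics.median(abs(a - median) for a in amounts)
    kept, flagged = [], []
    for row in rows:
        if abs(row[1] - median) <= cutoff * mad:
            kept.append(row)
        else:
            flagged.append(row)
    return kept, flagged


unique_rows = drop_duplicates(raw_rows)
clean_rows, outliers = split_outliers(unique_rows)

print(clean_rows)
print(outliers)
```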
The next step is an initial analysis that may involve data manipulation. When the raw data consists of human responses to a question, analysts look closely at those responses to determine whether respondents answered inaccurately in a way that would skew the results. Analysts may also review the quality of the question itself to decide whether the responses are relevant for further analysis.
Raw data serves several purposes, particularly in businesses where full data visibility is key to statistical and predictive analytics.
Here are a few reasons why businesses heavily rely on raw data sources:
- Raw data is the starting phase of all data and the initial source of data-based decisions. You can’t make visually compelling charts or overarching analytical statements about processed data until you’ve worked through all of the raw data.
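- You can trust the integrity of raw data. You don’t have to worry that something has been removed or adjusted after collection, because the data has not yet been manipulated by humans or machines. It may still contain errors made at the point of collection, which is what cleaning is for.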
- AI and machine learning methods often work best with access to raw data. Models can learn from detail that summarizing or aggregating would discard, although the data usually still needs cleaning and formatting before a model can use it.
- Raw data gives you a backup resource. You can check your work and go back to the source after processing and manipulating your data sets. It’s all there for your reference if you run into a problem and need a new analysis.
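The backup role described in the last point can be sketched in a few lines. The readings, the `process` function, and the thresholds below are hypothetical; the only idea being shown is that processing returns a new copy, so the raw source stays available for a second analysis.

```python
# Hypothetical raw sensor readings, stored as a tuple so that processing
# cannot modify them in place.
RAW_READINGS = (21.5, 22.0, None, 21.8, 95.0, 22.1)


def process(readings, max_valid):
    """Return a cleaned copy: drop missing values and readings above max_valid."""
    return [r for r in readings if r is not None and r <= max_valid]


# First analysis treats anything above 50.0 as invalid.
first_pass = process(RAW_READINGS, max_valid=50.0)

# Suppose a later review decides the 95.0 reading was legitimate. The analysis
# is rerun from the untouched raw source with a different threshold.
second_pass = process(RAW_READINGS, max_valid=100.0)

print(first_pass)
print(second_pass)
```

If only `first_pass` had been kept, the 95.0 reading could not be recovered for the second analysis.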
Raw Data in Business
Raw data is critical in any line of business where frequent data analysis happens, such as health care, retail, and manufacturing.
Without accessible raw data, companies are confined to whatever format processed data comes in, and there’s always the risk the data has been processed in error or is misaligned with strategy.
Dr. Diaswati Mardiasmo, Chief Economist at PRD, a real estate firm in Australia, explained why her company relies on raw data for greater integrity and visibility in its data analysis practices.
“Access to raw data gives us the chance to view the behind-the-scenes data and also cleanse the data for any anomalies or data imperfections,” Mardiasmo said. “This enhances accuracy and increases trust in any output products.”
Working with raw data helps ensure that you give credible information to your customers and internal departments.
Mardiasmo also described raw data as a source of creative freedom for her team.
“Raw data allows us to process and analyze the data in a manner that is most suitable and beneficial to us,” Mardiasmo said. “For example, we can choose to display the data in differing timelines, groupings, and data displays. We are not confined to already made data/graphs.”