It is often said that a picture is worth a thousand words. In the world of data, we could say that one graph is worth a thousand tables. People usually prefer a visual interpretation of data to rows of characters, and for humans sight is also the fastest and most effortless way of processing information. That last point is arguably the most important one, and the other benefits of data visualization follow from it.
Figure 1 shows the total bandwidth of the human senses according to “The User Illusion: Cutting Consciousness Down to Size” by the Danish science writer Tor Nørretranders. The focus is on the relative difference in bandwidth between the senses rather than on the absolute values. The diagram shows how far sight is ahead of the other senses. Compared to hearing, sight can process information 10 times faster.
Figure 1 – Bandwidth of human senses [T.Nørretranders, “The User Illusion: Cutting Consciousness Down to Size”, 1999]
Data visualization provides fast insight into:
- Distribution of the data
- Correlation between variables
- Data patterns
Distribution of the data refers to the way the data is spread over a range of values. It describes how frequently certain values occur and how close together or far apart they are. Understanding the distribution is important in many fields, including statistics, economics, and data science, because it informs decisions about how to analyze and interpret the data.
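To make the idea of a distribution concrete, here is a minimal sketch that groups values into fixed-width bins and draws a text histogram. The `ages` sample, the bin width, and the helper names `bin_counts` and `render` are made up for illustration; a real project would more likely use a plotting library.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count how many values fall into each fixed-width bin, keyed by bin start."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def render(hist, bin_width):
    """Draw one row per bin, with one '#' per observation."""
    rows = []
    for start, count in hist.items():
        end = start + bin_width - 1
        rows.append(f"{start:>3}-{end:<3} {'#' * count}")
    return "\n".join(rows)


# Hypothetical sample: ages of 20 survey respondents.
ages = [23, 25, 27, 31, 32, 33, 34, 35, 36, 36,
        37, 38, 39, 41, 42, 44, 47, 52, 58, 63]

hist = bin_counts(ages, 10)
print(render(hist, 10))
```

Even in this crude form, the picture shows at a glance that most respondents are in their thirties and that the data tails off toward higher ages, which is much harder to see in the raw list.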
Correlation between variables refers to the statistical relationship between two or more variables. It measures the degree to which changes in one variable are associated with changes in another. It is used in many multivariable problems to explore and understand the relationships between variables (Figure 2). Correlation is also used in predictive modeling to forecast future values of a variable based on its relationship with other variables.
Figure 2 – Some types of Pearson’s correlation [source]
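As a rough illustration of what a correlation coefficient measures, the following sketch computes Pearson's r from its definition for three small, made-up series. The function name `pearson_r` and the sample values are assumptions chosen for this example only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # grows with x: perfect positive correlation
falling = [10, 8, 6, 4, 2]   # shrinks as x grows: perfect negative correlation
u_shaped = [4, 1, 0, 1, 4]   # y = (x - 3)^2: related to x, but not linearly

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_u_shaped = pearson_r(x, u_shaped)
print(r_rising, r_falling, r_u_shaped)
```

The third series is a reminder that Pearson's r only captures linear association: the values depend entirely on x, yet the coefficient is zero.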
Data patterns refer to the recurring structures or relationships that exist within a dataset. These patterns can take many forms, such as trends, cycles, clusters, or outliers. They can be identified through various analytical techniques, such as data visualization, statistical analysis, or machine learning algorithms. Understanding data patterns helps researchers and analysts gain insight into the underlying processes or phenomena that generate the data and make more informed decisions based on that knowledge. Correlation coefficients allow us to summarize the relationship between two variables in a single number. However, the same correlation coefficient can correspond to many different patterns between two variables (Figure 3). More about this topic can be found in this blog.
Figure 3 – Different patterns for the same correlation coefficient [source]
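A well-known demonstration of this effect is Anscombe's quartet, which is used here as a stand-in and is not necessarily the data behind Figure 3. The sketch below takes three of its four datasets and computes Pearson's r for each; the helper `pearson_r` is written out by hand so the block is self-contained.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Three datasets from Anscombe's quartet; they share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_by_shape = {
    # roughly linear scatter
    "linear": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    # smooth curve
    "curved": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    # tight line plus a single outlier
    "one_outlier": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
}

r_by_shape = {name: pearson_r(x, y) for name, y in y_by_shape.items()}
for name, r in r_by_shape.items():
    print(f"{name:<12} r = {r:.3f}")
```

All three coefficients come out at about 0.816, although a scatter plot of each dataset looks completely different. The single number hides exactly the structure that a chart reveals.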
Trends are patterns that emerge when data is analyzed over time, using techniques such as regression analysis, time-series analysis, and forecasting. They can also be visualized with charts and graphs to communicate those patterns to others. Trends provide valuable information about the behavior of the system the data represents and support decision-making and strategic planning.
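Two simple ways of exposing a trend are a moving average, which smooths out short-term noise, and a least-squares line, whose slope summarizes the overall direction. The sketch below shows both on a made-up series; `monthly_sales`, the window size, and the function names are assumptions for illustration.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values to smooth short-term noise."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def linear_trend(values):
    """Least-squares slope and intercept, using 0, 1, 2, ... as the time axis."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical series: units sold per month, noisy but growing.
monthly_sales = [100, 104, 98, 110, 115, 109, 120, 126]

smoothed = moving_average(monthly_sales, window=3)
slope, intercept = linear_trend(monthly_sales)
print([round(v, 1) for v in smoothed])
print(f"average change per month: {slope:.2f}")
```

The raw series goes up and down from month to month, while the smoothed series rises steadily. Plotting the two together is a common way to make a trend visible to a non-technical audience.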
Outliers are data points that differ significantly from the other data points in a dataset. They can be caused by measurement or recording errors, or they may represent genuine but rare or extreme values. Outliers can have a significant impact on an analysis, as they can skew the results and make it difficult to draw accurate conclusions from the data. Identifying and handling them is an important part of the analysis.
Figure 4 – Outliers Identification [source]
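One common way to flag outliers is the interquartile range (IQR) rule, which is also what the whiskers of a typical box plot are based on. This is only one of several possible methods and may not be the one shown in Figure 4. The sketch below assumes Python 3.8 or later for `statistics.quantiles`; the sample data and the function name `iqr_outliers` are made up.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: response times in milliseconds, with one suspicious reading.
response_times_ms = [12, 13, 13, 14, 15, 15, 16, 17, 18, 45]

outliers = iqr_outliers(response_times_ms)
print(outliers)
```

Whether a flagged value is an error to remove or a rare event to investigate is a judgment the rule cannot make; it only points out where to look.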
Clarity and Simplicity
Giving a simple answer is not easy, and giving a comprehensible one is even harder. Satisfying both simplicity and comprehensibility is another level altogether. Take one of the most famous equations, E = mc^2. Is it simple? It is. Is it elegant? It is. Is it comprehensible? For the majority of people, it is not. Simplicity does not guarantee clarity, and the same goes for visualization, so it is very important to find a balance between the two. Simple graphs and diagrams are valuable, but when the audience struggles to understand the story behind the data, they become worthless. We should strive for simplicity, but not at the expense of comprehensibility. When we find that balance, we have a tool that can answer not only simple questions but more complex ones as well.
Focus on Important Information
People tend to present as much information as they find during an analysis. They do not like to omit insights that cost them a lot of time and effort. The crucial question is whether we need all of them, and what we actually need to show. Usually, considerable effort should go into filtering out the information that matters. Visualization tools offer good means of distinguishing information by importance: different scales, shapes, colors, textures, and transparency can help separate important information from unimportant. On the other hand, every element of a visualization draws the user's attention. Therefore, it is important to keep the visualization as simple as possible and omit elements that do not carry useful information.
Better Comprehension for Non-Technical People
Visualization helps data analysts make sense of the data they analyze, and it helps others even more to understand the analysts' findings. These others (stakeholders, business owners, investors) are mostly non-technical people to whom we need to present our insights in a form that tells the story behind the data. At the end of the day, we should answer the simple question most business owners like to ask: “What do we get out of this?” It is much easier to ask simple questions than to give simple answers. Visualization can be really helpful in designing intuitive, simple answers that are easy to understand and easy to present to people without any prior domain knowledge. The same data can be represented visually in different ways to meet the needs or priorities of different audiences.
Quicker and Better Decision Making
Data can be a very good basis for decision-making if it is handled in the right way. That means collection, analysis, and, last but not least, visualization and presentation of the data. All three steps are equally important, but the last one is the final touch: it turns the analysis into a recommendation that inspires action. Therefore, special attention should be paid to visualizing and presenting data in the right way. Good visualization by itself does not mean much if the previous steps were not done correctly. As the saying goes, a chain is only as strong as its weakest link. As already mentioned, people understand visual information more quickly and more easily than written forms (e.g., reports), which leads to faster and better-informed decisions.
“Benefits of Data Visualization” Tech Bite was brought to you by Đorđe Bjelajac, Junior Data Analyst at Atlantbh.
Tech Bites are tips, tricks, snippets or explanations about various programming technologies and paradigms, which can help engineers with their everyday job.