What is a map?
A map is a symbolic representation of selected characteristics of a place, usually drawn on a flat surface. Maps represent information about our world in a simple, visual way. They teach us about the world by showing the sizes and shapes of countries, the locations of features, and the distances between places. Maps can also show how things are distributed over the Earth, such as settlement patterns, or show the exact locations of houses and streets in a city neighborhood. In this article, we will focus on data analysis because map quality depends on it.
With the invention of GIS (Geographic Information System) technology and the rise of the digital age, maps have reached a far larger audience. What was previously limited by cost and production effort, like the hand-drawn maps of the Middle Ages, is now available to anyone with internet access. This digital medium reflects the needs of today's society and has transformed how we use maps.
Most people interact with the products of modern cartography on a daily basis.
Consider the many apps on your phone. How many of them rely on location-based services? Navigation apps like Google Maps, TomTom, and Waze, ride services like Uber, and food delivery apps like Korpa or Glovo all have some mapping component.
Most of us see only the final product: attractive color maps that contain a lot of helpful information. But creating a map, especially from scratch, is a long and demanding job that carries real responsibility.
The process of creating maps can be divided into several steps:
1. Data collection
2. Data analysis
3. Data processing
4. Map design
Spatial Data Analytics
Data analytics is the science of analyzing raw data to draw conclusions from it.
Various approaches to data analytics include looking at what happened (descriptive analytics), why something happened (diagnostic analytics), what is going to happen (predictive analytics), or what should be done next (prescriptive analytics).
When we speak about maps and spatial data analytics, in most cases, we are talking about a descriptive analytics approach. This approach describes what has happened over a given period or provides basic data statistics. Has the number of records of certain data types gone up? Is the spatial accuracy better in the latest dataset than in the previous one?
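The kind of descriptive question asked above (has the number of records of a given type gone up?) can be sketched in a few lines. This is only an illustration: the record layout with a "type" key and the sample data are assumptions, not part of any real dataset.

```python
from collections import Counter


def count_by_type(records):
    """Count records per feature type (assumes each record has a "type" key)."""
    return Counter(r["type"] for r in records)


def count_changes(old_records, new_records):
    """Return {type: new_count - old_count} for every type seen in either version."""
    old_counts = count_by_type(old_records)
    new_counts = count_by_type(new_records)
    all_types = sorted(set(old_counts) | set(new_counts))
    # Counter returns 0 for a type that is absent from one of the versions.
    return {t: new_counts[t] - old_counts[t] for t in all_types}


# Made-up sample data for two versions of a dataset.
previous = [{"type": "road"}, {"type": "road"}, {"type": "building"}]
latest = [{"type": "road"}, {"type": "road"}, {"type": "road"}, {"type": "address"}]

print(count_changes(previous, latest))
```

A positive value means the latest dataset contains more records of that type than the previous one, and a negative value means records have disappeared.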
Spatial data is information that describes objects, events, or other features with a location on or near the surface of the Earth. Spatial data typically combines location, attribute, and time information.
Location information is usually given as coordinates on the Earth. Attribute information characterizes the object, event, or phenomenon concerned, while time information describes the moment or life span during which the location and attributes are valid.
The location provided may be static in the short term, e.g., the location of a piece of equipment or an earthquake event, or dynamic, e.g., a moving vehicle or pedestrian, the spread of an infectious disease.
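The combination of location, attribute, and time information described above could be modeled roughly as follows. The class name, field names, and sample values are illustrative assumptions rather than a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class SpatialRecord:
    # Location: coordinates on the Earth (degrees).
    lon: float
    lat: float
    # Attributes: whatever characterizes the object or event.
    attributes: dict = field(default_factory=dict)
    # Time: the life span during which location and attributes are valid.
    # None means "open-ended" on that side.
    valid_from: Optional[datetime] = None
    valid_to: Optional[datetime] = None

    def exists_at(self, moment: datetime) -> bool:
        """True if the record is valid at the given moment."""
        if self.valid_from is not None and moment < self.valid_from:
            return False
        if self.valid_to is not None and moment > self.valid_to:
            return False
        return True


# Hypothetical example: a piece of equipment installed at the start of 2020.
equipment = SpatialRecord(
    lon=18.41,
    lat=43.86,
    attributes={"kind": "equipment"},
    valid_from=datetime(2020, 1, 1),
)
print(equipment.exists_at(datetime(2021, 6, 1)))
```

A moving vehicle would simply produce many such records, one per observed position, each with its own short validity interval.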
Geospatial data typically involves large sets of spatial data gleaned from many diverse sources in varying formats and can include information such as census data, satellite imagery, weather data, cell phone data, drawn images, and social media data.
Spatial data analysis can be roughly divided into geometry analysis and attribute analysis.
Figure 1: Spatial data (Image Source)
Two primary parameters characterize spatial quality: accuracy and precision. Accuracy describes how close a measurement is to its actual value and is often expressed as a probability (e.g., 80 percent of all points are within +/− 5 meters of their true locations). Precision refers to the variance of a value when repeated measurements are taken. A watch may resolve time to 1/1000th of a second (precise) yet run 30 minutes slow (inaccurate).
Precision is often confused with accuracy, but the two terms mean very different things. While precision is related to resolution and variation, accuracy refers only to how close the measurement is to the true value, and the two characteristics are not dependent on one another (Figure 2).
Figure 2: Precision and Accuracy (Image Source)
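The difference between accuracy and precision can also be shown numerically. The sketch below assumes planar coordinates in meters and uses two simple indicators chosen for illustration: the offset of the mean measurement from the true position (accuracy) and the root-mean-square scatter around that mean (precision). The function names and sample points are made up.

```python
import math
import statistics


def bias_and_spread(measured, true_point):
    """Return (bias, spread) in the units of the input coordinates.

    bias:   distance from the mean measured position to the true position
            (a simple accuracy indicator)
    spread: root-mean-square distance of the measurements from their own
            mean (a simple precision indicator)
    """
    mean_x = statistics.fmean(x for x, _ in measured)
    mean_y = statistics.fmean(y for _, y in measured)
    bias = math.hypot(mean_x - true_point[0], mean_y - true_point[1])
    spread = math.sqrt(
        statistics.fmean((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in measured)
    )
    return bias, spread


def share_within(measured, true_point, tolerance):
    """Fraction of measurements lying within `tolerance` of the true position."""
    hits = sum(1 for p in measured if math.dist(p, true_point) <= tolerance)
    return hits / len(measured)


true_position = (0.0, 0.0)
# Precise but inaccurate: tightly clustered, about 10 m away from the truth.
tight_but_offset = [(10.0, 0.1), (10.1, 0.0), (9.9, -0.1)]
# Accurate on average but imprecise: centered on the truth, widely scattered.
scattered_but_centered = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0), (0.0, -3.0)]

for name, points in [("tight", tight_but_offset), ("scattered", scattered_but_centered)]:
    bias, spread = bias_and_spread(points, true_position)
    print(name, round(bias, 2), round(spread, 2), share_within(points, true_position, 5.0))
```

The `share_within` helper corresponds to statements such as "80 percent of all points are within +/− 5 meters of their true locations".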
The second type of accuracy in spatial data is logical consistency. Logical consistency requires that the data are topologically correct.
Topology in a GIS expresses the spatial relationships between connecting or adjacent vector features (points, polylines, and polygons). Topological or topology-based data are helpful in detecting and correcting digitizing errors (e.g., two lines in a road vector layer that do not meet perfectly at an intersection). Topology is necessary for carrying out some types of spatial analysis, such as network analysis.
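As a rough illustration of the digitizing error mentioned above, the following sketch looks for line endpoints that almost meet but do not coincide exactly. It is a deliberately naive check under stated assumptions (planar coordinates in meters, a brute-force comparison of endpoints only, a hypothetical tolerance of 0.5 m), not how a GIS topology engine works.

```python
import math


def near_miss_endpoints(lines, tolerance=0.5):
    """Find pairs of endpoints from different lines that are closer than
    `tolerance` but not identical, i.e. lines that should probably be snapped.

    lines: list of polylines, each a list of (x, y) vertices.
    Returns a list of (line_index_a, line_index_b, gap) tuples.
    """
    endpoints = [
        (index, point)
        for index, line in enumerate(lines)
        for point in (line[0], line[-1])
    ]
    issues = []
    for a in range(len(endpoints)):
        for b in range(a + 1, len(endpoints)):
            (i, p), (j, q) = endpoints[a], endpoints[b]
            if i == j:
                continue  # both ends of the same line
            gap = math.dist(p, q)
            if 0 < gap <= tolerance:
                issues.append((i, j, gap))
    return issues


# Made-up road layer: line 1 starts 0.2 m away from the intersection at (10, 0).
roads = [
    [(0.0, 0.0), (10.0, 0.0)],
    [(10.2, 0.0), (10.2, 10.0)],
    [(10.0, 0.0), (20.0, 0.0)],
]

for i, j, gap in near_miss_endpoints(roads):
    print(f"lines {i} and {j} miss each other by {gap:.2f} m")
```

Lines 0 and 2 share the exact point (10, 0), so they are considered connected, while line 1 is reported against both of them.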
The third type of accuracy is data completeness. Comprehensive inclusion of all features within the spatial database is required to ensure accurate mapping results. Simply put, all the data must be present for a dataset to be accurate. If some of the address points in the city are not represented, our analysis will inevitably be incomplete or insufficient.
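One simple way to quantify completeness, assuming a trusted reference list of feature identifiers is available, is to compare it with the identifiers actually present in the dataset. The identifiers below are invented for illustration.

```python
def completeness(reference_ids, dataset_ids):
    """Return (ratio, missing) where ratio is the share of reference
    features present in the dataset and missing lists the absent ones."""
    reference = set(reference_ids)
    present = reference & set(dataset_ids)
    missing = sorted(reference - present)
    ratio = len(present) / len(reference) if reference else 1.0
    return ratio, missing


# Hypothetical address-point IDs: the reference register vs. our dataset.
register = ["A1", "A2", "A3", "A4"]
dataset = ["A1", "A3", "A4", "X9"]

print(completeness(register, dataset))
```

Identifiers that appear only in the dataset (here "X9") do not count toward completeness; they would be a separate issue worth checking.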
In addition to spatial quality, attribute accuracy is a common source of error in spatial data. Attribute errors can occur when an incorrect value is recorded within the attribute field or when a field is missing a value.
Common attribute-related errors include:
- missing or unknown values,
- lack of diacritics,
- different notations of a given attribute, e.g., avenue – ave – av, street – st – str, etc.,
- lack of consistency in data models from different sources, e.g., George Washington Boulevard vs. Washington Boulevard,
- lack of an identification number (ID),
- different data formats and/or different units.
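Some of the notation issues listed above can be reduced by building a normalized comparison key before matching records from different sources. The sketch below is an assumption-laden illustration: the abbreviation table is tiny and hypothetical, and blindly expanding "st" is naive because it can also stand for "saint". The key is meant for comparison only, not for replacing the stored name.

```python
import unicodedata

# Hypothetical abbreviation table; a real one would be much larger.
SUFFIXES = {
    "av": "avenue",
    "ave": "avenue",
    "st": "street",
    "str": "street",
    "blvd": "boulevard",
}


def strip_diacritics(text):
    """Remove combining marks, e.g. "ž" becomes "z".
    Letters without a Unicode decomposition (such as "đ") are left unchanged."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def comparison_key(name):
    """Lowercase, strip diacritics and dots, and expand known abbreviations."""
    words = strip_diacritics(name).lower().replace(".", " ").split()
    return " ".join(SUFFIXES.get(word, word) for word in words)


print(comparison_key("Washington Ave."))
print(comparison_key("Washington av"))
print(comparison_key("Washington Avenue"))
```

All three spellings produce the same key, so the records can be matched even though their notations differ. Cases such as George Washington Boulevard vs. Washington Boulevard are not resolved by this and need additional rules or manual review.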
A common inaccuracy occurs when “0” is mistaken for “null” in an attribute field. This is common in count data where “0” would represent zero findings, while a “null” would represent a locale where no data collection effort was undertaken.
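The effect of confusing the two is easy to demonstrate. In the sketch below, Python's None stands in for "null", and the sample counts are made up.

```python
def summarize_counts(counts):
    """Summarize count data where None means "no survey was carried out"
    and 0 means "surveyed, nothing found"."""
    surveyed = [c for c in counts if c is not None]
    return {
        "surveyed": len(surveyed),
        "not_surveyed": len(counts) - len(surveyed),
        "zero_findings": sum(1 for c in surveyed if c == 0),
        "mean": sum(surveyed) / len(surveyed) if surveyed else None,
    }


# Five locales: two of them were never surveyed.
observations = [4, 0, None, 2, None]

print(summarize_counts(observations))

# Treating null as 0 silently lowers the average.
wrong_mean = sum(c or 0 for c in observations) / len(observations)
print(wrong_mean)
```

With nulls kept separate, the mean is computed over the three surveyed locales only; replacing nulls with zeros spreads the same total over five locales and understates the result.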
In the case of categorical values, inaccuracies occasionally occur when attributes are mislabeled. For example, a land-use/land-cover map may list a polygon as “agricultural” when it is, in fact, “residential.” This is particularly true if the dataset is outdated, which leads us to our next source of error.
Temporal accuracy addresses the age or timeliness of a dataset. No dataset is ever completely current: by the time it has been created, it is already somewhat outdated. Regardless, there are several dates to be aware of while using a dataset, and they should be found within the metadata. The publication date tells you when the dataset was created and/or released. The field date is the date and time the data was collected. To address temporal accuracy, many datasets undergo a regular data update regimen.
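A basic timeliness check based on the field date might look like the sketch below. The metadata keys and the one-year threshold are assumptions chosen for illustration; the acceptable age depends entirely on the use case.

```python
from datetime import date


def is_stale(field_date, today, max_age_days=365):
    """True if the data was collected more than `max_age_days` before `today`."""
    return (today - field_date).days > max_age_days


# Hypothetical metadata record with the two dates discussed above.
metadata = {
    "publication_date": date(2021, 9, 1),
    "field_date": date(2020, 1, 1),
}

# The field date, not the publication date, reflects the age of the content.
print(is_stale(metadata["field_date"], today=date(2022, 1, 1)))
```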
As we can see, analytics plays several roles in determining the accuracy and quality of spatial data. It directly affects which data will be used for map production, i.e., which data meet the set of quality criteria. For example, road infrastructure data should be positioned within +/− 5 meters of its true location and contain basic attributes like street name, direction of traffic flow, type of road surface, speed limit, traffic restrictions, etc.
With this kind of data, the map will contain information we can rely on when, for example, we want a navigation system to take us safely to the desired destination.
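The road-data criteria described above (position within 5 meters of the true location plus a set of basic attributes) could be expressed as a simple quality gate. This is a sketch under stated assumptions: the attribute keys, the feature layout, and the availability of a trusted reference position are all hypothetical, and distance is computed with the haversine formula on a spherical Earth.

```python
import math

# Hypothetical attribute keys for the basic road attributes.
REQUIRED_ATTRIBUTES = ("street_name", "traffic_direction", "surface_type", "speed_limit")


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters on a sphere of radius 6,371 km."""
    radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def passes_quality_gate(feature, reference_position, max_error_m=5.0):
    """True if the feature is close enough to its reference position
    and every required attribute has a value (None counts as missing)."""
    error = haversine_m(feature["lat"], feature["lon"], *reference_position)
    has_attributes = all(feature.get(key) is not None for key in REQUIRED_ATTRIBUTES)
    return error <= max_error_m and has_attributes


# Made-up road feature and reference position (lat, lon).
road = {
    "lat": 43.8563,
    "lon": 18.4131,
    "street_name": "Example Street",
    "traffic_direction": "both",
    "surface_type": "asphalt",
    "speed_limit": 50,
}

print(passes_quality_gate(road, (43.8563, 18.4131)))
```

Only features that pass such a gate would move on to map production; the rest would go back for correction or re-collection.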
“The role of Data Analytics in the function of Map Quality” Tech Bite was brought to you by Bakir Sujoldžić, Data Analyst at Atlantbh.
Tech Bites are tips, tricks, snippets or explanations about various programming technologies and paradigms, which can help engineers with their everyday job.