When people talk about statistics, they usually think only of numbers, but it is much more than that.

Statistics is an irreplaceable tool in different fields like data analysis, predictive modeling, quality control, scientific research, decision-making models, etc.

Overall, statistics provides a valuable toolset for understanding, describing, and making predictions about data. But can QA engineers use statistics as a regular tool to increase the value of their validation and verification process?

Data collected from the different kinds of testing and analysis that QA engineers perform (performance testing, UAT, root cause analysis, test coverage analysis, defect tracking and management, etc.) can be analyzed with statistical methods. The results help identify trends, patterns, potential bottlenecks, and potential bug causes, as well as check the overall health of the system and find areas for improvement.

Which statistical methods can be used?

There are many statistical methods, but this short article will talk about descriptive statistics, specifically the interquartile range, and how it is used in the day-to-day job of a QA Engineer.

What is the interquartile range?

The interquartile range, IQR (also called the midspread or middle 50%), is one of the common measures of dispersion in descriptive statistics. It is defined as the difference between the 75th and 25th percentiles of the data.

IQR = Q3 – Q1 = qn(0.75) – qn(0.25)

To calculate the IQR, the data set must be divided into quartiles. Q1, the lower quartile, corresponds to the 25th percentile; Q2 is the median (50th percentile); and Q3, the upper quartile, corresponds to the 75th percentile.
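As a rough illustration of this definition, the sketch below computes Q1, Q2, and Q3 by interpolating linearly between neighbouring values. This is only one of several common quartile conventions, so other tools may report slightly different numbers. The helper name `quantile` and the sample values are assumptions made for the example.

```javascript
// Hypothetical helper: p-th quantile (0 <= p <= 1) of an ascending-sorted array,
// using linear interpolation between the two closest ranks.
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  const weight = pos - lower;
  return sorted[lower] + weight * (sorted[upper] - sorted[lower]);
}

// Illustrative values only, already sorted in ascending order.
const sample = [2, 4, 6, 8, 10, 12, 14, 16, 18];

const q1 = quantile(sample, 0.25); // lower quartile
const q2 = quantile(sample, 0.5);  // median
const q3 = quantile(sample, 0.75); // upper quartile
const sampleIqr = q3 - q1;         // IQR = Q3 - Q1

console.log({ q1, q2, q3, iqr: sampleIqr }); // { q1: 6, q2: 10, q3: 14, iqr: 8 }
```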

Figure 1. Boxplot with an interquartile range


Who can use IQR?

IQR can be used by QA engineers and analysts who work with large data sets, or who collect a lot of data through complex testing procedures and want to analyze it. It can help teams identify issues in the software, tests, or requirements.

When should QA Engineers use IQR?

When you need a tool that is:

  • Robust – IQR considers only the middle 50% of the data, so outliers and extreme values have little effect on it.
  • Simple – Calculating the IQR requires only the difference between a dataset’s 75th and 25th percentiles. This makes it easy for QA engineers to apply: data points that differ significantly from the majority of the dataset can be flagged with a simple rule based on the IQR.

And enables:

  • Data visualization simplicity – Box plots make it easy to display the distribution of data, and the IQR is a key component of them. A box plot draws the IQR as a box with whiskers extending from it, making it easy to see the spread of the middle 50% of the data and any outliers.
  • Comparative analysis – When comparing two or more datasets, the IQR can reveal how the data’s variability differs between the groups. If one group has a higher IQR than another, it indicates that the data in that group is more dispersed and thus more variable or diverse. This information can be useful when attempting to understand the differences or similarities between groups.
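The robustness and comparison points above can be sketched with a few lines of JavaScript. The helper names `iqrOf` and `rangeOf`, the interpolation-based quartile convention, and all sample values are assumptions chosen for illustration. The range (max minus min) is included only as a contrast, since a single extreme value changes it drastically while the IQR barely moves.

```javascript
// Hypothetical helper: IQR of an unsorted array, interpolating linearly between ranks.
function iqrOf(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const at = (p) => {
    const pos = (sorted.length - 1) * p;
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
  };
  return at(0.75) - at(0.25);
}

// Range (max - min), shown only for contrast with the IQR.
const rangeOf = (values) => Math.max(...values) - Math.min(...values);

// Robustness: made-up response times in ms, then the same set with one extreme value.
const typical = [100, 110, 120, 130, 140, 150, 160, 170, 180];
const withSpike = [...typical, 5000];
console.log(iqrOf(typical), rangeOf(typical));     // 40 80
console.log(iqrOf(withSpike), rangeOf(withSpike)); // 45 4900

// Comparison: made-up page load times (ms) from two hypothetical builds.
const buildA = [200, 210, 220, 230, 240, 250, 260, 270, 280];
const buildB = [100, 150, 200, 250, 300, 350, 400, 450, 500];
console.log(iqrOf(buildA), iqrOf(buildB)); // 40 200
```

In this made-up data, the second build has the larger IQR, so its load times are the more variable of the two.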

Where can IQR be used by QA Engineers?

  • QA Engineers can use IQR to detect outliers. In statistics, an outlier is a data point that differs significantly from other observations. Outliers can show up for different reasons: internal system errors, nature of data, communication between systems, etc.
  • Outliers in Test Data: A QA Engineer can use IQR to identify outliers in test data, such as data that falls outside the expected range or is not representative of real-world usage, assisting them in ensuring that test cases are comprehensive and accurate.
  • Performance measurement: IQR can be applied to data collected through manual or automated tests, e.g., performance testing, to measure the performance of software or systems, including response time and resource utilization. It can be used to determine the spread of the performance data and to identify outliers that may indicate issues with the software or systems being tested.
  • The IQR is a key component of box plots, which are a useful visual representation of data distribution. Box plots can help QA engineers quickly identify the spread of the data and any outliers, making it easier to identify trends and patterns.
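To make the box plot connection concrete, here is a minimal sketch of the values a box plot is typically drawn from. It assumes the common convention in which whiskers reach to the most extreme data points within 1.5 * IQR of the box; charting libraries may use other conventions. The helper name `boxPlotSummary` and the response times are hypothetical.

```javascript
// Hypothetical helper: the values a box plot is typically drawn from.
function boxPlotSummary(values) {
  const sorted = [...values].sort((a, b) => a - b);
  // Quantile by linear interpolation between ranks (one of several conventions).
  const at = (p) => {
    const pos = (sorted.length - 1) * p;
    const lo = Math.floor(pos);
    const hi = Math.ceil(pos);
    return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
  };
  const q1 = at(0.25);
  const median = at(0.5);
  const q3 = at(0.75);
  const iqr = q3 - q1;
  const lowerFence = q1 - 1.5 * iqr;
  const upperFence = q3 + 1.5 * iqr;
  // Whiskers end at the most extreme values that are still inside the fences.
  const inside = sorted.filter((v) => v >= lowerFence && v <= upperFence);
  return {
    q1,
    median,
    q3,
    iqr,
    whiskerLow: inside[0],
    whiskerHigh: inside[inside.length - 1],
    outliers: sorted.filter((v) => v < lowerFence || v > upperFence),
  };
}

// Made-up response times in ms.
const summary = boxPlotSummary([120, 130, 140, 150, 160, 170, 180, 190, 900]);
console.log(summary);
```

For this sample, the box spans 140 to 180, the whiskers end at 120 and 190, and 900 is reported as the only outlier.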

Show me how!?

QA Engineers can use outliers and the IQR in different ways, depending on the project’s development and testing life cycle, its phases, and current project needs. We will use JavaScript code to explain the usage of IQR and outliers in the automation process.

1. Define the data set – creating an array of numbers representing the data.

let data = [1400, 2000, 1600, 1550, 2400, 1700, 1900, 1500, 3950, 1100];

2. Sort data – the data set needs to be sorted in ascending order, because the quartile lookup in the next step reads values by their position in the sorted array:

data.sort((a, b) => a - b); 
// [1100, 1400, 1500, 1550, 1600, 1700, 1900, 2000, 2400, 3950]
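One detail worth illustrating here: the comparator passed to `sort` matters, because JavaScript's default sort compares elements as strings. The sketch below uses made-up values and also sorts copies (via the spread operator) so the original array is left untouched, which is a design choice of this example rather than something the steps require.

```javascript
// Made-up values; note the mix of 3-, 4-, and 5-digit numbers.
const raw = [1400, 2000, 900, 10000];

// Default sort compares elements as strings, so numbers end up out of numeric order.
const lexicographic = [...raw].sort();
console.log(lexicographic); // [ 10000, 1400, 2000, 900 ]

// A numeric comparator gives ascending numeric order.
const numeric = [...raw].sort((a, b) => a - b);
console.log(numeric); // [ 900, 1400, 2000, 10000 ]

// The copies were sorted, so the original order is preserved.
console.log(raw); // [ 1400, 2000, 900, 10000 ]
```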

3. Calculate the quartiles:

a. The first quartile (Q1) –  separates the lowest 25% of the data from the rest;

b. The third quartile (Q3) – separates the lowest 75% of the data from the highest 25%.

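// Index-based approximation of the quartiles; other quartile conventions can give slightly different values.
const quartile1 = data[Math.floor(data.length / 4)]; // data[2] = 1500
const quartile3 = data[Math.ceil(data.length * (3 / 4))]; // data[8] = 2400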

4. Calculate the IQR:

let iqr = quartile3 - quartile1; //900

5. Define the lower and upper bounds – The lower bound is defined as Q1 – 1.5 * IQR and the upper bound is defined as Q3 + 1.5 * IQR. Outliers are typically defined as values that fall outside of these bounds.

let lowerBound = quartile1 - 1.5 * iqr; //150
let upperBound = quartile3 + 1.5 * iqr; //3750

6. Identify and return outliers – The outliers can be identified by comparing each value in the data set to the lower and upper bounds. If a value is less than the lower bound or greater than the upper bound, it is considered an outlier.

const outliers = data.filter(value => value < lowerBound || value > upperBound);
console.log('Outliers:', outliers); // 3950 is the only outlier; inside a function you would return outliers instead
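The steps above can be wrapped into a single reusable function, which is often more convenient inside an automated test. The sketch below keeps the same index-based quartile positions as step 3. The function name `findOutliers`, the configurable multiplier `k`, and the clamping of the Q3 index (so very short arrays do not read past the end) are additions made for this example.

```javascript
// Hypothetical helper wrapping steps 2 to 6; k = 1.5 is the conventional multiplier.
function findOutliers(values, k = 1.5) {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const last = sorted.length - 1;

  // Same index-based quartile positions as in step 3; the Q3 index is clamped
  // to the last element so short arrays do not produce undefined.
  const q1 = sorted[Math.floor(sorted.length / 4)];
  const q3 = sorted[Math.min(last, Math.ceil(sorted.length * (3 / 4)))];

  const iqr = q3 - q1;
  const lowerBound = q1 - k * iqr;
  const upperBound = q3 + k * iqr;

  return {
    q1,
    q3,
    iqr,
    lowerBound,
    upperBound,
    outliers: sorted.filter((v) => v < lowerBound || v > upperBound),
  };
}

const result = findOutliers([1400, 2000, 1600, 1550, 2400, 1700, 1900, 1500, 3950, 1100]);
console.log(result.outliers); // [ 3950 ]
```

A test could then assert that `result.outliers` is empty, or log the flagged values for later investigation.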

7. Document results – Values identified as outliers often need to be documented. The Node.js file system module (fs), which allows code to work with the file system, is useful here. In this case, we create an output CSV file (e.g., outliers.csv). The outliers are combined into a single string with join(','), which separates the values with commas. Then fs.writeFileSync writes that string to the file. Finally, a message confirming that the export was successful is logged to the console.

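const fs = require('fs');
// Combine the outliers into one comma-separated string (a new name avoids redeclaring `data`).
const csvContent = outliers.join(',');
// writeFileSync is synchronous and takes no callback; it throws if the write fails.
fs.writeFileSync('outliers.csv', csvContent);
console.log('File is created successfully.');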
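As a possible variation, the outliers can be written one per row under a header, which tends to be easier to open in a spreadsheet. The sketch below is only an illustration: the helper name `outliersToCsv`, the `outlier` column header, the placeholder value, and the choice of the system temp directory as the output location are all assumptions.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: one outlier per row under a header line.
function outliersToCsv(values) {
  return ['outlier', ...values].join('\n') + '\n';
}

// Placeholder value and location; a real run would pass the detected outliers.
const csvPath = path.join(os.tmpdir(), 'outliers.csv');
const csvText = outliersToCsv([3950]);

try {
  // Synchronous write: any failure is thrown and handled in the catch block.
  fs.writeFileSync(csvPath, csvText);
  console.log(`Wrote ${csvPath}`);
} catch (err) {
  console.error(`Could not write ${csvPath}: ${err.message}`);
}
```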

 

Conclusion

Understanding and using the IQR and outliers in software testing can be important for ensuring the validity and reliability of test results. Outliers can indicate issues with the data, such as data entry errors, measurement errors, or other anomalies. Detecting and addressing these outliers can improve the accuracy and reliability of the test results. This might lead to quicker detection of potential data errors and enhance the general effectiveness of software testing.


“IQR in Automation Testing: Unleashing the Data Analytics Potential” Tech Bite was brought to you by Nikola Klačar, Junior Quality Assurance Engineer at Atlantbh.

Tech Bites are tips, tricks, snippets or explanations about various programming technologies and paradigms, which can help engineers with their everyday job.
