The intersection of people and numbers
Analytics and the quality of your questions
Statistics and the quality of your answers
Synthesis and the quality of your decisions
Effective data science in the retail industry
The intersection of people and numbers
Data science does not happen separately from people. It is a way to understand people: a space where people and information intersect to produce insight.
When we think of retail, data science is not always the first thing that comes to mind. But retail is an industry that is particularly focused on understanding its consumers and using data to decide how best to meet those consumers’ needs. In this context, the science of understanding data and the science of understanding consumers are symbiotic. Combining them guards against a common data science problem in retail, valuing numbers above people, and supports the consumer in ways that are essential to the retail ecosystem.
Data science in retail is best equipped to generate this symbiosis of data and people when it utilizes three important elements: analytics, statistics, and decision science.
Analytics and the quality of your questions
The terms analytics and statistics are often used interchangeably, and at other times they are debated vigorously. As Cassie Kozyrkov, Chief Decision Scientist at Google, notes, “Statistics and analytics are two branches of data science that share many of their early heroes, so the occasional beer is still dedicated to lively debate about where to draw the boundary between them.” Nonetheless, each represents a different branch of data science that is relevant to building robust business insights. Kozyrkov goes on to explain, “Analytics helps you form the hypotheses. It improves the quality of your questions. Statistics helps you test hypotheses. It improves the quality of your answers.”
Having a strong knowledge base to draw from raises the caliber of your questions, and identifying gaps in that knowledge base highlights which questions still need answers. How well do you know your business, product, consumer, industry, trends, and competitors? What is missing from your understanding? Can you identify your business opportunities and challenges? What are your options? Use that information to funnel your questions into a cohesive, succinct hypothesis.
Creating such a hypothesis can be a science in and of itself. While all questions have some degree of validity, not every question is useful. To formulate a strong hypothesis, it helps to understand the different kinds of questions:
- Rhetorical questions vs. exploratory questions. A rhetorical question is less concerned with finding an answer and more concerned with making a point. Rhetorical questions already seem to know the answer beforehand. A hypothesis can be rhetorical when the retailers asking the question believe they already know the outcome and are just going through the motions of testing to validate that answer rather than to understand all relevant data. This type of question runs a high risk of introducing bias into the testing process. Exploratory questions seek to investigate rather than corroborate. They want information more than endorsement.
Rhetorical question: “Why wouldn’t revenue increase when prices increase?”
Exploratory question: “What impact do changes in price have on the business?”
- Vague questions vs. clear questions. In contrast to a rhetorical question that already believes it has the answer, vague questions have an unclear purpose or direction. They are non-specific in a way that leaves too much room for confusion. Clear questions are concise and focused.
Vague question: “Do consumers like our new loyalty program?”
Clear question: “Do customers in group A respond to loyalty efforts better than those in group B?”
- Leading questions vs. open-ended questions. Leading questions are designed to guide the answer to a desired end. Open-ended questions, while still specific, are neither targeted nor presumptive. They leave the door open for an unexpected – and sometimes even undesired – response.
Leading question: “Why do consumers prefer a reduced wait time?” This presupposes consumer preference.
Open-ended question: “How does a reduced wait time at checkout impact customer satisfaction?”
Statistics and the quality of your answers
Once the question has been refined, the next step is ensuring that the answers are robust, reliable, and accurate. For data science in retail, consider the following factors:
- Representativeness. How well does the group being tested represent the rollout group? To prevent bias, the test sample must accurately represent the rollout group. If the rollout group is primarily from one socio-economic background, but you test a sample from another socio-economic background, the test results are unlikely to align with the rollout results.
- Control. To know whether an observed lift is the result of the test changes rather than outside factors, a comparable control group is essential.
- Confidence. The number of test stores affects statistical significance. Having the right testing methodologies in place helps determine how many stores are needed for high confidence.
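To make the link between store count and confidence concrete, here is a minimal sketch of a standard sample-size approximation for comparing a test group with a control group. It is an illustration, not a description of any particular testing platform: the function name `stores_per_group`, the default 5% significance level and 80% power, and the dollar figures in the usage lines are all assumptions chosen for the example. It uses the normal approximation n = 2 × ((z_alpha + z_power) × σ / δ)², where σ is the store-to-store standard deviation of the metric and δ is the smallest lift worth detecting.

```python
import math
from statistics import NormalDist


def stores_per_group(expected_lift, std_dev, alpha=0.05, power=0.80):
    """Approximate number of stores needed in EACH of the test and control groups.

    expected_lift: smallest difference in the metric worth detecting (assumed input)
    std_dev:       store-to-store standard deviation of that metric (assumed input)
    alpha:         two-sided significance level
    power:         probability of detecting the lift if it really exists
    """
    if expected_lift <= 0 or std_dev <= 0:
        raise ValueError("expected_lift and std_dev must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    n = 2 * ((z_alpha + z_power) * std_dev / expected_lift) ** 2
    return math.ceil(n)


# Hypothetical figures: weekly sales vary by about $1,000 between stores,
# and we want to detect a lift of $500 or of $250.
print(stores_per_group(expected_lift=500, std_dev=1000))
print(stores_per_group(expected_lift=250, std_dev=1000))
```

Halving the lift you want to detect roughly quadruples the number of stores required, which is why the size of the expected effect should be part of the hypothesis before the test begins.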
When it comes to data science in the retail industry, therefore, the question should not be either-or. The answer to supporting consumer needs is not based on analytics or statistics. It is based on analytics and statistics. Kozyrkov concludes, “Choosing between good questions and good answers is painful . . . so if you can afford to work with both types of data professional, then hopefully it’s a no-brainer.” The same applies to retail data science software – if you can find a platform that integrates both quality questions and quality answers, then hopefully it’s a no-brainer.
Synthesis and the quality of your decisions
Once questions have been asked and answered, the next step in the data science process is making quality decisions. Data science and decision science are also often segregated into two camps. But these disciplines are interconnected. Data science problems in retail often stem from placing data into a silo. Rather than considering the full scope of information, decision-makers isolate the data and fail to integrate it into applicable contexts.
For example, a sales increase in one category or basket sometimes comes at the expense of other categories or baskets. If the decision to roll out the initiative focuses only on that single category, the initiative may lead to overall sales losses. Likewise, if the data shows that one geographic area is receptive to a change and other areas are not, yet the rollout is conducted fleetwide, that decision could fail.
The best decisions take all the information into account. They consider the people involved, the resources needed and available, the initial knowledge base, the hypothesis, the results, the domino effect of those results, the ROI, and the best forum for rollout success. Just as integrating analytics and statistics is a no-brainer, having a dedicated client-success team as part of your decision-making process is a no-brainer when it comes to data processing and synthesis.
Perhaps the most important part of the process, however, is recognizing that scientific discovery is not linear; it’s cyclical. The best hypotheses, answers, and decisions generate more questions, leading to further evolution of knowledge and understanding.
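The two examples above come down to simple arithmetic that is easy to skip. The sketch below shows one way to check it; the function names, category names, region names, and dollar amounts are all hypothetical and exist only for illustration.

```python
def net_lift(lift_by_category):
    """Total change in sales across every category affected by the initiative."""
    return sum(lift_by_category.values())


def receptive_regions(lift_by_region):
    """Regions where the measured lift is positive, sorted by name."""
    return sorted(region for region, lift in lift_by_region.items() if lift > 0)


# Hypothetical weekly lift in dollars, measured against a control group.
category_lift = {"snacks": 1200.0, "beverages": -900.0, "prepared_food": -500.0}
region_lift = {"north": 350.0, "south": -120.0, "west": 80.0}

print(net_lift(category_lift))         # snacks alone look like a win; the total does not
print(receptive_regions(region_lift))  # candidates for a targeted rollout
```

With these made-up numbers, the snacks category gains $1,200 while the store as a whole loses $200, and only two of the three regions would justify a rollout.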
Effective data science in the retail industry
As COVID quarantines wound down, a convenience store rolled out a new fuel pay app. Although they had put effort and money into the design, the lift in sales was not what they had initially hoped for, and they began to second-guess themselves. It was also difficult to tell whether the lift they were seeing was the result of fewer quarantines or the result of app usage. They started with a clear, cohesive question: What is impacting sales outcomes? The app, lifted quarantines, or both? Then they used effective testing tools to sift through historical data and identify a representative sample and control group based on pre-pandemic user behavior. With this methodology in place, they were able to separate the effect of post-quarantine behavior from the effect of app usage. They learned that the app was being utilized, and they made the decision to continue the rollout, a decision that paid off for both the company and the consumer. The new app met a consumer need, and that resulted in higher revenue for the company.
On a sheet of paper, consumers can be seen as just numbers. But data science in the retail industry transforms those numbers back into people – people with quantifiable needs. It asks the right questions to understand consumer behavior. It utilizes effective methodologies and software solutions to achieve high statistical confidence in results. Then it looks at the complete picture to make decisions that provide consumers with productive solutions and enrich retail success.
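The case study does not say which method the retailer used to separate the two effects. One common approach to this kind of problem is a difference-in-differences comparison: measure the change in the test group, measure the change in a comparable control group over the same period, and subtract. The sketch below illustrates the idea only; the function name `diff_in_diff` and every sales figure are invented for the example.

```python
from statistics import mean


def diff_in_diff(test_before, test_after, control_before, control_after):
    """Change in the test group minus change in the control group.

    The control group's change stands in for everything that affected all
    stores in the period (for example, quarantines ending), so the remainder
    is an estimate of the effect of the change being tested.
    """
    test_change = mean(test_after) - mean(test_before)
    control_change = mean(control_after) - mean(control_before)
    return test_change - control_change


# Hypothetical average weekly fuel sales (in thousands of dollars) per store.
app_stores_before = [100.0, 110.0, 90.0]
app_stores_after = [130.0, 140.0, 120.0]
control_stores_before = [100.0, 105.0, 95.0]
control_stores_after = [120.0, 125.0, 115.0]

effect = diff_in_diff(
    app_stores_before, app_stores_after,
    control_stores_before, control_stores_after,
)
print(effect)
```

In this invented data, sales at the app stores rose by 30 and sales at the control stores rose by 20, so roughly 10 of the lift would be attributed to the app and the rest to the general recovery. The estimate is only as good as the match between the test and control groups, which is why the representativeness and control factors described earlier matter.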
If you are interested in learning more about effective data science testing solutions for brick-and-mortar retailers, check out these links:
The surprising power of a testable hypothesis
How to create a statistically valid test
Minimize bias and maximize your testing results
Unlock the value of your data with A/B testing