
Data Science and Machine Learning for Preventing Fraud in Mom and Pop Ecommerce Shops


Lillian Pierson, P.E.


With the growth of ecommerce platforms like Shopify, the number of small- and medium-sized ecommerce businesses is rising at an impressive rate. But this growth also creates market opportunities for online fraudsters who are looking to make a quick buck. It used to be that only huge corporations had the resources they needed to detect fraud and protect themselves from its damages. In this era of big data and data science for all, though, even small mom and pop ecommerce shops have access to the tools they need to protect themselves. This article introduces some common sources of fraud in ecommerce, and explains how you can use data science technologies and techniques to protect your business (or soon-to-be business) from risk.

Fraud vulnerabilities in the ecommerce space

At first glance, it’s somewhat difficult to imagine the types of fraud to which a typical ecommerce business is exposed. I mean, you really don’t hear much about ‘ecommerce fraud’, do you? Well, don’t let the silence fool you. As Elli Bishop wrote in a guest post on the Kissmetrics blog, online fraud caused $3.5 billion in damages to the ecommerce industry in 2012 alone, and the damages are increasing every year. Let’s take a look at the ways in which fraudsters are defrauding honest, upstanding online merchants.

Card-Not-Present (CNP) fraud

One serious problem in ecommerce is Card-Not-Present (CNP) fraud. CNP fraud is a type of credit card transaction fraud in which fraudsters typically use stolen card numbers to make online purchases. Each time a transaction like this goes undetected, the selling merchant is held financially responsible for refunding the fraudulent credit card charge, and also absorbs the loss of the merchandise that has already shipped to the fraudster. Since CNP fraud represents a double loss to selling merchants, most merchants want to do everything they can to prevent fraudulent CNP transactions.

Account Takeovers

Ecommerce businesses are also exposed to fraud problems associated with account takeovers. Account takeovers occur when fraudsters steal account credentials and then use those credentials to unlawfully log in to clients’ accounts and make fraudulent purchases. As with CNP fraud, ecommerce merchants are left liable for the cost of the merchandise that was shipped out to the fraudster, plus the expense of reimbursing customers for the fraudulent charges that were wrongfully accrued in the account takeover.

Data-science-driven fraud prevention add-ons for use in small business ecommerce

Yikes, ecommerce sounds like it can be risky business, right? Well, that’s not entirely untrue. This is why it’s extremely important to make wise decisions when it comes to your ecommerce solution provider. Ecommerce is hot right now, so new vendors are popping up left and right. While some of these vendors offer very competitively priced packages, more mature vendors offer you an array of options for the support, analytics integrations, and best-in-breed ecommerce security add-ons that you need to keep your business safe, secure, and protected.


Sift Science and Feedzai are two excellent fraud detection and prevention add-ons for ecommerce businesses. They’re available to all ecommerce businesses that run on the Shopify ecommerce platform. Since Shopify has been in the ecommerce game longer than most other solution providers, they’ve had time to put together a solid suite of support offerings. In fact, Shopify offers over 800 add-on applications that its customers can purchase to make their lives easier and more worry-free.

For ecommerce fraud detection and prevention, the Feedzai add-on application is an excellent selection. Feedzai prides itself on its delivery of powerful machine learning, data science, and big data solutions to help small- and medium-sized businesses. Feedzai’s star offering is fraud prevention software that runs on Feedzai’s proprietary risk and fraud detection engine. While Feedzai of course isn’t giving away its secret sauce, the company has published a white paper that at least clarifies its basic methods. To summarize the white paper in a few brief words, Feedzai has combined behavioral modeling and profiling, machine learning clustering algorithms, and a rule engine to detect and prevent ecommerce fraud for businesses that run on the Shopify platform.

Ecommerce fraud detection with time-series anomaly detection methods

It’s not reasonable to expect that you can detect and prevent online fraud simply by deploying a few simple machine learning or statistical algorithms. Ecommerce fraud is a lot more complicated than that. You should expect to incorporate behavioral modeling, a rule engine, and some solid domain expertise to even begin moving towards a solution that will work. That said, machine learning and statistical algorithms are one essential ingredient.
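To make the "rule engine" ingredient a little more concrete, here is a minimal sketch in base R. Everything in it is a hypothetical illustration: the column names, the three rules, and their thresholds are assumptions made up for this example, not rules taken from any vendor's product.

```r
# Hypothetical order records; every column name here is an illustrative assumption
orders <- data.frame(
  order_id         = 1:4,
  amount           = c(35, 900, 60, 1200),
  billing_country  = c("US", "US", "CA", "US"),
  shipping_country = c("US", "RO", "CA", "US"),
  orders_last_hour = c(1, 6, 1, 2),
  stringsAsFactors = FALSE
)

# Each rule is a named function that returns TRUE for suspicious orders
# (thresholds are arbitrary placeholders, not recommended values)
rules <- list(
  country_mismatch = function(o) o$billing_country != o$shipping_country,
  high_velocity    = function(o) o$orders_last_hour >= 5,
  large_amount     = function(o) o$amount > 1000
)

# Apply every rule to every order, then count how many rules each order trips
hits <- sapply(rules, function(rule) rule(orders))
orders$rules_tripped <- rowSums(hits)

# Flag an order for manual review if it trips at least one rule
orders$review <- orders$rules_tripped >= 1

print(orders[, c("order_id", "rules_tripped", "review")])
```

In practice, a rule count like this would likely be just one input that gets combined with model-based scores and a human reviewer's domain knowledge.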

A simple approach to detecting anomalies in time series

So, what types of algorithms are useful for detecting and preventing ecommerce fraud? That’s a good question, and there are many options depending on the approach you’d like to take. You can use time series anomaly detection algorithms to automatically detect suspicious or unusual events and trends as they occur. If your time series is periodic, then you’re likely to get good results by applying an aggregating window function and then following that with a k-nearest neighbor algorithm. In R, you can use window functions to aggregate your time series data.

As a general approach, you would first divide your time series into windows, and then use a similarity function to calculate an anomaly score for each window. In this manner, you can automate anomaly detection for time series.
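The windowing approach can be sketched in base R as follows. This is only an illustration under stated assumptions: the hourly order counts are synthetic, the 24-hour window length is assumed, the per-window features (mean and standard deviation) are one arbitrary choice of aggregation, and the helper name window_scores() is hypothetical. The anomaly score used here is each window's Euclidean distance to its k-th nearest neighboring window.

```r
# Sketch: window-based anomaly scoring for a periodic series (base R only)
set.seed(42)

period    <- 24                      # assumed: one window = one day of hourly counts
n_windows <- 14
hours     <- seq_len(period * n_windows)

# Synthetic hourly order counts with a daily cycle plus noise
orders <- 50 + 20 * sin(2 * pi * hours / period) + rnorm(length(hours), sd = 3)

# Inject an unusual burst into window 10 (a made-up spike for illustration)
burst <- ((10 - 1) * period + 1):(10 * period)
orders[burst] <- orders[burst] + 40

# Step 1: split the series into windows, one row per window
windows <- matrix(orders, nrow = n_windows, byrow = TRUE)

# Step 2: aggregate each window into a small feature vector
features <- cbind(mean = rowMeans(windows), sd = apply(windows, 1, sd))

# Step 3: score each window by its distance to its k-th nearest other window
window_scores <- function(x, k = 3) {
  d <- as.matrix(dist(x))            # pairwise Euclidean distances
  diag(d) <- Inf                     # ignore each window's distance to itself
  apply(d, 1, function(row) sort(row)[k])
}

scores <- window_scores(features, k = 3)
most_anomalous <- unname(which.max(scores))

print(round(scores, 2))
print(most_anomalous)
```

Windows that look like their neighbors get low scores, while a window with an unusual level or spread sits far from every other window and gets a high score. Where to draw the line between "normal" and "suspicious" scores is a judgment call that depends on your data.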

Using k-nearest neighbor algorithms in R

And, just in case you have never used R before, here is a quick intro to get you going in a hurry. The easiest way to set up R on your machine is to install R itself and then download and install the RStudio IDE. The k-nearest neighbor algorithm is in R’s ‘class’ package. You can actually just copy and paste the sample code below to start playing around with classifying data using R’s knn() function.

Example 1

# Window A cases
A1=c(6,6)
A2=c(5.5,7)
A3=c(6.5,5)

# Window B cases
B1=c(9,8)
B2=c(2.2,2.5)
B3=c(100,0)

# Window C cases
C1=c(0,0)
C2=c(1,1)
C3=c(2,2)

# Build a classification matrix from the points in each of the windows
train=rbind(A1,A2,A3, B1,B2,B3, C1,C2,C3)

# Window labels vector (attached to each class instance)
cl=factor(c(rep("A",3),rep("B",3), rep("C",3)))

# Specify the object to be classified, i.e., the test case
test=c(2, 98)

# Load the class package so that you can access the knn() function
library(class)

# Call the knn() function and get its summary
summary(knn(train, test, cl, k = 1))

The output indicates that the test case has been classified as part of the Window B cluster.

A B C
0 1 0
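To see why the test case lands in Window B, it can help to compute the distances by hand. The sketch below rebuilds the Example 1 training matrix (with row names added for readability) and measures the Euclidean distance from the test case to every training point, which is the distance measure that knn() uses.

```r
# Rebuild the training matrix from Example 1, with named rows
train <- rbind(A1 = c(6, 6),   A2 = c(5.5, 7),   A3 = c(6.5, 5),
               B1 = c(9, 8),   B2 = c(2.2, 2.5), B3 = c(100, 0),
               C1 = c(0, 0),   C2 = c(1, 1),     C3 = c(2, 2))
test <- c(2, 98)

# Euclidean distance from the test case to every training point
distances <- sqrt(rowSums(sweep(train, 2, test)^2))
print(round(sort(distances), 2))

# With k = 1, the label of the single closest point wins
nearest <- names(which.min(distances))
print(nearest)
```

The closest point is B1, but it is still roughly 90 units away, far more than the distances between points inside Window A or Window C. With k = 1, knn() always returns some label, so for anomaly detection the nearest-neighbor distance itself is worth watching: a test case that is far from every window may be the unusual event you are looking for.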

Example 2

# Window A cases
A1=c(6,6)
A2=c(5.5,7)
A3=c(6.5,5)

# Window B cases
B1=c(9,8)
B2=c(2.2,2.5)
B3=c(100,0)

# Window C cases
C1=c(0,0)
C2=c(1,1)
C3=c(2,2)

# Build a classification matrix from the points in each of the windows
train=rbind(A1,A2,A3, B1,B2,B3, C1,C2,C3)

# Window labels vector (attached to each class instance)
cl=factor(c(rep("A",3),rep("B",3), rep("C",3)))

# The object to be classified, i.e., the test case
test=c(6, 6.2)

# Load the class package so that you can access the knn() function
library(class)

# Call the knn() function and get its summary
summary(knn(train, test, cl, k = 1))

The output indicates that the test case has been classified as part of the Window A cluster.

A B C
1 0 0
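Both examples use k = 1, so a single training point decides the label. As a hedged variation on Example 2, the sketch below sets k = 3 and passes prob = TRUE, which asks knn() to also report the share of votes won by the winning class. It assumes the same training points as above.

```r
library(class)

# Same training points as in the examples above
train <- rbind(c(6, 6), c(5.5, 7),   c(6.5, 5),    # Window A
               c(9, 8), c(2.2, 2.5), c(100, 0),    # Window B
               c(0, 0), c(1, 1),     c(2, 2))      # Window C
cl   <- factor(rep(c("A", "B", "C"), each = 3))
test <- c(6, 6.2)

# k = 3: the three nearest training points vote on the label;
# prob = TRUE also returns the proportion of votes for the winning class
pred <- knn(train, test, cl, k = 3, prob = TRUE)
vote_share <- attr(pred, "prob")

print(as.character(pred))
print(vote_share)
```

Here the three nearest neighbors are all Window A points, so the vote is unanimous. A low vote share on your own data would suggest that the test case sits between windows and deserves a closer look.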

HI, I’M LILLIAN PIERSON.
I’m a fractional CMO that specializes in go-to-market and product-led growth for B2B tech companies.
Apply To Work Together
If you’re looking for marketing strategy and leadership support with a proven track record of driving breakthrough growth for B2B tech startups and consultancies, you’re in the right place. Over the last decade, I’ve supported the growth of 30% of Fortune 10 companies, and more tech startups than you can shake a stick at. I stay very busy, but I’m currently able to accommodate a handful of select new clients. Visit this page to learn more about how I can help you and to book a time for us to speak directly.
Get Featured

We love helping tech brands gain exposure and brand awareness among our active audience of 530,000 data professionals. If you’d like to explore our alternatives for brand partnerships and content collaborations, you can reach out directly on this page and book a time to speak.

Join The Convergence Newsletter
See what 26,000 other founders, leaders, and operators have discovered from the advanced AI-led growth initiatives, data-driven marketing strategies & executive insights that I only share inside this free community newsletter.