With the growth of ecommerce platforms like Shopify, the number of small- and medium-sized ecommerce businesses is rising at an impressive rate. But this growth also creates market opportunities for online villains and fraudsters looking to make a quick buck. It used to be that only huge corporations had the resources to detect fraud and protect themselves from its damages. In this era of big data and data science for all, though, even small mom-and-pop ecommerce shops have access to the tools they need to protect themselves from fraudsters. This article introduces some common sources of fraud in ecommerce and shows how you can use data science technologies and techniques to protect your business (or soon-to-be business) from risk.
Fraud vulnerabilities in the ecommerce space
At first glance, it’s somewhat difficult to imagine the types of fraud to which a typical ecommerce business is exposed. I mean, you really don’t hear much about ‘ecommerce fraud’, do you? Well, don’t let the silence fool you. As Elli Bishop wrote in a guest post on the Kissmetrics blog, online fraud caused $3.5 billion in damages to the ecommerce industry in 2012 alone, and the damages are increasing every year. Let’s take a look at the ways in which fraudsters are defrauding honest, upstanding online merchants.
Card-Not-Present (CNP) fraud
One serious problem in ecommerce is Card-Not-Present (CNP) fraud. CNP fraud is a type of credit card transaction fraud in which fraudsters typically use stolen card numbers to make online purchases. Each time a transaction like this goes undetected, the selling merchant is held financially responsible for refunding the fraudulent credit card charge and also absorbs the loss of the merchandise already shipped to the fraudster. Since CNP fraud represents a double loss to selling merchants, most merchants want to do everything they can to prevent fraudulent CNP transactions.
Account Takeovers
Ecommerce businesses are also exposed to fraud associated with account takeovers. An account takeover occurs when fraudsters steal account credentials and then use those credentials to unlawfully log in to clients’ accounts and make fraudulent purchases. As with CNP fraud, ecommerce merchants are left liable for the cost of the merchandise that was shipped out to the fraudster, plus the expense of reimbursing customers for the fraudulent charges accrued during the account takeover.
Data-science-driven fraud prevention add-ons for use in small business ecommerce
Yikes, ecommerce sounds like it can be risky business, right? Well, that’s not entirely untrue. This is why it’s extremely important to make wise decisions when it comes to your ecommerce solution provider. Ecommerce is hot right now, so new vendors are popping up left and right. While some of these vendors offer very competitively priced packages, more mature vendors offer you an array of options for the support, analytics integrations, and best-in-breed ecommerce security add-ons that you need to keep your business safe, secure, and protected.
Sift Science and Feedzai are two excellent fraud detection and prevention add-ons for ecommerce businesses. They’re available to all ecommerce businesses that run on the Shopify ecommerce platform. Since Shopify has been in the ecommerce game longer than most other solution providers, they’ve had time to put together a solid suite of support offerings. In fact, Shopify offers over 800 add-on applications that its customers can purchase to make their lives easier and more worry-free.
For ecommerce fraud detection and prevention, the Feedzai add-on application is an excellent selection. Feedzai prides itself on delivering powerful machine learning, data science, and big data solutions to small- and medium-sized businesses. Feedzai’s star offering is fraud prevention software that runs on its proprietary risk and fraud detection engine. While Feedzai of course isn’t giving away its secret sauce, it has published a white paper that at least clarifies its basic methods. To summarize the white paper in a few words, Feedzai combines behavioral modeling and profiling, machine learning clustering algorithms, and a rule engine to detect and prevent ecommerce fraud for businesses that run on the Shopify platform.
Ecommerce fraud detection with time-series anomaly detection methods
Realistically, you can’t expect to detect and prevent online fraud simply by deploying a few simple machine learning or statistical algorithms. Ecommerce fraud is a lot more complicated than that: you should expect to incorporate behavioral modeling, a rule engine, and some solid domain expertise to even begin moving towards a solution that will work. That said, machine learning and statistical algorithms are one essential ingredient.
A simple approach to detecting anomalies in time series
So what types of algorithms are useful for detecting and preventing ecommerce fraud? That’s a good question, and there are many options depending on the approach you’d like to take. You can use time series anomaly detection algorithms to automatically detect suspicious or unusual events and trends as they occur. If your time series is periodic, then you’re likely to get good results by applying an aggregating window function and then running a k-nearest neighbor algorithm on the aggregated windows. In R, you can use the following two families of window functions to aggregate your time series data.
- Cumulative aggregate window functions in the dplyr package
- Rolling aggregate window functions in the RcppRoll package
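To make the two families a little more concrete, here is a minimal sketch that applies one cumulative aggregate and one rolling aggregate to a short vector of hourly order counts. The data and variable names are invented for illustration, and the snippet assumes that the dplyr and RcppRoll packages are already installed.

```r
library(dplyr)     # provides cummean()
library(RcppRoll)  # provides roll_meanr()

# Hypothetical hourly order counts; the spike at position 7 is made up
orders <- c(12, 15, 11, 14, 13, 12, 60, 14)

# Cumulative aggregate: running mean of everything seen so far
running_mean <- cummean(orders)

# Rolling aggregate: mean of the current value and the two before it
# (right-aligned, so the first two positions are NA)
rolling_mean <- roll_meanr(orders, n = 3)

print(data.frame(orders, running_mean, rolling_mean))
```

The rolling mean reacts to the spike much more sharply than the cumulative mean does, which is one reason rolling windows tend to be the more natural choice when you care about recent behavior.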
As a general approach, you’d first divide your time series into windows, and then use a similarity function to calculate an anomaly score for each window. A window that looks unlike every other window gets a high score. In this manner, you can automate anomaly detection in a time series.
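The windowing idea can be sketched in a few lines of base R. This is only one possible reading of the approach, under some assumptions: the series is synthetic, the period is known in advance, Euclidean distance stands in for the similarity function, and the anomaly score is simply each window’s distance to its nearest other window (k = 1).

```r
set.seed(42)

# Synthetic periodic series (invented data): 10 windows of 24 points each
period    <- 24
n_windows <- 10
series <- rep(sin(2 * pi * (1:period) / period), times = n_windows) +
  rnorm(period * n_windows, sd = 0.1)

# Inject a disturbance into the first 5 points of window 7
idx <- (6 * period + 1):(6 * period + 5)
series[idx] <- series[idx] + 3

# One row per window, one column per position within the period
windows <- matrix(series, ncol = period, byrow = TRUE)

# Pairwise Euclidean distances between windows
d <- as.matrix(dist(windows))
diag(d) <- Inf  # ignore each window's distance to itself

# Anomaly score: distance to the nearest other window
score <- apply(d, 1, min)
print(round(score, 2))
print(which.max(score))  # index of the most unusual window
```

With this setup the disturbed window is the one with the highest score. On real data you would still need to pick a threshold for what counts as suspicious, and that is where domain expertise comes in.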
Using k-nearest neighbor algorithms in R
And, just in case you have never used R before, here is a quick intro to get you going in a hurry. The easiest way to get set up is to install R itself and then download and install the RStudio IDE, which runs on top of it. The k-nearest neighbor algorithm is in R’s ‘class’ package. You can just copy and paste the sample code below to start playing around with classifying data using R’s knn() function.
Example 1
    # Window A cases
    A1 <- c(6, 6)
    A2 <- c(5.5, 7)
    A3 <- c(6.5, 5)

    # Window B cases
    B1 <- c(9, 8)
    B2 <- c(2.2, 2.5)
    B3 <- c(100, 0)

    # Window C cases
    C1 <- c(0, 0)
    C2 <- c(1, 1)
    C3 <- c(2, 2)

    # Build a classification matrix from the points in each of the windows
    train <- rbind(A1, A2, A3, B1, B2, B3, C1, C2, C3)

    # Window labels vector (attached to each class instance)
    cl <- factor(c(rep("A", 3), rep("B", 3), rep("C", 3)))

    # Specify the object to be classified, i.e., the test case
    test <- c(2, 98)

    # Load the class package so that you can access the knn() function
    library(class)

    # Call the knn() function and get its summary
    summary(knn(train, test, cl, k = 1))
The output indicates that the test case has been classified as belonging to Window B, because its single nearest neighbor is a Window B point.
A B C
0 1 0
Example 2
    # Window A cases
    A1 <- c(6, 6)
    A2 <- c(5.5, 7)
    A3 <- c(6.5, 5)

    # Window B cases
    B1 <- c(9, 8)
    B2 <- c(2.2, 2.5)
    B3 <- c(100, 0)

    # Window C cases
    C1 <- c(0, 0)
    C2 <- c(1, 1)
    C3 <- c(2, 2)

    # Build a classification matrix from the points in each of the windows
    train <- rbind(A1, A2, A3, B1, B2, B3, C1, C2, C3)

    # Window labels vector (attached to each class instance)
    cl <- factor(c(rep("A", 3), rep("B", 3), rep("C", 3)))

    # The object to be classified, i.e., the test case
    test <- c(6, 6.2)

    # Load the class package so that you can access the knn() function
    library(class)

    # Call the knn() function and get its summary
    summary(knn(train, test, cl, k = 1))
The output indicates that the test case has been classified as belonging to Window A, because its single nearest neighbor is a Window A point.
A B C
1 0 0
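One caveat worth seeing for yourself: knn() always returns a label, even when the test case is nowhere near any of the training points. The sketch below reuses the same toy data with the test case from Example 1 and compares k = 1 against k = 3. It also computes the distance to the nearest training point, which is an extra check added here and not something knn() reports. The variable names pred_k1, pred_k3, and nearest are made up for this illustration.

```r
library(class)

# Same toy points as in the examples: rows 1-3 are Window A,
# rows 4-6 are Window B, rows 7-9 are Window C
train <- rbind(c(6, 6), c(5.5, 7), c(6.5, 5),
               c(9, 8), c(2.2, 2.5), c(100, 0),
               c(0, 0), c(1, 1), c(2, 2))
cl   <- factor(rep(c("A", "B", "C"), each = 3))
test <- c(2, 98)

# Classify with 1 neighbor, then with 3 neighbors and the winning vote share
pred_k1 <- knn(train, test, cl, k = 1)
pred_k3 <- knn(train, test, cl, k = 3, prob = TRUE)

# Euclidean distance from the test case to its closest training point
nearest <- min(sqrt(rowSums(sweep(train, 2, test)^2)))

print(pred_k1)
print(pred_k3)
print(nearest)
```

With k = 3 the majority vote shifts from Window B to Window A, and the nearest training point is about 90 units away. A label that flips with k, combined with a large nearest-neighbor distance, suggests that the test case doesn’t really belong to any window, and that is exactly the kind of case an anomaly score is meant to flag.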