Automating machine learning for platform fraud detection

July 23, 2015 Risk & Fraud
By Jun He

When you’re where the money is, bad guys will target you. It’s why criminals rob banks despite armed guards, reinforced vaults, and state-of-the-art security systems. And it’s why fraudsters now target platforms, either to gain access to their payments data or to run stolen credit card numbers through them.

Detection is the name of the game: you have to be able to spot fraud with a high degree of accuracy so that you can shut it down before it results in a loss. That means technology and people. It means rules engines that flag suspicious behavior and experienced risk professionals who can then dig into those suspicious behaviors. At WePay, it increasingly also means machine learning models that can spot complicated fraud patterns faster and with less human intervention.

This is something we talked a bit about a few months ago, in a blog post about machine learning and shell selling. Today, we’re looking at another challenge we face as we use machine learning to fight fraud in the real world: adapting our models quickly enough to keep up with the attacks we face.

Fraud doesn’t stand still

In payments, we tend to talk about fraudsters as a monolith and about fraud as a natural law: “fraudsters do this” or “fraud was at this level this month.” It’s often easier to think of fraud this way.

Yet the reality is that fraud isn’t perpetrated by a small, tight-knit group of criminals; rather, it’s often the work of a huge range of individuals all over the world, working alone and in groups. As a class, these fraudsters are smart and resourceful, and the fraud they perpetrate is born of working full time to find and exploit weaknesses in the platforms they target. They aren’t just mindlessly repeating the same behaviors over and over; they are probing defenses and adapting to them. Plug one hole, and they’ll find another.

Said another way: fraud is constantly changing.

Machine learning models are great for spotting fraud, but they aren’t psychic: they rely on past data to make predictions about the transactions they’re currently looking at. Because fraud patterns keep shifting, the models go out of date quickly.

In fact, experience has taught us that even the most accurate model has a shelf life of about a month if it isn’t refreshed. After that month, its accuracy may drop by 50%, and it will continue to decline slowly from there.

Refreshing models is hard

So if models don’t last long, the key is to refresh them constantly. But this approach presents its own challenges.

Each refresh is a major undertaking if not properly managed. Retraining a model by running the full machine learning pipeline can take hours. The pipeline includes extraction, transformation, and loading (ETL) of incremental new data, feature creation and engineering, model training, performance evaluation, and model deployment.
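One common way to keep the ETL step incremental is a high-water mark: each daily run pulls only rows newer than the last timestamp it saw. The sketch below assumes a hypothetical `transactions` table with an ISO 8601 `created_at` column (so string comparison matches time order), uses an in-memory SQLite database in place of a real data warehouse, and introduces a hypothetical helper, `extract_incremental`. It illustrates the idea only and is not WePay’s ETL code.

```python
import sqlite3


def extract_incremental(conn, watermark):
    """Return rows newer than `watermark` plus the new high-water mark.

    Hypothetical schema: transactions(id, created_at, amount, is_fraud).
    Only rows created after the previous run are pulled, so each daily
    job reads new data instead of re-reading the full history.
    """
    rows = conn.execute(
        "SELECT id, created_at, amount, is_fraud FROM transactions "
        "WHERE created_at > ? ORDER BY created_at",
        (watermark,),
    ).fetchall()
    # Advance the watermark to the newest row seen; keep it if nothing is new.
    new_watermark = rows[-1][1] if rows else watermark
    return rows, new_watermark


# Demo data: an in-memory database standing in for the real source system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(id INTEGER PRIMARY KEY, created_at TEXT, amount REAL, is_fraud INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, "2015-07-20T10:00:00", 25.0, 0),
        (2, "2015-07-21T11:30:00", 990.0, 1),
        (3, "2015-07-22T09:15:00", 40.0, 0),
    ],
)

new_rows, mark = extract_incremental(conn, "2015-07-20T23:59:59")
print([r[0] for r in new_rows], mark)
```

A strict `>` comparison can miss rows that arrive late with a timestamp equal to the watermark; real extracts often re-read a small overlap window and de-duplicate by `id`.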

Even after the model is trained, deployment can itself be a bottleneck. When models are trained in traditional machine learning languages like R or SAS, the models and the whole data pipeline have to be translated into a production language like Java, which can add weeks of engineering work. Some companies reduce the burden by opting for simpler models like logistic regression; however, such an approach is too rudimentary for complex, high-stakes fraud detection.

There’s also one more complicating factor: the newest data might not be the most useful for model training, because new fraud can take time to surface. It can often take two or more months for a cardholder to notice and report fraud, so a new transaction can be labeled good before it’s identified as fraud. Training models on the latest data can therefore actually hurt model accuracy.

Automation: how we hit a moving target

In the face of these challenges, WePay has been able to build a machine learning architecture that allows us to refresh the models daily without overburdening our engineering team or degrading our performance. We’ve done this by being smart about automating services and building with an eye toward speed and precision.

First, we’ve automated the retraining process to the point that it can be performed daily with relatively little work by our data science team. Our retraining process looks like this:

  • Pull new, incremental retraining data daily
  • Refresh the model by retraining it on the combined new and existing fraud data
  • Test the new models, evaluating each on area under the ROC curve (AUC), precision, and recall
  • Transfer models that meet initial test criteria into a pseudo-production environment for additional assessment against test cases
  • Deploy upon satisfactory completion of all performance and test case validation
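The test step in this list can be sketched as a simple promotion gate built on scikit-learn’s `roc_auc_score`, `precision_score`, and `recall_score`. The threshold values, the 0.5 decision cutoff, and the `evaluate` and `passes_gate` helpers are hypothetical placeholders, not the criteria WePay actually uses.

```python
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Hypothetical promotion thresholds; real values would be set by the risk team.
MIN_AUC = 0.85
MIN_PRECISION = 0.50
MIN_RECALL = 0.60


def evaluate(y_true, fraud_scores, threshold=0.5):
    """Score a candidate model on a held-out set (1 = fraud, 0 = good)."""
    # Turn fraud probabilities into hard decisions for precision and recall.
    y_pred = [1 if s >= threshold else 0 for s in fraud_scores]
    return {
        "auc": roc_auc_score(y_true, fraud_scores),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
    }


def passes_gate(metrics):
    """True if the candidate may move on to the pseudo-production stage."""
    return (
        metrics["auc"] >= MIN_AUC
        and metrics["precision"] >= MIN_PRECISION
        and metrics["recall"] >= MIN_RECALL
    )


# Demo: labels and scores for a tiny, made-up holdout set.
holdout_labels = [0, 0, 0, 1, 1, 0, 1, 0]
candidate_scores = [0.1, 0.2, 0.3, 0.9, 0.8, 0.4, 0.7, 0.6]
metrics = evaluate(holdout_labels, candidate_scores)
print(metrics, "promote" if passes_gate(metrics) else "reject")
```

A candidate that fails the gate simply stays out of the pseudo-production environment, and the current production model keeps serving.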

Python makes automation easy

In order to make model deployment easier, we chose Python as both our model development and production language. Although many machine learning researchers work in more specialized scientific languages like R and SAS, we’ve found these to be a poor fit. I don’t want this to seem like a knock on R and SAS, which are great at what they do. But using models trained in a specialized language would mean translating them into a production-friendly language like Java before they were deployed — an extra step that creates significant engineering overhead and introduces another opportunity for errors that could degrade our model performance.

Python, on the other hand, works both in development and in production. With Flask or Django, machine learning models can run as isolated web services, which can be spun up dynamically to handle more requests.
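Here is a minimal sketch of a model running as an isolated Flask web service. To keep it self-contained, `StandInModel` is a hypothetical placeholder that only mimics scikit-learn’s `predict_proba()` interface; the `/score` route and the `amount` and `account_age_days` fields are likewise assumptions made for illustration.

```python
from flask import Flask, jsonify, request


class StandInModel:
    """Hypothetical stand-in for a trained scikit-learn classifier.

    A real service would load the model file produced by the training
    pipeline; this stub only mimics the predict_proba() interface.
    """

    def predict_proba(self, rows):
        # Toy rule: larger amounts look riskier. Returns [P(good), P(fraud)] per row.
        out = []
        for amount, _account_age_days in rows:
            p = min(amount / 1000.0, 1.0)
            out.append([1.0 - p, p])
        return out


app = Flask(__name__)
model = StandInModel()  # loaded once per process and reused for every request


@app.route("/score", methods=["POST"])
def score():
    payload = request.get_json()
    # Hypothetical feature order expected by the model.
    features = [[payload["amount"], payload["account_age_days"]]]
    fraud_probability = model.predict_proba(features)[0][1]
    return jsonify({"fraud_probability": fraud_probability})
```

Because each process holds its own copy of the model and keeps no per-request state, more instances can be started behind a load balancer as traffic grows. In deployment, the app would typically be served by a WSGI server such as gunicorn rather than Flask’s development server.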

Python is also a very capable machine learning language in its own right. Most necessary functionality is provided by powerful libraries like NumPy, SciPy, scikit-learn, and pandas. At WePay, we use scikit-learn, which implements most popular algorithms through a uniform API. Models trained with scikit-learn can be treated as black boxes and used anywhere, in the same way, regardless of the algorithm.
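To illustrate that uniform API, the sketch below trains three different scikit-learn classifiers with identical `fit()` and `predict_proba()` calls and scores each with `roc_auc_score`. The data comes from `make_classification`, a synthetic, imbalanced stand-in for labeled transactions; the model choices and parameters are arbitrary examples, not WePay’s models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for labeled transactions (about 10% "fraud").
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in candidates.items():
    # The same two calls work for every algorithm: fit(), then predict_proba().
    model.fit(X_train, y_train)
    fraud_prob = model.predict_proba(X_test)[:, 1]  # column 1 = P(class 1)
    scores[name] = roc_auc_score(y_test, fraud_prob)

for name, auc in sorted(scores.items()):
    print(f"{name}: AUC = {auc:.3f}")
```

Swapping algorithms only changes the line that constructs the estimator; the training, scoring, and evaluation code stays the same, which is what lets the rest of the pipeline treat a model as a black box.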

In other words, the entire model development and deployment cycle is self-contained in Python. Just copy the model files to the production instance, import the same libraries in production as in development, and you are almost good to go!
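Copying “model files” usually means serializing the trained estimator on the development side and loading it in production. A minimal sketch with Python’s standard `pickle` module (scikit-learn’s documentation also describes `joblib` for this); the file name `fraud_model.pkl` is hypothetical, and a temporary directory stands in for wherever model artifacts are actually shared.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Development side: train a model and serialize it to a file.
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
dev_model = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

model_path = os.path.join(tempfile.mkdtemp(), "fraud_model.pkl")  # hypothetical name
with open(model_path, "wb") as f:
    pickle.dump(dev_model, f)

# Production side: load the copied file. The same library versions should be
# installed, since pickled scikit-learn models are not guaranteed to load
# correctly under a different scikit-learn version.
with open(model_path, "rb") as f:
    prod_model = pickle.load(f)

same = np.array_equal(dev_model.predict_proba(X), prod_model.predict_proba(X))
print("identical predictions:", same)
```

Only load model files from a trusted source: unpickling can execute arbitrary code.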

Of course, migration from development to production may still involve conversion of data pipeline code, depending on how the machine learning pipeline is designed. With that in mind, we use a unified Python code base for both model development and production.

Basically, every step in the data processing pipeline is abstracted as a class. Those classes can be run both for batch processing a file in development and for scoring a single record in production, without changing a single line of code.

This means there is almost zero migration of code from model development to production. In practice, the only material difference is whether we loop through many lines of a file in development, or just do it once for a single line of signals in production.
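The step-as-a-class idea can be sketched in plain Python. The class names and the two engineered features below are hypothetical examples, not WePay’s actual pipeline; the point is that `Pipeline.transform()` handles one record and `transform_many()` simply loops over it, so batch development and single-record production run the same code.

```python
import csv
import io
import math


class Step:
    """One pipeline stage: maps a record (dict of raw signals) to a new record."""

    def transform(self, record):
        raise NotImplementedError


class ParseAmount(Step):
    def transform(self, record):
        out = dict(record)
        out["amount"] = float(record["amount"])
        return out


class LogAmount(Step):
    def transform(self, record):
        out = dict(record)
        # Hypothetical engineered feature: compress the heavy-tailed amount scale.
        out["log_amount"] = math.log1p(out["amount"])
        return out


class IsNewAccount(Step):
    def transform(self, record):
        out = dict(record)
        # Hypothetical engineered feature: flag accounts younger than a week.
        out["is_new_account"] = int(int(record["account_age_days"]) < 7)
        return out


class Pipeline:
    def __init__(self, steps):
        self.steps = steps

    def transform(self, record):
        # Single-record path, as used for real-time scoring in production.
        for step in self.steps:
            record = step.transform(record)
        return record

    def transform_many(self, records):
        # Batch path for development: the same code, applied in a loop.
        return [self.transform(r) for r in records]


pipeline = Pipeline([ParseAmount(), LogAmount(), IsNewAccount()])

# Development: batch-process a file (an in-memory CSV here).
raw_csv = "amount,account_age_days\n25.00,400\n990.00,2\n"
batch = pipeline.transform_many(csv.DictReader(io.StringIO(raw_csv)))

# Production: process one incoming record with exactly the same objects.
single = pipeline.transform({"amount": "990.00", "account_age_days": "2"})
print(single)
```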

Putting it all together

We’re able to do daily, incremental refreshes of the models running in production thanks to the efficiencies provided by our approach to machine learning. These refreshes add up to a constantly refined machine learning system, with total retraining happening over a time window of a few months.

Of course, there’s still the matter of how to deal with false negatives in the most recent training sets — recall that we may not know a given transaction is fraud for several months.

This is actually an easier fix than you might expect. When we’re training our models, we simply exclude transactions flagged as good in the most recent time period while including every transaction flagged as fraud that we can. This lets us train on data that includes the most recent fraud patterns while also not contaminating our model with bad data.
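That filtering rule is short enough to sketch directly. Here, `LABEL_MATURITY` is a hypothetical 60-day window chosen to match the two-or-more-month reporting delay mentioned earlier, `select_training_rows` is a hypothetical helper, and transactions are plain dicts with a `date` and an `is_fraud` flag; a real system would more likely apply the same rule inside its training-data query.

```python
from datetime import date, timedelta

# Hypothetical maturity window: how old a "good" label must be before we trust it.
LABEL_MATURITY = timedelta(days=60)


def select_training_rows(transactions, today):
    """Keep all fraud, but drop "good" rows that are too recent to trust.

    Each transaction is a dict with a 'date' (datetime.date) and an
    'is_fraud' flag (1 = confirmed fraud, 0 = currently labeled good).
    """
    cutoff = today - LABEL_MATURITY
    selected = []
    for tx in transactions:
        if tx["is_fraud"] == 1:
            selected.append(tx)  # confirmed fraud is always usable, however recent
        elif tx["date"] <= cutoff:
            selected.append(tx)  # old enough that a "good" label is likely final
    return selected


today = date(2015, 7, 23)
history = [
    {"id": 1, "date": date(2015, 4, 1), "is_fraud": 0},
    {"id": 2, "date": date(2015, 4, 2), "is_fraud": 1},
    {"id": 3, "date": date(2015, 7, 20), "is_fraud": 0},  # too recent: dropped
    {"id": 4, "date": date(2015, 7, 21), "is_fraud": 1},  # recent fraud: kept
]
print([tx["id"] for tx in select_training_rows(history, today)])  # → [1, 2, 4]
```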


Automation has not only made our fraud detection better by enabling the models to keep up with the constantly changing attacks aimed at us; it has also made life a lot easier in the data science department. We’re spending less time on routine training and deployment tasks, which has freed us up for more interesting, higher-value work. And that’s key because, again, fraud doesn’t stand still. If we’re to succeed in fighting fraud and protecting our customers’ money, we must constantly work to improve our approach, explore new techniques, and build new systems that let us tackle newer and more sophisticated attacks.

About the author

Jun He

Jun He was previously Lead Data Scientist at WePay. He is now Principal Data Scientist at Walmart Labs.
