Draft Blog Entries

% fortune -ae paul murphy

A challenge in fraud detection

My wife went to a conference recently where she saw a demonstration of an open-source fraud detection utility called Picalo.

It's very cool - here's what the primary developer, Dr. C. Albrecht at BYU, says about it on the project website:

Picalo is a data analysis application, with focus in fraud detection and data retrieved from corporate databases. It is also the foundation for an automated fraud detection system (see below).
Picalo is currently focused on data analysis for fraud and corruption detection. However, it is an open framework that could actually be used for many different types of data analysis: network logs, scientific data, any type of database-oriented data, and data mining.

Picalo is a front end for applications that detect patterns in data - so if you know the pattern that signals the kind of fraud you're interested in, that should also tell you what data to look at, and therefore functionally what has to go into a Picalo "detectlet" designed to spot it.

For example, the most common kind of bid rigging in the large scale construction industry is thought to be bidder collusion - cases where some of the bidders agree whose turn it is to win and then co-ordinate their bids accordingly. That form of fraud is, I'm told, signalled by a prevalence of constant percentage differences between winning and losing bids - because the guys who expect to lose don't put a lot of effort into their bids and so construct them to differ from the winner's by, on average, the same x%.

In that situation the data you want comes from the detailed cost tables in the bids, and the pattern you're looking for is consistency in percentage difference between winners and losers.
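
To make that pattern concrete, here is a minimal Python sketch of the check. It is not a Picalo detectlet and uses none of Picalo's API; the function names, the thresholds, and the sample figures are all made up for illustration, and it assumes the lowest bid wins each tender.

```python
from collections import defaultdict
from statistics import mean, pstdev


def losing_gap_percentages(tenders):
    """Map each bidder to the percentage gaps between their losing bids
    and the winning bid, one gap per tender they lost."""
    gaps = defaultdict(list)
    for bids in tenders:
        # Assumption: the lowest bid wins the tender.
        winner = min(bids, key=bids.get)
        winning_amount = bids[winner]
        for bidder, amount in bids.items():
            if bidder != winner:
                gap = 100.0 * (amount - winning_amount) / winning_amount
                gaps[bidder].append(gap)
    return dict(gaps)


def suspiciously_constant(gaps, max_spread=0.5, min_losses=3):
    """Return bidders whose losing gaps barely vary, with their mean gap.

    max_spread (standard deviation, in percentage points) and min_losses
    are hypothetical thresholds, not values taken from any real audit."""
    flagged = {}
    for bidder, values in gaps.items():
        if len(values) >= min_losses and pstdev(values) <= max_spread:
            flagged[bidder] = round(mean(values), 2)
    return flagged


# Invented sample data: each dict is one tender, bidder -> bid amount.
tenders = [
    {"A": 100.0, "B": 105.0, "C": 131.0},
    {"A": 210.0, "B": 200.0, "C": 228.0},
    {"A": 300.0, "B": 315.0, "C": 322.0},
    {"A": 400.0, "B": 420.0, "C": 520.0},
]

flagged = suspiciously_constant(losing_gap_percentages(tenders))
print(flagged)
```

In this invented data, bidder B loses three tenders by exactly 5% each time and is the only one flagged; C's gaps vary widely, and A loses only once, which is too few observations to judge. A real check would want far more tenders and a threshold tuned to the industry.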

One of Picalo's greatest strengths derives from the fact that you can load and test a detectlet using the GUI, but then capture everything you need to rerun it in a script for use from a command line - or a crontab. As a result you can think of Picalo as more of a debugging environment for detectlets than a run-time tool, and therefore use it to automate many different kinds of pattern checking.

Now if you think of hardware or software failure as a kind of fraud - after all, whoever made it promised that it would work - you can see the applicability to multiple sysadmin tasks - and not just to already automated things like reviewing your error logs, but to hard-to-do stuff like spotting the user who likes to delete key files and then have other people in her work area phone you to ask that they be recovered.
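
The delete-then-recover pattern above can be sketched the same way. This is only an illustration in plain Python: it assumes you can already extract two event lists, deletions and restore requests, as (user, path, timestamp) tuples, and the record layout, the seven-day window, and the names in the sample data are all hypothetical.

```python
from datetime import datetime, timedelta


def delete_then_restore_counts(deletions, restore_requests,
                               window=timedelta(days=7)):
    """Count, per deleting user, the deletions that were followed within
    `window` by a restore request from somebody else for the same path."""
    counts = {}
    for user, path, deleted_at in deletions:
        for requester, req_path, requested_at in restore_requests:
            elapsed = requested_at - deleted_at
            if (req_path == path and requester != user
                    and timedelta(0) <= elapsed <= window):
                counts[user] = counts.get(user, 0) + 1
                break  # count each deletion at most once
    return counts


# Invented sample events: (user, path, timestamp).
deletions = [
    ("carol", "/proj/budget.xls", datetime(2024, 1, 8, 9, 0)),
    ("carol", "/proj/plan.doc", datetime(2024, 1, 15, 9, 30)),
    ("dave", "/home/dave/notes.txt", datetime(2024, 1, 16, 14, 0)),
]
restore_requests = [
    ("alice", "/proj/budget.xls", datetime(2024, 1, 8, 11, 0)),
    ("bob", "/proj/plan.doc", datetime(2024, 1, 15, 13, 0)),
    ("dave", "/home/dave/notes.txt", datetime(2024, 1, 16, 15, 0)),
]

counts = delete_then_restore_counts(deletions, restore_requests)
print(counts)
```

Here carol's two deletions are each followed by a colleague's restore request, so she shows up with a count of 2, while dave asking for his own file back is ignored. Run regularly over real logs, a user whose count keeps climbing is the one worth a closer look.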

Tomorrow, I'm going to talk about IT bidding and related frauds, but for today I'd like to leave you with this challenge: what forms of fraud do users perpetrate against IT, and what patterns do the frauds leave in what data?

Paul Murphy wrote and published The Unix Guide to Defenestration. Murphy is a 25-year veteran of the I.T. consulting industry, specializing in Unix and Unix-related management issues.