Highest Rated Comments


ricklen 13 karma

Hi Andriy,

I'm going to write my master's thesis on anomaly / outlier detection in transactional data, consisting of 100,000 to 1,000,000 records. The goal is to detect anomalies that may be an indicator of fraud. The features are mainly categorical, and a few (two) are numerical. Can you recommend an algorithm or technique for approaching this case?

The main techniques I came across are: autoencoder neural nets, k-NN, one-class SVM, principal component analysis, and isolation forests.

In addition, I found one algorithm specific to categorical data: k-modes.

Most algorithms require me to transform the data into numerical form (embeddings / one-hot). Maybe you can recommend a good approach I haven't read about.
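To make the one-hot transformation mentioned above concrete, here is a minimal sketch using only the Python standard library. The column names (`channel`, `country`, `amount`), the sample records, and the helper functions `fit_categories` and `one_hot_encode` are all hypothetical and only for illustration; a real pipeline would more likely rely on an existing library encoder.

```python
# Minimal one-hot encoding sketch (standard library only).
# Column names and records are made up for illustration.

def fit_categories(records, categorical_cols):
    """Collect the sorted set of observed values for each categorical column."""
    return {
        col: sorted({row[col] for row in records})
        for col in categorical_cols
    }

def one_hot_encode(row, categories, numerical_cols):
    """Turn one record into a flat numeric vector.

    Each categorical column becomes one 0/1 slot per known value;
    a value not seen during fitting is encoded as all zeros.
    Numerical columns are appended unchanged.
    """
    vector = []
    for col, values in categories.items():
        vector.extend(1.0 if row[col] == v else 0.0 for v in values)
    vector.extend(float(row[col]) for col in numerical_cols)
    return vector

records = [
    {"channel": "web", "country": "NL", "amount": 12.5},
    {"channel": "pos", "country": "NL", "amount": 80.0},
    {"channel": "web", "country": "DE", "amount": 5.0},
]

categories = fit_categories(records, ["channel", "country"])
encoded = [one_hot_encode(r, categories, ["amount"]) for r in records]

for vec in encoded:
    print(vec)
```

Each record becomes a vector with one slot per observed category value plus the numerical columns, so the vector width grows with the number of distinct values. With high-cardinality categorical features this can get very wide, which is one reason embeddings come up as the alternative.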

I've asked about this problem before in another thread in this subreddit, and people also recommended a Bayesian approach, but I haven't looked into it yet. I don't know much about Bayesian approaches. Do you think one can be effective for outlier detection in mainly categorical data?

Thank you in advance! By the way, I really liked your book!

ricklen 2 karma

Hey! Cool, maybe send me a PM and tell me some of the details. I'm also diving into TF at the moment. Maybe we can keep each other posted on progress / interesting findings!