Diving into data

A blog on machine learning, data mining and visualization

Monotonicity constraints in machine learning

September 16, 2018

In practical machine learning and data science tasks, an ML model is often used to quantify a global, semantically meaningful relationship between two or more values. For example, a hotel chain might want to use ML to optimize their pricing … Continue reading →

Posted in Data science, Machine learning | Replies: 19
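The post discusses how such monotonic relationships can be enforced in a model. For a flavour of what this looks like in practice, here is a minimal sketch (not taken from the post, with synthetic data and illustrative parameter values) using XGBoost's monotone_constraints parameter; other gradient-boosting libraries expose a similar option:

```python
# Minimal sketch: enforcing per-feature monotonicity with XGBoost.
# The data and parameters are illustrative, not from the post.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.RandomState(0)
n = 1000
X = rng.uniform(0, 10, size=(n, 2))
# Underlying signal: increasing in feature 0, decreasing in feature 1, plus noise.
y = np.log1p(X[:, 0]) - 0.3 * X[:, 1] + rng.normal(scale=0.3, size=n)

# "(1,-1)": non-decreasing in the first feature, non-increasing in the second.
model = XGBRegressor(n_estimators=300, max_depth=4,
                     monotone_constraints="(1,-1)")
model.fit(X, y)

# Check monotonicity along feature 0 with feature 1 held fixed.
grid = np.column_stack([np.linspace(0, 10, 200), np.full(200, 5.0)])
pred = model.predict(grid)
assert np.all(np.diff(pred) >= -1e-6)  # predictions never decrease along the grid
```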

Random forest interpretation – conditional feature contributions

October 24, 2016

In two of my previous blog posts, I explained how the black box of a random forest can be opened up by tracking decision paths along the trees and computing feature contributions. This way, any prediction can be decomposed into … Continue reading →

Posted in Random forest | Replies: 25
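For a single tree, the decomposition the post builds on can be sketched in a few lines of scikit-learn (the forest version averages these contributions over all trees). This is an illustrative reconstruction, not the post's own code:

```python
# Sketch: decompose one decision-tree prediction into a bias term plus
# per-feature contributions by walking the decision path and attributing
# each change in the node mean to the feature that was split on.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

t = tree.tree_
x = X[0]                                  # explain the first sample
node = 0
bias = t.value[0][0][0]                   # mean of y at the root
contributions = np.zeros(X.shape[1])

while t.children_left[node] != -1:        # walk down until a leaf is reached
    feat = t.feature[node]
    nxt = (t.children_left[node] if x[feat] <= t.threshold[node]
           else t.children_right[node])
    # the split on `feat` moved the running estimate by this amount
    contributions[feat] += t.value[nxt][0][0] - t.value[node][0][0]
    node = nxt

prediction = tree.predict(x.reshape(1, -1))[0]
# prediction == bias + sum of contributions (up to floating point error)
assert np.isclose(prediction, bias + contributions.sum())
```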

Histogram intersection for change detection

February 28, 2016

The need for anomaly and change detection will pop up in almost any data-driven system or quality-monitoring application. Typically, there is a set of metrics that need to be monitored and an alert raised if the values deviate from … Continue reading →

Posted in Change detection, Data science | Replies: 14
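As an illustration of the score the title refers to (not the post's code): the intersection of two normalized histograms is close to 1.0 when the distributions match and drops towards 0.0 as the monitored metric drifts, which makes it a simple change-detection signal:

```python
# Sketch: histogram intersection as a drift score between two samples.
import numpy as np

def histogram_intersection(a, b, bins=20):
    """Intersection of the normalized histograms of two samples."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    h1, edges = np.histogram(a, bins=bins, range=(lo, hi))
    h2, _ = np.histogram(b, bins=edges)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return float(np.minimum(h1, h2).sum())

rng = np.random.RandomState(0)
baseline = rng.normal(0.0, 1.0, size=5000)   # "normal" behaviour
same = rng.normal(0.0, 1.0, size=5000)       # same distribution
shifted = rng.normal(1.5, 1.0, size=5000)    # distribution has drifted

print(histogram_intersection(baseline, same))     # close to 1.0
print(histogram_intersection(baseline, shifted))  # noticeably lower
```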

Who are the best MMA fighters of all time? A Bayesian study

December 22, 2015

As with any sport, the question of who the best competitors of all time are is hotly debated among fans of Mixed Martial Arts (MMA). And unlike tournament-based sports such as tennis, or sports … Continue reading →

Posted in Bayesian analysis, Data science, Machine learning | Replies: 20

First Estonian Machine Learning Meetup

November 24, 2015

Today, we had the first event of the Estonian Machine Learning Meetup series. I was quite baffled by the pretty massive turnout, with more than a hundred people attending, indicating that such an event series is long overdue. So props … Continue reading →

Posted in Machine learning, Random forest | Replies: 1

7 tools in every data scientist’s toolbox

October 15, 2015

There is a huge number of machine learning methods, statistical tools and data mining techniques available for a given data-related task, from self-organizing maps to Q-learning, from streaming graph algorithms to gradient-boosted trees. Many of these methods, while … Continue reading →

Posted in Data science, Machine learning | Replies: 11

Random forest interpretation with scikit-learn

August 12, 2015

In one of my previous posts, I discussed how random forests can be turned into a “white box”, such that each prediction is decomposed into a sum of contributions from each feature, i.e. prediction = bias + Σ (feature contributions). I’ve had quite a few requests … Continue reading →

Posted in Machine learning, Random forest | Replies: 50

Prediction intervals for Random Forests

June 2, 2015

An important but often overlooked aspect of applied machine learning is providing intervals for predictions, be they confidence or prediction intervals. For classification tasks, beginning practitioners quite often conflate probability with confidence: a probability of 0.5 is taken to mean … Continue reading →

Posted in Confidence intervals, Random forest | Replies: 37
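One common way to get such intervals from a random forest, and a rough sketch of the idea (an assumption for illustration, not necessarily the exact method the post develops), is to build a percentile interval from the spread of the individual trees' predictions:

```python
# Sketch: percentile-based prediction intervals from per-tree predictions.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=500, random_state=0)
forest.fit(X_train, y_train)

# One prediction per tree, shape (n_trees, n_test_samples).
per_tree = np.stack([tree.predict(X_test) for tree in forest.estimators_])

lower = np.percentile(per_tree, 5, axis=0)    # 5th percentile across trees
upper = np.percentile(per_tree, 95, axis=0)   # 95th percentile across trees
point = forest.predict(X_test)                # the usual averaged prediction

coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage of the nominal 90% interval: {coverage:.2f}")
```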

Which topics get the upvote on Hacker News?

February 13, 2015

Hacker News is a popular social news website, mostly covering technology and startup topics. It relies on user submissions and moderation, where each submitted story can be upvoted and commented on by users, which in turn determines whether the story reaches … Continue reading →

Posted in NLP, Topic modelling | Replies: 3

Selecting good features – Part IV: stability selection, RFE and everything side by side

December 20, 2014

In my previous posts, I looked at univariate methods, linear models and regularization, and random forests for feature selection. In this post, I’ll look at two other methods: stability selection and recursive feature elimination (RFE), which can both be considered wrapper methods. They … Continue reading →

Posted in Feature selection, Machine learning | Replies: 45
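As a quick illustration of one of the two methods (not code from the post), scikit-learn ships recursive feature elimination as RFE, which repeatedly fits a model and drops the weakest feature until the requested number remains:

```python
# Sketch: recursive feature elimination with scikit-learn's RFE.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

selector = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the selected features
print(selector.ranking_)   # 1 for selected features; higher = dropped earlier
```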
