In practical machine learning and data science tasks, an ML model is often used to quantify a global, semantically meaningful relationship between two or more values. For example, a hotel chain might want to use ML to optimize its pricing strategy and use a model to estimate the likelihood of a room being booked at a given price and day of the week. For a relationship like this, the assumption is that, all other things being equal, a user prefers a cheaper price, so demand is higher at a lower price. However, what can easily happen is that upon building the model, the data scientist discovers that the model is behaving unexpectedly: for example, the model predicts that on Tuesdays, clients would rather pay $110 than $100 for a room! The reason is that while there is an expected monotonic relationship between price and the likelihood of booking, the model is unable to (fully) capture it, due to noise and confounders in the data.

Too often, such constraints are ignored by practitioners, especially when non-linear models such as random forests, gradient boosted trees or neural networks are used. And while monotonicity constraints have been a topic of academic research for a long time (see a survey paper on monotonicity constraints for tree-based methods), there has been a lack of support from libraries, making the problem hard for practitioners to tackle.

Luckily, in recent years there has been a lot of progress in various ML libraries to allow setting monotonicity constraints for models, including in LightGBM and XGBoost, two of the most popular libraries for gradient boosted trees. Monotonicity constraints have also been built into TensorFlow Lattice, a library that implements a novel method for creating interpolated lookup tables.

## Monotonicity constraints in LightGBM and XGBoost

For tree based methods (decision trees, random forests, gradient boosted trees), monotonicity can be forced during the model learning phase by not creating splits on monotonic features that would break the monotonicity constraint.
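To make the idea concrete, here is a minimal sketch of the split check for a single feature and a single split, using squared error as the split criterion. `best_monotone_split` is a hypothetical helper of our own, not part of LightGBM or XGBoost; those libraries additionally propagate bounds on leaf values down the tree so that deeper splits cannot break the constraint either, which this sketch omits.

```python
import numpy as np

def best_monotone_split(x, y, direction=1, min_samples=1):
    """Lowest-SSE split of one feature that respects a monotonicity direction.

    direction=1 requires the right leaf (larger x) to predict at least as much
    as the left leaf; direction=-1 requires it to predict at most as much.
    Returns (threshold, left_value, right_value), or None if no split is allowed.
    """
    order = np.argsort(x)
    x, y = x[order], y[order]
    best, best_sse = None, np.inf
    for i in range(min_samples, len(x) - min_samples + 1):
        if x[i - 1] == x[i]:
            continue  # no threshold can separate tied x values
        left, right = y[:i], y[i:]
        left_mean, right_mean = left.mean(), right.mean()
        # the monotonicity check: skip splits whose leaf values go the wrong way
        if direction * (right_mean - left_mean) < 0:
            continue
        sse = ((left - left_mean) ** 2).sum() + ((right - right_mean) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best = (float((x[i - 1] + x[i]) / 2), float(left_mean), float(right_mean))
    return best

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 1.0, 6.0, 7.0])
print(best_monotone_split(x, y, direction=1))   # (2.5, 3.0, 6.5)
print(best_monotone_split(x, y, direction=-1))  # threshold 1.5: the only split with a lower right leaf
```

In a full tree learner, this check runs for every candidate split on a constrained feature, while unconstrained features are split as usual.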

In the following example, let’s train two models using LightGBM on a toy dataset where we know the relationship between X and Y to be monotonic (but noisy), and compare the default and monotonic models.

```python
import numpy as np

size = 100
x = np.linspace(0, 10, size)
# x**2 is increasing on [0, 10]; add uniform noise in [-10, 10]
y = x**2 + 10 - (20 * np.random.random(size))
```

Let’s fit a gradient boosted model on this data, setting `min_child_samples` to 5.

```python
import lightgbm as lgb

# a small min_child_samples lets leaves fit very few points, so the model can chase the noise
overfit_model = lgb.LGBMRegressor(min_child_samples=5)
overfit_model.fit(x.reshape(-1, 1), y)

# predicted output from the model for the same input
prediction = overfit_model.predict(x.reshape(-1, 1))
```

The model will slightly overfit (due to the small `min_child_samples`), which we can see by plotting the values of X against the predicted values of Y: the red line is not monotonic as we’d like it to be.
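Rather than eyeballing the plot, monotonicity can also be checked numerically by counting how often consecutive predictions move in the wrong direction. This works here because `x` comes from `np.linspace` and is therefore sorted. The helper names `count_violations` and `is_monotone` are our own, not part of LightGBM. The `strict` flag additionally counts flat steps, which tree ensembles produce wherever neighbouring inputs fall into the same leaves.

```python
import numpy as np

def count_violations(values, direction=1, strict=False):
    """Count consecutive pairs in `values` that break monotonicity.

    `values` must be ordered by increasing input. direction=1 checks for
    increasing, direction=-1 for decreasing. With strict=False equal
    neighbours are allowed (non-decreasing / non-increasing); with
    strict=True flat steps count as violations too.
    """
    steps = direction * np.diff(np.asarray(values, dtype=float))
    bad = steps <= 0 if strict else steps < 0
    return int(np.count_nonzero(bad))

def is_monotone(values, direction=1, strict=False):
    return count_violations(values, direction, strict) == 0

print(count_violations([1, 2, 2, 1.5, 3]))               # 1 (the drop from 2 to 1.5)
print(count_violations([1, 2, 2, 1.5, 3], strict=True))  # 2 (the flat step also counts)

# With the overfit model above (x is sorted because it comes from np.linspace):
# count_violations(prediction)
```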

Since we know that the relationship between X and Y should be monotonic, we can set this constraint when specifying the model.

```python
# "1": predictions must not decrease as the first feature increases
monotone_model = lgb.LGBMRegressor(min_child_samples=5, monotone_constraints="1")
monotone_model.fit(x.reshape(-1, 1), y)
```

The parameter `monotone_constraints="1"` states that the output should be monotonically increasing with respect to the first feature (which in our case happens to be the only feature). After training the monotone model, we can see that the relationship is now monotone: the predictions never decrease as X grows, although they can stay flat over some intervals.

And if we check the model performance, we can see that not only does the monotonicity constraint provide a more natural fit, but the model also generalizes better (as expected). Measuring the mean squared error on new test data, we see that the error is smaller for the monotone model.

```python
from sklearn.metrics import mean_squared_error as mse

# fresh test data from the same noisy quadratic, on a much denser grid
size = 1000000
x = np.linspace(0, 10, size)
y = x**2 - 10 + (20 * np.random.random(size))

print("Default model mse", mse(y, overfit_model.predict(x.reshape(-1, 1))))
print("Monotone model mse", mse(y, monotone_model.predict(x.reshape(-1, 1))))
```

```
Default model mse 37.61501106522855
Monotone model mse 32.283051723268265
```

## Other methods for enforcing monotonicity

Tree-based methods are not the only option for enforcing monotonicity constraints. One recent development in the field is TensorFlow Lattice, which implements lattice-based models: essentially interpolated lookup tables that can approximate arbitrary input-output relationships in the data and can optionally be constrained to be monotonic. There is a thorough tutorial on it in the TensorFlow GitHub repository.
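The interpolated lookup table idea can be illustrated in one dimension with NumPy alone: store output values at a few keypoints and interpolate linearly between them. One simple way to make such a table monotone, assumed here purely for illustration and not necessarily how TensorFlow Lattice enforces its constraints, is to parameterize it by a starting value plus non-negative (squared) increments between consecutive keypoints. `monotone_lookup_table` is a hypothetical helper; real lattice models also interpolate over several inputs at once.

```python
import numpy as np

def monotone_lookup_table(keypoints, raw_params):
    """Return a 1-D interpolated lookup table whose output never decreases.

    raw_params[0] is the output at the first keypoint; each remaining raw
    parameter is squared, so every increment between keypoints is >= 0
    whatever values an optimizer assigns to raw_params.
    """
    raw_params = np.asarray(raw_params, dtype=float)
    increments = raw_params[1:] ** 2
    values = np.concatenate(([raw_params[0]], raw_params[0] + np.cumsum(increments)))

    def table(x):
        # linear interpolation between keypoints; np.interp clamps outside the range
        return np.interp(x, keypoints, values)

    return table

keypoints = np.array([0.0, 5.0, 10.0])
f = monotone_lookup_table(keypoints, [1.0, -2.0, 0.0])  # table values 1, 5, 5
print(f(np.array([-1.0, 2.5, 7.5, 12.0])))
```

Because monotonicity comes from the parameterization itself, the raw parameters could be fitted with any unconstrained optimizer without ever producing a decreasing table.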

If a curve is already given, a monotonic spline can be fit to it, for example using R's `splinefun` function with one of its monotone methods (`"monoH.FC"` or `"hyman"`).
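In Python, SciPy's `PchipInterpolator` plays a similar role: it builds a piecewise cubic Hermite interpolant that preserves monotonicity when the input points are monotone, whereas an ordinary `CubicSpline` through the same points is not guaranteed to. A minimal sketch with made-up points that include a steep rise followed by a flat stretch:

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# made-up non-decreasing points: a steep rise, then a flat stretch
xp = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
yp = np.array([0.0, 0.1, 0.2, 3.0, 3.0, 3.1])

pchip = PchipInterpolator(xp, yp)  # shape-preserving piecewise cubic
cubic = CubicSpline(xp, yp)        # ordinary cubic spline, for comparison

grid = np.linspace(0, 5, 501)
print("PCHIP non-decreasing:", bool(np.all(np.diff(pchip(grid)) >= 0)))
print("CubicSpline non-decreasing:", bool(np.all(np.diff(cubic(grid)) >= 0)))
```

The trade-off is smoothness: PCHIP is only guaranteed to have a continuous first derivative, while a cubic spline is twice continuously differentiable but may overshoot.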


I am really glad that you post again on this blog 😀

Can't wait to see your next posts.

A PhD student in Biomedicine.

Thanks for sharing this stuff with us!


Thanks for the article.

That reminds me of isotonic regression. Is it the same idea behind it?

Good article !!

Really happy that you’re back!!

Excited to see you posting again, such a treasure trove of useful insights!

Nice article!

I have one question, you said -> “After training the monotone model, we can see that the relationship is now strictly monotone”

But it is not strictly monotone, right? I can see there are values or intervals of x where your prediction does not change.

By definition:

Let y = f(x) be a differentiable function on an interval (a, b). If for any two points x1, x2 ∈ (a, b) such that x1 < x2, the inequality f(x1) ≤ f(x2) holds, the function is called increasing on this interval.

If there holds the inequality f(x1)<f(x2), the function is called strictly increasing on the interval.

So, for my particular application it should be strictly monotonic on all intervals. Can you tell me how to achieve this in LightGBM or any other method?

Thanks

Vineet