Monotonicity constraints in machine learning

In practical machine learning and data science tasks, an ML model is often used to quantify a global, semantically meaningful relationship between two or more values. For example, a hotel chain might want to use ML to optimize its pricing strategy and use a model to estimate the likelihood of a room being booked at a given price and day of the week. For a relationship like this the assumption is that, all other things being equal, a user prefers a cheaper price, so demand is higher at a lower price. However, after building the model, the data scientist may easily discover that it behaves unexpectedly: for example, the model predicts that on Tuesdays, clients would rather pay $110 than $100 for a room! The reason is that while there is an expected monotonic relationship between price and the likelihood of booking, the model is unable to (fully) capture it, due to noise and confounding factors in the data.

Too often, such constraints are ignored by practitioners, especially when non-linear models such as random forests, gradient boosted trees or neural networks are used. And while monotonicity constraints have been a topic of academic research for a long time (see a survey paper on monotonicity constraints for tree based methods), library support has been lacking, making the problem hard to tackle for practitioners.

Luckily, in recent years various ML libraries have added support for setting monotonicity constraints on models, including LightGBM and XGBoost, two of the most popular libraries for gradient boosted trees. Monotonicity constraints have also been built into TensorFlow Lattice, a library that implements a novel method for creating interpolated lookup tables.

Monotonicity constraints in LightGBM and XGBoost

For tree based methods (decision trees, random forests, gradient boosted trees), monotonicity can be forced during the model learning phase by not creating splits on monotonic features that would break the monotonicity constraint.
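The split-rejection idea can be illustrated with a minimal pure-Python sketch for a single feature. This is a simplification under stated assumptions, not how LightGBM or XGBoost implement it: the function name best_monotone_split is hypothetical, only one split is searched, and the only rule applied is that the left leaf's mean must not exceed the right leaf's mean when the constraint is increasing.

```python
def best_monotone_split(xs, ys, increasing=True):
    """Return (threshold, left_mean, right_mean) for the best allowed split, or None.

    Hypothetical helper: a candidate split is skipped when the two leaf
    means would contradict the requested monotone direction.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_sum = sum(y for _, y in pairs)
    best, best_sse = None, float("inf")
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left_mean = left_sum / i
        right_mean = (total_sum - left_sum) / (n - i)
        violates = left_mean > right_mean if increasing else left_mean < right_mean
        if violates:
            continue  # this split would break the constraint, so it is not created
        # Sum of squared errors when each side predicts its own mean
        sse = (sum((y - left_mean) ** 2 for _, y in pairs[:i])
               + sum((y - right_mean) ** 2 for _, y in pairs[i:]))
        if sse < best_sse:
            best_sse = sse
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, left_mean, right_mean)
    return best


# The first point is a noisy outlier; splits that would predict a drop are skipped
print(best_monotone_split([1, 2, 3, 4], [5.0, 1.0, 2.0, 3.0]))
```

When every candidate split violates the constraint, the sketch returns None, which corresponds to leaving the node unsplit.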

In the following example, let’s train two models using LightGBM on a toy dataset where we know the relationship between X and Y to be monotonic (but noisy), and compare the default model with the monotonic one.

import numpy as np

# Toy data: y = x^2 plus uniform noise of up to +/-10
size = 100
x = np.linspace(0, 10, size)
y = x**2 + 10 - (20 * np.random.random(size))


Let’s fit a gradient boosted model on this data, setting min_child_samples to 5.

import lightgbm as lgb

# A small min_child_samples lets each leaf fit very few points, which encourages overfitting
overfit_model = lgb.LGBMRegressor(min_child_samples=5)
overfit_model.fit(x.reshape(-1, 1), y)

# Predicted output of the model on the same inputs it was trained on
prediction = overfit_model.predict(x.reshape(-1, 1))

The model will slightly overfit (due to small min_child_samples), which we can see from plotting the values of X against the predicted values of Y: the red line is not monotonic as we’d like it to be.

Since we know that the relationship between X and Y should be monotonic, we can set this constraint when specifying the model.

monotone_model = lgb.LGBMRegressor(min_child_samples=5, 
                                   monotone_constraints="1")
monotone_model.fit(x.reshape(-1,1), y)

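The parameter monotone_constraints="1" states that the output should be monotonically increasing with respect to the first feature (which in our case happens to be the only feature). After training the monotone model, we can see that the relationship is now monotone: the predictions never decrease as X grows, although they can stay flat over some intervals, since tree based models predict piecewise-constant values.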
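Rather than relying on a plot alone, the monotonicity of the predictions can be checked programmatically. The helpers below are a minimal sketch with hypothetical names (is_non_decreasing, is_strictly_increasing); they take any sequence of predictions ordered by increasing X, such as the output of monotone_model.predict on the sorted inputs. The sample values are made up for illustration.

```python
def is_non_decreasing(values, tol=0.0):
    """True if no value is smaller than its predecessor (within tol)."""
    values = list(values)
    return all(b >= a - tol for a, b in zip(values, values[1:]))


def is_strictly_increasing(values):
    """True if every value is larger than its predecessor."""
    values = list(values)
    return all(b > a for a, b in zip(values, values[1:]))


# Made-up, step-shaped predictions of the kind a tree ensemble produces
step_predictions = [1.0, 1.0, 2.5, 2.5, 4.0]
print(is_non_decreasing(step_predictions))
print(is_strictly_increasing(step_predictions))
```

Because tree ensembles predict piecewise-constant values, the non-strict check is the one a constrained model can be expected to pass.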

And if we check the model performance, we can see that not only does the monotonicity constraint provide a more natural fit, but the model generalizes better as well (as expected). Measuring the mean squared error on new test data, we see that the error is smaller for the monotone model.

from sklearn.metrics import mean_squared_error as mse

# Fresh test data drawn from the same noisy quadratic relationship
size = 1000000
x = np.linspace(0, 10, size)
y = x**2 - 10 + (20 * np.random.random(size))

print("Default model mse", mse(y, overfit_model.predict(x.reshape(-1, 1))))
print("Monotone model mse", mse(y, monotone_model.predict(x.reshape(-1, 1))))


Default model mse 37.61501106522855
Monotone model mse 32.283051723268265

Other methods for enforcing monotonicity

Tree based methods are not the only option for enforcing monotonicity constraints on a model. One recent development in the field is TensorFlow Lattice, which implements lattice based models: essentially interpolated look-up tables that can approximate arbitrary input-output relationships in the data and that can optionally be constrained to be monotonic. There is a thorough tutorial on it in the TensorFlow GitHub repository.
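To make the idea of a monotonic interpolated look-up table concrete, here is a minimal one-dimensional sketch in plain Python. It is not TensorFlow Lattice's API or training procedure: the function make_monotone_lookup is hypothetical, and the choice to guarantee monotonicity by building the table values from non-negative increments is an assumption made for illustration.

```python
from bisect import bisect_right


def make_monotone_lookup(keypoints, first_value, increments):
    """Build a piecewise-linear lookup function that is non-decreasing.

    Table values are first_value plus a running sum of non-negative
    increments, so each value is at least as large as the previous one.
    """
    if len(increments) != len(keypoints) - 1:
        raise ValueError("need exactly one increment per gap between keypoints")
    if any(d < 0 for d in increments):
        raise ValueError("increments must be non-negative")
    if any(b <= a for a, b in zip(keypoints, keypoints[1:])):
        raise ValueError("keypoints must be strictly increasing")

    values = [first_value]
    for d in increments:
        values.append(values[-1] + d)

    def lookup(x):
        # Clamp outside the table range
        if x <= keypoints[0]:
            return values[0]
        if x >= keypoints[-1]:
            return values[-1]
        # Linear interpolation between the two surrounding keypoints
        i = bisect_right(keypoints, x) - 1
        x0, x1 = keypoints[i], keypoints[i + 1]
        t = (x - x0) / (x1 - x0)
        return values[i] + t * (values[i + 1] - values[i])

    return lookup


lookup = make_monotone_lookup([0, 5, 10], first_value=0.0, increments=[2.0, 0.0])
print([lookup(x) for x in (-1, 2.5, 5, 7.5, 20)])
```

In a trained model the increments would be learned from data; here they are fixed by hand only to show that interpolating between non-decreasing table values yields a non-decreasing function.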

If a curve is already given, a monotonic spline can be fit to the data, for example using R's splinefun function with a monotone method such as "monoH.FC" or "hyman".
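For readers staying in Python, a related but different technique is isotonic regression, which fits the closest non-decreasing step function (in the least-squares sense) to a given sequence of values. Note that this swaps in a simpler method and is not a spline: the result is piecewise constant rather than smooth. The sketch below uses the pool-adjacent-violators algorithm; the function name isotonic_fit is hypothetical.

```python
def isotonic_fit(ys):
    """Least-squares non-decreasing fit to ys via pool adjacent violators."""
    blocks = []  # each block is [sum of values, number of values]
    for y in ys:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last one's
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every point in a block gets the block mean
    return fitted


# The dip from 3 to 2 is replaced by the average of the two values
print(isotonic_fit([1, 3, 2, 4]))
```

The input is assumed to be ordered by increasing X, as the curve values would be.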

20 comments on “Monotonicity constraints in machine learning”

  7. Nice article!

    I have one question, you said -> “After training the monotone model, we can see that the relationship is now strictly monotone”

    But it is not strictly monotone right, I can see there are values or intervals of x where your prediction does not change.

    By definition:

    Let y=f(x) be a differentiable function on an interval (a,b). If for any two points x1,x2∈(a,b) such that x1<x2, there holds the inequality f(x1)≤f(x2), the function is called increasing on this interval.

    If there holds the inequality f(x1)<f(x2), the function is called strictly increasing on the interval.

    So, for my particular application it should be strictly monotonic on all intervals. Can you tell me how to achieve this in LGBM or any other method?

    Thanks
    Vineet

    • Even your definition of a monotonic function says that neighbouring values can be equal. The only requirement for a monotone (increasing) function is that it never decreases, and a function that stays constant over an interval still satisfies that.
