
LazyPredict in Python Example: Automating the Machine Learning Model

Updated: 2026-05-03

Selecting the best machine learning model for a dataset normally means training, evaluating, and comparing dozens of alternatives. LazyPredict automates that process: in a few lines of code it fits most scikit-learn classifiers or regressors on your data and returns a table of comparative metrics.

Key takeaways

  • LazyPredict automatically trains the available scikit-learn classifiers or regressors with their default parameters and ranks them by performance metrics.
  • It is an initial exploration tool, not a production one: it identifies promising model families, not the final optimised model.
  • It works for classification (LazyClassifier) and regression (LazyRegressor).
  • The output includes key metrics (accuracy, balanced accuracy, F1, ROC AUC for classification; R², adjusted R², RMSE for regression) and training times.
  • Models that stand out in LazyPredict are candidates for the next phase: hyperparameter tuning with GridSearchCV or Optuna.

What LazyPredict is and what it is for

LazyPredict is a Python library built on scikit-learn that automates model benchmarking. When you receive a new dataset and do not know whether decision trees, linear models, or ensembles suit it best, LazyPredict gives a first answer within seconds on small datasets.

What LazyPredict does:

  • Instantiates all compatible scikit-learn estimators.
  • Trains them on the training set.
  • Evaluates them on the test set.
  • Returns a pandas DataFrame sorted by metrics.
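
To make those four steps concrete, here is a minimal hand-written sketch of the same loop using plain scikit-learn. It is not LazyPredict's actual implementation: the three-model shortlist, the scaling step, and the choice of sort column are assumptions made for illustration.

```python
import time

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Hypothetical shortlist; LazyPredict iterates over a much longer one.
candidates = {
    "LogisticRegression": LogisticRegression(),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
}

rows = []
for name, estimator in candidates.items():
    # Scale features, then fit the estimator with its default parameters.
    model = make_pipeline(StandardScaler(), estimator)
    start = time.perf_counter()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rows.append({
        "Model": name,
        "Accuracy": accuracy_score(y_test, y_pred),
        "Balanced Accuracy": balanced_accuracy_score(y_test, y_pred),
        "F1 Score": f1_score(y_test, y_pred, average="weighted"),
        "Time Taken": time.perf_counter() - start,
    })

# One row per model, best balanced accuracy first.
benchmark = (
    pd.DataFrame(rows)
    .set_index("Model")
    .sort_values("Balanced Accuracy", ascending=False)
)
print(benchmark)
```

Writing this loop by hand for every estimator is exactly the repetitive work the library removes.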

What LazyPredict does not do:

  • It does not optimise hyperparameters (uses default values).
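  • It does not do dataset-specific preprocessing: it applies only a generic default pipeline (imputation, scaling, and basic categorical encoding), so data cleaning and feature engineering remain the user’s responsibility.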
  • It does not replace business analysis for choosing the right metric.
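
When a dataset needs preprocessing tailored to it, one option is to do that explicitly before benchmarking. The sketch below uses scikit-learn's ColumnTransformer on a small made-up DataFrame; the column names, values, and imputation strategy are hypothetical and would need adapting to real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with one missing value and one categorical column.
df = pd.DataFrame({
    "age": [34.0, np.nan, 52.0, 41.0],
    "income": [32000.0, 45000.0, 58000.0, 39000.0],
    "city": ["Madrid", "Sevilla", "Madrid", "Bilbao"],
})

numeric_cols = ["age", "income"]
categorical_cols = ["city"]

preprocess = ColumnTransformer(
    transformers=[
        # Numeric columns: fill gaps with the median, then standardise.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric_cols),
        # Categorical columns: one indicator column per category.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_ready = preprocess.fit_transform(df)
print(X_ready.shape)  # 2 numeric columns + 3 city indicators
```

In a real project the transformer should be fitted on the training split only and then applied to the test split, so that no information leaks from test to train.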

Installation and basic example: classification

Installation is straightforward:

bash
pip install lazypredict

Example with scikit-learn’s breast cancer dataset:

python
from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data
data = load_breast_cancer()
X, y = data.data, data.target

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Evaluate all classifiers
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

print(models)

The output is a pandas DataFrame with columns such as Accuracy, Balanced Accuracy, ROC AUC, F1 Score, and Time Taken, sorted by descending balanced accuracy. Models such as LGBMClassifier, RandomForestClassifier, or LinearSVC often appear near the top of the table on clean tabular datasets.

Regression example

For regression problems, the flow is identical with LazyRegressor:

python
from lazypredict.Supervised import LazyRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

data = load_diabetes()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

reg = LazyRegressor(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

print(models)

Output metrics include Adjusted R-Squared, R-Squared, RMSE, and Time Taken. For regression, models like GradientBoostingRegressor or ExtraTreesRegressor often show the best metrics on structured datasets.

How to interpret results and what to do next

LazyPredict output is a starting point, not a definitive answer. The recommended workflow after obtaining the results table is:

  1. Identify the top 3-5 models by the metric most relevant to the business (not always accuracy).
  2. Analyse training time: a model that takes 10x longer and improves by 0.5% may not be the best production option.
  3. Verify that top-metric models are not overfitting: compare train and test metrics, not just test.
  4. Optimise hyperparameters of selected candidates with GridSearchCV, RandomizedSearchCV, or Optuna.
  5. Validate on data from a later period than the training data to confirm that performance holds in production.
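
Steps 3 and 4 can be sketched with plain scikit-learn as follows. RandomForestClassifier stands in for whichever model topped your own table, and the parameter grid is deliberately tiny and purely illustrative; both are assumptions, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Step 3: compare train and test scores for a shortlisted model.
baseline = RandomForestClassifier(n_estimators=50, random_state=42)
baseline.fit(X_train, y_train)
train_score = balanced_accuracy_score(y_train, baseline.predict(X_train))
test_score = balanced_accuracy_score(y_test, baseline.predict(X_test))
print(f"train={train_score:.3f} test={test_score:.3f} "
      f"gap={train_score - test_score:.3f}")

# Step 4: tune a small grid with cross-validation on the training set only.
param_grid = {"max_depth": [3, None], "min_samples_leaf": [1, 5]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=42),
    param_grid,
    scoring="balanced_accuracy",
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))

# The held-out test set is touched once, after tuning is finished.
tuned_test_score = balanced_accuracy_score(y_test, search.predict(X_test))
print(f"tuned test score={tuned_test_score:.3f}")
```

A large gap between the train and test scores suggests overfitting, in which case a simpler model lower in the table may generalise better.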

For large datasets, LazyPredict can be slow because it trains all models sequentially. In those cases, pre-select 5-10 model families manually before using LazyPredict, or go straight to AutoML tools such as FLAML or auto-sklearn.

This rapid benchmarking approach pairs well with distributed processing described in DataFrames and Pipelines in Spark when the dataset has already been prepared at scale. Model selection also links to the concept of pre-trained models and transfer learning for cases where your own data is scarce.

Real advantages and limitations

Advantages:

  • Saves hours of repetitive code in the exploratory phase.
  • Immediately surfaces models you would not have tried initially.
  • The pandas DataFrame output is easy to analyse, filter, and export.

Limitations:

  • Applies only generic default preprocessing (imputation, scaling, and basic categorical encoding); anything specific to the dataset is still the user’s responsibility.
  • Models run with default hyperparameters, which are rarely the best choice for a specific dataset.
  • On datasets with highly imbalanced classes, accuracy can be misleading; prioritise balanced accuracy, ROC AUC, or F1.
  • Does not include deep neural networks (PyTorch, TensorFlow): it covers scikit-learn estimators plus XGBoost and LightGBM models.
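
The point about imbalanced classes is easy to demonstrate. In this small sketch, with synthetic labels invented for the example, a baseline that always predicts the majority class looks strong on accuracy and useless on the other metrics.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Synthetic labels: 95 negatives and 5 positives (hypothetical data).
y = np.array([0] * 95 + [1] * 5)
X = np.zeros((100, 1))  # features are irrelevant for this baseline

# A "model" that always predicts the most frequent class.
majority = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = majority.predict(X)

acc = accuracy_score(y, y_pred)
bal_acc = balanced_accuracy_score(y, y_pred)
f1_pos = f1_score(y, y_pred, zero_division=0)  # F1 for the positive class
print(acc, bal_acc, f1_pos)
```

Accuracy comes out at 0.95 even though the classifier never finds a single positive case, while balanced accuracy is 0.5 and F1 for the positive class is 0.0.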

Conclusion

LazyPredict is a productivity tool, not a substitute for expert judgement. Its real value is compressing the model exploration phase from days to minutes, allowing the data scientist to spend time on preprocessing, feature engineering, and business evaluation — the parts where human context is irreplaceable.


Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.