Srikanth Pagadala

First XGBoost Model with scikit-learn

03 Aug 2016

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance, and it is widely used in machine learning competitions.

For up-to-date instructions on installing XGBoost for Python, see the official XGBoost documentation.

The XGBoost Python API reference documents the available classes and parameters.

We are going to use the Pima Indians onset of diabetes dataset.

This dataset consists of 8 input variables that describe medical details of patients and one output variable that indicates whether the patient had an onset of diabetes within 5 years.
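To make the column layout concrete, here is a minimal sketch of splitting rows into inputs and output. It assumes the data arrives as comma-separated rows with the 8 inputs first and the 0/1 label last; that ordering is an assumption, and the three rows below are made-up placeholders rather than values from the real dataset.

```python
import csv
import io

# Hypothetical rows in the assumed layout: 8 numeric inputs, then a 0/1 label.
# These values are invented for illustration and are NOT from the real dataset.
raw = io.StringIO(
    "1,100,70,20,80,25.0,0.30,30,0\n"
    "2,150,80,30,120,33.5,0.60,45,1\n"
    "0,120,65,25,90,28.1,0.25,22,0\n"
)

NUM_INPUTS = 8  # number of input variables described in the text

X, y = [], []
for row in csv.reader(raw):
    values = [float(v) for v in row]
    X.append(values[:NUM_INPUTS])      # first 8 columns are the inputs
    y.append(int(values[NUM_INPUTS]))  # last column is the class label

print(len(X), "rows,", len(X[0]), "input columns")
print("labels:", y)
```

With a real file, the `io.StringIO` object would be replaced by an open file handle; the slicing logic stays the same as long as the column order matches the assumption above.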

This is a good dataset for a first XGBoost model because all of the input variables are numeric and the problem is a simple binary classification problem. It is not necessarily a good problem for the XGBoost algorithm because it is a relatively small dataset and an easy problem to model.

Next: Data Preparation for Gradient Boosting with XGBoost