CatBoost for big data

For structured, heterogeneous data, gradient boosting is the way to go.

For all of the hoo-ha about deep learning, the most widely used machine learning algorithm is either logistic regression or gradient boosted decision trees. Gradient boosting is a method whereby you iteratively fit simple models to your data (typically shallow trees), with each new model fit to the errors left by the models before it, and then add their predictions together. It tends to produce good predictions on medium to large datasets.
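To make the "fit the errors of the previous iteration" idea concrete, here is a minimal sketch in plain Python. It is a toy, not how CatBoost or any production library does it: it assumes a single numeric feature, squared-error loss, and depth-1 trees (stumps). The function names (`fit_stump`, `fit_gbm`, `predict_gbm`, `mse`) and the toy data are made up for illustration.

```python
# Toy gradient boosting for squared error with depth-1 trees (stumps).
# Assumptions: one numeric feature, squared-error loss, made-up data.

def fit_stump(xs, residuals):
    """Pick the threshold split that best fits the residuals (lowest squared error)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # every value except the largest is a candidate
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x value: fall back to a constant
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]  # (threshold, left value, right value)

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def fit_gbm(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # errors of the ensemble so far
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def predict_gbm(model, x):
    base, stumps, lr = model
    return base + lr * sum(predict_stump(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((y - predict_gbm(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 5.0]
mean_only = fit_gbm(xs, ys, n_rounds=0)
boosted = fit_gbm(xs, ys, n_rounds=50)
print(mse(mean_only, xs, ys), mse(boosted, xs, ys))
```

The small learning rate means each stump only nudges the prediction, which is why many rounds are needed. Real libraries add regularisation, subsampling and far faster split finding on top of this loop.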

This paper reviews CatBoost which, alongside XGBoost and LightGBM, is one of the most popular gradient boosting implementations. It is particularly well suited to categorical data (hence the name) and doesn't work well with homogeneous numeric data like images. The paper compares implementations and describes applications in fields such as psychology, transport and chemistry.
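One idea commonly associated with CatBoost's handling of categorical data is the ordered target statistic: each row's category is replaced by a smoothed average of the target, computed only from rows that come earlier in a random ordering, so a row never sees its own label. The sketch below is a simplified plain-Python illustration of that idea, not CatBoost's actual code. The function name, the `prior` and `prior_weight` parameters and the toy data are all assumptions made for the example.

```python
import random

def ordered_target_stats(categories, targets, prior, prior_weight=1.0, seed=0):
    """Encode each row's category using only the targets of rows visited before it."""
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # random permutation of the rows
    sums, counts = {}, {}
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        s = sums.get(c, 0.0)
        n = counts.get(c, 0)
        # Smoothed mean of earlier targets for this category; the first row
        # seen for a category gets exactly the prior.
        encoded[i] = (s + prior_weight * prior) / (n + prior_weight)
        # Only now add this row's own target to the running statistics.
        sums[c] = s + targets[i]
        counts[c] = n + 1
    return encoded

categories = ["a", "b", "a", "a", "b"]
targets = [1, 0, 1, 0, 1]
prior = 0.6  # hypothetical choice: the overall mean of the targets
encoded = ordered_target_stats(categories, targets, prior)
print(encoded)
```

Because the encoding for a row depends only on rows seen earlier, the label of that row cannot leak into its own feature, which is the usual problem with naive target encoding.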

📖 Read more here (22655 words) 📖