“Clarity and peace of mind are powerful tools. Achieving either (and both) is only possible when we allow ourselves to “see” life in all its variance.”
Carlos Wallace, author
We have previously discussed the nuances of parameters vs hyperparameters in machine learning (click here to read), but I will now elaborate on another equally daunting dyad: bias and variance. The total reducible error of a machine learning model is the sum of the bias error and the variance error, and these two concepts also determine the model “fit”: the optimal fit is the one that minimizes their combined error.
Bias is the difference between the model’s average prediction and the ground truth. In other words, bias reflects “accuracy”, or how close the holes are to the target’s bull’s eye. A model with high bias makes overly strong simplifying assumptions, so it misses the true relationship and shows a high error even on the training data: this is underfitting. Conversely, a low-bias model is flexible enough to capture the underlying pattern and avoids underfitting.
Low-bias machine learning algorithms include decision trees, k-nearest neighbors, and support vector machines, while high-bias algorithms include linear and logistic regression. The signs of a high-bias model are a high error rate on both the training and test data sets, underfitting, and an overly simplified model.
Variance is the variability of the model’s predictions across different training sets. In other words, variance reflects “precision”, or how tightly the holes cluster together. Underfit models have high bias and low variance, while overfit models have high variance and low bias. Low-variance algorithms include linear and logistic regression, while high-variance algorithms include decision trees, k-nearest neighbors, and support vector machines. The signs of a high-variance model are a low error on the training data set but a high error on the test data set, overfitting, and excessive complexity.
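To make these two definitions concrete, here is a small sketch that estimates bias and variance empirically. The sine-curve data set, the polynomial models, and the chosen degrees are my own illustrative assumptions, not part of the original discussion: the same model is refit on many noisy training sets, and we measure how far its average prediction misses the truth (bias) and how much its predictions scatter between fits (variance).

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)       # assumed ground-truth function

x_test = 0.25                          # a single test point for illustration

def predictions(degree, n_fits=200, n_train=30, noise=0.3):
    """Fit a polynomial of the given degree to many independently drawn
    noisy training sets and collect its predictions at x_test."""
    preds = []
    for _ in range(n_fits):
        x = rng.uniform(0, 1, n_train)
        y = true_f(x) + rng.normal(0, noise, n_train)
        coeffs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coeffs, x_test))
    return np.array(preds)

for degree in (1, 9):
    p = predictions(degree)
    bias = p.mean() - true_f(x_test)   # how far the average prediction misses
    variance = p.var()                 # how much predictions scatter between fits
    print(f"degree={degree}  bias={bias:+.3f}  variance={variance:.3f}")
```

The straight line (degree 1) misses the sine curve by a wide margin no matter which training set it sees (high bias, low variance), while the degree-9 polynomial tracks the curve on average but its individual fits scatter from one training set to the next (low bias, higher variance).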
The bias-variance trade-off
There is a tension, or trade-off, between the bias and variance components of the prediction error for any supervised machine learning algorithm. An algorithm with high bias but low variance is too simple (“underfitting”), and an algorithm with high variance but low bias is too complex (“overfitting”). Maneuvers that decrease bias, and therefore produce a better fit to the training data, will concomitantly increase variance and therefore the likelihood of poor predictions on new data. In short, decreasing the variance tends to increase the bias, and decreasing the bias tends to increase the variance. Ideally, the model should have both low bias and low variance. Put another way, as the complexity of the model increases, the variance increases while the bias decreases, and the best fit lies between the two extremes.
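The trade-off is easy to see in a toy experiment. The sketch below (NumPy only; the sine data set and the specific degrees are illustrative assumptions) fits polynomials of increasing degree to the same noisy training data and compares training and test error: the simplest model underfits both sets, while the most complex one drives the training error down as the test error climbs back up.

```python
import numpy as np
import warnings

warnings.simplefilter("ignore")  # silence polyfit's conditioning warning at high degrees
rng = np.random.default_rng(1)

def make_data(n, noise=0.25):
    """Noisy samples of a sine curve (assumed toy data)."""
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, noise, n)
    return x, y

x_tr, y_tr = make_data(40)     # small training set
x_te, y_te = make_data(200)    # held-out test set

def train_test_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    train = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    return train, test

for degree in (1, 3, 20):
    tr, te = train_test_mse(degree)
    print(f"degree={degree:2d}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

Degree 1 shows high error on both sets (high bias), degree 3 balances the two, and degree 20 nearly memorizes the training points while its test error deteriorates (high variance), tracing out the familiar U-shaped test-error curve.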
The methodologies used to achieve low bias and low variance in a model that is not overly complicated include regularization and bagging (which mainly reduce variance) and boosting (which mainly reduces bias). In addition, a larger training data set reduces overfitting, since variance shrinks as more data become available. Lastly, one can increase the complexity of the model to lower the overall bias while keeping the variance at an acceptable level.
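As one concrete example of regularization, the sketch below adds an L2 (ridge) penalty to a deliberately over-complex polynomial model, using the closed-form solution w = (XᵀX + αI)⁻¹Xᵀy. The data set, the degree, and the α values are illustrative assumptions; the point is that increasing α shrinks the coefficients, trading a little bias for a large reduction in variance.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy data: 30 noisy samples of a sine curve.
x_tr = rng.uniform(0, 1, 30)
y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0, 0.3, 30)
x_te = np.linspace(0.05, 0.95, 100)
y_te = np.sin(2 * np.pi * x_te)

DEGREE = 12                            # deliberately high-variance model

def poly_features(x, degree=DEGREE):
    """Polynomial feature matrix [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(x, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha*I)^(-1) X^T y."""
    X = poly_features(x)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

for alpha in (0.0, 1e-4, 1e-1):
    w = ridge_fit(x_tr, y_tr, alpha)
    mse = np.mean((poly_features(x_te) @ w - y_te) ** 2)
    print(f"alpha={alpha:g}  ||w||={np.linalg.norm(w):.1f}  test MSE={mse:.3f}")
```

With no penalty the coefficients blow up and the fit chases the noise; a small penalty tames the variance, and a heavy penalty shrinks the model further, eventually reintroducing bias, which is the trade-off of the previous section in miniature.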