Imbalanced datasets are common in many applications. In medical imaging such as X-rays, a moderate imbalance arises because only a small proportion of patients have the illness and the majority do not. In anomaly detection in manufacturing, the imbalance can be extreme, with perhaps only 1 in 10,000 batches failing. In such cases, a machine learning algorithm will probably over-predict the larger class because of its higher prior probability. Class imbalance can be identified by examining the target class distribution, for example with a histogram. If one class heavily outnumbers the others, the dataset is imbalanced and the problem needs to be addressed.
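As a rough illustration of checking the target class distribution, the sketch below counts each class and prints a crude text histogram. The label names `ok` and `fail`, the helper functions, and the data are all hypothetical, built only to mirror the 1-in-10,000 manufacturing example; with a real dataset you would pass in your own target column.

```python
from collections import Counter


def class_distribution(labels):
    """Return {class: (count, fraction)} for a sequence of target labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: (n, n / total) for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical target column: 9,999 passing batches and 1 failing batch
labels = ["ok"] * 9999 + ["fail"]

dist = class_distribution(labels)
for cls, (n, frac) in sorted(dist.items(), key=lambda kv: -kv[1][0]):
    # Crude text histogram: one '#' per 2% of the data, at least one '#'
    bar = "#" * max(1, round(frac * 50))
    print(f"{cls:>5} {n:>6} {frac:8.4%} {bar}")

print("imbalance ratio:", imbalance_ratio(labels))
```

A ratio close to 1 suggests roughly balanced classes, while a large ratio such as the one here signals that the minority class may be swamped during training. What counts as "too large" depends on the problem, so treat any threshold as a judgment call.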
There are many ways of handling imbalanced datasets. The key solutions include: