It turns out that in many cases, this is harder than you might think. The issue is that there are different ways to measure the accuracy of a model, and often it's mathematically impossible for them all to be equal across groups.

We'll illustrate how this happens by creating a (fake) model to screen these people for a disease.

Model Predictions

In a perfect world, only sick people would test positive for the disease and only healthy people would test negative.

Model Mistakes

But models and tests aren't perfect.

The model might make a mistake and mark a sick person as healthy. Or the opposite: marking a healthy person as sick.

Never Miss the Disease

If there's a simple follow-up test, we could have the model aggressively call cases so it rarely misses the disease.

We can quantify this by measuring the percentage of sick people who test positive. On the other hand, if there isn't a secondary test, or the treatment uses a drug with a limited supply, we might care more about the percentage of people with positive tests who are actually sick. These issues and trade-offs in model optimization aren't new, but they come into sharper focus once we can fine-tune exactly how aggressively disease is diagnosed.
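The two percentages above are commonly called recall and precision (those names are not used in this article). A minimal sketch of how each is computed from the four possible outcomes, using made-up counts for illustration:

```python
def recall(true_pos: int, false_neg: int) -> float:
    """Share of sick people who test positive."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos: int, false_pos: int) -> float:
    """Share of people with positive tests who are actually sick."""
    return true_pos / (true_pos + false_pos)


# Hypothetical screening of 100 people: 20 sick, 80 healthy.
true_pos, false_neg = 16, 4   # sick people flagged / sick people missed
false_pos, true_neg = 8, 72   # healthy people flagged / healthy people cleared

print(f"recall:    {recall(true_pos, false_neg):.2f}")
print(f"precision: {precision(true_pos, false_pos):.2f}")
```

Diagnosing more aggressively tends to raise the first number at the expense of the second, which is the trade-off described above.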

Try adjusting how aggressive the model is in diagnosing the disease.

Subgroup Analysis

Things get even more complicated when we check if the model treats different groups fairly.

If we're trying to evenly allocate resources, having the model miss more cases in children than adults would be bad. In this example, though, the disease is more prevalent in children than in adults. That is, the "base rate" of the disease is different across groups. The fact that the base rates are different makes the situation surprisingly tricky.

For one thing, even though the test catches the same percentage of sick adults and sick children, an adult who tests positive is less likely to have the disease than a child who tests positive.

Imbalanced Metrics

Why is there a disparity in diagnosing between children and adults?
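One way to see where the gap comes from is to count the expected outcomes directly. The sketch below applies the same test to two groups; the group sizes, base rates, and error rates are invented for illustration and are not taken from the article's model.

```python
def screen(population: int, base_rate: float, tpr: float, fpr: float) -> dict:
    """Expected outcomes when a test with the given error rates screens a group.

    tpr: share of sick people who test positive.
    fpr: share of healthy people who test positive by mistake.
    """
    sick = population * base_rate
    healthy = population - sick
    true_pos = sick * tpr       # sick people correctly flagged
    false_pos = healthy * fpr   # healthy people wrongly flagged
    return {
        "recall": true_pos / sick,
        "false_pos": false_pos,
        "precision": true_pos / (true_pos + false_pos),
    }


# Hypothetical numbers: identical test, different base rates.
children = screen(population=1000, base_rate=0.5, tpr=0.8, fpr=0.1)
adults = screen(population=1000, base_rate=0.1, tpr=0.8, fpr=0.1)

for name, result in [("children", children), ("adults", adults)]:
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Both groups have the same share of sick people caught, but the adult group contains more healthy people, so the same error rate produces more mistaken positives among adults and a lower share of positive tests that are correct.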

There is a higher proportion of well adults, so mistakes in the test will cause more well adults to be marked "positive" than well children (and similarly with mistaken negatives). To fix this, we could have the model take age into account. Try adjusting the slider to make the model diagnose adults less aggressively than children. This allows us to align one metric.

But now adults who have the disease are less likely to be diagnosed with it. No matter how you adjust the sliders, you won't be able to make both metrics fair at once.
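The slider experiment can be imitated with a toy score model. This is a sketch under stated assumptions, not the article's actual model: each person gets a score, healthy and sick scores follow two assumed normal distributions, and a person tests positive when their score is above a per-group threshold (the "slider"). The base rates and the children's threshold are hypothetical.

```python
from statistics import NormalDist

# Assumed score distributions (illustrative only): higher score = more likely sick.
HEALTHY = NormalDist(mu=0.0, sigma=1.0)
SICK = NormalDist(mu=2.0, sigma=1.0)


def recall(threshold: float) -> float:
    """Share of sick people whose score is above the threshold."""
    return 1.0 - SICK.cdf(threshold)


def false_positive_rate(threshold: float) -> float:
    """Share of healthy people whose score is above the threshold."""
    return 1.0 - HEALTHY.cdf(threshold)


def precision(threshold: float, base_rate: float) -> float:
    """Share of positive tests that belong to sick people."""
    true_pos = recall(threshold) * base_rate
    false_pos = false_positive_rate(threshold) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


CHILD_RATE, ADULT_RATE = 0.5, 0.1   # hypothetical base rates
child_threshold = 1.0               # hypothetical slider position for children

# Raise the adult threshold by bisection until adult precision matches the
# children's. This relies on precision increasing with the threshold, which
# holds for the two distributions assumed above.
target = precision(child_threshold, CHILD_RATE)
low, high = child_threshold, 6.0
for _ in range(60):
    mid = (low + high) / 2
    if precision(mid, ADULT_RATE) < target:
        low = mid
    else:
        high = mid
adult_threshold = high

print(f"precision: children {target:.2f}, "
      f"adults {precision(adult_threshold, ADULT_RATE):.2f}")
print(f"recall:    children {recall(child_threshold):.2f}, "
      f"adults {recall(adult_threshold):.2f}")
```

With these assumptions, matching the share of correct positive tests forces a stricter threshold for adults, and the share of sick adults who are caught drops below the children's.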

It turns out this is inevitable any time the base rates are different, and the test isn't perfect.
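A short derivation hints at why. The symbols here are introduced for illustration and do not appear in the article: $p$ is a group's base rate, TPR is the share of sick people who test positive, FPR is the share of healthy people who test positive by mistake, and PPV is the share of positive tests that belong to sick people. By Bayes' rule, for $\mathrm{FPR} > 0$:

```latex
\mathrm{PPV} = \frac{\mathrm{TPR}\cdot p}{\mathrm{TPR}\cdot p + \mathrm{FPR}\cdot(1-p)}
\quad\Longleftrightarrow\quad
\frac{\mathrm{PPV}}{1-\mathrm{PPV}} = \frac{p}{1-p}\cdot\frac{\mathrm{TPR}}{\mathrm{FPR}}
```

If two groups share the same TPR and FPR but have different base rates $p$, the right-hand side differs between them, so PPV differs too. The only escapes are a test that never flags a healthy person (FPR = 0) or one that catches nobody (TPR = 0). Making PPV equal therefore requires a different TPR/FPR ratio in each group, and when a single slider controls both rates, changing that ratio also changes the share of sick people who are caught.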

