Does calibration improve ROC score?

Full question
https://stackoverflow.com/questions/5431...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #machinelearning #scikitlearn #classification #roc

ACCEPTED ANSWER

Score 12

TLDR: Calibration should not affect ROCAUC.

Longer answer:

ROCAUC is a measure of ranking ("did we put these observations in the best possible order?"). It says nothing, however, about whether the predicted probabilities themselves are accurate.

Example: If I'm classifying how likely someone is to have cancer, I may always say a number between 95% and 99%, and still have perfect ROCAUC, as long as I've made my predictions in the right order (the 99%s had cancer, the 95%s did not).

Here we would say that this classifier (which says 95% when they are unlikely to have cancer) has good ability to rank, but is badly calibrated.
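To make the cancer example concrete, here is a minimal sketch with six made-up patients. The `roc_auc` and `brier` helpers are hand-rolled for illustration so the snippet runs without dependencies; scikit-learn's `roc_auc_score` and `brier_score_loss` compute the same quantities.

```python
def roc_auc(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Illustrative data (not from the original answer): 1 = has cancer.
y_true = [0, 0, 0, 1, 1, 1]
# Every prediction sits between 95% and 99%, but the order is right.
y_prob = [0.95, 0.955, 0.96, 0.98, 0.985, 0.99]

auc = roc_auc(y_true, y_prob)
bs = brier(y_true, y_prob)
print(f"ROCAUC = {auc:.3f}, Brier score = {bs:.3f}")
```

The ranking is perfect, so the ROCAUC is 1.0, while the Brier score is poor because the healthy patients were all given probabilities near 95%.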

So what can we do? We can apply a monotonic (strictly increasing) transformation that fixes the probabilities without changing the ranking ability, and therefore without changing the ROCAUC.

Example: in our cancer example we can say that when the predictions are under 97.5% they should be decreased by 90%, and when they are over 97.5% they are kept as they are. This really crass approach will not affect the ROC curve, but it sends the "lowest" predictions close to 0, improving our calibration as measured by the Brier score.
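A rough sketch of that crass rule, on the same kind of made-up data. It assumes "decreased by 90%" means a relative reduction (multiplying by 0.1); the function name `crude_recalibrate` and the data are illustrative only.

```python
def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def crude_recalibrate(p, threshold=0.975, factor=0.1):
    """Shrink predictions below the threshold, keep the rest (assumed reading)."""
    return p * factor if p < threshold else p


def ranking(scores):
    """Indices of the observations, ordered from lowest to highest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


# Illustrative data: 1 = has cancer.
y_true = [0, 0, 0, 1, 1, 1]
before = [0.95, 0.955, 0.96, 0.98, 0.985, 0.99]
after = [crude_recalibrate(p) for p in before]

# ROCAUC depends only on this ordering, so an unchanged ranking
# means an unchanged ROCAUC.
same_order = ranking(before) == ranking(after)
print("ranking unchanged:", same_order)
print(f"Brier before = {brier(y_true, before):.3f}")
print(f"Brier after  = {brier(y_true, after):.3f}")
```

The rule is monotonic because everything below the threshold stays below everything above it, and the order within each group is untouched.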

Great, now we can get clever! What is the "best" monotonic curve for improving our Brier score? Well, we can let Python deal with this by using scikit-learn's calibration tools, which essentially find that curve for us. Again, this will improve the calibration but not change the ROCAUC, as the rank order is maintained.
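As a hedged sketch of what that could look like, the snippet below wraps a classifier in scikit-learn's `CalibratedClassifierCV` and compares both metrics on held-out data. The synthetic dataset, the choice of `GaussianNB`, and all parameter values are assumptions for illustration, not part of the original answer.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data with redundant features, which tends to make
# naive Bayes overconfident (an assumption chosen for illustration).
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=5,
    n_redundant=10, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Uncalibrated model versus the same model wrapped in a calibrator.
raw = GaussianNB().fit(X_train, y_train)
calibrated = CalibratedClassifierCV(
    GaussianNB(), method="isotonic", cv=5
).fit(X_train, y_train)

p_raw = raw.predict_proba(X_test)[:, 1]
p_cal = calibrated.predict_proba(X_test)[:, 1]

auc_raw, auc_cal = roc_auc_score(y_test, p_raw), roc_auc_score(y_test, p_cal)
brier_raw = brier_score_loss(y_test, p_raw)
brier_cal = brier_score_loss(y_test, p_cal)

print(f"ROCAUC: raw = {auc_raw:.4f}, calibrated = {auc_cal:.4f}")
print(f"Brier:  raw = {brier_raw:.4f}, calibrated = {brier_cal:.4f}")
```

With `cv=5`, the calibrated model averages several calibrators fitted on different folds, and the isotonic method fits a step function, so the two ROCAUC values are expected to be close rather than bit-for-bit identical.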

Great, so the ROCAUC does not move.

And yet...
To quote Galileo after admitting that the Earth does not move around the Sun: "E pur si muove" (and yet it moves).

OK, now things get funky. In the course of a monotonic transformation, some observations that were close together (e.g. 25% and 25.5%) may get "squished" together (e.g. 0.7% and 0.700000001%). These values may then be rounded, causing the predictions to become tied. And then, when we calculate the ROCAUC... it will have moved.
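The effect of such ties can be sketched with four made-up observations. The "calibrated" values below are hypothetical outputs of some monotonic calibrator, hard-coded to mirror the numbers in the paragraph above, and `roc_auc` is a hand-rolled helper that gives half credit to tied pairs.

```python
def roc_auc(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 1, 0, 1]

# Raw scores: the 25.5% case is a positive ranked just above the 25% negative.
raw = [0.25, 0.255, 0.10, 0.60]

# Hypothetical calibrated values: same order, but the close pair is now
# squished to 0.7% and 0.700000001%.
calibrated = [0.007, 0.00700000001, 0.001, 0.90]

# Rounding (for storage or display) turns the squished pair into a tie.
rounded = [round(p, 6) for p in calibrated]

auc_raw = roc_auc(y_true, raw)
auc_calibrated = roc_auc(y_true, calibrated)
auc_rounded = roc_auc(y_true, rounded)
print(auc_raw, auc_calibrated, auc_rounded)
```

The monotonic step alone leaves the ROCAUC untouched; it is the rounding afterwards that creates the tie and costs half a correctly ordered pair.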

However, for all practical purposes, you can expect that the "real" ROCAUC is not affected by calibration; calibration should simply affect how well your probabilities match reality, as measured by the Brier score.