(Figure: feature space with 3 classes and 1 feature)
Low-Rank Softmax Can Have Unargmaxable Classes in Theory
Let's start with 100 random single-feature inputs
We feed these points into a Softmax classifier with 3 classes
(The weight vector values are in the top right corner)
We classify each point, assigning it the color of the class with the largest probability
A class is unargmaxable if no input point can ever be assigned to it, i.e. it never attains the largest probability
There are currently no yellow points, so Class b is unargmaxable
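This setup can be sketched in a few lines of NumPy. The weight values below are hypothetical (not the ones shown in the figure), chosen with zero biases so that Class b's weight lies strictly between the other two:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))  # 100 random single-feature inputs

# Hypothetical 1-D weights for classes a, b, c (zero biases):
# Class b's weight lies strictly between a's and c's
W = np.array([[-1.0], [0.2], [1.5]])

logits = X @ W.T               # shape (100, 3)
preds = logits.argmax(axis=1)  # softmax is monotone, so argmax of logits
                               # equals argmax of probabilities

counts = np.bincount(preds, minlength=3)
print(counts)  # Class b (index 1) wins for no input
```

With one feature and zero biases, only the classes with the smallest and largest weights can ever attain the maximum logit, so Class b never wins here.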
But we can scale the weight vector for Class b to make it argmaxable
However, now Class c is unargmaxable: there are no green points
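A small NumPy sketch with hypothetical 1-D weights (zero biases) illustrates this trade-off: scaling Class b's weight past Class c's makes b argmaxable, but leaves c unargmaxable:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))

# Hypothetical 1-D weights: Class b's weight has been scaled up to 2.0,
# past Class c's weight of 1.5
W = np.array([[-1.0], [2.0], [1.5]])

preds = (X @ W.T).argmax(axis=1)
counts = np.bincount(preds, minlength=3)
print(counts)  # Class c (index 2) now wins for no input
```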
Let's look at a less trivial example: 4 classes and inputs with 2 features
Class d is currently unargmaxable
(But scaling the weight vector for Class d this way makes it argmaxable again)
Notice when Class d is argmaxable versus unargmaxable; perhaps you can spot a pattern
Spoiler: Class d is unargmaxable when its weight vector lies inside the triangle formed by the weight vectors of Classes a, b and c
See Demeter et al. (2020) and Cover (1967)
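In 2-D with zero biases, this triangle criterion can be checked directly with barycentric coordinates. The weight vectors below are hypothetical, purely for illustration:

```python
import numpy as np

def inside_triangle(p, a, b, c):
    """True if p lies inside (or on) triangle (a, b, c): solve for the
    coefficients of a convex combination of the three vertices."""
    A = np.vstack([np.column_stack([a, b, c]), np.ones(3)])
    lam = np.linalg.solve(A, np.append(p, 1.0))
    return bool(np.all(lam >= 0))

# Hypothetical 2-D weight vectors for classes a, b, c
wa, wb, wc = np.array([0., 2.]), np.array([-2., -1.]), np.array([2., -1.])

# A Class d weight vector inside the triangle would be unargmaxable
print(inside_triangle(np.array([0., 0.]), wa, wb, wc))  # True: inside
print(inside_triangle(np.array([0., 3.]), wa, wb, wc))  # False: outside
```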
How can we detect whether a class is unargmaxable?
Our paper relies on the insight that we can partition the input space
into regions, each corresponding to a ranking of the classes by their probabilities
For instance, this region ranks d > b > c > a
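One way to see this partition empirically is to sample many inputs and record the class ranking each one induces; every distinct ranking observed corresponds to one region. The 2-D weight vectors below are hypothetical, with zero biases:

```python
import numpy as np

# Hypothetical 2-D weight vectors for classes a, b, c, d (zero biases)
W = np.array([[0., 2.], [-2., -1.], [2., -1.], [3., 3.]])
names = np.array(list("abcd"))

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(5000, 2))

# Rank classes by logit (highest first) for every sampled input
order = np.argsort(-(X @ W.T), axis=1)
rankings = {" > ".join(names[row]) for row in order}

# Each distinct ranking is one region of the input space;
# with 4 classes at most 4! = 24 rankings are possible
print(len(rankings))
```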
We then focus on points that satisfy a set of constraints
We want points for which the probability of Class d is greater than that of Classes a, b and c
We are left with the cone of points to the top right, hence Class d is argmaxable
However, if the Class d weight vector lies inside the triangle formed by the weight vectors of Classes a, b and c
And we apply the constraints...
No point satisfies the constraints, proving that Class d is unargmaxable
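With equal (here zero) biases, the constraint region is a cone through the origin, so in 2-D the feasibility check can be approximated by scanning unit directions. The paper uses an exact check; this direction scan, with hypothetical weights, is only a 2-D illustration:

```python
import numpy as np

def argmaxable_2d(w_target, others, n_dirs=3600):
    """Approximate check: scan unit directions and test whether some input
    gives w_target a strictly larger logit than every other class.
    The feasible set is a cone, so unit directions suffice (up to resolution)."""
    angles = np.linspace(0.0, 2 * np.pi, n_dirs, endpoint=False)
    X = np.stack([np.cos(angles), np.sin(angles)], axis=1)    # (n, 2)
    margins = np.stack([X @ (w_target - w) for w in others])  # one row per rival
    return bool(np.any(np.all(margins > 0, axis=0)))

# Hypothetical 2-D weight vectors for classes a, b, c
others = [np.array([0., 2.]), np.array([-2., -1.]), np.array([2., -1.])]

print(argmaxable_2d(np.array([0., 0.]), others))  # False: d inside the triangle
print(argmaxable_2d(np.array([0., 3.]), others))  # True: d outside the triangle
```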
Low-Rank Softmax Can Have Unargmaxable Classes in Theory but Rarely in Practice

In our paper, we search for unargmaxable classes in 150 Language Models and Machine Translation Models.

We find that unargmaxable classes are infrequent and unlikely to impact model quality.

Phenomena may vary across models; our code is available here if you want to test your own.