@coolbreeze16
Mask token ([MASK]) works by replacing a certain percentage of input words with the [MASK] token during training. This forces the model to learn to predict the original words given the context of the surrounding words. During inference or testing, the [MASK] token can be used to generate predictions for missing. $MASKS