Parametric models of token frequency in documents

Most words are rare: think of a moderately infrequent word, then check whether it occurs in the next email message, web page, article, or book you encounter. Chances are that it won't. Naive parametric models of token frequency are forced to say that rare words generally have a low probability of occurrence. This leads to well-known problems: when a word does occur in a document, its total occurrence count is often much higher than a naive model would predict. Standard (non-naive) models therefore allow for increased variability of token frequency. But standard models may still run into problems when faced with a large number of documents with zero occurrences. An extension of the naive or standard models is proposed that treats the case of non-occurrence separately. Under the augmented model, a separate parameter controls whether a word appears in a document at all; if it does, its token frequency is modeled in a standard (or naive) way. I will discuss properties of, and estimation procedures for, several concrete models, and contrast them with two well-known naive models often used for document classification.
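
As a rough illustration of the augmented model, the sketch below takes one possible reading of it: a zero-inflated Poisson, in which a parameter z gives the probability that a word is absent from a document and the count otherwise follows a Poisson distribution with rate lam. The choice of Poisson as the base model, the method-of-moments estimator, the function names, and the toy counts are all assumptions made for this example; they are not taken from the models or estimation procedures discussed in the talk.

```python
import math


def poisson_pmf(k, lam):
    """P(count = k) under a plain Poisson model with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, z, lam):
    """P(count = k) under a zero-inflated Poisson model.

    With probability z the word is absent from the document;
    otherwise its count is drawn from Poisson(lam), which may
    itself produce a zero.
    """
    base = (1.0 - z) * poisson_pmf(k, lam)
    return z + base if k == 0 else base


def fit_zip_moments(counts):
    """Method-of-moments estimates (z, lam) from per-document counts.

    Uses mean = (1 - z) * lam and var = mean * (1 + z * lam).
    Only defined when the sample variance exceeds the sample mean.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    if mean <= 0 or var <= mean:
        raise ValueError("counts are not overdispersed; estimator undefined")
    lam = mean + var / mean - 1.0
    z = 1.0 - mean / lam
    return z, lam


# Invented toy data: counts of one word in 20 hypothetical documents.
counts = [0] * 16 + [3, 4, 5, 4]

z_hat, lam_hat = fit_zip_moments(counts)
plain_lam = sum(counts) / len(counts)  # Poisson maximum-likelihood rate

for k in (0, 4):
    print(k, poisson_pmf(k, plain_lam), zip_pmf(k, z_hat, lam_hat))
```

On data of this shape, the plain Poisson fit has to spread a single low rate across all documents, so it assigns too little probability both to zero counts and to the larger counts seen when the word does occur. The extra parameter lets the two be fitted separately.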