It is well known that part of speech depends on context. The word "table," for example, can be a verb in some contexts (e.g., "He will table the motion") and a noun in others (e.g., "The table is ready"). A program has been written which tags each word in an input sentence with the most likely part of speech. The program produces the following output for the two "table" sentences just mentioned:

    He/PPS will/MD table/VB the/AT motion/NN
    The/AT table/NN is/BEZ ready/JJ

(PPS = subject pronoun; MD = modal; VB = verb (no inflection); AT = article; NN = noun; BEZ = present 3rd sg form of "to be"; JJ = adjective; notation is borrowed from [Francis and Kucera, pp. 6-8])

Part of speech tagging is an important practical problem with potential applications in many areas including speech synthesis, speech recognition, spelling correction, proof-reading, query answering, machine translation and searching large text data bases (e.g., patents, newspapers). The author is particularly interested in speech synthesis applications, where it is clear that pronunciation sometimes depends on part of speech. Consider the following three examples. First, there are words like "wind" where the noun has a different vowel than the verb. That is, the noun "wind" has a short vowel as in "the wind is strong," whereas the verb "wind" has a long vowel as in "Don't forget to wind your watch." Secondly, the pronoun "that" is stressed as in "Did you see THAT?" unlike the complementizer "that," as in "It is a shame that he's leaving." Thirdly, note the difference between "oily FLUID" and "TRANSMISSION fluid"; as a general rule, an adjective-noun sequence such as "oily FLUID" is typically stressed on the right whereas a noun-noun sequence such as "TRANSMISSION fluid" is typically stressed on the left. These are but three of the many constructions which would sound more natural if the synthesizer had access to accurate part of speech information.
Perhaps the most important application of tagging programs is as a tool for future research. A number of large projects such as [Cobuild] have recently been collecting large corpora (10-1000 million words) in order to better describe how language is actually used in practice: "For the first time, a dictionary has been compiled by the thorough examination of a representative group of English texts, spoken and written, running to many millions of words. This means that in addition to all the tools of the conventional dictionary makers... the dictionary is based on hard, measurable evidence." [Cobuild, p. xv] It is likely that there will be more and more research projects collecting larger and larger corpora. A reliable parts program might greatly enhance the value of these corpora to many of these researchers.

The program uses a linear time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word j), and (b) contextual probabilities (probability of observing part of speech i given k previous parts of speech). Probability estimates were obtained by training on the Tagged Brown Corpus [Francis and Kucera], a corpus of approximately 1,000,000 words with part of speech tags assigned laboriously by hand over many years.

Program performance is encouraging (95-99% "correct", depending on the definition of "correct"). A small 400 word sample is presented in the Appendix, and is judged to be 99.5% correct. It is surprising that a local "bottom-up" approach can perform so well. Most errors are attributable to defects in the lexicon; remarkably few errors are related to the inadequacies of the extremely over-simplified grammar (a trigram model). Apparently, "long distance" dependences are not very important, at least most of the time.
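The optimization described above, the product of lexical and contextual probabilities, can be illustrated with a small Viterbi-style dynamic programming sketch. This is not the author's program. It assumes k = 1 (one previous tag) rather than the trigram model mentioned in the text, and every probability in the tables is invented for illustration rather than estimated from the Tagged Brown Corpus. The names `LEXICAL`, `CONTEXTUAL`, `FLOOR`, and `tag` are hypothetical.

```python
import math

# Toy numbers invented for illustration; NOT estimates from the Brown Corpus.
# LEXICAL[word][tag] stands in for P(tag | word).
LEXICAL = {
    "he": {"PPS": 1.0},
    "will": {"MD": 0.8, "NN": 0.2},
    "table": {"NN": 0.9, "VB": 0.1},
    "the": {"AT": 1.0},
    "motion": {"NN": 0.9, "VB": 0.1},
    "is": {"BEZ": 1.0},
    "ready": {"JJ": 1.0},
}

# CONTEXTUAL[(previous_tag, tag)] stands in for P(tag | previous_tag).
# "<s>" is an assumed sentence-start marker.
CONTEXTUAL = {
    ("<s>", "PPS"): 0.5, ("<s>", "AT"): 0.5,
    ("PPS", "MD"): 0.6, ("PPS", "NN"): 0.01,
    ("MD", "VB"): 0.8, ("MD", "NN"): 0.01,
    ("NN", "VB"): 0.05, ("NN", "NN"): 0.1,
    ("VB", "AT"): 0.5, ("NN", "AT"): 0.1,
    ("AT", "NN"): 0.7, ("AT", "VB"): 0.001,
    ("NN", "BEZ"): 0.3, ("VB", "BEZ"): 0.01,
    ("BEZ", "JJ"): 0.4,
}

# Assumed floor for tag pairs missing from the toy table.
FLOOR = 1e-6


def tag(words):
    """Return the tag sequence maximizing the product of the toy
    lexical and contextual probabilities (computed in log space)."""
    # best[t] = (log score of the best path ending in tag t, that path)
    best = {"<s>": (0.0, [])}
    for word in words:
        # Words missing from the toy lexicon raise KeyError in this sketch.
        candidates_for_word = LEXICAL[word]
        next_best = {}
        for t, p_lex in candidates_for_word.items():
            options = []
            for prev, (score, path) in best.items():
                p_ctx = CONTEXTUAL.get((prev, t), FLOOR)
                new_score = score + math.log(p_lex) + math.log(p_ctx)
                options.append((new_score, path + [t]))
            # Keep only the best path ending in tag t.
            next_best[t] = max(options, key=lambda o: o[0])
        best = next_best
    return max(best.values(), key=lambda o: o[0])[1]


print(tag("he will table the motion".split()))
print(tag("the table is ready".split()))
```

Because only the best path ending in each tag is kept at every word, the number of candidate comparisons grows linearly with sentence length. With these toy numbers, "table" comes out as VB after the modal "will" and as NN after the article "the", even though NN is the more likely tag for "table" in isolation.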
One might have thought that n-gram models weren't adequate for the task since it is well known that they are inadequate for determining grammaticality: "We find that no finite-state Markov process that produces symbols with transition from state to state can serve as an English grammar. Furthermore, the particular subclass of such processes that produce n-order statistical approximations to English do not come closer, with increasing n, to matching the output of an English grammar."