Over recent years, many natural language processing (NLP) techniques have been developed that might benefit from knowledge of distributionally similar words, i.e., words that occur in similar contexts. For example, the sparse data problem can make it difficult to construct language models which predict combinations of lexical events. Similarity-based smoothing (Brown et al., 1992; Dagan et al., 1999) is an intuitively appealing approach to this problem where probabilities of unseen co-occurrences are estimated from probabilities of seen co-occurrences of distributionally similar events.

Other potential applications apply the hypothesised relationship (Harris, 1968) between distributional similarity and semantic similarity; i.e., similarity in the meaning of words can be predicted from their distributional similarity. One advantage of automatically generated thesauruses (Grefenstette, 1994; Lin, 1998; Curran and Moens, 2002) over large-scale manually created thesauruses such as WordNet (Fellbaum, 1998) is that they might be tailored to a particular genre or domain.

However, due to the lack of a tight definition for the concept of distributional similarity and the broad range of potential applications, a large number of measures of distributional similarity have been proposed or adopted (see Section 2). Previous work on the evaluation of distributional similarity methods tends to either compare sets of distributionally similar words to a manually created semantic resource (Lin, 1998; Curran and Moens, 2002) or be oriented towards a particular task such as language modelling (Dagan et al., 1999; Lee, 1999). The first approach is not ideal since it assumes that the goal of distributional similarity methods is to predict semantic similarity and that the semantic resource used is a valid gold standard. Further, the second approach is clearly advantageous when one wishes to apply distributional similarity methods in a particular application area.
However, it is not at all obvious that one universally best measure exists for all applications (Weeds and Weir, 2003). Thus, applying a distributional similarity technique to a new application necessitates evaluating a large number of distributional similarity measures in addition to evaluating the new model or algorithm.

We propose a shift in focus from attempting to discover the overall best distributional similarity measure to analysing the statistical and linguistic properties of sets of distributionally similar words returned by different measures. This will make it possible to predict in advance of any experimental evaluation which distributional similarity measures might be most appropriate for a particular application.

Further, we explore a problem faced by the automatic thesaurus generation community, which is that distributional similarity methods do not seem to offer any obvious way to distinguish between the semantic relations of synonymy, antonymy and hyponymy. Previous work on this problem (Caraballo, 1999; Lin et al., 2003) involves identifying specific phrasal patterns within text, e.g., "Xs and other Ys" is used as evidence that X is a hyponym of Y. Our work explores the connection between relative frequency, distributional generality and semantic generality with promising results.

The rest of this paper is organised as follows. In Section 2, we present ten distributional similarity measures that have been proposed for use in NLP. In Section 3, we analyse the variation in neighbour sets returned by these measures. In Section 4, we take one fundamental statistical property (word frequency) and analyse the correlation between this and the nearest neighbour sets generated. In Section 5, we relate relative frequency to a concept of distributional generality and the semantic relation of hyponymy.
In Section 6, we consider the effects that this has on a potential application of distributional similarity techniques, which is judging compositionality of collocations. We would like to thank Adam Kilgarriff and Bill Keller for useful discussions. We have presented an analysis of a set of distributional similarity measures. In its most general sense, a collocation is a habitual or lexicalised word combination. Thus, it would seem that the three-way connection between distributional generality, hyponymy and relative frequency exists for verbs as well as nouns. We have seen that there is a large amount of variation in the neighbours selected by different measures and therefore the choice of measure in a given application is likely to be important.

Table 1: Ten distributional similarity measures. The harmonic mean (or F-score) of $\mathrm{sim}_P$ and $\mathrm{sim}_R$ is

\[
\mathrm{sim}_{hm}(w_2, w_1) = \frac{2 \cdot \mathrm{sim}_P(w_2, w_1) \cdot \mathrm{sim}_R(w_2, w_1)}{\mathrm{sim}_P(w_2, w_1) + \mathrm{sim}_R(w_2, w_1)}
\]

where $F(w) = \{c : I(c, w) > 0\}$.
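As a rough illustration of the harmonic mean measure and the feature set F(w), the following Python sketch may help. The harmonic mean and the definition of F(w) follow the formula above. The additive, MI-weighted forms used here for sim_P and sim_R are an assumption, since this section does not define them. The toy values of I(c, w) and all function names are hypothetical.

```python
from typing import Dict, Set


def feature_set(mi: Dict[str, float]) -> Set[str]:
    """F(w) = {c : I(c, w) > 0}: contexts with positive mutual information."""
    return {c for c, value in mi.items() if value > 0}


def sim_p(mi_w2: Dict[str, float], mi_w1: Dict[str, float]) -> float:
    """ASSUMED additive form: share of w2's positive MI mass that falls on
    features also in F(w1). Not defined in this section."""
    f2 = feature_set(mi_w2)
    shared = f2 & feature_set(mi_w1)
    total = sum(mi_w2[c] for c in f2)
    if total == 0:
        return 0.0
    return sum(mi_w2[c] for c in shared) / total


def sim_r(mi_w2: Dict[str, float], mi_w1: Dict[str, float]) -> float:
    """ASSUMED additive form: the same ratio computed over w1's MI mass."""
    return sim_p(mi_w1, mi_w2)


def sim_hm(mi_w2: Dict[str, float], mi_w1: Dict[str, float]) -> float:
    """Harmonic mean (F-score) of sim_p and sim_r, as in the formula above."""
    p = sim_p(mi_w2, mi_w1)
    r = sim_r(mi_w2, mi_w1)
    if p + r == 0:
        return 0.0  # avoid division by zero when no features are shared
    return 2 * p * r / (p + r)


# Hypothetical I(c, w) values for two words (toy data, not from the paper).
MI_CAT = {"purr": 2.0, "eat": 1.0, "drive": -0.5}
MI_ANIMAL = {"eat": 1.0, "sleep": 3.0}
```

Calling `sim_hm(MI_CAT, MI_ANIMAL)` combines the two directional scores into one symmetric value. The harmonic mean is low whenever either score is low, so both words must account for a good share of each other's features to score highly.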