alwaysaditi's picture
End of training
dc78b20 verified
raw
history blame contribute delete
No virus
2.87 kB
toi)ic signatures can lie used to identify the t)resence of a (:omph~x conce.pt a concept hat consists of several related coinl)onents in fixed relationships. ]~.c.stauvant-uisit, for examph~, invoh,es at h,ast the concel)ts lltcgfit, t.(tt, pay, and possibly waiter, all(l dragon boat pcstivai (in tat- wan) involves the ct)llc(!l)t,s cal(tlztlt,s (a talisman to ward off evil), rnoza (something with the t)ower of preventing pestilen(:e and strengthening health), pic- tures of ch, un9 kuei (a nemesis of evil spirits), eggs standing on end, etc. only when the concepts co- occur is one licensed to infer the comph:x concept; cat or moza alone, for example, are not sufficient. at this time, we do not c.onsider the imerrelationships among tile concepts. since many texts may describe all the compo- nents of a comi)lex concept without ever exi)lic- itly mentioning the mlderlying complex concel/t--a tol)ic--itself, systems that have to identify topic(s), for summarization or information retrieval, require a method of infcuring comt)hx concellts flom their component words in the text. 2 re la ted work in late 1970s, ])e.long (dejong, 1982) developed a system called i"tiump (fast reading understand- ing and memory program) to skim newspaper sto- ries and extract the main details. frump uses a data structure called sketchy script to organize its world knowhdge. each sketchy script is what frumi ) knows al)out what can occur in l)articu- lar situations such as denmnstrations, earthquakes, labor strike.s, an(t so on. frump selects a t)artic- ular sketchy script based on clues to styled events in news articles. in other words, frump selects an eml)t3 ~ t(uni)late 1whose slots will be tilled on the fly as t"f[ump reads a news artme. a summary is gen- erated })ased on what has been (:al)tured or filled in the teml)iate. the recent success of infornmtion extractk)n re- search has encoreaged the fi{um1 ) api)roach. the summons (summarizing online news artmes) system (mckeown and radev, 1999) takes tem- l)late outputs of information extra(:tion systems de- velofmd for muc conference and generating smn- maries of multit)le news artmes. frump and sum- mons both rely on t/rior knowledge of their do- mains, th)wever, to acquire such t)rior knowledge is lal)or-intensive and time-consuming. i~)r exam-- l)le, the unive.rsity of massa(:husetts circus sys- l.enl use(l ill the muc-3 (saic, 1998) terrorism do- main required about 1500 i)erson-llours to define ex- traction lmtterns 2 (rilotf, 1996).the automated acquisit ion of topic signatures for text summarizat ion chin -yew l in and eduard hovy in fo rmat ion s(:i(umes i l l s t i tu te un ivers i ty of southern ca l i fo rn ia mar ina del rey, ca 90292, usa { cyl,hovy }c~isi.edu abst rac t in order to produce, a good summary, one has to identify the most relevant portions of a given text.