700 Huron Ave
#14F
Cambridge, MA 02138
The natural language processing community has recently seen growing interest in domain-independent semantic parsing, popularly known as semantic role labeling. This entails identifying all the predicates in a sentence and then identifying and classifying the word sequences that represent the arguments (or semantic roles) of each predicate. In other words, it is the process of assigning a WHO did WHAT to WHOM, WHEN, WHERE, WHY, HOW, etc. structure to plain text. Such a structure can improve algorithms for higher-level natural language processing tasks, such as information extraction, question answering, summarization, and machine translation, by providing a layer of semantic structure on top of the syntactic structure they currently have access to.

In recent years, there have been a few attempts at creating hand-tagged corpora that encode such information. Two such corpora are FrameNet (UC Berkeley) and PropBank (UPenn/Colorado). One idea behind creating these corpora was to make it possible for the community at large to train supervised machine learning classifiers that can automatically tag vast amounts of unseen text with such shallow semantic information.

ASSERT is an automatic statistical semantic role tagger that can annotate naturally occurring text with semantic arguments. When presented with a sentence, it performs a full syntactic analysis of the sentence, automatically identifies all the verb predicates in that sentence, extracts features for all constituents in the parse tree relative to each predicate, and identifies and tags the constituents with the appropriate semantic arguments. ASSERT is trained to tag (i) PropBank arguments, (ii) thematic roles, and (iii) opinions in plain text. For more information on the tagging algorithm, please read the following paper: