ASSERT (Automatic Statistical SEmantic Role Tagger)
Introduction
The natural language processing community has recently seen a growth of interest in domain-independent semantic parsing, popularly known as semantic role labeling. This entails identifying all the predicates in a sentence, and then identifying and classifying the word sequences that represent the arguments (or semantic roles) of each of these predicates. In other words, it is the process of assigning a WHO did WHAT to WHOM, WHEN, WHERE, WHY, HOW, etc. structure to plain text. This gives algorithms for higher-level natural language processing tasks, such as information extraction, question answering, summarization and machine translation, a layer of semantic structure on top of the syntactic structure they currently have access to.

In recent years, there have been a few attempts at creating hand-tagged corpora that encode such information. Two such corpora are FrameNet (UC Berkeley) and PropBank (UPenn/Colorado). One idea behind creating these corpora was to make it possible for the community at large to train supervised machine learning classifiers that can automatically tag vast amounts of unseen text with such shallow semantic information.

ASSERT is an automatic statistical semantic role tagger that can annotate naturally occurring text with semantic arguments. When presented with a sentence, it performs a full syntactic analysis of the sentence, automatically identifies all the verb predicates in that sentence, extracts features for all constituents in the parse tree relative to each predicate, and identifies and tags the constituents with the appropriate semantic arguments. ASSERT is trained to tag i) PropBank arguments, ii) thematic roles, and iii) opinions in plain text. For more information on the tagging algorithm, please read the following paper:
- Shallow Semantic Parsing using Support Vector Machines. Sameer S. Pradhan, Wayne Ward, Kadri Hacioglu, James H. Martin, Daniel Jurafsky. In Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT/NAACL-2004), Boston, MA, May 2-7, 2004.
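The tagging steps described in the introduction (parse the sentence, find the verb predicates, extract features for each constituent relative to a predicate, then identify and label the arguments) can be illustrated with a minimal Python sketch. It is not ASSERT's code. The hand-built parse tree stands in for the output of Charniak's parser. The three features (phrase type, parse-tree path, and position relative to the predicate) are assumptions modeled on features common in the semantic role labeling literature. `toy_classify` is a hypothetical hand-written rule set standing in for the trained SVM classifiers. All function names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

@dataclass(eq=False)  # eq=False: nodes compare by identity, which `in` and `index` rely on
class Node:
    label: str                                   # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on preterminals
    parent: Optional["Node"] = field(default=None, repr=False)

def tree(label: str, *kids) -> Node:
    """Build a node; a single string child makes a preterminal (POS tag + word)."""
    if len(kids) == 1 and isinstance(kids[0], str):
        return Node(label, word=kids[0])
    node = Node(label, list(kids))
    for kid in node.children:
        kid.parent = node
    return node

def nodes(n: Node) -> Iterator[Node]:
    """All nodes in pre-order (left to right)."""
    yield n
    for c in n.children:
        yield from nodes(c)

def leaves(n: Node) -> List[Node]:
    """Preterminals dominated by n, in sentence order."""
    return [n] if n.word is not None else [leaf for c in n.children for leaf in leaves(c)]

def ancestors(n: Optional[Node]) -> List[Node]:
    """n itself followed by its ancestors up to the root."""
    chain = []
    while n is not None:
        chain.append(n)
        n = n.parent
    return chain

def path(const: Node, pred: Node) -> str:
    """Path from a constituent up to the lowest common ancestor, then down to the predicate."""
    up, down = ancestors(const), ancestors(pred)
    common = next(a for a in up if a in down)
    up_part = up[: up.index(common) + 1]
    down_part = down[: down.index(common)]
    return "↑".join(n.label for n in up_part) + "".join("↓" + n.label for n in reversed(down_part))

def toy_classify(feats: dict) -> Optional[str]:
    """HYPOTHETICAL stand-in for trained SVM classifiers: hand-written rules covering only
    a simple active-voice transitive clause. Returns None when the node is not an argument."""
    if feats["phrase_type"] == "NP" and feats["position"] == "before" and feats["path"].startswith("NP↑S↓VP↓"):
        return "ARG0"
    if feats["phrase_type"] == "NP" and feats["position"] == "after" and feats["path"].startswith("NP↑VP↓"):
        return "ARG1"
    return None

def label_sentence(root: Node) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """For each verb predicate (POS tag VB*), label the constituents that are its arguments."""
    order = leaves(root)
    offset = {id(leaf): i for i, leaf in enumerate(order)}
    frames = []
    for pred in (leaf for leaf in order if leaf.label.startswith("VB")):
        pred_chain = ancestors(pred)
        args = []
        for c in nodes(root):
            if c in pred_chain:  # skip the predicate and every node that dominates it
                continue
            feats = {
                "phrase_type": c.label,
                "path": path(c, pred),
                "position": "before" if offset[id(leaves(c)[0])] < offset[id(pred)] else "after",
            }
            role = toy_classify(feats)
            if role is not None:
                args.append((role, " ".join(leaf.word for leaf in leaves(c))))
        frames.append((pred.word, args))
    return frames

# "John broke the window" with a hand-built, Penn Treebank-style parse (assumed, not parser output).
sentence = tree("S",
                tree("NP", tree("NNP", "John")),
                tree("VP", tree("VBD", "broke"),
                           tree("NP", tree("DT", "the"), tree("NN", "window"))))
print(label_sentence(sentence))
# [('broke', [('ARG0', 'John'), ('ARG1', 'the window')])]
```

In a real tagger the decision in `toy_classify` is learned from annotated data such as PropBank rather than written by hand. The rules above only handle the subject and object of a simple active-voice sentence.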
Download
assert-v0.15b beta version available. (October 17, 2014)
- This version of ASSERT fixes some bugs in the earlier version and has been updated to compile with the gcc-4.4.7 compiler.
assert-v0.14b beta version available. (April 14, 2006)
- This version of ASSERT works on Red Hat Linux systems. It has been tested on Red Hat 6, 7.3, 9 and Fedora Core 1-3.
- It was designed and implemented by Sameer S. Pradhan, with some initial contributions from Daniel Gildea at the University of Rochester.
- This distribution includes several other software packages, some as distributed and some with minor modifications. The principal ones are: i) Eugene Charniak's nlparser, ii) Taku Kudo's YamCha and TinySVM, iii) Douglas Rohde's Tgrep2, and iv) the University of Pennsylvania's morphology package.
assert-v0.1b version released. (July 16, 2004)
Acknowledgements
- I would like to thank my mentors Profs. Wayne Ward, James Martin and Daniel Jurafsky for all their support and guidance, and for introducing me to the exciting field of Natural Language Processing. I would also like to thank Martha Palmer from UPenn for providing the PropBank corpus and useful suggestions, and Charles Fillmore for providing the FrameNet corpus.
- The work was supported by the Center for Spoken Language Research, and in part by ARDA AQUAINT program contract OCG4423B and NSF grant IS-9978025.