Sameer Pradhan

700 Huron Ave
#14F
Cambridge, MA 02138

ASSERT 

Automatic Statistical SEmantic Role Tagger 

Introduction

The natural language processing community has recently seen growing interest in domain-independent semantic parsing, popularly known as semantic role labeling. The task entails identifying all the predicates in a sentence and then identifying and classifying the word sequences that represent the arguments (or semantic roles) of each predicate. In other words, it assigns a WHO did WHAT to WHOM, WHEN, WHERE, WHY, HOW, etc. structure to plain text. This gives higher-level natural language processing tasks, such as information extraction, question answering, summarization, and machine translation, a layer of semantic structure on top of the syntactic structure they currently have access to.

In recent years, there have been a few attempts at creating hand-tagged corpora that encode such information. Two such corpora are FrameNet (UC Berkeley) and PropBank (UPenn/Colorado). One idea behind creating these corpora was to make it possible for the community at large to train supervised machine learning classifiers that can automatically tag vast amounts of unseen text with such shallow semantic information.

ASSERT is an automatic statistical semantic role tagger that can annotate naturally occurring text with semantic arguments. When presented with a sentence, it performs a full syntactic analysis of the sentence, automatically identifies all the verb predicates in that sentence, extracts features for all constituents in the parse tree relative to each predicate, and identifies and tags the constituents with the appropriate semantic arguments. ASSERT is trained to tag i) PropBank arguments, ii) thematic roles, and iii) opinions in plain text. For more information on the tagging algorithm, please read the following paper:

  • Shallow Semantic Parsing using Support Vector Machines
    Sameer S. Pradhan, Wayne Ward, Kadri Hacioglu, James H. Martin, Daniel Jurafsky,
    in Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT/NAACL-2004),
    Boston, MA, May 2-7, 2004
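
To make the WHO did WHAT to WHOM structure from the introduction concrete, here is a minimal Python sketch of one way to represent and display labeled arguments for a single predicate. The LabeledSpan class, the render function, the example sentence, and the bracketed notation are all illustrative assumptions; none of them is taken from ASSERT's code or its actual output format. The labels follow PropBank-style conventions (ARG0 for the giver, ARG1 for the thing given, ARG2 for the recipient), with TARGET marking the predicate.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LabeledSpan:
    """A hypothetical container for one labeled word sequence."""
    label: str  # e.g. "ARG0", "TARGET", "ARGM-TMP"
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)


def render(tokens: List[str], spans: List[LabeledSpan]) -> str:
    """Render non-overlapping labeled spans as bracketed text."""
    pieces = []
    position = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < position:
            raise ValueError("spans must not overlap")
        # Copy any unlabeled tokens that precede this span.
        pieces.extend(tokens[position:span.start])
        words = " ".join(tokens[span.start:span.end])
        pieces.append("[" + span.label + " " + words + "]")
        position = span.end
    # Copy any unlabeled tokens after the last span.
    pieces.extend(tokens[position:])
    return " ".join(pieces)


# Hand-labeled example (not produced by a tagger).
tokens = "John gave Mary a book yesterday".split()
spans = [
    LabeledSpan("ARG0", 0, 1),      # WHO
    LabeledSpan("TARGET", 1, 2),    # the predicate
    LabeledSpan("ARG2", 2, 3),      # to WHOM
    LabeledSpan("ARG1", 3, 5),      # WHAT
    LabeledSpan("ARGM-TMP", 5, 6),  # WHEN
]
print(render(tokens, spans))
# [ARG0 John] [TARGET gave] [ARG2 Mary] [ARG1 a book] [ARGM-TMP yesterday]
```

A sentence with several verb predicates would get one such set of spans per predicate, since each predicate has its own arguments.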

Download

April 14, 2006: assert-v0.14b beta version available.
  • This version of ASSERT runs on Red Hat Linux systems. It has been tested on Red Hat 6, 7.3, and 9 and on Fedora Core 1-3.
  • It was designed and implemented by Sameer S. Pradhan, with some initial contributions from Daniel Gildea at the University of Rochester.
  • This distribution includes several other software packages, some as distributed, and some with minor modifications, of which the principal ones are: i) Eugene Charniak's nlparser, ii) Taku Kudo's YamCha and TinySVM, iii) Douglas Rohde's Tgrep2 and iv) University of Pennsylvania's morphology package.
July 16, 2004: assert-v0.1b version released.

Acknowledgements

  • I would like to thank my mentors, Profs. Wayne Ward, James Martin, and Daniel Jurafsky, for all their support and guidance, and for introducing me to the exciting field of Natural Language Processing. I would also like to thank Martha Palmer from UPenn for providing the PropBank corpus and useful suggestions, and Charles Fillmore for providing the FrameNet corpus.
  • The work was supported by the Center for Spoken Language Research and in part by ARDA AQUAINT program contract OCG4423B and NSF grant IS-9978025.

 
