Frequently Asked Questions
- Can I use the development data to train the system for the final test run?
Answer: Yes. It is fine to use the annotation on the development data as additional training material for the final evaluated system.
- We are getting a few *_conll files with a missing or wrong number of columns.
Answer: This was happening owing to an error in two parse trees. Please manually update the LDC distribution as follows, and that should fix the issue.
conll-2012-train-v0/data/files/data/english/annotations/tc/ch/00/ch_0021.parse:
------------------------------------------------------------------------------
change: (-NONE- +unknown+)
to: (TOP (XX +unknown+))
conll-2012-train-v0/data/files/data/english/annotations/tc/ch/00/ch_0011.parse:
------------------------------------------------------------------------------
change: (-NONE- When)
to: (TOP (SBARQ (WHADVP (WRB When))))
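If you would rather script these two edits than apply them by hand, a minimal sketch along the following lines should produce the same result (run it from the directory that contains the conll-2012-train-v0 tree; the script itself is only an illustration, not part of the official tooling):

#!/usr/bin/env python
# Illustrative only: applies the two parse-tree corrections listed above.
fixes = {
    "conll-2012-train-v0/data/files/data/english/annotations/tc/ch/00/ch_0021.parse":
        ("(-NONE- +unknown+)", "(TOP (XX +unknown+))"),
    "conll-2012-train-v0/data/files/data/english/annotations/tc/ch/00/ch_0011.parse":
        ("(-NONE- When)", "(TOP (SBARQ (WHADVP (WRB When))))"),
}
for path, (old, new) in fixes.items():
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(text.replace(old, new))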
- Each *_conll file is split into one or more “parts”. Do we consider each part as a separate document?
Answer: Yes. As mentioned in the OntoNotes release document, some files were too long to annotate completely for coreference, so we had to break them into smaller pieces. Therefore, each part behaves as a separate document. In some cases we have merged multiple contiguous parts into longer, coherent parts, and given more time we plan to merge as many as feasible in later releases.
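As a rough illustration of treating each part as its own document, a reader can be sketched as below, assuming the usual “#begin document (<id>); part NNN” and “#end document” delimiter lines in the *_conll files (the function name and structure are only illustrative):

def read_parts(path):
    """Yield (part_header, lines) for each part of a *_conll file."""
    header, lines = None, []
    with open(path) as f:
        for raw in f:
            line = raw.rstrip("\n")
            if line.startswith("#begin document"):
                header, lines = line, []
            elif line.startswith("#end document"):
                yield header, lines   # one part is processed as one document
            else:
                lines.append(line)    # token rows and blank sentence separators

# Hypothetical usage: handle each part independently.
# for header, lines in read_parts("path/to/some_file_conll"):
#     process_document(header, lines)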
- Do the participants have to perform mention detection and coreference resolution, or will the mention bracketing be available in the test sets?
Answer: Systems are expected to perform both mention detection and coreference resolution. The final coreference column in the test set will be blank (i.e., a series of “-”).
- Are singleton mentions not tagged in OntoNotes? Should we filter them out before scoring ourselves, and before submitting the final test run?
Answer: Yes, by design, we have only tagged mentions that have one or more other coreferent mentions in the document. Therefore, if your system produces singleton mentions, they should be filtered out before scoring or submitting the final runs. The scorer is not designed to filter them out implicitly, and would consider them to be spurious mentions. A rough sketch of such a filter is shown below.
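For example, a post-processing step along the following lines could drop singleton chains from the last column of a single part, assuming the standard coreference column format in which a mention of chain N is marked with “(N”, “N)”, or “(N)” and multiple markers on one token are separated by “|” (the helper is illustrative, not official tooling):

import re
from collections import Counter

def drop_singletons(coref_column):
    """coref_column: list of last-column cells for one part ('-' = no mention).
    Returns the same list with chains that occur only once removed."""
    # Count mentions per chain id: every opening bracket "(N" starts a mention.
    counts = Counter(cid for cell in coref_column
                         for cid in re.findall(r"\((\d+)", cell))
    cleaned = []
    for cell in coref_column:
        if cell == "-":
            cleaned.append(cell)
            continue
        kept = [m for m in cell.split("|")
                if counts[re.search(r"\d+", m).group()] > 1]
        cleaned.append("|".join(kept) if kept else "-")
    return cleaned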
- There are two versions of the CEAF metric: i) mention-based CEAF, or CEAFM; and ii) entity-based CEAF, or CEAFE. Which of these will be used for the final average score?
Answer: We will use the entity-based (CEAFE) metric and compute the final score from the F-scores of the three metrics using the following formula: (MUC + BCUBED + CEAFE)/3
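For instance, with hypothetical F-scores of 70.0 (MUC), 68.0 (B-cubed), and 46.0 (CEAFE), the official score would be (70.0 + 68.0 + 46.0) / 3 = 61.33.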
- Can we submit the final test results in one big file?
Answer: Yes, you may do that, as long as all the document metadata, including the part numbers, is preserved in the right format.
- Why are the Arabic part-of-speech tags in the predicted version different from the gold standard version? Which of them will be available in the test set?
Answer: The gold standard Arabic part-of-speech tags are much richer than the standard ones used for English, on which most parsers are trained. The LDC release of the Arabic Treebank (LDC2010T13) provides a mapping to a smaller set of Penn Treebank-style part-of-speech tags, which are used by the community. We therefore decided to follow the same convention. The following script was used to simplify the parse trees. The test set will contain this reduced version of the part-of-speech tags.
- What is the citation for the shared task introduction paper?
Answer: Please use the following BibTeX citation:
@inproceedings{pradhan-etal-conll-st-2012-ontonotes,
author = {Sameer Pradhan and Alessandro Moschitti and Nianwen Xue and Olga Uryupina and Yuchen Zhang},
title = {{CoNLL-2012} Shared Task: Modeling Multilingual Unrestricted Coreference in {OntoNotes}},
booktitle = {{Proceedings of the Sixteenth Conference on Computational Natural Language Learning (CoNLL 2012)}},
year = {2012},
address = {Jeju, Korea}
}