The task of coreference resolution is to identify all mentions in a text that refer to the same real-world entity. Consider the following example -
Barack Obama nominated Hillary Rodham Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as former First Lady.
Barack Obama, he and his all refer to the entity Obama.
Hillary Rodham Clinton, secretary of state, her, she and First Lady all refer to Hillary Clinton.
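The expected output can be represented as clusters of mentions, one cluster per entity. A minimal sketch of that representation (the structure and names here are my own, not from any particular library; real systems use token spans rather than strings):

```python
# Toy representation of coreference output for the example sentence:
# each cluster is a list of mention strings referring to one entity.
clusters = [
    ["Barack Obama", "his", "He"],
    ["Hillary Rodham Clinton", "secretary of state", "her", "she", "First Lady"],
]

# Map each mention back to the index of its entity cluster.
entity_of = {mention: i for i, cluster in enumerate(clusters) for mention in cluster}
print(entity_of["He"])   # 0 (Obama)
print(entity_of["she"])  # 1 (Clinton)
```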
OntoNotes 5.0 is the largest coreference dataset available - it has around 3,000 human-annotated documents. You can download the dataset from LDC here. CoNLL has annotated the dataset, and you can find the annotated dataset and scripts to merge them here.
Metrics & SotA
- For each mention, compute a precision and a recall.
- Average the individual precisions and recalls.
The F1 score over these averaged values is usually reported.
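As a concrete (and simplified) illustration, here is link-level precision, recall and F1 over predicted vs. gold coreference links. This is a sketch for intuition only - the official evaluation uses the MUC/B-cubed/CEAF family of metrics:

```python
def prf1(predicted_links, gold_links):
    """Link-level precision, recall and F1 (simplified; real coref
    evaluation uses the MUC / B-cubed / CEAF metrics)."""
    predicted, gold = set(predicted_links), set(gold_links)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# ("He", "Obama") means the system linked mention "He" to "Obama".
pred = [("He", "Obama"), ("her", "Clinton"), ("she", "Obama")]
gold = [("He", "Obama"), ("her", "Clinton"), ("she", "Clinton")]
p, r, f1 = prf1(pred, gold)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.67 0.67 0.67
```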
|Model|F1|
|---|---|
|Wiseman et al. (2015)|63.3|
|Clark & Manning (2016)|65.4|
|Lee et al. (2017)|67.2|
 - 2018 Stanford NLP Lecture - Slides
 - HuggingFace Demo and Github repository - Github
 - Learning Anaphoricity and Antecedent Ranking Features for Coreference Resolution - Paper
 - Improving Coreference Resolution by Learning Entity-Level Distributed Representations - Paper
If you think this is an easy task, consider the following 2 examples -
She poured water from the pitcher into the cup until it was full/empty.
The trophy would not fit in the suitcase because it was too big/small.
In each of the above 2 examples, the coreference changes based on a single word. Resolving it also requires a lot of common-sense knowledge that is not written down anywhere. Such sentences are called Winograd Schemas, and the Winograd Schema Challenge was recently proposed as an alternative to the Turing Test.
- Document Understanding
- Machine Translation
- Dialog Systems
Anaphora is a kind of reference in which one term in the document (the anaphor) refers to another term (the antecedent).
Barack Obama said he would sign the bill.
We went to see a concert last night. The tickets were really expensive.
Coreference resolution can be broadly divided into 2 steps -
- Detect the mentions
- can be nested
- Cluster the mentions
- multiple ways (see below)
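The two steps above can be sketched as a toy pipeline (function names and heuristics are placeholders of my own, not a real system):

```python
def detect_mentions(tokens):
    """Step 1: toy mention detector - flags pronouns and capitalized
    tokens as mentions (real systems use NER and a constituency parser,
    and mentions can be nested spans)."""
    pronouns = {"he", "she", "his", "her", "it", "they", "them"}
    return [t for t in tokens if t.lower() in pronouns or t[0].isupper()]

def cluster_mentions(mentions):
    """Step 2: toy clustering - groups exact-string matches
    (real systems use one of the models described below)."""
    clusters = {}
    for m in mentions:
        clusters.setdefault(m.lower(), []).append(m)
    return list(clusters.values())

tokens = "Obama said he would sign the bill because he supported it".split()
print(cluster_mentions(detect_mentions(tokens)))
# [['Obama'], ['he', 'he'], ['it']]
```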
Mention is a span of text referring to some entity. There are 3 kinds of mentions, each typically detected with a different tool -
- Pronouns - detected with a Part-of-Speech Tagger
- Named Entities - detected with a Named Entity Recognizer
- Noun Phrases - detected with a Constituency Parser
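Given Penn-Treebank-style POS tags, pronoun and proper-noun mentions can be read off almost directly; full noun phrases would need a parser. A toy sketch (the tags are hand-supplied here rather than produced by a real tagger):

```python
def extract_simple_mentions(tagged):
    """Pronouns come from POS tags (PRP / PRP$); contiguous NNP runs
    approximate named entities; full noun phrases would need a
    constituency parser, which this sketch omits."""
    mentions, i = [], 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag in ("PRP", "PRP$"):
            mentions.append(word)
            i += 1
        elif tag == "NNP":  # collect a run of proper nouns
            j = i
            while j < len(tagged) and tagged[j][1] == "NNP":
                j += 1
            mentions.append(" ".join(w for w, _ in tagged[i:j]))
            i = j
        else:
            i += 1
    return mentions

tagged = [("Barack", "NNP"), ("Obama", "NNP"), ("nominated", "VBD"),
          ("Hillary", "NNP"), ("Clinton", "NNP"), ("as", "IN"),
          ("his", "PRP$"), ("secretary", "NN")]
print(extract_simple_mentions(tagged))  # ['Barack Obama', 'Hillary Clinton', 'his']
```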
Mention Clustering Models
There are 3 kinds of coref models -
- Mention Pair
- Mention Ranking
- Mention Clustering
In the Mention Pair model, we train a binary classifier that predicts if a pair of mentions are coreferent.
At train time, we minimize a standard cross-entropy loss -
- Assume we have N mentions in the document
- y_ij = 1 if mentions m_i and m_j are coreferent, -1 otherwise
- i iterates through the mentions
- j iterates through the candidate antecedents (mentions appearing before m_i)
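In symbols, the loss described by the bullets above can be written as follows (reconstructed here to match the formulation in the Stanford lecture slides referenced above, with p(m_j, m_i) the predicted probability that the pair is coreferent):

```latex
J = -\sum_{i=2}^{N} \sum_{j=1}^{i-1} y_{ij} \, \log p(m_j, m_i)
```

With y_ij = 1 the term rewards a high p; with y_ij = -1 the sign flips and a high p is penalized.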
At test time, we pick some threshold (say 0.5), add coreference links between all mention pairs scored above it, and take the transitive closure of those links to form clusters.
- Clusters could easily ball up and all form 1 big cluster - a single wrong link merges two clusters.
- Most mentions have only 1 antecedent, but we predict links to all of them.
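The "one big cluster" failure mode falls directly out of taking the transitive closure of positive links. A union-find sketch of that test-time clustering (mention names and scores are illustrative):

```python
def cluster_links(mentions, positive_pairs):
    """Test-time clustering for the mention-pair model: every pair the
    classifier scores above threshold gets a link, and clusters are the
    transitive closure of those links (computed with union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in positive_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())

mentions = ["Obama", "he", "Clinton", "she"]
# One spurious positive link ("he", "she") merges both entities
# into a single big cluster - the weakness noted above.
print(cluster_links(mentions, [("Obama", "he"), ("Clinton", "she"), ("he", "she")]))
# [['Obama', 'he', 'Clinton', 'she']]
```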
In the Mention Ranking model, we assign each mention to its highest-scoring candidate antecedent. We also have a dummy NA antecedent that allows the model to decline linking the current mention to anything (e.g., for the first mention of an entity).
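A sketch of the ranking decision, with the dummy NA antecedent given a fixed score of 0 (the candidate names and scores below are hand-picked for illustration, not from a trained model):

```python
def best_antecedent(scores, na_score=0.0):
    """Mention-ranking decision: pick the highest-scoring candidate
    antecedent, or NA (decline to link) if nothing beats the dummy score."""
    best = max(scores, key=scores.get) if scores else None
    if best is None or scores[best] <= na_score:
        return None  # NA: this mention starts a new entity
    return best

# Candidate antecedents for the mention "she", with made-up scores.
print(best_antecedent({"Obama": -1.2, "Clinton": 2.3, "trophy": -0.5}))  # Clinton
print(best_antecedent({"the bill": -0.8}))  # None - no antecedent, new entity
```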
- Non-Neural Statistical Classifier
- Feed-Forward Neural Network
  - input features: word embeddings, distance, document genre, speaker information
- LSTMs and Attention
HuggingFace Coref Implementation
The HuggingFace Coref Implementation is very similar to this paper by Clark and Manning.