Entity Linking with a Knowledge Base: Issues, Techniques and Solutions
- Many KBs - Wikipedia, DBpedia, YAGO, Freebase, KnowItAll, ReadTheWeb and Probase.
- 2 main challenges - name variations (partial names, aliases, abbreviations, and alternate spellings) and entity ambiguity.
- Some entity mentions in text may have no corresponding entity record in the given knowledge base. Such mentions are called unlinkable mentions and are given the special label NIL.
- Usually NER before entity linking.
- Similar to coreference resolution (without an external KB) and Word Sense Disambiguation (WSD).
- 3 sub-problems in Entity Linking - Candidate Entity Generation, Candidate Entity Ranking and Unlinkable Mention Prediction.
Candidate Entity Generation
Name Dictionary Based Techniques - ⟨key, value⟩ mapping where the key is a name (surface form) and the value is the set of entities that name may refer to. The dictionary is built from Wikipedia entity pages, redirect pages, disambiguation pages, bold phrases in first paragraphs and hyperlinks. Candidates for a given mention are then found by exact or partial match against the keys.
Surface Form Expansion - This is mostly for acronym expansion and includes heuristic based methods and supervised learning methods (SVM which emits a score given an (acronym, candidate) pair).
Search Engine Based Techniques - Google APIs and Wikipedia search engines.
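A minimal sketch of the name dictionary technique described above, assuming the (name, entity) pairs have already been extracted from Wikipedia. The function names, the lowercase normalization, the token-subset rule for partial matching and the sample pairs are all illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def build_name_dictionary(pairs):
    """Map each normalized name to the set of entities it may denote."""
    dictionary = defaultdict(set)
    for name, entity in pairs:
        dictionary[name.lower()].add(entity)
    return dictionary


def generate_candidates(mention, dictionary):
    """Return candidate entities: exact match first, then partial match.

    Partial match here (an assumption) means every token of the mention
    also appears in the dictionary name.
    """
    key = mention.lower()
    if key in dictionary:
        return set(dictionary[key])
    mention_tokens = set(key.split())
    if not mention_tokens:
        return set()
    candidates = set()
    for name, entities in dictionary.items():
        if mention_tokens <= set(name.split()):
            candidates |= entities
    return candidates


# Hypothetical pairs, as might come from page titles, redirects and anchors.
PAIRS = [
    ("Michael Jordan", "Michael_Jordan_(basketball)"),
    ("Michael Jordan", "Michael_I._Jordan"),
    ("MJ", "Michael_Jordan_(basketball)"),
    ("Michael I. Jordan", "Michael_I._Jordan"),
]
name_dictionary = build_name_dictionary(PAIRS)
```

Calling generate_candidates("Jordan", name_dictionary) falls through to the partial match and returns both entities, which is the ambiguity the ranking step then has to resolve.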
Candidate Entity Ranking
Features for this can be broadly classified into Context Independent (doesn’t depend on where the mention occurs - examples include mention text, entity popularity and entity type) and Context Dependent (bag of words and concept vector).
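To make the two feature families concrete, here is a rough sketch with one feature of each kind: entity popularity (context independent) and bag-of-words cosine similarity between the mention context and an entity description (context dependent). The link counts, descriptions and function names are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased, whitespace-tokenized term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def popularity(entity, link_counts):
    """Context independent: share of incoming links among the candidates."""
    total = sum(link_counts.values())
    return link_counts.get(entity, 0) / total if total else 0.0


def extract_features(context, entity, descriptions, link_counts):
    """Build a small feature dict for one (mention context, entity) pair."""
    return {
        "popularity": popularity(entity, link_counts),
        "context_similarity": cosine_similarity(
            bag_of_words(context), bag_of_words(descriptions[entity])
        ),
    }


# Hypothetical data for two candidates of the mention "Jordan".
LINK_COUNTS = {"Michael_Jordan_(basketball)": 90, "Michael_I._Jordan": 10}
DESCRIPTIONS = {
    "Michael_Jordan_(basketball)": "american basketball player chicago bulls",
    "Michael_I._Jordan": "american professor machine learning statistics berkeley",
}
CONTEXT = "jordan published a paper on machine learning"
```

In this toy example popularity favours the basketball player while context similarity favours the professor, which is why rankers usually combine both kinds of features.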
Supervised ranking methods include binary classification and learning to rank (LTR); unsupervised ranking methods include IR-based ranking.
Unlinkable Mention Prediction - Binary classification problem: given a (mention, top-ranked entity) pair, train a binary classifier to predict whether the top-ranked entity is the correct link for the mention or the mention should be labeled NIL. An alternative approach is to add ‘NIL’ as a candidate in the entity ranking problem and take a softmax over the (N+1) candidates.
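The second approach (NIL as an extra candidate) can be sketched as follows. This assumes each candidate already has a real-valued ranking score; the fixed nil_score parameter is a hypothetical stand-in for what would normally be a learned score.

```python
import math

NIL = "NIL"


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def link_with_nil(candidate_scores, nil_score=0.0):
    """Pick the most probable label among the N candidates plus NIL.

    candidate_scores: dict mapping entity -> ranking score.
    nil_score: score assigned to NIL (assumed fixed here; in a trained
    model it would be learned together with the ranker).
    """
    labels = list(candidate_scores) + [NIL]
    scores = list(candidate_scores.values()) + [nil_score]
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))
```

With this setup a mention is labeled NIL whenever no candidate scores above nil_score, including the case where candidate generation returns nothing.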