Weeks 6 and 7


(18th July — 1 Aug)

Hi there!

Last week we got an F-score of 0.86 with our Lookup test3, so this week we started analyzing our approach to find where we are losing the rest of the score.

We are mostly losing our score at two points:

1. When we compare our candidate entities with the complete question during disambiguation, the other words in the question affect the disambiguation and we pick incorrect entities. But this is something of a necessary evil: most of the time the token we use to get candidate entities is incomplete, so we need to compare against the complete question in case part of the entity is present in the question but not in the token. So unless we have a perfect spotting algorithm that extracts the complete token, we have to compare candidates with the whole question (a rough sketch of this comparison follows the examples below).

E.g.:

Question: "List some musicians associated with famous guitar players"
Benchmarked entity: ['http://dbpedia.org/resource/Guitar']
Entity we selected: ['http://dbpedia.org/resource/Guitar_Player']

Question: "What university campuses are situated in Indiana"
Benchmarked entity: ['http://dbpedia.org/resource/Indiana']
Entity we selected: ['http://dbpedia.org/resource/Indiana_University']

Question: "How many total religions are followed by people whose hometown is somewhere in India."
Benchmarked entity: ['http://dbpedia.org/resource/India']
Entity we selected: ['http://dbpedia.org/resource/Indian_people']
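
To make point 1 concrete, here is a minimal sketch of that kind of ranking, using Python's standard difflib as a stand-in for the similarity measure. The candidate list, the label heuristic, and the metric here are only illustrative, not our exact implementation:

```python
from difflib import SequenceMatcher

def rank_candidates(question, candidates):
    """Rank candidate DBpedia resources by string similarity between
    a rough label (last URI segment) and the *whole* question."""
    def score(uri):
        label = uri.rsplit('/', 1)[-1].replace('_', ' ').lower()
        return SequenceMatcher(None, label, question.lower()).ratio()
    return sorted(candidates, key=score, reverse=True)

question = "List some musicians associated with famous guitar players"
candidates = [
    "http://dbpedia.org/resource/Guitar",
    "http://dbpedia.org/resource/Guitar_Player",
]
# 'Guitar_Player' overlaps with more of the question text than 'Guitar',
# so it is ranked first -- exactly the failure mode described above.
print(rank_candidates(question, candidates)[0])
```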

2. When the entities are not mentioned completely in the question and their disambiguations are not present either. We can try to solve this problem using relations (a rough sketch of the idea follows the examples below).

E.g.:

Question: "What are the kind of games one can play on windows"
Benchmarked entity: ['http://dbpedia.org/resource/Microsoft_Windows']
Entity we selected: ['http://dbpedia.org/resource/Games_for_Windows']

Question: "How many people are known for Dragons' Den"
Benchmarked entity: ["http://dbpedia.org/resource/Dragons'_Den_(UK_TV_series)"]
Entity we selected: ["http://dbpedia.org/resource/Dragons'_Den"]

After this analysis, one of my mentors suggested exploring BERT embeddings to solve the second problem: we can focus the word embedding on a particular token of the question while the embedding still carries some context from the rest of the question. So we tried exploring embeddings like BERT and GloVe, but the results were not even good enough to justify running the complete benchmark, which would take a large amount of time for 5000 samples. After exploring the embeddings we realized they won't help.
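
For reference, the rough shape of what we experimented with is sketched below, using the Hugging Face transformers library. The model name, the sub-word averaging, and the cosine-similarity comparison are illustrative assumptions rather than the exact code we ran:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_embedding(text, token):
    """Contextual embedding of `token` inside `text`: the mean of the
    BERT hidden states for the token's sub-word pieces."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    piece_ids = tokenizer(token, add_special_tokens=False)["input_ids"]
    ids = inputs["input_ids"][0].tolist()
    # locate the token's sub-word span inside the full input
    for i in range(len(ids) - len(piece_ids) + 1):
        if ids[i:i + len(piece_ids)] == piece_ids:
            return hidden[i:i + len(piece_ids)].mean(dim=0)
    return hidden.mean(dim=0)  # fallback: whole-text embedding

question = "What are the kind of games one can play on windows"
q_emb = token_embedding(question, "windows")
label_emb = token_embedding("Microsoft Windows", "Microsoft Windows")
print(float(torch.cosine_similarity(q_emb, label_emb, dim=0)))
```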

This was all for weeks 6 and 7. You can find the LC-QuAD benchmarked dataset, code, and observations here — benchmarks/LC-QuAD.

Now for next week-

Next week we will be focusing on spotting algorithms, such as using Stanford parsing, Stanford NER, spaCy NER, and noun chunks to extract the proper token. This may improve our scores, but most importantly it will make our entity linking algorithm complete.
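
As a preview, the spotting step could look roughly like the sketch below, which combines spaCy's NER spans and noun chunks (the Stanford tools would be plugged in similarly). The model name and the exact combination are assumptions at this point:

```python
import spacy

# assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def spot_tokens(question):
    """Collect candidate surface forms from NER spans and noun chunks."""
    doc = nlp(question)
    spans = {ent.text for ent in doc.ents}              # named entities
    spans |= {chunk.text for chunk in doc.noun_chunks}  # noun chunks
    return sorted(spans)

print(spot_tokens("What university campuses are situated in Indiana"))
# Ideally "Indiana" comes out as its own complete span (alongside the noun
# phrases), so each piece can be sent to Lookup as a separate token.
```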

See you next week!!

STAY TUNED!🙌.
