Community Bonding — (17th May to 6th June)
Hii again!!😉
Welcome to the community bonding blog. This period was busy but great. On 28th May, I had the first meeting with my mentors Edgard Marx, Nausheen Fatma, Thiago Castro Ferreira, and Diego Moussallem. The meeting was amazing: everyone introduced themselves, and the overall vibe was energetic. We discussed how we are going to proceed with the project, and my first task was to explore Google Dialogflow and convert the DBQNA dataset into the Dialogflow format.
Also, on 3rd June, we had a great DBpedia community meeting in which all the GSoC students and mentors of DBpedia joined and described their projects.
First Task
So I started exploring DBpedia Neural Question Answering (DBQNA), which is the largest DBpedia-targeting dataset we have found so far. It consists of English–SPARQL pairs and contains 894,499 instances in total. Its vocabulary has about 131,000 English words and 244,900 SPARQL tokens, without any vocabulary reduction. A large number of generic templates were extracted from concrete examples of two existing datasets, LC-QuAD and QALD-7-Train, by replacing the entities with placeholders.
Therefore, I came up with an approach for using DBQNA in Dialogflow: break the templates down into CSV format, strictly with two columns, one for the question and the other for the answer. I wrote a breakdown.py script for this. It takes each question template and its respective query template from the DBQNA template CSV files and writes them into separate Doc.csv files, one per template file.
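Here's a minimal sketch of what breakdown.py does (the column order and the file-naming convention are my assumptions, not the exact script):

```python
import csv
import os

def breakdown(template_csv, out_dir="docs"):
    # Split one DBQNA template CSV into a two-column Doc CSV:
    # question template in the first column, SPARQL query template
    # in the second. The column order here is an assumption.
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(template_csv))[0]
    out_path = os.path.join(out_dir, base + "_Doc.csv")
    with open(template_csv, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            question, query = row[0], row[1]
            writer.writerow([question, query])
    return out_path
```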
After breaking them down, I uploaded the converted CSV files to the Dialogflow knowledge base (the maximum number of pairs per document is 2,000), giving us a knowledge base containing FAQ-style pairs like this
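Because each knowledge-base document accepts at most 2,000 pairs, larger template files have to be split. A simple chunking helper (the name is mine, not from the actual code) could look like:

```python
def chunk_pairs(rows, max_pairs=2000):
    # Dialogflow caps a knowledge-base document at 2000 FAQ pairs,
    # so yield the rows in slices of at most max_pairs each.
    for i in range(0, len(rows), max_pairs):
        yield rows[i:i + max_pairs]
```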
After this, when we ask Dialogflow the question "Give me the total number of architects of the buildings whose one of the architects was Stanford white?", it triggers the knowledge-base intent and responds with the query "SELECT DISTINCT COUNT(?uri) where { ?x dbp:architect "tok A" . ?x dbp:architect ?uri }".
Now, I needed to replace "tok A" with "Stanford_White" in the query and fetch the answer. For that, I created a Flask server connected to Dialogflow via a webhook, with the Dialogflow response as the request body. On the server, the entity is extracted from the question, which here is "Stanford_White".
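The extraction step can be sketched as a template match, where the placeholder in the question template becomes a capture group (the <A> placeholder notation follows the templates; the function itself is illustrative):

```python
import re

def extract_entity(question, template):
    # Turn the question template into a regex, with the <A>
    # placeholder becoming a capture group, then match the user's
    # question against it. Returns None if the question doesn't fit.
    pattern = re.escape(template).replace(re.escape("<A>"), "(.+)")
    match = re.fullmatch(pattern, question, flags=re.IGNORECASE)
    return match.group(1).strip() if match else None
```

For example, matching the Stanford White question against its template yields the entity text "Stanford white".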
After extraction, I replaced the placeholder "tok A" with the entity and its prefix, dbr:Stanford_White, to form the final query "SELECT DISTINCT COUNT(?uri) where { ?x dbp:architect dbr:Stanford_White . ?x dbp:architect ?uri }".
The server then gets the response from the DBpedia endpoint, which is returned to Dialogflow in the form "Output Query: SELECT DISTINCT COUNT(?uri) where { ?x dbp:architect dbr:Stanford_White . ?x dbp:architect ?uri }\n Ans: 33".
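Putting these two steps together, here is a sketch of the placeholder substitution and the DBpedia endpoint call (the function names, the capitalisation rule, and the HTTP details are my assumptions; the "tok A" placeholder and dbr: prefixing come from the queries above):

```python
import json
import urllib.parse
import urllib.request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

def fill_query(query_template, entity):
    # Build a dbr: resource from the extracted entity
    # ("Stanford white" -> dbr:Stanford_White) and substitute it
    # for the "tok A" placeholder in the query template.
    resource = "dbr:" + "_".join(w.capitalize() for w in entity.split())
    return query_template.replace('"tok A"', resource)

def fetch_answer(query):
    # Send the final SPARQL query to the public DBpedia endpoint
    # and return the first result value, or None if there is none.
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    with urllib.request.urlopen(DBPEDIA_ENDPOINT + "?" + params,
                                timeout=30) as resp:
        data = json.load(resp)
    bindings = data["results"]["bindings"]
    if not bindings:
        return None
    return next(iter(bindings[0].values()))["value"]
```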
On the server, my initial idea was simply to compare the question "Give me the total number of architects of the buildings whose one of the architects was Stanford white?" against the template "Give me the total number of architects of the buildings whose one of the architects was <A>?" to extract the entity. However, this is not a proper way to obtain the entity, because a person may ask the same question with different words or a different phrasing.
Also, during the proposal drafting period, I had updated LiberAI/NSpM to TF 2.0, but a few of the training parameters were hardcoded and the code was not modular, so I parametrized those variables, modularized the code, and created a pull request.
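Parametrizing hardcoded training settings usually comes down to exposing them as CLI flags, roughly like this (the flag names and defaults here are illustrative, not NSpM's actual ones):

```python
import argparse

def parse_args(argv=None):
    # Expose previously hardcoded training variables as command-line
    # flags; the specific flags and defaults are placeholders.
    parser = argparse.ArgumentParser(description="Training configuration")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--learning-rate", type=float, default=1e-3)
    parser.add_argument("--data-dir", default="data/")
    return parser.parse_args(argv)
```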
So what’s next?
For the next week, I'll need to find a proper way to do entity linking so that it handles such cases.
This was all for community bonding. You can find the GitHub repository here: DBpedia-LiveNeural-Chatbot.
Next week is going to be even more exciting and full of learning.
Till then, STAY TUNED!🙌.
Thank You!!✨✨