DBpedia at the Google Summer of Code 2013


GSoC 2013

We are proud to announce that the DBpedia project has been accepted for the Google Summer of Code 2013!

Warning! Student applications close on May 3rd at 19:00 UTC! You can find some guidelines for a good application with DBpedia here. What are you waiting for?

The following ideas would greatly benefit the Italian DBpedia. Marco F. will be an official mentor. Contact him directly via email or Twitter.

Type Inference to extend ontology coverage

Currently, there are many entities without a type in DBpedia. Extending the coverage of the DBpedia ontology is a critical step for high-quality data.
Some recently published papers leverage various features (categories, infoboxes, abstracts, other DBpedia properties, etc.) to automatically infer entity types.
Other ideas include the exploitation of coordinates (geolocation) to infer that the entity is a place. Since Freebase has more types than DBpedia, creating an import tool would be extremely useful.
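
As a rough illustration of the coordinate-based idea, here is a minimal sketch in Python. The in-memory entity representation, the sample data and the function name `infer_type` are assumptions made for this example only and are not part of the DBpedia extraction framework; `geo:lat` and `geo:long` stand for the W3C WGS84 coordinate properties, and `dbo:Place` for the ontology class.

```python
from typing import Optional

# Hypothetical in-memory representation: entity URI -> {property: value}.
# The sample values are illustrative only.
ENTITIES = {
    "dbr:Trento": {"geo:lat": 46.07, "geo:long": 11.12},
    "dbr:Some_Untyped_Thing": {"rdfs:label": "Some untyped thing"},
}


def infer_type(properties: dict) -> Optional[str]:
    """Return "dbo:Place" when both coordinates are present, else None."""
    if "geo:lat" in properties and "geo:long" in properties:
        return "dbo:Place"
    return None


# Map every entity to its inferred type (None when nothing can be inferred).
inferred = {uri: infer_type(props) for uri, props in ENTITIES.items()}
```

A real implementation would probably combine several such signals (categories, infoboxes, abstracts) instead of relying on a single rule.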

Check the consistency of the ontology

Given the Wiki nature of the DBpedia ontology, anyone can edit it and add, delete or modify its classes and properties. This has led to conceptual chaos, in complete contrast with the idea behind an ontology, i.e., to provide a layer of consistency over the heterogeneous data coming from different Wikipedia chapters.
The recently proposed automatic approaches can be a good starting point to clean the current state of the ontology, starting from the class hierarchy.
They can also be used as a tool to avoid redundancy, e.g., to alert a human contributor when he or she is trying to add some new class or property which is already there under a similar name.
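
The redundancy alert could, for instance, start from plain string similarity. The sketch below uses Python's standard `difflib` module; the list of existing names, the 0.8 cutoff and the function name `similar_existing_names` are assumptions chosen for illustration, not the approach proposed in the papers mentioned above.

```python
import difflib

# Hypothetical sample of names already present in the ontology.
EXISTING_NAMES = ["SoccerPlayer", "FootballClub", "birthPlace", "birthDate"]


def similar_existing_names(candidate, existing, cutoff=0.8):
    """Return existing names that look like the candidate, best match first.

    The comparison is case-insensitive; cutoff is the minimum similarity
    ratio (between 0 and 1) required to report a match.
    """
    # Lowercased name -> original spelling (colliding spellings collapse).
    lowered = {name.lower(): name for name in existing}
    matches = difflib.get_close_matches(
        candidate.lower(), list(lowered), n=5, cutoff=cutoff
    )
    return [lowered[m] for m in matches]


# A contributor tries to add "birthplace": warn that "birthPlace" exists.
warnings = similar_existing_names("birthplace", EXISTING_NAMES)
```

String similarity only catches near-identical spellings; synonyms such as "hometown" would need labels or semantic resources.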

Extend the Wiki mapping syntax

Writing a mapping with good coverage, i.e., one that does not cause problems on a particular subset of Wikipedia articles, is still a complex task.
It is therefore necessary to extend the Wiki mapping framework by solving the following issues, already reported in the DBpedia issue tracker. The most important for the Italian DBpedia is highlighted in bold:

  • Properly handle the presence of multiple templates on a page (issue #17)
  • Mint entities with the addition of prefixes / suffixes (issue #20)
  • Extend conditional mappings to more complex conditions (issue #19)
  • Generate inverse properties (issue #32)
  • Map categories to classes (issue #21)
  • Add a new extractor that combines infobox mapping and parsing, so that triples in the property namespace are produced only for properties that are not mapped (issue #22)
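
To give an idea of what generating inverse properties could mean in practice, here is a minimal Python sketch. It is not the mapping framework's code: the triple representation, the `INVERSE_OF` table, the property pair and the function name `with_inverses` are all assumptions made for this example.

```python
# Hypothetical table: property -> its inverse.
INVERSE_OF = {"dbo:parent": "dbo:child"}


def with_inverses(triples, inverse_of):
    """Return the input triples plus one inverse triple where applicable.

    Inverses are only generated when the object is a resource (here,
    anything starting with "dbr:"), since a literal cannot be a subject.
    """
    result = list(triples)
    for subject, prop, obj in triples:
        inverse = inverse_of.get(prop)
        if inverse is not None and obj.startswith("dbr:"):
            result.append((obj, inverse, subject))
    return result


triples = [
    ("dbr:Alice", "dbo:parent", "dbr:Bob"),
    ("dbr:Alice", "rdfs:label", "Alice"),
]
extended = with_inverses(triples, INVERSE_OF)
```

In this sketch the original triples are kept and the inverse ones are appended after them, so existing data is never altered.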

Other ideas can be found on the official page.

You don’t know what the Google Summer of Code is?

Simple: it’s an event dedicated to open source, in which Google acts as an intermediary between a plethora of organizations and many more students willing to contribute.
The selected students receive glory and remuneration from Google, while organizations get free improvements to their software.
