This is the abstract of my talk at PyData London 2015, “Jointly embedding knowledge from large graph databases with textual data using deep learning”.
I will present recent advances in combining structured graph data with textual data, using word embeddings learned from a large corpus of unlabelled data. This makes it possible to expand the knowledge base graph and extract complex semantic relationships. Knowledge graph completion is a recent paradigm that allows the extraction of new relations (facts) from existing knowledge graphs like Freebase or GeneOntology.
Knowledge graph embeddings represent each entity as a point in a low-dimensional space and each relationship as a transformation in that space, which has the advantage of making the search space continuous. This makes it possible to encode the entities and transformations with global information from the entire graph. Word embedding approaches such as word2vec, on the other hand, are learned from unlabelled text and represent words as vectors, but they do not by themselves extract relations. By carefully aligning entities from free text with a knowledge graph, it is possible to combine both approaches and jointly extract new knowledge through relationships between entities and words or phrases. We will show results from applying this technology to biomedical data.
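
To make the idea of "relationships as transformations" concrete, here is a minimal sketch in which a relation is a simple translation vector, in the style of translation-based models such as TransE. The abstract does not say which embedding model the talk uses, so this choice is an assumption. The entity names, the `treats` relation, and the hand-picked 2-D vectors are all hypothetical toy values, not learned embeddings or results from the talk.

```python
import numpy as np

# Hypothetical toy embeddings, hand-picked for illustration only.
# In practice these vectors would be learned from the knowledge graph.
entity_vecs = {
    "aspirin": np.array([0.0, 1.0]),
    "headache": np.array([1.0, 1.0]),
    "insulin": np.array([0.0, -1.0]),
    "diabetes": np.array([1.0, -1.0]),
}
relation_vecs = {"treats": np.array([1.0, 0.0])}


def score(head, relation, tail):
    """Distance between (head + relation) and tail; lower means more plausible."""
    diff = entity_vecs[head] + relation_vecs[relation] - entity_vecs[tail]
    return float(np.linalg.norm(diff))


def rank_tails(head, relation):
    """Rank every other entity as a candidate tail for the query (head, relation, ?)."""
    candidates = [e for e in entity_vecs if e != head]
    return sorted(candidates, key=lambda tail: score(head, relation, tail))


# Graph completion as nearest-neighbour search in the continuous space.
print(rank_tails("aspirin", "treats"))
```

Because entities and relations live in the same continuous space, proposing a new fact reduces to ranking candidates by distance. Word vectors aligned to the same space could in principle be scored the same way, which is the intuition behind combining the two approaches.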