Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save shagunsodhani/e8e6213906ec2642f27b1aca3a6201c6 to your computer and use it in GitHub Desktop.
Save shagunsodhani/e8e6213906ec2642f27b1aca3a6201c6 to your computer and use it in GitHub Desktop.
Notes for "Traversing Knowledge Graphs in Vector Space" paper

Traversing Knowledge Graphs in Vector Space

Introduction

  • The paper describes a "compositional" training approach for vector space models, corresponding to Knowledge Bases (KBs).
  • The new approach improves the system's ability to answer path queries and impute missing information for the KBs.
  • Link to the paper

Task

  • Given a KB, knowledge graph, G, is defined as the set of triplets (s, r, t) where s, t ∈ Entities and r ∈ Relations.
  • A path query q consists of an initial entity, s, followed by a sequence of relations, p, to be traversed.
  • The answer to the query is the set of all the entities that can be reached from s by traversing p.
  • Knowledge base completion (KBC) is the task of predicting if an edge (s, r, t) belongs in the graph.

Compositionalization

  • Given a triplet (s, r, t), define score(s/r, t) as the liklihood of s being connected to t via r.
  • In general, score(s/r, t) = M(Tr(xs), xt) for some membership operator M and some traversal operator T.
  • Given a dataset of form (q, t) where q is the path query and t is the answer to the path query,
    • Minimize the max-margin objective 1 - margin(q, t, t')
    • margin(q, t, t') = score(q, t) - score(q, t')
  • This objective function is better that the existing objectives which only train on queries of length 1 (single-edge training).

Candidates Models

  • TransE
    • score(s/r, t) = -|| x<sub>s</sub> + w><sub>r</sub> - x<sub>t</sub>||<sub>2</sub><sup>2</sup>
  • Bilinear-Diag
    • Similar to TransE, but with multiplicative interactions between entity and relation vectors.

Datasets

  • Single-Edge Query datasets:
    • Freebase
    • WordNet
  • Path Query Dataset
    • Given a base knowledge graph, generate path queries of different lengths by performing random walks on the graph.

Results

  • Evalution Metric

    • Mean Quantile - For a query q, the quantile of a correct answer t is the fraction of incorrect answers ranked after t.
    • hit at 10 - Percentage of correct answers ranked among top 10 results.
  • Compositional training improves path querying performance across all models and metrics on both the datasets.

  • TransE(COMP) is the best model in terms of mean quantile.

  • Performance improves for both induction and deduction based queries.

Analysis

  • Why does compositional training improve path query answering?

    • Cascading nature of errors along the path - For a given edge (s, r, t) on the path, the single-edge training encourages xt to be closer to xs, only to the extent that margin is 1 and does not push them any closer. The remaining discrepancy gets added as noise at each step of the traversal.
  • Why does compositional training improve knowledge base completion?

    • Paths in a knowledge graph are an important feature for predicting the existence of single edges and training on paths should provide some form of structural regularisation which should reduce cascading errors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment