Knowledge Graphs

We demonstrate how to use CSV data to build a knowledge graph and then learn node embeddings that capture the similarity between individuals. The process involves two major stages. First, we construct an undirected knowledge graph from our CSV files, where nodes represent people and their attributes, and edges capture relationships weighted by their importance. Then we learn vector embeddings for these nodes and reposition the person nodes in 2D space so that similar individuals sit physically closer together, reflecting their mathematically derived similarity.

CSV Data

Here is the sample dataset that we will use for the graphs that follow. data.csv contains each person's name, education, employer, location, hobbies, and drink preference, while weights.csv defines the relative importance of each field.

data.csv

  name     undergrad  grad          employer    location       hobbies          drinks
0 John     MIT        Yale          Google      Colorado       skiing           espresso
1 Janice   nan        nan           Freelancer  California     skiing           coldbrew
2 Alice    Stanford   Northeastern  Facebook    Massachusetts  cycling;reading  tea
3 Bob      Harvard    nan           Amazon      Massachusetts  running;cooking  rum
4 Charlie  MIT        Harvard       Google      Washington     skiing           espresso
5 Joe      BC         nan           Amazon      Massachusetts  running;cooking  rum
6 Chuck    BU         Harvard       Google      Washington     skiing           espresso
7 Aime     BU         nan           Kohls       Florida        TV               water

weights.csv

  field      weight
0 undergrad  5.000000
1 grad       5.000000
2 employer   5.000000
3 location   10.000000
4 hobbies    3.000000
5 drinks     0.500000
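
To make the setup concrete, here is a minimal sketch of loading the two files. It assumes pandas as the tooling; the file names match those shown above, and the weights become a simple lookup table keyed by field name.

import pandas as pd

# Load the people and the per-field importance weights.
data = pd.read_csv("data.csv")
weights = pd.read_csv("weights.csv")

# Field importance as a lookup table, e.g. {"location": 10.0, "drinks": 0.5, ...}
field_weights = dict(zip(weights["field"], weights["weight"]))

print(data.head())
print(field_weights)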

Building the Undirected Knowledge Graph

Using the information from data.csv and weights.csv, we create an undirected graph. In this graph, each person and each trait (e.g., school, employer, hobby) is represented as a node. Edges connect people to their corresponding traits, with edge weights reflecting the importance of that relationship. The physical distance between nodes is arbitrary at this point; how closely two people are related can only be judged by inspecting the trait nodes they share and the weights of the connecting edges.
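
Continuing from the loading sketch above, the construction might look like the following, using networkx as an assumed graph library. Each distinct trait value (e.g., location:Massachusetts) becomes its own node, and each person-to-trait edge carries the weight of that field.

import networkx as nx

G = nx.Graph()
for _, row in data.iterrows():
    person = row["name"]
    G.add_node(person, kind="person")
    for field, weight in field_weights.items():
        cell = row[field]
        if pd.isna(cell):
            continue  # missing values (e.g. Janice's schools) add no edge
        # Multi-valued fields use ";" as a separator, e.g. "running;cooking".
        for value in str(cell).split(";"):
            trait = f"{field}:{value}"        # e.g. "location:Massachusetts"
            G.add_node(trait, kind="trait")
            G.add_edge(person, trait, weight=weight)

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")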

Learning Node Embeddings & Repositioning

After constructing the graph, we apply a machine learning process to learn a vector embedding for every node. These embeddings capture similarity between nodes, so individuals with similar attributes have embeddings that lie close together. We then reposition the person nodes using Multi-Dimensional Scaling (MDS), producing a 2D layout where similar people appear closer together. Hovering over any person node in the graph reveals a list of the other individuals sorted by similarity score, where a score of 1 means identical. In this example, where we put a heavy weight on location, Alice, Joe, and Bob, who all live in Massachusetts, end up very close together, while Aime, who shares only one trait (her undergraduate school) with Chuck, sits far from the pack.
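
One way to realize this stage is a node2vec-style embedding followed by scikit-learn's MDS, as in the sketch below; the specific libraries and hyperparameters are assumptions, not necessarily what the demo uses. It continues from the graph G built above: cosine similarity between person embeddings supplies the hover scores, and one minus that similarity is the dissimilarity matrix fed to MDS.

import numpy as np
from node2vec import Node2Vec                 # assumed embedding library
from sklearn.manifold import MDS
from sklearn.metrics.pairwise import cosine_similarity

# Learn an embedding for every node; the weighted edges bias the random
# walks toward high-importance relationships such as location.
n2v = Node2Vec(G, dimensions=64, walk_length=20, num_walks=100,
               weight_key="weight", workers=1)
model = n2v.fit(window=5, min_count=1)

# Keep only the person nodes for the 2D layout.
people = [n for n, attrs in G.nodes(data=True) if attrs["kind"] == "person"]
vectors = np.array([model.wv[p] for p in people])

# Pairwise cosine similarity; a score of 1 means identical embeddings,
# which is what the hover tooltip reports.
similarity = cosine_similarity(vectors)

# MDS positions people in 2D so that cosine distance (1 - similarity)
# is approximated by on-screen Euclidean distance.
dissimilarity = np.clip(1.0 - similarity, 0.0, None)
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissimilarity)

for name, (x, y) in zip(people, coords):
    print(f"{name:8s} ({x:+.2f}, {y:+.2f})")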

Summary

In summary, this visualization demonstrates the complete pipeline: CSV Data → Knowledge Graph → Node Embeddings → Interactive Visualization. The undirected graph shows the raw connections between people and their traits, while the embeddings graph provides a refined view based on learned similarities. Hover over nodes to explore the detailed similarity metrics.