My highlighted notes for the following paper:
Sawicki, J., & Ganzha, M. (2024). Exploring Reddit Community Structure: Bridges, Gateways and Highways. Electronics, 13(10), 1935.
Sawicki and Ganzha analyze Reddit’s information structure using text embeddings derived from the DistilBERT model, to which they apply graph-based and cosine-similarity measures. They explore the concepts of gateways and bridges, finding significant overlap between the two, and introduce a new construct -- the highway -- defined as a path traversed by many of the shortest paths connecting communities. This extends prior notions by identifying a set of nodes along a path rather than a single key node.
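To make the "embeddings plus cosine similarity plus graph" idea concrete for myself, here is a minimal sketch. It assumes each subreddit already has one embedding vector and that two subreddits are linked when their cosine similarity reaches a threshold. The thresholding rule, the `similarity_graph` helper, and the toy vectors are my own assumptions for illustration, not necessarily what the paper does.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_graph(embeddings, threshold):
    """Link two subreddits when their cosine similarity is >= threshold.

    embeddings: dict mapping subreddit name -> list of floats.
    Returns an undirected graph as dict: name -> set of neighbour names.
    (Hypothetical construction rule, used here only for illustration.)
    """
    names = sorted(embeddings)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(embeddings[a], embeddings[b]) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


# Made-up 3-dimensional vectors; real embeddings have many more dimensions.
toy_embeddings = {
    "r/a": [1.0, 0.0, 0.0],
    "r/b": [0.9, 0.1, 0.0],
    "r/c": [0.0, 0.0, 1.0],
}
toy_graph = similarity_graph(toy_embeddings, threshold=0.8)
```

With these toy vectors, r/a and r/b point in nearly the same direction and get linked, while r/c is orthogonal to both and stays isolated.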
Their work was an interesting read. The communities, gateways, and bridges mostly make sense (except for r/formuladank). In Table 1, most subreddits in the first Size=15 community are also SE Asia/India related. I agree that gateways and bridges seem to be mostly similar, but I didn't understand Figures 2 and 3.
Questions and observations:
- In Figure 1, isn't the bridge supposed to be a node? In the figure, it appears to be an edge/highway.
- How was a subreddit summarized into a single embedding vector?
Also, it seems that gateways and bridges are both high-PageRank nodes, while highways are edges/paths with high betweenness centrality.
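To test my reading of highways as high-betweenness edges, here is a minimal sketch of edge betweenness on a toy graph: two triangles (standing in for communities) joined through a middle node. It uses a Brandes-style computation for unweighted, undirected graphs. The toy graph and the `edge_betweenness` helper are my own illustration, not the authors' procedure.

```python
from collections import deque


def edge_betweenness(graph):
    """Unnormalized edge betweenness for an unweighted, undirected graph.

    graph: dict mapping node -> set of neighbour nodes.
    Returns dict mapping frozenset({u, v}) -> number of node pairs whose
    shortest paths use that edge (split fractionally when paths tie).
    """
    score = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for source in graph:
        # Breadth-first search: distances, shortest-path counts, predecessors.
        dist = {source: 0}
        sigma = {node: 0 for node in graph}
        sigma[source] = 1
        preds = {node: [] for node in graph}
        order = []
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
                if dist[nbr] == dist[node] + 1:
                    sigma[nbr] += sigma[node]
                    preds[nbr].append(node)
        # Walk back from the farthest nodes, accumulating path shares.
        delta = {node: 0.0 for node in graph}
        for node in reversed(order):
            for p in preds[node]:
                share = sigma[p] / sigma[node] * (1.0 + delta[node])
                score[frozenset((p, node))] += share
                delta[p] += share
    # Every unordered pair was counted from both endpoints, so halve.
    return {edge: value / 2.0 for edge, value in score.items()}


# Two toy "communities" {a, b, c} and {x, y, z}, joined via c - m - x.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "m"},
    "m": {"c", "x"},
    "x": {"m", "y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}
scores = edge_betweenness(toy)
top_edges = sorted(scores, key=scores.get, reverse=True)[:2]
```

In this toy graph the two edges on the connecting path, c-m and m-x, carry every cross-community shortest path and so score highest, which matches my intuition of a highway as a run of consecutive high-betweenness edges.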