directedstructure

Infer communities, hierarchies, and their connection in directed graphs

Maximilian Jerdee, Elizabeth Bruch, Mark Newman

This Python package uses Bayesian inference to identify communities and hierarchies of nodes in a directed network, and to measure the interaction between those structures.

We model community structure with a stochastic block model, hierarchy with a Bradley-Terry model, and the interaction between the two as described in our paper [link paper].
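For readers unfamiliar with the Bradley-Terry model, the sketch below shows its standard textbook form, in which each node has a real-valued score and the probability that one node ranks above another depends only on the score difference. The function name `bradley_terry_prob` is illustrative and not part of directedstructure, and the package's actual parameterization may differ.

```python
import math


def bradley_terry_prob(score_i: float, score_j: float) -> float:
    """Probability that node i ranks above node j in the standard Bradley-Terry form.

    Equivalent to exp(score_i) / (exp(score_i) + exp(score_j)).
    """
    return 1.0 / (1.0 + math.exp(-(score_i - score_j)))


# Equal scores give an even chance; a larger score gap gives a more decisive outcome.
p_equal = bradley_terry_prob(0.0, 0.0)
p_small_gap = bradley_terry_prob(1.0, 0.0)
p_large_gap = bradley_terry_prob(2.0, 0.0)
print(p_equal, p_small_gap, p_large_gap)
```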

Installation

directedstructure can be built locally by cloning this repository and running

pip install .

in the base directory (requires a C++ compiler).

Typical usage

Once installed, the package can be used to identify the network structures.

Load a network

We recommend using NetworkX to load the network and then using the directedstructure package to infer the node grouping and hierarchy.

import directedstructure as ds
import networkx as nx
import pandas as pd

# Load a network using NetworkX (this can also be read from an edgelist or other format)
G = nx.read_gml("examples/data/friends.gml")
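If the network is stored as an edge list rather than GML, NetworkX can read it directly. The following is a minimal sketch: the node names and the temporary file are made up for illustration, and only standard NetworkX calls are used. Passing `create_using=nx.DiGraph` keeps the edge direction, which a directed analysis needs.

```python
import os
import tempfile

import networkx as nx

# Hypothetical edge list: each line is "source target", one directed edge per line.
edge_lines = ["alice bob", "bob carol", "alice carol"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "edges.txt")
    with open(path, "w") as f:
        f.write("\n".join(edge_lines) + "\n")

    # Without create_using=nx.DiGraph, read_edgelist returns an undirected Graph.
    G_edges = nx.read_edgelist(path, create_using=nx.DiGraph)

print(G_edges.number_of_nodes(), G_edges.number_of_edges())
```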

Infer node properties

# pandas DataFrame of the inferred community identity and hierarchical position of each node in the network
node_properties_df = ds.node_properties(G)

Infer network properties

# pandas DataFrame of network properties inferred by the model (for example, the depth of
# hierarchy within and between communities), as well as their uncertainties
network_properties_df = ds.network_properties(G)

Full samples

To get a more complete picture of the inference, we can work with the full set of Monte Carlo samples drawn from the posterior distribution:

samples_df = ds.samples(G)

With these samples we can ask more detailed questions, such as: what is the posterior distribution of the number of groups?
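As a rough sketch of how such a question could be answered, the example below tabulates how often each group count appears across samples. It uses a small stand-in DataFrame rather than real output, and the column name `num_groups` is a hypothetical placeholder: inspect the columns of the DataFrame returned by `ds.samples(G)` to find the actual name.

```python
import pandas as pd

# Stand-in for the output of ds.samples(G), with one row per Monte Carlo sample.
# The column name "num_groups" is hypothetical and the values are made up.
mock_samples_df = pd.DataFrame({"num_groups": [3, 4, 3, 3, 5, 4, 3, 4]})

# Fraction of samples with each group count, which estimates the posterior
# probability of that number of groups.
group_count_posterior = (
    mock_samples_df["num_groups"].value_counts(normalize=True).sort_index()
)
print(group_count_posterior)
```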

Customization

Because this package focuses on the potential link between community and hierarchy, the model used for either community or hierarchy in isolation can be swapped out.

Neutral interactions can also be included in the model if each interaction is labeled with a type (either type = dominant or type = neutral). If no type information is provided, all interactions are assumed to be dominant.
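One way to record interaction types in NetworkX is as an edge attribute. The sketch below assumes the package reads an edge attribute named `type`, which is a guess based on the description above and should be checked against the package documentation. The node names are made up.

```python
import networkx as nx

G_typed = nx.DiGraph()

# Assumption: interaction types are stored in an edge attribute called "type".
G_typed.add_edge("alice", "bob", type="dominant")
G_typed.add_edge("bob", "carol", type="neutral")
G_typed.add_edge("alice", "carol")  # no type given

# Collect the labeled edges and list the ones left without a type.
edge_types = nx.get_edge_attributes(G_typed, "type")
untyped_edges = [(u, v) for u, v in G_typed.edges if (u, v) not in edge_types]
print(edge_types)
print(untyped_edges)
```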

Further usage examples can be found in the examples directory of the repository and the package documentation.