# ccme
Continuous Configuration Model Extraction (CCME) Community Detection Method
code author and primary article author: John Palowitch (palojj@email.unc.edu)
other article authors and collaborators: Shankar Bhamidi (bhamidi@email.unc.edu), Andrew B. Nobel (nobel@email.unc.edu)
## Description
CCME is a community detection method for networks with edge weights. The method is not intended for bipartite, directed, or multi-layer networks, but extensions of this sort are underway. A particular feature of CCME is its hypothesis testing framework (see http://arxiv.org/abs/1601.05630v2), which allows it to return "background" nodes that are not preferentially connected to any other nodes in the network. At one extreme, CCME may return no communities if no significant community structure exists. CCME can also return overlapping communities.
## Usage
CCME is coded in the statistical language R. Download the latest version of R, source "CCME.R", and install the required packages when prompted. The function call is as follows:
```r
CCME(edge_list, alpha = 0.05, binary = FALSE, n.samples = NULL, OL_thres = 0.90, updateOutput = FALSE, loopOutput = TRUE, generalOutput = TRUE, throwInitial = FALSE)
```
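A minimal usage sketch follows. The toy network (two strongly weighted triangles joined by one weak edge) is hypothetical, and the sketch assumes "CCME.R" sits in the working directory. The call is guarded so the block still runs if the file is absent. Because CCME is significance-based, a graph this small is not guaranteed to yield any communities.

```r
# Hypothetical toy network: triangles {1,2,3} and {4,5,6} joined by a weak edge.
# Rows follow the format in Details: node1 sorted, node1 <= node2.
edge_list <- data.frame(
  node1  = c(1, 1, 2, 3, 4, 4, 5),
  node2  = c(2, 3, 3, 4, 5, 6, 6),
  weight = c(5, 5, 5, 0.5, 5, 5, 5)
)

# Assumes CCME.R is in the working directory; skip quietly otherwise.
if (file.exists("CCME.R")) source("CCME.R")

if (exists("CCME")) {
  result <- CCME(edge_list, alpha = 0.05)
  print(length(result$communities))  # number of communities returned
  print(result$background)           # indices of background nodes
}
```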
## Arguments
edge_list
The input graph. Must be a data.frame with columns named "node1", "node2", and "weight" (see Details).
alpha
The target false discovery rate (FDR) for the hypothesis tests (default = 0.05).
binary
CURRENTLY UNSUPPORTED; keep at the default FALSE. Whether or not to treat the network as unweighted.
n.samples
Number of initial sets to form and test; if NULL (default, recommended), an initial set is formed for each node.
OL_thres
Overlap threshold: the maximum proportion of the nodes of any community allowed to lie in any other community (default = 0.90).
updateOutput
Whether or not to show console output about set updates (default FALSE)
loopOutput
Whether or not to show console output about progression through initial sets (default TRUE)
generalOutput
Whether or not to show console output about non-loop, non-update info (default TRUE)
throwInitial
Whether or not to skip over initial sets from nodes that were in previously refined initial sets (default = FALSE)
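As a generic illustration of what a target FDR such as alpha means, the sketch below applies the Benjamini-Hochberg adjustment from base R's p.adjust to a set of hypothetical p-values. This is not claimed to be CCME's exact testing procedure (see section 4 of the article for that); it only shows how an FDR level selects which tests count as significant.

```r
# Hypothetical p-values, e.g. from testing several candidate node sets.
# (Illustration only; not output from CCME.)
p_values <- c(0.001, 0.008, 0.012, 0.041, 0.20, 0.74)
alpha <- 0.05

# Benjamini-Hochberg adjustment in base R. Declaring significant every test
# whose adjusted p-value is <= alpha controls the FDR at level alpha under
# independence of the tests.
p_adj <- p.adjust(p_values, method = "BH")
significant <- which(p_adj <= alpha)
significant
```

Here the fourth p-value (0.041) falls below alpha on its own but is not declared significant after adjustment, which is the kind of multiplicity correction an FDR target implies.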
## Details
In the data.frame passed as 'edge_list', the columns "node1" and "node2" give the two endpoints of each edge. These columns must satisfy two requirements: 1) the rows must be ordered by "node1" (try edge_list <- edge_list[order(edge_list$node1), ]), and 2) edge_list$node1[i] <= edge_list$node2[i] for every row i of edge_list. This is a parsimonious edge-list format whose corresponding adjacency matrix has no entries below the diagonal. Finally, "weight" is the numeric vector of edge weights, one per edge. Weights must be non-negative, and each node must have a non-zero total weight (the sum of the weights of its edges).
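The requirements above can be enforced with a small preprocessing step. The sketch below uses a hypothetical helper, prepare_edge_list (not part of CCME.R), that swaps endpoints so node1 <= node2, orders the rows, and checks the weight conditions. It assumes numeric node labels, does not merge duplicate edges, and only checks nodes that appear in at least one edge.

```r
# Hypothetical helper (not part of CCME.R): put a weighted edge list into
# the format described in Details.
prepare_edge_list <- function(edges) {
  stopifnot(all(c("node1", "node2", "weight") %in% names(edges)))
  if (any(edges$weight < 0)) stop("weights must be non-negative")

  # Enforce node1 <= node2 in every row (no entries below the diagonal).
  lo <- pmin(edges$node1, edges$node2)
  hi <- pmax(edges$node1, edges$node2)
  out <- data.frame(node1 = lo, node2 = hi, weight = edges$weight)

  # Order rows by node1, breaking ties by node2.
  out <- out[order(out$node1, out$node2), ]
  rownames(out) <- NULL

  # Each node's total weight: sum of weights of edges touching it.
  nodes <- sort(unique(c(out$node1, out$node2)))
  strength <- sapply(nodes, function(v) {
    sum(out$weight[out$node1 == v | out$node2 == v])
  })
  if (any(strength == 0)) stop("some nodes have zero total weight")
  out
}

raw <- data.frame(node1 = c(3, 2, 1), node2 = c(1, 3, 2), weight = c(1, 2, 0.5))
res <- prepare_edge_list(raw)
res
```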
The argument 'binary' is currently unsupported, so keep it at FALSE to avoid nonsensical output. In the future, setting it to TRUE will allow the method to perform hypothesis testing with respect to the standard (binary-network) configuration model.
If n.samples is not NULL, it must be an integer between 1 and the number of nodes. Instead of initializing a set for each node, CCME then chooses n.samples nodes at random from which to initialize sets. Setting n.samples to a non-NULL value can reduce overall run time for extremely large networks, but always try the default first.
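The rule for n.samples can be sketched as follows. The helper choose_seed_nodes is hypothetical and is not CCME's internal code; it assumes nodes are labelled 1 to n_nodes and simply mirrors the described behaviour: all nodes when n.samples is NULL, otherwise a random subset of the requested size drawn without replacement.

```r
# Hypothetical illustration (not CCME's internal code): validate n.samples
# and choose the nodes from which initial sets are formed.
choose_seed_nodes <- function(n_nodes, n.samples = NULL) {
  # Default: one initial set per node.
  if (is.null(n.samples)) return(seq_len(n_nodes))

  # Must be a single integer between 1 and the number of nodes.
  if (length(n.samples) != 1 || n.samples != round(n.samples) ||
      n.samples < 1 || n.samples > n_nodes) {
    stop("n.samples must be an integer between 1 and the number of nodes")
  }

  # Random subset of nodes, drawn without replacement.
  sample(seq_len(n_nodes), n.samples)
}

set.seed(1)
choose_seed_nodes(10)     # every node seeds a set
choose_seed_nodes(10, 3)  # three randomly chosen nodes
```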
It is recommended that throwInitial be kept FALSE; setting it to TRUE can reduce overall run time, though it may decrease detection power.
## Value
CCME returns an object of class "list" with the following named items:
communities
a list of found communities that conform to the overlap rule given by OL_thres
background
an integer vector of indices corresponding to background nodes
initial.sets
a list of initializing sets determined by the scoring/FDR process described in section 4 of the article
final.sets
a list with same length as initial.sets with each item being the community produced by the corresponding entry in initial.sets (some, possibly many, will be empty due to the hypothesis testing)
communities_before_OLfilt
a list of the non-empty communities prior to filtering them according to OL_thres
OLfilt
a list returned by the internal function that performs the overlap filtering
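To see how the OL_thres rule applies to a set of communities (for example, communities_before_OLfilt), the sketch below computes, for each ordered pair of communities (A, B), the proportion of A's nodes that also lie in B, and reports pairs exceeding the threshold. The helper overlap_violations is hypothetical: it only flags violations and does not reproduce which community CCME's internal filter keeps or discards.

```r
# Hypothetical helper (not CCME's internal filter): flag ordered pairs of
# communities (A, B) where the share of A's nodes also in B exceeds OL_thres.
overlap_violations <- function(communities, OL_thres = 0.90) {
  k <- length(communities)
  pairs <- data.frame(from = integer(0), to = integer(0), prop = numeric(0))
  for (a in seq_len(k)) {
    A <- communities[[a]]
    if (length(A) == 0) next  # empty sets have no defined proportion
    for (b in seq_len(k)) {
      if (a == b) next
      prop <- length(intersect(A, communities[[b]])) / length(A)
      if (prop > OL_thres) {
        pairs <- rbind(pairs, data.frame(from = a, to = b, prop = prop))
      }
    }
  }
  pairs
}

# Hypothetical communities: the first lies entirely inside the second.
comms <- list(c(1, 2, 3, 4), c(1, 2, 3, 4, 5), c(6, 7, 8))
v <- overlap_violations(comms, OL_thres = 0.90)
v
```

Only the pair (1, 2) is flagged: all of community 1 lies in community 2, whereas only 4 of the 5 nodes of community 2 (0.8) lie in community 1, which is within the 0.90 threshold.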