Continuous Configuration Model Extraction (CCME)

Results of CCME applied to U.S. airport network traffic from May 2015.

This is the homepage for the community detection method CCME. The method is a data-mining procedure based on tests of statistical significance with respect to the continuous configuration model, a null model for edge-weighted networks proposed in the article The Continuous Configuration Model: A Null for Community Detection on Weighted Networks (supplement doc here). These significance tests are performed in iterative batches, with a multiple testing correction to ensure Type-I error control. Each batch corresponds to a refinement of a single community, and refinements are performed independently, so a node can belong to multiple communities. Because membership is decided by significance tests, a node can also belong to no community. CCME therefore returns “background” nodes along with communities, and returns no communities at all when no significant node subgroups exist.
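To make the idea of a single test batch concrete, here is a minimal sketch in R. It is not the CCME code: the p-values are made up, the helper name select_significant is hypothetical, and the Benjamini-Hochberg correction is an assumption chosen for illustration, since this page does not say which correction the method uses.

```r
# Hypothetical helper (not part of CCME): given p-values for the candidate
# nodes in one batch, keep the nodes whose adjusted p-value is at most alpha.
# The Benjamini-Hochberg method is an assumption made for this sketch.
select_significant <- function(p_values, alpha = 0.05) {
  adjusted <- p.adjust(p_values, method = "BH")
  names(p_values)[adjusted <= alpha]
}

# Made-up p-values for five candidate nodes (illustrative only).
p_values <- c(a = 0.001, b = 0.004, c = 0.02, d = 0.20, e = 0.65)

# Nodes that pass the corrected test form the refined community;
# the rest stay in the "background" for this batch.
community  <- select_significant(p_values)
background <- setdiff(names(p_values), community)
```

Running one such batch per candidate community, independently of the others, is what lets a node land in several communities or in none.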

The code for the method is available here on my GitHub site. The method is written in R, with performance-critical parts in C++ via the Rcpp library. You should update to the current version of R (and the latest versions of all packages) before sourcing CCME.R. Here’s a text file version of the README for CCME.
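As a rough sketch of that setup step, the following R snippet checks the environment before sourcing the file. It assumes CCME.R sits in the current working directory, which is an assumption for illustration; the README is the authoritative source for the actual steps.

```r
# Record the running R version so it can be compared with the current release.
r_version <- R.version.string

# Optionally bring installed packages up to date (left commented out
# because it needs network access and can take a while).
# update.packages(ask = FALSE)

# Check whether Rcpp is installed, without attaching it.
has_rcpp <- requireNamespace("Rcpp", quietly = TRUE)
if (!has_rcpp) {
  message("Rcpp not found; install it with install.packages(\"Rcpp\")")
}

# Source the method only if the file is present (assumed location).
if (has_rcpp && file.exists("CCME.R")) {
  source("CCME.R")
}
```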

Code and analyses for the article.

In the article introducing the continuous configuration model and CCME, we developed a thorough simulation framework (Section 5) to assess the accuracy of the method and compare it to others. While writing the article I built up a large project folder to house and organize the code and results for these experiments and the two real data analyses (Section 6). Before posting to arXiv I re-ran all the analyses and pared the project folder down.

With all the simulated data sets, the folder is upwards of 30GB, so I deleted most of the simulation data. The results of the methods remain, as do the seeds for the simulated data, so you can regenerate any data set you like and inspect it alongside the results. You can also regenerate all of the data if you want, but this takes a lot of time and disk space. The link to the folder and instructions for reproducibility are below.
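For readers unfamiliar with seed-based reproducibility, here is a small sketch of why keeping only the seeds is enough. The function simulate_weights is a hypothetical stand-in, not the simulation code from the project folder, and the exponential weights are purely illustrative.

```r
# Hypothetical stand-in for a simulation routine: draws a symmetric matrix
# of nonnegative edge weights for n nodes. Not the model used in the article.
simulate_weights <- function(n, seed) {
  set.seed(seed)                     # fix the random number generator state
  w <- matrix(rexp(n * n), n, n)     # illustrative random weights
  w <- (w + t(w)) / 2                # symmetrize
  diag(w) <- 0                       # no self-loops
  w
}

# The same seed regenerates exactly the same data set.
first  <- simulate_weights(10, seed = 2015)
second <- simulate_weights(10, seed = 2015)
same_data <- identical(first, second)
```

Under the same R version and random number generator settings, same_data is TRUE, which is why a deleted data set can be rebuilt from its seed.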

Main folder download link.

README download link.

You’ll also need to place the following two .zip files in the main folder.

Experiments 1-5 download link.

Experiments 6-9 download link.