Anomaly detection in social networks

Our lab has also developed a novel framework based upon a variational Expectation-Maximization algorithm to assess the likelihood of each observation being anomalous. The computational complexity of the proposed method scales linearly with both the number of observed events and the number of people in the network, while most competing methods exhibit far worse complexities which scale quadratically with the number of people in the network or the number of meetings observed. This makes our method much more well-suited to very large networks, and it requires no parameter tuning. In our simulations, we were able to detect anomalous meetings within a 2000-person network with 100% accuracy using just 200 observations, while alternative approaches based on graphs were significantly more computationally expensive and alternative approaches based on Support Vector Machines resulted in large numbers of falsely detected anomalous meetings.

To run this, you need the following file: demo_varem (Just the variational EM). Inside, you’ll find two .mat files with the preprocessed Enron dataset. You’ll also find three scripts: “demo_synthetic.m”, “demo_enron_subject.m” and “demo_enron_day.m”, which exemplify how to call all the other functions. Run either of these, for results with synthetic data or the Enron data, organized by subject line or by day, respectively. Additionally, you should download and install the lightspeed toolbox for MATLAB from Tom Minka’s page, because of the “logsumexp” function and other improvements: lightspeed toolbox.
With the synthetic example, you can set the parameters by editing “demo_synthetic.m”. You shouldn’t decrease the number of observations much lower than n=200, unless the dimensionality p is low. You can crank up p as far as your memory will allow. In a laptop with 2 GB RAM, if you keep the MC sample size down at m=1000 and set, say, n=200, you should be able to go as far as p=20,000. The Enron example, organized by day, has p=75,511.

If you intend to run the comparison with the OCSVM algorithm, you’ll need to use the file: demo_ocsvm (Variational EM vs. OCSVM) and also download and install the libsvm toolbox for MATLAB, by Chih-Chung Chang and Chih-Jen Lin: libsvm toolbox. Tip: to get libsvm working on some platforms, you may need to edit “make.m” and replace all references to .obj with .o instead. For comparing with L1O-kNNG, we have included a (probably rather crude) implementation: l1o_knng (Variational EM vs. L1O-kNNG). Extract all the demo_*.zip files to the same directory. Remember to add the toolbox locations to your MATLAB path. Then, run “demo_ocsvm.m” or “demo_l1o_knng.m”. Optionally, you can also just edit “demo_synthetic.m” to set parameters and to mix and match methods for comparison (simply set the corresponding run_* variables equal to one, in the “Initialization” section of the code) and then run the edited “demo_synthetic.m”.