Final Paper for LIS 590 AD – Sociotechnical Data Analytics (Spring, 2014)

Pattern-Seeking in the Enron Email Data Set: Understanding Levels of Connectedness


The Enron Email Dataset has provided a rich corpus for a wide range of analytical methods. As a body of highly interconnected data, it is an excellent candidate for network analysis. Considerable research has already been done in this field, but much remains to be learned. Many of the email addresses in the particular version of the corpus used in this study have been linked to individuals, which makes it possible to examine those individuals’ own communication networks, both inside and outside of the Enron system. Several analyses were performed using network graphing tools and a composite table created in an Oracle SQL database. I found that the network has a few members who are very strongly linked to many sectors of the network, and many members who are only weakly linked to it. This paper presents a new table for the database that could benefit future analysis of both the connectivity of individual members and the more global level of connection. The global and individual-level patterns that emerge from this analysis provide a useful framework for understanding how a corporate network can be linked together closely and loosely at the same time, and what that may mean for the health of the network.