Recently I’ve been observing via RescueTime that I spend 3 or more hours in my e-mail application most days. However, I don’t have a good breakdown of how much of this is scheduling, looking up information, commenting on something substantive, or social discourse. There is a tremendous amount of information locked up in the time series of e-mails sent and received that can provide insight into aspects of my behavior such as focus of attention (time of day e-mail is sent), social relationships (what organizations I interact with in a given week), the density of idea generation, etc. E-mail logs contain a wealth of raw data that can be instrumented to uncover important information about our lives.
Our e-mail logs are also a rich archive of useful information such as phone numbers, addresses, what we said to someone, when we said it, edits to papers, attachments, etc. With a proper set of tools, many of which have been built for analyzing social media, we can turn this archive into a database of useful information that can significantly enhance e-mail-based instrumentation.
The techniques and technology I imagine for instrumenting e-mail emerge from my early research at the MIT Media Lab, current work with the C3N project, our collaborator Ginger.io, and the work I’ve done at Compass Labs. For example, Ginger.io inserts a small bit of logic into your cell phone to track your SMS and e-mail activity along with a sample of your GPS location. They are building predictive models that take this raw data and predict whether a chronic disease patient is having a flare, feeling depressed, etc. At Compass Labs we’ve been taking public social media messages, breaking them apart into primitive components (topics, sentiment, social graph, etc.), and using those as inputs to a wide variety of predictive models.
I’m surprised how hard it is to find similar tools I can use that will read my e-mail, extract knowledge from it, and project that knowledge into informative time-series. These datasets could effectively represent characteristics of myself such as mood, focus, activity, personal vs. work balance, etc. The implications for the enterprise are obvious; there must be more commercial or open-source projects out there. The few I’ve found aren’t terribly useful:
- Mail Trends (Python + IMAP)
- Clear Context (Outlook plugin)
- Xobni (Outlook plugin)
In the e-mail domain, Buster Benson of Habit Labs has some tools for tracking your inbox status (unread e-mails, e-mails sent, etc.) over time, but I don’t think that code is available. I am beginning to fear I may have to roll my own!
Does anyone know of more good tools or services I could use or reference in my research work?
Here is a starting set of factors such a service might extract/exploit:
- Time e-mail is sent and received – this allows us to easily measure activity factors such as e-mail velocity (how many e-mails per hour/day), the ratio of received to sent, and whether I am answering e-mail in real time or batching it up.
- Social graph – who am I writing to, what kind of people are they – with a little bit of annotation I can group domains and e-mail addresses into work/personal/hobby, etc., and look at who I talk to. There are some good examples (but not tools) along these lines out there.
- Topics – What kinds of topics do I talk about, how frequently, etc?
- Text Sentiment – What is the general mood of my text over time? How does mood-carrying language correlate to topics, senders/recipients etc?
- Language use – How we write is as informative as what we write. There is some good research out there on what we can learn from how we structure our sentences. For example, I believe that my vocabulary is better on days I’m less tired, so diversity of word use is a potential metric for predicting fatigue or general cognitive function.
- Learned correlations – If I have an external source of data, such as days I’m feeling tired or days I’m feeling really good, we can build models that look at features of my e-mail (activity, word use, etc.) to predict that state. I can then use my e-mail activity to provide same-day feedback to remind me that I should be focusing more on work when happy, and more on life logistics when tired.
(I will update this list further over time as I get feedback and more ideas occur to me)
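To make a couple of these factors concrete, here is a rough sketch in Python of how the activity and language-use items might be computed using only the standard library. Everything named here is my own placeholder rather than an existing tool: `daily_activity`, `type_token_ratio`, and the `me@example.com` address are hypothetical, and I am using a simple type-token ratio (unique words divided by total words) as a stand-in for "diversity of word use". It assumes the messages have already been parsed into `email.message.Message` objects, for example from an mbox export.

```python
import re
from collections import Counter
from datetime import date, timezone
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def daily_activity(messages, my_address):
    """Count messages sent and received per calendar day (in UTC).

    `my_address` is an assumption: the single address treated as "me".
    Anything not sent from that address is counted as received.
    """
    sent, received = Counter(), Counter()
    for msg in messages:
        date_header = msg["Date"]
        if date_header is None:
            continue  # skip messages with no Date header
        try:
            when = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            continue  # skip messages with an unparseable Date header
        day = when.astimezone(timezone.utc).date()
        sender = parseaddr(msg.get("From", ""))[1].lower()
        if sender == my_address.lower():
            sent[day] += 1
        else:
            received[day] += 1
    return sent, received


def type_token_ratio(text):
    """Unique words divided by total words; 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


# Tiny hand-written sample standing in for a real mailbox.
RAW_SAMPLES = [
    "From: me@example.com\nTo: a@example.org\n"
    "Date: Mon, 02 Jan 2012 09:00:00 +0000\n\nLet us meet on Tuesday.",
    "From: a@example.org\nTo: me@example.com\n"
    "Date: Mon, 02 Jan 2012 10:30:00 +0000\n\nTuesday works for me.",
    "From: b@example.org\nTo: me@example.com\n"
    "Date: Tue, 03 Jan 2012 08:15:00 +0000\n\nDraft attached.",
]
sample_messages = [message_from_string(raw) for raw in RAW_SAMPLES]
sent_per_day, received_per_day = daily_activity(sample_messages, "me@example.com")
```

From the two counters, e-mail velocity is just the per-day count, and the received-to-sent ratio for a day is `received_per_day[day] / sent_per_day[day]` wherever the sent count is nonzero. A real version would also need to handle multiple "me" addresses, and a type-token ratio is sensitive to message length, so comparing days fairly would probably require normalizing over a fixed number of words.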
What else do you think could be done to extract insights from e-mail?