Study co-author Benjamin Fung, a professor of Information Systems Engineering at Concordia University and an expert in data mining, said the past few years have seen an “alarming increase” in the number of cyber crimes involving anonymous emails.
“These emails can transmit threats or child pornography, facilitate communications between criminals or carry viruses,” he told Concordia Journal.
Although IP addresses can be used to identify the house or apartment where an email originated, issues emerge when the location has several residents and therefore several suspects. Fung and his colleagues developed a method of authorship attribution, based on techniques used in speech recognition and data mining, to answer the question of who wrote the emails.
To determine whether a suspect has authored the target email, they first identify the patterns found in emails written by the subject. They then filter out any of these patterns that are also found in the emails of other suspects. The remaining frequent patterns are unique to the author of the emails being investigated. They constitute the suspect’s “write-print,” a recognizable attribute like a fingerprint, according to Concordia Journal.
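The two steps described above (collect each suspect's frequent patterns, then discard any that other suspects share) can be illustrated with a short Python sketch. This is not the researchers' implementation: the feature set (individual words plus two formatting flags), the `min_support` threshold, the overlap-count scoring in `attribute`, and the sample emails are all assumptions made for illustration. It also scores single features rather than the combinations of features that a frequent-pattern approach would mine.

```python
from collections import Counter


def features(email):
    """Toy stylistic features (assumed): words used plus two formatting flags."""
    feats = {f"word:{w}" for w in email.lower().split()}
    if email == email.lower():
        feats.add("style:all_lowercase")
    if "!!" in email:
        feats.add("style:repeated_exclamation")
    return feats


def frequent_features(emails, min_support=0.5):
    """Features that occur in at least min_support of one suspect's emails."""
    counts = Counter()
    for email in emails:
        counts.update(features(email))
    threshold = min_support * len(emails)
    return {feat for feat, n in counts.items() if n >= threshold}


def write_prints(emails_by_suspect, min_support=0.5):
    """Keep only the frequent features that no other suspect shares."""
    frequent = {
        suspect: frequent_features(emails, min_support)
        for suspect, emails in emails_by_suspect.items()
    }
    prints = {}
    for suspect, feats in frequent.items():
        shared = set().union(
            *(f for other, f in frequent.items() if other != suspect)
        )
        prints[suspect] = feats - shared
    return prints


def attribute(email, prints):
    """Pick the suspect whose write-print overlaps most with the email."""
    feats = features(email)
    return max(prints, key=lambda suspect: len(prints[suspect] & feats))


# Hypothetical sample data, invented for this sketch.
sample = {
    "alice": ["hi team pls send the report thx", "pls review the draft thx"],
    "bob": [
        "Dear team, kindly send the report. Regards",
        "Dear all, kindly review the draft. Regards",
    ],
}
prints = write_prints(sample)
guess = attribute("pls call me thx", prints)
```

In this toy data, words both suspects use, such as “the,” drop out of every write-print, while habits such as writing entirely in lowercase remain as distinguishing features. A realistic system would need far more emails per suspect and a richer feature set.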
If the anonymous email has typos or grammatical errors, or is written in lowercase letters, Fung said, those characteristics are used to create a write-print. Using this method, “we can even determine with a high degree of accuracy who wrote a given email, and infer the gender, nationality and education level of the author,” Fung said.
To test the accuracy of their technique, Fung and his colleagues examined the Enron Email Dataset, a collection of more than 200,000 real-life emails from 158 employees of the Enron Corporation. Using a sample of 10 emails written by each of 10 subjects, they were able to pinpoint authorship with an accuracy of 80 to 90 percent.
“Unless you have been unusually careful, the gig is probably up by then,” he wrote on The Privacy Blog. “Remember, this might not be for criminal matters. [In] many cases, this would come up in whistle blowing or other non-criminal situations.”