Pixellating Text Creates Identifiable Patterns

Posted by Jason Tate, Jul 20, 2016

Kashmir Hill, writing for Fusion, summarizes a study by Steven Hill et al.:

“In many online communities, it is the norm to redact names and other sensitive text from posted screen shots,” write the researchers, specifically citing Reddit. “Mosaicing and blurring have also been used for the redaction of high-profile government documents and celebrity social media.”
They should probably stop doing that. The UC-San Diego researchers found that they could use statistical models—“so-called hidden Markov models”—to generate the blurring or pixelation of lots of numbers, letters, and words, to the point that their software program could match a known redaction to an unknown redaction to figure out what it says. The biggest challenge is figuring out the font and size of the underlying text which the researchers need for their deciphering. They say it works better than a brute-force technique for deciphering pixelated images discussed by Dheera Venkatraman in 2007.
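The reason pixelation leaks is that it is deterministic: the same text in the same font, size, and grid always produces the same mosaic. The sketch below illustrates the simpler brute-force idea, not the researchers' hidden-Markov-model method and not Venkatraman's actual code. It pixelates every candidate string and keeps the one whose mosaic is closest to the redacted one. Everything here is a hypothetical stand-in: the 3×5 digit font, the block size, and the function names `render`, `mosaic`, and `brute_force`. It also assumes the font, size, and block alignment are already known, which is exactly the hard part mentioned above.

```python
import itertools

# Hypothetical toy 3x5 bitmap font for the digits 0-9 ("#" = ink, "." = paper).
# Real attacks must render candidates in the actual font and size of the redaction.
FONT = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "2": ["###", "..#", "###", "#..", "###"],
    "3": ["###", "..#", "###", "..#", "###"],
    "4": ["#.#", "#.#", "###", "..#", "..#"],
    "5": ["###", "#..", "###", "..#", "###"],
    "6": ["###", "#..", "###", "#.#", "###"],
    "7": ["###", "..#", "..#", "..#", "..#"],
    "8": ["###", "#.#", "###", "#.#", "###"],
    "9": ["###", "#.#", "###", "..#", "###"],
}
GLYPH_H = 5


def render(text):
    """Rasterize text into a 0/1 pixel grid, one blank column after each glyph."""
    rows = []
    for y in range(GLYPH_H):
        row = []
        for ch in text:
            row.extend(1 if c == "#" else 0 for c in FONT[ch][y])
            row.append(0)  # spacing column between glyphs
        rows.append(row)
    return rows


def mosaic(img, block):
    """Pixelate by replacing each block x block cell with its mean intensity."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h, block):
        out_row = []
        for bx in range(0, w, block):
            # Edge cells may be smaller than block x block; average what is there.
            cells = [img[y][x]
                     for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            out_row.append(sum(cells) / len(cells))
        out.append(out_row)
    return out


def distance(a, b):
    """Sum of squared differences between two equally sized mosaics."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def brute_force(redacted, length, block, alphabet="0123456789"):
    """Pixelate every candidate string and return the closest match.

    Ties keep the first candidate found, because mosaicing is many-to-one.
    """
    best_d, best_guess = None, None
    for chars in itertools.product(alphabet, repeat=length):
        guess = "".join(chars)
        d = distance(mosaic(render(guess), block), redacted)
        if best_d is None or d < best_d:
            best_d, best_guess = d, guess
    return best_guess


if __name__ == "__main__":
    secret = "2718"
    redacted = mosaic(render(secret), block=2)
    print(brute_force(redacted, length=len(secret), block=2))  # prints 2718
```

With this toy font and `block=2`, each glyph lands on exactly two block columns, so the search recovers "2718" exactly. The mapping is still lossy: here "0" and "9" produce identical mosaics, so this score alone cannot tell them apart. The candidate count also grows as `len(alphabet) ** length` (10,000 for four digits), one reason exhaustive search scales poorly to longer text.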