mClerk project digitizes docs via SMS

10 May 2012

ITEM: Researchers from the University of Toronto and Microsoft Research India are leveraging an old Nokia messaging protocol to enable low-end mobile phone owners in emerging markets to make money by digitizing documents.

The researchers recently completed a pilot test of “mClerk”, a crowdsourcing project in which people were tasked with digitizing handwritten words on their mobile phones. The system relies on a protocol from Nokia’s Smart Messaging Service that allows binary picture messages with a resolution of 74 x 28 pixels to be transmitted by SMS.
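To give a sense of scale, the arithmetic behind a 74 x 28 picture message can be sketched as follows. This is a rough illustration only: it assumes one bit per pixel packed tightly with no row padding, and the per-message header overhead is a hypothetical parameter, not a figure from the paper. The 140-byte payload of a single SMS is the only fixed quantity.

```python
import math

SMS_PAYLOAD_BYTES = 140  # user-data bytes available in a single SMS


def bitmap_bytes(width, height):
    """Bytes for a 1-bit-per-pixel image.

    Assumption: bits are packed tightly with no per-row padding.
    """
    return math.ceil(width * height / 8)


def sms_segments(image_bytes, header_bytes=0):
    """Number of SMS messages needed to carry the image.

    header_bytes is a hypothetical per-message overhead (for example,
    concatenation or addressing headers); the real value depends on
    the protocol details.
    """
    usable = SMS_PAYLOAD_BYTES - header_bytes
    if usable <= 0:
        raise ValueError("header overhead leaves no room for payload")
    return math.ceil(image_bytes / usable)


size = bitmap_bytes(74, 28)
print(size, "bytes for one word image")
print(sms_segments(size), "messages with no header overhead")
print(sms_segments(size, header_bytes=12), "messages with 12 bytes of overhead each")
```

Under these assumptions a single word image is larger than one SMS payload, so each task would span more than one message on the wire.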

According to a research paper released Monday at CHI 2012 in Austin, Texas, mClerk works like this:

In essence, mClerk starts with a scan of a paper document, segments it into word images, sends each image via SMS to users’ phones, receives back the users’ responses, probabilistically verifies them, pays the users and aggregates responses into a digital document. It has four modules: image segmentation software, a mobile crowdsourcing platform, word aggregation code, and a payment mechanism.
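The verify, pay and aggregate steps described in that passage can be illustrated with a minimal sketch. It is not the researchers' implementation: a simple majority vote stands in for the paper's probabilistic verification, and the names `Response`, `aggregate`, `assemble` and the `min_agree` threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Response:
    word_id: int  # position of the word image in the document
    user: str     # who sent the answer
    text: str     # what they typed


def aggregate(responses, min_agree=2):
    """Accept a word once at least min_agree users typed the same text.

    Majority voting is an assumption standing in for the paper's
    probabilistic verification. Returns the accepted words and a
    per-user count of answers that matched an accepted word.
    """
    by_word = {}
    for r in responses:
        by_word.setdefault(r.word_id, []).append(r)

    accepted = {}
    credited = Counter()
    for word_id, answers in by_word.items():
        counts = Counter(a.text.strip() for a in answers)
        text, votes = counts.most_common(1)[0]
        if votes >= min_agree:
            accepted[word_id] = text
            for a in answers:
                if a.text.strip() == text:
                    credited[a.user] += 1  # basis for payment
    return accepted, credited


def assemble(accepted, n_words, missing="[?]"):
    """Join accepted words in document order, marking unverified ones."""
    return " ".join(accepted.get(i, missing) for i in range(n_words))


responses = [
    Response(0, "asha", "hello"),
    Response(0, "ravi", "hello"),
    Response(0, "kiran", "hallo"),
    Response(1, "asha", "world"),
]
accepted, credited = aggregate(responses)
print(assemble(accepted, n_words=2))
print(dict(credited))
```

In this toy run the first word is accepted because two users agree, while the second stays unverified until another matching answer arrives. Sending each image to several users in this way is one reason the number of completed tasks can exceed the number of words digitized.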

The imaging protocol not only lets any low-end phone participate in the project, but also gets around the issue of language font support. For example, the mClerk project had users digitize words in Kannada (the local language of Karnataka, the state where the project was conducted), which isn’t supported uniformly on handsets.

Results: thanks to a referral mechanism that helped the project go viral, it grew from ten users to almost 240 in five weeks. By the end of that period, users had completed 64,000 tasks, yielding a total of 25,000 digitized words from a handwritten document that had been chopped into thousands of images of individual words.

Also, participants were paid in phone minutes for their work, earning the equivalent of about $21 a month – 12% of the average monthly wage in the region – for just two hours’ work a day, according to Technology Review.

Aakar Gupta, a researcher from the University of Toronto and lead author of the paper, tells TR the concept could be applied to all sorts of handwritten documents, such as information on medical forms, which would enable distributed digitization of medical records.

However, while the research team will continue to develop the technology in the coming months, it has no immediate plans to commercialize or license it, TR reports.

Meanwhile, the paper argues that the mClerk system could in future be applied to “other contextually appropriate tasks such as audio transcription and tagging locally relevant images and songs”, and is just a taste of the potential of mobile crowdsourcing as a viable ecosystem and an economic boost for users in emerging markets.
