Programming the crowds

At the Association for Computing Machinery’s 23rd symposium on User Interface Software and Technology in October, members of the User Interface Design Group at MIT’s Computer Science and Artificial Intelligence Laboratory walked off with awards for both best paper and best student paper.
Both papers describe software that uses Amazon’s Mechanical Turk service to farm tasks out to human beings sitting at Internet-connected computers. The best-paper prize went to VizWiz, a cell-phone application that enables blind people to snap photographs and, within a minute or so, receive audible descriptions of the objects depicted. The best-student-paper prize went to Soylent, a program that distributes responsibility for editing text to hundreds of people in such a way that highly reliable results can be culled in as little as 20 minutes.
Launched in 2005, Mechanical Turk is an Internet marketplace where so-called requesters can upload data and offer payment for the completion of simple tasks that computers can’t perform reliably. A requester, for instance, might split an audio file into 30-second chunks and offer five cents for the transcription of each one. But Mechanical Turk has a few drawbacks. One is its fairly clunky interface. Others are the difficulty of getting results in real time and of getting reliable results for complex tasks. The MIT researchers’ papers address all three problems.
People in the program
Both VizWiz and Soylent operate as free-standing applications: The user wouldn’t necessarily know that they use Mechanical Turk at all, only that they take somewhat longer to return results than computer programs generally do. VizWiz, whose design was led by Jeffrey Bigham, an assistant professor at the University of Rochester who was a visiting professor at CSAIL last fall, uses a variety of tricks to get results within seconds. One of these is to recruit Mechanical Turk workers, or “turkers,” as soon as the application launches, on the assumption that it will imminently provide data for analysis. To hold the recruited workers’ attention, and to prime them for the type of task they’ll be asked to perform, the VizWiz system gives them a series of already-solved problems to work on, until the cell-phone application has produced a new photo for identification.
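The article describes this eager-recruitment trick only at a high level. As a toy illustration of the idea (the class, task strings, and queue behavior below are invented for this sketch and are not VizWiz’s actual implementation), a worker pool can hand out already-solved warm-up tasks whenever no real task is waiting, so that recruited workers stay engaged until the phone uploads a photo:

```python
import queue

class WorkerPool:
    """Toy model of eager recruitment: workers recruited at app launch
    are kept busy on already-solved warm-up tasks until real work arrives."""

    def __init__(self, warmup_tasks):
        self.tasks = queue.Queue()     # real tasks uploaded by the phone
        self.warmup = list(warmup_tasks)
        self.log = []                  # (kind, answer) pairs, in order

    def next_task(self):
        # Real work takes priority; otherwise hand out a warm-up task
        # so the worker stays primed for this type of question.
        try:
            return ("real", self.tasks.get_nowait())
        except queue.Empty:
            return ("warmup", self.warmup[len(self.log) % len(self.warmup)])

    def work_once(self, answer_fn):
        kind, task = self.next_task()
        self.log.append((kind, answer_fn(task)))
        return kind

pool = WorkerPool(["photo of a soup can", "photo of a street sign"])
identify = lambda task: f"described: {task}"   # stands in for a turker

pool.work_once(identify)              # no real task yet, so a warm-up is served
pool.tasks.put("user's new photo")    # the phone uploads a photo
kind = pool.work_once(identify)       # the real task is now served first
print(kind)  # -> real
```

The point of the pattern is that recruitment latency, normally the slowest part of a Mechanical Turk round trip, is paid before the user’s request arrives rather than after.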
“The use scenario is just fantastic,” says Ben Bederson, an associate professor at the University of Maryland’s Human-Computer Interaction Lab, who was not involved in the project. “It’s not only visually impaired people. I was traveling in China, and I can’t read the road signs. There are a number of times where people want real-time help.” Bederson adds that he considers the most innovative aspect of the program the technique it uses to enable blind people to take photos in the first place. “They came up with this mechanism where they used real-time computer vision and sound generation to give audio feedback of whether or not you’re pointing at the right thing,” he says. “That’s a totally new concept. That’s incredible.”
Nonetheless, Bederson says, Soylent may be the more innovative of the two systems. “The nature of this labor force is that it’s very disconnected,” Bederson says. “There’s a huge amount of lack of trust and cheating.” For instance, Bederson says, his own research group tried to use Mechanical Turk to assign two-paragraph movie reviews a simple binary rating: favorable or unfavorable. “Most of the answers were basically garbage,” he says. A turker who submitted a random rating was paid as much as a turker who actually took a minute to read the review. “There’s been a lot of work on anti-cheating mechanisms,” Bederson says, but Soylent “goes a step farther in coming up with mechanisms to engage work that before this paper people didn’t think was possible to do with this kind of labor force.”
Between extremes
In the MIT researchers’ experiments, Soylent recruited turkers to perform two different tasks: one was to copyedit a document of roughly seven paragraphs; the other was to shorten a document. “The naïve thing you would think to do is put your paragraph on Mechanical Turk and say, ‘Hey, could you please make this shorter?’” says Michael Bernstein, a PhD student in the Computer Science and Artificial Intelligence Laboratory who together with associate professor Rob Miller, who heads the User Interface Design Group, led the Soylent project. “A huge set of workers will simply find the simplest single thing they can do, make one short edit somewhere near the beginning of your document, and run away with your money.” Another group, whom Bernstein and Miller call “eager beavers,” will perform all kinds of unnecessary edits that may even introduce errors. “They’re trying to help, but they’re doing more harm than good,” Bernstein says.
So the MIT researchers, and their colleagues at the University of California, Berkeley, and the University of Michigan, split editing tasks into three stages, which they label Find, Fix, and Verify. Initially, turkers are recruited simply to identify sections of a text that are wordy or contain grammatical errors. Those sections are then abstracted from the text and passed on to additional turkers who propose solutions. And finally, another round of turkers evaluate the proposed solutions, weeding out those that are ungrammatical. In experiments, the researchers found that $1.50 per paragraph would elicit good results within 20 minutes; the cost would go down to about 30 cents per paragraph if the user was willing to wait a couple hours. Either way, by the time the results came back, literally hundreds of turkers would have worked on the document.
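The three-stage pipeline above can be sketched in a few lines of Python. The worker functions below are simulated stand-ins for turkers, and the agreement threshold and function names are assumptions of this sketch, not details from the Soylent paper:

```python
from collections import Counter

def find_stage(paragraph, finders, min_agree=2):
    # Find: each worker flags wordy or erroneous spans; keep only spans
    # that at least min_agree workers independently agreed on.
    votes = Counter()
    for worker in finders:
        for span in worker(paragraph):
            votes[span] += 1
    return [s for s, n in votes.items() if n >= min_agree]

def fix_stage(span, fixers):
    # Fix: independent workers each propose a rewrite of the flagged span.
    return [worker(span) for worker in fixers]

def verify_stage(candidates, verifiers):
    # Verify: workers vote on the proposals; the plurality winner survives,
    # weeding out ungrammatical rewrites.
    votes = Counter(worker(candidates) for worker in verifiers)
    return votes.most_common(1)[0][0]

# Simulated workers standing in for Mechanical Turk responses.
paragraph = "The reason why this is true is because of the facts."
lazy = lambda p: []                                   # "lazy turker" flags nothing
diligent = lambda p: ["The reason why this is true is because"]
finders = [diligent, diligent, lazy]

fixers = [lambda s: "This is true because",
          lambda s: "This true because",              # ungrammatical proposal
          lambda s: "This is true because"]

# These verifiers approve the grammatical rewrite.
verifiers = [lambda cands: "This is true because"] * 3

flagged = find_stage(paragraph, finders)
best = verify_stage(fix_stage(flagged[0], fixers), verifiers)
shortened = paragraph.replace(flagged[0], best)
print(shortened)  # -> This is true because of the facts.
```

Splitting the work this way means no single worker can sink the result: a lazy turker’s empty answer is outvoted in Find, and an eager beaver’s error-prone rewrite is filtered in Verify.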
“There’s a lot of work nowadays on Mechanical Turk,” says Luis von Ahn, an assistant professor of computer science at Carnegie Mellon University. “I find that most of it is pretty crappy. But this is one of the only papers I saw that told me something that I didn’t actually know.” Von Ahn is famous as the inventor of captchas, those online tests that ask the user to re-type a line of distorted text in order to verify that he or she is not a spam robot. According to von Ahn, work like Soylent and VizWiz is valuable for revealing what’s possible with Internet-coordinated, distributed labor. But, he says, he would like to see researchers develop a better theoretical understanding of the relationship between the size of tasks, cost and time, which would explain why systems like Soylent work where others don’t. Given a problem, von Ahn asks, “How do you split it up into the right-size pieces so that at the end of the day you get the most people doing it and probably the cheapest?” In some instances, he says, “A 20-minute task might be unmanageable, but a nine-minute task is just right. It would be really nice to have more of a theory of this.”
This story is republished courtesy of MIT News, a popular site that covers news about MIT research, innovation and teaching.
Provided by Massachusetts Institute of Technology