


How the Descriptive Camera Out-Instagrams Instagram


Wandering the back halls of the internet — all flickering lights, discarded furniture, and cigarette butts — you might find yourself a bit lost. Around that corner another dead end, down that hall just more piles of junk, and, just as worry turns to panic, you might find a nondescript door with a light underneath. The trip further and further down — past storerooms of search spam tutorials and Flash games — has only gotten colder and emptier, but here is something… warm? You have in fact reached the very end of the internet (or computing, really) and, to your surprise, beyond that door is a human. This human is part of the Human Intelligence Task (HIT) division of Amazon’s Mechanical Turk, and their job is to fill in the gaps, to do the things that human brains do better than machines.


At this very minute, that human might be doing something that machines are still not very good at: analyzing images. They’re getting better at it, of course, but describing just any old image in general terms — compared to recognizing a specific shape within a class or reading some text — is still very hard. It’s a frontier, and one that will be conquered eventually as algorithms improve and processing power grows (quantum computing, perhaps). In the meantime, we’ll just have to make do with a hidden stash of human brains. These brains are the guts of Matt Richardson’s Descriptive Camera, which started as an NYU class project and has since become something of an internet sensation. In short, it’s a camera that takes a photo and, instead of outputting an image to the photographer, outputs a short description. That’s it.

That description comes courtesy of a Mechanical Turk worker, at a cost of $1.25 and five or six minutes of time. So, for example: “This is a faded picture of a dilapidated building. It seems to be run down and in need of repairs.” The camera prints this out on a little strip of paper from the back. (Note that, as instant cameras go, this is a bit cheaper and faster than current-day Polaroid film via the Impossible Project.)
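For the curious, the Mechanical Turk side of a workflow like this is, at bottom, one API call. Below is a minimal sketch, assuming Python with boto3’s MTurk client and a hypothetical, publicly hosted snapshot URL; it illustrates posting a “describe this photo” HIT at the article’s $1.25 price point, and is not Richardson’s actual code.

```python
# Minimal sketch: post a "describe this photo" HIT to Mechanical Turk.
# Assumes boto3 credentials are configured; IMAGE_URL is a hypothetical
# publicly reachable photo. Illustrative only, not the project's code.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

IMAGE_URL = "https://example.com/snapshot.jpg"  # hypothetical placeholder

# MTurk questions are defined in XML; an HTMLQuestion can embed the image
# and a free-text answer box. A real HIT would also fill the hidden
# assignmentId field from the worker's URL query string (usually with a
# bit of JavaScript); that plumbing is omitted here.
question_xml = f"""
<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
    <html><body>
      <img src="{IMAGE_URL}" width="400"/>
      <p>Describe this image in a sentence or two.</p>
      <form action="https://www.mturk.com/mturk/externalSubmit" method="post">
        <input type="hidden" name="assignmentId" value=""/>
        <textarea name="description" rows="4" cols="60"></textarea>
        <input type="submit"/>
      </form>
    </body></html>
  ]]></HTMLContent>
  <FrameHeight>450</FrameHeight>
</HTMLQuestion>
"""

hit = mturk.create_hit(
    Title="Describe a photo",
    Description="Write a short plain-English description of one photo.",
    Reward="1.25",                      # the price quoted in the article
    MaxAssignments=1,
    LifetimeInSeconds=600,
    AssignmentDurationInSeconds=600,
    Question=question_xml,
)
print("HIT posted:", hit["HIT"]["HITId"])
```

Once a worker submits, the camera only has to poll for the completed assignment, pull the free-text answer, and send it to its printer.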

“The workers themselves don’t know what they’re working on,” Richardson explained to me a couple of weeks ago. "They’re told to describe the image with not many other guidelines. The reaction generally has been very positive, but there’s been some confusion. I created this conceptual piece to explore the idea of descriptive metadata technology. To explore the question, ‘what if our cameras could understand what we are shooting?’ Many people ask why I made such a thing. It’s not something I ask myself when I make something; I just make it.


“I’ve been thinking a lot about how modern cameras capture metadata about photos: time, date, camera settings, and sometimes even location,” he says. “What they don’t capture is metadata about the content of the photo. That is, who is in them, where they are and what they’re doing. I wondered what it would be like to have a camera that could do such a thing, so I created the Descriptive Camera that would only output this data.”
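That existing metadata is easy to see for yourself. Here is a small sketch, assuming Python with the Pillow imaging library and a hypothetical file called snapshot.jpg, that dumps the EXIF fields a typical camera writes (time, date, exposure settings, sometimes GPS), none of which says anything about what is actually in the frame.

```python
# Sketch: dump the metadata a typical camera already records (EXIF),
# to contrast with the content metadata the Descriptive Camera adds.
# Assumes Pillow is installed; "snapshot.jpg" is a hypothetical file.
from PIL import Image, ExifTags

img = Image.open("snapshot.jpg")
exif = img.getexif()

for tag_id, value in exif.items():
    name = ExifTags.TAGS.get(tag_id, tag_id)  # map numeric tag to a readable name
    print(f"{name}: {value}")

# Typical fields: DateTime, Make, Model, ExposureTime, FNumber, GPSInfo...
# None of them describe who is in the photo or what they are doing.
```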

There’s something fundamentally weird about assigning computing tasks to a human. It’s not weird, of course, to just ask a human to do a task for money, but here it’s something different and a bit surreal — using a human expressly as a processor, in those terms. It’s a more casual than usual acknowledgement of brain-as-computer. Note that non-human-assisted image recognition technology already uses machine learning algorithms derived from how human brains process information. As an art project, the Descriptive Camera is a kind of literalization/illustration of the brain/computer merging that’s already happening.

“I think there’s very little that can’t eventually be replaced by machine learning algorithms,” Richardson says. “It’s not easy to think of how a camera might eventually understand that a particular photo is of you and your brother riding your bikes up a steep hill, but I like to think of how that changes the game.”
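There are already rough machine stand-ins for the Turk worker. As a point of comparison, here is a sketch using a pretrained image-captioning model through the Hugging Face transformers library; the model named is just one publicly available example and has nothing to do with Richardson’s project.

```python
# Sketch: the machine-learning counterpart to the Turk worker, using a
# pretrained image-captioning model. Assumes the transformers library
# (plus PyTorch and Pillow) is installed; the model name is one public
# example, not part of the Descriptive Camera.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

# "snapshot.jpg" is a hypothetical local photo.
result = captioner("snapshot.jpg")
print(result[0]["generated_text"])  # prints a short machine-generated caption
```

The output tends to be flatter and more literal than what a paid stranger writes, which is exactly the gap the project plays with.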

Inadvertently, he’s asking questions about the future of work. Given the demographics of Mechanical Turk, chances are good that each Descriptive Camera snapshot is being written by a worker in India, male, age 26-28, and increasingly using Turk to make ends meet. A 2010 report by demographers at the University of California, Irvine, concluded with an observation on ethics: “With the exception of participatory design, HCI [human computer interaction] has not developed a language or conceptual framework for considering questions of labor and livelihood in system design. What kinds of work conditions for these human computers can the HCI community design?”


The thorny issues surrounding work in the age of Occupy can look ironic: Richardson has also created the ultimate hipster camera. Think about all of our Instagram filtering of photos, which is generally motivated by the fact that amateur snapshots tend to look crappy, boring, and “regular,” while filters make them look more interesting and, well, not-crappy, or at least intentionally crappy. Holgas and other toy cameras have long been used the same way, but there’s something a bit more honest about doing it with actual film in a camera rather than with some cheap software that acts exactly the same on every image it’s fed. Accidents and surprises could happen for real, whereas now we have filters that give digital photos the pre-packaged look of accidents.

The Descriptive Camera seems almost like a parody of the culture of the accidental — as much as it’s a parody of neural networking — by providing the ultimate subjective interpretation of an image. You don’t even get the original image at all, just a once-in-a-universe interpretation by some stranger on the other side of the planet. Or next door.

“I’ve been surprised at how poetic they sound when they’re printed out,” Richardson says. "It’s not intentional on the workers’ part, it’s sometimes a factor of regional language differences or people who use English as a second language. When I have my own friends writing descriptions (see ‘accomplice mode’ on the project’s site), they tend to be more clinical, they act more like a computer because they know the context of their work.

“Very early on, I had considered giving the workers tighter parameters,” he adds, “but when the first basic tests on Mechanical Turk came back with rather imperfect results, I was delighted by them. The less machine-like the output, the more I liked it.”


Reach this writer at michaelb@motherboard.tv.