http://sites.google.com/site/gallantlabucb/publications/nishimoto-et-al-2011

Fascinating. Three things leap out at me:

  • That is remarkably text-like without actually being text. Makes sense that reading would produce more distinct activity.

  • I would have expected more details on the faces, considering how much processing power is assigned to the task.

  • What's below the elephant is ground, what's above it is sky. Right? It's like the fMRI went "Hey, brain, what color is sky?" "Blue." "And what color is ground?" "Uh, green?"

I would have expected more details on the faces, considering how much processing power is assigned to the task.

The results are obtained by averaging a hundred other movie clips together, so it is not surprising that fine details are lost.

Presumably that's also why some of the clips seem to have jumbled words and letters floating around.
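For concreteness, here is roughly the kind of averaging I have in mind, as a toy sketch (the names and array shapes are made up for illustration, not taken from the paper's code):

```python
import numpy as np

def reconstruct(measured, predicted, clips, k=100):
    """Toy 'average the best-matching library clips' reconstruction.

    measured:  (n_voxels,) measured BOLD responses to the unknown clip
    predicted: (n_clips, n_voxels) model-predicted responses for each library clip
    clips:     (n_clips, n_frames, h, w) the library clips themselves
    """
    # Score each library clip by how well its predicted response pattern
    # correlates with the measured one.
    zm = (measured - measured.mean()) / measured.std()
    zp = (predicted - predicted.mean(axis=1, keepdims=True)) / predicted.std(axis=1, keepdims=True)
    scores = zp @ zm / measured.size

    # Average the k best-matching clips. Fine detail (faces, lettering)
    # differs from clip to clip, so it washes out in the average.
    best = np.argsort(scores)[-k:]
    return clips[best].mean(axis=0)
```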

Cool, yes, but I have two questions about it.

Firstly, given the narrow support of the prior (the result of processing the brain imaging data is constrained to look like natural movie footage, much like the original pictures), it seems less than breathtaking that the posterior -- the reconstructed movie -- is also narrow: it looks sort of like the original. But how could it not? (A point that pedanterrific touches on here.) I don't feel able to do more than raise the question, though; perhaps someone more knowledgeable about the methods they use could comment.

I'm reminded of a possibly apocryphal story from the early days of speech recognition. As a demo, the researchers set up a computer chess game in which you would speak your move and the computer would play it on screen. So a general (it was a military project) comes to see the demo, they invite him to speak a chess move, he coughs, and the machine responds with P-K4. The prior was too narrow -- noises that weren't moves at all were outside its support.
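To put that worry in symbols (a toy sketch in my own notation, nothing from the paper), write r for the measured BOLD responses and c for a candidate clip:

```latex
% Toy Bayes-rule statement of the "narrow prior" worry (my notation, not the paper's).
\[
  p(c \mid r) \;\propto\; p(r \mid c)\, p(c),
  \qquad p(c) > 0 \ \text{only for clips drawn from the natural-movie library.}
\]
```

Whatever r turns out to be (even a cough, in the chess story), the posterior can only put mass on library clips, so the reconstruction is guaranteed to look like some plausible natural movie; the interesting question is how much of the resemblance to this particular movie is carried by the data rather than by the prior.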

Secondly, it's already known that the retina projects to part of the visual cortex (V1) in a way that spatially corresponds very closely to the retinal image. The part imaged includes V1 and the rest of the early visual areas (see the end of the first page of the paper), so it's not very surprising that something can be recovered that looks, to the human viewer, somewhat similar to the original image. But only to the human viewer. The same human that can recognise an elephant in a movie is recognising an elephantish blob in the reconstruction. That does not mean that elephant-recognition is happening in the brain tissue being imaged. Are we merely seeing the image before it has undergone any substantial neural processing, and recovering what is left of the original, rather than what the brain is making of it? It's cool that they've got that far, but this isn't reading visual experience out of the brain (and isn't claimed to be).

Another anecdote to amplify the point. In the early days of machine vision -- say, the 60s and early 70s -- people produced what they called "edge-detection" algorithms. The result of processing an image would be another image consisting, apparently, of all the edges in the original. But it was only to the human eye that it looked like that. As far as the software was concerned, it was just another pixel array. The software did not know that there were any "edges" there, and if you tried to really detect "edges" by stipulating that any connected set of black pixels in the transformed image was an "edge", you just got a mess. The software had made the edges more salient to the human eye, but it had not detected edges in any useful sense. All it did was apply a sharpening transformation that would nowadays be an ordinary Photoshop filter.
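For concreteness, the sort of transformation meant here is essentially a gradient filter; a minimal sketch (not any particular historical system):

```python
import numpy as np

def edge_saliency(img):
    """Gradient-magnitude filter: it makes edges pop out to a human viewer,
    but the output is still just another pixel array -- nothing in it
    'is' an edge as far as the program is concerned."""
    gy, gx = np.gradient(img.astype(float))
    return np.hypot(gx, gy)

# Thresholding this and treating every connected blob of bright pixels
# as an "edge" is exactly the step that, in practice, produced a mess.
```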

I wrote:

but this isn't reading visual experience out of the brain (and isn't claimed to be).

I neglected to notice the title of the paper, "Reconstructing visual experiences from brain activity..." So they do claim to be reconstructing visual experiences from brain activity. What they are actually doing is reconstructing the pictures that the subjects were looking at.

PDF. Pretty impressive you can get that far just by using BOLD, with its shitty temporal resolution and also-not-great spatial resolution.

I particularly like the text floating around during the title sequence, and with the elephant.

Also: awesome!

they should do this with the basketball team + gorilla video

would be intresting to see if gorillan shows up in the reconstrucshon

Have you, like, had a stroke in the last 12 hours or are you just drunk?

The latter, thankfully. Your concern is appreciated.

At least this one time I wrote something semi-useful. I would genuinely, and soberly, be interested to see such a reconstructed gorilla video, and in particular the difference between a watcher who is instructed to focus on the ball and one who isn't.

Thank goodness! (Party hard!)

And yes, though I suppose that might be influenced by whether there happen to be clips of gorillas in the sampled material.

Also, the example reconstructions seem considerably simpler (in the sense of there being a single, obvious-to-human-eyes focal object) than a basketball clip would be. It makes me curious what the differences would be - whether the 'most similar' clips would be other sports, people standing still, balls moving around the screen...

(by the way, he is referring to this)

[This comment is no longer endorsed by its author]

No, I know what he's referring to - I even think it's an interesting question - I'm just remarking on the fact that his last few posts have been uncharacteristically incoherent.