Articles and interviews
Interview

Philip Schütte - (re)Placing (re)Cognition: Cognitive Places in the Making

The discussion with Philip Schütte turns to intuition as (re)cognition in motion, that is, a pattern-based orientation that operates before decisions, explanations, or aesthetic judgments take form. The interview moves between playful formats (the SUN), where orientation is situational, and MRI-based reconstructions of self-viewing (auto-portraits), where orientation is externalised, to show that perception is never simply private. In playful situations, a small mismatch between what an image seems to demand and what the body discovers can trigger a micro-event of recoding: surprise, a smile, a brief relief, a shift in the threshold of action. In the neural reconstructions, the direction reverses. Perception is no longer only something a subject enacts in encounter with the world; it becomes something the world computes about the subject. Across both registers, the image is treated as a cognitive ecology. Images thus do not represent; they train, normalise, and re-orient, ultimately placing cognitive work within organised groups of signals (visual, spatial, procedural) that tune attention and behaviour. A (re)placement, then, is not simply a revision of context but a shift in the operating conditions of perception, and what counts as ‘the self’ is recompiled by the cognitive ecology it enters.

Philip Schütte was interviewed by Lena Galanopoulou

LG

To give a bit of context, this conversation is part of Speculative Urban Futures (SUrF), an Erasmus+ initiative that explores how speculative design pushes us to rethink and reshape contemporary urban realities. These conversations are one way we’re opening up different perspectives on the topic. In our pilot at TU Delft, Sensing–Imaging–Intuiting, we’re working on intuition, or in simpler terms, on how designers sense, image, and anticipate futures.

You’re a co-founder at Formats & Mechanisms, which you describe as a ‘critical media exploration studio’. Could you say a bit about what that actually does, or means in practice?

PS

What do we do? I think it’s ‘critical’ because there is a lot of need for critical reflection on our treatment of media, and of technology in general. We’re all both stretched thin and compartmentalised, yet everybody is somehow still ‘stoked’ about the ‘cool things’ they are doing, without much understanding of what is actually available or happening elsewhere. People reflecting on it are few and far between. There’s also such a delay for academics to catch up to understanding what’s even happening right now. And you have the same in all the other disciplines. I guess it’s always been like that: something happens, and then you post-rationalise it. I don’t know how critical we can claim to be, because we also work within it, but we try to do so from a position of sceptical questioning; sceptical of the attention market, and sceptical of the picture of humans around it that we are using. That’s, I guess, what it means.

‘Formats’ are what we try to develop. We feel like everybody’s media literacy and image literacy is really, really high now; people are really trained up on decoding images and understanding images as a multi-dimensional communicational tool. An image can be a proxy for all sorts of things, and representations can be food for semi-intelligent beings and pattern regulators, and so on. People understand all that intuitively now, we think. But very little is actually done with it. We always feel like it’s kind of a new set of colours in the tool set that art, or communication, could use, but not many people do it.

So what we try to do is insert new formats or explore new formats in some way, shape, or form. We always try to introduce a small experiment that rejiggers the structures through which you perceive information, or relate to it. That is the format. Then we also play with mechanisms; once you have learned something about, say, how the selfie works, there are so many ways you can play with that structure to make it funny, exciting, or attuned to almost any tonality you want. So we try to play on the mechanisms or develop a new format, but I think we’re best at developing new formats.

LG

What’s very interesting in the way you describe formats and mechanisms is that they seem to redistribute agency, away from individual intention and into setups that shape how orientation happens. In your work, what does ‘media’ mean? Are we talking about intermediaries such as tools, platforms, interfaces, or even organisational structures?

image-1

Image 1: Image01000023. Project about automatically upscaling and downscaling images millions of times (2008)

PS

We work with a diverse group of clients spanning cultural and commercial contexts across many industries, including artists, academics, designers, musicians, and marketing professionals. This diversity mirrors our own broad, perhaps unconventional, capabilities, which range from data work and development to creative technology and moving image. We try to go back to the beginning, to what people are trying to do, and then rebuild a system around it. It can be moving image or photography, installations, ways of relating your body to information. I did a project for kids where they could use their body as a search prompt: they take a position and it’s mirrored with a specific image. It was about an embodied way of relating to abstract representations.

LG

In your example the interface becomes cognitive, in other words, the system actively shapes how bodies and images relate. Today many of these mediating systems are increasingly automated. With AI now matching, and in some cases surpassing, our so-called ‘rational’ capabilities (analysis, optimisation, even certain design outputs), what do you think remains specifically human in making? And where does intuition sit in that shift?

PS

So, yes, I like intuition. I have the sense we’ve put too many eggs in the rationality-and-skill basket, and we’re now, more than ever, realising that the rational capacities we took pride in are also deeply flawed and limited. That becomes especially visible with the emergence of AI and high-level pattern recognition. In a way, it’s like we may have partially bet on the wrong horse.

Rationality has gotten us far; we built machines that aren’t intuitive but are highly rational, and these can now produce design outputs for us. But as our ‘best’ rational thinking is increasingly mirrored by machines, it triggers a kind of existential crisis: what is it that we do, and what are we actually uniquely good at?

That’s partly why I worry about training people only in hard skills. More and more of that is precisely what can be replaced. Intuition, by contrast, seems less likely to be substituted. That’s part of why I was curious about how you define intuition. To me, it feels like the part that ‘summarises the summaries.’ It’s not just a conclusion, but a condensation you can’t fully explain. When we were discussing intuition, you wrote about a felt sense of potentials that can’t easily be put into words. I think intuition can be combined with rational thinking. If we frame it as two processes, fine, though I suspect there are more. What I’m interested in is the human element in all this: what it means now to be a human maker, and what role intuition plays. Schooling tends to lean toward the rational: we teach people 3D, for example, and that’s useful, and it may take time to be fully replaced. But you can imagine a world where it is replaced, and then what remains? That’s where intuition comes back.

image-2

Image 2: Overview of Empathy vs Greed displaying all unique images of the concepts ‘empathy’ (800,000) and ‘greed’ (1,200,000) for Fotomuseum Breda (2012)

LG

So we are converging on a view of intuition as a capacity to sense, in one and the same movement, how a situation is likely to unfold and what else it could still become, before being able to account for this anticipation in discursive terms. It is pattern-based without being reducible to simple pattern recognition.


Design education still tends to romanticise the ‘good eye’, as if we were talking about an innate talent or vague artistic flair rather than a trainable skill. The ‘good eye’, ‘gut feeling’ or ‘sixth sense’ vocabulary routinely attached to intuition de-worlds it and over-inflates its plasticity, as if it offered a quasi-prophetic access to any situation from nowhere. Our working hypothesis here in SUrF inverts that. We approach intuition as a soft capacity, that is, collective rather than idiosyncratic; a trainable skill for orienting within situations.


Given your work with AI-generated images and large image archives, do you think this ‘good eye’ abstract discourse around intuition hides how much our perception is actually trained by tools, datasets, and image formats? I am thinking in particular of your Auto-Portraits project, which seems to operate precisely at that intersection between machine vision and learned perception. Could you tell us more about how that work emerged?

PS

Yes, this project is about something that started a long time ago. There was a guy who managed to reconstruct images from patterns detected in the visual cortex. Basically, when people saw a tree, there was a specific pattern that served as evidence of ‘tree’, and similarly for a house, a car, a cat, and so on. The PhD student who did the initial study spent months in an fMRI scanner calibrating a model to map what his brain was doing while watching specific stimuli. Then they switched the logic and said: OK, what if I watch something, and we give a different experimenter only the brain-activity patterns and ask them to reconstruct what I watched? So, they reconstructed images from mathematical modelling of visual-cortex activity and played them back to another experimenter to identify the original images. Interestingly, they used YouTube videos. The participant watched a huge library of short, single-shot clips, like, for example, a plane landing. They used correlation-based methods (because this was still relatively ‘basic’ statistics) to link visual input and brain activity, then overlaid the YouTube clips that best matched the pattern in the brain. The reconstructions were really fuzzy (layers and layers of moving imagery) but sharp enough to identify the right clip. You could catch, for example, the rough shape of the earlier-mentioned plane passing through the screen. They validated this across multiple experimenters. It was the first time they could plausibly say they had retrieved images from the brain.
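The correlation-based identification described here can be sketched in a few lines. This is a toy illustration only, not the Gallant Lab’s actual pipeline: the ‘predicted’ responses stand in for the output of an encoding model fitted over months of scanning, and all data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels, n_clips, top_k = 200, 500, 5

# Hypothetical encoding-model predictions: for each candidate clip,
# the voxel activity pattern the model expects that clip to evoke.
predicted = rng.standard_normal((n_clips, n_voxels))

# Simulate an observed fMRI pattern: a noisy version of one clip's prediction.
true_clip = 42
observed = predicted[true_clip] + 0.5 * rng.standard_normal(n_voxels)

def identify(observed, predicted, top_k):
    """Rank candidate clips by Pearson correlation with the observed pattern."""
    obs = (observed - observed.mean()) / observed.std()
    pred = (predicted - predicted.mean(axis=1, keepdims=True)) / \
           predicted.std(axis=1, keepdims=True)
    corr = pred @ obs / len(obs)      # correlation with every candidate clip
    ranked = np.argsort(corr)[::-1]   # best match first
    return ranked[:top_k], corr

best, corr = identify(observed, predicted, top_k)
print(best[0])  # with modest noise, the true clip ranks first
```

Overlaying the frames of the top-ranked clips is what would produce the fuzzy ‘layers and layers of moving imagery’ that is nevertheless sharp enough to identify the right clip.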

That was the original study1 by Kendrick Kay and the Gallant Lab at Berkeley. I found it mind-boggling because it made you feel the potential that was coming. If this was possible with relatively basic math, what happens once machine learning and AI enter the picture and pattern recognition becomes automated? If the patterns emerging in your brain can be modelled robustly, then in principle you could reconstruct what you perceive, whatever ‘image’ ends up meaning in that context. Even just getting to ‘plane landing’ is already wild.

Scientists are pretty matter-of-fact about it; house colours, edges, full-screen shapes, patterns. But what I found more interesting was: OK, if we go down that route, what about more personal constructs? So I wrote to the lead investigator about my interest and basically just told him how fascinated I was by their work. He replied (which honestly surprised me), we met, and he invited me to propose a project. I thought, what is a really personal construct that’s still tightly linked to seeing? My idea was: what if you have people in the fMRI scanner look at themselves? Self-viewing is a very specific thing, you’re not just seeing a face, you’re seeing yourself with all the information points you carry about yourself. This was still before the selfie became what it is now, but it already had that charge.

I proposed an experimental condition where people would see themselves and we would reconstruct images of how they see themselves, ‘without their participation’, using math as the author, and brain activity as the trace. I don’t know if you want to call that dualistic (matter / mind), but for me it felt intuitive. I just felt the potential of the project. The implications were so far-reaching that I never really dove into the philosophical unpacking, partly because there’s a lot there, and partly because I’m more interested in making and never feel theoretically grounded enough to do so. I didn’t fully connect it to theories of self-image or representation, or to what it means to reconstruct the ‘self’ through high-end technological, mathematical processes and infrastructures.

So, I pitched it to him, and he was really excited; surprisingly, given how specific it is scientifically: viewing the self, and everything encapsulated in that. He supported me in reaching out to labs. And then I tried for about ten years to get it included in experimental setups, which is depressing and complicated: at that time it was something like $70,000 an hour to run an fMRI scanner, and whenever PhDs or postdocs ran experiments, they’d stop as soon as anything went wrong. My condition was always the ‘nice-to-have’ add-on, so it rarely made it through. I got some images, but I couldn’t afford to make the project truly happen. Then, a few years later, AI arrived. It did what everyone expected: it crunched the math and simplified the reconstruction pipeline. We could use existing datasets about the visual cortex. The AI was trained on several individual brains’ data to generate self-images, without the people even knowing at that point. That becomes scarier, because it’s not even about people actively seeing themselves; it’s about inferring how they see themselves, and how participants in these experiments see.

The first version of these images comes from participant 4 of the Natural Scenes dataset. This dataset was established by Kendrick Kay (the original PhD student from the Gallant Lab at Berkeley, now a professor in Minnesota). He created a seminal dataset with eight participants who spent a long time in the scanner watching dogs, houses, and thousands of other images. They trained models on how each person’s visual cortex works, by now even including more prefrontal activity, not just the visual cortex, so more of the ‘higher’ processes as well. From there we reconstructed images, not only of how they saw themselves, but also images of their mother or best friend, these very personal targets. The idea is then to show the generated images back to the eight subjects and ask them about familiarity.

By now there was a substantial ‘Auto-portrait group’, consisting of very notable computational neuroscientists from New York, Toulouse, Minnesota and Rome. Our proposal was: what if we do the same thing but have people watch themselves in the fMRI scanner? Record the activity and reconstruct the self as-seen. Because I couldn’t fund that directly, we used Natural Scenes / COCO-style datasets instead: eight people in scanners for a long time, seeing thousands of images that computers also use to learn vision. From this, an AI learns patterns of an individual way of seeing, which loops back to your question about training the eye in a weird way, because we don’t even know how universal versus individual perception really is.

So now you have eight subjects (1 through 8) and their data, and you can reverse-engineer and “prompt” these brains in a sense. The reconstructions go through phases of neurocomputation, and those phases can be trained and coupled with something like stable diffusion to turn them into more specific images. The machine decodes the machine-view of the brain into a human-view of the data. There’s a strange feedback loop in that.

In this scenario, I’d still love to get rid of the AI elements one day, either by funding or by building a setup where people actually look at themselves and you reconstruct directly. It’s cleaner, and the storytelling is less complicated. But this version is interesting in a different way. And because there’s no scarcity (hundreds of reconstructions, endless parameter variations), you end up with massive personal data. Still, the outputs can be accurate in certain basics: the genders and age groups roughly match the participants. And subject 4 is this one (Image 3).

image-3

Image 3: ‘Me’ (Subject 4) from the Auto-Portrait series shown at FOAM Amsterdam (2024). With the Auto-portrait Group (Ongoing)

LG

In the images you showed me, subject 4 seems to carry this mask over the face. What moved you there, and why did you choose to focus on this specific subject?

PS

For me, it was from the art perspective that I took out subject 4. I don’t know much about her, right? She’s unknown to me. From the very limited experimental metadata I know she’s female, of a certain age, with a specific ethnic background, and there’s this kind of ethical-guidelines questionnaire around it. But what she had in all her self-imagery productions was this very strange thing: you would always see a woman, so the math thing kind of worked, right, but the face was almost always hidden behind a very strange mask. It was like a doughy mess in front of her face. And she was usually the only one like that; others had more ‘normal’ faces. Hers was consistently obscured, and I thought that was interesting to use as a stepping stone for these conversations.

But to be honest, the project is tricky to explain to people. I showed it at FOAM in Amsterdam, and it was way too complicated already (this explanation is complicated and lengthy), so it was hard to make it land in that very compact context.

I’m still looking for ways to continue the work, because it’s still expensive. But I do think it speaks of us entering a new era of self-portraiture. Self-portraits often happen when there’s a shift in how we approach the world. I’m not saying this is comparable, but you have scientific self-portraits in the Enlightenment, signifying leaving a spiritual space and moving into a scientific, empirical world. And I think this one is similar, in the sense that we’re ‘flying inwards’ and using technology to get at the core of what we are. I feel that’s going to be going on for the next decades.

LG

If I follow this idea correctly, these new forms of self-portraiture appear as a shift in how we understand and question ourselves, and these neuro-computational reconstructions seem to signal such a moment. The ‘portrait’ is no longer produced by the hand or the eye alone, but through a whole technical apparatus that mediates perception, data, and interpretation.

In that context, where would you locate intuition? Does it primarily operate in cognition, that is, in how we perceive ourselves and our environment; in making, that is, in how we experiment with and act within these environments; or in the media systems themselves that shape what becomes visible and thinkable in the first place?

PS

I like the question because I have no idea. I think it’s the interplay between the three, and the shifting of orders. Sometimes you enter through something stupid and playful, and you end up somewhere else. With SUN (Image 4), for example, it’s a little ball. You pick it up. You engage in a way that’s very different from where you end up. You’re kind of tricked into it. Sometimes it’s much more analytical. I don’t know what comes first, but the interplay and the composition are where it becomes interesting. That’s a bit of a cop-out answer.

image-4

Image 4: Sun(2017)_Screenshot2 from https://vimeo.com/206307408

LG

OK, let’s go to SUN, because I think it’s a case of this ‘entry point shift’ we are talking about. In SUN you collapse a cosmic phenomenon into a bouncy ball and literally put the sun in people’s hands. You’ve described ‘the moment’ when people realise that by lifting the ball they are lifting the sun. From your point of view as the maker, what exactly happens in that moment? If we slow it down, what changes first: their body posture, their sense of scale, their belief in the image, or something else?

PS

Most of the time it sits in an ‘artsy’ context, so people don’t dare to touch. They’ve been taught to stay away from things. There’s already a kind of selection happening: kids run straight to it, but grown-ups are much more timid. That’s also why I’m glad we went with the ball version. It’s so stupid and funny that I hoped even adults would dare to pick it up. But adults usually need someone else first; they need to see somebody doing it to feel allowed.


And when they finally do it, the first thing that breaks is their relation to the image. They go, ‘What the hell is this?’, because they had an expectation, and that expectation about what should happen gets corrupted. That disruption almost universally produces a smile: it’s a funny, tingly feeling, tiny, but enough. In a stupid way, it’s like a mild version of the overview effect. It shifts perspective and loosens the sense of severity, because even what feels like the most rigid pattern can be broken in your brain. From the outside it looks like a relief for the visitor.

image-5

Image 5: Sun(2017)_Screenshot1 from https://vimeo.com/206307408

LG

SUN stages a kind of gentle megalomania; you pick up a toy and suddenly you control a star. In the Fast Company piece on SUN, you talk about people testing that power, playing with sunrise, noon and sunset. What does that tell you about how people intuitively relate to technological control?

PS

Originally the work was also about testing the promise of technology: this idea that it lets us reach for the stars, gives us more control and power. Humans want that because historically the baseline has been harsh: survival, vulnerability, loss. Technology promises control, and it delivers, but it delivers so hard it might end up enslaving us.

What interested me was precisely that space of tension. The SUN is playful, moving this little sun around, engaging with something that feels light and intuitive. But at the same time the same technological power that allows us to manipulate systems so easily is also the force that might eventually become overwhelming or even harmful. That tension is what I liked (Image 5).

LG

That tension between play and control is very interesting. Particularly because the work seems to invite people to enter a system through a simple move or gesture and only gradually discover the larger forces at play.

In my research I’ve been using ‘recoding’ to describe intuition not just as reacting to a situation, but shifting the frame through which the situation is understood, or more precisely, changing the medium of the problem. Do you recognise something like that in your practice? Do you see intuition as something that can be trained through these kinds of experimental reframings?

PS

I was wondering what you meant by the difference between recoding and responding. Because to me recoding already feels like a response mechanism, or one specific way of responding.

LG

Yes, for me recoding is responding by shifting the medium of the problem. Passing the same question through another code: through language, through a model, through a prototype, through a different material constraint. Not solving immediately, but multiplying the ways the problem can be held open and productive as such.

PS

In a lot of the things we build, the point is not to guide someone toward a predefined outcome but to create a small system where a shift can happen. Sometimes the entry point is almost trivial, like a toy-like object, in the SUN or a simple interaction, and only later do you realise that you’re operating inside a larger mechanism.

LG

So in that sense intuition might appear exactly in those moments where people start to change how they approach the system. They try something, it produces an unexpected behaviour, and suddenly the frame shifts. It’s less about finding the correct response and more about discovering that the situation itself can be approached differently.

PS

I don’t know if that means intuition is something you can ‘train’ in a direct way, but you can certainly design environments where those shifts become more likely to happen.

LG

Indeed, a lot of your work invites people to experiment without a manual. In design pedagogy we often do the opposite: rubrics first, objectives first, criteria first. Do you think we should be braver in dropping students into manual-less situations and letting intuition lead?

PS

I really liked that question; whether we should be brave in dropping students into manual-less situations. I kind of wondered: if we do that, which I think we should, then what is left of pedagogy? Because the pedagogy is kind of the manual in a way, no?

I was lucky enough to do a stint at ECAL in Lausanne. In one instance they invited me to do an interactive media workshop, so I proposed a potential problem and then let the students go. The idea was that proposing a novel problem where there was no manual would lead to novel ways of thinking. With climate change, everybody knows the problem, so there’s a lot of premonition about how you ‘respond’. But if you posit a new problem, like: how could you grow new senses using brain plasticity? That was really liberating. You could compare how you think about existing problems and contrast these thinking modes.

The idea was about using the potential of neuroplasticity and technology to build, in essence, new senses. It came from one of my favourite projects: a belt that pokes you in the direction of north and rekindles your old, forgotten ‘sense of the north’. The students came up with really good work: sensing the composition of neurotransmitters, or sensing Earth’s position in the solar system. They asked: what if you could sense where on Earth you are relative to other solar objects? The precise projects matter less than what the thinking does: manual-less situations can be extremely informative, especially when the problem is genuinely novel.

LG

Listening to this, I’m thinking about another context where manual-less exploration is central. If we fold back to playfulness… have you ever heard of the game Outer Wilds?

PS

No, not really.

LG

It’s a game where you explore a small solar system that resets every 22 minutes when the sun goes supernova; only knowledge carries over. It makes you comfortable with dying and starting again; failure is literally built into the loop. You never level up your tools; you only level up your understanding. If SUN is about a system that obeys you, Outer Wilds is about discovering a system that ignores you: maybe total plasticity versus total control. In your design process, how do you frame failed prototypes or abandoned directions so that they feed intuition instead of shutting it down? Do you see your design work as a series of such knowledge-loops?

PS

It depends. A commercial commission is different from an art project. But honestly, I always want to develop new processes every single time on how to approach a question. I’m more interested in the system of emergence of an answer. Which makes it almost unworkable business-wise, because I never want to do the same thing twice. 

I am myself quite cerebral; I think things to pieces, almost to get that out of the way, and then on that basis I try to have a more intuitive feel for what sticks, what resonates and has potential, and then I get super controlling again and work it out. At the studio we are really obsessive about mockups. We simulate everything. We build elements that are missing so we can understand what happens. That helps us transition from concept to real-world situations, especially when you direct things and need to know what will happen and convince people.

image-6

Image 6: Screenshot from the project Edrandom.com juxtaposing a random place on earth with a random word from the English dictionary. With Ottokaan and Sascha Landshoff (2012).

LG

I see you create those simulations to have a kind of control over possible outcomes. But, in your own projects, what are the parts of the process that simply cannot be fully planned in advance? How do you keep those zones of uncertainty open while keeping control, especially when you are dealing with huge datasets? Or more specifically, could you give an example where micro-decisions accumulated over time and ended up redirecting the work in an unexpected way?

PS

We did a really fun project recently for a friend at a fashion brand. Fashion shows are live now, so people watch anywhere, on the tram, at home, even in the bathroom. The designer said: I want people to tune in at 4:30 and instead of having to wait for the show… they hear the sounds of a quite explicit situation happening, like you’re dropping in on somebody cruising, anonymised. The mismatch with expectation was the point. So we built a whole realm of micro-voyeurism around it: the protagonist goes to a party, sometimes goes live ‘by accident,’ then goes home, showers, and it culminates in an implied hook-up. People reacted like: what the fuck is this? We've been hacked. 


To make a long story short… we knew the moment that idea reached company level there was going to be a huge battle. It created a negotiation field. Negotiations went up until 4:30. We were debating with the operational team and the CEO about how intense the audio could be, which was hilarious. For me it was a beautiful example of micro-decisions accumulating into a specific materialisation of a path, and of anticipating dilution and using it as part of the process. In the end it was weird and really cool. Our main character ended up in the actual live show, and someone sitting next to him said: oh dude, you’re that guy. That loop was wonderful. We love introducing improvisation and noise into a technology-based design process. Micro-decisions. This was it.

LG

What you’re describing is also a kind of negotiation with constraints. Or how an idea survives contact with an institution and its thresholds. That is a nice pass to a next question. 

You also hold advisory roles, right? I suppose you see a lot of applications and projects. When you read proposals, can you feel the difference between more speculative ones and others that are reverse-engineered to fit a funding format? What are the small signs of each and how, in your experience, can novelty still emerge from highly constrained, format-heavy contexts?

PS

With commissions, you definitely feel when an application is retrofitted, made to be stringent, but coming from a fuzzy starting point. There’s a big difference when people know what they want to try out. They have an idea of why it’s interesting and what the speculative moment is, and they invest time in it.

You can discern a pure maker from somebody who’s a huge artist and just hired someone to write about their work. It reads differently. It feels different. And I think there’s another thing: you go through phases as a maker. When you become successful, it goes from making to business. That’s good (you can buy a house) but it comes at a cost. Like music: the first album is often the best, then it becomes too professionalised and quality drops. You start removing people and thinking in roles rather than people. The structures of businesses become very determinative of what comes out. Organisation is a medium, in a way. The most timid, boring stuff comes out of really established, well-functioning businesses; they don’t allow for speculative space. That’s the price of becoming successful.

I’m interested in being professional but not calcified. That’s tricky, because if you want to sell a big business you have to think in roles (art director, strategist, and so on), and that’s weirdly inadequate for humans, especially for this new type of creativity that needs to develop.

LG

That’s actually a nice closure: creativity as something that resists being fully captured by organisational formats. Thank you very much for your time and our very interesting discussion.

Notes
  1. Thomas Naselaris, Ryan J. Prenger, Kendrick N. Kay, Michael Oliver, and Jack L. Gallant, “Bayesian Reconstruction of Natural Images from Human Brain Activity,” Neuron 63, no. 6 (September 24, 2009): 902–915, https://doi.org/10.1016/j.neuron.2009.09.006