Linguist
Whoa. OP is one of today's lucky 10,000 (h/t XKCD). Let us introduce you to sign languages: natural languages that evolved without a single sound. There are hundreds of these around the world, in daily use by many deaf communities and studied by academic researchers, many of them from these same communities or closely allied to them. Lovely convergence of ideas: these languages indeed involve ample use of the 3D affordances of the visual-spatial modality. And they use these affordances in exactly the kinds of flexible ways you would expect from a complex linguistic communication system culturally evolved in the visual-spatial modality. For instance, they use something linguists call buoys, where one sign is held with the non-dominant hand while the dominant hand produces a further sequence of signs (hard to do in speech!). They use complex ways of modifying spatial verbs to precisely indicate location in space. And they make ample use of indexical forms (like pointing gestures, except more grammaticalized) to achieve person reference. There is loads more, but we'd soon get into very technical issues, reflecting the technical and bodily complexity these linguistic systems have achieved, which is considered on a par with the most complex grammatical systems of spoken+gestured languages. In short, great question, and it happens to have an actual answer from which we can learn deep things about the nature of language and the degree to which it depends (or does not depend) on communicative modalities. Check out this work by Prof. Carol Padden and colleagues, for instance:
Padden, Carol & Meir, Irit & Aronoff, Mark & Sandler, Wendy. 2010. The grammar of space in two new sign languages. Sign Languages: A Cambridge Survey, 570–592. New York: Cambridge University Press.
Worth noting that the visual cortex already does project mental images externally using, for instance, the limbs. Human languages around the world make constant use of this, combining speech and other conventionalised modes of expression with depictions like manual gestures. The keyword here is iconicity: cases where the form of an expression resembles its meaning (and this is why "our languages are symbolic" is only a very rough approximation of the truth; in actual fact, our languages are indexical, iconic and symbolic, and each of these offers its own constraints and affordances). There is a large literature in linguistics and cognitive science on the forms and functions of iconicity in human communication. And there is good evidence (from archaeology to comparative visual anthropology to linguistics) to think that human-intelligence-level language evolved from exactly such beginnings, featuring a combination of indexical, iconic, and symbolic signs.