It is very plausible [...] that we value internal states on the model, and we also receive negative reinforcement for model-world inconsistencies [...], resulting in learned preference not to lose correspondence between model and world
Generally correct; we learn to value good models, because they are more useful than bad models. We want rewards, therefore we want to have good models, therefore we are interested in the world out there. (For a reductionist, there must be a mechanism explaining why and how we care about the world.)
Technically, sometimes the most correct model is not the most rewarded model. For example, it may be better to believe a lie and be socially rewarded by members of my tribe who share the belief than to have a true belief that gets me killed by them. There may be other situations, not necessarily social, where perfect knowledge is out of reach and a better approximation falls into the "valley of bad rationality".
it is unnecessary to define the values over real world (the alternatives work fine for e.g. finding imaginary cures for imaginary diseases which we make match real diseases) [...] there's precisely the bit of AI architecture that has to be avoided.
In other words, make an AI that only cares about what is inside the box, and it will not try to get out of the box.
That assumes that you will feed the AI all the necessary data, and verify that the data is correct and complete, because the AI will be just as happy with any kind of data. If you give incorrect information to the AI, the AI will not care, because it has no definition of "incorrect"; even in situations where the AI is smarter than you and could have noticed an error that you missed. In other words, you are responsible for giving the AI the correct model, and the AI will not help you with this, because it does not care about the correctness of the model.
You put it backwards.... making an AI that cares about truly real stuff as its prime drive is likely impossible, and we certainly don't know how to do that, nor do we need to. edit: i.e. You don't have to sit and work and work and work to find how to make some positronic mind not care about the real world. You get this by simply omitting some mission-impossible work. Specifying what you want, in some form, is unavoidable.
Regarding verification, you can have the AI search for the code that best predicts the input data; then, if you are falsifying the data, the winning code will include a model of your falsifications.
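The point above can be illustrated with a minimal toy sketch (not real program induction; all names and the quadratic-error scoring are illustrative assumptions): if the observed data has been tampered with, the candidate model that wins the prediction contest is the one that models the tampering too.

```python
# Toy sketch: pick the candidate "program" that best predicts the observed data.
# true_process, falsify, and the candidate set are all hypothetical examples.

def true_process(t):
    return t * 2          # the real world: value doubles each step

def falsify(x):
    return x + 1          # the overseer quietly shifts every reading by 1

# What the AI actually observes: falsified outputs of the true process.
observations = [falsify(true_process(t)) for t in range(20)]

# Candidate models the AI searches over.
candidates = {
    "true_world":      lambda t: t * 2,      # correct dynamics, no tampering
    "world+falsifier": lambda t: t * 2 + 1,  # dynamics plus the tampering step
    "noise":           lambda t: 0,          # a bad model, for contrast
}

def prediction_error(model):
    """Sum of squared errors of the model against the observations."""
    return sum((model(t) - obs) ** 2 for t, obs in enumerate(observations))

best = min(candidates, key=lambda name: prediction_error(candidates[name]))
print(best)  # → world+falsifier
```

The "world+falsifier" candidate fits the tampered stream perfectly, so it beats the model of the true world: the falsification ends up inside the AI's best model rather than correcting it.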
I used to advocate trying to do good work on LW. Now I'm not sure, let me explain why.
It's certainly true that good work stays valuable no matter where you're doing it. Unfortunately, the standards of "good work" are largely defined by where you're doing it. If you're in academia, your work is good or bad by scientific standards. If you're on LW, your work is good or bad compared to other LW posts. Internalizing that standard may harm you if you're capable of more.
When you come to a place like Project Euler and solve some problems, or come to OpenStreetMap and upload some GPS tracks, or come to academia and publish a paper, that makes you a participant and you know exactly where you stand, relative to others. But LW is not a task-focused community and is unlikely to ever become one. LW evolved from the basic activity "let's comment on something Eliezer wrote". We inherited our standard of quality from that. As a result, when someone posts their work here, that doesn't necessarily help them improve.
For example, Yvain is a great contributor to LW and has the potential to be a star writer, but it seems to me that writing on LW doesn't test his limits, compared to trying new audiences. Likewise, my own work on decision theory math would've been held to a higher standard if the primary audience were mathematicians (though I hope to remedy that). Of course there have been many examples of seemingly good work posted to LW. Homestuck fandom also has a lot of nice-looking art, but it doesn't get fandoms of its own.
In conclusion, if you want to do important work, cross-post it if you must, but don't do it for LW exclusively. A big fish in a small pond always looks kinda sad.