All of MathiasKB's Comments + Replies

I'll crosspost the comment I left on substack:
 

In Denmark the government has a service (ROFUS), which anyone can voluntarily sign up for to exclude themselves from all gambling providers operating in Denmark. You can exclude yourself for a limited duration or permanently. The decision cannot be revoked.

Before discussing whether gambling should be legal or illegal, I would encourage Americans to see how far they can get with similar initiatives first.

This exists in the US as well. Because legalized gambling is regulated by the state governments, the self-exclusion programs are also run by them. Here is one for the state of Massachusetts: https://massgaming.com/about/voluntary-self-exclusion/

-2Shankar Sivarajan
Without looking it up, I'd bet there are plenty of people who get added to this list by mistake, and can't get themselves removed, like the people who got put on the US's no-fly list, or get declared dead.

A similar service exists in the UK - https://www.gamstop.co.uk/

I don't know if "don't even discuss other methods until you've tried this first" seems right to me, but I do think such services seem pretty great, and would guess that expanding/building on them (including e.g. requiring that any gambling advertising include an ad for them) would be a lot more tractable than pursuing harder bans.

What actually works is clearly the most important thing here, but aesthetically I do like the mechanism of "give people the ability to irreversibly self-exclude" as a response to predatory/addictive systems.

Is there any good write-up on the gut/brain connection and the effect of fecal transplants?

Watching the South Park episode where everyone tries to steal Tom Brady's poo got me wondering why this isn't actually a thing. I can imagine lots of possible explanations, ranging from "because it doesn't have much of an effect if you're healthy" to "because FDA".

On this view, adversarial examples arise from gradient descent being "too smart", not "too dumb": the program is fine; if the test suite didn't imply the behavior we wanted, that's our problem.

 

Shouldn't we expect to see RL models trained purely on self-play not to have these issues, then?

My understanding is that even models trained primarily with self-play, such as KataGo, are vulnerable to adversarial attacks. If RL models are vulnerable to the same type of adversarial attacks, isn't that evidence against this theory?

The amount of inference compute isn't baked-in at pretraining time, so there is no tradeoff.

This doesn't make sense to me.

In a subscription based model, for example, companies would want to provide users the strongest completions for the least amount of compute.

If they estimate customers in total will use 1 quadrillion tokens before the release of their next model, they have to decide how much of the compute they are going to be dedicating to training versus inference. As one changes the parameters (subscription price, anticipated users, fixed costs for a ... (read more)
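To make the kind of tradeoff I'm describing concrete, here's a rough back-of-the-envelope sketch. Every number is made up, and the Chinchilla-style heuristics are only approximations, not anyone's actual accounting:

```python
# Toy illustration of the training-vs-inference allocation decision described above.
# All numbers are hypothetical; the scaling heuristics are rough approximations.

TOTAL_BUDGET = 1e27      # hypothetical total FLOPs available before the next model release
EXPECTED_TOKENS = 1e15   # hypothetical: one quadrillion tokens served to customers

def serving_cost(train_flops):
    # Chinchilla-style heuristic: C ~ 6*N*D with D ~ 20*N, so N ~ sqrt(C / 120).
    params = (train_flops / 120) ** 0.5
    # A forward pass costs roughly 2 FLOPs per parameter per generated token.
    return 2 * params * EXPECTED_TOKENS

for train in (1e25, 3e25, 1e26):
    serve = serving_cost(train)
    total = train + serve
    verdict = "fits" if total <= TOTAL_BUDGET else "over budget"
    print(f"train {train:.0e} -> serve {serve:.1e}, total {total:.1e} ({verdict})")
```

Under these made-up numbers, serving cost grows with the size of the trained model and quickly dominates the budget, which is the coupling between training and inference I have in mind.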

5Vladimir_Nesov
There are open weights Llama 3 models, using them doesn't involve paying for pretraining. The compute used in frontier models is determined by the size of the largest cluster with the latest AI accelerators that hyperscaler money can buy, subject to the time it takes the engineers to get used to the next level of scale, not by any tradeoff with cost of inference. Currently that's about 100K H100s. This is the sense in which there is no tradeoff.

If somehow each model needed to be pretrained for a specific inference setup with specific inference costs and for it alone, then there could've been a tradeoff, but there is no such correspondence. The same model that's used in a complicated costly inference heavy technique can also be used for the cheapest inference its number of active parameters allows.

If progress slows down in a few years and it becomes technologically feasible to do pretraining runs that cost over $50bn, it will make sense to consider the shape of the resulting equilibrium and the largest scale of pretraining it endorses, but that's a very different world.

Thanks!! This is exactly what I was looking for.

With the release of OpenAI o1, I want to ask a question I've been wondering about for a few months.

Like the Chinchilla paper, which estimated the optimal ratio of data to compute, are there any similar estimates for the optimal ratio of compute to spend on inference vs. training?

In the release they show this chart:

The chart somewhat gets at what I want to know, but doesn't answer it completely. How much additional inference compute would a 1e25 o1-like model need to perform as well as a one-shotted 1e26 model?

Additionally, for some x number of queries, what is ... (read more)
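As a purely illustrative sketch of the kind of answer I'm after: if accuracy really did scale linearly in log-compute on both axes (as the chart loosely suggests), the conversion rate would just be the ratio of the two slopes. The coefficients below are invented, not read off the chart:

```python
# Hypothetical log-linear scaling on both axes; both slopes are invented for illustration only.
gain_per_oom_pretrain = 0.12   # accuracy gained per 10x of pretraining compute (invented)
gain_per_oom_inference = 0.05  # accuracy gained per 10x of test-time compute (invented)

# Extra inference compute a 1e25 model would need to match a 1e26 model (1 OOM of pretraining):
extra_ooms = gain_per_oom_pretrain / gain_per_oom_inference
print(extra_ooms)        # 2.4 orders of magnitude
print(10 ** extra_ooms)  # ~250x the inference compute, under these invented slopes
```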

3Vladimir_Nesov
The amount of inference compute isn't baked-in at pretraining time, so there is no tradeoff. You train the strongest model, then offer different ways of doing inference with it. Expensive inference probably wasn't offered before OpenAI o1 because it didn't work well enough to expect even a minimal viable number of customers who are willing to pay the inference premium. Many inference setups have significant fixed costs, you need sufficient demand for price per request to settle.

The plots show scaling across 2 orders of magnitude with no diminishing returns. Train-time compute is likely post-training, so it might still be much cheaper than pretraining, feasible to scale further if it doesn't crucially depend on the amount of human labeling. Test-time compute on one trace comes with a recommendation to cap reasoning tokens at 25K, so there might be 1-2 orders of magnitude more there with better context lengths.

They are still not offering repeated sampling filtered by consensus or a reward model. If o1 proves sufficiently popular given its price, they might offer even more expensive options.
5quetzal_rainbow
This, for example

If someone wants to set up a figgy group to play, I'd love to join

I agree the conclusion isn't great!

Not so surprisingly, many people read the last section as an endorsement of some version of "RCTism", but it's not actually a view I endorse myself.

What I really wanted to get at in this post was just how pervasive priors are, and how difficult it is to see past them.

Just played through it tonight. This was my first D&D.Sci; I found it quite difficult and learned a few things while working on it.

Initially I tried to figure out the best counters and found a few patterns (flamethrowers were especially good against certain units). I then tried to look and adjust for any chronology, but after tinkering around for a while without getting anywhere I gave up on that. Eventually I just went with a pretty brainless ML approach.
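Roughly, the "brainless ML approach" looked something like the sketch below (the file and column names are placeholders, not the actual dataset):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Fit a classifier on the historical battles, then score candidate squads by
# predicted survival probability. File/column names below are placeholders.
battles = pd.read_csv("past_battles.csv")
X = pd.get_dummies(battles.drop(columns=["survived"]))
y = battles["survived"]

model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

candidates = pd.get_dummies(pd.read_csv("candidate_squads.csv"))
candidates = candidates.reindex(columns=X.columns, fill_value=0)
print(model.predict_proba(candidates)[:, 1])  # estimated survival probability per squad
```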

I ended up sending squads for 5 and 6, which managed a 13.89% and 53.15% chance of surviving; I think it's good I'm not in charge of any soldiers in real life!

Overall I had good fun, and I'm looking forward to looking at the next one.

This wouldn't be the first time Deepmind pulled these shenanigans.

My impression of Deepmind is they like playing up the impressiveness of their achievements to give an impression of having 'solved' some issue, never saying anything technically false, while suspiciously leaving out relevant information and failing to do obvious tests of their models which would reveal a less impressive achievement.

For Alphastar they claimed 'grandmaster' level, but didn't show any easily available stats which would make it possible to verify.  As someone who was in Gra... (read more)

DeepMind's no-search chess engine is surely the furthest anyone has gotten without search.

This is quite possibly not true! The cutting-edge Lc0 networks (BT3/BT4, T3) have much stronger policy and value than the AlphaZero networks, and the Lc0 team fairly regularly make claims of "grandmaster" policy strength.

If it makes it easier, I can add the questions to Manifold if you provide a list of questions and resolution criteria.

3DirectedEvolution
OK, I'm out of stamina at the moment to rewrite the question I submitted to Metaculus, but I'll send questions your way if/when I get them written down.

Thanks for pointing that out, I've added a note in the description.

There are countries where cooperative firms are doing fine. Most of Denmark's supermarket chains are owned by the cooperative Coop. Denmark's largest dairy producer, Arla, is a cooperative too. Both operate in a free market and are out-competing privately owned competitors.

Both also resort to many of the same dirty tricks traditionally structured firms are pulling. Arla, for example, has done tremendous harm to the plant-based industry through aggressive lobbying. Structuring firms as cooperatives doesn't magically make them aligned.

6Oskar Mathiasen
Note that Coop is a consumer cooperative, not an employee cooperative. https://en.wikipedia.org/wiki/Consumers%27_co-operative
4MikkW
Worth noting explicitly, the Danish government is in general much more hands-on in the economy than in most other countries. I don't know specifically how that manifests itself here, but I expect that's an important part of it

Cicero, as it is redirecting its entire fleet: 'What did you call me?'

Yeah, my original claim is wrong. It's clear that KataGo is just playing sub-optimally outside of distribution, rather than being punished for playing optimally under a different ruleset than the one it's being evaluated under.

Actually this modification shouldn't matter. After looking into the definition of pass-alive, the dead stones in the adversarial attacks are clearly not pass-alive.

Under both unmodified and pass-alive-modified Tromp-Taylor rules, KataGo would lose here, and it's surprising that self-play left such a weakness.

The authors are definitely onto something, and my original claim that the attack only works due to KataGo being trained under a different rule-set is incorrect.

3gjm
It doesn't matter whether the dead stones are pass-alive. It matters whether the white stones surrounding the territory they're in are pass-alive. Having said that, in e.g. the first example position shown on the attackers' webpage those white stones are not pass-alive, so the situation isn't quite "this is a position in which KG would have won under its training conditions". But it is a position that superficially looks like such a position, which I think is relevant since what's going on with this attack is that they've found positions where KataGo's "snap judgement", when it gets little or no searching, gets it wrong.

No, the KataGo paper explicitly states at the start of page 4:

"Self play games used Tromp-Taylor rules [21] modified to not require capturing stones within pass-aliveterritory"

Had KataGo been trained on unmodified Tromp-Taylor rules, the attack would not have worked. The attack only works because the authors are having KataGo play under a different ruleset than it was trained on.

If I have the details right, I am honestly very confused about what the authors are trying to prove with this paper. Given their Twitter announcement claimed that the rulesets were... (read more)

Actually this modification shouldn't matter. After looking into the definition of pass-alive, the dead stones in the adversarial attacks are clearly not pass-alive.

Under both unmodified and pass-alive-modified Tromp-Taylor rules, KataGo would lose here, and it's surprising that self-play left such a weakness.

The authors are definitely onto something, and my original claim that the attack only works due to KataGo being trained under a different rule-set is incorrect.

As someone who plays a lot of Go, this result looks very suspicious to me. To me it looks like the primary reason this attack works is an artifact of the automatic scoring system used in the attack. I don't think this attack would be replicable in other games, or even against KataGo trained on a correct implementation.

In the example included on the website, KataGo (White) is passing because it correctly identifies the adversary's (Black) stones as dead meaning the entire outside would be its territory. Playing any move in KataGo's position would gain no poi... (read more)

6evhub
Note that when given additional search, KataGo realizes that it will lose here and doesn't fall for the attack, which seems to suggest that it's not just a rules discrepancy.
1Anonymous
The KataGo paper says of its training, "Self-play games used Tromp-Taylor rules modified to not require capturing stones within pass-alive territory". It sounds to me like this is the same scoring system as used in the adversarial attack paper, but I don't know enough about Go to be sure.

Evaluating the RCT is a chance to train the evaluation-muscle in a well-defined domain with feedback. I've generally found that the people who are best at evaluations in RCT'able domains, are better at evaluating the hard-to-evaluate claims as well.

Often the difficult to evaluate domains have ways of getting feedback, but if you're not in the habit of looking for it, you're less likely to find the creative ways to get data.

I think a much more common failure mode within this community is that we get way overconfident beliefs about hard-to-evaluate domains, because there aren't many feedback loops and we aren't in the habit of looking for them.

3tailcalled
Sounds confounded by general cognitive ability.
2Rob Bensinger
Yep, and I don't advise people to ignore all RCTs. I thought about discussing your point when I wrote the OP (along with other advantages of having a community that contains some trivia-collecting), but decided against it, because I suspect EAs and rats tend to misunderstand the nature of this advantage. I suspect most "we need to spend more time on fast-empirical-feedback-loop stuff even if it looks very low-VOI" is rationalizing the mistake described in the OP, rather than actually being about developing this skill.

In particular, if you're just trying to build skill (rather than replacing a hard question with a superficially related easy one), then I think it's often actively bad to build this skill in a domain that's related to the one you care about. EAs and rats IMO should spend more time collecting trivia about physics, botany, and the history of Poland (as opposed to EA topics), insofar as the goal is empiricism skill-building. You're less liable to trick yourself, then, into thinking that the new data points directly bear on the question you're not currently working on.

Maybe? I think the rationality community is pretty good at reasoning, and I'm not sure I could predict the direction of their error here. With EAs, I have an easier time regularly spotting clear errors, and they seem to cluster in a similar direction (the one described in https://equilibriabook.com/toc).

I agree that rationalists spend more time thinking about hard-to-evaluate domains, and that this makes some failures likelier (while making others less likely). But I also see rats doing lots of deep-dive reviews of random-seeming literatures (and disproportionately reading blogs like ACX that love doing those deep dives), exploring lots of weird and random empirical domains out of curiosity, etc. It's not clear to me what the optimal level of this is (for purposes of skill-building), or where the status quo falls relative to the optimum. (What percent of LW's AI-alignment-related posts

Does anyone know of any zero-trust investigations on nuclear risk done in the EA/Rationalist community? Open phil has funded nuclear work, so they probably have an analysis somewhere that concluded it is a serious risk to civilization, but I haven't ever looked into these analyses.

2cubefox
Like this?
7PeterMcCluskey
ALLFED has been doing research recently into nuclear winter. This seems to be their most relevant publication so far. I haven't read it yet.

For each tweet the post found arguing their point, I can find two arguing the opposite. Yes, in theory tweets are data points, but in practice the author just uses them to confirm his already held beliefs.

I don't think the real world is good enough either.

The fact that humans feel a strong sense of the Tetris effect suggests to me that the brain is constantly generating and training on synthetic data.

5Yitz
Aka dreams?

Another issue with greenwashing and safetywashing is that it gives people who earnestly care a false impression that they are meaningfully contributing.

Despite thousands of green initiatives, we're likely to blow way past the 1.5°C mark because the vast majority of those initiatives failed to address the core causes of climate change. Each plastic-straw ban and reusable diaper gives people an incorrect impression that they are doing something meaningful to improve the climate.

Similarly I worry that many people will convince themselves that they are doing som... (read more)

we should be very sceptical of interventions whose first-order effects aren't promising.

This seems reasonable, but I think this suspicion is currently applied too liberally. In general, it seems like second-order effects are often very large. For instance, some AI safety research is currently funded by a billionaire whose path to impact on AI safety was to start a cryptocurrency exchange. I've written about the general distaste for diffuse effects and how that might be damaging here; if you disagree I'd love to hear your response.

In general, I don't think ... (read more)

I’m worried about this too, especially since I think it’s surprisingly easy here (relative to most fields/goals) to accidentally make the situation even worse. For example, my sense is people often mistakenly conclude that working on capabilities will help with safety somehow, just because an org's leadership pays lip service to safety concerns—even if the org only spends a small fraction of its attention/resources on safety work, actively tries to advance SOTA, etc.

The primary question on my mind is something like this:

How much retraining is needed for Gato to learn a new task? Given a task such as "Stack objects and compose a relevant poem", which combines skills it has already learned yet is a fundamentally different task, does it quickly learn how to perform well at it?

If not, then it seems Deepmind 'merely' managed to get a single agent to do a bunch of tasks we were previously only able to do with multiple agents. If it is also quicker at learning new tasks in similar domains than an agent trained solely to do them, then it seems like a big step towards general intelligence.

Hi Niplav, happy to hear you think that.

I just uploaded the pkl files that include the pandas dataframes for the Metaculus questions and GPT's completions for the best-performing prompt to GitHub. Let me know if you need anything else :)

https://github.com/MperorM/gpt3-metaculus
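For anyone who grabs the repo, loading the dataframes should just be something like this (the exact .pkl file names may differ from what's in the repo, so check there first):

```python
import pandas as pd

# File names are illustrative; check the repo for the actual .pkl names.
questions = pd.read_pickle("metaculus_questions.pkl")
completions = pd.read_pickle("gpt_completions.pkl")
print(questions.head())
print(completions.head())
```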

1niplav
Thanks a lot!

I think 'wife' rolls off the tongue uniquely well here due to 'wife' rhyming with 'life', creating the pun. Outside of that I don't buy it. In Denmark, wife-jokes are common despite wife being a two-syllable word (kone), and husband-jokes are rare despite husband being a one-syllable word (mand).

My model of why we see this has much more to do with gender norms and normalised misogyny than with catchiness of the words.

Good point, though I would prefer we name it Quality Adjusted Spouse Years :)

but it's such a good pun!

2aarongertler
On the Devil's Advocate side: "Wife" just rolls off the tongue in a way "husband" doesn't. That's why we have "wife guys" and "my wife!" jokes, but no memes that do much with the word "husband". (Sometimes we substitute the one-syllable word "man", as in "it's raining men" or "get you a man who can do both".) You could also parse "wife years" as "years of being a wife" from the female perspective, though of course this still fails to incorporate couples where no wife-identifying person is involved.  ...so it doesn't work well in a technical sense, but it remains very catchy.

+1, you could make it Quality Adjusted Wedded Years if you want to keep the acronym.

(That's what I thought it stood for when you (= Richard) first told me about it)

Fantastic to see this wonderful game be passed onto a new generation!

6ChristianKl
I personally do understand how political careers work in Berlin where I'm living, but I don't think you can easily transfer that to the US. In a political system where the local political party controls the list of candidates, it's indeed central to interact with the local party. In the US you frequently have situations where there are primaries that determine the candidates of a given party which produces different incentives. That dramatically reduces the political power of the actual political parties.  One example is that it was advantageous for Obama to end Dean's 50 state strategy that provided local funding all over the US because Obama didn't have direct control over the party. Obama campaign created their own campaign structures independent from the democratic party that could then be used more directly to mobilize for the interests of the Obama administration. 
1Asgård
Hey, thanks for the insight. Running in line with a political party is a great point for anyone in America. The successes of third-party candidates are rare enough that the rational first step is probably always joining one of the two parties.

My analysis was from no exercise to regular high intensity exercise. There's probably an 80/20 in between, but I did not look into it.

2Adam Zerner
Gotcha, thanks.

For what it's worth, I hastily made a spreadsheet and found that regular heavy exercise was by far the largest improvement I could make to my life expectancy. Everything else paled in comparison. That said, I only evaluated interventions that were relevant to me. If you smoke, I imagine quitting would score high as well.
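The spreadsheet was basically the following, just with more rows; the effect sizes here are placeholders rather than the numbers I actually used:

```python
# Placeholder effect sizes (years of life expectancy gained), not my actual estimates.
interventions = {
    "regular heavy exercise": 4.0,
    "better diet": 1.5,
    "improved sleep": 1.0,
    "air purifier": 0.3,
}

for name, years in sorted(interventions.items(), key=lambda kv: -kv[1]):
    print(f"{name:25s} +{years:.1f} years")
```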

2Adam Zerner
Good to know, thanks! My understanding is that with exercise, going from nothing to something has a huge benefit, but after that the returns diminish pretty rapidly. I'm being very qualitative here, but maybe eg. going from something to solid exercise is decent, and then solid to intense is small. Does that match what you found?

For me, this perfectly hits the nail on the head.

This is a somewhat weird question, but like, how do I do that?

I've noticed multiple communities fall into the meta-trap, and even when members notice it can be difficult to escape. While the solution is simply to "stop being meta", that is much easier said than done.

When I noticed this happening in a community I am central in organizing I pushed back by bringing my own focus to output instead of process hoping others would follow suit. This has worked somewhat and we're definitely on a better track. I wonder what dynamics lead to this 'death by meta' syndrome, and if there is a cure.

1skybrian
When you're actually a little curious, you might start by using a search engine to find a decent answer to your question.  At least, if it's the sort of question for which that would work. Maybe even look for a book to read? But, maybe we should acknowledge that much of the time we aren't actually curious and are just engaging in conversation for enjoyment? In that case, cheering on others who make an effort to research things and linking to their work is probably the best you can do. Even if you're not actually curious, you can notice people who are, and you can look for content that's actually about concrete things. For example, my curiosity about the history of politics in Turkey is limited, so while I did read Scott Alexander's recent book review and some responses with interest, I'm not planning on reading an actual book on it. I don't think he's all that curious either, since he just read one book, but that's going further than me.

Really cool concept of drumming with your feet while playing another instrument.

I think it would be really cool to experiment with different trigger sounds. The muscles in your foot severely limit the nuances available to play, and trying to imitate the sounds of a regular drum-set will not go over well.

I think it is possible to achieve much cooler playing, if you skip the idea of your pedals needing to imitate a drum-set entirely. Experiment with some 808 bass, electric kicks, etc.

Combining that with your great piano playing would create an entirely new feel of music, whereas otherwise it can easily end up sounding like a good pianist struggling to cooperate with a much worse drummer.

2jefftk
If you look at the video where I'm playing piano I'm using electronic drum sounds, though I still want to play around and figure out ones I like better. Here's what this is eventually going to fit with: https://www.jefftk.com/p/rhythm-stage-setup-v3

I spent 5 minutes searching amazon.de for replacements to the various products recommended and my search came up empty.

Is there someone who has put together the needed list of bright lighting products on amazon.de? I tried doing it myself and ended up hopelessly confused. What I'm asking for is e.g. two desk lamps and corresponding light bulbs that live up to the criteria.

I'll pay $50 to the charity of your choice, if I make a purchase based off your list.

And there doesn’t need to be an “overall goodness” of the job that would be anything else than just the combination of those two facts.

There needs to be an "overall goodness" that is exactly equal to the combination of those two facts. I really like the fundamental insight of the post. It's important to recognize that your mind wants to push your perception of the "overall goodness" to the extremes, and that you shouldn't let it do that.

If you now had to make a decision on whether to take the job, how would you use this electrifying zap to help you make the decision?

2Kaj_Sotala
My current feeling is that I'd probably take it. (The job example was fictional, as the actual cases where I've used this have been more personal in nature, but if I translate your question to those contexts then "I'd take it" is what I would say if I translated the answer back.)

I would strongly prefer a Lesswrong that is completely devoid of this.

Half the time it ends up in spiritual vagueness, of which there's already too much on Lesswrong. The other half ends up being toxic male-centric dating advice.

For those who, like me, have the attention span and intelligence of a door hinge the ELI5 edition is:

Outer alignment is trying to find a reward function that is aligned with our values (making it produce good stuff rather than paperclips)

Inner alignment is the act of ensuring our AI actually optimizes the reward function we specify.

An example of poor inner alignment would be us humans in the eyes of evolution. Instead of doing what evolution intended, we use contraceptives so we can have sex without procreation. If evolution had gotten its inner alignment right, we would care as much about spreading our genes as evolution does!
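A toy way to see the evolution example in code (everything here is a deliberately silly stand-in, not a real training setup): the learned proxy objective agrees with the outer objective in the "ancestral" environment, then comes apart once the environment changes.

```python
# Deliberately silly stand-in for outer vs. inner objectives; not a real training setup.

def outer_objective(situation):
    # What "evolution" actually cares about.
    return situation["offspring"]

def learned_proxy(situation):
    # What the agent ended up optimizing, because it correlated with offspring during training.
    return situation["sex"]

ancestral = [{"sex": s, "offspring": s} for s in range(5)]  # proxy and outer goal coincide
modern = [{"sex": s, "offspring": 0} for s in range(5)]     # contraception breaks the link

print([learned_proxy(d) == outer_objective(d) for d in ancestral])  # all True: looks aligned in training
print([learned_proxy(d) == outer_objective(d) for d in modern])     # mostly False: inner misalignment
```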

GPT-3's goal is to accurately predict a text sequence. Whether GPT-3 is capable of reason, and whether we can get it to explicitly reason, are two different questions.

If I had you read Randall Munroe's book "what if" but tore out one page and asked you to predict what will be written as the answer, there's a few good strategies that come to mind.

One strategy would be to pick random verbs and nouns from previous questions and hope some of them will be relevant for this question as well. This strategy will certainly do better than if yo... (read more)

I don't get the divestment argument, please help me understand why I'm wrong.

Here's how I understand it:

If Bob offers to pay Alice whatever Evil-Corp™ would have paid in stock dividends in exchange for what Alice would have paid for an Evil-Corp™ stock, Evil-Corp™ has to find another buyer. Since Alice was the buyer willing to pay the most, Evil-Corp™ now loses the difference between what Alice was willing to pay and the next-most willing buyer, Eve, is willing to pay.

Is that understanding correct, or am I missing... (read more)

So I think the divestment argument that Buck is making is the following:

Assume there are 25 investors, from Alice to Ysabel. Each investor is risk-averse, and so is willing to give up a bit of expected value in exchange for reduced variance, and the more anticorrelated their holdings, the less variance they'll have. This means Alice is willing to pay more for her first share of EvilCorp stock than she is for her second share, and so on; suppose EvilCorp has 100 shares, and the equilibrium is that each investor has 4 shares.

Suppose now Alice decides th... (read more)
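A minimal toy version of that argument, with made-up numbers: each investor's willingness to pay falls a little with each additional share (the diversification/risk-aversion point above), and the price is set by the marginal bid.

```python
# Toy model of the divestment argument; all numbers are made up.

def clearing_price(n_investors, n_shares, wtp=lambda k: 10 - 0.5 * k):
    # Every investor bids wtp(0) for their 1st share, wtp(1) for their 2nd, and so on.
    bids = sorted((wtp(k) for _ in range(n_investors) for k in range(10)), reverse=True)
    return bids[n_shares - 1]  # the marginal (n_shares-th highest) bid sets the price

print(clearing_price(25, 100))  # all 25 investors participate -> 8.5
print(clearing_price(24, 100))  # Alice divests                -> 8.0
```

So removing the keenest remaining buyer pushes the price down to the next-keenest bid, which is the sense in which divestment imposes a (small) cost on EvilCorp.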

As Benjamin Graham put it:

in the short run, the market is a voting machine; in the long run, the market is a weighing machine.

I think that's a very fair way to put it, yes. One way this becomes very apparent is that you can have a conversation with a StarCraft player while he's playing. It will be clear the player is not paying you his full attention at particularly demanding moments, however.

Novel strategies are thought up in between games and refined through dozens of practice games. In the end you have a mental decision tree of how to respond to most situations that could arise. Without having played much chess, I imagine this is how people do chess openers as wel... (read more)

I think the abstract question of how to cognitively manage a "large action space" and "fog of war" is central here.

In some sense StarCraft could be seen as turn based, with each turn lasting for 1 microsecond, but this framing makes the action space of a beginning-to-end game *enormous*. Maybe not so enormous that a bigger data center couldn't fix it? In some sense, brute force can eventually solve ANY problem tractable to a known "vaguely O(N*log(N))" algorithm.

BUT facing "a limit that forces meta-cognition"... (read more)

Before doing the whole EA thing, I played StarCraft semi-professionally. I was consistently ranked grandmaster, primarily making money from coaching players of all skill levels. I also co-authored an ML paper on StarCraft II win prediction.

TL;DR: AlphaStar shows us what it will look like when humans are beaten in a completely fair fight.

I feel fundamentally confused about a lot of the discussion surrounding AlphaStar. The entire APM debate feels completely misguided to me and seems to be born out of fundamental misunderstandings of what it means to be good at ... (read more)

6spkoc
I think you're right when it comes to SC2, but that doesn't really matter for DeepMind's ultimate goal with AlphaStar: to show an AI that can learn anything a human can learn. In a sense AlphaStar just proves that SC2 is not balanced for superhuman ( https://news.ycombinator.com/item?id=19038607 ) micro. Big stalker army shouldn't beat big Immortal army. In current SC2 it obviously can with good enough micro. There are probably all sorts of other situations where soft-scissor beats soft-rock with good enough micro. Does this make AlphaStar's SC2 performance illegitimate? Not really? Tho in the specific Stalker-Immortal fight, input through an actual robot looking at an actual screen and having to cycle through control groups to check HP and select units PROBABLY would not have been able to achieve that level of micro. The deeper problem is that this isn't DeepMind's goal. It just means that SC2 is a cognitively simpler game than initially thought(note, not easy, simple as in a lot of the strategy employed by humans is unnecessary with sufficient athletic skill). The higher goal of AlphaStar is to prove that an AI can be trained from nothing to learn the rules of the game and then behave in a human-like, long term fashion. Scout the opponent, react to their strategy with your own strategy etc. Simply bulldozing the opponent with superior micro and not even worrying about their counterplay(since there is no counterplay) is not particularly smart. It's certainly still SC2, it just reveals the fact that SC2 is a much simpler game(when you have superhuman micro).
5[anonymous]
Interesting point. Would it be fair to say that, in a tournament match, a human pro player is behaving much more like a reinforcement learning agent than a general intelligence using System 2? In other words, the human player is also just executing reflexes he has gained through experience, and not coming up with ingenious novel strategies in the middle of a game. I guess it was unreasonable to complain about the lack of inductive reasoning and game-theoretic thinking in AlphaStar from the beginning since DeepMind is a RL company, and RL agents just don't do that sort of stuff. But I think it's fair to say that AlphaStar's victory was much less satisfying than AlphaZero, being not only unable to generalize across multiple RTS games, but also unable to explore the strategy space of a single game (hence the incentivizing of use of certain units during training). I think we all expected seeing perfect game sense and situation-dependent strategy choice, but instead blink stalkers is the one build to rule them all, apparently.

I think your feelings stem from you considering it to be enough if AS simply beats human players, while APM whiners would like AS to learn all the aspects of StarCraft skill it can reasonably be expected to learn.

The agents on ladder don't scout much and can't react accordingly. They don't tech switch midgame and some of them get utterly confused in ways a human wouldn't. Game 11 agent vs MaNa couldn't figure out it could build 1 phoenix to kill the warp prism and chose to follow it with 3 oracles (units which can't shoot at flying units). The ladder agents d

... (read more)

"Science confirms video games are good" is essentially the same statement as "The bible confirms video games are bad" just with the authority changed. Luckily there remains a closer link between the authroity "Science" and truth than the authority "The bible" and truth so it's still an improvement.

Most people still update their worldview based upon whatever their tribe has agreed upon as their central authority. I'm having a hard time criticising people for doing this, however. This is something we all do! ... (read more)

1Sunny from QAD
Oh yes, that's certainly true! My point is that anybody who has the floor can say that science has proven XYZ when it hasn't, and if their audience isn't scientifically literate then they won't be able to notice. That's why I lead with the Dark Ages example where priests got to interpret the bible however was convenient for them.

I really like this line of thinking. I don't think it is necessarily opposed to the typical map-territory model, however.

You could in theory explain all there is to know about the territory with a single map; however, that map would become really dense and hard to decipher. Instead, having multiple maps, one with altitude, another with temperature, is instrumentally useful for best understanding the territory.

We cannot comprehend the entire territory at once, so it's instrumentally useful to view the world through different lenses and see what new ... (read more)

2Shmi
Not in terms of other maps, but in terms of its predictive power: Something is more useful if it allows you to more accurately predict future observations. The observations themselves, of course, go through many layers of processing before we get a chance to compare them with the model in question. I warmly recommend the relevant SSC blog posts: https://slatestarcodex.com/2017/09/05/book-review-surfing-uncertainty/ https://slatestarcodex.com/2017/09/06/predictive-processing-and-perceptual-control/ https://slatestarcodex.com/2017/09/12/toward-a-predictive-theory-of-depression/ https://slatestarcodex.com/2019/03/20/translating-predictive-coding-into-perceptual-control/

Believing the notion that one can 'deconfuse' themself on any topic, is an archetypal mistake of the rationalist. Only in the spirit of all things that are essential to our personal understanding, can we expect our beliefs to conform to the normality of our existence. Asserting that one can know anything certain of the physical world is, by its definition, a foolhardy pursuit only someone with a narrow and immature understanding of physicality would consider meaningful.

Believing that any 'technique' could be used to train ones mind in t... (read more)

3Richard_Kennaway
Is that you, GPT2?