it's not
I don't understand why "scheming AI exfiltrates itself" and "scheming AI hacks its datacenter" are considered to be rogue deployments. Wouldn't the unmonitored deployments in those cases be catastrophes themselves, caused by a monitored, not-yet-roguely-deployed AI?
Thanks for writing this! I have a few questions, to make sure I'm understanding the architecture correctly:
So, essentially, the above pulls a transformation of the current state, fed through an action-conditioned dynamics network, close to a transformation from the future state.
Is this transformation an image augmentation, like you mention regarding SimSiam? Is the transformation the same for both states? And is the dynamics network also trained in this step?
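To make sure we are talking about the same thing, here is how I picture that step, as a minimal sketch rather than the paper's actual code. It assumes a SimSiam-style negative cosine similarity as the loss. `toy_dynamics` and `toy_projection` are hypothetical stand-ins for the learned networks, and every name here is my own invention.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def toy_dynamics(latent, action):
    # Hypothetical stand-in for a learned action-conditioned dynamics
    # network: shifts every latent coordinate by the action value.
    return [x + action for x in latent]


def toy_projection(latent):
    # Hypothetical stand-in for the "transformation" applied to both
    # branches: a fixed scaling instead of a learned projector.
    return [2.0 * x for x in latent]


def consistency_loss(current_latent, action, next_latent):
    # Online branch: predict the next latent from the current one.
    predicted = toy_projection(toy_dynamics(current_latent, action))
    # Target branch: in SimSiam this side is treated as a constant
    # (stop-gradient); plain Python has no gradients, so it is only noted.
    target = toy_projection(next_latent)
    # Minimizing the negative cosine similarity pulls the two together.
    return -cosine_similarity(predicted, target)
```

With these toy stand-ins, a prediction that matches the future state gives a loss of -1 (the minimum), and an orthogonal one gives 0. The sketch does not say whether the real transformation is an augmentation or a learned projection; that is exactly what I am asking.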
Is the representation network trained in the same operation that trains the state-value and action-value functions? If not, what is to stop the representation function from being extremely similar regardless of the actual state?
The third technique, the one where older episodes have fewer timesteps before using the state-value function to truncate the rewards, makes me wonder if there has been any research where the final policy "grades" the quality of previous actions. It seems like that could shed some light on why this sort of technique works.
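For concreteness, this is my reading of that third technique, sketched under my own assumptions: a standard n-step bootstrapped return, with a horizon that shrinks as the stored episode gets older. The linear schedule in `horizon_for_age` and all parameter names are hypothetical, not taken from the post or the paper.

```python
def n_step_target(rewards, values, discount, horizon):
    """Sum `horizon` discounted rewards, then bootstrap from the value estimate.

    rewards[k] is the reward received k steps after the current state;
    values[k] is the state-value estimate k steps after the current state.
    """
    target = 0.0
    for k in range(horizon):
        target += (discount ** k) * rewards[k]
    # Truncate the reward sum here and let the value function stand in
    # for everything that happens after `horizon` steps.
    return target + (discount ** horizon) * values[horizon]


def horizon_for_age(age, max_horizon, decay_every):
    # Hypothetical schedule: lose one step of horizon for every
    # `decay_every` units of age, but never drop below a single step.
    return max(1, max_horizon - age // decay_every)
```

Under this reading, an old episode leans mostly on the current value function and only a little on rewards collected by a stale policy, for example `n_step_target(rewards, values, 0.997, horizon_for_age(age, 5, 100))`.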
This one may backfire....
If I read "Preventable disease kills thousands daily" five days in a row, why would I buy the newspaper on the 6th day?
You don't :) I write:
The problem with such a newspaper is that they would go out of business. After all, if the headline has been the same for the last month, even if it is the most important action item in the world, people will stop learning anything from it.
However, by bringing up that extreme example, I approach the question of whether it makes sense to move on from news stories just because they are no longer novel—after all, the problem has not gone away by the time you stop printing about it. It's not clear that repeating important information would create more political action on it, but such a strategy is worth pondering (in my opinion).
As far as the headlines go, printing the first headline in your post every day would be highly misleading. You can argue that Flint's water is not clean, but that doesn't change the fact that it's massively cleaner than it was two decades ago. A newspaper that just reports "it's not clean" in the same way every year would do a massive disservice to its readers.
When I found that article, I also found several newer articles about how Flint's water is clean now. The headline I chose was from 2019. I just chose Flint because it was the only event I could think of where I remembered seeing headlines like that.
If someone shared a bad article with me so I would contribute to their refund, I think I would not like them as much afterward :P
One way to keep people from sharing bad content is to display the proportion of previous viewers who paid the author. This would be a useful way for readers to find good content, too. But the big problem I see is that, unless a reader is scrupulously honest, their payment decision is fairly arbitrary, which might lead them to refund every article (while expecting the same in return).
In my experience, I don't usually get to choose. I am ineffective and distractable when I am unmotivated, so the vast majority of worthwhile work occurs when I am motivated. Over time this has led me to "ride the tide" of motivation when it is present, and not to force it when it is not. For externally imposed work, pushing work off to deadlines has caused much less frustration than attempting to start early and work steadily. If you are not constrained by motivation, it seems like working slowly and steadily would be preferable for large projects, because it would not be as likely to burn you out. It would also be preferable for jobs, because employers and customers prefer steady, predictable progress. For small projects, and projects that benefit more from "inspiration", it is possible that short bursts would be preferable instead, because there is less risk of losing the "spark" before finishing.
I may have been unclear in my post, because I agree with a lot of your viewpoints.
If you want to pay your sportsballers a bajillion dollars that money has to come from somewhere. In general, the market has decided that ads are the optimal way of paying for things that people don't want to pay directly for.
I dedicated only about a sentence and a half to this, because I think it deserves a separate post, but I don't want to pay my sportsballers a bajillion dollars. I view the fact that people don't want to pay for entertainment as an indictment of entertainment. Without advertising, the entertainment industry would be much smaller, but it would still be able to present high-quality products. This could mean less expensive sports leagues that don't have 30 teams and don't pay every player millions of dollars; movies without expensive special effects, expensive actors, and expensive marketing budgets; and news that doesn't pay writers that readers wouldn't pay to read.
The problem I have with the claim that businesses would get by fine without ads is that it is demonstrably untrue. If you look up the biggest companies, you will already know their names, their logos, their slogans, any music or sounds they use, etc. You buy from these companies all the time. Advertising works. Today you can quantify the effect of advertising better than ever. Anything on the internet can be tracked and A/B tested.
This was poorly phrased in my post. The specific businesses in existence, specifically the biggest players in the market, stand to lose a lot without advertising. Their brands are often the source of their profitability. But I don't believe businesses as a whole would suddenly become unprofitable. The biggest shoe companies would lose some cachet and market share, but it would still be profitable to sell shoes. And while lots of shareholders have an interest in dominant brands staying dominant, I do not view it as a net loss to the economy for some companies to lose market share while others take their place.
doesn't add up to 100%—typo?