All of groblegark's Comments + Replies

Where does prompt optimization fit into y'all's workflows? I'm surprised not to see mention of it here. E.g. OPRO (https://arxiv.org/pdf/2309.03409)?

Slightly related: https://arxiv.org/abs/2503.00735
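For anyone catching up: OPRO ("Optimization by PROmpting") treats the prompt itself as the thing being optimized, with an LLM proposing the candidates. A minimal sketch of the loop, assuming hypothetical `llm(prompt) -> str` and `score(instruction, train_set) -> float` helpers rather than the paper's actual code:

```python
# Sketch of an OPRO-style loop: feed the best (instruction, accuracy) pairs
# back into a meta-prompt and ask the LLM for a better instruction.
# `llm` and `score` are assumed stand-ins, not a real API.
def opro(llm, score, train_set, steps=20, keep=10):
    seed = "Let's think step by step."
    history = [(seed, score(seed, train_set))]
    for _ in range(steps):
        top = sorted(history, key=lambda p: p[1])[-keep:]  # best `keep`, worst first
        meta = "Here are instructions and their training accuracies:\n"
        meta += "\n".join(f"{acc:.2f}: {ins}" for ins, acc in top)
        meta += "\nWrite a new instruction that scores higher than all of the above."
        candidate = llm(meta)
        history.append((candidate, score(candidate, train_set)))
    return max(history, key=lambda p: p[1])[0]
```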

Re: "Let's think step by step"
so let me get this straight... a simple prompt is able to elicit an entire style of thinking, which is able to solve harder problems, and ultimately ends up motivating new classes of foundation model? Is that what happened last year? Are there any other simple prompts like that? Did we check? Sorry, I'm trying to catch up.

8Vladimir_Nesov
The s1 paper introduces a trick of replacing the end-of-thinking token with the string "Wait", which lets you keep generating a reasoning trace for as long as you need, even when the model itself can't control this well ("budget forcing"; see Figure 3 in section 3.1).
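In code, budget forcing is roughly the following loop. This is a sketch, assuming a hypothetical `model` with `generate(prompt, stop=...)` and `count_tokens` helpers, not the paper's actual implementation:

```python
# Sketch of s1-style "budget forcing": whenever the model tries to end its
# reasoning early, suppress the end-of-thinking token, append "Wait", and
# keep generating until a minimum thinking budget is spent.
END_OF_THINKING = "</think>"  # stand-in; the real delimiter is model-specific

def force_budget(model, prompt, min_thinking_tokens=2048):
    trace = model.generate(prompt, stop=[END_OF_THINKING])
    while model.count_tokens(trace) < min_thinking_tokens:
        trace += "Wait"  # replaces the suppressed end-of-thinking token
        trace += model.generate(prompt + trace, stop=[END_OF_THINKING])
    return trace
```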
3Seth Herd
I just bumped across "atomic thinking", which asks the model to break the problem into component parts, attack each separately, and only produce an answer after that's done and they can all be brought together. This is how smart humans attack some problems, and it's notably different from chain of thought. I expect this approach could also be used to train models, by training on component problems, if other techniques don't keep progressing so fast as to make it irrelevant.
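The pattern is simple enough to sketch. Assuming a hypothetical `llm(prompt) -> str` helper, a decompose-solve-synthesize loop might look like:

```python
# Sketch of the "atomic thinking" pattern: decompose the problem, solve each
# component in isolation, then synthesize. The prompts here are illustrative.
def atomic_solve(llm, problem):
    parts = llm(
        f"Break this problem into independent component parts, one per line:\n{problem}"
    ).splitlines()
    solutions = [llm(f"Solve only this sub-problem:\n{part}") for part in parts]
    joined = "\n".join(f"- {p}: {s}" for p, s in zip(parts, solutions))
    return llm(
        f"Given these solved components:\n{joined}\n"
        f"Bring them together into a final answer to: {problem}"
    )
```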
3mattmacdermott
More or less, yes. But I don't think it suggests there might be other prompts around that unlock similar improvements -- chain-of-thought works because it allows the model to spend more serial compute on a problem, rather than because of something really important about the words.

I think an important part is building your own (company's) collection of examples to train against, since the foundation models are trained against swebench already. And if it works, the advantage would show up on my CV in the worst case and in equity appreciation in the best case. So, just like any skill, right?

You're right that the whole thing only works if the business can generate returns to high quality code, and can write specifications faster than its complement of engineers can implement them.  But I've been in that position sev... (read more)

mm... I gave the wrong impression there; my actual boss doesn't have a strong opinion on AI; in fact, he'll take some convincing.

I should state my assumptions:

  • software engineering will be completely automated in the next 3 years
  • in the beginning and maybe for a while, it will require advanced models and workflows
  • the workflows will be different enough between companies that it's worthwhile to employ some well-paid engineers at each company to maintain them
  • these engineers will have a much easier time finding a well-paying job than 'regular' software engineers
  • whi
... (read more)
1daijin
Nice. So something like grabbing a copy of the swebench dataset, writing a pipeline that would solve those issues, then putting that on your CV? I will say, though, that your value as an employee is not 'producing software' so much as solving business problems. How much conviction do you have that producing software marginally faster using AI will improve your value to your firm?
1groblegark
The reasons you give, btw, don't give me much consolation. The code-leaking thing is very temporary; if you could host cutting-edge models on AWS or Azure it wouldn't be an issue for most companies, and if you could self-host them it wouldn't be an issue for almost /any/ companies. The errors thing is a crux. The basic solution to that, I think, is scaling: multishot the problem, rank the solutions, test in every way imaginable, and then for each solved problem optimize your prompts till they can one-shot it, keeping a backlog of examples to perform workflow regression testing against. The style thing is very tractable; AIs love following style instructions. The big moment for me was realizing that while each AI's context window is limited, within that window you can ask LOTS of different questions and expect a pretty good answer. So you ask questions that compress the information in the window for the purpose of your problem (LLMs are pretty darn good at summarizing), and keep doing that until you have enough context to solve the problem.
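That multishot-rank-bank loop is cheap to sketch. Here's roughly what I mean, with hypothetical `llm`, `run_tests`, and `rank` helpers standing in for whatever your stack actually provides:

```python
# Sketch of the scaling loop: sample many candidate solutions, keep the ones
# that pass tests, rank the survivors, and bank solved problems as regression
# examples for later prompt optimization. All helpers are assumed stand-ins.
backlog = []  # (problem, solution) pairs for workflow regression testing

def multishot_solve(llm, run_tests, rank, problem, n=16):
    candidates = [llm(problem) for _ in range(n)]
    passing = [c for c in candidates if run_tests(problem, c)]
    if not passing:
        return None  # escalate, decompose, or retry with a revised prompt
    best = rank(problem, passing)[0]
    backlog.append((problem, best))
    return best
```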

I don't have time to write any of this down properly, so it's going to come out in the wrong order, but here:

  • agentic AI is the means of production for codegen
  • model access limits and closedness are therefore a threat to Workers
  • I use and maintain software.  I survive by staying 5 feet in front of the steamroller
  • I am not wealthy; I can't afford to be tripped and squished.
  • OSS is traditionally the way of protecting myself in this situation
  • I need to write tons of good code and enable my company to do the same, and I need to do it while washing the dishes (Covid h
... (read more)
1daijin
'If some 3rd party brings that bird home to my boss instead of me, I'm going to be unwealthy and unemployed.' Have you talked to your boss about this? I have; for me the answer was some combination of "Oh, but using AI would leak our code" and "AI is a net loss to productivity because it errors too much / has context-length limitations / doesn't care for our standards". And that is not solvable by a third party, so my job is safe. What about you?