All of groblegark's Comments + Replies

Where does prompt optimization fit into y'all's workflows? I'm surprised not to see mention of it here. E.g. OPRO (https://arxiv.org/pdf/2309.03409)?

Slightly related: https://arxiv.org/abs/2503.00735
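For anyone catching up: OPRO ("Optimization by PROmpting") treats the prompt itself as the thing being optimized, with an LLM proposing the candidates. A minimal sketch of the loop, assuming hypothetical `llm(prompt) -> str` and `score(instruction, train_set) -> float` helpers rather than the paper's actual code:

```python
# Sketch of an OPRO-style loop: feed the best (instruction, accuracy) pairs
# back into a meta-prompt and ask the LLM for a better instruction.
# `llm` and `score` are assumed stand-ins, not a real API.
def opro(llm, score, train_set, steps=20, keep=10):
    seed = "Let's think step by step."
    history = [(seed, score(seed, train_set))]
    for _ in range(steps):
        top = sorted(history, key=lambda p: p[1])[-keep:]  # best `keep`, worst first
        meta = "Here are instructions and their training accuracies:\n"
        meta += "\n".join(f"{acc:.2f}: {ins}" for ins, acc in top)
        meta += "\nWrite a new instruction that scores higher than all of the above."
        candidate = llm(meta)
        history.append((candidate, score(candidate, train_set)))
    return max(history, key=lambda p: p[1])[0]
```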

Re: "Let's think step by step"
so let me get this straight... a simple prompt is able to elicit an entire style of thinking, which is able to solve harder problems, and ultimately ends up motivating new classes of foundation model? Is that what happened last year? Are there any other simple prompts like that? Did we check? Sorry, I'm trying to catch up.

8Vladimir_Nesov
The s1 paper introduces a trick of replacing the end-of-thinking token with the string "Wait", which lets you keep generating a reasoning trace for as long as you need, even when the model itself can't control this well ("budget forcing"; see Figure 3 in section 3.1).
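In code, budget forcing is roughly the following loop. This is a sketch, assuming a hypothetical `model` with `generate(prompt, stop=...)` and `count_tokens` helpers, not the paper's actual implementation:

```python
# Sketch of s1-style "budget forcing": whenever the model tries to end its
# reasoning early, suppress the end-of-thinking token, append "Wait", and
# keep generating until a minimum thinking budget is spent.
END_OF_THINKING = "</think>"  # stand-in; the real delimiter is model-specific

def force_budget(model, prompt, min_thinking_tokens=2048):
    trace = model.generate(prompt, stop=[END_OF_THINKING])
    while model.count_tokens(trace) < min_thinking_tokens:
        trace += "Wait"  # replaces the suppressed end-of-thinking token
        trace += model.generate(prompt + trace, stop=[END_OF_THINKING])
    return trace
```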
3Seth Herd
I just bumped across "atomic thinking", which asks the model to break the problem into component parts, attack each separately, and only produce an answer after that's done and they can all be brought together. This is how smart humans attack some problems, and it's notably different from chain of thought. I expect this approach could also be used to train models, by training on component problems, if other techniques don't keep progressing so fast as to make it irrelevant.
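The pattern is simple enough to sketch. Assuming a hypothetical `llm(prompt) -> str` helper, a decompose-solve-synthesize loop might look like:

```python
# Sketch of the "atomic thinking" pattern: decompose the problem, solve each
# component in isolation, then synthesize. The prompts here are illustrative.
def atomic_solve(llm, problem):
    parts = llm(
        f"Break this problem into independent component parts, one per line:\n{problem}"
    ).splitlines()
    solutions = [llm(f"Solve only this sub-problem:\n{part}") for part in parts]
    joined = "\n".join(f"- {p}: {s}" for p, s in zip(parts, solutions))
    return llm(
        f"Given these solved components:\n{joined}\n"
        f"Bring them together into a final answer to: {problem}"
    )
```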
3mattmacdermott
More or less, yes. But I don't think it suggests there might be other prompts around that unlock similar improvements -- chain-of-thought works because it allows the model to spend more serial compute on a problem, rather than because of something really important about the words.

I think an important part is building your own (company's) collection of examples to train against, since the foundation models are trained against swebench already. And if it works, the advantage would show up on my CV in the worst case and in equity appreciation in the best case. So, just like any skill, right?

You're right that the whole thing only works if the business can generate returns to high quality code, and can write specifications faster than its complement of engineers can implement them.  But I've been in that position sev... (read more)

mm... I gave the wrong impression there; my actual boss doesn't have a strong opinion on AI; in fact, he'll take some convincing.

I should state my assumptions:

  • software engineering will be completely automated in the next 3 years
  • in the beginning and maybe for a while, it will require advanced models and workflows
  • the workflows will be different enough between companies that it's worthwhile to employ some well-paid engineers at each company to maintain them
  • these engineers will have a much easier time finding a well-paying job than 'regular' software engineers
  • whi
... (read more)
1daijin
Nice. So something like grabbing a copy of the swebench dataset, writing a pipeline that would solve those issues, then putting that on your CV? I will say, though, that your value as an employee is not 'producing software' so much as solving business problems. How much conviction do you have that producing software marginally faster using AI will improve your value to your firm?
1groblegark
The reasons you give, btw, don't give me much consolation. The code-leaking thing is very temporary; if you could host cutting-edge models on AWS or Azure it wouldn't be an issue for most companies, and if you could self-host them it wouldn't be an issue for almost /any/ companies. The errors thing is a crux. The basic solution to that, I think, is scaling: multishot the problem, rank the solutions, test in every way imaginable, and then for each solved problem optimize your prompts till they can one-shot it, keeping a backlog of examples to perform workflow regression testing against. The style thing is very tractable; AIs love following style instructions. The big moment for me was realizing that while each AI's context window is limited, within that window you can ask LOTS of different questions and expect a pretty good answer. So you ask questions that compress the information in the window for the purpose of your problem (LLMs are pretty darn good at summarizing), and keep doing that until you have enough context to solve the problem.
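That multishot-rank-bank loop is cheap to sketch. Here's roughly what I mean, with hypothetical `llm`, `run_tests`, and `rank` helpers standing in for whatever your stack actually provides:

```python
# Sketch of the scaling loop: sample many candidate solutions, keep the ones
# that pass tests, rank the survivors, and bank solved problems as regression
# examples for later prompt optimization. All helpers are assumed stand-ins.
backlog = []  # (problem, solution) pairs for workflow regression testing

def multishot_solve(llm, run_tests, rank, problem, n=16):
    candidates = [llm(problem) for _ in range(n)]
    passing = [c for c in candidates if run_tests(problem, c)]
    if not passing:
        return None  # escalate, decompose, or retry with a revised prompt
    best = rank(problem, passing)[0]
    backlog.append((problem, best))
    return best
```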

I don't have time to write any of this down properly, so it's going to come out in the wrong order, but here:

  • agentic AI is the means of production for codegen
  • model access limits and closedness are therefore a threat to Workers
  • I use and maintain software.  I survive by staying 5 feet in front of the steamroller
  • I am not wealthy; I can't afford to be tripped and squished.
  • OSS is traditionally the way of protecting myself in this situation
  • I need to write tons of good code and enable my company to do the same, and I need to do it while washing the dishes (Covid h
... (read more)
1daijin
'If some 3rd party brings that bird home to my boss instead of me, I'm going to be unwealthy and unemployed.' Have you talked to your boss about this? I have; for me the answer was some combination of "Oh, but using AI would leak our code" and "AI is a net loss to productivity because it errors too much / has context-length limitations / doesn't care for our standards". And that is not solvable by a third party, so my job is safe. What about you?