This post is broken down into two parts:

  1. Which AI productivity tools am I currently using (as an alignment researcher)?
  2. Why does it currently feel hard to spend +$1000/month on AI to increase one's productivity drastically?

Which AI productivity tools am I currently using?

Let's get right to it. Here's what I'm currently using and how much I am paying:

  • Superwhisper (or other new Speech-to-Text that leverage LLMs for rewriting) apps. Under $8.49 per month. You can use different STT models (different speed and accuracy for each) and LLM for rewriting the transcript based on a prompt you give the models. You can also have different "modes," meaning that you can have the model take your transcript and write code instructions in a pre-defined format when you are in an IDE, turn a transcript into a report when writing in Google Docs, etc. I tend to use it less when I'm in an open office with a lot of people around.
    • There is also an iOS app. I setup a Shortcut for it on my iPhone so I just need to double-tap the back of my phone, and it opens the app and starts recording.
  • Cursor Pro ($20-30/month). Switch to API credits when the slow responses take too long. More details on my workflow are below.
    • (You can try Zed (an IDE) too if you want. I've only used it a little bit, but Anthropic apparently uses it and there's an exclusive "fast-edit" feature with the Anthropic models.)
    • I connect Cursor to my Obsidian Vault and write using markdown files with AI assistance inside Cursor or directly in Obsidian. I also use Superwhisper to dump my thoughts into it and automatically format them in Markdown with a custom "mode."
  • Claude.ai Pro ($20/month). You could consider getting two accounts or a Team account to worry less about hitting the token limit.
    • One reason I’ll use the chat website itself is that they typically have a better system prompt than I can come up with. It’s common enough that I’ll try to get something working in Cursor and fail, but then I'll try in the chat app and get to the solution faster. I also use extensions that interface with the chat website, like https://glasp.co/youtube-summary. I like using Artifacts and Projects. I also don't want to overuse my Cursor Pro fast responses when I don't have to.
  • Chatgpt.com Pro account ($20/month). Again, can get a second account to have more o1-preview responses from the chat.
  • Aider (~$10/month max in API credits if used with Cursor Pro). AI coding assistant that runs in the terminal. I use this with Cursor and lean on the strengths I feel like they both have.
  • Google Colab Pro subscription ($9.99/month). You could get the Pro+ plan for $49.99/month.
  • Google One 2TB AI Premium plan ($20/month). This comes with Gemini chat and other AI features. I also sign up to get the latest features earlier, like Notebook LM and Illuminate.
  • jointakeoff.com ($22.99/month) for courses on using AI for development.
  • I still have GitHub Copilot (along with Cursor's Copilot++) because I bought a long-term subscription.
  • Grammarly ($12/month).
  • Reader by ElevenLabs (Free, for now). Best quality TTS app out there right now.
  • Bolt (~$9/month, though you can get started with the free version): This is a web chat app by @stackblitz that helps you quickly build full-stack web apps. Find it useful because it is highly specific to web dev. I've started to use this to get started quickly on a project and then port over the code to Cursor to build on top of it.

Other things I'm considering paying for:

  • Perplexity AI ($20/month). Like Google, but it uses more AI features for the search. I will often find myself using it over Google. The paid version uses a better AI model.
  • Other AI-focused courses that help me best use AI for productivity (web dev or coding in general).
  • Suno AI ($8/month). I might want to make music with it.

Apps others may be willing to pay for:

  • Warp, an LLM-enabled terminal ($20/month). I don't use the free version enough to upgrade to the paid version.
  • v0 chat ($20/month or free-tier). Used for creating Next.js websites quickly. As far as I can tell, does not handle backend like Bolt does, but may be more likely to use shadcn components.

My typical workflow for a new project is something like:

  1. Back and forth with Claude Chat to think through my research project. Include code from relevant high-quality codebases so that Claude has a better idea for what I'm aiming at. Papers too. Iterate a few times until satisfied.
  2. Brainstorm an MVP version of the project and ask (Chat) Sonnet to implement as well as it can.
  3. Use project plan instructions to initialize the project and create a prompts folder in the Cursor repo. The prompt contains .md files (files I can @ in the chat) for features of the codebase that came out of my interactions with Sonnet.
    1. I try to save prompt.md files of things I'll use quite often.
  4. Perhaps add some instructions for the .cursorrules file in the repository (this appends to every prompt of the LLMs used within this project).
  5. @ the instruction prompt(s) in Cursor's Composer or Aider and have it create the project files and structure based on the refined idea and initial code. Likely use o1-mini for this part.
  6. Iterate on the codebase with Sonnet for more precise improvements. Use o1-mini for more complex changes across the codebase.

I think spending time to provide good initial direction to your LLM is important and people should spend a bit of time really giving detailed instructions with examples to their LLM. Otherwise, your model will not focus give some generation that has dominated its pre-training.

A simple example of this is that if you prompt an LLM to build a website, it will often try to build a dumb html/css/js website unless you specifically say you want a more professional and modern website with details about the tech stack (next.js, supabase, etc).

Total spending

There are definitely ways to optimize my monthly payment to save a bit of cash, but I'm currently paying roughly $157/month.

That said, I am also utilizing research credits from Anthropic, which could range from $500 to $2000, depending on the month. In addition, I'm working on an "alignment research assistant" which will leverage LLMs, agents, API calls to various websites, and more. If successful, I could see this project absorbing hundreds of thousands in inference costs.

Why am I am spending more than most?

I am a technical AI alignment researcher who also works on augmenting alignment researchers and eventually automating more alignment research, so I'm biasing myself to overspend on products to make sure I'm aware of the bleeding-edge setup.

So, I'm certainly paying more than the average person when it comes to using AI for productivity. However, I can certainly imagine that I'm still paying less than I should in terms of AI software. This leads me to consider: "What should I spend considerably more on regarding AI software? Why isn't it easy to know this? If AI will increase productivity as much as I think it will, why hasn't it already?"

How could I spend way more on AI?

As AI becomes increasingly powerful and entrepreneurs/developers figure out how to make better user interfaces and interconnected systems with AI, we'll be getting massive jumps in our ability to leverage AI for boosting productivity.

Of course, people already see this with ChatGPT. However, I expect most people will underpay for AI tools.

Someone asked this question:

Suppose I wanted to spend much more on intelligence (~$1000/month), what should I spend it on?

This is a good question. I don't even know the obvious answer as someone who works in AI and even focuses on how to leverage these tools for safer development of AI. One reason for this is that most people have not given much thought about how to actually use intelligence and automation. Have you considered what you would do if you had three interns and an assistant? What if you had an intermediate-level software engineer?

Here's an insightful comment (slightly rewritten) by Gwern on the question, "If AI is so powerful, why hasn't it completely changed the world and increased GDP by several points yet?":

If you're struggling to find tasks for "artificial intelligence too cheap to meter," perhaps the real issue is identifying tasks for intelligence in general. Just because something is immensely useful doesn't mean you can immediately integrate it into your current routines; significant reorganization of your life and workflows may be necessary before any form of intelligence becomes beneficial.

There's an insightful post on this topic: The Great Data Integration Schlep. Many examples there illustrate that the problem isn't about AI versus employee or contractor; rather, organizations are often structured to resist improvements. Whether it's a data scientist or an AI attempting to access data, if an employee's career depends on that data remaining inaccessible, they may sabotage efforts to change. I refer to this phenomenon as "automation as a colonization wave": transformative technologies like steam power or the internet often take decades to have a massive impact because people are entrenched in local optima and may actively resist integrating the new paradigm. Sometimes, entirely new organizations must be built, and old ones phased out over time.

We have few "AI-shaped holes" of significant value because we've designed systems to mitigate the absence of AI. If there were organizations with natural LLM-shaped gaps that AI could fill to massively boost output, they would have been replaced long ago by ones adapted to human capabilities, since humans were the only option available. This explains why current LLM applications contribute minimally to GDP—they offer marginal improvements like better spellcheck or code generation, but don't usher in a new era of exponential economic growth.

One approach, if you're finding it hard to spend $1000/month effectively on AI, is to allocate that budget to natural intelligence instead—hire a remote worker, assistant, or intern. Such a person is a flexible, multimodal general intelligence capable of tool use and agency. By removing the variable of AI, you can focus on whether there are valuable tasks that an outsourced human could perform, which is analogous to the role an AI might play. If you can't find meaningful work for a hired human intelligence, it's unsurprising that you're struggling to identify compelling use cases for AI.

(If this concept is still unclear, try an experiment: act as your own remote worker. Send yourself emails with tasks, and respond as if you have amnesia, avoiding actions a remote worker couldn't perform, like directly editing files on your computer. Charge yourself an appropriate hourly rate, stopping once you reach a cumulative $1000.)

If you discover that you can't effectively utilize a hired human intelligence, this sheds light on your difficulties with AI. Conversely, if you do find valuable tasks, you now have a clear set of projects to explore with AI services.

Of course, this is beside the fact that we're still early, and we need a few more years to really see how powerful these AIs can become. I agree with Sam Altman (CEO of OpenAI) in his new blog post:

This may turn out to be the most consequential fact about all of history so far. It is possible that we will have superintelligence in a few thousand days (!); it may take longer, but I’m confident we’ll get there.

Leveraging AI for productivity presents a massive opportunity in the next few years. In fact, I expect there will be companies that essentially leverage AI automation internally in ways that the rest of the market doesn't (of course, I've considered doing this myself). These companies (like consultancies) will involve human-human interactions instead of interfacing with an AI but will charge a high premium for that interaction. Basically, their customers will compare the price to the rest of the market and find the price reasonable, but the rest of the market is still leveraging way too much human intelligence (HI) in comparison to artificial intelligence. It will take HI companies significantly longer to do the project and will be much more expensive.


Light spoiler for Pantheon (TV show) ahead!

There's a TV show called Pantheon, which covers the entire singularity where humans can upload themselves into the cloud. One interesting point in the plot is when one of the uploaded humans is told that they are still being held back by how they work in their human body, and that character has a really difficult time grasping what that means. They simply couldn't imagine acting in the world in any way that they did in their past life. It just wasn't part of their ontology, how they imagined the world.

Eventually, through enough effort, they figured out how to use their newly uploaded body in ways that allowed him to achieve an exponential increase in productivity per second.

I think we'll experience several of these shifts in the coming decades, and those who can act on them early may benefit greatly.

I'd be happy to hear what other people are using or have stopped using because they didn't get much value out of it!

New Comment
16 comments, sorted by Click to highlight new comments since:

How do you justify paying for services where you train their bot and agree not to compete with that which plays the imitation game where you are the “system under imitation?” They’re literally taking your mind patterns and making you dependent on them to think, and you’re paying for it.

Seems like a long run losing proposition to pay to teach for bots and become dependent upon external intelligence services that will imitate you and make you irrelevant. Can somebody list services that don’t train on inputs and don’t have customer noncompete clauses (directly or indirectly)? Pro-LLM crowd seems to crave a world where the only jobs available for natural humans are manual labor jobs. Am I wrong?

I know I’ll get downvoted for negativity but, “think for yourself!”

[-]jbash6-1

I could spend a lot more than $1000/month, because cloud services are a non-starter.

It seems to me that if you're going to use something like this to its real potential, it has to be integrated into your habitual ways of doing things. You have to use it all the time. It's too jarring to have to worry about whether it's trustworthy, or to "code switch" because you can't use it for some reason[1].

I can't imagine integrating any of those things into my normal, day to day routine unless the content of what I was doing were, in normal course, exposed only to me. Which in practice means locally hosted. Which would be prohibitively expensive even if it were possible.


  1. This is actually the same reason I rarely customize applications very much. It's too jarring when I get onto another machine and have to use the vanilla version. ↩︎

I can’t imagine integrating any of those things into my normal, day to day routine unless the content of what I was doing were, in normal course, exposed only to me.

I've had something like this issue. The places I most want to use LLMs are for work tasks like "refactor this terribleness to not be crap", or "find the part of this codebase that is responsible for X", or "fill out this pointless paperwork for me"; but I'm not going to upload my employer's data to an LLM provider. Also, if you're in tech, you might want to apply for a job at an AI company. If so, then anything you type into their LLM is potentially exposed to whoever is judging that application. Even if you're not doing anything questionable, you still have to spend attention on HR-proofing it.

(I'm sure privacy policies are a thing. Have you read them? I have not. I could fix that, but that is also an attention cost, and you have to trust that the policy will be honored when it matters)

The places where exposing things to the LLM provider is a non-issue (e.g. boilerplate), I mostly don't need help with and mostly do better than the LLM does.

(...for now)

I think my productivity at work would be most dramatically increased not by auto-completing my code (although that too would be nice) but rather by reading all the company Confluence pages and providing short summaries in plain language, connecting together information that is split into dozens of unconnected pieces, each of them written in a different place and often requiring different access rights. Maybe even more by reading all the existing code and configuration files, and updating the documentation with something that is actually true and can be interpreted unambiguously.

I just started a new job and I've been exporting Confluence pages to PDF and putting them in a Claude project so I can just ask Claude stuff.

That's a great idea... that would get me fired at my current job (security reasons). :D

I hope you have that automated, because you will probably want to refresh the exports in a few months, but even if you did it manually I believe the ability to get instant answers is worth it.

Yeah, I haven't got it automated yet. Someday I'll have the time.

Another place I did this was with the mountain of onboarding docs I got.  Now I can just ask Claude stuff like "how early do I have to request time off and who do I contact?" or "What's my dental insurance deductible?"

Sharing my setup too:

Personnaly I'm just self hosting a bunch of stuff:

  • litellm proxy, to connect to any llm provider
  • langfuse for observability
  • faster whisper server, the v3 turbo ctranslate2 versions takes 900mb of vram and are about 10 times faster than I speak
  • open-webui, as it's connected to litellm and ollama, i avoid provider lock in and keep all my messages on my backend instead of having some at openai, some at anthropic, etc. Additionally it supports artifacts and a bunch of other nice features. It also allows me to craft my perfecr prompts. And to jailbreak when needed.
  • piper for now for tts but plan on switching to a selfhosted fish audio.
  • for extra privacy I have a bunch of ollama models too. Mistral Nemo seems to be quite capable. Otherwise a few llama3, qwen2 etc.
  • for embeddings either bge-m3 or some self hosted jina.ai models.

I made a bunch of scripts to pipe my microphone / speaker / clipboard / llms together for productivity. For example I press 4 times on shift, speak, then shift again, and voila what I said was turned into an anki flashcard.

As providers, I mostly rely on openrouter.ai which allows to swap between providers without issue. These last few months I've been using sonnet 3.5 but change as soon as there's a new frontier model.

For interacting with codebases I use aider.

So at the end all my cost comes from API calls and none from subscriptions.

What makes grammarly pro worth it? I used the free version for a while, but it became so aggressive with unwanted corrections I couldn't even see the real suggestions, chrome caught up with the useful features, and on long essays it crippled my browser. 

Yeah, this has been my experience using Grammarly pro as well. 

I think it's up to you and how you write. English isn't my first language, so I've found it useful. I also don't accept like 50% of the suggestions. But yeah, looking at the plan now, I think I could get off the Pro plan and see if I'm okay not paying for it.

It's definitely not the thing I care about most on the list.

Personally, I only use the APIs on my computer. I have an Emacs setup based on gptel to bind sending different parts of buffers (either whole page/region or single line) to different models.

Use mostly Claude but sometimes it missbehaves and then I usually send it to 4o. I keep having Gemini in there too but struggle to ever use it. Likewise, I have haiku in there but that's mostly from the days of opus when I sometimes was happy enough with really quick responses compared to sluggish opus.

It's also important to keep different system prompts on different key combinations so that you can ask for a quick answer with just the command / code line you care about in response vs. well thought out answer that will require some text editing to get rid of the explanation. Come to think of it, I might have to write some post processors to only leave the code and throw out the CoT, which would sometimes work.

Emacs is always just one key combo away, whatever I'm doing, and it's always running as a daemon so bringing it up is instantaneous. I can't think of a more comfortable setup. I'm never using the web interfaces, it's a horrible user experience in comparison.

API is dirt cheap if you use it as I do (for single queries or short conversations). It only gets expensive once you really throw in a lot of stuff in the input, since input tokens are so much more expensive. For me, aider-style work on big code context where I expect the actual output to be ready to use, doesn't work well enough yet, and is frustrating. I will wait until better scaffolding or 5 level models for that.

This is really good, thanks so much for writing it! 

I've never heard of Whisper or Eleven labs until today, and I'm excited to try them out.

Thanks! I'm excited to go over the things I never heard of

 

So far,

  1. Elevenlabs app: great, obviously
  2. Bolt: I didn't like it
    1. I asked it to create a React Native app that prints my GPS coordinates to the screen (as a POC), it couldn't do it. I also asked for a podcast app (someone must and no one else will..), it did less well than Replit (though Replit used web). Anyway my main use case would be mobile apps (I don't have a reasonable solution for that yet) (btw I hardly have mobile development experience, so this is an extra interesting use case for me).
    2. It sounds like maybe you're missing templates to start from? I do think Bolt's templates have something cool about them, but I don't think
  3. Warp: I already use the free version and I like it very much. Great for things like "stop this docker container and also remove the volume"
  4. Speech to text: I use ChatGPT voice. My use case is "I'm riding my bike and I want to use the time to write a document", so we chat about it back and forth

 

Q:

5. How do you "Use o1-mini for more complex changes across the codebase"? (what tool knows your code and can query o1 about it?)

5.1. OMG, Is that what Cursor Composer is? I have got to try that

I just skimmed the jointakeoff course website and noticed it seems to be about using Cursor, a large component of your current workflow. Would you recommend it as a starting point?

There are multiple courses, though it's fairly new. They have one on full-stack development (while using Cursor and other things) and Replit Agents. I've been following it to learn fast web development, and I think it's a good starting point for getting an overview of building an actual product on a website you can eventually sell or get people to use.