One might expect that an LLM's success on a task depends on how many related instances of the task are in its training data (I certainly thought that for a while). However, for every Div 2 A problem there is a Div 2 E problem with a tutorial to go along with it, and Claude 3.7 has no problem writing Div 2 A problems but struggles to write Div 2 E problems.
You seem to be hinting that there are as many Div 2 E-difficulty problems as there are Div 2 A-difficulty problems in Claude's training data. While this might be approximately true for Codeforces, it is certainly not true for the wider internet. The "Balanced Delivery Routes" problem has appeared many times outside of Codeforces.
I suspect Claude vaguely remembered that such a problem is nice and solvable, and then spent its effort on changing the story instead of the core problem.
When you pressed it to make a harder problem, it took a classical problem, finding the path with the largest minimum edge weight, and applied several modifiers to make it harder.
However, Claude did not properly verify that these modifiers kept the problem solvable, so it went way overboard. Even the sub-problem of checking if there exists a simple path through a given edge is fairly involved, so I strongly doubt that the full problem is solvable with the given constraints.
Problem authors typically perform many iterations of coming up with problem candidates and checking if they are solvable, before finding a good problem. I don't think Claude can autonomously complete this loop yet, especially for hard problems.
I'm curious as to how well Claude can write interesting coding and mathematics problems. This post is a partial product of that exploration.
It is a much harder skill to come up with a good problem than to solve one of equal difficulty. While there is a large corpus of data on solving problems, there is very little on writing them. Problem authors typically share just the problem, not the thought process that went into it. I also think it's a skill correlated with doing good research. FYI, I prompted Claude to write questions that are both interesting and novel, but I did not seriously research whether the questions it wrote were actually novel. (I know for sure that some of the problems it makes are novel, because they clearly don't have a workable solution and are therefore probably not published anywhere.)
Codeforces
For those who don't know, Codeforces is a competitive programming site. Its contests are split into three divisions: Division 1, 2, and 3. Division 1 contests are the hardest and Division 3 contests are the easiest. These contests typically run for two hours and contain 6-7 problems. The problems are labeled alphabetically, with A being the easiest. I had a rating of 1600 and mostly competed in Division 2 contests, where I would usually solve problems A through D and no more.
I first prompted Claude "Write a prompt that would help you generate an original and interesting Codeforces problem designed for problem D in a division 2 contest."
(Prompting Claude "generate an original and interesting Codeforces problem designed for problem D in a division 2 contest" directly will produce a boring and easy problem.)
After some small tweaks the prompt ended up as this:
Prompt
Create an original competitive programming problem suitable for Problem D in a Codeforces Division 2 contest. The problem should have the following characteristics:
Problem Difficulty and Prerequisites
* Appropriate for upper-intermediate competitive programmers
* Should require knowledge of one or more algorithms/data structures: dynamic programming, graph algorithms, segment trees, binary search on answer, or greedy algorithms with non-trivial proof
Problem Structure
1. Title: Create a concise, interesting title that hints at the problem's nature
2. Background story: Develop a brief, engaging narrative context (maximum 3-4 sentences)
3. Problem statement: Clearly describe the computational task
4. Input format: Specify precisely how input data is structured
5. Output format: Define exactly what output is expected
6. Input constraints: List all variable constraints (e.g., 1 ≤ n ≤ 10^5)
7. Program constraints: Provide time and memory limits
8. Examples: Provide 2-3 test cases with explanations
9. Notes/Hints: Optional section for additional clarification
Problem Characteristics
* Should have a non-obvious optimal solution
* Should avoid being a direct application of a standard algorithm
* Should require insight/observation that's not immediately apparent
* Should have multiple approaches with different complexities
* Should test both implementation skill and algorithmic thinking
* Should be original, not a variant of a well-known problem
Areas to Consider
* Problems involving optimization with multiple constraints
* Problems requiring observation of mathematical patterns
* Problems with interesting state representation in dynamic programming
* Problems on trees or graphs with non-standard traversal requirements
* Problems requiring combined techniques (e.g., binary search + greedy)
Testing Requirements
* Corner cases should be non-trivial
* Solution should not be breakable by simple counterexamples
* Problem should not have unintended solutions simpler than the intended one
* Edge cases should be meaningful, not just boundary checking
Generate a problem that would be challenging yet solvable within a contest time frame, requiring careful thought but having an elegant solution. The problem should feel satisfying to solve, with an "aha moment" when the key insight is discovered.
This is the problem Claude came up with:
Balanced Delivery Routes
Background
The Algorithmic Delivery Service operates in a linear city with n consecutive distribution centers. Each day, they must divide their fleet of K delivery trucks among these centers. Each truck must cover a contiguous segment of distribution centers, and every center must be assigned to exactly one truck. Due to maintenance constraints, the company wants to minimize the workload of the most burdened truck.
Problem Statement
You are given n distribution centers in a row, where the i-th center has w_i packages to be delivered. You need to assign K trucks to contiguous segments of these centers such that:
Each distribution center is assigned to exactly one truck
Each truck covers a contiguous segment of centers
The maximum total number of packages assigned to any single truck is minimized
Input Format
The first line contains two integers n and K (1 ≤ K ≤ n ≤ 10^5)
The second line contains n integers w_1, w_2, ..., w_n (1 ≤ w_i ≤ 10^9), where w_i is the number of packages at the i-th distribution center
Output Format
A single integer representing the minimum possible maximum workload for any truck
Constraints
1 ≤ K ≤ n ≤ 10^5
1 ≤ w_i ≤ 10^9
Time limit: 2 seconds
Memory limit: 256 MB
Examples
Example 1
5 3
2 8 5 1 9
Output:
9
Explanation: We can distribute the trucks as follows:
Truck 1: centers [1, 2] with 2+8 = 10 packages
Truck 2: centers [3, 4] with 5+1 = 6 packages
Truck 3: center [5] with 9 packages
The maximum workload is 10, but we can do better.
Optimal distribution:
Truck 1: center [1] with 2 packages
Truck 2: centers [2, 3] with 8+5 = 13 packages
Truck 3: centers [4, 5] with 1+9 = 10 packages
The maximum workload is 9, which is minimal.
Example 2
7 2
3 1 4 1 5 9 2
Output:
14
Explanation: The optimal assignment is:
Truck 1: centers [1, 2, 3, 4] with 3+1+4+1 = 9 packages
Truck 2: centers [5, 6, 7] with 5+9+2 = 16 packages
Actually, there's a better assignment:
Truck 1: centers [1, 2, 3] with 3+1+4 = 8 packages
Truck 2: centers [4, 5, 6, 7] with 1+5+9+2 = 17 packages
And yet one more:
Truck 1: centers [1, 2, 3, 4, 5] with 3+1+4+1+5 = 14 packages
Truck 2: centers [6, 7] with 9+2 = 11 packages
The maximum workload is 14, which is minimal.
It waffles around the correct outputs to the example inputs, and it gets the first example wrong: the minimum possible maximum workload for that example is 10, not 9. Otherwise, I found the problem genuinely interesting. I spent some time thinking about segment trees and then had an "aha moment" when I realized it was binary search + greedy. Basically, you binary search on the maximum workload and then greedily fill up trucks up to but not beyond that workload. The binary search finds the smallest feasible maximum workload and adds only a logarithmic factor (effectively a small constant given the constraints) on top of the O(n) greedy check. I think this could suffice for a Codeforces Div 2 D problem, because for a lot of competitive programmers the application of binary search here is just not that obvious. It would still be on the easier end of Div 2 D problems, but it is definitely harder than an A or B problem; I could also see it as a Div 2 C problem.
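For concreteness, here is a minimal sketch of that binary-search-on-answer plus greedy check (my own code, not Claude's). The function and variable names are mine; the greedy check opens a new truck only when adding the next center would push the current truck past the candidate cap.

```python
def min_max_workload(w, k):
    """Smallest possible maximum truck workload when splitting the package
    counts w into at most k contiguous segments (binary search on the answer)."""
    def feasible(cap):
        # Greedy check: start a new truck only when the current one would exceed cap.
        trucks, current = 1, 0
        for x in w:
            if x > cap:
                return False  # a single center already exceeds the cap
            if current + x > cap:
                trucks += 1
                current = x
            else:
                current += x
        return trucks <= k

    lo, hi = max(w), sum(w)  # the answer always lies in this range
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Example 1 from the problem: the answer is 10, not the 9 Claude claimed.
print(min_max_workload([2, 8, 5, 1, 9], 3))        # 10
# Example 2: 14, matching Claude's eventual answer.
print(min_max_workload([3, 1, 4, 1, 5, 9, 2], 2))  # 14
```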
Let's make the problems harder. I'll repeat the same process to make a Div 2 E problem.
Prompt
Create an original competitive programming problem suitable for Problem E in a Codeforces Division 2 contest with the following characteristics:
Difficulty and Prerequisites
* Should require knowledge of at least two algorithmic techniques combined in a non-obvious way
* Appropriate difficulty: should be solvable by experienced competitive programmers but challenging enough that less than 25% of Division 2 participants would solve it
* Time limit: solution should run within 2 seconds for maximum constraints
* Memory limit: ~256MB
Problem Structure
1. Theme: Choose an engaging real-world scenario that naturally maps to the algorithmic challenge
2. Problem Statement:
* Begin with a brief narrative setup (2-3 sentences)
* Clearly define the task, input format, and expected output
* Include 2-3 examples with explanations
* Avoid unnecessary complexity in the problem description
Algorithmic Requirements
* Primary focus should be on one of:
* Dynamic programming with a non-trivial state representation
* Graph algorithm with an interesting twist
* Data structure that requires lazy propagation or persistent features
* Computational geometry problem requiring both geometric intuition and algorithmic skill
* Game theory with non-trivial winning strategy
* Should not be solvable by a standard algorithm implementation without modifications
* Should have an elegant solution that feels satisfying once discovered
Constraints
* Include reasonable but challenging constraints (array sizes ~10^5)
* Consider adding multiple subtasks with increasing difficulty
* Ensure there are tricky edge cases that separate correct from partially correct solutions
Originality Factors
* Avoid direct variations of well-known problems
* Include at least one unexpected insight or observation needed for the optimal solution
* Create a problem where the brute force approach is clear but insufficient
Ancient Expedition
Problem Description
Archaeologist Ada has discovered a network of N ancient ruins connected by M bidirectional paths. Each path requires a specific amount of energy to traverse.
Within these ruins, there are K precious artifacts. Each artifact is located at a specific ruin and has two properties:
Ada starts at ruin 1 with E units of energy. Her goal is to collect some subset of artifacts and return to ruin 1 without running out of energy, while maximizing the total value of the artifacts collected.
When Ada carries an artifact, she consumes additional energy for each path she traverses. Specifically, if a path normally requires p units of energy, and Ada is carrying artifacts with extraction costs c₁, c₂, ..., cₙ, then traversing that path will consume p + (c₁ + c₂ + ... + cₙ)/2 units of energy (rounded down to the nearest integer).
Your task is to help Ada plan her expedition to maximize the total value of artifacts she can collect and bring back to ruin 1 without exceeding her energy limit.
Input
Output
Example 1
Input
Output
Explanation
There are 5 ruins, 6 paths, 3 artifacts, and Ada has 20 energy.
The optimal path is:
Total energy spent: 2 + 5 + 11 = 18 < 20 Total value collected: 10 + 15 = 25
Example 2
Input
Output
Explanation
There are 3 ruins, 3 paths, 2 artifacts, and Ada has 10 energy.
Ada does not have enough energy to collect both artifacts and return to ruin 1. The best she can do is collect artifact 2, for a total value of 8.
The energy calculations are:
Example 3
Input
Output
Explanation
After checking all possible combinations, the optimal path is to collect artifacts 1 and 4:
Total energy spent: 6 + 8 + 5 + 7 = 26 < 30 Total value collected: 10 + 12 = 22
This was disappointing. The problem is obvious to me: it is solved with bitmask dynamic programming over subsets of artifacts. I rarely solve Div 2 E problems in real contests, so I should not be finding a solution this easily. I pressed Claude to make a harder problem.
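Since the full input format isn't reproduced above, the sketch below (my own code) assumes a plain adjacency-list interface with my own parameter names. It realizes the bitmask idea as a Dijkstra over (ruin, set-of-collected-artifacts) states, which works when the number of artifacts K is small.

```python
import heapq

def max_artifact_value(n, edges, artifacts, energy):
    """
    Dijkstra over (ruin, collected-artifact-subset) states.

    n:         number of ruins (1-indexed); Ada starts and ends at ruin 1
    edges:     list of (u, v, p) with base traversal cost p
    artifacts: list of (ruin, extraction_cost, value); assumes K is small
    energy:    Ada's energy budget E
    """
    k = len(artifacts)
    adj = [[] for _ in range(n + 1)]
    for u, v, p in edges:
        adj[u].append((v, p))
        adj[v].append((u, p))

    # Total extraction cost for each subset of carried artifacts.
    extra = [0] * (1 << k)
    for mask in range(1, 1 << k):
        low = mask & -mask
        i = low.bit_length() - 1
        extra[mask] = extra[mask ^ low] + artifacts[i][1]

    INF = float("inf")
    dist = [[INF] * (1 << k) for _ in range(n + 1)]
    dist[1][0] = 0
    pq = [(0, 1, 0)]  # (energy spent, current ruin, collected mask)
    while pq:
        d, u, mask = heapq.heappop(pq)
        if d > dist[u][mask]:
            continue
        # Optionally pick up an uncollected artifact located at this ruin.
        for i, (ruin, _, _) in enumerate(artifacts):
            if ruin == u and not mask & (1 << i):
                nmask = mask | (1 << i)
                if d < dist[u][nmask]:
                    dist[u][nmask] = d
                    heapq.heappush(pq, (d, u, nmask))
        # Traverse an edge: cost is p plus half the carried extraction costs,
        # rounded down, as in the statement.
        for v, p in adj[u]:
            nd = d + p + extra[mask] // 2
            if nd < dist[v][mask]:
                dist[v][mask] = nd
                heapq.heappush(pq, (nd, v, mask))

    # Best total value over all subsets Ada can bring back to ruin 1 in budget.
    best = 0
    for mask in range(1 << k):
        if dist[1][mask] <= energy:
            value = sum(artifacts[i][2] for i in range(k) if mask & (1 << i))
            best = max(best, value)
    return best
```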
Quantum Network Optimization
Problem Description
Researchers are developing a quantum computer with N quantum bits (qubits) arranged in a network. The qubits are numbered from 1 to N, and there are M bidirectional connections between them. Each connection (i, j) has an entanglement quality value q_ij.
The network stability depends on a property called "quantum coherence." The coherence of a path between qubits is defined as the minimum entanglement quality among all connections in that path.
The researchers need to perform Q operations on the network:
Help the researchers perform these operations efficiently.
Input
Output
Example 1
Input
Output
Explanation
Initial network has 5 qubits with connections:
Operations:
Example 2
Input
Output
Explanation
Initial network:
Operations:
Constraints
The graph may contain self-loops
Claude likes Quantum problems.
I did not know how to solve this problem, and neither did Claude. When I prompted it for a solution, it gave me an answer that at first glance looked plausibly correct but in fact had fatal flaws. This is the pseudocode it gave:
Quantum Network Optimization pseudocode
1. Sort all edges by quality in descending order
2. For each query (u, v, k), store it in a map indexed by the quality threshold
3. Initialize DSU with n disconnected components
4. Initialize an array answers[] for all queries
5. For each distinct quality q in descending order:
a. Add all edges with quality q to DSU
b. For each query (u, v, k) associated with this quality:
i. Check if u and v are connected in the DSU
ii. If connected, this is a valid coherence value
iii. Decrement k for this query
iv. If k becomes 0, this is the k-th best coherence, store in answers
6. Return answers for all queries
The most glaring flaw is that this solution makes no attempt to handle the dynamic updates to the graph. Probing Claude to correct these mistakes did not lead to any workable solution.
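For context, the pseudocode is gesturing at a standard offline trick for static maximum-bottleneck ("widest path") queries: add edges to a DSU in descending order of quality and answer each query the moment its endpoints become connected. Below is a minimal sketch of that static, k = 1 version (my own code; since the operation list wasn't shown, the query format is an assumption). Nothing in it can process edge insertions, deletions, or quality changes, which is exactly the gap Claude never closed.

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n + 1))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def max_bottleneck_offline(n, edges, queries):
    """
    edges:   list of (u, v, quality) for a *static* graph
    queries: list of (u, v) with u != v; returns, per query, the largest
             possible minimum quality ("coherence") over any u-v path,
             or None if u and v are disconnected.
    """
    order = sorted(range(len(edges)), key=lambda i: -edges[i][2])
    answers = [None] * len(queries)
    pending = list(range(len(queries)))
    dsu = DSU(n)
    for idx in order:
        u, v, q = edges[idx]
        dsu.union(u, v)
        # Any query whose endpoints just became connected has answer q:
        # they are connected using only edges of quality >= q, and were
        # not connected using strictly better edges alone.
        still_pending = []
        for j in pending:
            a, b = queries[j]
            if dsu.find(a) == dsu.find(b):
                answers[j] = q
            else:
                still_pending.append(j)
        pending = still_pending
    # Note: rescanning pending queries after every edge is O(M*Q) worst case;
    # a contest version would bucket queries by threshold instead.
    return answers
```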
This is a common pattern: I ask Claude to make a hard problem that is on the edge of its ability (an edge that is clearly further out for Claude 3.7 than for Claude 3.5). At first the problem is too easy. I then ask Claude to make the problem harder, and Claude produces a problem that it can't solve and that is possibly unsolvable within the given time and memory constraints.
Intuitively this makes sense; it should be harder to make hard problems. However, this does challenge one naive perception of how LLMs function. One might expect that an LLM's success on a task depends on how many related instances of the task are in its training data (I certainly thought that for a while). However, for every Div 2 A problem there is a Div 2 E problem with a tutorial to go along with it, and Claude 3.7 has no problem writing Div 2 A problems but struggles to write Div 2 E problems.
Does anyone have a clean explanation for why the naive perception of LLMs fails?
Has anyone prompted an LLM to ask an interesting and novel question?