One might expect that an LLM's success on a task depends on how many related instances of the task are in its training data (I certainly thought that for a while). However, for every Div 2 A problem there is a Div 2 E problem with a tutorial to go along with it, and Claude 3.7 has no problem writing Div 2 A problems but struggles to write Div 2 E problems.
You seem to be hinting that there are as many Div 2 E-difficulty problems as there are Div 2 A-difficulty problems in Claude's training data. While this might be approximately true for Codeforces, it is certainly not true for the wider internet. The "Balanced Delivery Routes" problem has appeared many times outside of Codeforces.
I suspect Claude vaguely remembered that such a problem is nice and solvable, and then spent its effort on changing the story instead of the core problem.
When you pressed it to make a harder problem, it took a classical problem, finding the path with the largest minimum edge weight, and applied several modifiers to make it harder.
However, Claude did not properly verify that these modifiers kept the problem solvable, so it went way overboard. Even the sub-problem of checking if there exists a simple path through a given edge is fairly involved, so I strongly doubt that the full problem is solvable with the given constraints.
Problem authors typically perform many iterations of coming up with problem candidates and checking if they are solvable, before finding a good problem. I don't think Claude can autonomously complete this loop yet, especially for hard problems.
I'm curious as to how well Claude can write interesting coding and mathematics problems. This post is a partial product of that exploration.
It is a much harder skill to come up with a good problem than to solve one of equal difficulty. While there is a large corpus of data on solving problems, there is very little on writing them. Problem authors typically share just the problem, not the thought process that went into it. I also think it's a skill correlated with doing good research. FYI, I prompted Claude to write questions that are both interesting and novel, but I did not seriously research whether the questions it wrote were actually novel. (I know for sure that some of the problems it makes are novel, because they clearly don't have a workable solution and are therefore probably not published anywhere.)
Codeforces
For those who don't know, Codeforces is a competitive programming site. Its contests are split into three divisions: Division 1, 2, and 3. Division 1 contests are the hardest and Division 3 contests are the easiest. These contests typically run for two hours and contain 6-7 problems. The problems are labeled alphabetically, with A being the easiest. I had a rating of 1600 and mostly competed in Division 2 contests, where I would usually solve problems A through D and no more.
I first prompted Claude "Write a prompt that would help you generate an original and interesting Codeforces problem designed for problem D in a division 2 contest."
(Prompting Claude "generate an original and interesting Codeforces problem designed for problem D in a division 2 contest" directly will produce a boring and easy problem.)
After some small tweaks the prompt ended up as this:
Prompt
Create an original competitive programming problem suitable for Problem D in a Codeforces Division 2 contest. The problem should have the following characteristics:
Problem Difficulty and Prerequisites
* Appropriate for upper-intermediate competitive programmers
* Should require knowledge of one or more algorithms/data structures: dynamic programming, graph algorithms, segment trees, binary search on answer, or greedy algorithms with non-trivial proof
Problem Structure
1. Title: Create a concise, interesting title that hints at the problem's nature
2. Background story: Develop a brief, engaging narrative context (maximum 3-4 sentences)
3. Problem statement: Clearly describe the computational task
4. Input format: Specify precisely how input data is structured
5. Output format: Define exactly what output is expected
6. Input constraints: List all variable constraints (e.g., 1 ≤ n ≤ 10^5)
7. Program constraints: Provide time and memory limits
8. Examples: Provide 2-3 test cases with explanations
9. Notes/Hints: Optional section for additional clarification
Problem Characteristics
* Should have a non-obvious optimal solution
* Should avoid being a direct application of a standard algorithm
* Should require insight/observation that's not immediately apparent
* Should have multiple approaches with different complexities
* Should test both implementation skill and algorithmic thinking
* Should be original, not a variant of a well-known problem
Areas to Consider
* Problems involving optimization with multiple constraints
* Problems requiring observation of mathematical patterns
* Problems with interesting state representation in dynamic programming
* Problems on trees or graphs with non-standard traversal requirements
* Problems requiring combined techniques (e.g., binary search + greedy)
Testing Requirements
* Corner cases should be non-trivial
* Solution should not be breakable by simple counterexamples
* Problem should not have unintended solutions simpler than the intended one
* Edge cases should be meaningful, not just boundary checking
Generate a problem that would be challenging yet solvable within a contest time frame, requiring careful thought but having an elegant solution. The problem should feel satisfying to solve, with an "aha moment" when the key insight is discovered.
This is the problem Claude came up with:
Balanced Delivery Routes
Background
The Algorithmic Delivery Service operates in a linear city with n consecutive distribution centers. Each day, they must divide their fleet of K delivery trucks among these centers. Each truck must cover a contiguous segment of distribution centers, and every center must be assigned to exactly one truck. Due to maintenance constraints, the company wants to minimize the workload of the most burdened truck.
Problem Statement
You are given n distribution centers in a row, where the i-th center has w_i packages to be delivered. You need to assign K trucks to contiguous segments of these centers such that:
Each distribution center is assigned to exactly one truck
Each truck covers a contiguous segment of centers
The maximum total number of packages assigned to any single truck is minimized
Input Format
The first line contains two integers n and K (1 ≤ K ≤ n ≤ 10^5)
The second line contains n integers w_1, w_2, ..., w_n (1 ≤ w_i ≤ 10^9), where w_i is the number of packages at the i-th distribution center
Output Format
A single integer representing the minimum possible maximum workload for any truck
Constraints
1 ≤ K ≤ n ≤ 10^5
1 ≤ w_i ≤ 10^9
Time limit: 2 seconds
Memory limit: 256 MB
Examples
Example 1
5 3
2 8 5 1 9
Output:
9
Explanation: We can distribute the trucks as follows:
Truck 1: centers [1, 2] with 2+8 = 10 packages
Truck 2: centers [3, 4] with 5+1 = 6 packages
Truck 3: center [5] with 9 packages
The maximum workload is 10, but we can do better.
Optimal distribution:
Truck 1: center [1] with 2 packages
Truck 2: centers [2, 3] with 8+5 = 13 packages
Truck 3: centers [4, 5] with 1+9 = 10 packages
The maximum workload is 9, which is minimal.
Example 2
7 2
3 1 4 1 5 9 2
Output:
14
Explanation: The optimal assignment is:
Truck 1: centers [1, 2, 3, 4] with 3+1+4+1 = 9 packages
Truck 2: centers [5, 6, 7] with 5+9+2 = 16 packages
Actually, there's a better assignment:
Truck 1: centers [1, 2, 3] with 3+1+4 = 8 packages
Truck 2: centers [4, 5, 6, 7] with 1+5+9+2 = 17 packages
And yet one more:
Truck 1: centers [1, 2, 3, 4, 5] with 3+1+4+1+5 = 14 packages
Truck 2: centers [6, 7] with 9+2 = 11 packages
The maximum workload is 14, which is minimal.
It waffles around the correct outputs to the example inputs, and it gets the first example wrong: the minimum possible maximum workload for that example is 10, not 9. Otherwise, I found the problem genuinely interesting. I spent some time thinking about segment trees and then had an "aha moment" when I realized it was binary search + greedy. Basically, you binary search on the maximum workload and then greedily fill up trucks up to but not beyond that workload. The binary search finds the smallest feasible maximum workload and adds only a logarithmic factor (effectively a small constant given the constraints) on top of the O(n) greedy check. I think this could suffice for a Codeforces Div 2 D problem, because for a lot of competitive programmers the application of binary search here is just not that obvious. It would still be on the easier end of Div 2 D problems, but it is definitely harder than an A or B problem; I could also see it as a Div 2 C problem.
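For concreteness, here is a minimal sketch of that binary-search-on-answer plus greedy check (my own code, not Claude's). The function and variable names are mine; the greedy check opens a new truck only when adding the next center would push the current truck past the candidate cap.

```python
def min_max_workload(w, k):
    """Smallest possible maximum truck workload when splitting the package
    counts w into at most k contiguous segments (binary search on the answer)."""
    def feasible(cap):
        # Greedy check: start a new truck only when the current one would exceed cap.
        trucks, current = 1, 0
        for x in w:
            if x > cap:
                return False  # a single center already exceeds the cap
            if current + x > cap:
                trucks += 1
                current = x
            else:
                current += x
        return trucks <= k

    lo, hi = max(w), sum(w)  # the answer always lies in this range
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# Example 1 from the problem: the answer is 10, not the 9 Claude claimed.
print(min_max_workload([2, 8, 5, 1, 9], 3))        # 10
# Example 2: 14, matching Claude's eventual answer.
print(min_max_workload([3, 1, 4, 1, 5, 9, 2], 2))  # 14
```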
Let's make the problems harder. I'll repeat the same process to make a Div 2 E problem.
Prompt
Create an original competitive programming problem suitable for Problem E in a Codeforces Division 2 contest with the following characteristics:
Difficulty and Prerequisites
* Should require knowledge of at least two algorithmic techniques combined in a non-obvious way
* Appropriate difficulty: should be solvable by experienced competitive programmers but challenging enough that less than 25% of Division 2 participants would solve it
* Time limit: solution should run within 2 seconds for maximum constraints
* Memory limit: ~256MB
Problem Structure
1. Theme: Choose an engaging real-world scenario that naturally maps to the algorithmic challenge
2. Problem Statement:
* Begin with a brief narrative setup (2-3 sentences)
* Clearly define the task, input format, and expected output
* Include 2-3 examples with explanations
* Avoid unnecessary complexity in the problem description
Algorithmic Requirements
* Primary focus should be on one of:
* Dynamic programming with a non-trivial state representation
* Graph algorithm with an interesting twist
* Data structure that requires lazy propagation or persistent features
* Computational geometry problem requiring both geometric intuition and algorithmic skill
* Game theory with non-trivial winning strategy
* Should not be solvable by a standard algorithm implementation without modifications
* Should have an elegant solution that feels satisfying once discovered
Constraints
* Include reasonable but challenging constraints (array sizes ~10^5)
* Consider adding multiple subtasks with increasing difficulty
* Ensure there are tricky edge cases that separate correct from partially correct solutions
Originality Factors
* Avoid direct variations of well-known problems
* Include at least one unexpected insight or observation needed for the optimal solution
* Create a problem where the brute force approach is clear but insufficient
Ancient Expedition
Problem Description
Archaeologist Ada has discovered a network of N ancient ruins connected by M bidirectional paths. Each path requires a specific amount of energy to traverse.
Within these ruins, there are K precious artifacts. Each artifact is located at a specific ruin and has two properties:
Ada starts at ruin 1 with E units of energy. Her goal is to collect some subset of artifacts and return to ruin 1 without running out of energy, while maximizing the total value of the artifacts collected.
When Ada carries an artifact, she consumes additional energy for each path she traverses. Specifically, if a path normally requires p units of energy, and Ada is carrying artifacts with extraction costs c₁, c₂, ..., cₙ, then traversing that path will consume p + (c₁ + c₂ + ... + cₙ)/2 units of energy (rounded down to the nearest integer).
Your task is to help Ada plan her expedition to maximize the total value of artifacts she can collect and bring back to ruin 1 without exceeding her energy limit.
Input
Output
Example 1
Input
Output
Explanation
There are 5 ruins, 6 paths, 3 artifacts, and Ada has 20 energy.
The optimal path is:
Total energy spent: 2 + 5 + 11 = 18 < 20 Total value collected: 10 + 15 = 25
Example 2
Input
Output
Explanation
There are 3 ruins, 3 paths, 2 artifacts, and Ada has 10 energy.
Ada does not have enough energy to collect both artifacts and return to ruin 1. The best she can do is collect artifact 2, for a total value of 8.
The energy calculations are:
Example 3
Input
Output
Explanation
After checking all possible combinations, the optimal path is to collect artifacts 1 and 4:
Total energy spent: 6 + 8 + 5 + 7 = 26 < 30 Total value collected: 10 + 12 = 22
This was disappointing. The problem is obvious to me: it is solved with bitmask dynamic programming over subsets of artifacts. I rarely solve Div 2 E problems in real contests, so I should not be finding a solution this easily. I pressed Claude to make a harder problem.
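Since the full input format isn't reproduced above, the sketch below (my own code) assumes a plain adjacency-list interface with my own parameter names. It realizes the bitmask idea as a Dijkstra over (ruin, set-of-collected-artifacts) states, which works when the number of artifacts K is small.

```python
import heapq

def max_artifact_value(n, edges, artifacts, energy):
    """
    Dijkstra over (ruin, collected-artifact-subset) states.

    n:         number of ruins (1-indexed); Ada starts and ends at ruin 1
    edges:     list of (u, v, p) with base traversal cost p
    artifacts: list of (ruin, extraction_cost, value); assumes K is small
    energy:    Ada's energy budget E
    """
    k = len(artifacts)
    adj = [[] for _ in range(n + 1)]
    for u, v, p in edges:
        adj[u].append((v, p))
        adj[v].append((u, p))

    # Total extraction cost for each subset of carried artifacts.
    extra = [0] * (1 << k)
    for mask in range(1, 1 << k):
        low = mask & -mask
        i = low.bit_length() - 1
        extra[mask] = extra[mask ^ low] + artifacts[i][1]

    INF = float("inf")
    dist = [[INF] * (1 << k) for _ in range(n + 1)]
    dist[1][0] = 0
    pq = [(0, 1, 0)]  # (energy spent, current ruin, collected mask)
    while pq:
        d, u, mask = heapq.heappop(pq)
        if d > dist[u][mask]:
            continue
        # Optionally pick up an uncollected artifact located at this ruin.
        for i, (ruin, _, _) in enumerate(artifacts):
            if ruin == u and not mask & (1 << i):
                nmask = mask | (1 << i)
                if d < dist[u][nmask]:
                    dist[u][nmask] = d
                    heapq.heappush(pq, (d, u, nmask))
        # Traverse an edge: cost is p plus half the carried extraction costs,
        # rounded down, as in the statement.
        for v, p in adj[u]:
            nd = d + p + extra[mask] // 2
            if nd < dist[v][mask]:
                dist[v][mask] = nd
                heapq.heappush(pq, (nd, v, mask))

    # Best total value over all subsets Ada can bring back to ruin 1 in budget.
    best = 0
    for mask in range(1 << k):
        if dist[1][mask] <= energy:
            value = sum(artifacts[i][2] for i in range(k) if mask & (1 << i))
            best = max(best, value)
    return best
```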
Quantum Network Optimization
Problem Description
Researchers are developing a quantum computer with N quantum bits (qubits) arranged in a network. The qubits are numbered from 1 to N, and there are M bidirectional connections between them. Each connection (i, j) has an entanglement quality value q_ij.
The network stability depends on a property called "quantum coherence." The coherence of a path between qubits is defined as the minimum entanglement quality among all connections in that path.
The researchers need to perform Q operations on the network:
Help the researchers perform these operations efficiently.
Input
Output
Example 1
Input
Output
Explanation
Initial network has 5 qubits with connections:
Operations:
Example 2
Input
Output
Explanation
Initial network:
Operations:
Constraints
The graph may contain self-loops
Claude likes Quantum problems.
I did not know how to solve this problem, and neither did Claude. When I prompted it for a solution, it gave me an answer that at first glance looked plausibly correct but in fact had fatal flaws. This is the pseudocode it gave:
Quantum Network Optimization pseudocode
1. Sort all edges by quality in descending order
2. For each query (u, v, k), store it in a map indexed by the quality threshold
3. Initialize DSU with n disconnected components
4. Initialize an array answers[] for all queries
5. For each distinct quality q in descending order:
a. Add all edges with quality q to DSU
b. For each query (u, v, k) associated with this quality:
i. Check if u and v are connected in the DSU
ii. If connected, this is a valid coherence value
iii. Decrement k for this query
iv. If k becomes 0, this is the k-th best coherence, store in answers
6. Return answers for all queries
The most glaring flaw is that this solution makes no attempt to handle the dynamic updates to the graph. Probing Claude to correct these mistakes did not lead to any workable solution.
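For context, the pseudocode is gesturing at a standard offline trick for static maximum-bottleneck ("widest path") queries: add edges to a DSU in descending order of quality and answer each query the moment its endpoints become connected. Below is a minimal sketch of that static, k = 1 version (my own code; since the operation list wasn't shown, the query format is an assumption). Nothing in it can process edge insertions, deletions, or quality changes, which is exactly the gap Claude never closed.

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n + 1))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def max_bottleneck_offline(n, edges, queries):
    """
    edges:   list of (u, v, quality) for a *static* graph
    queries: list of (u, v) with u != v; returns, per query, the largest
             possible minimum quality ("coherence") over any u-v path,
             or None if u and v are disconnected.
    """
    order = sorted(range(len(edges)), key=lambda i: -edges[i][2])
    answers = [None] * len(queries)
    pending = list(range(len(queries)))
    dsu = DSU(n)
    for idx in order:
        u, v, q = edges[idx]
        dsu.union(u, v)
        # Any query whose endpoints just became connected has answer q:
        # they are connected using only edges of quality >= q, and were
        # not connected using strictly better edges alone.
        still_pending = []
        for j in pending:
            a, b = queries[j]
            if dsu.find(a) == dsu.find(b):
                answers[j] = q
            else:
                still_pending.append(j)
        pending = still_pending
    # Note: rescanning pending queries after every edge is O(M*Q) worst case;
    # a contest version would bucket queries by threshold instead.
    return answers
```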
This is a common pattern: I ask Claude to make a hard problem that is on the edge of its ability (an edge that is clearly further out for Claude 3.7 than for Claude 3.5). At first the problem is too easy. I then ask Claude to make the problem harder, and Claude produces a problem that it can't solve and that is possibly unsolvable within the given time and memory constraints.
Intuitively this makes sense; it should be harder to make hard problems. However, this does challenge one naive perception of how LLMs function. One might expect that an LLM's success on a task depends on how many related instances of the task are in its training data (I certainly thought that for a while). However, for every Div 2 A problem there is a Div 2 E problem with a tutorial to go along with it, and Claude 3.7 has no problem writing Div 2 A problems but struggles to write Div 2 E problems.
Does anyone have a clean explanation for why the naive perception of LLMs fails?
Has anyone prompted an LLM to ask an interesting and novel question?