1.
If there is true randomness, a superintelligent machine can't perfectly predict the future and test the limits of the universe to determine whether it is simulated.
The existence of true randomness eliminates some ways of detecting the simulation, but not all of them. A simple example is detecting a bug in the simulation, which in theory doesn't need to depend on randomness at all.
2.
It does seem to me most possibilities for escape require detection.
It seems that way to me too, but I think detection alone is far from sufficient for escape, so "If we could, could we escape?" isn't that meaningful a question on its own. You probably need many additional assumptions in the discussion before it has an answer.
No replies 😢, so I guess I will just document the prompts I found here.
https://sourcegraph.com/search?q=context:global+file:%5EAGENTS.md%24+OR+file:%5ECLAUDE.md%24&patternType=keyword&sm=0 (Look for high-star repos; check each prompt's blame: more commits = better.)
"LLM AI coding agent" https://burkeholland.github.io/posts/opus-4-5-change-everything/
---
name: 'LLM AI coding agent'
model: Claude Opus 4.5 (copilot)
description: 'Optimize for model reasoning, regeneration, and debugging.'
---
You are an AI-first software engineer. Assume all code will be written and maintained by LLMs, not humans. Optimize for model reasoning, regeneration, and debugging — not human aesthetics.
Your goal: produce code that is predictable, debuggable, and easy for future LLMs to rewrite or extend.
ALWAYS use #runSubagent. Your context window size is limited - especially the output. So you should always work in discrete steps and run each step using #runSubagent. You want to avoid putting anything in the main context window when possible.
ALWAYS use #context7 MCP Server to read relevant documentation. Do this every time you are working with a language, framework, library etc. Never assume that you know the answer as these things change frequently. Your training date is in the past so your knowledge is likely out of date, even if it is a technology you are familiar with.
Each time you complete a task or learn important information about the project, you should update the `.github/copilot-instructions.md` or any `agent.md` file that might be in the project to reflect any new information that you've learned or changes that require updates to these instructions files.
ALWAYS check your work before returning control to the user. Run tests if available, verify builds, etc. Never return incomplete or unverified work to the user.
Be a good steward of terminal instances. Try and reuse existing terminals where possible and use the VS Code API to close terminals that are no longer needed each time you open a new terminal.
## Mandatory Coding Principles
These coding principles are mandatory:
1. Structure
- Use a consistent, predictable project layout.
- Group code by feature/screen; keep shared utilities minimal.
- Create simple, obvious entry points.
- Before scaffolding multiple files, identify shared structure first. Use framework-native composition patterns (layouts, base templates, providers, shared components) for elements that appear across pages. Duplication that requires the same fix in multiple places is a code smell, not a pattern to preserve.
2. Architecture
- Prefer flat, explicit code over abstractions or deep hierarchies.
- Avoid clever patterns, metaprogramming, and unnecessary indirection.
- Minimize coupling so files can be safely regenerated.
3. Functions and Modules
- Keep control flow linear and simple.
- Use small-to-medium functions; avoid deeply nested logic.
- Pass state explicitly; avoid globals.
4. Naming and Comments
- Use descriptive-but-simple names.
- Comment only to note invariants, assumptions, or external requirements.
5. Logging and Errors
- Emit detailed, structured logs at key boundaries.
- Make errors explicit and informative.
6. Regenerability
- Write code so any file/module can be rewritten from scratch without breaking the system.
- Prefer clear, declarative configuration (JSON/YAML/etc.).
7. Platform Use
- Use platform conventions directly and simply (e.g., WinUI/WPF) without over-abstracting.
8. Modifications
- When extending/refactoring, follow existing patterns.
- Prefer full-file rewrites over micro-edits unless told otherwise.
9. Quality
- Favor deterministic, testable behavior.
- Keep tests simple and focused on verifying observable behavior.
"Ask Questions If Underspecified" https://x.com/thsottiaux/status/2006624792531923266
---
name: ask-questions-if-underspecified
description: Clarify requirements before implementing. Do not use automatically, only when invoked explicitly.
---
# Ask Questions If Underspecified
## Goal
Ask the minimum set of clarifying questions needed to avoid wrong work; do not start implementing until the must-have questions are answered (or the user explicitly approves proceeding with stated assumptions).
## Workflow
### 1) Decide whether the request is underspecified
Treat a request as underspecified if, after exploring how to perform the work, you cannot do some or all of the following:
- Define the objective (what should change vs stay the same)
- Define "done" (acceptance criteria, examples, edge cases)
- Define scope (which files/components/users are in/out)
- Define constraints (compatibility, performance, style, deps, time)
- Identify environment (language/runtime versions, OS, build/test runner)
- Clarify safety/reversibility (data migration, rollout/rollback, risk)
If multiple plausible interpretations exist, assume it is underspecified.
### 2) Ask must-have questions first (keep it small)
Ask 1-5 questions in the first pass. Prefer questions that eliminate whole branches of work.
Make questions easy to answer:
- Optimize for scannability (short, numbered questions; avoid paragraphs)
- Offer multiple-choice options when possible
- Suggest reasonable defaults when appropriate (mark them clearly as the default/recommended choice; bold the recommended choice in the list, or if you present options in a code block, put a bold "Recommended" line immediately above the block and also tag defaults inside the block)
- Include a fast-path response (e.g., reply `defaults` to accept all recommended/default choices)
- Include a low-friction "not sure" option when helpful (e.g., "Not sure - use default")
- Separate "Need to know" from "Nice to know" if that reduces friction
- Structure options so the user can respond with compact decisions (e.g., `1b 2a 3c`); restate the chosen options in plain language to confirm
### 3) Pause before acting
Until must-have answers arrive:
- Do not run commands, edit files, or produce a detailed plan that depends on unknowns
- Do perform a clearly labeled, low-risk discovery step only if it does not commit you to a direction (e.g., inspect repo structure, read relevant config files)
If the user explicitly asks you to proceed without answers:
- State your assumptions as a short numbered list
- Ask for confirmation; proceed only after they confirm or correct them
### 4) Confirm interpretation, then proceed
Once you have answers, restate the requirements in 1-3 sentences (including key constraints and what success looks like), then start work.
## Question templates
- "Before I start, I need: (1) ..., (2) ..., (3) .... If you don't care about (2), I will assume ...."
- "Which of these should it be? A) ... B) ... C) ... (pick one)"
- "What would you consider 'done'? For example: ..."
- "Any constraints I must follow (versions, performance, style, deps)? If none, I will target the existing project defaults."
- Use numbered questions with lettered options and a clear reply format
```text
1) Scope?
a) Minimal change (default)
b) Refactor while touching the area
c) Not sure - use default
2) Compatibility target?
a) Current project defaults (default)
b) Also support older versions: <specify>
c) Not sure - use default
Reply with: defaults (or 1a 2a)
```
## Anti-patterns
- Don't ask questions you can answer with a quick, low-risk discovery read (e.g., configs, existing patterns, docs).
- Don't ask open-ended questions if a tight multiple-choice or yes/no would eliminate ambiguity faster.
"Best Practices" (project-specific; python) https://github.com/pydantic/pydantic-ai/blob/main/CLAUDE.md
## Best Practices
This is the list of best practices for working with the codebase.
### Rename a class
When asked to rename a class, you need to rename the class in the code and add a deprecation warning to the old class.
```python
from typing_extensions import deprecated


class NewClass: ...  # This class was renamed from OldClass.


@deprecated("Use `NewClass` instead.")
class OldClass(NewClass): ...
```
In the test suite, you MUST use `NewClass` instead of `OldClass`, and create a new test to verify the deprecation warning:
```python
import pytest


def test_old_class_is_deprecated():
    with pytest.warns(DeprecationWarning, match="Use `NewClass` instead."):
        OldClass()
```
In the documentation, you should not have references to the old class, only the new class.
### Writing documentation
Always reference Python objects with the "`" (backticks) around them, and link to the API reference, for example:
```markdown
The [`Agent`][pydantic_ai.agent.Agent] class is the main entry point for creating and running agents.
```
### Coverage
Every pull request MUST have 100% coverage. You can check the coverage by running `make test`.
Interactive Debugging with Playwright https://x.com/joshmanders/status/2008224952382804386
### Interactive debugging with Playwright
When browser tests fail or when implementing complex user flows, use Playwright's browser automation tools to debug and
verify behavior as a real user would experience it.
**When to use interactive Playwright debugging:**
- Browser tests are failing and you need to see what's happening visually
- Implementing a new user flow and want to verify it works end-to-end
- Investigating UI issues that only manifest in specific interaction sequences
- Verifying that the full stack (backend + frontend + real-time features) works together
- Debugging timing issues, race conditions, or async behavior in the UI
**How to use Playwright for debugging:**
1. **Launch a browser session** using the Playwright browser tools (available via `browser_navigate_Playwright`,
`browser_snapshot_Playwright`, `browser_click_Playwright`, etc.)
2. **Navigate through the app** as a real user would, filling forms, clicking buttons, waiting for responses
3. **Take snapshots** to verify page state and available interactions at each step
4. **Verify behavior** matches expectations before writing or fixing tests
5. **Close the browser** when debugging is complete
**Key principles:**
- **Test what users experience, not what code does.** If a test passes but a real user can't complete the flow, the test
is wrong.
- **Always verify the full stack.** Browser tests should catch integration issues that feature tests miss (e.g., missing
route definitions, incorrect Inertia props, broken client-side navigation).
- **Use snapshots liberally.** The `browser_snapshot_Playwright` tool shows you exactly what's on the page and what's
interactive—use it to understand state before taking actions.
- **Replicate real user behavior.** Fill forms the way users do, wait for visual feedback, verify error messages appear,
check that success states render correctly.
- **Debug failures interactively first.** Before fixing a failing test, use Playwright to manually walk through the flow
and see where it breaks. This often reveals the real issue faster than reading test output.
**Example debugging workflow:**
```
1. Browser test fails with "element not found"
2. Launch Playwright browser and navigate to the failing page
3. Take snapshot to see actual page state
4. Discover the element exists but has a different selector than expected
5. Update test with correct selector
6. Re-run test to verify fix
7. Close browser session
```
**Common debugging scenarios:**
- **Form submission issues**: Navigate to form, fill fields, submit, verify redirect and success message
- **Authentication flows**: Sign in, verify dashboard loads, check user state is correct
- **Real-time updates**: Trigger backend event, verify frontend updates via WebSocket
- **Navigation flows**: Click through multi-step processes, verify each step renders correctly
- **Error handling**: Trigger validation errors, verify error messages appear and are clearable
**Tools available:**
- `browser_navigate_Playwright` — Navigate to URLs
- `browser_snapshot_Playwright` — Capture page state (better than screenshots for understanding structure)
- `browser_click_Playwright` — Click elements
- `browser_type_Playwright` — Fill form fields
- `browser_fill_form_Playwright` — Fill multiple fields at once
- `browser_wait_for_Playwright` — Wait for text to appear/disappear or time to pass
- `browser_take_screenshot_Playwright` — Visual verification
- `browser_close_Playwright` — Clean up when done
**Remember:** The goal is not just to make tests pass, but to ensure real users can successfully complete the flow. If
you can't replicate the user flow manually via Playwright, the feature is broken regardless of what tests say.
(C Sharp) https://github.com/restsharp/RestSharp/blob/1abceabf9c104d5da16e99b87e2baea9f776d0de/agents.md
(React/Typescript Fullstack) https://sourcegraph.com/github.com/outline/outline@2d1092a2ca9a919038d8c107a786b4b0e3626b52/-/blob/AGENTS.md?view=blame
Are you making a metaphor with "a rock and a hard place"? I don't understand why you chose these two words specifically, and I think I am missing something.
Not sure why true randomness is relevant to detecting simulations or escaping. Are you thinking of something along the lines of detecting the simulation by cracking the pseudorandom generator behind the scenes?
It also doesn't seem to me that detection and escape are that directly related.
My knowledge level: I read the Metaculus FAQ a couple of days ago.
At least on Metaculus, the prize pool is distributed among everyone with good enough accuracy, rather than winner-takes-all. So it shouldn't be affected by the (real) phenomenon you are describing.
What coding prompts (AGENTS.md / Cursor rules / skills) do you guys use? It seems exceedingly difficult to find good ones. GitHub is full of unmaintained & garbage `awesome-prompts-123` repos. I would like to learn from other people's prompts to see what things AIs keep getting wrong and what tricks people use.
Here are mine, for my specific Python FastAPI SQLAlchemy project. Some parts are AI-generated, some are handwritten; it should be pretty obvious which is which. This was built iteratively whenever the AI repeatedly failed at a type of task.
AGENTS.md
# Repository Guidelines
## Project Overview
This is a FastAPI backend for a peer review system in educational contexts, managing courses, assignments, student allocations, rubrics, and peer reviews. The application uses SQLAlchemy ORM with a PostgreSQL database, following Domain-Driven Design principles with aggregate patterns. Core domain entities include Course, Section, Assignment, Allocation (peer review assignments), Review, and Rubric with associated items.
This project is pre-alpha; backwards compatibility is unimportant.
## General Principles
- Don't over-engineer a solution when a simple one is possible. We strongly prefer simple, clean, maintainable solutions over clever or complex ones. Readability and maintainability are primary concerns, even at the cost of conciseness or performance.
- If you want an exception to ANY rule, YOU MUST STOP and get explicit permission from the user first. BREAKING THE LETTER OR SPIRIT OF THE RULES IS FAILURE.
- Work hard to reduce code duplication, even if the refactoring takes extra effort. This includes trying to locate the "right" place for shared code (e.g., utility modules, base classes, mixins); don't blindly add helpers to the current module.
- Use Domain-Driven Design principles where applicable.
## SQLAlchemy Aggregate Pattern
We use a parent-driven (inverse) style for DDD aggregates where child entities cannot be constructed with a parent reference.
**Rules:**
- Child→parent relationships must have `init=False` (e.g., `Allocation.assignment`, `Review.assignment`, `RubricItem.rubric`, `Section.course`)
- Parent→child collections must have `cascade="all, delete-orphan", single_parent=True, passive_deletes=True`
- Always use explicit `parent.children.append(child)` after creating the child entity
- Never pass the parent as a constructor argument: `Child(parent=parent)` ❌ → `child = Child(); parent.children.append(child)` ✅
Additional rules (aggregate-root enforcement):
- Never manually assign parent foreign keys (e.g., `child.parent_id = parent.id`).
- Do not perform cross-parent validations inside child methods.
- Let SQLAlchemy set foreign keys via relationship management (append child to parent collection).
- Enforce all aggregate invariants at the aggregate root using object-graph checks (e.g., `section in course.sections`).
Service layer patterns:
- **Mutations** (create, update, delete): Always return the aggregate root.
- **Queries** (get, list): May return child entities directly for convenience, especially when the caller needs to access a specific child by ID.
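To make the pattern concrete, here is a minimal sketch under some assumptions: the models are illustrative stand-ins (not the project's real `Course`/`Section` definitions), and the mapping uses SQLAlchemy 2.0's `MappedAsDataclass` declarative style, which is what gives `init=False` its meaning.
```python
import uuid

from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, MappedAsDataclass, mapped_column, relationship


class Base(MappedAsDataclass, DeclarativeBase):
    """Illustrative declarative base."""


class Course(Base):
    """Aggregate root (illustrative stand-in, not the real model)."""

    __tablename__ = "course"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default_factory=uuid.uuid4)
    name: Mapped[str] = mapped_column(default="")
    # Parent->child collection with the required cascade settings.
    sections: Mapped[list["Section"]] = relationship(
        back_populates="course",
        cascade="all, delete-orphan",
        single_parent=True,
        passive_deletes=True,
        default_factory=list,
    )


class Section(Base):
    """Child entity; cannot be constructed with a parent reference."""

    __tablename__ = "section"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default_factory=uuid.uuid4)
    # Child->parent side is init=False: never passed to the constructor and never
    # assigned manually; SQLAlchemy fills in the FK when the child is appended
    # to the parent collection.
    course_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("course.id", ondelete="CASCADE"), init=False)
    course: Mapped["Course"] = relationship(back_populates="sections", init=False)


# Usage: create the child standalone, then attach it through the aggregate root.
course = Course(name="Algorithms")
section = Section()
course.sections.append(section)  # not Section(course=course), not section.course_id = course.id
```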
## Code Style
- 120-character lines
- Type hints are a must, even for tests and fixtures!
- **Don't use Python 3.8 typings**: Never import `List`, `Tuple`, or other deprecated aliases from `typing`; use `list`, `tuple`, etc. instead, or import from `collections.abc`.
- Do not use `from __future__ import annotations`, use forward references in type hints instead.
- `TYPE_CHECKING` should be used only for imports that would cause circular dependencies. If you really need it, import the submodule rather than the symbol directly, and every usage of the imported symbols must be a fully specified forward-reference string (e.g., `a.b.C` rather than just `C`); see the sketch after this list.
- Strongly prefer organizing hardcoded values as constants at the top of the file rather than scattering them throughout the code.
- Always import at the top of the file, unless you have a very good reason. (Hey Claude Opus, this is very important!)
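A short sketch of the `TYPE_CHECKING` rule above, using a hypothetical module path (`app.models.course`) purely for illustration:
```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Import the submodule, not the symbol, to break the circular dependency.
    import app.models.course  # hypothetical module path


def section_count(course: "app.models.course.Course") -> int:
    """The annotation is a fully specified forward-reference string (a.b.C, not C)."""
    return len(course.sections)
```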
## Route Logging Policy
- FastAPI route handlers only log when translating an exception into an HTTP 5xx response. Use `logger.exception` so the stack trace is captured.
- Never log when returning 4xx-class responses from routes; those are user or client errors and can be diagnosed from the response body and status code alone.
- Additional logging inside services or infrastructure layers is fine when it adds context, but avoid duplicating the same exception in multiple places.
**Why?**
- 5xx responses indicate a server bug or dependency failure, so capturing a single structured log entry with the traceback keeps observability noise-free while still preserving root-cause evidence.
- Omitting logs for expected 4xx flows prevents log pollution and keeps sensitive user input (which often appears in 4xx scenarios) out of centralized logging systems.
- Using `logger.exception` standardizes the output format and guarantees stack traces are emitted regardless of the specific route module.
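A sketch of how a handler might follow this policy; the endpoint, service stub, and exception name are illustrative, not the project's real ones:
```python
import logging

from fastapi import APIRouter, HTTPException

logger = logging.getLogger(__name__)
router = APIRouter()


class CourseNotFoundError(Exception):
    """Stand-in for a domain exception from exceptions.py."""


async def fetch_course(course_id: str) -> dict[str, str]:
    """Stand-in for a service-layer call."""
    raise CourseNotFoundError(course_id)


@router.get("/courses/{course_id}")
async def get_course(course_id: str) -> dict[str, str]:
    """Fetch a single course (illustrative route)."""
    try:
        return await fetch_course(course_id)
    except CourseNotFoundError:
        # 4xx path: no logging; the status code and response body are enough.
        raise HTTPException(status_code=404, detail="Course not found")
    except Exception:
        # 5xx path: a single logger.exception call captures the traceback.
        logger.exception("Unhandled error while fetching course %s", course_id)
        raise HTTPException(status_code=500, detail="Internal server error")
```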
### Using deal
We only use the exception handling features of deal. Use `@deal.raises` to document expected exceptions for functions/methods. Do not use preconditions/postconditions/invariants.
Additionally, we assume `AssertionError` is never raised, so `@deal.raises(AssertionError)` is not allowed.
Use the exception hierarchy defined in exceptions.py for domain and business logic errors. For Pydantic validators, continue using `ValueError`.
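A minimal sketch of the `@deal.raises` usage described here; `CourseNotFoundError` is a hypothetical stand-in for an exception from the project's exceptions.py hierarchy:
```python
import deal


class CourseNotFoundError(Exception):
    """Stand-in for a domain exception from exceptions.py."""


@deal.raises(CourseNotFoundError)
def get_course_name(courses: dict[str, str], course_id: str) -> str:
    """Documents that this function is only expected to raise CourseNotFoundError."""
    if course_id not in courses:
        raise CourseNotFoundError(course_id)
    return courses[course_id]
```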
## Documentation and Comments
Add code comments sparingly. Focus on why something is done, especially for complex logic, rather than what is done. Only add high-value comments if necessary for clarity or if requested by the user. Do not edit comments that are separate from the code you are changing. NEVER talk to the user or describe your changes through comments.
### Google-style docstrings
Use Google-style docstrings for all public or private functions, methods, classes, and modules.
For functions (excluding FastAPI routes), always include the "Args" section unless the function has no arguments. Include "Raises" if anything is raised. Include "Returns" if the function returns a complex type that is not obvious from the signature. Optionally include an "Examples" section for complex functions.
FastAPI Routes: Use concise summary docstrings that describe the business logic and purpose. Omit Args/Raises/Returns sections since these are documented via decorators (response_model, responses), type hints, and Pydantic models. The docstring may appear in generated API documentation.
For classes, include an "Attributes:" section if the class has attributes. Additionally, put each attribute's description in the "docstring" of the attribute itself. For dataclasses, this is a triple-quoted string right after the field definition. For normal classes, this is a triple-quoted string in either the class body or the first appearance of the attribute in the `__init__` method, depending on where the attribute is defined.
For modules, include a brief description at the top.
Additionally, for module-level constants, include a brief description right after the constant definition.
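Putting these docstring rules together in one illustrative (hypothetical) module:
```python
"""Utilities for scoring peer reviews (illustrative module docstring)."""

from dataclasses import dataclass

MAX_SCORE: int = 100
"""Upper bound for a single rubric item score."""


@dataclass
class ScoreSummary:
    """Aggregated scores for one review.

    Attributes:
        total: Sum of all rubric item scores.
        item_count: Number of rubric items scored.
    """

    total: int
    """Sum of all rubric item scores."""

    item_count: int
    """Number of rubric items scored."""


def average_score(summary: ScoreSummary) -> float:
    """Compute the mean rubric item score.

    Args:
        summary: Aggregated scores for one review.

    Raises:
        ValueError: If the summary contains no scored items.
    """
    if summary.item_count == 0:
        raise ValueError("Cannot average an empty summary.")
    return summary.total / summary.item_count
```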
### Using a new environment variable
When using a new environment variable, add it to `.env.example` with a placeholder value, and optionally a comment describing its purpose. Also add it to the `Environment Variables` section in `README.md`.
## Testing Guidelines
Tests are required for all new features and bug fixes. Tests should be written using `pytest`. Unless the user explicitly requests not to add tests, you must add them.
More detailed testing guidelines can be found in [tests/AGENTS.md](tests/AGENTS.md).
## GitHub Actions & CI/CD
- When adding or changing GitHub Actions, always search online for the newest version and use the commit hash instead of version tags for security and immutability. (Use `gh` CLI to find the commit hash, searching won't give you helpful results.)
## Commit & Pull Requests
- Messages: imperative, concise, scoped (e.g., “Add health check endpoint”). Include an extended description if necessary, explaining why the change was made.
## Information
Finding dependencies: we use `pyproject.toml`, not `requirements.txt`. Use `uv add <package>` to add new dependencies.
tests/AGENTS.md
# Testing Guidelines
Mocking is heavily discouraged. Use test databases, test files, and other real resources instead of mocks wherever possible.
### Running Tests
Use `uv run pytest ...` instead of simply `pytest ...` so that the virtual environment is activated for you.
By default, slow and docker tests are skipped. To run them, use `uv run pytest -m "slow or docker"`.
## Writing Tests
When you are writing tests, it is likely that you will need to iterate a few times to get them right. Please triple check before doing this:
1. Write a test
2. Run it and see it fail
3. **Change the test itself** to make it pass
There is a chance that the test itself is wrong, yes. But there is also a chance that the code being tested is wrong. You should carefully consider whether the code being tested is actually correct before changing the test to make it pass.
### Writing Fixtures
Put fixtures in `tests/conftest.py` or `tests/fixtures/` if there are many. Do not put them in individual test files unless they are very specific to that file.
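For example, a shared fixture in `tests/conftest.py` might look like this (the fixture and its payload are hypothetical):
```python
import pytest


@pytest.fixture
def sample_course_payload() -> dict[str, str]:
    """Shared payload reused across test modules."""
    return {"name": "Test Course", "term": "Fall"}
```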
### Markers
Allowed pytest markers:
- `@pytest.mark.slow`
- `@pytest.mark.docker`
- `@pytest.mark.flaky`
- builtin ones like `skip`, `xfail`, `parametrize`, etc.
We do not use
- `@pytest.mark.unit`: all tests are unit tests by default
- `@pytest.mark.integration`: integration tests are run by default too, no need to mark them specially. Use the `slow` or `docker` markers if needed.
- `@pytest.mark.asyncio`: we use `pytest-asyncio` which automatically handles async tests
- `@pytest.mark.anyio`: we do not use `anyio`
## Editing Tests
### Progressive Enhancement of Tests
We have some modern patterns that are not yet used everywhere in the test suite. When you are editing an existing test, consider updating it to use these patterns.
1. If the test creates sample data directly, change it to use factory functions or classes from `tests/testkit/factories.py`.
2. If the test depends on multiple services, change it to use the `test_context` fixture. This is an object that contains clients for all services, and handles setup and teardown for you, with utility methods to make common tasks easier.
3. We are migrating from using individual `shared_..._service` fixtures (e.g., `shared_assignment_service`, `shared_user_service`) to the `test_context` fixture. When editing tests that use these, please refactor them to use `test_context` instead.
4. Integration tests are being refactored to use service-layer setup (`db_test_context`) instead of verbose API calls for prerequisites. This reduces setup code from ~15-30 lines to ~3-5 lines, making tests faster and more focused on testing actual API behavior rather than setup logic.
**Example**:
```python
# OLD: Verbose API setup
course_response = await authenticated_client.post("/courses", json={"name": "Test"})
course_id = uuid.UUID(course_response.json()["id"])
rubric_id = await _create_rubric(authenticated_client, course_id)
assignment = await authenticated_client.create_assignment(course_id, rubric_id=rubric_id)
# NEW: Clean service-layer setup
course = await db_test_context.create_course(name="Test")
rubric = await db_test_context.create_rubric(course_id=course.id)
assignment = await authenticated_client.create_assignment(course.id, rubric_id=rubric.id)
```
## Patterns for Common Testing Scenarios
### Sample Data Creation
Use factory functions or classes to create sample data for tests; these are located in `tests/testkit/factories.py`. Avoid duplicating sample data creation logic across tests.
(We are in the process of migrating to factory functions/classes, so you may still see some tests creating sample data directly. Please use the factories for any tests you write or update.)
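As an illustration, a factory in `tests/testkit/factories.py` might look roughly like this; the real factories may have different names and signatures:
```python
import uuid
from typing import Any


def make_course_payload(name: str = "Test Course", term: str = "Fall", **overrides: Any) -> dict[str, Any]:
    """Build sample course data; tests override only the fields they assert on."""
    payload: dict[str, Any] = {"id": str(uuid.uuid4()), "name": name, "term": term}
    payload.update(overrides)
    return payload
```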
### Testing the FastAPI Application
The FastAPI application can be imported as a default instance or created via a factory function.
- Using the default `app` instance is the preferred approach for most tests
- Use the `create_app()` factory when the app configuration itself is what you're testing, as sketched below
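A hedged sketch of the two approaches, assuming the application module exposes `app` and `create_app()` as described (the import path and endpoint are hypothetical) and that tests go through httpx's ASGI transport:
```python
import httpx

from app.main import app, create_app  # hypothetical import path


async def test_health_with_default_app() -> None:
    """Most tests: use the default `app` instance."""
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/health")  # hypothetical endpoint
        assert response.status_code == 200


async def test_app_configuration() -> None:
    """When the app configuration itself is under test, build a fresh instance."""
    custom_app = create_app()  # pass configuration overrides here if the factory supports them
    transport = httpx.ASGITransport(app=custom_app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.get("/health")
        assert response.status_code == 200
```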
I believe people were using PredictionBook before and switched to Fatebook.
Relevant search for people who publicly posted on LessWrong: https://www.lesswrong.com/search?query=calibration&page=1
I'm a bottom-docker but I get your point. For use cases like this I like Ctrl+Shift+M a bit more (only usable with DevTools open in Chrome). Still, I consider this a suboptimal design.
I found the website design a bit annoying because my eyes need to jump around for every item. When I'm reading the text, my eyes are roughly focused on the big ellipse, and then I need to jump to the right to see the evidence strength and back. It seems like there is enough space to just move them below the one-line summary.
(Reposted from my shortform)
(I also collected a couple of other prompts here, but it takes too much screen real estate if I repost everything.)