Markscanner
/ Markscanner team / grading

A review-first workflow for grading handwritten math tests

Grade a class set by reviewing flagged and ambiguous answers first, then approving the rest. A concrete walkthrough, plus where AI-drafted marks tend to be safe and where they need a closer look.

Most of the time it takes to grade a class set of handwritten math isn’t spent on the hard questions. It’s spent on the easy ones — writing “great work” thirty times, recomputing the same sum to confirm the same number, marking the same clean factorisation for the same six marks on paper after paper.

A review-first workflow flips that. Instead of going A to Z through every paper, you look at the small subset of answers that actually need a human decision first, and approve the rest in a pass. Here’s what that looks like in practice.

Step 1. Mark the rubric before the papers

This is the only step that happens before you look at any student work. Decide what a full mark looks like per question, what a half looks like, and what counts as a valid alternate form. Write it down. If you don’t, you’ll drift between the first paper and the thirtieth, and the late-pile students will feel it.

If you’re using a tool to draft marks for you, this is where you hand over the rubric. Nothing else the tool does matters if the rubric is vague.
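Handing the rubric over works best when it's written as structured data rather than prose. A minimal sketch of what that might look like in Python; every field name here is illustrative, not a Markscanner format:

```python
# A hypothetical per-question rubric, decided before any papers are read.
# Field names ("marks", "full", "half", "accepted_forms") are made up
# for illustration -- use whatever shape your tool actually expects.
rubric = {
    "q1": {
        "marks": 6,
        "full": "correct factorisation with both factors shown",
        "half": "correct method, single sign error",
        "accepted_forms": ["(x + 2)(x - 3)", "(x - 3)(x + 2)"],
    },
    "q2": {
        "marks": 4,
        "full": "sum computed with working shown",
        "half": "working shown, arithmetic slip in the last step",
        "accepted_forms": ["integer", "simplified fraction"],
    },
}
```

The point isn't the exact schema; it's that "what counts as full, half, and a valid alternate form" exists as a checkable artifact before paper one.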

Step 2. Triage by confidence, not by paper

Once the first pass of marks exists — whether you drafted them yourself on a first read-through or a tool drafted them for you — don’t jump straight into proofreading from student one. Sort the answers by how confident you (or the tool) are in the mark.

Three buckets:

  1. Clearly correct, clearly complete. Clean work, right method, right answer, full marks. Ninety per cent of the class usually lives here on most questions.
  2. Clearly wrong. Wrong method, blank, or a final answer that obviously doesn’t follow from the work.
  3. Actually uncertain. Messy handwriting, unusual approach, partial work, a step that could be read two ways, a final answer in a form that’s equivalent but not identical to the expected one.

Bucket 3 is what you’re being paid to look at. Buckets 1 and 2 are where you save time.
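The triage itself is mechanical once each drafted mark carries a confidence score. A sketch in Python, assuming each answer is a dict with a `confidence` in [0, 1] and a draft `correct` verdict; both field names and the threshold are assumptions, not anything a particular tool guarantees:

```python
def triage(answers, high=0.9):
    """Sort drafted marks into the three buckets from the workflow.

    High-confidence answers split into clearly-correct / clearly-wrong
    by the draft verdict; everything else lands in the uncertain bucket.
    The 0.9 threshold is illustrative, not tuned.
    """
    buckets = {"clearly_correct": [], "clearly_wrong": [], "uncertain": []}
    for answer in answers:
        if answer["confidence"] >= high:
            key = "clearly_correct" if answer["correct"] else "clearly_wrong"
        else:
            key = "uncertain"
        buckets[key].append(answer)
    return buckets
```

Note that the sort key is confidence, not student: the same student's paper can contribute answers to all three buckets.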

Step 3. Deal with the uncertain bucket first

Work through the flagged answers one at a time. This is the only part of marking where you need full attention: these are the answers where the mark and the comment actually depend on you. Don’t rush it. This is where a consistent rubric earns its keep — you’ll see cases that sit right on the boundary, and you want the same decision both times.

A few rules of thumb that save time here:

  • Read the setup first, the final answer last. If the setup is wrong, the final answer is almost incidental.
  • Give credit for any valid form. If the rubric allows “factored or expanded” and the student wrote one of those, that’s the mark. You don’t need to reconvert.
  • When in doubt, lean toward the student. You can always lower a mark on a regrade; an unfairly low mark that doesn’t get raised costs the student a conversation they shouldn’t have had to start.
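The "any valid form" rule can even be checked mechanically for symbolic answers. One illustrative approach, not a full computer-algebra check: evaluate both expressions at random points and see whether they agree everywhere you look. Expressions here are written in Python syntax with explicit `*`:

```python
import random

def same_expression(expr_a, expr_b, trials=20, tol=1e-9):
    """Heuristic check that two single-variable expressions are equivalent.

    Evaluates both at random points and compares. A sketch, not a CAS:
    for polynomials, agreeing at 20 random points is overwhelming
    evidence of equality, but this won't catch domain issues in
    general expressions.
    """
    for _ in range(trials):
        x = random.uniform(-10.0, 10.0)
        if abs(eval(expr_a, {"x": x}) - eval(expr_b, {"x": x})) > tol:
            return False
    return True
```

So `(x - 3)*(x + 2)` and `(x + 2)*(x - 3)` come back equivalent without anyone reconverting the student's form by hand.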

Step 4. Skim the confident buckets

Now you open the “clearly correct” and “clearly wrong” buckets and skim. You’re not re-marking. You’re spot-checking. A good rhythm: read one in five at random, plus the first and last paper in each bucket. If nothing looks off after eight or ten skims, approve the whole bucket and move on.
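That rhythm is easy to make precise. A sketch in Python of the "first, last, and one in five at random" sample, treating the bucket as a plain list of papers:

```python
import random

def spot_check_sample(bucket, every=5):
    """Pick which papers in a confident bucket to skim.

    Always includes the first and last paper, plus roughly one in
    `every` chosen at random from the middle. Returns the picks in
    bucket order.
    """
    if len(bucket) <= 2:
        return list(bucket)
    picks = {0, len(bucket) - 1}
    middle = range(1, len(bucket) - 1)
    k = min(max(1, len(bucket) // every), len(middle))
    picks.update(random.sample(middle, k))
    return [bucket[i] for i in sorted(picks)]
```

For a thirty-paper bucket that's eight skims, which matches the "eight or ten" rule of thumb above.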

The first time you do this, it feels wrong. Marking is supposed to feel thorough, and skimming doesn’t. But the thorough work has already happened — it happened in step 3. Steps 2 and 4 exist because the human effort gets more accurate when it’s concentrated, not diluted.

Where AI-drafted marks tend to be safe — and where they don’t

A year of watching teachers use draft marks on handwritten math suggests a rough pattern:

Usually safe:

  • Clean final answers on standard forms (integers, simplified fractions, decimals to stated precision).
  • Showing-your-work questions where the worked steps match a rubric-listed method.
  • Short-answer questions on routine procedures: solve, factor, evaluate, simplify.

Needs a closer look:

  • Multi-step problems where the student used a valid but unusual approach.
  • Answers in a form the rubric didn’t explicitly list (e.g., writing (x - 3)(x + 2) when the rubric said (x + 2)(x - 3)).
  • Geometry diagrams and graphing, where the “answer” is visual.
  • Anything the student clearly rushed, scribbled, or corrected mid-stream.
  • Word problems where the student wrote prose that carries part of the reasoning.

None of that list should be surprising — it’s the same list of places a second human would want a look. The value of flagging isn’t to tell you something you didn’t know. It’s to let you find those cases without reading every paper from the top.

Step 5. Approve and generate feedback

Once buckets 1 and 2 are spot-checked and bucket 3 is worked through, you approve. The last step is feedback. This is the easiest place to cut corners that shouldn’t be cut — a feedback sheet with real per-question notes is worth a lot more than a total at the top of the paper, and you’ve already done most of the thinking in step 3.

If you’re doing this by hand, copy-paste your bucket-3 comments into a per-student summary. If you’re using a tool to draft comments, this is the part where you read the drafts and make sure they sound like feedback from you, not from a chatbot.

Then print, staple, return. The papers go back Tuesday, not next Monday, and you still had a weekend.