Overview
We tested whether collecting an LLM's actual past failures, abstracting them into reusable patterns, and injecting them into prompts reduces error repetition.
Task: Korean document-grounded fact verification (avg. 1,424 words per document, 5 domains)
Core finding: Accuracy went from 81.2% to 90.0% (p=0.032), but the mechanism was not what we initially expected.
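To make "injecting failures into prompts" concrete, here is a minimal sketch of a prompt builder that prepends abstracted failure patterns to a verification prompt. The function, prompt wording, and label set are illustrative assumptions, not the exact format used in the experiment.

```python
def build_prompt(document: str, claim: str, failure_patterns: list[dict]) -> str:
    """Prepend abstracted past failures (Pattern / Signal / Lesson) to the task prompt.

    Hypothetical format: the experiment's actual prompt wording is not shown here.
    """
    lessons = "\n\n".join(
        f"Pattern: {p['pattern']}\nSignal: {p['signal']}\nLesson: {p['lesson']}"
        for p in failure_patterns
    )
    return (
        "You verify whether a claim is supported by the document.\n\n"
        + (f"Past failure patterns to watch for:\n{lessons}\n\n" if failure_patterns else "")
        + f"Document:\n{document}\n\n"
        + f"Claim: {claim}\n"
        + "Answer: Supported or Not Supported."
    )
```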
5-Arm Experiment Results
| Condition | Accuracy | NS F1 | vs Baseline | p-value |
|---|---|---|---|---|
| A - Baseline | 81.2% | 0.857 | — | — |
| B - Static toy examples | 82.5% | 0.870 | +1.2%p | 0.501 |
| D - Length control | 82.5% | 0.868 | +1.2%p | 0.512 |
| B' - Random real failures | 91.2% | 0.939 | +10.0%p | 0.010 |
| C - Retrieved failures | 90.0% | 0.931 | +8.8%p | 0.032 |
There was no significant difference between B' and C (p=0.749): retrieval added no measurable value over randomly sampled real failures.
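For context on how such pairwise p-values can be computed: every condition scores the same 80 items, so a paired test on per-item correctness is natural. The sketch below uses an exact McNemar test; this is one reasonable choice, not necessarily the test that produced the numbers above.

```python
from scipy.stats import binomtest

def mcnemar_exact(correct_a: list[bool], correct_b: list[bool]) -> float:
    """Exact McNemar test on paired per-item correctness for two conditions."""
    n01 = sum((not a) and b for a, b in zip(correct_a, correct_b))  # A wrong, B right
    n10 = sum(a and (not b) for a, b in zip(correct_a, correct_b))  # A right, B wrong
    n = n01 + n10
    if n == 0:
        return 1.0  # no disagreements, no evidence of a difference
    # Under H0 the discordant pairs split 50/50; two-sided exact binomial test.
    return binomtest(n01, n, p=0.5, alternative="two-sided").pvalue
```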
Effect Decomposition
| Factor | Delta | Significant? | Interpretation |
|---|---|---|---|
| Length/attention (A→D) | +1.2%p | No | Longer prompts don't help |
| Failure content (D→B') | +8.8%p | Yes | This is the driver |
| Retrieval (B'→C) | -1.2%p | No | No added value |
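The three factors are additive by construction; a quick sanity check with the rounded figures from the tables:

```python
# Additive decomposition of the overall A→C gain (rounded accuracies from the tables)
length_effect    = 82.5 - 81.2   # A→D
content_effect   = 91.2 - 82.5   # D→B'
retrieval_effect = 90.0 - 91.2   # B'→C
total = length_effect + content_effect + retrieval_effect  # ≈ +8.8%p, the A→C gap
```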
Per-Error-Type Improvement
| Error Type | Baseline Accuracy | With Failure Examples | Delta |
|---|---|---|---|
| Factual Mismatch | 100% | 100% | 0%p |
| Negation Flip | 62% | 100% | +38%p |
| Condition/Intensity | 62% | 77% | +15%p |
| Certainty/Status | 46% | 77% | +31%p |
| Ungrounded Reasoning | 100% | 100% | 0%p |
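For reference, a sketch of how per-type accuracy can be computed, assuming each test item is tagged with the error type it probes (the field names are hypothetical):

```python
from collections import defaultdict

def accuracy_by_error_type(results):
    """results: iterable of dicts like {"error_type": str, "correct": bool}."""
    totals, hits = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["error_type"]] += 1
        hits[r["error_type"]] += r["correct"]
    return {t: hits[t] / totals[t] for t in totals}
```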
3-Phase Pipeline
Phase 1: Run 80 items, collect 13 failures (16.2% error rate)
Phase 2: Abstract each failure into Pattern / Signal / Lesson
Phase 3: Test on 80 completely new items across 5 conditions
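A compressed sketch of how the three phases fit together. `llm`, `abstract_failure`, `make_prompt`, and the item fields are hypothetical stand-ins, not the actual implementation; condition B' is shown for Phase 3.

```python
import random

def run_pipeline(phase1_items, phase3_items, llm, abstract_failure, make_prompt):
    """Hypothetical end-to-end sketch of the 3-phase pipeline."""
    # Phase 1: run the plain prompt on the calibration items and keep the failures.
    failures = []
    for item in phase1_items:
        pred = llm(make_prompt(item, []))
        if pred != item["label"]:
            failures.append({"item": item, "prediction": pred})

    # Phase 2: abstract each concrete failure into a Pattern / Signal / Lesson triple.
    patterns = [abstract_failure(f) for f in failures]

    # Phase 3: evaluate on fresh items, here with 3 failure patterns drawn at random
    # per item (condition B').
    correct = 0
    for item in phase3_items:
        pred = llm(make_prompt(item, random.sample(patterns, k=3)))
        correct += pred == item["label"]
    return correct / len(phase3_items)
```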
Why Random Works
With 77% of failures concentrated in types 5 and 6, the probability that a random draw of 3 failure examples includes none from those dominant types is only about 1.2%.
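One way to reproduce that figure, assuming three independent draws at the observed 77% share:

```python
# P(all 3 sampled failures miss the dominant types), assuming independent draws
p_all_irrelevant = (1 - 0.77) ** 3   # ≈ 0.012, i.e. about 1.2%
```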