
VIII. DISCUSSION
A. Limitations
Absence blindness is a structural property of the NTP
objective, but it is not absolute. Several mechanisms partially
counteract it:
1) Instruction following: Fine-tuning on instruction-following
data can encode checklist-like behaviors implicitly, so that
the model completes structured tasks without an explicit
checklist. Heo et al. [32] show that internal states encode
instruction compliance, suggesting that instruction-tuned
models may partially internalize presence manifests.
However, this effect is task-specific and does not
generalize to novel gap types.
2) Emergent planning: Dong et al. [33] show that LLM hidden
representations encode future output structure beyond
the immediate next token, suggesting that deep models
implicitly plan ahead. However, this emergent planning
operates over probable continuations, not over
absent-but-required deliverables: the gap remains invisible
even if the model has latent planning capacity.
3) Test-time compute scaling: Extended chain-of-thought
reasoning (e.g., OpenAI o1/o3 models) introduces more
generation steps, potentially allowing gaps to become
visible as the model articulates its reasoning. However,
Heyman and Zylberberg [11] demonstrate that even o3-mini
maintains hallucination patterns driven by
absent-but-filled edges, suggesting that extended thinking
does not resolve absence blindness.
B. Ethical Implications
Absence blindness has direct safety implications for
high-stakes LLM deployments. In clinical documentation, Mess et
al. [14] warn that “insidious and potentially significant errors
of omission” in AI-generated medical notes can harm patients.
In legal and financial contexts, an unverified claim that is
absent from the model’s epistemic ground may be generated
as confident output with no signal of uncertainty. The EPP
framework provides an actionable principle for safety engineers:
every high-stakes deployment must specify a presence manifest
and a gap-hunting mechanism.
C. Theoretical Connection to Formal Languages
The Gap-Hunter scaffold shares structural properties with
runtime verification in formal methods [34]: it acts as a
finite automaton over task states (PENDING, SATISFIED, FAILED)
that monitors execution against a specification. LLM agents
operating without such a monitor are analogous to programs
running without assertions: they may terminate successfully or
silently violate invariants, with no observable difference at
the output level.
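The analogy to runtime verification can be illustrated with a minimal sketch. The Python below is illustrative only: the class name DeliverableMonitor, the transition table ALLOWED, and the deliverable names are assumptions introduced here, not part of the Gap-Hunter scaffold as specified. It models each required deliverable as a small automaton over the three task states and reports every item that is not SATISFIED.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    SATISFIED = "SATISFIED"
    FAILED = "FAILED"


# Assumed transition relation (not taken from the paper): a PENDING item
# may be resolved either way, a FAILED item may be retried, and SATISFIED
# is treated as absorbing.
ALLOWED = {
    State.PENDING: {State.SATISFIED, State.FAILED},
    State.FAILED: {State.PENDING},
    State.SATISFIED: set(),
}


class DeliverableMonitor:
    """Hypothetical runtime monitor: one small automaton per deliverable."""

    def __init__(self, deliverables):
        # Every required deliverable starts out PENDING.
        self.states = {name: State.PENDING for name in deliverables}

    def observe(self, name, new_state):
        """Record an event, rejecting transitions outside ALLOWED."""
        current = self.states[name]
        if new_state not in ALLOWED[current]:
            raise ValueError(
                f"illegal transition {current.value} -> {new_state.value}"
            )
        self.states[name] = new_state

    def violations(self):
        """Deliverables that are not SATISFIED, in manifest order."""
        return [n for n, s in self.states.items() if s is not State.SATISFIED]


monitor = DeliverableMonitor(["summary", "citations", "unit_tests"])
monitor.observe("summary", State.SATISFIED)
monitor.observe("citations", State.FAILED)
print(monitor.violations())  # ['citations', 'unit_tests']
```

In this sketch the never-attempted item (unit_tests) is reported alongside the failed one, which is the point of the analogy: without the monitor, an agent's output gives no indication that the item was never produced.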
IX. CONCLUSION
We have argued that LLMs are structurally blind to absence:
the next-token prediction objective confers influence only
on tokens that are present in context, and thus the model
cannot perceive missing deliverables, unverified claims, or
uncompleted task steps. We named this property absence
blindness, formalized it in terms of the NTP loss function, and
showed that the entire literature on LLM failure mitigation—
from RAG to checklist prompting to verification agents—
converges on a single design principle: gaps must be hunted
from the outside and made into present objects.
The Externalized Presence Principle is not a new technique
but a unifying explanation for why existing techniques work.
Its practical upshot is a design imperative: any system
deploying LLMs in high-stakes agentic settings must include
an external gap-hunter that enumerates required deliverables,
detects unsatisfied items after each generation step, and injects
gap objects back into context until the presence manifest is
complete. Absence, left unaddressed, is invisible. Presence,
explicitly constructed, is the only force that moves the next
token.
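The enumerate, detect, and inject loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (hunt_gaps, run_with_gap_hunter, toy_generate), the bracketed-marker convention for detecting a satisfied item, and the "MISSING:" gap-object format are all hypothetical, and toy_generate is a stand-in for a real model call.

```python
def hunt_gaps(manifest, context):
    """Return manifest items with no '[item]' marker in the context.

    Assumption: a deliverable counts as present only if its bracketed
    marker appears; a real detector would need a task-specific check.
    """
    return [item for item in manifest if f"[{item}]" not in context]


def run_with_gap_hunter(generate, manifest, prompt, max_steps=5):
    """Generate, detect unsatisfied items, and inject them as gap objects."""
    context = prompt
    for _ in range(max_steps):
        context += "\n" + generate(context)
        gaps = hunt_gaps(manifest, context)
        if not gaps:
            return context, []
        # Make each absence a present object the next step can condition on.
        context += "\n" + "\n".join(f"MISSING: {g}" for g in gaps)
    return context, hunt_gaps(manifest, context)


def toy_generate(context):
    """Stand-in for an LLM call: addresses only the latest gap it can see."""
    missing = [ln for ln in context.splitlines() if ln.startswith("MISSING: ")]
    if not missing:
        return "[summary] draft text"
    return f"[{missing[-1][len('MISSING: '):]}] added"


final, remaining = run_with_gap_hunter(
    toy_generate, ["summary", "risks", "sources"], "Write the report."
)
print(remaining)  # []
```

The toy generator never produces the risks or sources items until a gap object naming them appears in its context, so the manifest is completed only because the external loop injects those objects.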
REFERENCES
[1] L. Chen et al., “Next token prediction towards multimodal intelligence:
A comprehensive survey,” arXiv:2412.18619, Dec. 2024. [Online].
Available: https://arxiv.org/abs/2412.18619
[2] R. T. McCoy, S. Yao, D. Friedman, M. Hardy, and T. L. Griffiths,
“Embers of autoregression: Understanding large language models through
the problem they are trained to solve,” arXiv:2309.13638, Sep. 2023.
[Online]. Available: https://arxiv.org/abs/2309.13638
[3] G. Bachmann and V. Nagarajan, “The pitfalls of next-token prediction,”
arXiv:2403.06963, Jul. 2024. [Online].
Available: https://arxiv.org/abs/2403.06963
[4] S. M. Downes, P. Forber, and A. Grzankowski, “LLMs are not just next
token predictors,” arXiv:2408.04666, Aug. 2024. [Online].
Available: https://arxiv.org/abs/2408.04666
[5] S. Banerjee, A. Agarwal, and S. Singla, “LLMs will always hallucinate,
and we need to live with this,” arXiv:2409.05746, Sep. 2024. [Online].
Available: https://arxiv.org/abs/2409.05746
[6] Z. Xu, S. Jain, and M. Kankanhalli, “Hallucination is inevitable: An
innate limitation of large language models,” arXiv:2401.11817, Feb. 2025.
[Online]. Available: https://arxiv.org/abs/2401.11817
[7] A. Dejl et al., “Comprehensiveness metrics for automatic evaluation of
factual recall in text generation,” arXiv:2510.07926, Oct. 2025. [Online].
Available: https://arxiv.org/abs/2510.07926
[8] N. F. Liu et al., “Lost in the middle: How language models use long
contexts,” Trans. Assoc. Comput. Linguistics, vol. 12, 2024.
DOI: https://doi.org/10.1162/tacl_a_00638
[9] C.-Y. Hsieh et al., “Found in the middle: Calibrating positional attention
bias improves long context utilization,” arXiv:2406.16008, Jul. 2024.
[Online]. Available: https://arxiv.org/abs/2406.16008
[10] T. Lu, M. Gao, K. Yu, A. Byerly, and D. Khashabi, “Insights into
LLM long-context failures: When transformers know but don’t tell,”
arXiv:2406.14673, Oct. 2024. [Online].
Available: https://arxiv.org/abs/2406.14673
[11] A. Heyman and J. Zylberberg, “Reasoning large language model errors
arise from hallucinating critical problem features,” arXiv:2505.12151,
May 2025. [Online]. Available: https://arxiv.org/abs/2505.12151
[12] M. Omar et al., “Multi-model assurance analysis showing large language
models are highly vulnerable to adversarial hallucination attacks during
clinical decision support,” Commun. Med., 2025.
DOI: https://doi.org/10.1038/s43856-025-01021-3
[13] H. Li, H. Chi, M. Liu, and W. Yang, “Look within, why LLMs hallucinate:
A causal perspective,” arXiv:2407.10153, Jul. 2024. [Online].
Available: https://arxiv.org/abs/2407.10153
[14] S. A. Mess, A. Mackey, and D. E. Yarowsky, “Artificial intelligence
scribe and large language model technology in healthcare documentation:
Advantages, limitations, and recommendations,” Plast. Reconstr. Surg.
Global Open, vol. 13, Jan. 2025.
DOI: https://doi.org/10.1097/GOX.0000000000006450
[15] K. Lee, E. Kim, J. Choi, and B. Chang, “NOAH: Benchmarking narrative
prior driven hallucination and omission in video large language models,”
arXiv:2511.06475, Nov. 2025. [Online].
Available: https://arxiv.org/abs/2511.06475