Critics: Monitoring Mechanisms for Learning and Adaptation in OODA-Based Agents
Dar Aystron
Independent Researcher
Abstract
Adaptive agents must detect when their behavior, knowledge, or internal reasoning becomes inadequate. Traditional artificial intelligence systems often rely on external evaluation or offline retraining to address such deficiencies. Autonomous agents, however, require internal mechanisms capable of continuously monitoring their own operation.
This paper introduces critics as monitoring mechanisms within the OODA-based architecture of OODA Agency Theory (OAT). Critics observe the internal activity of an agent and generate diagnostic observations about anomalies, inconsistencies, and performance failures. These observations become part of the agent’s perception stream and influence subsequent orientation and decision processes.
Critics extend both the internal sensory field and the internal action repertoire of the agent while preserving a single OODA control loop. Through this mechanism, critics enable learning, adaptation, and structural evolution in advanced artificial agents.
1. Introduction
Agents operating in complex environments inevitably encounter situations where their existing knowledge or strategies are inadequate. Effective adaptation therefore requires mechanisms capable of detecting internal failures, inconsistencies, and unexpected outcomes.
OODA Agency Theory models artificial agents as systems organized around a continuous Observe-Orient-Decide-Act (OODA) cycle [1]. Within this architecture, learning and adaptation occur through the interaction of perception, internal models, and action.
However, adaptation requires the ability to recognize when something is wrong. Without such recognition, an agent may continue applying ineffective strategies indefinitely. The question is not whether monitoring is needed, but how it should be integrated into the agent’s architecture.
The prevailing approach in contemporary AI treats evaluation as an external process. Benchmarks, test suites, evaluation frameworks, and human review processes assess a system’s performance from outside, after the fact. This paper argues that autonomous agents require a fundamentally different approach: internal monitoring mechanisms that operate continuously within the agent’s own control loop.
This paper introduces critics as architectural components responsible for detecting anomalies in the operation of the agent itself. Critics monitor internal processes and generate observations describing detected problems. These observations enter the normal OODA cycle and influence the agent’s reasoning and decisions.
2. Critics as Internal Sensors
Critics are best understood as internal sensors.
External sensors detect properties of the environment. Internal sensors detect properties of the agent’s own activity and state. An agent’s observation stream combines both, and both enter the Observe stage of the OODA loop.
Internal sensors encompass a broad range of self-monitoring capabilities: awareness of resource constraints, time pressure, processing load, the state of current plans and goals, and many others. Critics are a specialized class of internal sensor, focused specifically on detecting anomalies, inconsistencies, and failures in the agent’s mental states and processes.
An agent’s internal activity produces a rich set of mental states: beliefs, goals, plans, predictions, intermediate reasoning, and the decisions that result from orientation. Critics may monitor any of these. A critic may observe a single OODA cycle - detecting, for example, a contradiction within the current orientation - or it may observe patterns across many episodes, detecting conditions such as a persistent inability to find adequate actions during the Decide phase, or a recurring failure of predictions over successive cycles.
This design preserves the principle that an agent operates through a single control loop. Critics do not introduce a separate monitoring process that runs alongside the agent’s reasoning. They extend the agent’s perceptual capabilities so that the agent can perceive its own internal states with the same architectural mechanisms it uses to perceive the external world.
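The single-loop principle above can be sketched in a few lines. This is a hypothetical illustration, not an OAT implementation: external sensors and critics are modeled as callables that both emit observations into one stream consumed by a single Observe stage.

```python
def observe(external_sensors, critics, agent_state):
    """Collect one cycle's observations from both sensor classes."""
    stream = []
    for sensor in external_sensors:
        stream.extend(sensor(agent_state))   # properties of the environment
    for critic in critics:
        stream.extend(critic(agent_state))   # properties of the agent itself
    return stream                            # one stream, one Observe stage

# A toy external sensor and a toy critic feeding the same stream.
env_sensor = lambda s: [("temperature", s["env_temp"])]
logic_critic = lambda s: ([("contradiction", "b3", "b7")]
                          if s["beliefs_conflict"] else [])

state = {"env_temp": 21, "beliefs_conflict": True}
stream = observe([env_sensor], [logic_critic], state)
# stream now holds one environmental and one internal observation
```

The point of the sketch is the absence of a second loop: the critic is just another source of observations, not a separate monitoring process.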
3. Critics as Expert Monitors
Critics are not generic anomaly detectors. Each critic is a specialized expert associated with particular classes of failures or unusual conditions.
A critic encapsulates knowledge of:
- the kinds of anomalies it monitors
- how to detect them
- possible corrective actions
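The three kinds of encapsulated knowledge can be made concrete with a minimal sketch. The class names, fields, and threshold below are illustrative assumptions, not part of OAT; the sketch only shows where each kind of knowledge lives.

```python
from dataclasses import dataclass

@dataclass
class Critic:
    name: str
    monitors: str                           # the kinds of anomalies it watches for

    def detect(self, mental_state):         # how to detect them
        raise NotImplementedError

    def corrective_actions(self, anomaly):  # possible corrective actions
        return []

class PredictionCritic(Critic):
    """Toy expert for prediction/outcome mismatches."""
    def __init__(self, threshold):
        super().__init__("PredictionCritic", "prediction/outcome mismatch")
        self.threshold = threshold

    def detect(self, mental_state):
        error = abs(mental_state["predicted"] - mental_state["observed"])
        return [("prediction_error", error)] if error > self.threshold else []

    def corrective_actions(self, anomaly):
        return ["revise_model", "request_information"]

critic = PredictionCritic(threshold=0.5)
anomalies = critic.detect({"predicted": 1.0, "observed": 2.0})
# anomalies == [("prediction_error", 1.0)]
```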
3.1 Critic Types and Conflict Classification
Each critic is specialized for particular classes of anomalies. The following table illustrates representative critic types. This is not an exhaustive list; the kinds of critics an agent employs will depend on its domain, architecture, and the mental states it maintains.
| Critic | Detects |
|---|---|
| Logical critic | contradictions between beliefs |
| Prediction critic | mismatch between predictions and outcomes |
| Planning critic | weak or inconsistent plans |
| Goal critic | conflicts between goals |
| Evidence critic | insufficient justification |
The anomalies detected by critics may be classified along several dimensions. Three foundational conflict categories - incompleteness, contradiction, and novelty - capture distinct monitoring concerns:
- Incompleteness arises when the agent’s knowledge lacks information needed for reasoning.
- Contradiction arises when different knowledge elements support conflicting conclusions.
- Novelty arises when incoming information cannot be assimilated into existing knowledge structures.
These categories are not exhaustive. An agent operating in a complex domain will encounter additional classes of anomaly that do not reduce to missing information, logical conflict, or unassimilable input. Operational concerns such as persistent plan failure, goal incoherence, resource exhaustion, and recurring inability to find adequate actions during the Decide phase represent monitoring dimensions that extend beyond the foundational three. A mature agent may develop specialized critics for any aspect of its mental life where systematic failure is possible.
The diagnostic propositions produced by critics are not uniform signals. They carry structure. A well-designed critic produces propositions that include:
- conflict type - the specific kind of anomaly detected
- parameters - the knowledge elements involved
- category - the class of conflict (incompleteness, contradiction, novelty, or other)
- possible causes - candidate explanations for why the conflict arose
- recommendations - suggested corrective actions
The richness of this diagnostic structure determines how effectively the agent can respond. A critic that merely signals “error detected” provides less adaptive value than one that identifies the conflicting elements, classifies the nature of the conflict, and proposes candidate resolutions.
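The structured diagnostic described above can be sketched as a record type. The field names mirror the five components listed; the concrete values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Diagnostic:
    conflict_type: str      # the specific kind of anomaly detected
    parameters: tuple       # the knowledge elements involved
    category: str           # incompleteness | contradiction | novelty | other
    possible_causes: list = field(default_factory=list)
    recommendations: list = field(default_factory=list)

# A contradiction diagnostic, far richer than a bare "error detected" flag.
d = Diagnostic(
    conflict_type="contradiction",
    parameters=(".b3", ".b7"),
    category="contradiction",
    possible_causes=["stale belief .b3", "faulty inference rule"],
    recommendations=["revise_belief .b3", "request_information"],
)
```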
3.2 Diagnostic Observations
When critic observations are lifted into the propositional layer, they become structured diagnostic propositions using the representational language introduced in Paper 06 [8]. The following examples illustrate the lifted case.
A logical critic detecting a contradiction between two beliefs:
(:contradiction (:as .cr1 :in .S1)
:critic :LogicalCritic
:element_a .b3
:element_b .b7)
A prediction critic detecting a mismatch between an expected and observed outcome:
(:prediction_error (:as .cr2 :in .S1)
:critic :PredictionCritic
:predicted .p1
:observed .f4
:magnitude :high)
A planning critic assessing the quality of a current plan:
(:plan_quality (:as .cr3 :in .S1)
:critic :PlanningCritic
:plan .pl1
:assessment :low)
A critic observing a pattern across multiple episodes - for example, that the Decide phase has repeatedly failed to identify adequate actions:
(:decision_inadequacy (:as .cr4 :in .S1)
:critic :PlanningCritic
:phase :Decide
:episodes [:set .S_n-3 .S_n-2 .S_n-1 .S1]
:pattern :persistent_failure)
These propositions enter the subject stream and become inputs to subsequent OODA cycles.
The diagnostic propositions shown above represent the lifted case - critic observations that have entered the subject stream and become available for reasoning and deliberate response. Critics may also operate entirely at the implementation layer, detecting and correcting anomalies without their activity being lifted into explicit propositions. Such implementation-level critics correspond to subconscious monitoring and learning processes. The propositional examples in this paper focus on the lifted case, where critic observations require deliberate attention, goal formation, or planning.
4. Critics and the Action Space
Critics extend not only perception but also the internal action repertoire of the agent.
A critic may propose candidate corrective actions as propositions that enter the agent’s decision process. Some corrective actions are simple and atomic - revise a single belief, request a piece of information. Others are more complex: a critic observation may generate a corrective goal, and that goal may in turn require a multi-step plan to achieve.
For example, a simple corrective action:
(:corrective_action (:as .ca1 :in .S1)
:source .cr1
:action (:revise_belief
:agent :Self
:target .b3))
A corrective action that generates a goal requiring planning:
(:corrective_goal (:as .cg1 :in .S1)
:source .cr2
:goal (:resolve_prediction_failure
:agent :Self
:regarding .f4))
This goal may then lead, through normal Orient and Decide processing, to a multi-step plan:
(:step (:as .st1 :in .S2)
:agent :Self
:action (:request_information
:agent :Self
:about .f4))
(:step (:as .st2 :in .S2)
:agent :Self
:action (:revise_model
:agent :Self
:target .m1))
(:step (:as .st3 :in .S2)
:agent :Self
:action (:re_evaluate
:agent :Self
:situation .S1))
(:plan (:as .pl2 :in .S2)
:agent :Self
:goal .cg1
:steps [:seq .st1 .st2 .st3])
The key architectural point is that corrective actions, corrective goals, and the plans generated to achieve them all flow through the same OODA machinery as any other goal or plan. A critic observation enters the subject stream; orientation interprets it; a corrective goal may be adopted during Decide; subsequent cycles generate and execute a plan to achieve that goal. The entire chain - from anomaly detection through goal adoption through multi-step corrective action - is interconnected through propositional references and processed by the single control loop.
For example, a prediction critic may detect persistent error in an ML-based prediction model used by the agent. The resulting corrective goal - improve prediction quality - may generate a plan that includes delegating model retraining to an outbound micro-expert. The micro-expert performs the retraining as a specialized task; the improved model feeds back into the agent’s subsequent OODA cycles. Every link in this chain, from anomaly detection through delegation to improved prediction, flows through the same architectural machinery.
These actions and goals are not executed automatically. They become options available to the Decide stage, where they may appear alongside externally directed action candidates. Critics therefore extend both the sensory space and the action space of the agent, enabling perception of internal anomalies and consideration of corrective actions, without introducing additional control loops.
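The shared-decision-process point can be sketched as follows. The option names and the utility function are toy assumptions; what matters is that corrective proposals and externally directed candidates enter one option set and are ranked by one process.

```python
def decide(external_candidates, critic_proposals, utility):
    """Choose one action from the combined option set."""
    options = external_candidates + critic_proposals  # one option space
    return max(options, key=utility)                  # one decision process

external = ["move_north", "collect_sample"]
corrective = ["revise_belief_b3", "adopt_goal_resolve_prediction_failure"]

# A toy utility that currently ranks resolving the prediction failure highest.
scores = {"move_north": 0.2, "collect_sample": 0.5,
          "revise_belief_b3": 0.4,
          "adopt_goal_resolve_prediction_failure": 0.9}
chosen = decide(external, corrective, lambda a: scores[a])
# chosen == "adopt_goal_resolve_prediction_failure"
```

Nothing forces the corrective option to win; on a cycle where an external action scores higher, the critic’s proposal simply goes unselected.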
5. Critics and Learning
Section 4 established that critic observations generate corrective actions, corrective goals, and multi-step plans - all flowing through the normal OODA machinery. The depth of the agent’s response depends on which level of knowledge the critic signal implicates. OAT distinguishes three levels:
Factographic knowledge describes concrete situations and their properties. A prediction critic detects that a specific fact is wrong, missing, or exceptional. The corrective response operates at this level: override an incorrect value, add a missing fact, correct a misrecorded observation, or record an exception. Robust reasoning methods routinely manage exceptions without revising the underlying models or rules.
General knowledge captures patterns, regularities, and abstractions across situations. When a critic detects not just a single incorrect fact but a systematic failure - a model consistently misclassifies a class of situations, an explanation pattern repeatedly fails to account for observations, or predictions diverge from outcomes across multiple episodes - the corrective goal must target the model or rule itself. In the micro-expert example from Section 4, retraining a prediction model is a general-level learning response: the model’s structure or parameters are revised to capture patterns the previous version could not. A corrective plan at this level might include gathering additional evidence, constructing an alternative model, evaluating it against the accumulated anomalies, and replacing the deficient model. Each step is a proposition in the plan; the plan executes across multiple OODA cycles through the machinery described in Section 4.
Categorical knowledge provides the fundamental representational vocabulary - the types, relations, and structural primitives used to describe situations. When critics detect persistent failures that resist resolution at both the factographic and general levels, the agent faces a deeper problem: its representational categories are inadequate. It lacks the vocabulary to describe the situations it encounters. No amount of parameter adjustment or model revision will help, because the representations themselves cannot capture the relevant distinctions. Learning at this level requires structural reorganization, which is addressed in Section 6.
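The escalation across the three levels can be summarized in a small dispatch sketch. The trigger conditions here (anomaly counts, failed revisions) are illustrative simplifications of the qualitative criteria described above, not part of OAT.

```python
def response_level(anomaly_count, model_revisions_failed):
    """Pick the knowledge level a corrective response should target."""
    if anomaly_count <= 1:
        return "factographic"   # correct a fact or record an exception
    if not model_revisions_failed:
        return "general"        # revise the deficient model or rule itself
    return "categorical"        # representational reorganization is needed

# An isolated error stays factographic; a systematic one targets the model;
# failure resisting both levels signals categorical inadequacy.
```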
6. Structural Adaptation
In some situations, incremental learning is insufficient. Incremental learning adjusts parameters or modifies rules within existing representational structures. Structural learning modifies the structures themselves.
Persistent anomalies may indicate that the agent lacks necessary cognitive structures. When such conditions are detected repeatedly, critics may lead the agent to consider structural changes such as:
- introducing new representational categories
- modifying planning strategies
- reorganizing knowledge structures
- expanding internal models
These deeper changes flow through the same OODA mechanism as all other decisions. The critic detects a pattern of unresolvable conflicts; the orientation phase interprets this pattern as evidence of structural inadequacy; the decision phase selects a structural modification from the available action space. No separate control mechanism is required.
The connection between conflict accumulation and structural change is not merely theoretical. Any system that operates long enough in a complex domain will encounter conflicts that cannot be resolved by modifying individual knowledge elements. Resolution requires changes at the representational level itself - introducing new categories, restructuring relationships between existing abstractions, or modifying the classification system. Structural adaptation is a practical necessity for systems operating under real-world conditions of incomplete and evolving knowledge.
7. Critics and External Evaluation
The concept of critics should be distinguished from the dominant approach to evaluation in contemporary AI systems.
7.1 External Evaluation
External evaluation assesses a system’s performance from outside. Benchmarks, test suites, evaluation frameworks, and human review processes measure outputs against predefined criteria. This approach treats evaluation as a phase in the system’s lifecycle - something that happens during development, after deployment, or at scheduled intervals.
External evaluation has several characteristic properties:
- Decoupled from operation. The system being evaluated does not have access to the evaluation results during its normal operation. Evaluation and execution are separate processes.
- Retrospective. Evaluation examines past outputs. By the time an evaluation identifies a deficiency, the system has already produced the deficient outputs.
- Criteria-referenced. Performance is measured against fixed external standards. The system does not determine what constitutes adequate performance; the evaluator does.
- Episodic. Evaluation occurs at discrete points rather than continuously. Between evaluations, the system operates without self-assessment.
These properties make external evaluation well-suited for quality assurance, compliance verification, and comparative benchmarking. They make it poorly suited for autonomous adaptation.
7.2 Internal Monitoring
Critics operate under a fundamentally different set of assumptions:
- Coupled to operation. Critic observations enter the agent’s perception stream during operation. Monitoring and execution are part of the same process.
- Prospective. Critics detect anomalies as they arise, enabling the agent to adjust before producing further deficient outputs.
- Coherence-referenced. Critics monitor for internal coherence - contradictions, prediction errors, plan failures - rather than for conformance to external standards. The criteria emerge from the agent’s own knowledge and reasoning.
- Continuous. Critics operate on every OODA cycle. Monitoring is not a phase but a permanent feature of the agent’s architecture, spanning observation, orientation, decision, and action.
7.3 Architectural Implications
The distinction between external evaluation and internal monitoring is architectural, not merely procedural.
External evaluation preserves a separation between the system and its evaluator. This separation is appropriate when the goal is oversight - when an external authority needs to verify that a system meets specified requirements. Constitutional AI [7], RLHF, and similar alignment approaches operate within this paradigm: the system is shaped by external evaluation signals, but the evaluation itself remains external to the system’s operational loop.
Critics eliminate this separation. The evaluator is part of the system. This integration is what makes critics a mechanism for autonomy rather than a mechanism for oversight. An agent with critics does not depend on external evaluation to detect that its beliefs are contradictory, its predictions are failing, or its plans are inadequate. It detects these conditions itself, as part of its own operational process.
This does not mean that external evaluation becomes unnecessary. External evaluation serves purposes that internal monitoring cannot: verifying that the agent’s goals are appropriate, that its critic mechanisms are functioning correctly, and that its behavior satisfies externally imposed constraints. The relationship between internal critics and external evaluation is complementary, not competitive. Critics enable autonomous adaptation; external evaluation provides societal accountability.
It is worth noting that contemporary agentic AI practice has begun to converge on a similar intuition. Online evaluation mechanisms - guardrails, output validators, trace monitors operating during agent execution - represent a practical recognition that evaluation cannot always be deferred to an offline phase. However, these mechanisms remain structurally thin: they typically function as pass/fail gates inserted between processing steps, lacking the diagnostic richness described in Section 3.1, the integration into the agent’s own perceptual stream, and the connection to multi-level adaptation described in Sections 5 and 6. Agentic AI practitioners have independently converged on the need for something like critics, but have only implemented the thinnest version.
However, the two approaches differ in what they assume about the locus of adaptive intelligence. External evaluation assumes that the intelligence required to assess performance resides outside the system. Critics assume that an agent must be capable of assessing its own performance - that self-monitoring is a constitutive feature of agency, not an optional add-on.
8. Intellectual Context
The concept of critics aligns with several influential traditions in cognitive science, cybernetics, and philosophy of science.
Cognitive Development
Developmental psychology emphasizes the role of internal conflict in learning. Jean Piaget described cognitive development as the interaction between assimilation and accommodation, where existing cognitive structures interpret experience until persistent mismatches force structural change [3].
Within the OAT framework, critics play a role analogous to mechanisms that detect such mismatches. When critics observe contradictions, prediction errors, or repeated failures, they generate observations that signal the inadequacy of current internal structures.
Cybernetic Regulation
Cybernetics provides a theoretical foundation for adaptive regulation. W. Ross Ashby introduced the concept of ultrastability, in which a system reorganizes its internal structure when normal regulatory mechanisms fail to maintain stability [2].
Critics in OAT provide a mechanism that detects the conditions requiring such reorganization. By monitoring prediction errors, contradictions, and performance anomalies, critics generate signals that may lead to adaptive changes in the agent’s internal organization. The progression from factographic correction through model revision to structural reorganization (Section 6) corresponds to Ashby’s distinction between regulatory adaptation within a given structure and the deeper reorganization that ultrastability demands.
Scientific Change
Thomas Kuhn described scientific progress as a process in which periods of normal science are periodically disrupted by anomalies that cannot be explained within the prevailing theoretical framework [5]. When such anomalies accumulate, a paradigm shift may occur.
Critics perform a comparable role within artificial agents. By detecting persistent inconsistencies and unexplained outcomes, they accumulate evidence that existing models or strategies are inadequate. The transition from incremental learning to structural adaptation (Section 6) parallels Kuhn’s account of the transition from normal science to paradigm change.
Social Learning
Lev Vygotsky emphasized the fundamentally social nature of cognitive development. Learning often occurs through interaction with others within the zone of proximal development, where critique and guidance enable individuals to refine reasoning and acquire new capabilities [4].
In multi-agent systems, critics may similarly operate across agents. One agent may observe and critique the reasoning of another, generating feedback that improves collective performance. This possibility is not developed further in the present paper but represents a natural extension of the critic concept.
Cognitive Architectures
Marvin Minsky described critics in The Society of Mind as agents that observe and evaluate the activity of other agents, triggering corrective mechanisms when problems are detected [6].
OAT adopts a similar concept but integrates critics into a fundamentally different architecture. In Minsky’s model, critics are agents within a heterarchical society, interacting with other agents through emergent negotiation. In OAT, critics are monitoring mechanisms within a single OODA-based control loop, where architectural coherence is maintained by the cycle itself rather than by negotiation among independent components. This distinction determines how conflicts between critic observations and ongoing reasoning are resolved: through a unified decision process rather than through competition among autonomous agents.
9. Implementation Considerations
Critics may be implemented using various mechanisms depending on the type of monitoring required and the resources available.
Rule-based critics apply predefined patterns to detect known classes of anomalies. These are appropriate for well-understood failure modes where the conditions triggering a conflict can be specified in advance. The conflict classification structures described in Section 3.1 provide a model for organizing such rules.
Statistical monitors detect anomalies through deviation from expected distributions. Prediction critics, for example, may track running statistics of prediction errors and generate observations when error rates exceed established thresholds.
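A statistical monitor of this kind can be sketched in a few lines. The windowing scheme, threshold value, and signal format are assumptions chosen for illustration.

```python
from collections import deque

class StatisticalPredictionCritic:
    """Track a running mean of prediction errors over a sliding window."""
    def __init__(self, window=10, threshold=0.5):
        self.errors = deque(maxlen=window)
        self.threshold = threshold

    def observe_cycle(self, predicted, observed):
        self.errors.append(abs(predicted - observed))
        mean_error = sum(self.errors) / len(self.errors)
        if mean_error > self.threshold:
            return {"type": "prediction_error",
                    "mean_error": mean_error,
                    "episodes": len(self.errors)}
        return None  # error rate within the established threshold

critic = StatisticalPredictionCritic(window=5, threshold=0.5)
for predicted, observed in [(1.0, 1.1), (1.0, 1.2), (1.0, 2.5)]:
    signal = critic.observe_cycle(predicted, observed)
# The third error pushes the running mean above the threshold, so the
# final cycle produces a diagnostic signal rather than None.
```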
Logical consistency checkers verify that the agent’s beliefs satisfy specified constraints. These correspond most directly to the contradiction-detection function of the logical critic.
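A minimal consistency check can be sketched under a simplified belief representation (an assumption of this example): beliefs are (proposition, truth value) pairs, and a contradiction is the same proposition asserted with both values.

```python
def check_consistency(beliefs):
    """Return (belief_a, belief_b, proposition) triples in direct conflict."""
    truth = {}          # proposition -> (belief id, asserted value)
    contradictions = []
    for ident, (prop, value) in beliefs.items():
        if prop in truth and truth[prop][1] != value:
            contradictions.append((truth[prop][0], ident, prop))
        else:
            truth[prop] = (ident, value)
    return contradictions

beliefs = {".b3": ("door_open", True),
           ".b5": ("light_on", True),
           ".b7": ("door_open", False)}
conflicts = check_consistency(beliefs)
# conflicts == [(".b3", ".b7", "door_open")]
```

Real belief stores support richer constraints than direct negation, but the output here already has the shape of the logical critic’s diagnostic: the conflicting elements are identified, not merely flagged.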
Language-model-based evaluators represent a contemporary implementation path. Large language models can evaluate reasoning traces, detect inconsistencies in natural-language explanations, and propose candidate repairs expressed in the same representational medium as the agent’s reasoning. This capability makes LLMs particularly effective as critics in systems where reasoning is conducted through natural language.
The choice of implementation mechanism does not affect the architectural role of the critic. Regardless of whether a critic uses rules, statistics, logical inference, or language model evaluation, its function within the OODA loop remains the same: it generates diagnostic signals that influence the agent’s subsequent processing, whether at the implementation layer or - when lifted - as explicit propositions entering the subject stream.
10. Conclusion
Critics provide a foundational mechanism for learning and adaptation in OODA-based agents. By monitoring internal processes and generating observations about anomalies, critics allow agents to detect when their current knowledge, models, or strategies become inadequate.
Treating critics as internal sensors preserves a single coherent control loop while enabling rich self-monitoring capabilities. By expanding both the perceptual and action capabilities of the agent, critics support learning at multiple levels: correction of concrete facts at the factographic level, model revision at the general level, and structural reorganization at the categorical level.
The distinction between internal critics and external evaluation clarifies the architectural role that self-monitoring plays in autonomous agency. External evaluation assesses performance from outside, after the fact, against fixed criteria. Critics monitor from inside, continuously, for internal coherence. Both serve important purposes, but only internal monitoring provides the real-time self-awareness that autonomous adaptation demands. An agent that must wait for external evaluation to discover its own failures is not yet fully autonomous. An agent with critics has taken a step toward genuine self-regulation.
References
[1] J. R. Boyd. The Essence of Winning and Losing. Unpublished briefing slides and lectures, 1987–1996.
[2] W. R. Ashby. An Introduction to Cybernetics. Chapman & Hall, London, 1956.
[3] J. Piaget. The Origins of Intelligence in Children. International Universities Press, 1952.
[4] L. S. Vygotsky. Thought and Language. MIT Press, 1962.
[5] T. S. Kuhn. The Structure of Scientific Revolutions. University of Chicago Press, 1962.
[6] M. Minsky. The Society of Mind. Simon & Schuster, 1986.
[7] Y. Bai et al. Constitutional AI: Harmlessness from AI Feedback. arXiv preprint arXiv:2212.08073, 2022.
[8] D. Aystron. The Propositional Lift and the Subject Stream. OODA Agency Theory Working Papers, Paper 06, 2026.