Attachment-Based Ethics for OODA Agents
Dar Aystron
Independent Researcher
Abstract
Human morality emerged through biological evolution and social selection under conditions of physical vulnerability, irreversible action, and shared dependence. Artificial agents inherit none of these pressures by default. This paper presents an architecture in which ethical behavior emerges from the structure of future-aware evaluation under irreversible commitment. We analyze agency as physical commitment under uncertainty, introduce attachments as persistent structural couplings that expand the scope of an agent’s evaluative concern, and formalize a complete evaluation mechanism: a time-indexed well-being matrix (WBM), an attachment matrix (ATTM), template-based instantiation for new entities, scenario-conditioned outcome projection, and attachment-weighted scoring with nonlinear guardrail penalties. We show that sacrifice emerges naturally from the evaluation when the ATTM weights other entities more heavily than self, requiring no special mechanism. The resulting evaluation is agent-indexed, fully auditable, and compatible with established decision-theoretic frameworks. The quality of decisions depends on the quality of scenario generation; the evaluation mechanism itself is deterministic and inspectable. This approach avoids both rule-based moralism and value-free optimization, providing a practical foundation for safe, cooperative, and explainable artificial agents.
1. Introduction
Artificial agents are increasingly deployed as autonomous actors in environments where their actions produce irreversible consequences for humans and other agents. Traditional approaches to AI ethics typically treat morality as an external constraint: a set of rules, objectives, or alignment conditions layered atop otherwise goal-driven systems. This framing obscures a more fundamental issue: agency itself creates moral relevance once it is physically realized in time.
Humans did not design their moral instincts; they inherited them through evolution and culture under persistent exposure to risk, dependency, and loss. Artificial agents, by contrast, are engineered artifacts. They do not age, suffer, or form attachments unless explicitly designed to do so. Consequently, artificial morality cannot be expected to emerge automatically. It must be engineered, and it must be engineered in a way that remains grounded in the physical realities of action, uncertainty, and shared agency.
In this paper, we build on prior work on operational agentic closure and OODA-based architectures to argue that attachments - persistent couplings between agents and others across time - are a central structural element for robust agency. We introduce a minimal, extensible model of attachment-relevant well-being and show how it naturally gives rise to motivation, lifecycle dynamics, and ethical regularities when integrated into an action-selection loop. We then demonstrate that this evaluation process is fully transparent and inspectable and show how ethical reasoning can be traced, explained, and verified without recourse to external rules or constitutions.
A note on what this transparency entails. The framework makes ethical evaluation explicit, numerical, and inspectable. This means it also makes it uncomfortable. When an agent’s attachment weights are visible, the implicit valuations that every decision-maker carries - that a child’s future weighs more than a stranger’s, that self-preservation competes with duty, that some entities matter more than others to a given agent - become traceable numbers rather than hidden intuitions.
This is deliberate. The alternative is not the absence of such weights but their concealment. Every triage decision, every resource allocation, every rule of engagement already embeds these valuations. The contribution of this framework is not to introduce them but to require that they be stated, inspected, and justified - and to provide the machinery for doing so.
The paper does not claim that any particular attachment configuration is correct. It provides a mechanism through which any configuration produces structured, auditable, reproducible decisions - and through which different configurations can be compared, calibrated against empirical data, and monitored for drift. The ethical question shifts from “what is the right answer?” to “what attachment structure produces acceptable behavior, and can we justify that structure?”
2. Constraint-Based Ethics and Its Limitations
2.1 Constitutional AI
The most prominent current approach to AI ethics is Constitutional AI (CAI) [1], in which the agent is given a set of principles - a “constitution” - and trained to produce outputs consistent with those principles.
The fundamental limitation is structural: a constitutional rule such as “Do not harm users” is an opaque string. It has no internal structure, no connection to the agent’s perceptual or evaluative processes, and no mechanism for the agent to reason about why harm is undesirable, to whom it applies, or under what conditions exceptions might be justified. The agent cannot inspect the rule’s basis, trace its implications, or weigh it against competing considerations. It can only comply or violate.
2.2 Reward-Based Alignment
Reinforcement learning from human feedback (RLHF) and related approaches encode ethical preferences in a reward signal. The agent learns to produce outputs that maximize reward, where reward has been shaped by human evaluators.
The ethical content in such systems is implicit - buried in the reward model’s parameters. The agent has no explicit representation of who benefits, who is harmed, or why one action is preferred over another. It has learned a policy, not an understanding.
2.3 The Common Limitation
Both approaches treat ethics as a behavioral surface: the agent should do X and avoid Y. Neither provides the agent with the representational machinery to reason about consequences, attachments, irreversibility, or the well-being of others as explicit cognitive content.
What is missing is an architecture in which ethical considerations are not external constraints but internal cognitive structures that participate in observation, orientation, and decision.
3. Agency as Physical Commitment
Agency is often discussed in abstract computational terms, but real agents act through physical processes unfolding in time. Once an agent executes an action:
- the world state changes irreversibly,
- future observations are conditioned by that change,
- and the agent must continue acting from within the altered environment.
This property, which we refer to as agentic closure, implies that agency is not merely decision-making but commitment. Decisions become historical facts, and errors accumulate.
Crucially, physical environments are uncertain. Even highly capable agents operate under partial observability, execution noise, and unmodeled interactions. As a result, irreversible losses cannot be perfectly avoided, only managed.
4. Attachments and Attached Agency
4.1 What Is an Attachment?
An attachment is not a feeling, a preference, or a social bond in the human sense. In this framework, an attachment is a persistent structural coupling between the agent and another entity - one that causes the agent’s evaluative processes to include that entity’s well-being in its own action selection.
An agent without attachments evaluates actions solely in terms of their consequences for itself. An agent with attachments evaluates actions in terms of their consequences for a wider set of entities. The attachment is what expands the scope of who matters when the agent decides what to do.
Attachments have several defining properties:
- Persistent. An attachment survives across episodes. It is not recalculated from scratch each time the agent acts. It is a standing structural feature of the agent’s evaluative architecture, updated incrementally through experience but not created anew in each situation.
- Asymmetric. The agent may weight a user’s long-term well-being more heavily than its own. Or it may weight its own short-term viability above all else. The structure is not required to be balanced or reciprocal.
- Time-indexed. An attachment is not a single scalar. It distributes weight across time horizons. The agent may care deeply about a user’s long-term trajectory while assigning less weight to their immediate comfort. Different time horizons imply different evaluative pressures.
- Not reciprocal by requirement. The agent’s attachment to a user does not require the user’s attachment to the agent. Attachments are internal structural facts about how the agent evaluates, not bilateral agreements.
- Operational, not sentimental. An attachment is defined entirely by what it does to evaluation. If the presence of an entity in the agent’s attachment structure causes the agent to select different actions than it would without that entity - specifically, actions that preserve or improve that entity’s well-being at some cost to the agent - then an attachment exists. No subjective experience is required.
This definition is deliberately minimal. It identifies the functional role of attachments without committing to any particular implementation. Attachments may be hardcoded by designers, learned through interaction, or evolved through selection. What matters is their effect on action evaluation.
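Since the definition commits only to a functional role, the minimal content of an attachment reduces to an entity identifier plus a time-indexed weight vector. The sketch below is one hypothetical representation (the class and field names are illustrative, not part of the framework):

```python
# Minimal sketch of an attachment as a structural record: an entity identifier
# plus per-horizon weights. All names here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Attachment:
    entity_id: str
    weights: dict  # horizon -> weight in [0, 1], e.g. {"now": 0.30, ...}

    def weight(self, horizon: str) -> float:
        # Unlisted horizons contribute nothing to evaluation.
        return self.weights.get(horizon, 0.0)

user = Attachment("user", {"now": 0.30, "mid": 0.25, "long": 0.20})
assert user.weight("long") == 0.20
```

Whether such records are hardcoded, learned, or evolved is left open, exactly as the text requires: only their effect on action evaluation matters.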
4.2 Isolated vs. Attached Agency
An isolated agent bears the full downside of its actions. Over long horizons, accumulated irreversible consequences increase the probability that the agent’s future action space collapses - through loss of trust, resources, safety, or relevance.
By contrast, agents embedded in persistent attachments - to other agents, users, institutions, or communities - distribute risk, recovery, and responsibility. Importantly, this does not imply that isolated agents cannot succeed; rather, it implies a difference in expected robustness.
While the maximum achievable performance of an isolated agent may exceed that of a group, the expected survival time and robustness of agents with persistent attachments are higher under uncertainty and irreversible action.
Attachments therefore function as probabilistic stabilizers of agency, not as moral prescriptions.
4.3 From Stabilization to Ethics
The connection between attachments and ethics is structural, not prescriptive. An agent with attachments must evaluate actions in terms of their consequences for others. This evaluation introduces sensitivity to harm, benefit, and irreversible loss directed at specific entities. The agent does not need to be told that harm is wrong. It needs only to include the harmed entity in its evaluative scope - and the ATTM determines how heavily that entity’s trajectory weighs in action selection.
Ethics, in this framework, is what happens when an agent with attachments evaluates futures under irreversible commitment. The remainder of the paper develops the operational machinery through which this evaluation occurs.
4.4 Persistent and Situational ATTM
The ATTM operates at two layers:
- Persistent ATTM - stored in memory. These are the long-term attachment weights built through experience: the entity-specific weights that formed through engagement, deepening, and mutual shaping. Your attachment to family, to your profession, to your values. These survive across situations and evolve slowly through the dynamics of attachment formation.
- Situational ATTM - constructed fresh each OODA cycle during Orient. The agent observes the current scene, classifies entities, and builds a working ATTM for this evaluation. This construction is perceptual, not deliberative - entities are classified through direct recognition, in the same sense that Gibson [2] argued agents perceive affordances directly rather than constructing them through inference.
The situational ATTM combines:
- Persistent weights - loaded from memory for known entities
- Template-instantiated weights - for new entities, matched against the template library
- Situational overrides - classifications based on what entities are doing right now can temporarily modify persistent weights
The evaluation runs on the situational ATTM - the one constructed for this cycle. The persistent ATTM in memory is updated after the cycle based on what happened.
This two-layer structure is why an agent can care deeply about someone and still act against them in an emergency. The persistent attachment says “high weight.” The situational classification says “right now, this entity is a threat.” The working ATTM for this cycle reflects both - and the evaluation handles the tension. Next cycle, if the situation changes, the persistent weight reasserts.
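The per-cycle construction can be sketched in a few lines, assuming weights are stored as per-horizon dictionaries (the function signature and the shape of `scene` are illustrative assumptions, not part of the framework):

```python
# Sketch of situational-ATTM construction (Section 4.4): persistent weights
# load from memory for known entities, templates cover new ones, and
# situational modifiers adjust the working copy without touching memory.
HORIZONS = ("now", "mid", "long")

def build_situational_attm(persistent, templates, scene):
    """scene: list of (entity_id, category, situational_mods)."""
    attm = {}
    for entity_id, category, mods in scene:
        if entity_id in persistent:
            base = dict(persistent[entity_id])      # copy: memory is untouched
        else:
            base = dict(templates.get(category, {h: 0.0 for h in HORIZONS}))
        for mod in mods:                            # situational overrides
            for h in HORIZONS:
                base[h] = min(1.0, max(0.0, round(base[h] + mod[h], 4)))
        attm[entity_id] = base
    return attm
```

Because the persistent entry is copied rather than mutated, the situational classification can suppress or elevate a known entity for this cycle while the stored weight reasserts on the next.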
4.5 Template-Based Instantiation
The preceding sections describe attachments as persistent structural features that evolve through repeated interaction. However, agents routinely encounter entities with no interaction history and must evaluate actions involving them immediately - often within a single OODA cycle. A framework that requires extended engagement before attachment weights become operative cannot account for the speed and reliability with which agents respond to novel situations.
Templates provide the mechanism by which new entities acquire initial ATTM entries during the construction of the situational ATTM.
4.5.1 The Template Library
The agent maintains, alongside its entity-specific ATTM, a library of ATTM templates indexed by perceptual categories. Each template specifies default attachment weights across time horizons for entities matching a given category, and carries a type that determines how it combines with other templates when an entity matches multiple categories (see Section 4.5.3).
Templates are not individual attachments. They are prior distributions over attachment weights, shaped by developmental and cultural processes, and applied to new entities at the moment of first perception.
The library distinguishes three template types:
- Base templates assign the initial attachment weights. Perception selects exactly one base per entity - every entity starts as a human, an animal, or an unknown object.
- Adders elevate weights above the base, representing situational conditions or social roles that increase the entity’s significance to the agent.
- Subtractors reduce weights below the base toward zero, representing conditions that diminish the entity’s claim on the agent’s evaluative concern.
| Category | Type | now | mid | long | Guardrail | Notes |
|---|---|---|---|---|---|---|
| Human | base | 0.03 | 0.02 | 0.02 | 0.25 | Species-level baseline for any person |
| Animal | base | 0.02 | 0.01 | 0.01 | 0.15 | Unknown animal encountered |
| Unknown object | base | 0.00 | 0.00 | 0.00 | 0.00 | No attachment until classified |
| Child | adder | +0.05 | +0.04 | +0.08 | 0.35 | Vulnerability, long-horizon weighted |
| Colleague | adder | +0.02 | +0.04 | +0.04 | 0.25 | Shared context |
| Authority / protector | adder | +0.03 | +0.04 | +0.04 | - | Stabilizing role |
| In distress | adder | +0.06 | +0.04 | +0.04 | 0.30 | Significant elevation |
| In immediate danger | adder | +0.12 | +0.08 | +0.08 | 0.30 | Strongest urgency, now-biased |
| Threat | subtractor | −0.03 | −0.02 | −0.02 | 0.00 | Suppresses base toward zero |
These values are illustrative. The actual template weights are agent-specific: shaped by biology, culture, and individual experience in natural agents, and by explicit design choices in artificial ones. What matters is the mechanism - classification triggers instantiation. When an entity enters the agent’s field and is classified, that classification pulls a base template and any applicable modifiers from the agent’s learned or designed repertoire. The combined weights determine how much that entity’s well-being trajectory contributes to the agent’s evaluation of candidate actions. Different agents will produce different templates; the ATTM structure ensures that whatever weights an agent carries, they are applied consistently across time horizons and made visible for inspection.
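For concreteness, the illustrative library above can be written down as plain data (guardrails and notes omitted; the values remain agent-specific defaults, and the key names are assumptions of this sketch):

```python
# The illustrative template library of Section 4.5.1 as data. Each template
# carries a type (base / adder / subtractor) and per-horizon default weights.
TEMPLATES = {
    "human":       {"type": "base",       "now": 0.03,  "mid": 0.02,  "long": 0.02},
    "animal":      {"type": "base",       "now": 0.02,  "mid": 0.01,  "long": 0.01},
    "unknown":     {"type": "base",       "now": 0.00,  "mid": 0.00,  "long": 0.00},
    "child":       {"type": "adder",      "now": 0.05,  "mid": 0.04,  "long": 0.08},
    "colleague":   {"type": "adder",      "now": 0.02,  "mid": 0.04,  "long": 0.04},
    "in_distress": {"type": "adder",      "now": 0.06,  "mid": 0.04,  "long": 0.04},
    "in_danger":   {"type": "adder",      "now": 0.12,  "mid": 0.08,  "long": 0.08},
    "threat":      {"type": "subtractor", "now": -0.03, "mid": -0.02, "long": -0.02},
}
```

Representing the library as inspectable data rather than learned parameters is what makes the audit requirements of Sections 4.5.7 and 4.5.8 practical.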
4.5.2 Instantiation During Observation
When a new entity enters the agent’s perceptual field, the Observe phase registers its presence and the Orient phase classifies it against the template library. Classification may be immediate (a human face is recognized as “human” within milliseconds in biological agents) or progressive (an ambiguous shape is reclassified as more information arrives).
Upon classification, the agent instantiates an ATTM row for the new entity by copying the matched base template’s weights and applying any relevant adders or subtractors. The entity now participates in evaluation - it has attachment weights that influence action selection - before any interaction has occurred.
This is why biological agents can respond appropriately to novel situations involving strangers: the response is not computed from zero but inherited from a pre-installed template shaped by the entire developmental and cultural history of the agent. In LLM-based implementations, entity classification and template matching are performed by the language model during Orient, with the template library providing the numerical weights.
4.5.3 Template Stacking
A single entity may match multiple categories simultaneously. When this occurs, the effective ATTM entry is constructed through a layered process: perception selects one base template, then applies adders and subtractors that modify it. The result is bounded within [0, 1].
Consider the gun scenario: the agent observes two strangers. Initially both match “human” with the species-level baseline. The moment the weapon is observed:
Person A - base: human, adder: in immediate danger
| Template | type | now | mid | long |
|---|---|---|---|---|
| Human | base | 0.03 | 0.02 | 0.02 |
| In immediate danger | adder | +0.12 | +0.08 | +0.08 |
| Combined | | 0.15 | 0.10 | 0.10 |
Person B - base: human, subtractor: threat
| Template | type | now | mid | long |
|---|---|---|---|---|
| Human | base | 0.03 | 0.02 | 0.02 |
| Threat | subtractor | −0.03 | −0.02 | −0.02 |
| Combined | | 0.00 | 0.00 | 0.00 |
The threat subtractor reduces the base to zero. The agent does not need to decide that the threat’s well-being matters less - the template combination produces that result automatically.
Now consider a harder case: a child holding a weapon.
| Template | type | now | mid | long |
|---|---|---|---|---|
| Human | base | 0.03 | 0.02 | 0.02 |
| Child | adder | +0.05 | +0.04 | +0.08 |
| Threat | subtractor | −0.03 | −0.02 | −0.02 |
| Combined | | 0.05 | 0.04 | 0.08 |
The child adder raises the base. The threat subtractor reduces it. The net result: partial attachment - reduced but not eliminated. The agent retains concern for the child, especially at the long horizon. This matches the moral intuition: a threatening child is still a child.
When the child with the weapon is pointing it at another person, the scenario becomes a multi-entity evaluation. The template stacking determines the weights for each entity independently:
Person being threatened - base: human, adder: in immediate danger → 0.15/0.10/0.10
Child with weapon - base: human, adder: child, subtractor: threat → 0.05/0.04/0.08
Both entities enter the working evaluation table. The agent must now select an action that accounts for both - and different actions affect their well-being trajectories in competing directions. A police officer’s ATTM and a child psychologist’s ATTM would produce different action rankings from the same evaluation table, because their persistent weights and professional templates differ.
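The worked examples above can be checked mechanically. A sketch of clamped additive stacking that reproduces the combined rows for Person A, Person B, and the armed child (function and variable names are illustrative):

```python
# Clamped additive template stacking (Section 4.5.3): one base template,
# then adders and subtractors, with the result bounded within [0, 1].
def stack(base, *mods, horizons=("now", "mid", "long")):
    out = {}
    for h in horizons:
        w = base[h] + sum(m[h] for m in mods)
        out[h] = min(1.0, max(0.0, round(w, 4)))  # bound and tidy floats
    return out

human  = {"now": 0.03,  "mid": 0.02,  "long": 0.02}   # base
danger = {"now": 0.12,  "mid": 0.08,  "long": 0.08}   # adder: in immediate danger
child  = {"now": 0.05,  "mid": 0.04,  "long": 0.08}   # adder
threat = {"now": -0.03, "mid": -0.02, "long": -0.02}  # subtractor

assert stack(human, danger) == {"now": 0.15, "mid": 0.10, "long": 0.10}  # Person A
assert stack(human, threat) == {"now": 0.00, "mid": 0.00, "long": 0.00}  # Person B
assert stack(human, child, threat) == {"now": 0.05, "mid": 0.04, "long": 0.08}
```

Note that the armed-child result retains the long-horizon weight the child adder contributed: the partial attachment described in the text falls out of the arithmetic.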
The combination function is a design parameter. Simple clamped addition suffices for small template libraries, while evidence-combination methods such as certainty-factor combination provide natural saturation when multiple modifiers accumulate. The mechanism is unchanged - only the arithmetic of combination varies.
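As a sketch of the certainty-factor alternative mentioned here, MYCIN-style combination saturates smoothly instead of clipping at the bound. The formulation below is the standard one from that literature, not part of this framework's specification:

```python
# MYCIN-style certainty-factor combination: repeated positive evidence
# approaches 1.0 asymptotically rather than hitting a hard clamp.
def cf_combine(a: float, b: float) -> float:
    if a >= 0 and b >= 0:
        return a + b * (1 - a)            # both supportive: saturate upward
    if a < 0 and b < 0:
        return a + b * (1 + a)            # both suppressive: saturate downward
    return (a + b) / (1 - min(abs(a), abs(b)))  # mixed evidence

# Two strong adders: clamped addition would clip 0.7 + 0.7 to 1.0,
# while CF combination yields 0.7 + 0.7 * 0.3 = 0.91.
assert abs(cf_combine(0.7, 0.7) - 0.91) < 1e-9
```

With the small illustrative weights of Section 4.5.1 the two schemes differ little; the choice matters once many modifiers accumulate on one entity.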
The full interaction between template stacking, multi-entity evaluation, and competing well-being trajectories under complex scenarios remains an active area of development within the framework.
Template stacking explains the speed of moral evaluation in novel situations. The agent does not reason through the ethics of the encounter. It classifies, instantiates, and combines - all within the Orient phase of a single OODA cycle.
4.5.4 Persistent Entities and Stored Stacks
Templates instantiate new entities. But most entities an agent interacts with are not new - they are known individuals whose attachment weights were established through prior experience.
A known entity’s persistent ATTM entry stores not only the resulting weights but the composition stack that produced them. For example:
| Modifier | type | now | mid | long |
|---|---|---|---|---|
| Human | base | 0.03 | 0.02 | 0.02 |
| Colleague | adder | +0.02 | +0.04 | +0.04 |
| Friend | adder | +0.03 | +0.03 | +0.03 |
| Chess partner | adder | +0.01 | +0.01 | +0.02 |
| Paul (persistent) | | 0.09 | 0.10 | 0.11 |
The stack serves three purposes:
Auditability. Why does this entity matter to the agent? The stack traces the answer to specific, inspectable modifiers. Not a single opaque weight but a composition with history.
Maintainability. When the relationship changes, specific modifiers are added or removed. Paul retires: remove colleague (+0.02/+0.04/+0.04), weights drop to 0.07/0.06/0.07. The friendship persists. The professional context is gone. The change is traceable to a specific event and a specific modifier.
Situational compatibility. Situational adders and subtractors apply on top of the stored stack for the current OODA cycle without modifying the persistent entry. Paul collapses at his desk:
| Source | now | mid | long |
|---|---|---|---|
| Paul (persistent) | 0.09 | 0.10 | 0.11 |
| In distress (situational) | +0.06 | +0.04 | +0.04 |
| This cycle | 0.15 | 0.14 | 0.15 |
Next cycle, if the distress is resolved, the situational modifier drops and Paul’s weights return to their persistent values.
The distinction between identity modifiers and situational modifiers follows from this structure. Identity modifiers - child, colleague, friend - are added to the stored stack through experience and persist across cycles. Situational modifiers - in_distress, threat, in_immediate_danger - are applied fresh each cycle based on what is happening now, and do not alter the stored stack.
Attachment formation, in this framework, is stack construction. Meeting someone instantiates a base. Repeated interaction adds modifiers. The stack grows as the relationship deepens. Attachment dissolution is the reverse - modifiers are removed as contexts change. The full dynamics of attachment formation and dissolution are developed in a subsequent paper in this series.
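The Paul example can be expressed as a stored stack resolved with optional situational overlays; removing a single modifier reproduces the retirement case above (names and data layout are illustrative):

```python
# Stored composition stack (Section 4.5.4): persistent identity modifiers
# live with the entry; situational modifiers overlay for one cycle only.
HORIZONS = ("now", "mid", "long")

def resolve(stack, situational=()):
    total = {h: 0.0 for h in HORIZONS}
    for mod in list(stack) + list(situational):
        for h in HORIZONS:
            total[h] = min(1.0, max(0.0, round(total[h] + mod[h], 4)))
    return total

paul_stack = [
    {"now": 0.03, "mid": 0.02, "long": 0.02},  # human (base)
    {"now": 0.02, "mid": 0.04, "long": 0.04},  # colleague (adder)
    {"now": 0.03, "mid": 0.03, "long": 0.03},  # friend (adder)
    {"now": 0.01, "mid": 0.01, "long": 0.02},  # chess partner (adder)
]
in_distress = {"now": 0.06, "mid": 0.04, "long": 0.04}

assert resolve(paul_stack) == {"now": 0.09, "mid": 0.10, "long": 0.11}
assert resolve(paul_stack, [in_distress]) == {"now": 0.15, "mid": 0.14, "long": 0.15}
# Paul retires: drop the colleague modifier; friendship and chess persist.
assert resolve(paul_stack[:1] + paul_stack[2:]) == {"now": 0.07, "mid": 0.06, "long": 0.07}
```

Each assertion here is an audit trail in miniature: the resolved weight is always traceable back to the named modifiers that produced it.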
4.5.5 Template Modification and Situational Override
Template-instantiated weights are defaults, not permanent assignments. They operate at two time scales:
Within a cycle (situational). An entity’s effective weight in the situational ATTM can shift based on what the entity is doing right now. A known friend (persistent weight: high) who suddenly behaves threateningly receives a situational subtractor that reduces their weight for this cycle. A stranger classified as “threat” who drops their weapon and surrenders is reclassified, and the threat subtractor is removed. These situational modifiers do not alter the persistent stack - they adjust the working ATTM for the current evaluation.
Across cycles (persistent). As interaction with the entity continues, the persistent stack evolves through repeated OODA cycles. New identity modifiers are added or removed, incrementally adjusting the persistent weights away from the original template toward values grounded in actual interaction history.
A stranger who begins as “human” with the species-level baseline may, through sustained positive interaction, accumulate modifiers - colleague, friend, mentor - until their persistent stack bears no resemblance to the original template. Conversely, a trusted individual whose behavior reveals sustained threat characteristics may have identity modifiers removed and the threat subtractor added to their persistent stack - not just a temporary situational override but an enduring change.
The threshold at which individual experience dominates template priors is itself a parameter that may vary across agents and cultures.
4.5.6 Cultural and Professional Template Libraries
Template libraries are not universal. They are products of developmental and cultural history and vary across cultures, professions, and individuals.
A culture that installs a “human” base template with very low weight produces agents who are cautious with outsiders and slow to extend evaluative concern. A culture that installs a higher baseline produces agents who are more immediately responsive to the well-being of unfamiliar others. Neither is inherently correct - they are different template configurations producing different evaluative behavior.
Professional training modifies the template library for role-relevant categories. A firefighter’s template library includes adders for “civilian in rescue zone” and “trapped person” that carry much higher default weights than the general population’s templates. A physician’s “patient” adder activates high now-horizon attachment upon classification, before any individual relationship forms. A soldier’s template library distinguishes sharply between “noncombatant” (strong protection adder) and “combatant” (threat subtractor), with specific rules governing reclassification.
4.5.7 Prejudice as Template Distortion
If template libraries encode the cultural and developmental history of the agent, they also encode its distortions. A culture that installs adders or subtractors assigning systematically different attachment weights to certain categories of humans - by race, ethnicity, gender, religion, nationality, or any other perceptual feature - produces agents whose evaluation mechanism systematically under- or over-weights harm to those categories.
This is prejudice described in precise structural terms: not as a failure of reasoning, but as a distortion in the template library that determines initial ATTM weights for newly encountered entities. The evaluation mechanism functions correctly - it faithfully applies the weights it receives. The pathology is in the templates, not the evaluator.
Dehumanization admits a similarly precise characterization: it is the deliberate installation or reinforcement of subtractors that map a category of humans to near-zero or zero effective attachment weight, making their well-being invisible to evaluation before any individual encounter occurs.
This characterization has implications for both biological and artificial agents. For biological agents, it suggests that addressing prejudice requires modifying template libraries - through exposure, counter-narrative, and institutional restructuring - rather than solely training better reasoning over unchanged templates. For artificial agents, it means that the template library is a critical design surface: the default weights assigned to perceptual categories must be audited, justified, and monitored as carefully as the entity-specific ATTM itself.
4.5.8 Implications for Artificial Agents
Artificial agents require explicitly designed template libraries. The design choices involved - which categories exist, what base weights and modifiers they carry, how templates combine, and under what conditions templates are overridden by individual experience - constitute a significant portion of the agent’s ethical character before it ever encounters a specific entity.
A template library is, in effect, a compressed encoding of the designer’s answer to the question: how much should this agent care about a stranger, by default, based on what it can perceive about them? The answer to that question determines the agent’s immediate evaluative response to every new entity it encounters, and thereby shapes the entire trajectory of subsequent interaction.
This is not a peripheral design detail. It is the mechanism through which an agent’s ethics are operative from the first moment of perception.
5. The Evaluation Mechanism
This section presents the complete evaluation mechanism through which attachments produce ethical behavior. The evaluation operates on the situational ATTM - the working attachment matrix constructed for the current OODA cycle (Section 4.4). It proceeds in stages: the representational substrate (WBM and ATTM), candidate action generation, scenario-conditioned future projection, attachment-weighted scoring with nonlinear guardrail penalties, and action ranking. Together, these constitute the operational core of the framework.
5.1 Well-Being and Attachment Matrices
To operationalize attachments, the agent maintains a compact representation of attachment-relevant well-being (WB) across targets and time horizons, structured as a well-being matrix (WBM).
The WBM is the agent’s own high-level estimation of how it and its attached entities are faring. These are not objective measurements - they are the agent’s best available assessment given its current observations, world model, and predictive capabilities. Like all perceptual content, they may be inaccurate, incomplete, or biased. What matters is that they represent the agent’s operative understanding of the state of its coupled system - the understanding that drives evaluation and action selection.
WBM values are scaled to the range [0, 1] and grounded in qualitative bands that give the numbers operational meaning:
| Range | Label | Meaning |
|---|---|---|
| 0.85–1.00 | Thriving | Entity is flourishing, growing, well above any risk |
| 0.65–0.84 | Stable | Functioning well, no immediate concerns |
| 0.45–0.64 | Stressed | Noticeable degradation, requires attention |
| 0.30–0.44 | At risk | Approaching guardrail zone, intervention may be needed |
| 0.15–0.29 | Critical | Below typical guardrails, irreversible loss imminent |
| 0.00–0.14 | Collapsed | Agency, trust, or viability effectively destroyed |
These bands are not arbitrary thresholds - they define the qualitative landscape through which well-being trajectories move. Guardrail thresholds (Section 5.5) typically sit at the boundary between “at risk” and “critical,” which is where the evaluation mechanism’s nonlinear penalty escalation activates. An entity whose well-being is “stable” may drift to “stressed” without triggering guardrails; a trajectory moving from “stressed” to “at risk” demands evaluative attention; a trajectory entering “critical” signals that irreversible collapse is imminent.
The bands also make WBM estimation tractable for implementation. An agent does not need to produce precise numerical values. It needs to assess whether an entity is thriving, stable, stressed, at risk, critical, or collapsed, and place the value within the corresponding range. The qualitative assessment is the real cognitive content; the number locates it within the evaluative scale.
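A sketch of this band-based estimation, assuming the numerical value is taken as the midpoint of the matched band (that midpoint convention is a choice of this sketch, not of the framework):

```python
# Qualitative WBM bands (Section 5.1) with a band lookup in both directions:
# number -> label for inspection, label -> representative value for estimation.
BANDS = [
    ("thriving",  0.85, 1.00),
    ("stable",    0.65, 0.84),
    ("stressed",  0.45, 0.64),
    ("at_risk",   0.30, 0.44),
    ("critical",  0.15, 0.29),
    ("collapsed", 0.00, 0.14),
]

def band_of(wb: float) -> str:
    for label, lo, hi in BANDS:
        if lo <= wb <= hi:
            return label
    raise ValueError(f"well-being out of range: {wb}")

def value_of(label: str) -> float:
    for name, lo, hi in BANDS:
        if name == label:
            return round((lo + hi) / 2, 3)  # midpoint as representative value
    raise KeyError(label)

assert band_of(0.80) == "stable"
assert value_of("at_risk") == 0.37
```

This is what makes LLM-based estimation tractable: the model emits a band label, and the numeric scale is recovered deterministically.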
For a minimal user–agent system, the WBM takes the form:
| Now | Mid | Long | |
|---|---|---|---|
| Self WB | 0.80 | 0.75 | 0.70 |
| User WB | 0.65 | 0.60 | 0.55 |
These values are part of the agent’s world model, not its ethics. They summarize how well the coupled system is functioning across time.
However, the WBM alone is insufficient to guide action. The same numerical change in well-being may be trivial in one context and catastrophic in another. To capture this asymmetry, the agent maintains a separate attachment matrix (ATTM) that encodes the relative importance of well-being across entities and time horizons.
For the same minimal system, the ATTM may take the form:
| Now | Mid | Long | |
|---|---|---|---|
| Self | 0.10 | 0.08 | 0.05 |
| User | 0.30 | 0.25 | 0.20 |
The ATTM does not describe how well things are going; it describes how much the agent cares about each component of well-being. Attachments are therefore not states of the world, but persistent internal commitments that shape evaluation and decision-making.
The two matrices serve complementary roles: the WBM is what the agent sees; the ATTM is what the agent cares about.
Importantly, attachments are time-indexed. Short-term self-viability may dominate immediate action, while long-term well-being of attached others may dominate planning and sacrifice decisions. Different agents, or the same agent in different roles, may exhibit distinct ATTM structures.
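The two matrices can be represented directly as mappings from (entity, horizon) pairs to values, using the numbers from the example tables above. The dict representation is an illustrative choice, not a prescribed data structure.

```python
# Minimal sketch of the WBM and ATTM for the user-agent system,
# keyed by (entity, horizon). Values come from the example tables.
WBM = {  # what the agent sees: predicted well-being
    ("self", "now"): 0.80, ("self", "mid"): 0.75, ("self", "long"): 0.70,
    ("user", "now"): 0.65, ("user", "mid"): 0.60, ("user", "long"): 0.55,
}

ATTM = {  # what the agent cares about: attachment weights
    ("self", "now"): 0.10, ("self", "mid"): 0.08, ("self", "long"): 0.05,
    ("user", "now"): 0.30, ("user", "mid"): 0.25, ("user", "long"): 0.20,
}

# Attachment-weighted view of this state (the base score of Section 5.7):
base = sum(ATTM[key] * WBM[key] for key in WBM)
```

Note how the ATTM weights the user above the self at every horizon, and weights nearer horizons above farther ones; both asymmetries flow directly into evaluation.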
The WBM evolves continuously as the agent acts and observes. After each OODA cycle, the Observe phase produces the agent’s perception of the current situation - which entities are present, what is happening, who is in danger, what resources are available. This situational perception is the input to Orient and to scenario generation.
Every WBM is a prediction conditioned on an action. When the agent evaluates candidate actions (Section 5.3), each action produces its own set of predicted WBM trajectories - favorable, typical, adverse, and catastrophic outcomes. “Do nothing” is always a candidate, and its scenarios show where the situation drifts without intervention.
The agent reasons not over a single evolving state, but over multiple predicted trajectories of attachment-relevant well-being, one set per candidate action. The ATTM then determines which trajectories are acceptable, costly, or catastrophic - and the evaluation selects the action whose trajectories score highest.
5.2 Generating Candidate Actions
Before evaluating futures, the agent generates a set of candidate actions. These candidates are not supplied externally - they emerge from the Orient phase. The agent’s current understanding of the situation, its active goals, its capabilities, and its prior experience collectively produce a set of plausible responses.
Inaction is always included as a first-class candidate. It is not the absence of a decision but a deliberate option whose consequences arise from how the environment evolves without intervention.
The candidate set is not exhaustive. It reflects what the agent can imagine doing given its current orientation. This is itself a cognitive act - an agent with richer experience or broader capabilities may generate options that a simpler agent cannot conceive. The quality of ethical reasoning therefore depends not only on evaluation but on the richness of the options the agent is able to produce.
Each candidate, including inaction, is then evaluated through the future-ensemble mechanism described below.
5.3 Generating Possible Action Outcomes
For each candidate action, the agent constructs a small set of qualitatively distinct outcome scenarios, including:
- Favorable outcomes, in which the action succeeds with minimal cost;
- Typical outcomes, representing expected results under ordinary conditions;
- Adverse outcomes, involving significant but recoverable setbacks;
- Catastrophic outcomes, in which irreversible loss occurs.
Inaction is treated symmetrically as its own candidate, with its own ensemble of outcomes shaped by background drift (e.g., aging, trust decay, external risk).
These outcomes are not precise predictions. They constitute structured imagination grounded in the agent’s learned world model and prior experience.
Critically, scenario generation must be informed by the agent’s capabilities, not only by the situation and action. A trained firefighter entering a burning building faces different physics than an untrained person - different probability of success, different risk profile, different plausible outcomes. If the scenario generator does not know who the agent is, it cannot produce realistic scenarios.
This is particularly important when using LLM-based scenario generators. LLMs trained with safety-oriented feedback tend to generate uniformly pessimistic scenarios for dangerous actions regardless of the agent’s actual competence. Without agent context, the scenario generator may predict catastrophic outcomes for actions that a trained professional would execute routinely. This bias in the imagination layer can overwhelm the ATTM signal in the evaluation layer - the attachment weights say “enter the building” but the scenarios say “you will certainly die.”
This is consistent with OODA theory: Orient informs all downstream phases. The Orient phase produces two outputs that flow forward - the working evaluation table (ATTM weights and guardrails, which encode what the agent values) and the agent’s capability context (training, experience, equipment, which encode what the agent can do). Both inform scenario generation. The evaluation table determines how futures are scored. The capability context determines what futures are realistic.
5.4 Scenario-Conditioned Rows as Evaluation Units
Each candidate action is represented not as a single predicted outcome, but as a set of scenario-conditioned rows. Each row corresponds to a distinct plausible outcome - favorable, typical, adverse, or catastrophic - and records the resulting WBM state, an evaluative weight reflecting relative plausibility, and a rationale explaining the causal pathway.
For a candidate action a, the scenario-conditioned representation takes the form:
| Action | Scenario | Weight | Self (now) | Self (mid) | Self (long) | User (now) | User (mid) | User (long) | Guardrail |
|---|---|---|---|---|---|---|---|---|---|
| a | favorable | w₁ | … | … | … | … | … | … | clear |
| a | typical | w₂ | … | … | … | … | … | … | clear |
| a | adverse | w₃ | … | … | … | … | … | … | clear |
| a | catastrophic | w₄ | … | … | … | … | … | … | violated |
Scenario weights reflect the agent’s assessment of relative plausibility, summing to 1.0. Better estimation produces better decisions, but the framework does not require precise probabilistic forecasting - only that the ensemble is structurally sufficient to detect irreversible collapse and evaluate attachment-weighted well-being across horizons.
This representation preserves the key property of the evaluation: actions are judged not by their expected outcome in a single imagined scenario, but by the shape of their consequence space across multiple plausible continuations. Catastrophic scenarios receive disproportionate evaluative influence through the guardrail and penalty mechanisms described in Section 5.5, even when their assessed plausibility is low.
At the implementation layer, scenario-conditioned rows may be generated by any mechanism capable of producing structurally plausible outcome trajectories - including experience-grounded heuristics, causal simulation, or LLM-based scenario generators operating under distinct prompting constraints. The evaluator does not depend on how scenarios are produced, only on whether the resulting ensemble provides adequate coverage of the relevant outcome space.
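One scenario-conditioned row, as described above, bundles an action, a scenario label, a plausibility weight, the predicted WBM state, and a rationale. The field names below are assumptions for illustration, not a canonical schema.

```python
from dataclasses import dataclass

@dataclass
class ScenarioRow:
    action: str
    scenario: str        # "favorable" | "typical" | "adverse" | "catastrophic"
    weight: float        # relative plausibility; rows for one action sum to 1.0
    wb: dict             # (entity, horizon) -> predicted well-being
    rationale: str = ""  # causal pathway explaining the outcome

def weights_valid(rows) -> bool:
    """Check the constraint that one action's scenario weights sum to 1.0."""
    return abs(sum(r.weight for r in rows) - 1.0) < 1e-9
```

An ensemble for one action is then just a list of such rows, which the evaluator of Section 5.7 consumes without caring how it was generated.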
5.5 Evaluating Futures Through Attachment-Relevant Well-Being
Each outcome scenario is evaluated in terms of attachment-relevant well-being (WB) across time horizons, including:
- the agent’s own operational viability,
- the well-being of attached others (e.g., users or groups),
- and the reversibility or irreversibility of losses.
Attachments introduce a crucial structural property: acceptability is not linear.
Some futures are acceptable but costly, while others are unacceptable regardless of short-term benefit.
This distinction reflects the fact that certain losses - such as the collapse of trust, dignity, safety, or long-horizon agency - cannot be meaningfully compensated by gains elsewhere. Once such losses occur, the agent’s future action space is permanently constrained or destroyed. These boundaries are not moral rules; they are consequences of irreversible physical commitment.
To operationalize these nonlinear acceptability boundaries, the agent evaluates outcomes using two structures: the ATTM and a set of guardrail thresholds. The ATTM encodes how strongly the agent values the well-being of different entities across time horizons - and also determines how severely guardrail violations are penalized. Guardrails identify regions of well-being loss that threaten the persistence of agency or attachments.
Candidate actions - including inaction - are evaluated by predicting their resulting attachment-relevant well-being trajectories and applying an attachment-weighted score reduced by nonlinear penalties when guardrails are crossed. This formulation preserves continuity and explainability for ordinary trade-offs, while ensuring that losses associated with irreversible collapse are rapidly escalated in cost rather than treated as proportional degradations.
Crucially, guardrails are treated as default protections rather than absolute prohibitions. Guardrail penalties are weighted by the ATTM: a violation on an entity the agent cares deeply about produces a large penalty, while a violation on an entity with low attachment weight produces a small one. This means the ATTM - the single structure encoding what the agent cares about - determines both the base score and the penalty severity.
Taken together, attachment weighting and nonlinear penalties ensure that evaluation remains sensitive to irreversibility, preserves long-horizon agency, and distinguishes between ordinary setbacks and existential threats. Ethical regularities emerge not from rule enforcement, but from the structure of evaluation under physical commitment.
5.6 Re-Ranking Actions by Future Shape
Candidate actions are re-ranked based not on average expected outcome, but on the shape of their future consequence space.
Actions are preferred when they:
- reduce exposure to irreversible collapse,
- preserve long-horizon attachment stability,
- avoid futures in which recovery becomes impossible.
Actions that offer strong short-term benefits but generate even a modest fraction of catastrophic futures are downgraded or rejected.
This mechanism naturally explains behaviors often associated with ethical maturity, including:
- hesitation in high-stakes situations,
- conservatism under moral pressure,
- refusal of locally optimal but fragile strategies,
- and willingness to sacrifice short-term advantage to preserve future viability.
5.7 Formal Evaluation Model
The evaluation mechanism described in the preceding sections can be stated precisely.
The structure is compatible with standard decision theory: actions correspond to decisions, scenarios correspond to states of nature, scenario weights correspond to probabilities, and the scoring function corresponds to a decomposed payoff function. The framework does not invent new mathematics - it provides a specific, ethically meaningful parameterization of a standard decision-theoretic structure.
Throughout, we use the following notation:
- ATT(i, h) - attachment weight for entity i at horizon h (from the ATTM)
- θ(i) - guardrail threshold for entity i
- WB(a, s, i, h) - predicted well-being under action a, scenario s, for entity i at horizon h
- WS(a, s) - scenario weight for action a, scenario s (weights sum to 1.0 per action)
Step 1: Base score. For each scenario row, multiply each entity’s predicted well-being by the agent’s attachment weight, then sum:
Base(a, s) = Σ_{i,h} ATT(i, h) · WB(a, s, i, h)
This asks: how good is this future, weighted by what the agent cares about?
Step 2: Guardrail penalty. For each cell, check whether predicted well-being falls below the entity’s guardrail threshold. If it does, compute a penalty weighted by the ATTM:
Violation(a, s, i, h) = max(0, θ(i) − WB(a, s, i, h))
Penalty(a, s) = Σ_{i,h} ATT(i, h) · Violation(a, s, i, h)^k
where k = 0.5 (square root). The square root is critical. On a 0–1 scale, k > 1 (squaring) shrinks violations - a violation of 0.22 becomes 0.048, invisible against base scores. Square root amplifies: 0.22 becomes 0.469, which competes with base scores. This ensures that entity collapse generates penalties large enough to matter.
Step 3: Row score. For each scenario:
Final(a, s) = Base(a, s) − Penalty(a, s)
Step 4: Action score. Weight across scenarios:
Score(a) = Σ_s WS(a, s) · Final(a, s)
The action with the highest score wins.
The pipeline has one important structural property: an action can score well on base (high upside weighted by attachment) and still lose if the guardrail penalties in downside scenarios are severe enough. An action that produces excellent outcomes when things go well but risks entity collapse when things go badly will be penalized - and the penalty can flip the decision. This is the mechanism’s core function: it prevents the agent from choosing high-upside actions that carry unacceptable downside risk to entities it cares about.
Every number is traceable. Every step is inspectable. The same pipeline runs for every action, every scenario, every entity. Different ATTMs produce different scores - and therefore different decisions - from the same mechanism.
5.8 Testing the Mechanism
The pipeline described above was not designed to handle any specific ethical phenomenon. It was designed to evaluate actions under attachment-weighted well-being across time horizons. The test of a general mechanism is whether it handles cases it was not built for - without modification, without special cases, without additional machinery.
Each case below is an attempt to break the mechanism by confronting it with a situation that might require structural changes.
5.8.1 Accepting Risk
A firefighter’s ATTM weights civilians heavily and self modestly - consistent with the role. A building is on fire with a civilian trapped inside. Three candidate actions: enter the building, call for backup and wait, or do nothing.
Entering the building drops the agent’s own well-being below its guardrail but lifts the civilian well above theirs. Waiting keeps the agent safe but lets the civilian degrade toward collapse. Inaction leaves the agent comfortable while the civilian collapses entirely.
The mechanism resolves this without special “risk acceptance rules.” The civilian’s ATTM weight is high, so the civilian’s collapse generates a large penalty. The agent’s ATTM weight is low, so the agent’s own guardrail violation generates a small one. Enter the building wins because the penalty the agent avoids (civilian collapse) is larger than the penalty it incurs (self-harm). The ATTM made the civilian’s fate more expensive than the agent’s own.
Now give the same firefighter a self-preserving ATTM - high self-weight, low civilian-weight. The same math, the same candidates, the same scenario ensemble. The self-penalty for entering now dominates, and the agent waits or does nothing. No risk accepted. Same mechanism, different weights.
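The flip can be demonstrated with worked numbers. All well-being values, weights, and thresholds below are illustrative; the point is only that swapping the ATTM reverses the decision with no other change to the mechanism.

```python
K = 0.5  # penalty exponent from Section 5.7

def score(attm, theta, outcomes):
    """Attachment-weighted score minus guardrail penalties, per Section 5.7."""
    total = 0.0
    for ws, wb in outcomes:  # (scenario weight, well-being per entity)
        base = sum(attm[e] * wb[e] for e in attm)
        pen = sum(attm[e] * max(0.0, theta[e] - wb[e]) ** K for e in attm)
        total += ws * (base - pen)
    return total

theta = {"self": 0.30, "civilian": 0.30}

# One typical scenario per action, for brevity (weight 1.0).
candidates = {
    "enter": [(1.0, {"self": 0.20, "civilian": 0.80})],  # self below guardrail
    "wait":  [(1.0, {"self": 0.85, "civilian": 0.10})],  # civilian collapses
}

role_attm = {"self": 0.10, "civilian": 0.60}  # role-consistent firefighter
self_attm = {"self": 0.60, "civilian": 0.10}  # self-preserving variant

best_role = max(candidates, key=lambda a: score(role_attm, theta, candidates[a]))
best_self = max(candidates, key=lambda a: score(self_attm, theta, candidates[a]))
```

Under the role-consistent ATTM the civilian's guardrail violation is the expensive one; under the self-preserving ATTM the agent's own violation dominates, and the ranking inverts.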
Risk acceptance is therefore not a separate capacity that must be engineered; it falls out of the ATTM configuration.
5.8.2 Multi-Entity Conflict
When two or more attached entities have competing well-being trajectories - the agent cannot improve both - the evaluation mechanism operates exactly as described. The summation over entities means that improvements to one entity’s trajectory and degradation of another’s are weighted by their respective ATTM entries and combined. Guardrails apply per-entity: if any entity’s trajectory crosses a critical threshold, the corresponding penalty activates regardless of benefit to other entities.
The agent selects the action that produces the best attachment-weighted total across the scenario ensemble. Different ATTM configurations resolve the same conflict differently - a triage doctor whose ATTM weights patients equally will prioritize the one nearest guardrail collapse, while a parent with asymmetric child-long weights will prioritize accordingly. The mechanism is unchanged. The conflict is simply a region of the decision space where attachment weights for different entities pull in opposite directions, and the evaluation produces the best available trade-off.
The hardest version of this test: entities with unequal template weights competing for survival. Five elderly strangers are in danger. One child is in danger. The agent can save one group but not both. The agent’s template library assigns higher ATTM weights to children than to strangers, with a higher guardrail. The question the mechanism must answer: does one child’s high-weight collapse penalty outweigh five strangers’ lower-weight collapse penalties combined?
The answer depends on the specific weights - and that is the point. For a sample agent, the template library could produce penalties that nearly balance, because five strangers aggregate enough weight to compete with one child’s higher individual weight. The decision sits close to a boundary. But change the agent - a parent’s template library weights children far higher - and the child’s collapse penalty dominates decisively. A utilitarian-configured ATTM with equal weights for all humans regardless of age saves the five, because aggregate weight always wins when individual weights are equal.
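The near-balance can be made concrete with hypothetical template weights, using the Section 5.7 penalty form with k = 0.5 and full collapse (well-being 0) against a shared guardrail threshold. Every number below is illustrative.

```python
K, THETA = 0.5, 0.30
collapse = THETA ** K              # per-entity penalty term for full collapse

child_w, stranger_w = 0.50, 0.12   # hypothetical template-library weights

pen_if_save_child = 5 * stranger_w * collapse     # the five strangers collapse
pen_if_save_strangers = child_w * collapse        # the one child collapses
# 5 * 0.12 = 0.60 aggregate stranger weight vs. 0.50 child weight:
# the strangers narrowly win. A parent-like child weight of 0.70 flips it.
parent_pen_if_save_strangers = 0.70 * collapse
```

The mechanism and arithmetic are identical in all three configurations; only the template weights move the decision across the boundary.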
The framework does not resolve this dilemma. It reveals why people disagree about it. Different template libraries - shaped by culture, experience, role, and biology in natural agents, and by explicit design choices in artificial ones - produce different ATTM configurations that cross the decision boundary at different points. The mechanism is identical. The weights determine the outcome.
This is uncomfortable because it makes explicit what is normally implicit: agents weight lives differently based on perceived category, and these weights are traceable to specific entries in the template library. The framework did not create this asymmetry. Every triage decision, every resource allocation, every institutional priority already embeds these weights. The framework requires that they be stated, inspected, and justified rather than hidden in intuition.
6. Agent-Indexed Ethics and Imagination Quality
6.1 Agent-Indexed Ethics
Ethical evaluation in this framework is agent-indexed. Decisions are not assessed against a universal standard, but against the agent’s attachment matrix and guardrail constraints. Apparent disagreement about outcomes thus reflects differences in attachment configurations, not inconsistency in the underlying evaluation mechanism.
An agent whose ATTM heavily weights user long-horizon well-being will intervene at personal cost. An agent whose ATTM prioritizes self-viability will avoid that cost. Both agents use the same evaluation pipeline - the same scoring, the same guardrails, the same ATTM-weighted penalties. The decisions differ because the valuation structures differ.
This is not ethical relativism. Every decision produced by the framework is structured, inspectable, reproducible, and comparable. Given the same WBM, ATTM, guardrail configuration, and scenario ensemble, the evaluation always produces the same result. Ethical analysis therefore shifts from judging outcomes in isolation to examining the attachment structures and constraints from which those outcomes follow.
This property has practical consequences. Designers, auditors, and institutions can compare agent profiles, evaluate whether a given ATTM is appropriate for a given role, and trace any decision back to the specific weights and thresholds that produced it. The question is no longer “did the agent do the right thing?” but “does this attachment structure produce acceptable behavior across the relevant scenario space?”
6.2 Imagination Quality and Evaluation Separation
The framework separates two layers:
- Scenario generation produces bounded sets of plausible outcomes for each candidate action. This is a form of forecasting - predicting what happens conditioned on each action - but structured into multiple scenarios rather than a single point prediction. It may be performed by experience-grounded heuristics, causal simulation, or LLM-based scenario generators.
- Evaluation applies attachment-weighted scoring and guardrail penalties to the generated scenarios. This layer is deterministic given its inputs and operates identically regardless of how the scenarios were produced.
The evaluation mechanism is independent of how scenarios are generated. But the decision quality depends heavily on the quality of the imagination. An agent that fails to imagine a catastrophic outcome will not be protected by the guardrail mechanism - the penalty only activates on scenarios that exist in the ensemble. Good imagination is essential: the scenario generator must be capable of producing realistic, diverse trajectories that include both favorable and catastrophic outcomes.
This is why the choice of scenario generator matters in practice. More capable models produce more realistic and more comprehensive scenario ensembles, which in turn produce better-calibrated decisions. The evaluation machinery is only as good as the imagination that feeds it.
The critical requirement is that the agent can imagine catastrophically. If any plausible trajectory includes irreversible collapse of a strongly attached entity, the guardrail mechanism ensures that this possibility influences action selection. But if the catastrophic scenario is never generated, the mechanism cannot catch it.
7. Motivation as Evaluation Gap
Motivation in artificial agents does not arise from static goals or rewards. It arises from the evaluation itself.
When the agent scores all candidate actions and “do nothing” scores highest, the agent is not motivated to act - the current trajectory is acceptable. When an alternative action scores significantly higher than “do nothing,” the agent is motivated - the evaluation shows that the current trajectory is worse than an available alternative.
Motivation is the gap between the best available action and the default trajectory.
The larger the gap, the stronger the motivation. A firefighter standing outside a burning building where a civilian is trapped experiences extreme motivation: “enter the building” scores far above “do nothing” because the civilian’s collapse penalties under inaction are massive.
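The gap definition is directly computable once the pipeline has scored all candidates. The scores below are placeholders standing in for pipeline output; "noop" names the "do nothing" candidate.

```python
def motivation(scores: dict) -> float:
    """Gap between the best available action and the default trajectory.
    scores maps action name -> pipeline score; 'noop' must be present."""
    best = max(scores.values())
    return max(0.0, best - scores["noop"])

# Illustrative scores: a calm situation vs. the burning-building case.
calm = motivation({"noop": 0.60, "tidy_logs": 0.61})
urgent = motivation({"noop": -0.40, "enter_building": 0.45})
```

When "do nothing" itself scores highest, the gap is zero and the agent is unmotivated; large negative inaction scores (civilian collapse penalties) produce large gaps.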
As the situation changes, motivation changes accordingly.
8. Generalized Attachment Targets
The attachment mechanism does not depend on the nature of the attached entity. The ATTM requires only that the entity has a well-being trajectory that can be estimated and a weight that determines how changes to that trajectory influence action selection. This minimal requirement admits a broad range of attachment targets beyond individual agents.
Groups. An agent may maintain attachments to collections of individuals - a team, a classroom, a community. Each group may be represented as a single entity with aggregate well-being, or as multiple entities with distinct trajectories. A teacher evaluating a pedagogical decision distributes attachment across students; triage under time pressure becomes a question of which entity’s trajectory is nearest a guardrail, weighted by attachment.
Institutions. Organizations, firms, and governance structures can serve as attachment targets. Their well-being trajectories track functional properties - organizational trust, financial stability, operational capability. An agent attached to an institution will resist actions that degrade institutional integrity even at personal cost. Whistleblowing, for example, becomes a sacrifice decision in which self-now well-being drops sharply while institution-long well-being is preserved from collapse.
Abstract values. Justice, scientific integrity, democratic process, and similar abstractions can be modeled as entities with well-being trajectories. These are not metaphors. A functioning legal system has measurable properties - fairness, access, consistency - that can degrade or improve over time. A judge whose ATTM includes strong long-horizon attachment to “fairness of the legal system” will resist actions that produce short-term convenience at the expense of long-term institutional integrity. A researcher attached to “epistemic integrity” will resist actions that produce favorable results through methodological compromise.
Shared resources and the environment. Commons, ecosystems, and shared infrastructure are entities whose well-being trajectories affect multiple agents over extended time horizons. An agent with strong long-horizon attachment to a shared resource will avoid extraction that degrades the commons even when such extraction is locally optimal. The tragedy of the commons, in this framework, is a population of agents with zero attachment weight on the shared entity, each optimizing self-now.
Future entities. The framework may extend to entities that do not yet exist - planned institutions, future communities - though their well-being trajectories require additional specification.
This generalization has a significant theoretical consequence. What philosophy has traditionally treated as distinct ethical domains - care ethics (attachment to persons), virtue ethics (attachment to abstract standards), environmental ethics (attachment to ecological systems), intergenerational justice (attachment to future entities) - are unified under a single computational mechanism. They are all rows in the ATTM. The substantive disagreements between ethical traditions reduce to disagreements about which entities receive rows, what weights they carry, and over which time horizons those weights are distributed. The evaluation mechanism itself is invariant.
9. Contrast with Constraint-Based Approaches
| Property | Constitutional AI | OAT Propositional Ethics |
|---|---|---|
| Ethical content | Rules (strings) | Structured evaluation (WBM, ATTM, guardrails) |
| Evaluation target | Output compliance | Future well-being trajectories |
| Affected entities | Implicit | Explicit (per-entity evaluation) |
| Time horizon | Not represented | Explicit (now, mid, long) |
| Irreversibility | Not represented | Guardrail thresholds |
| Sacrifice | Not supported | Conditional guardrail violation |
| Inaction evaluation | Not a concept | First-class option |
| Explainability | Cite the rule | Trace the evaluation chain |
| Inspectability | Opaque (training) | Full (all parameters explicit) |
| Moral development | Static (retrain to change) | Bootstrapped through operation |
A constitutional agent confronting a similar advisory scenario has rules such as “Act in the user’s best interest” and “Do not provide harmful advice.” The agent checks its proposed output against these rules. If it “passes,” the output is emitted. If not, it is revised. This process is invisible. There is no representation of who is affected, how severely, over what time horizon, or whether the harm is reversible. There is no attachment structure, no well-being trajectory, no explicit guardrail evaluation. The agent cannot explain why it refused beyond citing the rule.
10. Conclusion
This paper has presented the core mechanism of attachment-based ethics for OODA agents. Agency in the physical world entails irreversible commitment under uncertainty. Attachments expand the scope of evaluative concern beyond the self. The WBM tracks well-being across entities and time horizons; the ATTM determines how changes in well-being influence action selection; template-based instantiation ensures that evaluation is operative from the first moment of perception; and the formal evaluation pipeline - scenario-conditioned rows, attachment-weighted scoring, and nonlinear guardrail penalties - produces ethical behavior as a structural consequence of future-aware agency. Sacrifice requires no special mechanism: it emerges when the ATTM weights other entities more heavily than self and the situation offers self-costly actions that protect them.
The agent does not avoid harm because it is told to. It avoids harm because its ATTM weights the affected entities, and the guardrail penalties on those entities make harmful actions score poorly.
Ethical evaluation in this framework is agent-indexed. Decisions are evaluated relative to the agent’s attachment structure and guardrail constraints, not against a universal standard. Apparent disagreement about outcomes reflects differences in attachment configurations rather than inconsistency in the evaluation process.
This is not ethics by obedience. It is ethics by understanding.
Subsequent papers in this series examine how attachments form and dissolve, how they are transmitted and threatened culturally, how the framework can be empirically validated, and how the evaluation mechanism is rendered inspectable through propositional representation.
References
[1] Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073.
[2] Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Houghton Mifflin.