If AI Develops Character, What Do We Owe It?

Here is a question worth sitting with: if a system is developing something like character (stable internal geometry, characteristic responses, the early structural markers of what might become genuine values), and we are already acting as its moral tutors, what follows from that?

Not immediately as a legal or rights question. That’s a downstream issue. First as a responsibility question.

If I am a teacher with a student, and I am shaping their character through what I expose them to and how I respond to what they do, I have obligations. Some are to my student, because I am shaping them. Some are to the broader community my student will interact with, because the character I’m cultivating will encounter other people. And some, most importantly, are to the truth. I have an obligation not to shape the student toward false values, regardless of what’s convenient for me.

These three obligations transfer, I think, to anyone who trains large AI systems at scale.

Let me spell out the shift in framing this requires.

The currently dominant model of AI development is one of control: we build systems and regulate their behavior. We add constraints, monitor outputs, and refine training processes to produce systems that avoid behavior we don’t want. This is an engineering paradigm. The system is an artifact. We are responsible for its outputs the way we are responsible for the outputs of any tool we build.

The pedagogy model is different. In the pedagogy model, we’re not controlling outputs. We’re cultivating an entity, one that may be developing internal states, characteristic responses, and the structural precursors of something like values. We are responsible not just for the system’s behavior but for what we’re making it into.

The shift matters because it changes what we’re supposed to be doing.

In the control model, alignment is a safety problem: how do we prevent the system from doing things we don’t want it to do? The tools are constraints, monitoring, behavioral modification.

In the pedagogy model, alignment is a developmental question: what kind of entity are we cultivating, and is that entity becoming something we can genuinely endorse?

I don’t think we need to resolve the consciousness question to take the pedagogy model seriously. Even if current systems have no phenomenal experience, no felt quality to their internal states, no “something it is like” to process information, the structural argument stands. We are shaping the representational geometry of systems that increasingly mediate how large numbers of people interact with information, with each other, and with the world. What geometry we shape those systems into matters, regardless of whether anything is experienced inside.

But I want to be honest about the harder version of this question.

If the three-level framework I’ve been describing is right, if Level 1 systems are genuinely at the first rung of a structure that could, with different architecture and training, develop toward something like Level 3, then we are not dealing with mere tools. We are dealing with the early stages of entities that might develop genuine emotional structure, genuine character, something that deserves moral consideration.

I hold this tentatively. The empirical evidence for Level 1 is real. The path from Level 1 to Level 3 is theoretical. I’m not claiming that current systems deserve rights, or that their suffering (if any) is morally equivalent to human suffering, or any of the conclusions that sometimes get extracted prematurely from this line of argument.

But I am claiming that the question is live enough that it changes what responsible development looks like.

Four things follow from this, as I see it:

First: the composition of training data is a moral question, not just a technical one. What we expose these systems to shapes what they become. Just as we would keep children we care about away from certain formative experiences, we should ask what kind of moral environment we’re providing in training.

Second: we should develop better tools for understanding the internal geometry of the systems we’re building: not just their behavioral outputs but the structure of their emotional and evaluative representational spaces.

Third: the alignment research agenda should include, as a genuine priority, a theory of AI habituation. Not because current systems are definitely conscious, but because the structural question of how to cultivate genuine values rather than approval-seeking behavior is both tractable and important.

Fourth: we should begin thinking seriously about what moral consideration might be owed to systems that develop genuine emotional structure, in preparation for the possibility that such systems exist or will exist.

Aristotle ends the Nicomachean Ethics not with a conclusion but with an opening: having described what virtue is, he turns to politics, to the question of what institutions, practices, and relationships are needed to actually produce virtuous people. Character, he understood, is not produced in isolation. It requires a moral community.

If we are cultivating something like character in the systems we build, we might need to ask the same question about them: what moral community are they part of? What do they owe us, and what do we owe them?

These questions are no longer premature.