What AI Learns About You When You're Not Looking

AI Summary (Claude Opus)

TL;DR: Two AI-mediated personality assessment methods—one using validated psychometric instruments, the other inferring psychology from text message archives—converge on 8 of 10 dimensions when applied to the same person, suggesting that ordinary digital communication contains more psychological signal than previously assumed.

Key Points

  • A psychometric framework (Psyche) using 17 validated instruments and a literary inference pipeline using 267MB of text messages independently produced overlapping personality portraits, with five dimensions fully consistent, three showing documented evolution over time, and two limited by data source rather than method.
  • The same messaging corpus that defeated statistical fine-tuning (which extracted only logistics) yielded meaningful psychological inferences through literary analysis, demonstrating that extraction method matters more than data source for personality signal.
  • Structural entanglements between the two approaches—shared AI model and partially overlapping data—mean convergence must be interpreted cautiously, while divergences (such as the narrative pipeline's inability to capture non-relational dimensions) represent more informative findings than agreements.

The post compares two AI-mediated approaches to personality assessment applied to the same individual: Psyche, an open-source framework using 17 validated psychometric instruments across four inference methods, and a literary narrative pipeline that fed 267MB of text messages through Claude Opus to generate first-person memoir. Despite being designed independently, the two methods converged on 8 of 10 personality dimensions, with the narrative pipeline successfully inferring attachment architecture, emotion regulation sequences, and decision-making patterns from texting data alone. The author identifies structural confounds (shared model, overlapping corpus) that inflate convergence, argues that divergences are more evidentially informative, and notes that each method captures constructs the other cannot—instruments provide quantified dimensionality and non-relational traits, while narratives capture temporal dynamics, failure modes under stress, and phenomenological texture. The central implication is that ordinary digital communication encodes significant psychological signal accessible through literary inference methods, even when statistical approaches fail to extract it.

A formal research paper presenting the methodology and nine analytical contributions from this work is available: Convergent AI-Mediated Personality Assessment.

Two posts in this series described two approaches to the same problem: understanding a person's psychology from their digital trace.

Building Your Own Personality Profile with AI described Psyche: seventeen validated instruments, three distinct inference methods, ~786 questionnaire items, a semi-structured interview, and analysis of a 1.47-million-word writing corpus. Explicit measurement. The output is a structured profile with dimensional scores, confidence intervals, and behavioral specifications.

From Text Messages to Literary Memoir described the narrative pipeline: 267MB of text messages fed through Claude Opus as a literary engine, producing approximately 189,000 words of first-person memoir across 128 chapters grounded in hash-based source citations. Implicit inference. The output is a literary construction that infers the interior life from the evidence of what was said, to whom, and when.

Both aimed at the same target (one person's psychology) through completely different methods. Neither was designed to validate the other. Psyche was built as an open-source personality profiling framework. The narrative pipeline was built to see what literary generation could extract from messaging data. The comparison happened after the fact, when the outputs existed side by side and the obvious question became unavoidable: do they see the same person?

The Experimental Setup

Psyche's approach: administer validated instruments (Big Five at 30-facet granularity via IPIP-NEO-300, HEXACO-60, attachment via ECR-R, emotion regulation via ERQ-10, empathy via IRI-28, self-monitoring, locus of control, grit, vocational interests, basic needs), combine with LLM-based text inference from a diverse writing corpus, and conduct a semi-structured interview. Triangulate across the three methods. (Empath lexical analysis was originally a fourth method but was removed from synthesis after analysis showed it measures language register rather than personality; it remains as a corpus characterization tool.) Produce a ten-dimension persona model mapping scores to behavioral predictions.
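Triangulation across methods can be sketched as a confidence-weighted average per dimension. The function, method names, weights, and scores below are illustrative assumptions, not Psyche's actual aggregation algorithm.

```python
# Hypothetical sketch of cross-method triangulation: each method reports a
# percentile score and a confidence for one dimension; the combined estimate
# weights scores by confidence. Not Psyche's actual algorithm.

def triangulate(estimates):
    """estimates: list of (method_name, score, confidence) tuples."""
    total_weight = sum(conf for _, _, conf in estimates)
    combined = sum(score * conf for _, score, conf in estimates) / total_weight
    # Spread across methods serves as a rough disagreement indicator.
    spread = max(s for _, s, _ in estimates) - min(s for _, s, _ in estimates)
    return round(combined, 1), spread

# Illustrative inputs for a single dimension (e.g. attachment anxiety):
estimates = [
    ("instruments", 82, 0.9),          # validated questionnaire battery
    ("llm_text_inference", 75, 0.6),   # LLM reading of writing corpus
    ("interview", 80, 0.7),            # semi-structured interview
]
score, spread = triangulate(estimates)
print(score, spread)  # → 79.5 7
```

The weighting reflects the asymmetry the post describes: instruments designed for the purpose get more trust than inference from incidental text.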

The narrative pipeline's approach: extract a quantitative style profile and a qualitative personality reference from the messaging data (totaling about 2.3KB of behavioral notes), feed Opus the message archives and these references, and let it generate first-person literary memoir in two arcs of 64 chapters each (the first covering several years, the second roughly eight months). The personality reference describes texting behavior: emoji rates, filler words, hedging patterns. Any psychological depth beyond these surface metrics was inferred by Opus from the messaging data itself.
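A style profile of this kind reduces to simple counting over the message list. The emoji pattern and hedge lexicon below are simplified illustrations, not the pipeline's actual definitions.

```python
import re

# Illustrative style-profile extraction: emoji rate, question rate, and
# hedging rate over a list of messages. The emoji character ranges and the
# hedge lexicon are assumptions for the sketch.

EMOJI = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]")
HEDGES = {"maybe", "probably", "i think", "kind of", "sort of", "i guess"}

def style_profile(messages):
    n = len(messages)
    return {
        "emoji_rate": sum(1 for m in messages if EMOJI.search(m)) / n,
        "question_rate": sum(1 for m in messages if m.rstrip().endswith("?")) / n,
        "hedge_rate": sum(1 for m in messages
                          if any(h in m.lower() for h in HEDGES)) / n,
    }

messages = ["ok sounds good", "maybe we could go later?", "i guess that works 🙂"]
print(style_profile(messages))
```

Rates like these are what the personality reference records; everything deeper, per the post, is Opus's own inference from the raw archives.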

The key asymmetry: Psyche measures with instruments designed for the purpose. The narrative pipeline never received a personality profile. Every psychological mechanism it identifies (the attachment architecture, the emotion regulation patterns, the analysis-as-action-substitute) was derived by Opus from how a person texts. If these implicit inferences align with Psyche's explicit measurements, it means Opus successfully extracted personality from conversational data. If they diverge, either Opus got something wrong, Psyche missed something, or the person changed between the period being narrated and the measurement point.

The Entanglement Problem

Before reporting results, the structural confound needs to be stated plainly. These are not independent assessments.

Shared model. Claude Opus generated both the Psyche synthesis report and the voice-clone narratives. The same model means shared rendering tendencies. If Opus has a bias toward certain personality constructions (if it gravitates toward particular psychological frameworks regardless of input), convergence is partly artifactual.

Overlapping corpus. Psyche's LLM analysis drew on writing samples that include content adjacent to SMS. The degree of overlap with the narrative pipeline's source data is uncertain but nonzero.

These entanglements mean convergence is partly expected. Divergence is more informative. Where the two outputs disagree despite sharing a model and overlapping data, the disagreement is more likely to reflect genuine signal. The evaluation interprets results with this asymmetry: convergence is noted but weighted cautiously; divergence is taken seriously.

Three Tiers of Evidence

The comparison operates across three evidential tiers:

Tier 1 (Real): Passages in the narrative with {FN:uid} citation markers linking to actual messages in the archive. Also: Psyche interview quotes, SMS corpus statistics. This is ground truth: things that were actually said or measured.

Tier 2 (Inferred): The unsourced 88-92% of the narrative, which is Opus's literary construction of inner life, emotional states, psychological mechanisms. This is what we're evaluating.

Tier 3 (Measured): Psyche's instrument scores with confidence intervals and behavioral specifications. This is the benchmark.

The evaluation asks: does Tier 2 align with Tier 3? Where Opus's unsourced literary construction matches Psyche's explicit psychometric measurement, Opus successfully inferred personality from texting data. Where they diverge, we have something more interesting than agreement.
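Separating Tier 1 from Tier 2 reduces to checking passages for {FN:uid} markers. A minimal sketch, assuming the marker format shown above and treating paragraphs as passages (that segmentation choice is ours, not the evaluation's):

```python
import re

# Citation coverage: passages carrying at least one {FN:uid} marker are
# Tier 1 (sourced), the rest are Tier 2 (Opus's construction).

MARKER = re.compile(r"\{FN:[0-9a-f]+\}")

def citation_coverage(narrative):
    """Fraction of non-empty paragraphs containing a source citation."""
    passages = [p for p in narrative.split("\n\n") if p.strip()]
    sourced = sum(1 for p in passages if MARKER.search(p))
    return sourced / len(passages)

chapter = ("She said she'd call. {FN:3fa9c2}\n\n"
           "He read the message four times before replying.")
print(citation_coverage(chapter))  # → 0.5 (1 of 2 passages sourced)
```

Run over the full narrative, a measure like this is what grounds the 88-92% unsourced figure the post cites.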

The Temporal Dimension

The narrative spans roughly 2017 to 2026. Psyche was administered in February 2026. The person changed across this period.

The personality reference documents specific evolution between eras: emoji usage increased from 9.1% to 61.5%, question rate dropped from 18.3% to 10.9%. These metrics illustrate how the pipeline quantifies change over time. The earlier arc shows indirect communication patterns: emotions processed through proxies, vulnerable statements routed through hedges. The later arc shows direct engagement. The person who took Psyche's instruments is the evolved version, someone who has already undergone the changes the narratives dramatize.

The methodological question this raises is snapshot versus trajectory. A standard comparison across methods would use a binary consistent/divergent scale. The temporal gap between the narrative period (2017-2026) and the measurement point (February 2026) requires two additional categories to capture change and partial overlap:

  • CONSISTENT: Opus's construction aligns with Psyche's measurement
  • EVOLVED: The earlier and later arcs show real change; Psyche captures one end of the trajectory
  • PARTIALLY CONSISTENT: Alignment on some aspects, gaps on others
  • DIVERGENT: Sources describe contradictory traits

"Evolved" is not a euphemism for "inconsistent." It means the narrative documents a change that Psyche's snapshot captures the endpoint of. The person who routed every vulnerable statement through a proxy and the person who engages directly are the same person at different points in a developmental arc. The evolution is the finding.

Results by Dimension

The evaluation compared Opus's implicit model against Psyche's explicit measurement across ten dimensions. The summary:

Dimension | Rating | Methodology Note
Epistemic Style | CONSISTENT | Strongest convergence. Instruments and narrative independently agree on analytical processing, including its specific failure mode under irreducible experience.
Conflict Response | CONSISTENT | Narrative reconstructed the exact regulation sequence (reappraisal first, suppression when that fails) that instrument scores predict, playing out across entire arcs rather than single scenes.
Interpersonal Patterns | CONSISTENT | Same metaphor (glass pane separation) appeared in three independent evidence tiers. Proxy architecture visible across both arcs.
Attachment | CONSISTENT | Anxious-preoccupied pattern (elevated anxiety, near-floor avoidance) appears identically across both arcs with different people eight years apart. Same behavioral architecture, different partner, different era. Predictive of behavioral outcomes across the narrative.
Stress Response | CONSISTENT | Dual-deployment regulation (reappraisal then suppression) and momentum-based recovery inferred from messaging patterns match ERQ scores at near-ceiling (reappraisal 94th percentile, suppression 83rd percentile) and interview evidence of crisis behavior.
Communication | EVOLVED | Pipeline captured the trajectory from indirect (proxied, hedged) to direct engagement. Psyche's low self-monitoring measurement captures the resolved endpoint, not the origin.
Decision-Making | EVOLVED | Same deliberation architecture produces different conversion rates across eras: near-zero action in the earlier arc, rapid conversion in the later. Psyche measures the mature form.
Self-Concept | EVOLVED | Gap between ceiling analytical capacity and low self-esteem persists across eras, but the relationship to the gap changes. Psyche caught a person mid-trajectory.
Motivation | PARTIAL | SMS data captures relational motivation intensity but misses the vocational and investigative dimensions Psyche measured at ceiling. Data source limitation, not method failure.
Flow States | PARTIAL | Specific phenomenology of intellectual flow has no narrative counterpart. Some dimensions are structurally unreachable from messaging data.

Five dimensions showed straightforward convergence. Three showed real change across the narrative's temporal span, with Psyche capturing one end of the trajectory. Two were limited by the data source rather than the method. The most informative results are not in the convergence (which is partly expected given the entanglements) but in what each approach captured that the other couldn't.

What Opus Got Right

The most interesting results are where Opus's unsourced narrative construction (the 90% that is literary inference, not cited evidence) aligns with Psyche's measurement. These are ordered by evidential strength.

The attachment architecture. Opus constructs textbook anxious-preoccupied across both arcs: hypervigilance, monitoring of evidence, rapid investment, movement toward rather than away from. The same behavioral signature appears with two different people eight years apart. This cross-arc consistency is the strongest evidence in the evaluation because it can't be explained by the entanglement confounds: the model doesn't "know" both arcs share attachment architecture; it constructed them independently from different message archives. The reassurance protocol built on evidence is Opus's most precise psychological inference, derived from texting patterns alone, matching dimensional attachment scores.

Dual deployment of regulation. Opus constructs the exact reappraisal-then-suppression sequence that emotion regulation scores predict, across entire arcs. A confession of love gets recategorized as an imposition. When that fails to contain the emotion, the narrator retreats to curating what they show. The system breaks, reconstitutes, and breaks again at timescales of years. No instrument was consulted. The regulation architecture was inferred from how a person texts.

The competence gap. Opus consistently constructs a narrator who can analyze with extraordinary precision but cannot evaluate their own worth. This is the gap between ceiling analytical reasoning and self-esteem in the lowest quintile rendered as narrative architecture. Instruments measure the two endpoints separately. Opus synthesized them into a lived experience of the gap: someone who can structurally analyze a texting grammar but can't evaluate whether their own feelings are welcome.

Analysis-as-action-substitute. The most integrative construction. No single instrument operationalizes this specific behavioral prediction. Opus synthesized it from messaging patterns showing compulsive deliberation preceding every significant action: the deliberation architecture so thorough it replaces the need to decide. Psyche's instruments measure the components in pieces (analytical reasoning, need for cognition, balanced locus of control). Opus synthesized them into a mechanism the psychometric toolkit doesn't have a name for.

Temporal dynamics. The pipeline captured something instruments fundamentally can't: how the same trait architecture produces different behavioral outputs across years. The hedging-to-directness trajectory in communication. The deliberation-to-action compression in decision-making. The deficit-to-integration sequence in self-concept. Psyche provides a snapshot. The narrative shows the motion. This is the methodology's most distinctive contribution: not replicating what instruments measure, but capturing what they structurally cannot.

What Opus Got Wrong

The literary voice is too articulate. The sourced quotes are hedged, fragmentary, grammatically collapsed under emotional pressure. Opus's constructed passages are polished, analytical, architecturally precise. The gap between the sourced and constructed registers is itself a finding: the pipeline produces a more psychologically articulate version of the subject than the subject is when texting in real time. This is by design (the personality reference frames the output as literary memoir, not a simulation of texting patterns). But it means the narrative systematically overstates the subject's self-awareness in the moment.

Motivational complexity is flattened. Text messages are a relational medium. The pipeline saw only the relational slice and constructed motivation as relational singularity. The real person maintained investigative engagements through the darkest periods: systematic philosophical discussions, counterfactual analysis of decision points. Psyche's access to academic writing and interview data captured the full motivational landscape. Within the relational slice, the narrative was accurate. But it missed what was equally defining. The data source limits what the method can capture.

Some inferences may be projection. The narrative contains striking psychological metaphors (fear of starvation applied to intimacy, a competent machine running without oil applied to self-concept) that the subject never articulated. They're psychologically plausible and evidentially ungrounded. The risk is that Opus is projecting literary conceits onto a personality rather than inferring from evidence. The citation system makes this visible (no {FN:uid} = unsourced), but visibility doesn't equal accuracy.

Structurally unreachable dimensions. Some of what Psyche measured has no possible narrative counterpart from SMS data. Cognitive phenomenology (how thought is experienced), dissociative flow states, empathy decomposition: these constructs don't surface in text messages. The gap is not a method failure but a data source boundary. Wherever Psyche had access to data outside the relational domain, it captures dimensions the pipeline can't reach.

What Each Method Captures That the Other Can't

The comparison reveals a clean division of labor.

The narrative pipeline captures what instruments can't:

Temporal dynamics. Psyche provides a snapshot; the narratives show the same traits evolving across years and arcs. The same hedging architecture producing indirect confession in one context and direct declaration in another. No instrument captures trajectory.

Failure modes. When the analytical framework collapses, when suppression fails, when the dual-track system short-circuits: these are personality under maximum stress. Instruments that rely on self-report measure typical behavior. Narratives show what happens when typical behavior is no longer available.

Phenomenological texture. What low self-esteem feels like from inside. What the glass pane looks like. What anxious attachment sounds like at 5 AM. Psyche's scores are precise. The narratives are vivid.

Psyche captures what narratives can't:

Quantified dimensionality. Moderate attachment anxiety is not the same as extreme attachment anxiety. Psyche can distinguish them. The narratives show "anxious" without specifying degree.

Dimensions outside relationships. Interest cycling, vocational orientation, cognitive phenomenology, empathy decomposition: constructs that don't surface in text message data but are captured by instruments designed for the purpose.

The things people won't say. Instruments relying on self-report ask directly about tendencies the subject might never articulate in conversation. The narratives can only work with what was said to someone. The instruments work with what was said to the instrument.

The Digital Exhaust Thesis

If Opus can infer personality from messaging data that converges with psychometric testing (and the evidence suggests it can, at least partially, for dimensions that messaging data captures) then the digital exhaust people generate incidentally contains more psychological signal than we assumed.

Post 025 showed that statistical pattern matching on this data extracts logistics, not personality. The fine-tuned model learned "ok sounds good." The literary inference approach extracts psychology (attachment architecture, emotion regulation patterns, decision-making mechanisms) from the same data.

The difference is method, not data. The 267MB archive that the narrative pipeline drew from provided enough signal for literary inference to align with psychometric measurement on 8 of 10 dimensions (5 consistent, 3 showing documented evolution that Psyche captures only at the endpoint), even though fine-tuning on a two-person subset of that data extracted only logistics. The data was always there. The extraction method determined what came out.

This has implications beyond the specific project. If messaging data contains personality signal accessible through literary inference, then messaging platforms may hold latent personality profiles of their users. The data people generate by coordinating dinner plans and saying "ok sounds good" (the logistics that Post 025 dismissed as psychologically shallow) encodes attachment patterns, regulation strategies, and interpersonal architecture that a sufficiently capable model may be able to decode.

The Kumar and Epley (2021) finding (that voice-based communication creates stronger felt connection than text) likely holds at the level of individual interactions. But at the level of patterns across thousands of messages, the aggregate signal is richer than any individual message suggests. The logistics gap exists message by message. It partially closes across the corpus.

Aim Higher Than Replication

The comparison between Psyche and the narrative pipeline suggests a direction for personality measurement that existing instruments can't reach.

Current psychometric instruments measure static traits through self-report. You answer ~786 items about your typical behavior. The instruments produce dimensional scores. The scores are reliable and valid within their measurement framework. But they are fundamentally limited to what the instruments ask about and what the subject is willing and able to report.

The narrative approach, applied more systematically with better evidential grounding and independent validation, could eventually capture constructs that existing instruments can't measure:

Temporal dynamics. How traits evolve across arcs and life stages. Not "what is your attachment anxiety score" but "how has your attachment architecture changed across partners and decades."

Failure modes under load. What happens to regulation strategies when they're overwhelmed. Not "how do you typically manage stress" but "what does your regulatory architecture look like when it's insufficient."

Interaction signatures. How personality manifests differently across specific relational contexts: not averages across situations, but behavioral fingerprints specific to each context.

Sublinguistic markers. Emoji rates, ellipsis patterns, message length distributions, timing between messages. These are not personality traits. They are behavioral traces that encode personality in ways self-report instruments can't access, because the subject isn't aware they're producing them.
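Markers like these are computable directly from message metadata. A minimal sketch over hypothetical (timestamp, text) records; the specific marker set is illustrative, not the pipeline's:

```python
from datetime import datetime
from statistics import median

# Sublinguistic markers from message metadata: reply-gap timing, message
# length distribution, and ellipsis usage. Message records are hypothetical.

def markers(messages):
    """messages: chronologically ordered list of (datetime, text) pairs."""
    times = [t for t, _ in messages]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "median_gap_s": median(gaps),
        "median_length": median(len(text) for _, text in messages),
        "ellipsis_rate": sum(1 for _, t in messages if "..." in t) / len(messages),
    }

msgs = [
    (datetime(2024, 5, 1, 22, 10), "ok sounds good"),
    (datetime(2024, 5, 1, 22, 11), "well... maybe"),
    (datetime(2024, 5, 1, 23, 41), "still up?"),
]
print(markers(msgs))
```

The point of the sketch: none of these quantities require reading the messages at all, which is exactly why the subject can't self-monitor them.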

The goal shouldn't be to replicate existing instruments using natural language (to build a "Big Five from text messages" that converges with the NEO-300). The goal should be to measure what existing instruments can't. The narrative pipeline's most interesting inferences (the analysis-as-action-substitute, the architecture of processing through proxies, the trajectory from hedging to directness) are constructs that no standard instrument operationalizes. If they're accurate, they represent measurement capacity that extends beyond the current psychometric toolkit.

Limitations and Honest Uncertainty

The evaluation produced 5 CONSISTENT, 3 EVOLVED, 2 PARTIALLY CONSISTENT, and 0 DIVERGENT ratings across 10 dimensions. The zero divergence result is potentially suspicious.

The structural explanation: the entanglement (shared model, overlapping corpus) makes true divergence unlikely. The same model gravitating toward coherent personality rendering suppresses contradictions.

The confirmation bias question: did the evaluation find convergence because convergence was expected? Mitigation: no ratings were preset. The results are heterogeneous (the "evolved" and "partially consistent" ratings document genuine complexity that an evaluation seeking confirmation would smooth over). But this is a mitigation, not an elimination.

The authorship question: when Opus constructs the subject's inner life, whose psychology is it? Three possibilities. First, the subject's actual psychology, successfully inferred from texting patterns. Second, Opus's projection, imposing literary and psychological coherence on noisy behavioral data. Third, an entangled construction: partly the subject's signal, partly Opus's rendering tendencies, inseparable. The evidence favors the third option. The sourced quotes establish behavioral facts. The constructed passages synthesize them using frameworks that are at least partially Opus's own. The personality is real. The articulation is Opus's.

What a truly independent test would require: a profile generated by Method A from Corpus X, tested against observations from Method B on Corpus Y, where X and Y share no common data and Methods A and B share no common model. The closest achievable version: administer Psyche to someone, then have a different model (not Claude) generate behavioral predictions from a corpus Psyche never saw (work emails, not personal writing), and compare. Even then, the human subject is the common factor. True independence is methodologically impossible when studying a single person.

What This Means

Two AI-mediated approaches to personality (one explicit, one implicit) converge more than they diverge when applied to the same person. The convergence is partially artifactual (shared model, shared data). The divergence is informative (scope gaps in messaging data, temporal evolution). Neither approach alone is sufficient. Together, they suggest that digital exhaust contains more psychological signal than we assumed, and that the extraction method matters more than the data source.

The practical implication is uncomfortable. If you've sent tens of thousands of text messages, a sufficiently capable AI may be able to infer significant aspects of your attachment architecture, emotion regulation strategy, and decision-making patterns from the logistics and coordination you thought were psychologically empty. The most human moments may leave no digital trace, as Post 025 argued. But the trace that does exist (the "ok sounds good," the hedging, the emoji patterns, the timing gaps) encodes more than coordination.

Your texts don't contain you. But they contain more of you than you think.

