Content is a signal, behavior is a pattern

Most platforms are built to catch bad posts. The harder and more consequential problem is catching bad actors before the harm is done.

When content gets flagged on a social platform, whether it is a hateful comment, a graphic image, or a threat of violence, a familiar process clicks into motion. A classifier fires, a queue gets populated, a moderator reviews, and the post disappears. The workflow is well understood, increasingly automated through AI, and measured in millions of actions per day. By that metric, Trust & Safety has never been more productive.

But productivity and effectiveness are not the same thing. Removing harmful content is necessary. It is not sufficient. The frame that makes most platforms legible to their own safety teams — content as the unit of harm — is also the frame that makes certain categories of harm nearly invisible until they have already done their worst.

The infrastructure we built

The modern Trust & Safety field took shape during the 2010s, when major technology firms began investing in dedicated teams drawn from legal, policy, engineering, and social science backgrounds. A landmark moment came in February 2018, when representatives from Google, Facebook, Reddit, and Pinterest publicly discussed their content moderation operations for the first time at a Santa Clara University conference. A group of civil society organizations and academics who met around that conference went on to publish the influential Santa Clara Principles on transparency and accountability in content moderation. The field had a name, a canon, and a set of shared practices.

Those practices were shaped, understandably, by the problems that were most visible. Illegal content. Hate speech. Spam. Disinformation. Each of these manifests as content — something you can point to, screenshot, and compare against a policy. The tools built to address them are content tools: keyword filters, image hashing, classifier models that score individual pieces of media against a taxonomy of violation types. One form of moderation — keyword flagging — works by detecting specific terms in text, images, or video, but is limited by its lack of contextual understanding and the need for constant updates as bad actors shift their language.
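
The limits of keyword flagging are easy to see in a toy version. The sketch below is illustrative only: the blocklist and the function name are invented for this example, and production filters are far more elaborate. It shows the two failure modes described above: a trivially obfuscated term slips through, and a benign sentence is flagged because the filter has no sense of context.

```python
import re

# Hypothetical blocklist for illustration; real lists are much larger
# and tied to specific policies.
BLOCKED_TERMS = {"scam", "attack"}


def keyword_flag(text: str) -> bool:
    """Return True if any blocked term appears as a whole word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKED_TERMS for word in words)


# A direct match is caught.
print(keyword_flag("This is a scam"))
# A one-character substitution evades the filter entirely.
print(keyword_flag("This is a sc4m"))
# A warning about scams is flagged, because context is invisible.
print(keyword_flag("How to avoid a scam"))
```

Every evasion of this kind forces a list update, which is the maintenance burden the paragraph above refers to.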

These tools work. They catch a great deal of harm every day. But they were designed for a particular threat model: the bad post. They are far less suited for the bad actor who posts nothing flaggable — at least not yet.

The problem with waiting for the content

Consider how radicalization actually unfolds online. It is rarely a single exposure to extreme content that transforms someone's worldview. Exposure to radical content online is widely regarded as a contributory factor rather than a direct cause of radicalization, with research focusing on how online behaviors — not just content — lead individuals toward more extreme positions. The process is gradual, social, and relational. It happens across platforms, across weeks or months, through the accumulation of relationships and small shifts in the information diet.

Research into online extremism has documented what practitioners have come to call the radicalization funnel — a multi-stage pipeline that moves individuals from mainstream platforms to progressively more extreme spaces. In documented cases of European lone-actor attacks, TikTok served as an emotional incubator and ideological gateway, with vulnerable users subsequently drawn toward more operationally focused platforms like Telegram. At each stage of that journey, a content moderation system tuned to individual violations may see nothing. The early-stage content is often not violating. The account activity looks, at the surface level, like ordinary engagement. The pattern is only visible in aggregate — in the trajectory, not the snapshot.

Online, the funnel typically begins with initial exposure on large platforms with algorithmically amplified content, continues with recruitment into smaller, more specialized communities, and ends with migration to encrypted or fringe platforms where more explicit planning and coordination can occur. Each stage involves behavioral signals (patterns of following, sharing, group-joining, and platform migration) that precede the production of overtly violating content. Because content moderation systems typically evaluate posts in isolation, they are structurally less equipped to detect movement through the funnel than to catch violations at the end of it.

This is a structural limitation, not a calibration problem. A system designed to evaluate individual pieces of content cannot, by definition, detect a pattern of behavior. It sees frames; it cannot see the entire movie.

Actor-level signals and what they reveal

The alternative framing — and the one that a growing number of practitioners are advocating for — centers on behavioral signals rather than content signals. Actor-level analysis focuses on how users behave on platforms rather than on the specific content they post: by monitoring a user's posting history and patterns of interaction, Trust & Safety teams can identify users who may repeatedly engage in harmful behavior and act before escalation. The unit of analysis shifts from the post to the account, and from the moment to the trajectory.

What does behavioral analysis actually look at? The signals are varied: the velocity of account activity, the structure of an account's social graph, the sequence of communities joined, the pattern of whom a user contacts and how frequently, the migration of a conversation from public to private channels. None of these signals is, individually, dispositive. When appraised holistically alongside other risk indicators of a user's conduct, behavioral signals enable an evaluation of whether a user is engaging in harmful activity — and this kind of continuous detection is the only scalable approach for threats like child grooming, extortion, or coordinated manipulation.
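
One way to picture "appraised holistically" is a weighted combination of signals, where no single input can cross the review threshold alone. The sketch below is a deliberately simple illustration, not a description of any platform's system: the signal names, the weights, and the threshold are all assumptions made up for the example, and each signal is assumed to be normalized to the range 0 to 1 by some upstream process.

```python
from dataclasses import dataclass


@dataclass
class AccountSignals:
    # Hypothetical signals, each assumed to be normalized to [0, 1].
    activity_velocity: float = 0.0
    graph_anomaly: float = 0.0
    community_drift: float = 0.0
    contact_concentration: float = 0.0
    private_migration: float = 0.0


# Illustrative weights only; real weights would be learned or tuned.
WEIGHTS = {
    "activity_velocity": 1.0,
    "graph_anomaly": 1.0,
    "community_drift": 1.5,
    "contact_concentration": 1.5,
    "private_migration": 2.0,
}

REVIEW_THRESHOLD = 0.5  # assumed cut-off for human review


def risk_score(signals: AccountSignals) -> float:
    """Weighted average of all signals, in the range [0, 1]."""
    weighted = sum(w * getattr(signals, name) for name, w in WEIGHTS.items())
    return weighted / sum(WEIGHTS.values())


# One extreme signal alone stays well under the threshold.
busy_but_ordinary = AccountSignals(activity_velocity=1.0)
# Several moderate-to-high signals together cross it.
converging = AccountSignals(0.8, 0.6, 0.9, 0.9, 1.0)

for label, account in [("busy", busy_but_ordinary), ("converging", converging)]:
    score = risk_score(account)
    decision = "review" if score >= REVIEW_THRESHOLD else "no action"
    print(label, round(score, 2), decision)
```

The design point is that the score only becomes meaningful when several indicators converge, which mirrors the observation that no individual signal is dispositive.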

This is a harder problem than content moderation in almost every respect. Content can be evaluated in isolation; behavior requires context and history. Content can be assessed by a classifier with no memory; behavior analysis requires maintaining models of individual accounts over time. Content decisions are usually treated as binary: a post either violates a policy or it does not. Behavioral risk exists on a spectrum and changes continuously.
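
The difference between a memoryless classifier and a per-account model can be sketched with a running score that decays over time. This is a minimal illustration under stated assumptions: the class name, the 14-day half-life, and the event weights are hypothetical, and events are assumed to arrive in chronological order.

```python
import math


class AccountRiskTracker:
    """Keeps a per-account risk score that fades unless reinforced."""

    def __init__(self, half_life_days: float = 14.0):
        # Assumed half-life: an unreinforced score halves every 14 days.
        self.decay_rate = math.log(2) / half_life_days
        self.score = 0.0
        self.last_day = None

    def observe(self, day: float, event_weight: float) -> float:
        """Decay the stored score to `day`, then add the new event."""
        if self.last_day is not None:
            elapsed = day - self.last_day  # assumes non-decreasing days
            self.score *= math.exp(-self.decay_rate * elapsed)
        self.score += event_weight
        self.last_day = day
        return self.score


tracker = AccountRiskTracker()
# Three low-weight events in one week: none is alarming alone,
# but the stored score keeps rising because little time has passed.
for day, weight in [(0, 0.4), (3, 0.4), (6, 0.4)]:
    print(day, round(tracker.observe(day, weight), 2))
```

A stateless classifier would score each of those three events at 0.4 and move on. The tracker carries history forward, so the score reflects a trajectory and changes continuously rather than flipping between two states.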

The organizational inertia problem

Beyond the technical challenges, there is an organizational one. A consistent finding among Trust & Safety professionals is that leadership tends to be reactive rather than proactive — platforms invest in safety tools when things are already on fire, not before, treating Trust & Safety as a cure rather than a preventative measure. This creates an environment where the ROI of behavioral detection systems — which prevent harms that never fully materialize — is difficult to demonstrate and therefore difficult to fund.

Content moderation, by contrast, produces legible outputs. Removal counts. Action rates. Quarterly transparency reports. These numbers communicate something concrete to regulators, press, and users: here is the scale of the problem, and here is the scale of our response. Behavioral disruption is harder to demonstrate in metrics. You cannot publish a statistic for the radicalization funnel you disrupted before it reached its terminal stage, because the counterfactual is invisible.

"Monitoring content-level signals does not address recurrent problematic behavior by users, which could be used to pre-emptively address issues. Power users are not going to be put off by sustained efforts to moderate their content — they are just going to double down, and constantly seek out new ways to exploit the system."
https://www.trustlab.com/post/content-vs-actor-looking-into-online-safety-signals

This matters for how platforms allocate resources. If the metrics that get reported are content metrics, the investments that get made are content investments. The behavioral layer — the one that might catch the threat earlier — stays underfunded and underdeveloped.

Toward a more complete model

None of this is an argument against content moderation. I think it remains foundational. A platform that removes harmful content poorly cannot compensate by being sophisticated about behavior. The argument is for addition, not substitution: treating content as one signal among many, and building the organizational and technical infrastructure to read the patterns those signals compose.

That means investing in graph-based analysis that can detect coordinated behavior across accounts. It means maintaining behavioral histories long enough to identify trajectory rather than just state. It means building escalation pathways that can act on probabilistic risk, not just confirmed violations, which in turn requires policy frameworks sophisticated enough to authorize intervention before a bright line has been crossed. It means being willing to remove content that violates no policy when viewed in isolation but that, reviewed as part of a pattern, clearly contributes to harm.
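
As a rough illustration of what graph-based coordination detection can look like, the sketch below links accounts that share the same URL within a short time window and then reports connected groups. The event format, the 60-second window, and the minimum cluster size are assumptions chosen for the example; real systems use many more edge types and guard against false positives such as breaking news being shared by unrelated users.

```python
from collections import defaultdict
from itertools import combinations


def coordinated_clusters(events, window_seconds=60, min_size=3):
    """Group accounts that post the same URL within a short window.

    events: iterable of (account_id, url, unix_timestamp) tuples.
    Returns a list of sets of account ids.
    """
    by_url = defaultdict(list)
    for account, url, ts in events:
        by_url[url].append((ts, account))

    # Build an undirected graph: an edge means two distinct accounts
    # shared one URL within the window.
    neighbors = defaultdict(set)
    for posts in by_url.values():
        for (ts_a, a), (ts_b, b) in combinations(posts, 2):
            if a != b and abs(ts_a - ts_b) <= window_seconds:
                neighbors[a].add(b)
                neighbors[b].add(a)

    # Collect connected components with a depth-first traversal.
    clusters, seen = [], set()
    for start in list(neighbors):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        if len(component) >= min_size:
            clusters.append(component)
    return clusters


sample_events = [
    ("a", "example.test/x", 0),
    ("b", "example.test/x", 10),
    ("c", "example.test/x", 20),
    ("d", "example.test/x", 5000),  # same URL, much later
    ("e", "example.test/y", 0),     # different URL
]
print(coordinated_clusters(sample_events))
```

None of the individual posts in the sample would be flaggable as content. The cluster of a, b, and c only appears once the accounts are examined together.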

It also means changing what gets measured and reported. Transparency reports that surface only removal counts tell an incomplete story about a platform's safety posture. The more complete question is: how early in the harm lifecycle is the platform intervening? Content-level action happens near the end. Behavioral disruption can happen much closer to the beginning.
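
One hedged way to make "how early" reportable is to measure, for cases that were eventually actioned, the gap between the first behavioral signal and the first enforcement action. The function name and the sample cases below are hypothetical, and in practice the date of the first signal can usually only be established in retrospect.

```python
from statistics import median


def median_detection_lag_days(cases):
    """cases: iterable of (first_signal_day, first_action_day) pairs.

    Returns the median number of days between the earliest behavioral
    signal and the first enforcement action.
    """
    lags = [action_day - signal_day for signal_day, action_day in cases]
    return median(lags)


# Hypothetical cases: (day of first signal, day of first action).
sample_cases = [(0, 30), (5, 10), (2, 12)]
print(median_detection_lag_days(sample_cases))
```

A shrinking lag over successive reporting periods would suggest intervention is moving earlier in the lifecycle, though it says nothing about harms that were never detected at all.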

The platforms that will be most resilient to the next wave of coordinated abuse, radicalization, and manipulation are probably not the ones with the fastest content classifiers. They are the ones that have learned to read behavior as a pattern — to see what the content, taken alone, cannot show them.