Who's Actually In Your AI Data

Every confident statistic about how AI is reshaping work is partly a statistic about who shows up in the conversation logs.

I read a lot of AI workforce reports.

Most of them open the same way. A number. "Forty-nine percent of jobs already have a quarter of their tasks done by AI." "Knowledge work is being displaced 12x faster than service work." "AI is doing real work in this percentage of occupations."

The numbers come from conversation logs. A researcher gets a dataset from a vendor (usage data from Claude, ChatGPT, or Copilot) and maps the conversations to occupations. From there you get an exposure score. Multiply by employment. Get a headline.

I bought a lot of those numbers. They felt rigorous. Real conversations, real tasks, real work. What's more grounded than that?

Two Northwestern researchers, Michelle Yin and Burhan Ogut, posted a paper last week that pulled the ground out from under me.

The paper is called "Who Uses AI? Platforms, Workforce, and AI Exposure." The argument is one sentence: the exposure scores aren't measuring the workforce, they're measuring the platform's user base.

I gave one of those numbers a starring role myself. Early this year I was helping a CFO at a financial-services firm think through a hiring freeze on their junior research staff. I put up a slide with a stat I had pulled from a workforce report: something close to half of the tasks in analyst-type roles were already AI-exposed.

It landed exactly as I intended. The room nodded. The freeze felt obvious, even responsible. We were reading the territory, or so I thought.

What I had actually put on the slide was a fact about who talks to AI platforms. Analysts are some of the loudest users in any conversation dataset. The stat was real and the inference was backwards, and I did not have the language to see it until I read this paper. I am not proud of that slide.

Here is what shook me about the paper. They took the same outcome, post-ChatGPT employment effects in U.S. occupations, and reran the standard estimation pipeline three times. Once with one platform's conversation logs. Once with another vendor's. Once with the enterprise channel of the same vendor instead of the consumer channel.

Same outcome. Same controls. Same estimator. The only thing they changed was which platform's logs they fed in.

The coefficient changed by a factor of 1.9.

The within-vendor consumer-versus-enterprise comparison disagreed in sign. Same product. Different users. Opposite conclusions about whether AI is destroying or creating jobs in a given occupation.

When they reweighted the data to match Bureau of Labor Statistics workforce shares, what the U.S. labor force actually looks like, the estimates dropped by 42 to 93 percent.

The composition problem is enormous. Computer and Mathematical occupations are 3.4 percent of U.S. employment. They are 32 percent of consumer conversation traffic on the platform they studied. They are 52 percent of enterprise traffic. Food Preparation workers are 8.8 percent of employment. They are 0.7 percent of conversations.
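To make the skew concrete, here is a minimal sketch that divides each occupation's share of conversation traffic by its share of employment, using only the percentages quoted above. The dictionary keys and the `representation_ratio` helper are my own illustrative names, not anything from the paper.

```python
# Shares are percentages quoted in the text above. The names below are
# illustrative assumptions, not identifiers from Yin and Ogut's paper.
EMPLOYMENT_SHARE = {"Computer and Mathematical": 3.4, "Food Preparation": 8.8}
CONSUMER_SHARE = {"Computer and Mathematical": 32.0, "Food Preparation": 0.7}
ENTERPRISE_SHARE = {"Computer and Mathematical": 52.0}


def representation_ratio(traffic_share: float, employment_share: float) -> float:
    """Traffic share divided by employment share; 1.0 means proportional."""
    if employment_share <= 0:
        raise ValueError("employment_share must be positive")
    return traffic_share / employment_share


consumer_ratios = {
    occ: representation_ratio(share, EMPLOYMENT_SHARE[occ])
    for occ, share in CONSUMER_SHARE.items()
}
enterprise_ratios = {
    occ: representation_ratio(share, EMPLOYMENT_SHARE[occ])
    for occ, share in ENTERPRISE_SHARE.items()
}

for occ, ratio in consumer_ratios.items():
    print(f"{occ}: consumer traffic is {ratio:.2f}x its employment share")
for occ, ratio in enterprise_ratios.items():
    print(f"{occ}: enterprise traffic is {ratio:.2f}x its employment share")
```

On these figures, Computer and Mathematical occupations show up at roughly 9 times their employment share in consumer traffic and roughly 15 times in enterprise traffic, while Food Preparation shows up at less than a tenth of its share.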

So when a paper tells you "AI is doing 25 percent of the tasks in this occupation" using platform logs, what it is partly telling you is whether that occupation is the kind of occupation that talks to AI platforms. Engineers are the loudest. Cashiers are silent. The data hears the loud and treats the silent as if they were absent, not as if they were simply not in the room.

I was already skeptical of headline AI statistics. This paper gave me the mechanism.

The pattern is the same thing that was wrong with early COVID-19 testing data. If you only test sick people, the case fatality rate looks catastrophic. The number is real. The number is also wrong. Testing was the filter; the world the data described was the world that walked into the tent.

Platform usage is the new testing tent. Knowledge workers walked in. Service workers didn't. Now we have a body of research extrapolating from the tent to the country.

Three things this means for an executive reading the next workforce-exposure report.

Ask which platform produced the data. The answer is usually one vendor. If you are about to act on a 25 percent number, the 25 percent is a number about that vendor's user base before it is a number about your workforce.

Ask whether the analysis reweighted to BLS occupation shares. If it didn't, you can roughly bracket the bias as two to twenty times too large for over-represented occupations, two to twenty times too small for under-represented ones. Yin and Ogut report partial-identification bounds rather than point estimates, but they are real bounds.

Notice which occupations are conspicuously absent from the discussion. If your sector is retail, healthcare delivery, food service, transportation, skilled trades, or any kind of physical-presence work, your sector is the silent part of the dataset. The reports that say "AI hasn't reached you yet" may be reporting on the dataset, not on you.
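The reweighting question is easier to ask once you have seen the arithmetic. The sketch below is a simple post-stratified average, not the regression pipeline the paper reruns. The employment and consumer-traffic shares for the two named occupations come from the figures quoted earlier, and the "All other" bucket is just the remainder to 100 percent. The exposure scores are hypothetical numbers I made up for illustration, and `weighted_mean` is my own helper name.

```python
# Shares in percent. The two named occupations use figures quoted in this
# post; "All other" is simply the remainder so each column sums to 100.
EMPLOYMENT_SHARE = {
    "Computer and Mathematical": 3.4,
    "Food Preparation": 8.8,
    "All other": 87.8,
}
CONSUMER_SHARE = {
    "Computer and Mathematical": 32.0,
    "Food Preparation": 0.7,
    "All other": 67.3,
}

# HYPOTHETICAL exposure scores, invented purely to show the mechanics.
# They are not estimates from the paper or from any dataset.
HYPOTHETICAL_EXPOSURE = {
    "Computer and Mathematical": 0.60,
    "Food Preparation": 0.05,
    "All other": 0.20,
}


def weighted_mean(scores: dict[str, float], shares: dict[str, float]) -> float:
    """Average the scores, weighting each occupation by its share."""
    total = sum(shares[occ] for occ in scores)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    return sum(scores[occ] * shares[occ] for occ in scores) / total


# What the platform's user base implies versus what the labor force implies.
platform_view = weighted_mean(HYPOTHETICAL_EXPOSURE, CONSUMER_SHARE)
workforce_view = weighted_mean(HYPOTHETICAL_EXPOSURE, EMPLOYMENT_SHARE)

print(f"platform-weighted exposure:   {platform_view:.3f}")
print(f"employment-weighted exposure: {workforce_view:.3f}")
```

With any scores where the over-represented occupation is also the more exposed one, the platform-weighted average comes out higher than the employment-weighted one. A report that shows only the first number has not answered the reweighting question.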

Before you act on any workforce-exposure number, make whoever brought it to you answer three questions out loud: which platform produced the data, whether it was reweighted to match the actual labor force, and whether your own sector is loud or silent in that dataset. If they cannot answer all three, the number is not decision-grade yet. Treat it as a prompt to go measure the work in your own building, not as permission to cut.

The deeper thing I'm taking from this paper is about how new measurement infrastructure gets built. We don't yet have a Bureau of Labor Statistics for AI. The platforms have the data. They publish it. The publications get cited. The citations get reported. The reports turn into board materials. The board materials drive headcount decisions. The headcount decisions are made by executives who think they are reading the territory and are actually reading the platform's map of itself.

This is not a problem about bad researchers. Yin and Ogut are clear that they're working with the same data anyone else has access to. It is a problem about a measurement layer that has not had time to professionalize. We are at least a few years from a credible workforce-exposure index that doesn't have this skew. In the meantime, the loudest voices in the data are not the broadest.

The paper closes with a methodological move I find quietly important. They formalize the bias as non-classical measurement error and derive partial-identification bounds for the true employment elasticities. In English: they don't claim to know the right answer. They claim to know how wrong the wrong answer can be. Their bounds imply that substitution is understated more than augmentation, so the gap between "AI is taking jobs" and "AI is augmenting workers" is even larger than the platform numbers suggest, with substitution more real than the headlines say.

I'll keep reading the workforce papers. I'll be more careful about which slide I put in the deck.

The only number I now fully trust on AI exposure is the one I can back into from the work I've actually watched get done in my own building.

Everything else is provisional until somebody reweights it.
