Case Study — 2023–2025

CAIR-CA, 2024: The 6.4× Measurement Gap and What It Reveals About How Discrimination Is Counted

An empirical reading of the divergence between community-collected and law-enforcement-collected civil rights data — and what the California-specific case tells us about the dark figure problem in marginalised-community discrimination measurement.

Reference Case Study · 11 min read · 1 primary source

154

Anti-Muslim bias events documented by CAIR-California in its 2024 reporting (complaint-intake methodology, community-collected)

24

Anti-Islamic bias events recorded by the California DOJ in the 2024 Hate Crime in California report (law-enforcement-reported methodology)

6.4×

Ratio of community-collected count to official-channel count for the same underlying phenomenon in the same jurisdiction in the same year

1995

First year of the California DOJ CJSC hate-crime reporting series — three decades of comparable official data, with the underreporting problem documented in the report's own methodology notes

01 The empirical gap and what makes it readable

The starting point is the side-by-side comparison. For 2024, CAIR-California documented 154 anti-Muslim bias events in the state. The California Department of Justice's Hate Crime in California 2024 report recorded 24 anti-Islamic bias events in the same jurisdiction for the same year. The ratio is roughly 6.4× (154 ÷ 24 ≈ 6.42). The substantive analytical move is to refuse to reconcile the two numbers into agreement and instead to read the gap itself as the evidentiary object of interest.

The reason the gap is readable rather than confusing is that both data systems are well-documented at the methodology level. The California DOJ's hate-crime reporting series goes back to 1995 and includes explicit methodology notes; the report itself acknowledges, in language operators should sit with, that 'hate crime data has generally been underreported and the California Department of Justice recognizes that the data presented in its reports may not adequately reflect the actual number of hate crimes occurring in the state.' That acknowledgment is the official-side concession that the 24 number is a floor, not a ceiling. CAIR-CA's intake methodology is documented in its own legal reports — what an event has to look like to enter the dataset, how categories are defined, what counts as a single event versus multiple. The two methodologies are not measuring the same thing; the difference between what they measure is precisely what makes the gap interpretable.

The cleanest framing of the difference: the official-channel dataset measures incidents that were reported to law enforcement, classified by law enforcement as bias-motivated, and met the definitional thresholds of the California penal code's hate-crime provisions. The community-collected dataset measures incidents that the affected community member chose to report to a community-trusted intermediary, met the intermediary's intake threshold, and were documented in a categorisation schema designed to capture the community's experience rather than the criminal-statute definitional structure. The two datasets are partial pictures of overlapping but non-identical underlying populations. Treating them as competing estimates of a single quantity misreads what each is measuring.

02 The dark figure problem and the academic literature it sits inside

The 6.4× gap is not a novel finding in the methodological sense; it is a clean local instance of a well-established pattern in the academic literature on the dark figure problem in crime reporting. The dark-figure literature, going back to early survey-versus-official-statistics comparisons in the 1960s and 1970s and continuing through the National Crime Victimization Survey's parallel-track reporting alongside FBI Uniform Crime Reports, establishes that for any crime category where reporting decisions are influenced by the victim's relationship to law enforcement, the official-channel count systematically understates incidence — and that the magnitude of the understatement varies predictably with the strength of the relationship between the affected community and law enforcement.

For categories like sexual assault, domestic violence, and bias crimes against marginalised communities, the literature consistently finds that official-channel counts capture roughly one-fifth to one-third of community-survey-estimated incidence, depending on the specific community and the specific crime category. The CAIR-CA-versus-California-AG ratio of 6.4× implies that the official count is about 16% of the community count, which places it slightly beyond the more pronounced end of that range. That is consistent with the published literature on bias crimes against communities whose historical relationship with federal and local law enforcement includes surveillance, watchlist programs, and post-9/11 enforcement actions. The Supreme Court's unanimous 2024 opinion in FBI v. Fikre is the federal-side legal record of one specific dimension of that historical relationship. The case arose from a challenge, including procedural due process claims, to a no-fly-list placement, and the Court's ruling was itself procedural: it held that the government had not shown that removing the plaintiff from the list mooted his suit. Because the ruling does not require any party to take a position on the underlying geopolitical questions, it is pedagogically clean as an anchor.
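As a quick arithmetic check on where the 6.4× ratio falls relative to the one-fifth-to-one-third band, the sketch below converts the ratio into a capture share. It assumes, as a simplification, that the community-collected count can stand in for the survey-estimated incidence the literature uses; the two are not the same measurement. The function and constant names are illustrative, not taken from either report.

```python
from fractions import Fraction

# Counts quoted in this case study for California, 2024.
COMMUNITY_COUNT = 154  # CAIR-CA, complaint-intake methodology
OFFICIAL_COUNT = 24    # California DOJ, law-enforcement-reported methodology

# Band quoted in the text: official channels capture roughly one-fifth
# to one-third of survey-estimated incidence.
BAND_LOW, BAND_HIGH = Fraction(1, 5), Fraction(1, 3)


def gap_ratio(community: int, official: int) -> float:
    """Community-collected count divided by official-channel count."""
    return community / official


def capture_share(community: int, official: int) -> Fraction:
    """Share of the community count that the official channel records.

    Assumption: the community count is used as a stand-in for incidence.
    """
    return Fraction(official, community)


ratio = gap_ratio(COMMUNITY_COUNT, OFFICIAL_COUNT)
share = capture_share(COMMUNITY_COUNT, OFFICIAL_COUNT)
in_band = BAND_LOW <= share <= BAND_HIGH

print(f"gap ratio: {ratio:.1f}x")            # gap ratio: 6.4x
print(f"capture share: {float(share):.1%}")  # capture share: 15.6%
print(f"inside quoted band: {in_band}")      # inside quoted band: False
```

Fractions are used for the band comparison so that the boundary test is exact rather than subject to floating-point rounding.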

The point worth pulling out of the academic literature for operator-grade purposes: the dark-figure gap is not a measurement defect to be eliminated through better data collection on the official-channel side. It is a structural feature of the reporting process that exists regardless of how diligently the official-channel side collects data, because the gap is upstream of data collection — it lives in the decision the affected community member makes about whether to report at all, and to whom. The methodological response is not to replace one dataset with the other, but to treat the two together as triangulating measurements of the same underlying phenomenon, with the gap between them as a proxy variable for the strength of the community-to-law-enforcement relationship. This is, in fact, the standard methodological move in the dark-figure literature, and it is the move CAIR-CA's reporting structure implicitly invites.

03 What the gap does and does not let us conclude

An operator-grade read on the 6.4× gap has to be honest about the inferential limits, because the gap is suggestive on more dimensions than it is conclusive on, and overclaiming on the suggestive dimensions is how methodologically careful work loses credibility.

What the gap does let us conclude. First: the official-channel count is, by the California AG's own methodological acknowledgment, a floor. The actual incidence of anti-Muslim bias events in California in 2024 is at least 24 and is plausibly several times higher; CAIR-CA's 154 is one defensible point estimate of where the true figure sits, sourced from a methodology with its own well-documented limitations but designed specifically to capture the population the official-channel methodology systematically misses. Second: the ratio of community-collected to official counts is informative as a relative measure. Comparing the gap across years, across crime categories, or across communities is more methodologically defensible than comparing absolute counts from either dataset in isolation. Third: the gap's persistence across reporting cycles is, on its own, evidence that the underreporting is structural rather than transient, which is the conclusion the dark-figure literature would predict.

What the gap does not let us conclude. First: the absolute true incidence of the underlying phenomenon. Both datasets are partial and neither is a full census. The true figure is at least the official count and may well exceed CAIR-CA's count, but the upper bound is not estimable from these two sources alone. Second: claims about how California compares to other states that rest on each state's own official-channel data, because cross-state variation in official-channel methodology is itself substantial and the variation in community-collected datasets is even larger. Third: causal claims about which specific factors drive the gap. The gap is consistent with multiple causal stories (community trust in law enforcement, classification practices at the local-agency level, statutory definitional differences, intake-process accessibility), and the available data does not adjudicate between them.

04 Why this matters beyond civil rights data: the generalisable methodological pattern

The CAIR-CA-versus-California-AG comparison is a clean local instance of a measurement pattern that operators encounter across many domains where the underlying phenomenon is systematically under-reported through official channels. Public-health surveillance for stigmatised conditions; consumer-complaint data for industries where regulatory reporting is voluntary; employee-experience data in organisations where formal grievance channels are perceived as career-risky; user-experience data for products where formal support channels are difficult to access — all of these domains exhibit the same structural pattern as the civil-rights case, and the methodological response in each case is similar.

The generalisable lesson is that any dataset on a phenomenon whose reporting is upstream-gated by the affected party's decision to engage with the reporting channel will systematically understate true incidence, and the magnitude of the understatement will be largest for exactly the populations whose experiences are most important to measure. The methodological response is not only to improve the official-channel data collection. That work is necessary but cannot fully close the gap, because the gap is upstream of collection. The response is also to invest in parallel community-collected, victim-survey, or third-party-intermediary data streams that are designed to capture the population the official channel misses; to publish both data streams with their methodologies explicit; and to read them together as triangulating measurements rather than competing estimates.

This is the methodological pattern operators in product, public-policy, employee-experience, and user-research contexts should be ready to apply. The CAIR-CA case is useful as a teaching example precisely because the empirical gap is large enough to be unambiguous, both methodologies are documented well enough to be readable, and the official-side acknowledgment of the underreporting problem is explicit in the official-side report itself. Cleaner instances of the pattern are difficult to find; this is one to study.

05 How to build datasets that survive this critique on either side

If you are running an organisation that produces a dataset on a phenomenon that is systematically under-reported through the dominant existing channels — whether you are CAIR-CA on the community-collection side, the California AG on the official-channel side, or an operator in a non-civil-rights context with the same structural problem — there are five methodological commitments that, in our reading, materially improve the dataset's defensibility under the critique the dark-figure literature provides.

First: publish the methodology in enough detail that an external reader can identify what is in the dataset and what is excluded. Both CAIR-CA's intake methodology and the California AG's reporting methodology meet this bar; many comparable datasets in adjacent domains do not.

Second: publish the denominator alongside the numerator wherever feasible. A count of 154 events is more interpretable when the count of intake-screened matters and the count of total-population-at-risk are also published, because the ratio structure is what makes year-over-year and cross-jurisdiction comparisons defensible.

Third: explicitly acknowledge the limits of what the dataset can support inferentially. The California AG's report does this in its methodology notes; the strongest community-collected datasets do it as well.

Fourth: where a parallel dataset exists, reference it and read the gap rather than implying competition. The strongest version of CAIR-CA's reporting on this gap would explicitly compare its 154 to the AG's 24, frame the gap in dark-figure terms, and identify the joint reading.

Fifth: archive the underlying microdata in a form that allows independent reanalysis. The OpenJustice portal does this for the California AG's data; the strongest community-collected datasets do it as well, with appropriate confidentiality protections.
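To illustrate why the second commitment (publishing denominators) matters, here is a minimal sketch of a count published together with its denominators. Everything in it is hypothetical: the `PublishedCount` type, its field names, and all of the numbers are placeholders chosen for round arithmetic, not figures from CAIR-CA or the California AG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedCount:
    """A count published together with the denominators needed to read it."""
    events: int              # numerator: events that entered the dataset
    intake_screened: int     # matters screened at intake (hypothetical field)
    population_at_risk: int  # estimated population at risk (hypothetical field)

    def events_per_screened(self) -> float:
        """Share of screened matters that were recorded as events."""
        return self.events / self.intake_screened

    def events_per_100k(self) -> float:
        """Events per 100,000 people in the population at risk."""
        return self.events * 100_000 / self.population_at_risk


# Placeholder values only; NOT real figures from any report.
year_a = PublishedCount(events=100, intake_screened=1_000, population_at_risk=1_000_000)
year_b = PublishedCount(events=150, intake_screened=3_000, population_at_risk=1_000_000)

for label, year in (("year A", year_a), ("year B", year_b)):
    print(
        f"{label}: {year.events} events, "
        f"{year.events_per_screened():.1%} of screened matters, "
        f"{year.events_per_100k():.1f} per 100k"
    )
```

In this made-up example the raw count rises by half between the two years while the share of screened matters recorded as events halves, so the count alone and the ratio tell different stories. That is the sense in which the ratio structure carries the comparison.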

These five commitments are domain-general. They are the operating standards of methodologically defensible measurement work in any context where the underlying phenomenon is partially observable through multiple imperfect channels — which is, on a careful reading, most contexts operators actually work in.

06 What we tell operators reading this case alongside the operational frame

The companion case (cair-ca-2024-capacity-under-demand-shock) reads the same underlying data through the operational lens — what happens to a lean nonprofit when its service demand spikes by orders of magnitude on short notice. The two cases together do something neither does alone: they let the reader see how the operational and methodological dimensions of a civil-rights organisation's work are not separable in practice, even though they are separable as analytical frames.

The operational case shows a CAIR-CA that absorbed a demand shock by scaling Know Your Rights (KYR) tooling, redesigning intake categories, and protecting its measurement infrastructure through the shock window. The methodological case takes that protected measurement infrastructure as given and asks what its outputs let us conclude about the underlying phenomenon being measured. Without the operational story, the methodological output looks like it was inevitable. Reading the two together makes clear that it was, in part, a consequence of the operational decision to protect measurement infrastructure under shock conditions, when the easier short-term choice would have been to let the measurement work compress to free up service-delivery capacity.

This is, in our reading, the structurally useful joint lesson: methodological rigor in the published outputs of a civil-rights or community-condition organisation is not free. It is a function of operational choices the organisation makes in real time about which work to protect under capacity pressure. Operators reading published datasets should treat the existence of the dataset as evidence about the producing organisation's operating culture, not just as a source of facts about the phenomenon being measured. The CAIR-CA case is unusual in that the operational story behind the published methodology is itself documented well enough to be readable; in most adjacent cases it is not, and the methodological consumer is left to infer the operational backstory from the structure of the published outputs.

Three decades of California hate-crime reporting and one specific year of divergence

The timeline situates the 2024 gap in the longer methodological arc of California's official hate-crime reporting series and CAIR-CA's parallel community-collected data. The substantive value of laying it out this way is to show that the gap is neither a 2024 novelty nor a one-year artifact. Its magnitude varies, but its existence is a stable feature of the comparison across reporting cycles, which is part of what makes it methodologically interpretable.

  1. 1995

    California DOJ CJSC begins annual hate-crime reporting

    Establishes the official-channel data series. Methodology notes acknowledge underreporting from the start, framing the published counts as a floor rather than a census.

  2. Late 1990s–2010s

    Steady official-channel reporting; community-collected data streams develop in parallel

    CAIR-CA and analogous community-collected reporting infrastructure mature alongside the official series. The gap is documented across the period at varying magnitudes.

  3. Pre-2023

    Stable gap regime

    The community-collected versus official-channel gap is well-established in California's bias-crime reporting; the 2024 ratio is consistent with the longer-arc pattern, which is part of what makes it methodologically interpretable.

  4. Oct 2023

    Demand shock on the community-collected side

    Documented in the operational frame (companion case): order-of-magnitude break in CAIR-CA's intake baseline. Both the official-channel and community-collected counts begin moving; they do not move proportionally.

  5. 2023 reporting year

    Hate Crime in California 2023 published

    The pre-comparison year for the 2024 figures. Establishes the 2023 baseline against which the 2024 movement on both sides is read.

  6. March 2024

    FBI v. Fikre decided 9-0 by the Supreme Court

    Gorsuch opinion holding that the government had not shown the plaintiff's removal from the no-fly list mooted his suit, which included procedural due process claims. Federal-side legal record of one specific dimension of the historical relationship between the affected community and federal law enforcement; relevant to the dark-figure framing without requiring any reader to take positions on underlying geopolitical questions.

  7. 2024 reporting year

    The 154 vs 24 figures

    CAIR-CA's 2024 California-specific count of 154 anti-Muslim bias events; California AG's 2024 count of 24 anti-Islamic bias events. The 6.4× ratio is the central empirical object of this case.

  8. 2024–2025

    Both data series published with their respective methodologies intact

    OpenJustice portal publishes raw 2024 microdata; CAIR's 2025 civil rights report (covering 2024) publishes the community-collected counts with category-level breakdowns. Both methodologies remain externally readable, which is what makes the joint reading possible.

What we tell operators about reading and producing measurement-gap data

We use this case in operator engagements with leadership at organisations that either produce datasets on systematically under-reported phenomena or consume those datasets as inputs to operating decisions. The category is broader than people initially recognise: any product analytics function reading a complaint or support-ticket stream as a proxy for user experience; any HR function reading formal grievance counts as a proxy for organisational climate; any public-health surveillance function reading reported case counts as a proxy for true incidence; any policy-analysis function reading regulatory complaint volumes as a proxy for industry behavior — all face structurally identical versions of the dark-figure problem.

The single most-transferable methodological commitment is the one the CAIR-CA-versus-California-AG comparison makes legible: refuse to reconcile divergent counts of the same phenomenon into agreement, and read the gap itself as evidentiary. Most operators, on encountering two data sources that disagree, default to picking one and discarding the other or to averaging them into a single point estimate. Both moves discard the most informationally rich part of the data, which is the structure of the disagreement. The dark-figure literature has spent six decades documenting why the gap is the data, and the operating practice in measurement-careful contexts increasingly reflects that lesson.

The companion case on capacity under demand shock takes the operational frame on the same underlying organisation. The two cases are designed to be read together. The methodological case answers what the data lets us conclude; the operational case answers what the organisation had to do to protect the data infrastructure that produced the conclusions. Neither question is fully separable from the other in practice, and operators in adjacent contexts will face both versions of the problem.

Three methodological questions the case raises — with our answers

These are the questions an operator producing or consuming a measurement-gap dataset should be ready to answer about their own situation. We answer each from the CAIR-CA evidence rather than leaving them open, with the standard 'condition under which we would revise' framing.

01

When two data sources on the same phenomenon disagree by a factor of 6, which one should an operator believe?

Why It Matters

Our reading: neither, in the sense the question implies. The methodologically defensible move is to refuse the framing: both sources are partial measurements of overlapping but non-identical underlying populations, and the disagreement is the data. The official-channel count is, on the California AG's own acknowledgment, a floor; the community-collected count is one defensible point estimate of where the true figure sits, sourced from a methodology designed to capture the population the official channel misses, with its own well-documented limitations on the upper bound. An operator who needs to make a decision against this data should make it on the joint reading: the underlying phenomenon is at least at the official count, plausibly at the community count, and the gap between them is a separate informative variable about the strength of the affected community's relationship to the official-channel reporting infrastructure. The condition under which we would revise: if a third independent measurement source (a victim survey, an academic study, an audit study) converges materially closer to one of the two existing counts, the joint reading would update toward the convergent estimate. We are not aware of such a third source for California anti-Muslim bias events in 2024, which is what leaves the joint reading as the most defensible position.
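The joint reading described above can be sketched as a small data structure that keeps the floor, the community count, and the gap as three separate values instead of collapsing them. The `JointReading` type and `joint_reading` function are illustrative names introduced here, not part of either organisation's published methodology.

```python
from typing import NamedTuple


class JointReading(NamedTuple):
    floor: int            # official-channel count, read as a lower bound
    community_count: int  # community-collected count, one point estimate
    gap_ratio: float      # separate variable: community / official


def joint_reading(official: int, community: int) -> JointReading:
    """Keep both counts and their ratio rather than reconciling them."""
    if official <= 0:
        raise ValueError("official count must be positive to form a ratio")
    return JointReading(official, community, community / official)


reading = joint_reading(official=24, community=154)

# For contrast: averaging the two sources yields one number and discards
# the structure of the disagreement.
naive_average = (reading.floor + reading.community_count) / 2

print(reading.floor, reading.community_count, f"{reading.gap_ratio:.2f}")
print(naive_average)  # 89.0
```

The average of 89 corresponds to neither methodology's output, which is one way to see why the text treats picking or averaging as a loss of information.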

02

How should an operator producing a community-collected dataset position it relative to the official-channel dataset on the same phenomenon?

Why It Matters

Our reading: explicitly, with the gap framed in dark-figure terms, with both methodologies published in enough detail that external readers can perform the joint reading themselves. The strongest version of community-collected reporting in this category does not present itself as a competing estimate to the official-channel data; it presents itself as a complementary measurement designed to capture the population the official channel structurally misses, and it identifies the gap as the joint informative output. CAIR-CA's reports approach this standard but do not fully reach it on the public record; the strongest version of their reporting would explicitly cite the AG's count alongside their own and frame the joint reading. The condition under which we would revise this recommendation: in jurisdictions where the official-channel acknowledgment of underreporting is absent or weak, the community-collected dataset has to do more of the methodological work itself, and the explicit-comparison move becomes correspondingly more important. California's case is somewhat unusual in that the official-side acknowledgment is unusually explicit, which lowers the burden on the community-collected side to make the dark-figure case from scratch.

03

Where else in an operator's typical data environment does this measurement-gap pattern appear, and how should they treat it when it does?

Why It Matters

Our reading: more places than operators initially recognise. Any data stream where the reporting decision is upstream-gated by the affected party's choice to engage with the reporting channel exhibits the same structural pattern. Concrete examples: support-ticket volumes as a proxy for product issues (gated by the user's decision to contact support); formal-grievance counts as a proxy for organisational climate (gated by employee perception of grievance-channel safety); product-review volumes as a proxy for user experience (gated by the user's motivation to write a review); regulatory complaint counts as a proxy for industry behavior (gated by consumer awareness of the complaint mechanism). In each case, the methodologically defensible response is the same as in the CAIR-CA case: invest in a parallel measurement stream designed to capture the population the gated channel misses (proactive surveys, third-party intermediaries, victim-of-defect outreach), publish both streams with methodologies explicit, and read the gap as informative about the gating mechanism rather than as a defect to be eliminated. The condition under which we would revise: if the cost of the parallel measurement stream is so high that it cannot be sustained at meaningful scale, the operating recommendation becomes to apply explicit dark-figure adjustments to the gated-channel data and to acknowledge the resulting estimates as ranges rather than point figures. Pretending the gated channel is a census when it is not is the failure mode the dark-figure literature exists to prevent.
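A minimal sketch of the fallback described above, where a gated-channel count is converted into a range using an assumed capture-rate band. The band used in the example is the one-fifth-to-one-third figure quoted earlier in this case; applying it to any particular dataset is an assumption that would need its own justification, and `dark_figure_range` is a hypothetical helper name.

```python
from fractions import Fraction
from typing import Tuple


def dark_figure_range(
    gated_count: int, capture_low: Fraction, capture_high: Fraction
) -> Tuple[Fraction, Fraction]:
    """Return (low, high) incidence estimates implied by a capture-rate band.

    A higher assumed capture rate implies a lower incidence estimate, so the
    low end of the range comes from capture_high and vice versa.
    """
    if not (0 < capture_low <= capture_high <= 1):
        raise ValueError("capture rates must satisfy 0 < low <= high <= 1")
    return gated_count / capture_high, gated_count / capture_low


# Example: the official 2024 count with the band quoted in this case.
low, high = dark_figure_range(24, Fraction(1, 5), Fraction(1, 3))
print(f"estimated range: {float(low):.0f} to {float(high):.0f}")  # 72 to 120
```

Reporting the result as a range keeps the dependence on the assumed capture band visible. Note that the community count of 154 sits above this range, which reflects the 6.4× ratio lying outside the quoted band.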

Four engagements we run against this thesis.

None of these require a multi-year transformation. Each is scoped to land specific operating-model improvements with a measurable result.

01

Measurement-gap diagnostic for your own data environment

We work with leadership to identify which of your operating-decision-relevant datasets exhibit the structural measurement-gap pattern — gated reporting channels, partial population coverage, systematic exclusion of high-friction reporters. The deliverable is the specific list of datasets where decisions are currently being made on counts that are floors rather than censuses, and where the operating recommendation is a parallel measurement investment rather than a single-channel improvement.

02

Parallel-measurement-stream design

For datasets where the measurement-gap pattern is identified, we help leadership design the parallel measurement stream — survey methodology, third-party-intermediary partnership, audit-study design — that captures the population the existing channel misses. The design includes the methodological commitments (denominator publication, methodology transparency, joint-reading framing) that the dark-figure literature establishes as the standard for defensible work in this category.

03

Joint-reading reporting framework

Once parallel measurement streams exist, the operating challenge becomes how to publish them together in a way that surfaces the joint reading rather than implying competition between sources. We help organisations design the reporting framework — the report structure, the methodology disclosures, the framing of the gap as evidentiary — that makes the joint reading the natural one for external consumers of the data.

04

Cross-domain pattern transfer

The CAIR-CA case is one instance of a measurement pattern that recurs across product analytics, employee experience, public-health surveillance, and policy analysis. We help leadership identify where the same pattern appears in their own context and how the methodological commitments developed in well-documented cases like this one can be transferred without rebuilding the methodological reasoning from scratch.
