AI Governance · Spoke Read

Data Classification for AI Tools: A 5-Tier Framework for CEOs

34.8% of corporate data entered into AI tools is now sensitive, up from 10.7% two years prior. The classification scheme that turns the risk surface into a single operator decision per document.

By Harrison Painter · May 10, 2026 · Updated May 10, 2026 · 8 min read

Data classification for AI is the practice of sorting your organization's information into sensitivity tiers and specifying which AI tools may process which tiers under which controls. The 5-tier scheme runs Public, Internal, Confidential, Restricted, and Regulated or Prohibited. Tier 1 may enter any AI tool. Tier 5 may enter no consumer AI tool and enters enterprise AI only under a signed Business Associate Agreement with tenant isolation. The classification scheme converts the AI input decision from an employee judgment call into a documented organizational rule. The NIST AI Risk Management Framework GenAI Profile (NIST-AI-600-1) is the regulator-grade standard underneath it. The discipline pays for itself the first time it prevents a single PHI submission to consumer ChatGPT.

The data going into AI tools is getting more sensitive, fast

Cyberhaven Labs analyzed seven million workers across 2025 and found that 34.8% of corporate data employees put into AI tools is sensitive, up from 27.4% one year earlier and 10.7% two years prior. The most common sensitive categories: source code at 18.7%, R&D materials at 17.1%, sales and marketing data at 10.7%. The trend line is accelerating.

The IBM 2025 Cost of a Data Breach Report measured the consequence. 20% of organizations suffered a shadow AI breach in 2025. Those breaches added an average $670,000 to breach costs, took six days longer to detect, and exposed customer PII in 65% of cases and intellectual property in 40%. 97% of AI-breached organizations lacked AI access controls. 63% had no AI governance policy.

Healthcare carries the heaviest exposure for the fourteenth year running: $7.42M average breach cost in 2025, 279-day average resolution, $185 per record. HIPAA requires tracking 100% of patient data access. Only 35% of healthcare organizations can see their AI usage. Every untracked ChatGPT query containing patient information is a federal HIPAA violation regardless of whether external disclosure ever occurs.

What is data classification for AI use

Data classification for AI use is the practice of categorizing information by sensitivity and specifying which AI tools may process which categories under which controls. NIST defines data classification through a three-tier impact-based scheme (low, moderate, high impact) based on potential damage from unauthorized disclosure. Industry practice extends it to four or five tiers. The NIST AI Risk Management Framework GenAI Profile (NIST-AI-600-1, July 2024) instructs organizations to identify PII, PHI, PCI, secrets, and regulated data, and define AI-allowed versus AI-blocked categories before AI touches any data.

The categories that show up most often in regulated industries:

  • PII (Personally Identifiable Information). Any data that can identify a specific individual; direct identifiers (name, SSN, driver's license) and indirect identifiers (date of birth plus ZIP code).
  • PHI (Protected Health Information). A subset of PII regulated under HIPAA; any individually identifiable health information held by a covered entity or business associate.
  • PCI (Payment Card Industry data). Cardholder data regulated under PCI DSS; primary account number, cardholder name, expiration date, service code, sensitive authentication data.
  • GLBA Nonpublic Personal Information (NPI). Personally identifiable financial information collected by a financial institution that is not publicly available.
  • FERPA-protected student records. Education records held by educational agencies and institutions that receive federal education funding.
  • Attorney-client privileged material. Communications between attorney and client made for the purpose of obtaining legal advice.

Classification is the act of stamping each document with the highest applicable category, then routing it through the AI policy that governs that category. The output is not a label. It is a routing decision: this document goes into Claude Enterprise; this one stays out of any AI tool; this one needs legal sign-off first.
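A minimal sketch of that stamping step, assuming a regex-based detector in Python; the patterns, the tier mapping, and the stamp() helper are illustrative placeholders, and a production deployment would delegate detection to a real DLP engine:

    import re

    # Illustrative detectors for a few direct identifiers; real DLP
    # engines use far richer pattern sets and validation.
    CATEGORY_PATTERNS = {
        "PCI": re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # card-number-like digit runs
        "PII": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),    # SSN-like pattern
        "PHI": re.compile(r"\b(?:diagnosis|medical record)\b", re.IGNORECASE),
    }

    # Highest applicable category wins: PHI routes to Tier 5 (absent a BAA),
    # PII and PCI route to Tier 4 under the scheme in this article.
    CATEGORY_TIER = {"PHI": 5, "PCI": 4, "PII": 4}

    def stamp(document: str) -> int:
        """Return the highest applicable tier; default to Tier 2 (Internal)."""
        hits = [CATEGORY_TIER[c] for c, p in CATEGORY_PATTERNS.items() if p.search(document)]
        return max(hits, default=2)

    print(stamp("Patient diagnosis attached; SSN 123-45-6789"))  # -> 5

The design choice that matters is max(): the highest applicable category governs the whole document, so one SSN in a forty-page public report makes the whole report Tier 4.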

The 5-tier data classification scheme for AI

The scheme below is the load-bearing operator deliverable. Each tier specifies what data falls in it, what AI tools are allowed, and what controls are required; a code sketch of the routing table follows the tiers.

Tier 1: PUBLIC. Marketing copy already published. Public press releases. Public website content. Public job descriptions. Public regulatory filings.
Allowed in: any AI tool, including consumer tiers (ChatGPT Free, Claude Free, Gemini consumer, Copilot consumer).
Controls: standard.

Tier 2: INTERNAL. Internal drafts. Internal memos. Meeting agendas without confidential content. Internal training materials. Internal process documentation.
Allowed in: enterprise AI tier with signed Data Processing Agreement (DPA); training-data opt-out enabled; tenant isolation at the organization level. Examples: ChatGPT Enterprise, ChatGPT Team, Microsoft 365 Copilot for Enterprise, Claude Team, Claude Enterprise, Gemini for Workspace Enterprise.
Controls: DPA signed; admin-managed account; audit logging enabled.

Tier 3: CONFIDENTIAL. Customer lists without PII. Contracts without counterparty PII. Sales pipeline data. Financial projections. Board materials. M&A non-public information. Vendor pricing. Internal strategy documents.
Allowed in: enterprise AI tier with signed DPA, tenant isolation, audit logging, AND admin-controlled retention windows.
Controls: as Tier 2 plus a documented data classification stamp on each document; named human approver on first-time use of a new AI workflow with this tier of data.

Tier 4: RESTRICTED. PII (customer or employee). Financial records subject to GLBA. Payment card data subject to PCI DSS. Employee records subject to state privacy laws. Attorney-client communications without privilege waiver. Source code subject to confidentiality obligations. Trade secrets.
Allowed in: enterprise AI tier with signed DPA AND signed BAA where applicable AND tenant isolation AND admin-controlled retention AND access controls scoped to named users.
Controls: Tier 3 plus DLP scanning on input; named compliance approver on each new workflow; documented audit trail tying every input to a specific authorized purpose.

Tier 5: REGULATED OR PROHIBITED. PHI without a signed BAA. Attorney-client privileged material. Classified government data. Export-controlled technical data (ITAR/EAR). Data subject to standing court protective orders. FERPA-protected student records without consent.
Allowed in: zero consumer AI tools under any circumstance; enterprise tier permitted only with signed BAA (PHI), explicit privilege waiver (legal), or specific contractual authorization (regulated). Default posture: AI use prohibited or air-gapped on-premises LLM only.
Controls: hard technical block at the DLP layer; legal sign-off per workflow; audit trail with retention matching the underlying regulation (HIPAA: 6 years).
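A minimal sketch of the five tiers as a routing table, assuming Python; the Route class, the tool names, and the control flags are illustrative stand-ins for the scheme above, not a vendor API:

    from dataclasses import dataclass

    @dataclass
    class Route:
        allowed_tools: set[str]          # empty set = no AI tool permitted
        requires_baa: bool = False
        requires_dlp_scan: bool = False
        requires_approver: bool = False

    ROUTING = {
        1: Route({"any"}),
        2: Route({"ChatGPT Enterprise", "Claude Enterprise",
                  "Microsoft 365 Copilot for Enterprise"}),
        3: Route({"ChatGPT Enterprise", "Claude Enterprise"},
                 requires_approver=True),
        4: Route({"ChatGPT Enterprise", "Claude Enterprise"},
                 requires_dlp_scan=True, requires_approver=True),
        5: Route(set(), requires_baa=True),   # default posture: prohibited
    }

    def route(tier: int) -> Route:
        return ROUTING[tier]

    decision = route(4)
    print(decision.allowed_tools, decision.requires_dlp_scan)

Encoding the scheme as data rather than prose is what makes the Tier 5 posture enforceable: an empty allowed_tools set is a hard block a DLP layer can act on, not a guideline an employee can reinterpret.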


Indiana operators: where the load lands

Indiana sits at the intersection of three regulated-data verticals where AI input discipline is load-bearing.

Healthcare and life sciences. Eli Lilly is headquartered in Indianapolis with a $4.5B 2026 manufacturing investment and is actively hiring AI Governance Specialists, Privacy Counsel, and HIPAA-focused legal staff in Indiana. Roche Diagnostics' North American operations are based in Indianapolis. IU Health, Community Health Network, Eskenazi Health, and Parkview operate as the largest in-state hospital systems. Every one of these organizations is a HIPAA-covered entity with thousands of employees who carry phones with consumer ChatGPT installed. The HIPAA breach math: $185 per record, 279-day average resolution, $7.42M average healthcare breach cost in 2025. Indiana HB 1620 (introduced 2025; available bill trackers show it did not become law) proposed a state-level disclosure requirement for AI use in patient-facing healthcare interactions.

Financial services. OneAmerica Financial Partners (Indianapolis-headquartered mutual holding company), Old National Bank (Evansville-headquartered, Indiana's largest bank), Salin Bank, and Centier Bank operate under GLBA. GLBA's Privacy Rule requires clear notices about information-sharing practices, including AI-driven analytics and automated profiling. Employee disclosure of customer NPI to consumer AI tools is a notification-triggering event.

Manufacturing IP exposure. Cummins (Columbus), Allison Transmission (Indianapolis), Subaru of Indiana (Lafayette), and Toyota Material Handling (Columbus) all hold proprietary process documentation and engineering trade secrets that should never enter consumer AI tools. The Cyberhaven 18.7% source-code finding applies here directly. Once trade-secret material is pasted into a consumer LLM that retains training rights, the data may lose trade-secret status under the Indiana Uniform Trade Secrets Act, which requires the holder to take reasonable efforts to maintain secrecy.

State posture. The Indiana Management Performance Hub operates a three-tier AI risk classification (High, Moderate, Low Risk) anchored to the NIST AI Risk Management Framework, governed by the Office of the Chief Data Officer and Chief Privacy Officer. The state model is the right starting point for public-sector deployments. The 5-tier operator scheme above extends it for the document-level decisions a CEO actually has to make Monday morning.

Where CEOs get this wrong

Five recurring failure patterns. Sequenced from most common to most catastrophic.

One. "We use ChatGPT Enterprise so we are safe." Enterprise tier provides tenant isolation and a Data Processing Agreement, but it does NOT automatically discharge HIPAA, GLBA, FERPA, or PCI DSS obligations. HIPAA in particular requires a signed Business Associate Agreement; OpenAI's enterprise offering does not include a HIPAA BAA by default. Microsoft 365 Copilot for Enterprise is BAA-eligible only when the underlying services are HIPAA-eligible AND the BAA is properly executed. Anthropic offers HIPAA-ready Enterprise plans behind a click-to-accept BAA, but coverage explicitly does NOT extend to Claude Free, Pro, Max, Team, Workbench, Console, Cowork, or Claude for Office. The license tier is the load-bearing detail.

Two. Treating ALL data as restricted. The over-restriction failure. Productivity collapses, employees route around the policy with personal devices, shadow AI proliferates. The Cyberhaven 34.8% sensitive-data-into-AI-tools finding suggests this is already happening at organizations whether the policy permits it or not.

Three. Treating NO data as restricted. The catastrophic failure. Samsung 2023 is the canonical case: within a 20-day window in March 2023, Samsung semiconductor engineers pasted three sets of confidential material into ChatGPT (faulty source code, a defect-detection algorithm, internal meeting recordings transcribed for minute generation). Samsung banned all generative AI tools on company devices within one month. The IBM 2025 data shows 65% of shadow AI breaches expose customer PII at $670K above standard breach cost.

Four. Failing to update classification as AI tools change. The Microsoft Copilot Zombie Data incident in 2025 illustrates the threat: Copilot returned cached snapshots of repository data from periods when the repos were public, even after they had been made private or deleted. A tool that was safe last quarter may not be safe this quarter. Quarterly review of the classification scheme against the AI tool inventory is the operating discipline; a minimal drift check is sketched below.
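A minimal sketch of that drift check, assuming the approved tool list and the tools observed in SSO or network logs are available as Python sets; all tool names are illustrative:

    # Approved list comes from the classification policy; observed list
    # comes from SSO, CASB, or network logs over the review quarter.
    approved = {"ChatGPT Enterprise", "Claude Enterprise",
                "Microsoft 365 Copilot for Enterprise"}
    observed = {"ChatGPT Enterprise", "ChatGPT Free", "Gemini consumer"}

    shadow = observed - approved   # in use but never approved: block or review
    stale = approved - observed    # approved but unused: candidates to drop

    print("Shadow AI to block or review:", sorted(shadow))
    print("Approved but unused:", sorted(stale))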

Five. No incident playbook for accidental disclosure. When an employee pastes PHI into ChatGPT by mistake, there is no rewind. The breach has occurred. The playbook required: immediate revocation of the share link if applicable; HIPAA breach notification analysis (60-day window for breaches affecting 500+ individuals); state breach notification analysis (Indiana's law triggers at any unauthorized acquisition); documentation for OCR or the state attorney general; audit of the underlying control failure.
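A minimal sketch of the notification clock, assuming a known discovery date and affected-individual count in Python; the function and key names are illustrative, and the under-500 branch reflects the HIPAA rule that smaller breaches go into an annual OCR log due 60 days after calendar year end:

    from datetime import date, timedelta

    def notification_deadlines(discovery: date, affected: int) -> dict:
        deadline = discovery + timedelta(days=60)     # 60-day HIPAA window
        out = {"individuals": deadline}
        if affected >= 500:
            out["ocr"] = deadline                     # OCR within 60 days
            out["media"] = deadline                   # media notice, 500+ in a state
        else:
            # Under 500: log with OCR within 60 days of calendar year end
            out["ocr_annual_log"] = date(discovery.year, 12, 31) + timedelta(days=60)
        return out

    print(notification_deadlines(date(2026, 5, 10), affected=750))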

How data classification maps to The 7 Levels of AI Proficiency

Data classification for AI is a Level 4 Commander baseline competency. The design of the classification scheme itself is Level 5 Captain work.

A Level 4 Commander can read a document, classify it against the 5-tier scheme, choose the right AI tool, and apply the right controls. A Level 5 Captain designs the data architecture that makes classification tractable across hundreds of employees and thousands of documents per day: which DLP layer scans inputs, which retention windows govern audit logs, which named approvers sign off on Tier 4 workflows, and how the scheme updates as new AI tools enter the inventory.

Below Level 4, the work defaults to one of two errors: over-restriction (productivity collapses, shadow AI fills the void) or under-restriction (regulated data leaks). Both are observable in the Cyberhaven and IBM data. The 5-tier scheme pulls a team out of the binary error and into a documented per-document decision. Related reading: Level 4: The Commander and Level 5: The Captain.


Frequently asked questions

What data should I not put into ChatGPT?

Never enter Tier 5 data: PHI without a signed BAA, attorney-client privileged material, classified government data, export-controlled technical data, or FERPA-protected student records without consent. For Tier 4 data (PII, financial records, source code subject to confidentiality, trade secrets), use only the enterprise tier with a signed Data Processing Agreement, tenant isolation, and audit logging. The consumer tier of ChatGPT (free or Plus) defaults to using inputs for model training and is appropriate only for Tier 1 (Public) and selected Tier 2 (Internal) data.

Is it safe to use Microsoft Copilot with PHI?

Only Microsoft 365 Copilot for Enterprise is HIPAA BAA-eligible, and only when the underlying services (Exchange Online, SharePoint Online, OneDrive for Business, Microsoft Teams) are HIPAA-eligible and the BAA is properly executed for your tenant. The free consumer Copilot is NOT covered by Microsoft's HIPAA BAA under any circumstance. Even with the BAA in place, organizations remain responsible for risk analysis, workforce training, audit controls, and the substantive HIPAA Security Rule requirements.

What is the difference between ChatGPT and ChatGPT Enterprise for data privacy?

ChatGPT Free and ChatGPT Plus default to using inputs for model training; conversation data is retained per OpenAI's standard data policies; no Data Processing Agreement is provided. ChatGPT Team and ChatGPT Enterprise default to NOT using inputs for training, provide tenant isolation, offer a signed DPA, and support admin-managed retention windows. Enterprise adds SSO, advanced admin controls, and longer context windows. Neither tier includes a HIPAA BAA by default; healthcare-covered entities must consult OpenAI sales for BAA terms.

Do I need a Business Associate Agreement (BAA) for AI tools?

Yes, if your organization is a HIPAA-covered entity (most healthcare providers, health plans, and healthcare clearinghouses) or a business associate (vendors who handle PHI on behalf of covered entities) and the AI tool will process PHI. Without a signed BAA, every PHI submission is a HIPAA violation regardless of whether disclosure occurs. Anthropic offers a click-to-accept BAA on HIPAA-ready Enterprise plans. Microsoft's enterprise tier is BAA-eligible. OpenAI's enterprise BAA is available through sales engagement.

What is data classification for AI use?

Data classification for AI use is the practice of categorizing information by sensitivity (typically Public, Internal, Confidential, Restricted, Regulated/Prohibited) and specifying which AI tools may process which categories under which controls. The NIST AI Risk Management Framework GenAI Profile (NIST-AI-600-1) specifies that organizations should identify personally identifiable information, protected health information, payment card industry data, secrets, and regulated data, and define AI-allowed versus AI-blocked categories before AI touches any data.

How do I train employees on what data is allowed in AI?

Build training around concrete examples drawn from your own document corpus, not generic categories. Show employees three to five real documents from each tier; have them practice classifying new documents against the scheme. Pair the training with a one-page quick-reference card that names the AI tools your organization licenses and the tier each is approved for. Refresh quarterly. Tie the training to onboarding and to any new AI tool rollout. The Level 4 Commander competency in The 7 Levels of AI Proficiency is the relevant capacity benchmark.

What if an employee accidentally puts PHI into ChatGPT?

Treat this as a HIPAA breach event until ruled out. Immediate steps: revoke any share link associated with the conversation; preserve the conversation log; notify the HIPAA Privacy Officer; conduct a four-factor breach risk assessment under 45 CFR 164.402 (nature of PHI, recipient identity, likelihood of acquisition, mitigation taken). If the assessment cannot conclude low probability of compromise, breach notification is required: covered individuals within 60 days, OCR within 60 days for breaches affecting 500+ individuals, media notification for breaches affecting 500+ in a state or jurisdiction.

Does using Claude for Work or Anthropic Enterprise satisfy HIPAA?

Only Anthropic's first-party API and HIPAA-ready Enterprise plan are BAA-eligible. The HIPAA-ready configuration is enabled through organization settings with click-to-accept BAA acceptance. Anthropic explicitly does NOT cover Claude Free, Pro, Max, Team, Workbench, Console, Cowork, or Claude for Office under the BAA. BAAs signed before December 2, 2025 cover only API usage; BAAs signed after that date can cover both API and the HIPAA-ready Enterprise plan. Even with BAA in place, HIPAA's substantive security requirements still apply.

This article is informational only. It is not legal advice. Consult counsel before making compliance decisions on HIPAA, GLBA, FERPA, PCI DSS, the EU AI Act, Indiana HB 1620, or any other regulated-data obligation discussed above. Updated May 10, 2026.

Harrison Painter
AI Business Strategist. Founder, LaunchReady.ai and AI Law Tracker.

Harrison helps teams build AI systems that cut cost and grow revenue. Nearly twenty years of business experience. 2.8M YouTube views. Founder of LaunchReady.ai and the 7 Levels of AI Proficiency framework. Author of You Have Already Been Replaced by AI and The White-Collar Factory is Closing.

