Cyberhaven Platform Overview

Cyberhaven is a data security platform that protects sensitive data from unauthorized access and transfer. It monitors how users access, use, and move data across endpoints, browsers, and cloud services, then applies policies to stop risky activity.

Who this guide is for

This introduction is for security architects, security engineers and policy authors, security analysts, and insider‑risk or compliance teams who need a high‑level understanding of how Cyberhaven works and how the documentation is organized.

After reading this section, you should be able to explain how Cyberhaven compares to traditional DLP and DSPM tools, describe the main data pipeline (collect → analyze → act), and identify the core objects you configure in the Console.

Cyberhaven at a glance

Cyberhaven addresses gaps in legacy DLP tools by tracking data movement across endpoints, browsers, and SaaS applications. It records how data is created, used, copied, and shared, then builds lineage that shows where the data came from, how it changed, and where it went. Analysts and administrators use this context to enforce policies, detect risky behavior, and investigate incidents.

Core capabilities

Data Detection and Response (DDR): Monitors and controls data in motion and data in use in real time. Policies can block, warn, or monitor activity based on data classification, user, application, and destination.
Data Security Posture Management (DSPM): Scans data at rest in connected repositories to discover, classify, and catalog sensitive content. Results build an inventory of important data assets and where they are stored.
Security for AI: Tracks how users interact with GenAI applications. It shows which AI tools are in use, how much sensitive data flows into and out of them, and where policy gaps exist.
Linea AI (optional add‑on): Uses Cyberhaven lineage to detect anomalous dataflows, even without explicit policies. It scores risk, prioritizes incidents, and generates natural‑language summaries to speed investigations.

How Cyberhaven works

Cyberhaven processes data in three planes that form a single pipeline.

Collection: Endpoint Sensors, Browser Extensions, and Connectors collect telemetry and, when configured, content. This covers user activity on devices, in browsers, and in SaaS or cloud services.
Analysis: Backend services classify data using content inspection, synced labels, and document tags. They store activity in a graph, build lineage across systems, and enrich events with AI‑based risk assessments, such as those from Linea AI.
Action: The policy engine evaluates user actions against configured policies. It decides whether to monitor, warn, or block and creates incidents and summaries in the Console for follow‑up.
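
The three stages above can be pictured as a small function chain. The following is a minimal sketch for illustration only, not Cyberhaven code: the `Activity` type, the function names, the toy "SSN" content rule, and the destination names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical telemetry record; the field names are illustrative."""
    user: str
    action: str
    destination: str
    content: str = ""


def collect(raw_records):
    # Collection: normalize raw telemetry into Activity objects.
    return [Activity(**record) for record in raw_records]


def analyze(activity):
    # Analysis: attach a classification using a toy content rule.
    label = "sensitive" if "SSN" in activity.content else "general"
    return activity, label


def act(activity, label):
    # Action: choose a response from the classification and destination.
    if label == "sensitive" and activity.destination == "personal-webmail":
        return "block"
    if label == "sensitive":
        return "warn"
    return "monitor"


def run_pipeline(raw_records):
    return [act(*analyze(activity)) for activity in collect(raw_records)]


raw = [
    {"user": "a", "action": "upload", "destination": "personal-webmail", "content": "SSN 123"},
    {"user": "b", "action": "print", "destination": "office-printer", "content": "SSN 456"},
    {"user": "c", "action": "upload", "destination": "corp-drive", "content": "lunch menu"},
]
responses = run_pipeline(raw)
```

Each stage only consumes the output of the previous one, which is the property the collect → analyze → act description relies on.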

DDR events and incidents appear in Risks Overview and Incidents, while DSPM scan results appear in Discovery.

Key concepts at a glance

Data states and classification

Cyberhaven classifies data across three states so you can apply consistent policies everywhere.

Data in motion: Data moving between applications, cloud storage, and external devices. Examples include uploads, downloads, email attachments, and transfers to removable media.
Data in use: Data that users are actively working with on endpoints, in browsers, and in SaaS applications. Examples include viewing, editing, printing, and copy‑paste actions.
Data at rest: Data stored on endpoints and in connected cloud repositories. Cyberhaven scans these locations so it can classify files before they move.

Classification combines content signals from Content Identifiers, Exact Data Matching (EDM), and Document Tags with contextual metadata such as user, application, and destination.

Events

An event is a single user or system action involving data. Examples include copy, upload, download, open, share, print, and send.

Each event records metadata such as the user, application, source path, destination, hash, and timestamp. Cyberhaven stores events in a graph backend, which powers lineage, search, policies, and analytics.
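
As a rough mental model, an event can be thought of as an immutable record holding the metadata fields listed above. The sketch below is an assumption for illustration: the `Event` class, its field names, and the sample values are hypothetical and do not reflect Cyberhaven's actual schema.

```python
import hashlib
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; not Cyberhaven's real schema."""
    action: str
    user: str
    application: str
    source_path: str
    destination: str
    content_hash: str
    timestamp: datetime


event = Event(
    action="upload",
    user="alice@example.com",
    application="chrome.exe",
    source_path="C:/Users/alice/Documents/report.xlsx",
    destination="https://drive.example.com/upload",
    # SHA-256 is used here only as an example hash function.
    content_hash=hashlib.sha256(b"example file contents").hexdigest(),
    timestamp=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
)

# A plain dict is convenient for storage or search indexing.
record = asdict(event)
```

Freezing the dataclass reflects the idea that an event describes something that already happened and should not be modified afterwards.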

Dataflows

A dataflow is a chain of events that describes how a specific item of data moves and changes over time. For example: download → rename → compress → upload.

Cyberhaven evaluates policies and calculates risk at the dataflow level. This ensures enforcement and investigations reflect the full journey of the data, not just a single step.
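
The difference between judging a single event and judging the whole dataflow can be sketched as follows. This is an illustrative example under stated assumptions: the `Event` fields, the `crm://` and `personal-cloud://` prefixes, and the `is_risky` rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical event; `order` stands in for a timestamp."""
    order: int
    action: str
    source: str
    destination: str


def build_dataflow(events):
    # A dataflow is the same item's events in chronological order.
    return sorted(events, key=lambda e: e.order)


def describe(flow):
    return " → ".join(e.action for e in flow)


def is_risky(flow):
    # Judge the whole journey: where the data started and where it ended.
    origin = flow[0].source
    final = flow[-1].destination
    return origin.startswith("crm://") and final.startswith("personal-cloud://")


flow = build_dataflow([
    Event(4, "upload", "laptop://tmp/notes.zip", "personal-cloud://drive/notes.zip"),
    Event(1, "download", "crm://exports/customers.csv", "laptop://downloads/customers.csv"),
    Event(3, "compress", "laptop://tmp/notes.csv", "laptop://tmp/notes.zip"),
    Event(2, "rename", "laptop://downloads/customers.csv", "laptop://tmp/notes.csv"),
])
```

Here the final upload on its own looks like an ordinary archive leaving a laptop. Only the full chain shows that the archive originated from a CRM export.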

Data lineage

Data lineage is the end‑to‑end history of a piece of data from its origin to all copies, transformations, and destinations.

Cyberhaven uses a graph database to connect related events and reconstruct this history. The Console presents lineage with detailed event metadata and a legend to help interpret sources, actions, and destinations.
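
Reconstructing lineage from connected events is essentially a graph traversal. The sketch below uses an in-memory adjacency map and breadth-first search as a stand-in. The node names and the `build_graph` and `walk` helpers are assumptions for illustration and say nothing about the graph database Cyberhaven actually uses.

```python
from collections import defaultdict, deque

# Each edge is (source artifact, derived artifact, action); all names are hypothetical.
EDGES = [
    ("crm:customers.csv", "laptop:customers.csv", "download"),
    ("laptop:customers.csv", "laptop:notes.csv", "rename"),
    ("laptop:notes.csv", "laptop:notes.zip", "compress"),
    ("laptop:notes.zip", "personal-cloud:notes.zip", "upload"),
    ("laptop:customers.csv", "usb:customers.csv", "copy"),
]


def build_graph(edges):
    children, parents = defaultdict(list), defaultdict(list)
    for src, dst, _action in edges:
        children[src].append(dst)
        parents[dst].append(src)
    return children, parents


def walk(start, neighbors):
    # Breadth-first traversal that returns every reachable node except `start`.
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


children, parents = build_graph(EDGES)
ancestors = walk("personal-cloud:notes.zip", parents)   # where the data came from
descendants = walk("crm:customers.csv", children)       # where the data went
```

Walking the `parents` map answers "where did this come from?", while walking the `children` map answers "where did every copy end up?".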

Datasets

A dataset classifies data based on its origin or on attributes such as source location, application, file type, content, or labels. Datasets are key inputs to policies and analytics.

You can build datasets from content rules, synced labels (for example, Microsoft Purview sensitivity labels), or other metadata. Policies then reference these datasets to decide which data to protect.
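
One way to think about a dataset is as a named rule that either matches an item of data or does not. The following sketch assumes a hypothetical `Item` type and made-up dataset names and rules. In the product, datasets are configured in the Console rather than written as code.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical description of a piece of data."""
    source_location: str
    file_type: str
    labels: set = field(default_factory=set)


# Each dataset is a name plus a predicate over an item's origin or attributes.
DATASETS = {
    "Source code": lambda i: i.source_location.startswith("git://"),
    "Confidential (synced label)": lambda i: "Confidential" in i.labels,
    "Finance spreadsheets": lambda i: (
        i.source_location.startswith("smb://finance/") and i.file_type == "xlsx"
    ),
}


def matching_datasets(item):
    return [name for name, rule in DATASETS.items() if rule(item)]


budget = Item("smb://finance/2024/budget.xlsx", "xlsx", {"Confidential"})
readme = Item("https://example.com/readme.txt", "txt")
```

An item can belong to several datasets at once, as `budget` does here, and to none at all, as `readme` does.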

Policies

Policies define whether dataflows from specific datasets are allowed, monitored, or blocked, and whether the user is warned.

Cyberhaven supports two main policy types:

Data Protection Policies: Real‑time enforcement policies. These can block or warn on user actions, create incidents, send notifications, and capture screenshots.
Content Inspection Policies: Policies that control when to inspect or capture content for data in motion. They do not warn or block users but expand visibility into sensitive content.
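
A Data Protection Policy can be pictured as a rule that pairs a dataset and a destination with a response. The sketch below is an assumption throughout: the `Policy` fields, the destination prefixes, and in particular the "strictest matching response wins" rule are illustrative choices, not documented Cyberhaven behavior.

```python
from dataclasses import dataclass

# Ordering used to pick the strictest response (an assumption of this sketch).
SEVERITY = {"allow": 0, "monitor": 1, "warn": 2, "block": 3}


@dataclass
class Policy:
    """Hypothetical policy shape."""
    name: str
    dataset: str
    destination_prefix: str
    response: str


def evaluate(policies, dataset, destination):
    matched = [
        p for p in policies
        if p.dataset == dataset and destination.startswith(p.destination_prefix)
    ]
    if not matched:
        return "allow"
    return max(matched, key=lambda p: SEVERITY[p.response]).response


policies = [
    # An empty prefix matches every destination.
    Policy("Monitor source code everywhere", "Source code", "", "monitor"),
    Policy("Block source code to personal cloud", "Source code", "personal-cloud://", "block"),
]
```

With these two policies, source code sent to a corporate repository is only monitored, while the same data sent to a personal cloud destination is blocked.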

Incidents

An incident is a grouped set of risky dataflows. Incidents are created when a policy matches or when Linea AI flags an anomalous flow.

The Incidents page shows incident status, policy response, user response, and a detailed incident flow. Analysts can expand an incident to see lineage, matched content attributes (where available), and any AI‑generated summaries and risk scores.

Linea AI

Linea AI works alongside the policy engine to improve detection and triage.

Linea AI can:

Detect anomalous dataflows based on historical behavior, even without matching policies or datasets.
Assign an AI‑assessed risk level to incidents so teams can focus on the most critical issues.
Generate natural‑language summaries and recommendations to guide next steps.
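
To make "anomalous based on historical behavior" concrete, the sketch below flags a value that sits far above a user's own baseline, using a simple z-score. This is a generic statistical illustration, not a description of how Linea AI works. The function name, the threshold, and the sample numbers are assumptions.

```python
from statistics import mean, pstdev


def is_anomalous(history, today, threshold=3.0):
    """Flag `today` if it is more than `threshold` standard deviations above the baseline."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return today != baseline
    return (today - baseline) / spread > threshold


# Hypothetical daily counts of files a user uploaded over the past week.
upload_history = [10, 12, 9, 11, 10, 13, 8]
```

A check like this needs no policy or dataset definition. It only compares new activity with what has been normal for the same user.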

Linea AI runs inside each customer’s private Google Cloud Platform environment. It does not use customer data to train external models.

Licensing

Cyberhaven is available in Standard, Advanced, and Enterprise editions. All editions include core DLP and insider‑risk capabilities. Advanced and Enterprise add features such as full Linea AI capabilities, Security for AI, and extended analytics.

For details on which capabilities are included in your environment, see Licensing and Platform Features.
