Data Provenance

Data Provenance labels use AI to infer who owns a file or object by combining lineage clues with content analysis. In DSPM EA the taxonomy is fixed: Cyberhaven provides the definitions for internal, personal, public, and unknown data. Customers can review how each determination is made, but cannot edit the label logic.

Types of provenance

  • Internal: Company-owned content, identified from lineage and content signals.
  • Personal: Content that belongs to an individual and not the company.
  • Public: Content from public sources.
  • Unknown: Cases where ownership is not clear.

What you can do

  • View provenance assignments for any item to understand whether it is internal, personal, public, or unknown.
  • Open a label to review the underlying AI description used for the decision so you know why the label was applied.
  • Combine provenance with other labels (for example, Data Type or Data Pattern) in Data Sensitivity rules or custom label sets.
  • Browse the label list to see counts for each type, and filter or search to find a specific label quickly.
  • Open a label detail to view recent match counts and unique locations so you know how often, and in how many places, it is applied.
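
To make the idea of combining provenance with other labels more concrete, here is a minimal sketch in Python. It is purely illustrative and does not use any Cyberhaven API: the `Item` record, the `sensitivity` function, the `"source_code"` Data Type value, and the `"high"`, `"review"`, and `"low"` outcomes are all hypothetical names chosen for this example. Only the four provenance types come from the taxonomy above.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    # The four fixed provenance types from the taxonomy.
    INTERNAL = "internal"
    PERSONAL = "personal"
    PUBLIC = "public"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Item:
    # Hypothetical record; the field names are assumptions for illustration.
    location: str
    provenance: Provenance
    data_type: str  # stands in for a Data Type label, e.g. "source_code"


def sensitivity(item: Item) -> str:
    """Hypothetical rule that combines provenance with a Data Type label."""
    if item.provenance is Provenance.INTERNAL and item.data_type == "source_code":
        return "high"
    if item.provenance is Provenance.UNKNOWN:
        # Ownership is unclear, so flag the item for manual review.
        return "review"
    return "low"


items = [
    Item("repo/main.py", Provenance.INTERNAL, "source_code"),
    Item("downloads/license.txt", Provenance.PUBLIC, "document"),
    Item("share/untitled.bin", Provenance.UNKNOWN, "binary"),
]
results = {item.location: sensitivity(item) for item in items}
```

In this sketch, provenance alone never decides the outcome for internal data; it only becomes "high" when paired with a specific Data Type, which mirrors the idea of using provenance as one input among several in a rule.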

Limitations

  • The taxonomy and AI prompts are read-only in EA; tuning and extensibility will follow in later releases.
  • Provenance uses both lineage (where the file came from) and content signals. For best accuracy, make sure sensors are deployed so that lineage is captured.