Version: 25.06

Detection Rules Architecture

This overview explains how Content Identification detection rules are structured and how the different components work together to identify and classify sensitive content. Understanding this architecture is essential for creating effective content detection systems.

Component Overview

Content Identification uses a four-layer hierarchy of components that work together to detect and classify content.

Architecture Layers

Layer 1: Rule Packs

Rule Packs are the top-level containers that organize and package classification rules for deployment.

Key Characteristics:

XML files containing one or more classification rules
Must have a unique ID and version information
Include metadata for management and deployment
Can contain shared resources used by multiple rules
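
The characteristics above can be illustrated with a small loader. This is a minimal sketch, not the product's schema: the element and attribute names (RulePack, Metadata, SharedResources, Rules, Rule) and the load_rule_pack helper are assumptions made up for illustration. It only shows the idea of one XML file carrying an ID, a version, metadata, shared resources, and several rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule pack layout; element and attribute names are
# illustrative assumptions, not the product's actual schema.
RULE_PACK_XML = """
<RulePack id="pack-001" version="1.0.0">
  <Metadata>
    <Name>Example Financial Pack</Name>
  </Metadata>
  <SharedResources>
    <Keywords id="kw-finance">account,routing</Keywords>
  </SharedResources>
  <Rules>
    <Rule id="rule-ssn" type="Entity" />
    <Rule id="rule-card" type="Entity" />
  </Rules>
</RulePack>
"""

def load_rule_pack(xml_text: str) -> dict:
    """Parse the hypothetical rule pack and enforce the ID/version requirement."""
    root = ET.fromstring(xml_text.strip())
    pack_id = root.get("id")
    version = root.get("version")
    if not pack_id or not version:
        raise ValueError("rule pack needs both an id and a version")
    return {
        "id": pack_id,
        "version": version,
        "name": root.findtext("Metadata/Name"),
        # Shared resources live once in the pack and can be referenced by any rule.
        "shared": [el.get("id") for el in root.findall("SharedResources/*")],
        "rules": [el.get("id") for el in root.findall("Rules/Rule")],
    }

pack = load_rule_pack(RULE_PACK_XML)
print(pack["id"], pack["version"], pack["rules"])
```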

Layer 2: Classification Rules

Classification Rules define the logic for identifying specific types of sensitive content.

Rule Types:

Entity Rules: Detect specific data types, such as Social Security numbers (SSNs) and credit card numbers
Evidence Rules: Look for supporting evidence
Proximity Rules: Check for related content nearby
Affinity Rules: Detect content relationships
Similarity Rules: Find similar content patterns
Pattern Rules: Match specific text patterns
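
To make two of these rule types concrete, the sketch below pairs an entity-style pattern with a proximity check. It is only an illustration under stated assumptions: the SSN-like regular expression, the keyword list, the 30-character window, and the function name entity_matches_with_proximity are all hypothetical and not taken from the product.

```python
import re

# Illustrative entity pattern and evidence keywords (assumptions).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EVIDENCE_KEYWORDS = ("ssn", "social security")

def entity_matches_with_proximity(text: str, window: int = 30) -> list[tuple[str, bool]]:
    """Return each entity match plus whether an evidence keyword is nearby."""
    lowered = text.lower()
    results = []
    for m in SSN_PATTERN.finditer(text):
        # Look a fixed number of characters to either side of the match.
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        nearby = lowered[start:end]
        has_evidence = any(keyword in nearby for keyword in EVIDENCE_KEYWORDS)
        results.append((m.group(0), has_evidence))
    return results

sample = "Employee SSN: 123-45-6789." + " " * 60 + "Order ref 987-65-4321"
hits = entity_matches_with_proximity(sample)
print(hits)
```

In this sketch the second number has the right shape but no supporting keyword within the window, which is the kind of case a proximity check is meant to separate from a well-supported match.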

Layer 3: Policy Elements

Policy Elements evaluate metadata and structural properties of content.

Layer 4: Matching Elements

Matching Elements analyze the actual content for sensitive data patterns.
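
The split between Layer 3 and Layer 4 can be sketched as a cheap metadata gate followed by a content scan. The Document class, the allowed extensions, the size limit, and the 16-digit pattern below are hypothetical values chosen for illustration, not product defaults.

```python
import re
from dataclasses import dataclass

@dataclass
class Document:
    name: str
    size_bytes: int
    text: str

# Hypothetical thresholds chosen for illustration only.
ALLOWED_EXTENSIONS = (".txt", ".csv")
MAX_BYTES = 1_000_000
CARD_LIKE = re.compile(r"\b\d{16}\b")

def policy_allows(doc: Document) -> bool:
    """Policy element: look only at metadata, never at the content."""
    return doc.name.lower().endswith(ALLOWED_EXTENSIONS) and doc.size_bytes <= MAX_BYTES

def matching_hits(doc: Document) -> list[str]:
    """Matching element: scan the content itself for a 16-digit run."""
    return CARD_LIKE.findall(doc.text)

def evaluate(doc: Document) -> list[str]:
    # Content is scanned only when the cheaper metadata check passes.
    return matching_hits(doc) if policy_allows(doc) else []

notes = Document("notes.txt", 48, "card 4111111111111111 on file")
image = Document("scan.png", 48, "card 4111111111111111 on file")
print(evaluate(notes), evaluate(image))
```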

Data Flow Architecture

The detection process follows a structured data flow.

Evaluation Context

The evaluation context provides the data environment that rules operate within.

Rule Execution Model

Rules are executed using a multi-phase approach:

Phase 1: Pre-filtering

Quick metadata checks
File type validation
Size and format constraints

Phase 2: Content Analysis

Text extraction and normalization
Pattern recognition
Data structure analysis

Phase 3: Rule Evaluation

Policy element evaluation
Matching element processing
Logical condition resolution

Phase 4: Post-processing

Confidence calculation
Action determination
Result aggregation
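
The four phases can be sketched as a small pipeline. This is an assumption-heavy illustration rather than the engine's implementation: the function names, the single SSN-like pattern, the file-type and size limits, and the scoring formula are all invented here to show how the phases hand results to one another.

```python
import re

# One illustrative pattern; a real rule pack would supply many.
PATTERNS = {"ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b")}

def prefilter(name: str, size_bytes: int) -> bool:
    # Phase 1: cheap metadata checks before any content is read.
    return name.lower().endswith(".txt") and size_bytes <= 10_000_000

def analyze(raw: bytes) -> str:
    # Phase 2: extract text and normalize whitespace.
    text = raw.decode("utf-8", errors="replace")
    return " ".join(text.split())

def evaluate_rules(text: str) -> dict[str, list[str]]:
    # Phase 3: run each pattern and keep only the rules that matched.
    found = {name: pattern.findall(text) for name, pattern in PATTERNS.items()}
    return {name: hits for name, hits in found.items() if hits}

def postprocess(hits: dict[str, list[str]]) -> dict:
    # Phase 4: made-up scoring (0.5 per hit, capped at 1.0) and action choice.
    total = sum(len(values) for values in hits.values())
    confidence = min(1.0, 0.5 * total)
    action = "flag" if confidence >= 0.5 else "none"
    return {"hits": hits, "confidence": confidence, "action": action}

def run_pipeline(name: str, raw: bytes) -> dict:
    if not prefilter(name, len(raw)):
        return {"hits": {}, "confidence": 0.0, "action": "skipped"}
    return postprocess(evaluate_rules(analyze(raw)))

report = run_pipeline("hr.txt", b"SSN:\n  123-45-6789\n")
print(report)
```

Keeping each phase as its own function means a file rejected in Phase 1 never pays the cost of text extraction or pattern matching.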

Integration Points

The detection system integrates with various platform components.

Performance Considerations

Optimization Strategies

Rule Ordering: Most selective rules first
Early Termination: Stop on definitive matches
Caching: Reuse analysis results
Parallel Processing: Concurrent rule evaluation
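
Three of these strategies (rule ordering, early termination, and caching) fit in a short sketch. The rule tuples, the selectivity scores, and the "definitive" flag are hypothetical fields introduced for illustration. Parallel processing is left out to keep the example small.

```python
import re
from functools import lru_cache

# (name, pattern, selectivity, definitive) -- all values are illustrative.
RULES = [
    ("generic-number", r"\d+", 0.1, False),
    ("ssn", r"\b\d{3}-\d{2}-\d{4}\b", 0.9, True),
]

@lru_cache(maxsize=128)
def normalized(text: str) -> str:
    # Caching: repeated evaluations of the same text reuse this result.
    return " ".join(text.split())

def evaluate(text: str) -> list[str]:
    clean = normalized(text)
    fired = []
    # Rule ordering: try the most selective rules first.
    for name, pattern, _selectivity, definitive in sorted(
        RULES, key=lambda rule: rule[2], reverse=True
    ):
        if re.search(pattern, clean):
            fired.append(name)
            if definitive:
                # Early termination: a definitive match ends evaluation.
                break
    return fired

print(evaluate("id 123-45-6789"))
print(evaluate("room 42"))
```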

Scalability Factors

Rule Complexity: Simpler rules perform better
Content Size: Larger files require more processing
Pattern Density: More patterns increase overhead
Context Depth: Deeper analysis impacts performance

Best Practices

Rule Design

Start with broad patterns, then refine for precision
Use policy elements to filter before content analysis
Combine multiple weak indicators for stronger detection
Test rules with representative content samples
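
One way to combine several weak indicators is a noisy-OR calculation, sketched below. This is a common general technique and an assumption here, not a description of how Content Identification computes confidence. The scores are made-up values between 0 and 1, and the formula treats the indicators as independent.

```python
import math

def combine_indicators(scores: list[float]) -> float:
    """Noisy-OR: the chance that at least one independent indicator is right."""
    return 1.0 - math.prod(1.0 - score for score in scores)

# Three individually weak signals, for example a pattern match,
# a nearby keyword, and a suggestive file name (illustrative values).
combined = combine_indicators([0.4, 0.5, 0.3])
print(round(combined, 2))
```

No single indicator here clears 0.5, but together they give a combined score of about 0.79, which is the effect the guideline describes.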

Architecture Planning

Group related rules in logical rule packs
Design for maintainability and updates
Consider localization and regional requirements
Plan for rule versioning and deployment

Performance Optimization

Profile rule performance regularly
Monitor false positive and false negative rates
Optimize frequently-used patterns
Balance accuracy with processing speed
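
Profiling and error-rate monitoring can be sketched with the standard library alone. The helper names error_rates and timed and the labeled sample below are hypothetical. In practice the predictions would come from running rules against a test corpus with known labels.

```python
import time

def error_rates(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) for labeled samples."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)
    fn = sum(not p and a for p, a in pairs)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr

def timed(fn, *args):
    """Run fn and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Made-up rule output versus ground truth for five sample documents.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
(fpr, fnr), elapsed = timed(error_rates, predicted, actual)
print(f"FPR={fpr:.2f} FNR={fnr:.2f} elapsed={elapsed:.6f}s")
```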