A knowledge base that gets smarter with every resolved incident.

Standard Oracle support produces a resolved ticket. The Oracle AI platform produces a resolved ticket plus a vectorized knowledge record that makes the next identical incident resolve in seconds.

Why Standard Chunking Fails for Oracle Diagnostics

Retrieval-augmented generation works well for prose documents. Oracle diagnostic outputs are something different. A TKPROF trace file, a SQLHC report, or concurrent manager diagnostic output is structured, verbose, and hierarchical. A single diagnostic run can produce thousands of lines. Important signals — error codes, PASS/WARN/FAIL results, table-level statistics — are embedded in specific positions within a predictable structure.

Simple character-count chunking destroys this structure. An error message spanning two lines gets split across chunks. A WAIT event explaining a performance issue gets separated from the SQL that caused it. The retrieved chunk answers a different question than the one being asked.
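A minimal sketch of the failure mode, using a hypothetical two-line ORA error inside a trace excerpt (the trace text and the 60-character chunk size are illustrative, not platform values):

```python
# Hypothetical trace excerpt with a two-line Oracle error message.
trace = (
    "SELECT * FROM ap_invoices_all WHERE org_id = 204;\n"
    "ORA-01652: unable to extend temp segment by 128\n"
    "in tablespace TEMP\n"
)

def naive_chunks(text, size):
    """Fixed-size character chunking with no overlap."""
    return [text[i:i + size] for i in range(0, len(text), size)]

chunks = naive_chunks(trace, 60)
# The error code and its tablespace context land in different chunks, so no
# single chunk answers "which tablespace failed to extend?"
print(any("ORA-01652" in c and "TEMP" in c for c in chunks))  # False
```

The boundary falls mid-message: one chunk carries `ORA-01652`, the next carries `TEMP`, and neither is retrievable as a complete answer.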

The Oracle AI platform implements Semantic and Structure-Aware Chunking specifically designed for Oracle diagnostic output — preserving the diagnostic hierarchy, respecting Oracle-specific document boundaries, and enriching every stored chunk with metadata that enables precision retrieval rather than pure semantic similarity matching.

Vector Database Selection

The vector database is the platform's long-term memory. Selection criteria for an Oracle ERP environment: self-hosting, metadata filtering support, and scale.

✓ Recommended

Milvus

Open-source, self-hosted, designed for enterprise-scale vector workloads on Kubernetes or Docker. The recommended choice for Oracle environments where data sovereignty is required.

  • Billions of vectors — no practical scale limit for Oracle diagnostic history
  • Advanced metadata filtering — query by Oracle version, module, and error code simultaneously
  • Self-hosted on Docker or Kubernetes — stays behind the corporate firewall
  • No managed service dependency — no data leaves the environment
  • Production-grade performance at diagnostic knowledge base scale

Alternative

pgvector

Postgres extension adding vector search to an existing PostgreSQL instance. Recommended when the knowledge base is under 10 million vectors and operational simplicity matters.

  • Single database for workflow state and vector memory
  • Familiar Postgres tooling for backup and administration
  • Lower operational overhead for smaller deployments
  • Performance degrades above 10 million vectors
  • Less advanced metadata filtering than Milvus

Decision rule: Start with pgvector if the Oracle environment has fewer than 50 active users and diagnostic history is expected to stay under 10 million vectors in 12 months. Migrate to Milvus when that threshold is crossed or when advanced metadata filtering becomes a retrieval accuracy requirement.

Hierarchical Chunking Strategy

Oracle diagnostic outputs are processed through a three-level hierarchical chunking model. Each level serves a distinct retrieval purpose.

Level 1 (Document): Full Diagnostic Session

The complete diagnostic run — every section, every check result, every SQL output — is stored as a single parent record. This is the authoritative record of what the platform examined. It is never retrieved directly for LLM context (too large), but it is the source of truth for audit purposes and the parent reference for all child chunks.

Level 2 (Sections): Oracle-Specific Section Boundaries

Regex-based splitters break diagnostic output at Oracle-specific structural markers: PARSING IN CURSOR, EXEC, STAT, SQL ID markers, and module boundaries. Each section becomes an independent chunk with a reference back to the Level 1 parent. Section-level chunks are the primary retrieval target for module-specific queries.
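A sketch of the Level 2 splitter, assuming a regex over three of the named markers (real TKPROF output has more marker types, and the trace text here is illustrative):

```python
import re

# Split at Oracle trace section markers, keeping each marker with its section.
SECTION_MARKERS = re.compile(
    r"(?=^(?:PARSING IN CURSOR|EXEC|STAT)\b)", re.MULTILINE
)

def split_sections(trace_text, parent_id):
    """Return Level 2 chunks, each referencing its Level 1 parent document."""
    sections = [s for s in SECTION_MARKERS.split(trace_text) if s.strip()]
    return [
        {"parent_id": parent_id, "section_index": i, "text": s}
        for i, s in enumerate(sections)
    ]

trace = (
    "PARSING IN CURSOR #1 len=52 dep=0\n"
    "select * from ap_invoices_all\n"
    "EXEC #1:c=0,e=42\n"
    "STAT #1 id=1 cnt=10 op='TABLE ACCESS FULL'\n"
)
chunks = split_sections(trace, parent_id="diag-2024-001")
print(len(chunks))  # 3 sections: PARSING IN CURSOR, EXEC, STAT
```

The zero-width lookahead keeps each marker attached to the section it opens, and every chunk carries the `parent_id` back-reference described above.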

Level 3 (Semantic): Recursive Semantic Splitting

Within each section, a recursive character splitter produces chunks of 300–500 tokens with a 15% overlap between adjacent chunks. The overlap is non-negotiable: Oracle error messages routinely span two or three lines, and a clean split would separate the error code from its context. The 15% overlap ensures any two-line Oracle message is fully represented in at least one chunk.
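The overlap logic can be sketched at token level, assuming whitespace-style tokens (production code would use the embedding model's own tokenizer):

```python
def overlapping_chunks(tokens, chunk_size=400, overlap_ratio=0.15):
    """Split a token list into fixed-size windows with proportional overlap."""
    step = max(1, int(chunk_size * (1 - overlap_ratio)))  # 340-token stride
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = [f"tok{i}" for i in range(1000)]
chunks = overlapping_chunks(tokens)
# Adjacent chunks share 60 tokens (15% of 400), so a message split at one
# chunk's boundary is still fully contained in its neighbor.
shared = set(chunks[0]) & set(chunks[1])
print(len(chunks), len(shared))  # 3 60
```

With a 400-token window and 340-token stride, any span shorter than the 60-token overlap is guaranteed to appear intact in at least one chunk.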

Context Enrichment — Metadata Schema

Raw vector similarity alone is insufficient for Oracle diagnostic retrieval. An ORA-01652 resolution for EBS R11i is not applicable to an R12.2.10 environment. Every chunk stored in Milvus is enriched with a structured metadata envelope that enables filtered retrieval.

Field | Example Value | Purpose in Retrieval
oracle_error_code | ORA-01652 | Primary filter — retrieval is always scoped to the exact error code before semantic similarity ranking. Prevents cross-error false matches entirely.
erp_module | AP | Module scoping — ensures a GL period close resolution is not returned for an AP invoice hold query.
erp_version | EBS R12.2.10 | Version filtering — Oracle data structures and fix paths differ significantly across EBS versions and between EBS and Fusion.
source_script | SYS-03 | Script lineage — identifies which diagnostic script from scripts.williamagreen.com produced this chunk.
resolution_type | Tablespace Extension | Resolution classification — distinguishes Configuration Fix, SQL Optimization, Tablespace Extension, Data Correction, and Oracle SR Escalation.
oracle_doc_id | 123456.1 | Oracle Support Note reference — when a MOS note was retrieved and applied, its Doc ID is stored for future retrieval.
environment_id | PROD-US-01 | Environment scoping — critical for organizations running multiple EBS environments where configuration differences affect resolution paths.
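The envelope can be sketched as a plain payload attached to each stored chunk (field names follow the schema above; the values are illustrative):

```python
from dataclasses import dataclass, asdict

@dataclass
class ChunkMetadata:
    """Metadata envelope stored alongside each vector in Milvus."""
    oracle_error_code: str
    erp_module: str
    erp_version: str
    source_script: str
    resolution_type: str
    oracle_doc_id: str
    environment_id: str

meta = ChunkMetadata(
    oracle_error_code="ORA-01652",
    erp_module="AP",
    erp_version="EBS R12.2.10",
    source_script="SYS-03",
    resolution_type="Tablespace Extension",
    oracle_doc_id="123456.1",
    environment_id="PROD-US-01",
)
print(asdict(meta)["oracle_error_code"])  # ORA-01652
```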
[Figure: Oracle AI RAG Retrieval Pipeline, 5 stages: Input Vectorization → Milvus Vector Search → Metadata Filter (ORA-01652 + EBS R12.2.10 + AP) → Top 3 Cases plus MOS Note → LLM Final Synthesis]

The 5-stage RAG retrieval pipeline. Stage 3 — the Metadata Filter — is what separates Oracle-grade retrieval from generic RAG: it scopes results to the exact ERP version, module, and error code.

The n8n RAG Retrieval Pipeline

The RAG pipeline runs inside n8n as part of the intelligence layer. It activates in Step 4 of every incident resolution — after diagnostic scripts have run and the LLM has produced its initial root cause analysis.

Stage 1 (n8n → Embeddings Node): Input Vectorization

n8n converts the root cause analysis output — not the raw diagnostic data, but the structured analysis — into a vector embedding. Using the analyzed output ensures the search query reflects intent rather than raw numbers.
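One way to see why the analyzed output beats raw data as the search query: a sketch that flattens a hypothetical root cause analysis into the text handed to the embeddings node (field names are illustrative, not the platform's schema):

```python
def build_query_text(analysis):
    """Flatten a structured root cause analysis into embedding input.
    Raw metrics are deliberately omitted: the query should carry
    intent, not environment-specific numbers."""
    parts = [
        analysis["error_code"],
        analysis["root_cause"],
        f"module {analysis['module']}",
    ]
    return " | ".join(parts)

analysis = {  # hypothetical LLM root-cause output
    "error_code": "ORA-01652",
    "root_cause": "TEMP tablespace exhausted during large AP invoice query sort",
    "module": "AP",
    "raw_metrics": {"temp_used_mb": 32768, "sort_segments": 412},  # excluded
}
print(build_query_text(analysis))
```

Embedding the intent string rather than the raw metrics means two incidents with different byte counts but the same cause land close together in vector space.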

Stage 2 (n8n → Milvus Query): Vector Search with Metadata Filter

n8n queries Milvus using the generated embedding, returning the top N semantically similar chunks. Results are immediately filtered by the current incident's ERP version and module — a result from the wrong version is ranked below a less similar result from the exact matching environment.

Milvus Search
collection.search(
  data=[query_vector],
  anns_field="embedding",
  param={"metric_type": "COSINE", "params": {"nprobe": 16}},
  limit=10,
  expr="erp_version == 'EBS R12.2.10' and erp_module == 'AP'"
)

Stage 3 (Milvus → Context Injection): Top 3 Historical Cases Injected

n8n retrieves the top 3 matching historical cases and assembles them alongside the Oracle Support Notes retrieved by the Playwright agent. The most recent validated internal fix takes precedence over generic documentation in the final LLM context.
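The precedence rule can be sketched as a sort over the retrieved evidence, assuming each item carries a source type and resolution date (names are hypothetical):

```python
from datetime import date

def assemble_context(internal_cases, mos_notes, top_n=3):
    """Order retrieved evidence: validated internal fixes first,
    newest leading, followed by generic Oracle Support documentation."""
    cases = sorted(internal_cases, key=lambda c: c["resolved_on"], reverse=True)
    return cases[:top_n] + mos_notes

cases = [
    {"id": "INC-077", "resolved_on": date(2023, 11, 12), "fix": "Kill runaway sort"},
    {"id": "INC-101", "resolved_on": date(2024, 3, 1), "fix": "Extend TEMP by 4G"},
]
notes = [{"doc_id": "123456.1", "title": "ORA-01652 on TEMP"}]
context = assemble_context(cases, notes)
print(context[0]["id"])  # INC-101 — most recent validated internal fix leads
```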

Stage 4 (n8n → LLM Final Synthesis): Resolution Path Generated

The assembled context is passed to the LLM with a structured prompt constraining output to the evidence presented. The model is explicitly instructed to base its recommendation on the retrieved cases and support notes — preventing hallucinated resolution paths not grounded in validated Oracle fixes.
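A sketch of the constraining prompt (the wording is illustrative, not the platform's actual template):

```python
SYNTHESIS_PROMPT = """You are resolving an Oracle incident.

Base your recommendation ONLY on the evidence below. If the evidence does
not support a resolution, say so and recommend an Oracle SR escalation.
Do not propose steps that are absent from the cases or notes.

## Retrieved historical cases
{cases}

## Oracle Support Notes
{notes}

## Root cause analysis
{analysis}
"""

prompt = SYNTHESIS_PROMPT.format(
    cases="INC-101: Extended TEMP tablespace by 4G (validated 2024-03-01)",
    notes="Doc 123456.1: ORA-01652 unable to extend temp segment",
    analysis="TEMP exhausted during large AP invoice sort",
)
print("ONLY on the evidence" in prompt)  # True
```

Giving the model an explicit escalation path when evidence is thin matters as much as the evidence itself: without it, the model tends to fill the gap with an invented fix.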

RAG Performance Metrics

Three metrics are tracked continuously to ensure the RAG pipeline remains grounded and useful as the knowledge base grows.

Context Precision
Does the retrieved Oracle Support Note actually address the detected error code? Measures relevance of what was retrieved.
ORA-01652 → TEMP note ✓
ORA-01652 → UNDO note ✗
Context Recall
Did the pipeline find a historical match when one existed? Measures coverage — whether the knowledge base is actually being used when relevant history is present.
Prior resolution in DB → retrieved ✓
Prior resolution in DB → not retrieved ✗
Faithfulness
Does the AI's final recommendation strictly follow the retrieved Support Note? Detects hallucinated resolution steps not grounded in retrieved evidence.
Recommendation cites Note ✓
Suggests undocumented fix ✗
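The three checks reduce to simple ratios over a labeled evaluation set. A sketch, assuming each evaluated case records whether anything was retrieved, whether history existed, and whether the answer was grounded (all field names hypothetical):

```python
def rag_metrics(cases):
    """Compute context precision, context recall, and faithfulness
    over a labeled evaluation set (each case is a dict of booleans)."""
    retrieved = [c for c in cases if c["retrieved_something"]]
    with_history = [c for c in cases if c["history_exists"]]
    return {
        # Of what was retrieved, how much was relevant?
        "precision": sum(c["retrieval_relevant"] for c in retrieved) / len(retrieved),
        # When history existed, how often was it found?
        "recall": sum(c["retrieved_something"] for c in with_history) / len(with_history),
        # How often did the final answer stay grounded in the evidence?
        "faithfulness": sum(c["answer_grounded"] for c in cases) / len(cases),
    }

cases = [
    {"retrieved_something": True, "retrieval_relevant": True,
     "history_exists": True, "answer_grounded": True},
    {"retrieved_something": True, "retrieval_relevant": False,
     "history_exists": True, "answer_grounded": True},
    {"retrieved_something": False, "retrieval_relevant": False,
     "history_exists": True, "answer_grounded": False},
]
print(rag_metrics(cases))
```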
Target benchmark: 40% historical case match rate within 90 days of deployment. In mature deployments covering the same environment for 12 or more months, match rates above 70% are expected for common error codes. The platform gets faster as it learns.


Ready to talk about autonomous Oracle support?

We're talking to Oracle teams who want to be part of shaping the platform.
