Short answer: yes, this is very feasible.
Below is an end-to-end plan with concrete SQL and Python snippets, recommended hyperparameters, evaluation and monitoring, pitfalls and mitigations, and rollout steps. It covers what you need to build topic clustering plus automated RCA/summarization for ServiceNow incidents, with the goal of reducing reassignment counts.
I’ll present this as a staged pipeline. Read through once for the full flow, then use the code snippets in each stage.
1) High-level architecture (one line per stage)
- Aggregate and clean journal rows per incident in Snowflake → produce document units (per-incident or per-chunk).
- Create embeddings for those documents (Cortex embeddings) and store them.
- Cluster embeddings (unsupervised) to find topic buckets.
- Build cluster-level summaries and extract resolution steps / candidate RCA using Cortex (LLM).
- Map clusters to assignment groups / possible RCA labels (rule suggestion + supervised model).
- Deploy routing/suggestion system with human-in-the-loop and measurement (A/B test to reduce reassignment).
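To make the first stage concrete, here is a minimal Python sketch of turning journal rows into per-incident or per-chunk document units. It is only an illustration under assumptions: the function name `build_documents`, the `(incident_id, created_at, text)` row shape, and the `max_chars` limit are hypothetical and not part of the ServiceNow or Snowflake schema. In practice this aggregation would more likely run in Snowflake SQL.

```python
from collections import defaultdict


def build_documents(journal_rows, max_chars=1000):
    """Group journal rows by incident and pack them into text chunks.

    journal_rows: iterable of (incident_id, created_at, text) tuples
                  (hypothetical shape, adapt to your tables).
    Returns a list of dicts with keys incident_id, chunk_index, text.
    A single message longer than max_chars is kept whole as its own chunk.
    """
    by_incident = defaultdict(list)
    for incident_id, created_at, text in journal_rows:
        cleaned = " ".join(text.split())  # collapse runs of whitespace
        if cleaned:  # skip empty messages
            by_incident[incident_id].append((created_at, cleaned))

    documents = []
    for incident_id, entries in by_incident.items():
        entries.sort(key=lambda e: e[0])  # chronological order
        chunks, current = [], ""
        for _, message in entries:
            candidate = f"{current}\n{message}" if current else message
            if current and len(candidate) > max_chars:
                chunks.append(current)  # close the chunk that is full
                current = message
            else:
                current = candidate
        if current:
            chunks.append(current)
        for i, chunk in enumerate(chunks):
            documents.append(
                {"incident_id": incident_id, "chunk_index": i, "text": chunk}
            )
    return documents
```

With a large `max_chars` each incident becomes one document; a smaller value yields several chunks per incident, which can help when journals are long and the embedding model has a limited input size.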
2) Detailed step-by-step plan (with code & params)
Step 0 — prerequisites & safety
- Tables: incident (incident metadata) and journal (one row per message).
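- Make sure you have access to Snowflake Cortex functions (COMPLETE_JSON, EMBED_TEXT) or another embedding provider. Exact function names depend on your Snowflake release (for example SNOWFLAKE.CORTEX.COMPLETE and SNOWFLAKE.CORTEX.EMBED_TEXT_768), so confirm what is available in your account.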
- Decide on storage: keep original journal raw, and create new processed tables for reproducibility.
- Define privacy policy: redact PII if needed before embeddings/LLM calls.
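As an illustration of the redaction prerequisite, the following is a minimal regex-based sketch that masks a few common PII shapes before text is sent to an embedding or LLM call. The patterns, the labels, and the function name `redact_pii` are assumptions chosen for this example. Regexes alone will miss names, addresses, and ticket-specific identifiers, so treat this as a starting point rather than a complete policy.

```python
import re

# Hypothetical patterns; extend or replace them for your own data.
# Order matters: the IPv4 pattern runs before the phone pattern,
# because the looser phone pattern would also match dotted IP addresses.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("IPV4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact_pii(text):
    """Replace each match of the patterns above with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact_pii("Contact jane.doe@example.com from 10.0.0.12"))
```

Keeping placeholders such as [EMAIL] instead of deleting the match preserves sentence structure, which tends to matter for embedding quality. Store the redacted text in the processed tables and leave the raw journal untouched.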
Step 1 — Preprocessing & aggregation (Snowflake SQL)