Intelligent Process Decomposition: A Pragmatic Approach to BPMN Generation

In our previous article, we explained why Large Language Models struggle to generate valid BPMN diagrams. The problems are substantial: structural complexity, context limitations, the gap between knowledge and execution, non-determinism, and layout challenges.

But these challenges can be solved. We developed a first prototype that works with current technology by addressing each limitation systematically.

The specification overhead problem

LLMs need BPMN specification knowledge to generate valid diagrams. But loading the complete BPMN 2.0 specification into every prompt creates substantial overhead, increasing costs, latency, and token consumption while reducing space for actual process descriptions. Models struggle to retain information in the middle of long contexts, making massive prompts impractical.

We solved this with a knowledge base strategy. Instead of embedding complete BPMN documentation in prompts, we maintain a dedicated knowledge base with curated extracts: valid elements, element rules and constraints, and layout guidelines.

At generation time, the system queries this knowledge base to retrieve only the specification fragments needed for the specific request. This applies Retrieval-Augmented Generation principles (Gao et al., 2023) to the foundational knowledge required for artifact generation. The result: precisely calibrated prompts containing required knowledge without unnecessary overhead.
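To make the retrieval step concrete, here is a minimal sketch of how such a lookup could work. It is not our production code: the Fragment class, the KNOWLEDGE_BASE entries, the TRIGGERS keyword table, and all function names are hypothetical, and a simple keyword match stands in for whatever retrieval mechanism a real knowledge base would use.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    element: str  # BPMN element the extract documents
    text: str     # curated extract (placeholder wording, not spec text)


# Hypothetical curated extracts; a real knowledge base would be far larger.
KNOWLEDGE_BASE = [
    Fragment("startEvent", "A start event has no incoming sequence flow."),
    Fragment("endEvent", "An end event has no outgoing sequence flow."),
    Fragment("userTask", "A user task is performed by a person with software support."),
    Fragment("exclusiveGateway", "A diverging exclusive gateway takes exactly one outgoing path."),
    Fragment("parallelGateway", "A diverging parallel gateway activates all outgoing paths."),
]

# Assumed keyword cues that hint at which elements a description needs.
TRIGGERS = {
    "userTask": ("review", "approve"),
    "exclusiveGateway": ("if", "otherwise"),
    "parallelGateway": ("in parallel", "simultaneously"),
}
ALWAYS_NEEDED = {"startEvent", "endEvent"}


def needed_elements(description: str) -> set[str]:
    """Return the BPMN elements the description appears to require."""
    text = description.lower()
    found = set(ALWAYS_NEEDED)
    for element, keywords in TRIGGERS.items():
        # Whole-word match so that "if" does not fire inside "notification".
        if any(re.search(rf"\b{re.escape(k)}\b", text) for k in keywords):
            found.add(element)
    return found


def retrieve_fragments(description: str) -> list[Fragment]:
    """Select only the extracts relevant to this request."""
    needed = needed_elements(description)
    return [f for f in KNOWLEDGE_BASE if f.element in needed]


def build_prompt(description: str) -> str:
    """Assemble a prompt that carries only the retrieved extracts."""
    rules = "\n".join(f"- {f.element}: {f.text}" for f in retrieve_fragments(description))
    return f"BPMN rules:\n{rules}\n\nProcess description:\n{description}"


if __name__ == "__main__":
    print(build_prompt("A clerk must review the order. If the order is valid, ship it."))
```

For the sample description, the prompt contains the extracts for start event, end event, user task, and exclusive gateway, but not the one for the parallel gateway, because nothing in the text suggests concurrency.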

Matching complexity to capability

Not all processes need the same architectural approach. Simple workflows require straightforward generation. Complex processes demand sophisticated decomposition. Forcing everything through a single pattern wastes resources or produces poor results.

We implemented adaptive complexity routing that counts the logical process blocks in each incoming process description. The count determines which of two processing paths is most appropriate:

DIRECT Path (≤3 logical processes): Simple processes execute through direct generation, where the system provides a single comprehensive prompt containing the user’s description, relevant BPMN elements from the knowledge base, and generation instructions. The model receives sufficient context to generate a complete diagram in one pass, avoiding unnecessary architectural overhead.

ORCHESTRATION Path (>3 logical processes): Complex processes follow a multi-stage pipeline coordinated through LangChain (LangChain, n.d.), which decomposes the generation task into specialized stages. This decomposition aligns with findings that breaking questions down into subtasks improves the faithfulness of model-generated reasoning (Anthropic, 2023).

The routing prevents both extremes: avoiding heavyweight orchestration for simple problems while ensuring complex workflows receive appropriately sophisticated handling.
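The routing decision itself can be sketched in a few lines. The threshold of three comes from the description above; everything else is an assumption for illustration. In particular, the sketch takes the list of logical process blocks as given and does not show how they are identified, and the names Path, DIRECT_MAX_BLOCKS, and route are hypothetical.

```python
from enum import Enum


class Path(Enum):
    DIRECT = "direct"
    ORCHESTRATION = "orchestration"


# Threshold taken from the text: up to three logical blocks go the direct way.
DIRECT_MAX_BLOCKS = 3


def route(logical_blocks: list[str]) -> Path:
    """Choose the processing path from the number of logical process blocks.

    The blocks are assumed to come from an upstream analysis step that is
    not part of this sketch.
    """
    if len(logical_blocks) <= DIRECT_MAX_BLOCKS:
        return Path.DIRECT
    return Path.ORCHESTRATION


if __name__ == "__main__":
    simple = ["receive order", "check stock", "ship goods"]
    complex_case = simple + ["issue invoice", "handle returns"]
    print(route(simple).name, route(complex_case).name)
```

Keeping the threshold in a single named constant makes it easy to tune once more evaluation data is available.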

The Orchestration Pipeline for Complex Processes

For complex processes, the system employs a coordinated pipeline with three specialized stages.

First, process analysis. A ProcessAnalyzer receives the natural language description and queries the knowledge base to identify all relevant BPMN elements and their interconnections. It produces a structured representation identifying subprocess boundaries, required participants, gateway types, and event triggers. Most critically, it decomposes complex processes into manageable subprocesses that each fit within reliable context windows.

Second, subprocess generation. Individual SubprocessGenerators handle each subprocess independently, receiving focused context about their specific portion of the overall process. Each generator produces valid BPMN XML for its subprocess, working within context limits where attention remains reliable. This decomposition directly addresses the context window limitations that cause single-pass generation to fail on complex processes.

Third, integration and validation. A Subprocess Integrator assembles the complete diagram, ensuring proper connections between subprocesses, consistent element IDs, and valid overall structure. Validators then check syntactic correctness, semantic compliance, and layout quality. The system attempts automatic repair for common issues before final output.
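The integration and validation stage can be illustrated with a simplified sketch built on Python's standard xml.etree.ElementTree module. It assumes that each subprocess generator returns a small BPMN process fragment as an XML string. The function names, the ID prefix scheme, and the sample fragment are hypothetical. The sketch covers only ID consistency and reference checks; it does not show how subprocesses are linked to each other, how layout is assessed, or how automatic repair works.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
ET.register_namespace("bpmn", BPMN_NS)

# Hypothetical output of one subprocess generator.
SAMPLE_FRAGMENT = f"""
<bpmn:process xmlns:bpmn="{BPMN_NS}" id="p">
  <bpmn:task id="t1" name="Check order"/>
  <bpmn:task id="t2" name="Ship goods"/>
  <bpmn:sequenceFlow id="f1" sourceRef="t1" targetRef="t2"/>
</bpmn:process>
"""


def q(tag: str) -> str:
    """Qualify a tag name with the BPMN model namespace."""
    return f"{{{BPMN_NS}}}{tag}"


def prefix_ids(process: ET.Element, prefix: str) -> None:
    """Prefix ids and flow references so they stay unique after merging."""
    for element in process.iter():
        for attr in ("id", "sourceRef", "targetRef"):
            if attr in element.attrib:
                element.set(attr, f"{prefix}_{element.get(attr)}")


def integrate(fragments: list[str]) -> ET.Element:
    """Merge subprocess fragments into one process under a definitions root."""
    definitions = ET.Element(q("definitions"))
    merged = ET.SubElement(definitions, q("process"), {"id": "merged_process"})
    for index, xml_text in enumerate(fragments):
        process = ET.fromstring(xml_text)  # raises ParseError if malformed
        prefix_ids(process, f"sp{index}")
        merged.extend(list(process))
    return definitions


def validate(definitions: ET.Element) -> list[str]:
    """Report duplicate ids and sequence flows whose references do not resolve."""
    ids = [el.get("id") for el in definitions.iter() if el.get("id")]
    problems = [f"duplicate id: {i}" for i in sorted({i for i in ids if ids.count(i) > 1})]
    known = set(ids)
    for flow in definitions.iter(q("sequenceFlow")):
        for attr in ("sourceRef", "targetRef"):
            if flow.get(attr) not in known:
                problems.append(f"{flow.get('id')}: {attr} '{flow.get(attr)}' does not resolve")
    return problems


if __name__ == "__main__":
    diagram = integrate([SAMPLE_FRAGMENT, SAMPLE_FRAGMENT])
    print(validate(diagram) or "no problems found")
```

Prefixing every ID with the subprocess index is one simple way to keep IDs consistent when independently generated fragments reuse the same names. A list of problems, rather than a single pass/fail flag, gives a later repair step something specific to act on.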

Figure: Overview of our model.

Template-Based Refinement Through Similarity Matching

Specification knowledge alone provides insufficient guidance for generating layouts that align with professional standards. We address this with embedding-based template retrieval.

The system maintains a case library of validated BPMN diagrams paired with the descriptions that produced them. We store embeddings of these descriptions in a vector database, each with a reference to its corresponding valid BPMN output. Research on in-context learning suggests that demonstrations structurally similar to the target task are particularly effective, which is why we retrieve examples by similarity rather than supplying a fixed set.

When users submit new process descriptions, the system embeds the description, retrieves semantically similar historical descriptions, and passes corresponding BPMN templates to the integrator. The integrator operates in adaptation mode: “This is a structurally similar process; adapt this template to match current requirements.”
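The retrieval step can be sketched as a nearest-neighbor search over description embeddings. To keep the example self-contained, a bag-of-words vector stands in for a real embedding model and a plain Python list stands in for the vector database. Both are stand-ins, not what the system uses. The case library entries and file names are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical case library: (description, reference to a validated BPMN file).
CASE_LIBRARY = [
    ("Customer submits an order, warehouse ships the goods, invoice is sent",
     "order_fulfilment.bpmn"),
    ("Employee requests vacation, manager approves or rejects the request",
     "leave_approval.bpmn"),
]


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_templates(description: str, k: int = 1) -> list[str]:
    """Return references to the k templates with the most similar descriptions."""
    query = embed(description)
    ranked = sorted(
        CASE_LIBRARY,
        key=lambda case: cosine(query, embed(case[0])),
        reverse=True,
    )
    return [reference for _, reference in ranked[:k]]


if __name__ == "__main__":
    print(retrieve_templates(
        "An employee requests parental leave and the manager approves it"
    ))
```

The retrieved reference would then be resolved to the stored BPMN template and handed to the integrator together with the adaptation instruction quoted above.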

This approach combines specification knowledge with concrete examples of valid, professional-quality outputs that match the process domain. The benefits include reduced logical and visual errors, improved adherence to layout conventions, and faster generation for processes resembling previously solved cases.

Potential and Research Directions

This knowledge base-augmented, adaptively routed approach addresses the practical constraints that limit single-pass generation:

Specification precision without overhead: Knowledge base queries provide specification accuracy without bloating prompts or overwhelming context windows.
Architectural flexibility: Complexity routing prevents over-engineering simple cases while ensuring complex processes receive sufficient structure. Resources go where they’re needed.
Quality through examples: Template retrieval brings professional-standard outputs within reach without requiring exhaustive specification prompts or extensive manual refinement.

The integration of knowledge bases, complexity routing, specialized agents, and case libraries represents a methodological shift. Rather than asking “how can a single model better understand BPMN?”, we ask “how can we systematically provide models with the precise knowledge, structure, and examples they need?”

This perspective suggests that reliable automated diagram generation may depend less on fundamental LLM improvements and more on sophisticated knowledge engineering. The current implementation demonstrates viability across simple and moderately complex processes, with ongoing work showing substantial potential for production-quality generation.

References

Anthropic. (2023). Question decomposition improves the faithfulness of model-generated reasoning. https://www.cdn.anthropic.com/8154fb1d828cdc390dc1fa442d84034948679c47/question-decomposition-improves-the-faithfulness-of-model-generated-reasoning.pdf

Gao, Y., et al. (2023). Retrieval-augmented generation for large language models: A survey. arXiv. https://arxiv.org/pdf/2312.10997

LangChain. (n.d.). LangGraph overview. LangChain documentation. Retrieved January 14, 2026, from https://docs.langchain.com/oss/python/langchain/overview#langgraph

Moritz Maier is a tenure-track professor for process analysis and digitization at the Institute for Data Applications and Security (IDAS) within the Department of Technology and Computer Science at the Bern University of Applied Sciences.

Stefan Grösser is Professor of Decision Sciences and Policy and heads the Management Science, Innovation and Sustainability research group at BFH Technology & Informatics. He lectures in the Master of Engineering (MSE) program and works on several research projects in the fields of simulation methodology (system dynamics, agent-based modeling, machine learning), decision-making using artificial intelligence (decision-making and management science), and circular economy (circular economy, circular business models). His industries of focus are the solar, energy, and healthcare sectors. He also contributes to modern learning technologies.