Designing effective agents

Select the agent’s orchestration model

The orchestration layer controls how the agent decides which knowledge sources and tools to use, and when. When you are building an agent in Copilot Studio, you can use either generative orchestration or classic (deterministic) orchestration. This design choice affects the capability, performance, and cost of your agent, so it is important to understand how each option works, as well as the benefits and limitations of each.

The following concepts are important in understanding how orchestration models work:

  • An intent represents the goal or purpose behind the user input. It answers the question, “What does the user want to do?” Intents are used by the agent to understand the user’s request and trigger the right answer or action.

  • An entity is a specific piece of information that can be extracted from an unstructured user input. It represents a known data type or category—such as a date, location, or product name—that the agent needs to complete a task.

  • Entities are used in slot filling, which is the process by which the agent collects all the information it requires before it can act.

Consider the following example: A retailer has an agent on its website that can answer user questions about products and assist with returns. To process a return, the agent needs to know the product, size, purchase location, and condition. A customer visits the website and enters the following information into the chat:

“Hi, I’d like to return the AS49 waterproof jacket I bought from the Chicago store. It’s a men’s size L and it still has the tags on.”

The orchestration model would identify the following intent and entities from this input:

  • Intent: Product return

  • Entities:

    • Product name: AS49 waterproof jacket

    • Purchase location: Chicago store

    • Size: Men’s L

    • Condition: Tags on
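Conceptually, the parsed result can be pictured as a simple structure of one intent plus a set of filled slots. The sketch below is a hypothetical representation for illustration only; Copilot Studio does not expose the parse in this format.

```python
# Hypothetical structured view of the parsed user input from the example.
parsed = {
    "intent": "product_return",
    "entities": {
        "product_name": "AS49 waterproof jacket",
        "purchase_location": "Chicago store",
        "size": "Men's L",
        "condition": "Tags on",
    },
}

# Slot filling: check whether every required entity has been captured.
required_slots = ["product_name", "purchase_location", "size", "condition"]
missing = [slot for slot in required_slots if slot not in parsed["entities"]]
print(missing)  # all slots are filled by this input, so this prints []
```

Because the customer supplied all four required pieces of information in one message, no follow-up questions are needed and the agent can proceed with the return.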

The way in which your agent determines the intent, identifies the entities, and performs the slot filling will differ depending on whether you are using generative or classic orchestration.

Generative orchestration

When you create a new agent in Copilot Studio, it will be configured to use generative orchestration by default. Generative orchestration uses generative AI to reason over the instructions, knowledge, and tools in the agent to determine what to do next.

Generative orchestration works with intents by using a large language model to decide what to do based on the user input. It reasons, “Given this user input, what is the best action or response to take?” It can generate an answer from knowledge, trigger a topic, call a tool, or ask a clarifying question to get more information from the user if the input is ambiguous. It can chain together multiple topics, tools, and knowledge sources, and provide a consolidated answer based on their combined outputs. It can also handle vague or complex inputs that include multiple intents, because it doesn’t rely on matching against a strict set of examples.

Generative orchestration handles entity extraction and slot filling by using the LLM and reasoning to understand which information is needed and whether it has already been provided. It looks at the meaning of the user input in context and extracts relevant information based on the task it needs to execute. It can reason to extract entities without needing structured entity labels or definitions. It can also automatically ask follow-up questions to prompt the user for the additional information it needs. When you build an agent that uses generative orchestration, you don’t need to build conversational flows that specifically ask for each piece of information in turn.

Generative orchestration is more capable and flexible than classic orchestration. However, it is less predictable than a classic orchestration model, where you manually train and classify intents, and it has a higher consumption cost.

If you plan to use generative orchestration, it is important to write meaningful instructions as well as clear descriptions for every trigger, knowledge source, and tool you add to your agent. Generative orchestration uses these descriptions to understand the purpose of each tool or component, and to reason over them to decide what to use.

Classic orchestration

You can choose to build your agent in Copilot Studio with classic orchestration, which uses a natural language understanding (NLU) model to orchestrate the agent. Classic orchestration is less sophisticated than generative orchestration and has some limitations, which we will explore in this section. However, it is more predictable, controllable, and cost-effective, and it can be a good choice for the right use case.

Classic orchestration uses an NLU model to identify intents by classifying the user input into predefined categories. Unlike the generative orchestration model, which begins by reasoning over the best course of action to take, classic orchestration starts by determining which intent the user input matches.

When you build your agent using classic orchestration, you set up intents using topics—specifically, by providing 5–10 trigger phrases that the NLU uses to understand the intent for each topic. When the orchestration model receives the user input, it will classify the intent and match it to the most suitable topic.

If the model doesn’t find a suitable topic match (or if you haven’t authored any topics in the agent), then it will use Conversational boosting—the fallback topic—to answer the question. If you have added a knowledge source to your agent (as you did for the quick agent you built in Chapter 1), it will use that knowledge source to generate an answer using generative AI. With classic orchestration, generative AI is used to reason over knowledge to provide answers, but not to determine the course of action taken by the agent itself.

Intent classification when you are using classic orchestration is more controlled than when you are using generative orchestration, because you are training the NLU with the trigger phrases you configure for each topic. If you design your agent using classic orchestration, you will need to carefully map out the intents so that they are clearly distinguished from one another. You will also need to understand the likely user inputs and author trigger phrases that match the anticipated user intents.

Classic orchestration extracts entities from the user input based on the defined list of entities to which it has access. Copilot Studio provides out-of-the-box entities for commonly used pieces of information, such as age, location, and color. In addition, you can configure your own custom entities for things specific to your industry or business, such as product name, size, and condition, as shown in the earlier example.
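Custom entities of this kind are often defined as closed lists of known values. The sketch below shows the idea with hypothetical entity lists drawn from the retail example; it also illustrates the single-value-per-entity behavior discussed in the limitations that follow.

```python
# Sketch of closed-list entity extraction. The entity names and value
# lists are illustrative, not Copilot Studio's actual definitions.

custom_entities = {
    "product_name": ["AS49 waterproof jacket", "HL23 hiking boots"],
    "size": ["Men's S", "Men's M", "Men's L"],
    "condition": ["tags on", "worn", "damaged"],
}

def extract_entities(user_input: str) -> dict:
    """Match known entity values against the input text."""
    text = user_input.lower()
    found = {}
    for entity, values in custom_entities.items():
        for value in values:
            if value.lower() in text:
                found[entity] = value  # keeps only one value per entity
    return found

text = "I'd like to return the AS49 waterproof jacket, men's L, tags on"
print(extract_entities(text))
```

Note that the dictionary holds one value per entity, which mirrors the limitation covered below: if the user names two products, only one is kept.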

If you design your agent using classic orchestration, you should understand and consider the limitations of this model, and factor these into your design and configurations:

  • An agent designed with classic orchestration can detect only one intent per user input. Returning to the earlier example, suppose the user asked, “I want to return the AS49 waterproof jacket I bought from the Chicago store and what are the opening hours of that store.” The agent would not be able to detect and handle both intents (initiate return, ask about opening hours).

  • If the user input includes multiple instances of an entity, this model won’t be able to disambiguate them. For example, if the user states, “I want to return the AS49 waterproof jacket and the HL23 hiking boots,” two products are mentioned, but the model can capture only one value for the product name entity.

  • The only customization you can do when working with the out-of-the-box NLU model is to configure topics (by providing a few sample trigger phrases) and to configure custom entities. The agent will learn to match those phrases in the topics or extract entities from user input using the general-purpose NLU model. You can’t control or customize the actual underlying AI model.

  • You need to author question nodes in topics to prompt the user for any required information. This kind of model will not automatically identify or prompt the user for missing entities.

  • You need to author message nodes in topics to respond to the user after they have provided information.

  • If you want your agent to use tools to take actions, you need to set up a topic to explicitly call that tool after the intent has been detected and that topic triggered.

  • You cannot build an autonomous agent using classic orchestration.

Generative orchestration is the default option for new agents created in Copilot Studio. If you want to switch to classic orchestration, you can make the change with a simple toggle switch.

To switch to classic orchestration

  1. Open your agent and find the Orchestration section. You will see that the option for using generative AI to determine how best to respond to users and events is set to Enabled.

  2. Switch this toggle to Disabled. Copilot Studio will take a moment to process and save the change.

  3. When this process is finished, the generative orchestration toggle will show Disabled.

Consider using classic orchestration when your agent use case is relatively straightforward, or when you are working with explicitly designed processes, such as password resets, leave requests, address updates, or policy checks. It is the ideal choice for scenarios where you need full control and predictability, whether for compliance or for validation. It is also a more cost-effective solution than generative orchestration.

Azure CLU integration

If you want to build and train your own language model and use that in Copilot Studio instead of the one provided, you can switch to a custom conversational language understanding (CLU) model built in Azure. Building your own CLU model allows you to define exactly how intents work and which entities to extract, and to train the model with as many examples as you want. You can include domain- or industry-specific vocabulary, complex phrases, and synonyms. You can also create entity lists or custom extraction rules.

The custom CLU model will improve how well intents and entities are recognized, but it still has the limitation that it will detect only one intent per user input. Unlike the NLU model, it can support multiple values for a single entity. You can train the entity recognizer to extract both values separately, so it could handle our earlier example where the user asked to return two different products in one input.
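The difference in entity handling can be sketched as follows: where the built-in NLU keeps a single value per entity, a trained CLU recognizer can return every matching value. This is illustrative code with a hypothetical product list, not CLU's actual extraction mechanism.

```python
# Sketch of multi-value entity extraction, as a custom CLU recognizer
# supports it. The product list and matching logic are illustrative.

known_products = ["AS49 waterproof jacket", "HL23 hiking boots"]

def extract_all_products(user_input: str) -> list[str]:
    """Return every known product mentioned in the input, not just one."""
    text = user_input.lower()
    return [p for p in known_products if p.lower() in text]

text = "I want to return the AS49 waterproof jacket and the HL23 hiking boots"
print(extract_all_products(text))  # both products are captured
```

With both values extracted separately, the agent could process each return in turn instead of silently dropping one.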

To integrate a CLU model with a Copilot Studio agent, you must have a fully trained CLU model that includes intents for the system topics and any custom topics you have authored in your Copilot Studio agent. You will map the CLU intents to those topics. The entities you have built in your Azure CLU will be imported into Copilot Studio and can be used alongside the prebuilt, out-of-the-box entities.

Using a CLU model is a good choice when you have the skills to develop a custom model in Azure for complex, domain-specific, or industry-specific language, or when the intents sound so similar that the built-in NLU model cannot distinguish them well enough. It will provide more precise and accurate responses, particularly for specialized language, and it is a good option for improving intent and entity recognition in languages other than English. However, you need to invest the time and effort in scaling, managing, and monitoring your CLU model, and must also pay the associated consumption costs in Azure.