2025/03/28 14:38:19

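Implement the Conversational AI Engine with the Java SDK

This article shows beginners how to use the Agora Java SDK to implement the Conversational AI Engine.

The Java SDK is designed to make it easier for developers to integrate the Agora RESTful API. It has the following features:

Simplified communication: Wraps request and response handling for the RESTful API, so that communicating with the RESTful API takes less code.
High availability: When network problems such as DNS resolution failures, network errors, or request timeouts occur, the SDK automatically switches to the best available domain to keep the REST service reachable.
Easy-to-use API: Provides a concise, readable API for common operations such as creating and stopping a conversational AI agent.
Other advantages: Written in Java, with a focus on efficiency, concurrency, and scalability.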
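
The automatic domain switching described above can be pictured with a short sketch. This is not the SDK's implementation: the class name, the placeholder host names, and the simple "try the next host on failure" policy are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; the SDK's real failover logic is not shown in this article.
class DomainFailoverSketch {

    // Tries each candidate host in order and returns the first successful result.
    // `request` stands in for an HTTP call and throws RuntimeException on a network failure.
    static <T> T callWithFailover(List<String> hosts, Function<String, T> request) {
        RuntimeException last = null;
        for (String host : hosts) {
            try {
                return request.apply(host);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next host
            }
        }
        throw new IllegalStateException("all hosts failed", last);
    }

    public static void main(String[] args) {
        // Placeholder host names, not real endpoints.
        List<String> hosts = Arrays.asList("primary.example.com", "backup.example.com");
        String result = callWithFailover(hosts, host -> {
            if (host.startsWith("primary")) {
                throw new RuntimeException("simulated DNS failure");
            }
            return "ok from " + host;
        });
        System.out.println(result); // prints "ok from backup.example.com"
    }
}
```

When you use the SDK you do not write this loop yourself; the sketch only shows why a request can still succeed after the first domain fails.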

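Prerequisites

Before you begin, make sure that:

Java 1.8 or later is installed.
You have completed the following steps in the Agora Console, as described in Enable the service:
- Enable the Agora Conversational AI Engine for your project.
- Get the App ID: a string randomly generated by Agora that identifies your project and is used when calling the conversational AI agent RESTful API.
- Get the customer ID and customer secret: used for HTTP authentication when calling the Conversational AI Engine RESTful API.
- Generate a temporary token: a token, also called a dynamic key, authenticates users when they join an RTC channel. A temporary token is valid for 24 hours. In production, generate tokens on your app server as described in Authenticate with tokens.
You have integrated v4.5.1 or later of the real-time interaction SDK as described in Implement audio and video interaction, and implemented basic real-time audio and video features in your app.
You have obtained an API key and callback URL from your large language model (LLM) vendor.
You have obtained authentication information (token, appid, and so on) by following the official documentation of your text-to-speech (TTS) vendor, and you understand how to configure the related parameters.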
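
The customer ID and customer secret are later passed to BasicAuthCredential as the user name and password. As a rough sketch of what HTTP Basic authentication does with such a pair (per RFC 7617), the following builds the Authorization header value by hand. The class name is made up for this example, and the SDK builds the header for you, so this is for understanding only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative helper, not part of the SDK.
class BasicAuthHeaderSketch {

    // Returns the value of the HTTP "Authorization" header for Basic authentication:
    // "Basic " followed by base64("<id>:<secret>").
    static String build(String customerId, String customerSecret) {
        String pair = customerId + ":" + customerSecret;
        String encoded = Base64.getEncoder().encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Well-known example credentials from RFC 7617, not real Agora credentials.
        System.out.println(build("Aladdin", "open sesame"));
        // prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
    }
}
```

Because Base64 is an encoding and not encryption, treat the customer secret as sensitive and send it only over HTTPS.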

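Create a project and install the SDK

Create a Java project in your IDE. In IntelliJ IDEA, for example, set the project name and location, set Build system to Maven, and then click Create.

In the project's pom.xml file, add the following lines inside the <dependencies> section to add the REST Client dependency to the project: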

XML
<dependencies>
    <dependency>
        <groupId>io.agora</groupId>
        <artifactId>agora-rest-client-core</artifactId>
        <version>0.3.0</version>
    </dependency>
</dependencies>

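Implement the Conversational AI Engine

This section walks through the core Java code for implementing the Conversational AI Engine. Choose one of the following ways to read it, depending on your needs:

Quick run: If you only want to get the sample code running and are not concerned with the implementation details, copy the complete sample code below into Main.java, configure the parameters as described in Define variables, and then skip to Talk to the agent.

Complete sample code for Main.java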

Java
package io.agora;


import io.agora.rest.AgoraException;
import io.agora.rest.core.BasicAuthCredential;
import io.agora.rest.core.Credential;
import io.agora.rest.core.DomainArea;
import io.agora.rest.services.convoai.ConvoAIClient;
import io.agora.rest.services.convoai.ConvoAIConfig;
import io.agora.rest.services.convoai.ConvoAIServiceRegionEnum;
import io.agora.rest.services.convoai.req.JoinConvoAIReq;
import io.agora.rest.services.convoai.req.ListConvoAIReq;
import io.agora.rest.services.convoai.res.JoinConvoAIRes;
import io.agora.rest.services.convoai.res.ListConvoAIRes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

public class Main {
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    // Agora parameters
    public static final String APP_ID = "<your appId>";
    public static final String CNAME = "<your cname>";
    public static final String AGENT_RTC_UID = "<your agent rtc uid>";
    public static final String USERNAME = "<the username of basic auth credential>";
    public static final String PASSWORD = "<the password of basic auth credential>";
    public static final String AGENT_RTC_TOKEN = "<your agent rtc token>";

    // LLM parameters
    public static final String LLM_URL = "<your LLM URL>";
    public static final String LLM_API_KEY = "<your LLM API Key>";
    public static final String LLM_MODEL = "<your LLM model>";

    // TTS parameters (using Volcengine TTS from ByteDance as an example)
    public static final String TTS_BYTEDANCE_TOKEN = "<your bytedance tts token>";
    public static final String TTS_BYTEDANCE_APP_ID = "<your bytedance tts app id>";
    public static final String TTS_BYTEDANCE_CLUSTER = "<your bytedance tts cluster>";
    public static final String TTS_BYTEDANCE_VOICE_TYPE = "<your bytedance tts voice type>";

    public static void main(String[] args) throws Exception {
        Credential credential = new BasicAuthCredential(USERNAME, PASSWORD);
        ConvoAIConfig config = ConvoAIConfig.builder().
                appId(APP_ID).
                credential(credential).
                domainArea(DomainArea.CN).
                serverRegion(ConvoAIServiceRegionEnum.CHINESE_MAINLAND).
                build();

        ConvoAIClient convoAIClient = ConvoAIClient.create(config);

        String name = APP_ID + ":" + CNAME;

        JoinConvoAIRes joinConvoAIRes;
        try {
            joinConvoAIRes = convoAIClient.join(JoinConvoAIReq.builder()
                    .name(name)
                    .properties(JoinConvoAIReq.Properties.builder()
                            .token(AGENT_RTC_TOKEN)
                            .channel(CNAME)
                            .agentRtcUId(AGENT_RTC_UID)
                            .remoteRtcUIds(new ArrayList<String>() {
                                {
                                    add("*");
                                }
                            })
                            .enableStringUId(false)
                            .idleTimeout(120)
                            .advancedFeatures(JoinConvoAIReq.AdvancedFeatures.builder()
                                    .enableAIVad(true)
                                    .build())
                            .llmPayload(JoinConvoAIReq.LLMPayload.builder()
                                    .url(LLM_URL)
                                    .apiKey(LLM_API_KEY)
                                    .params(new HashMap<String, Object>() {
                                        {
                                            put("model", LLM_MODEL);
                                            put("max_tokens", 1024);
                                            put("username", "Jack");
                                        }
                                    })
                                    .systemMessages(new ArrayList<Map<String, Object>>() {
                                        {
                                            add(new HashMap<String, Object>() {
                                                {
                                                    put("content", "You are a helpful chatbot.");
                                                    put("role", "system");
                                                }
                                            });
                                        }
                                    })
                                    .maxHistory(30)
                                    .greetingMessage("Hello, how can I help you?")
                                    .build())
                            .ttsPayload(JoinConvoAIReq.TTSPayload.builder()
                                    .vendor(JoinConvoAIReq.TTSVendorEnum.BYTEDANCE)
                                    .params(JoinConvoAIReq.BytedanceTTSVendorParams.builder().
                                            token(TTS_BYTEDANCE_TOKEN).
                                            cluster(TTS_BYTEDANCE_CLUSTER).
                                            voiceType(TTS_BYTEDANCE_VOICE_TYPE).
                                            appId(TTS_BYTEDANCE_APP_ID).
                                            speedRatio(1.0F).
                                            volumeRatio(1.0F).
                                            pitchRatio(1.0F).
                                            emotion("happy").
                                            build())
                                    .build())
                            .vadPayload(JoinConvoAIReq.VADPayload.builder()
                                    .interruptDurationMs(160)
                                    .prefixPaddingMs(300)
                                    .silenceDurationMs(480)
                                    .threshold(0.5F)
                                    .build())
                            .asrPayload(JoinConvoAIReq.ASRPayload.builder()
                                    .language("zh-CN")
                                    .build())
                            .build())
                    .build()).block();
        } catch (AgoraException e) {
            logger.error("Failed to start the agent,err:{}", e.getMessage());
            return;
        } catch (Exception e) {
            logger.error("Unknown exception,err:{}", e.getMessage());
            return;
        }

        if (joinConvoAIRes == null) {
            logger.error("Failed to start the agent");
            return;
        }

        logger.info("Start the agent successfully, joinConvoAIRes:{}", joinConvoAIRes);

        String agentId = joinConvoAIRes.getAgentId();

        Thread.sleep(3000);

        // List the agent
        ListConvoAIRes listConvoAIRes;
        try {
            listConvoAIRes = convoAIClient.list(ListConvoAIReq.builder()
                    .channel(CNAME)
                    .state(2)
                    .build()).block();
        } catch (AgoraException e) {
            logger.error("Failed to list the agent,err:{}", e.getMessage());
            return;
        } catch (Exception e) {
            logger.error("Unknown exception,err:{}", e.getMessage());
            return;
        }

        if (listConvoAIRes == null) {
            logger.error("Failed to list the agent");
            return;
        }

        logger.info("List the agent successfully, listConvoAIRes:{}", listConvoAIRes);

        Thread.sleep(120000);

        // Stop the agent
        try {
            convoAIClient.leave(agentId).block();
            logger.info("Leave the agent successfully, agentId:{}", agentId);
        } catch (AgoraException e) {
            logger.error("Failed to leave the agent,err:{}", e.getMessage());
        } catch (Exception e) {
            logger.error("Unknown exception,err:{}", e.getMessage());
        }

    }
}

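Understand the implementation: If you want to understand each core step of implementing the Conversational AI Engine, or need to adapt the sample code (for example, to use string UIDs or a different TTS vendor), continue reading this section.

Import the SDK

Add the following code to Main.java to import the classes you need from the Java SDK: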

Java
package io.agora;

import io.agora.rest.AgoraException;
import io.agora.rest.core.BasicAuthCredential;
import io.agora.rest.core.Credential;
import io.agora.rest.core.DomainArea;
import io.agora.rest.services.convoai.ConvoAIClient;
import io.agora.rest.services.convoai.ConvoAIConfig;
import io.agora.rest.services.convoai.ConvoAIServiceRegionEnum;
import io.agora.rest.services.convoai.req.JoinConvoAIReq;
import io.agora.rest.services.convoai.req.ListConvoAIReq;
import io.agora.rest.services.convoai.res.JoinConvoAIRes;
import io.agora.rest.services.convoai.res.ListConvoAIRes;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

Define variables

In Main.java, add the following code to define the key parameters needed to create a conversational AI agent:

Java
// Agora parameters
public static final String APP_ID = "<your appId>";
public static final String CNAME = "<your cname>";
public static final String AGENT_RTC_UID = "<your agent rtc uid>";
public static final String USERNAME = "<the username of basic auth credential>";
public static final String PASSWORD = "<the password of basic auth credential>";
public static final String AGENT_RTC_TOKEN = "<your agent rtc token>";

// LLM parameters
public static final String LLM_URL = "<your LLM URL>";
public static final String LLM_API_KEY = "<your LLM API Key>";
public static final String LLM_MODEL = "<your LLM model>";

// TTS parameters (using Volcengine TTS from ByteDance as an example)
public static final String TTS_BYTEDANCE_TOKEN = "<your bytedance tts token>";
public static final String TTS_BYTEDANCE_APP_ID = "<your bytedance tts app id>";
public static final String TTS_BYTEDANCE_CLUSTER = "<your bytedance tts cluster>";
public static final String TTS_BYTEDANCE_VOICE_TYPE = "<your bytedance tts voice type>";

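The parameters you need to configure fall into three groups:

Agora parameters:
- APP_ID: Your Agora App ID.
- CNAME: The name of the RTC channel that the agent joins. Use the channel name you entered when generating the temporary token.
- AGENT_RTC_UID: The agent's user ID in the RTC channel. Int UIDs such as 123 are used by default. To use a string UID, set enableStringUId to true in a later step.
- USERNAME and PASSWORD: Your Agora customer ID and customer secret.
- AGENT_RTC_TOKEN: The token the agent uses to join the RTC channel. A temporary token is sufficient.
LLM parameters:
- LLM_URL: The LLM API callback URL.
- LLM_API_KEY: The LLM API key.
- LLM_MODEL: The name of the large language model.
TTS parameters: The sample code uses Volcengine TTS from ByteDance as an example. You can choose another TTS vendor to suit your situation; see TTSPayload for how to configure it.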
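
Hardcoding credentials as constants is convenient for a first run, but it makes secrets easy to leak. One option, sketched below, is to read them from a settings map such as the one returned by System.getenv() and fail fast when a value is missing. The class name and the variable names AGORA_APP_ID and AGORA_CUSTOMER_SECRET are hypothetical and are not defined by the SDK.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative helper, not part of the SDK.
class EnvConfigSketch {

    // Returns the value for `key`, or throws with a clear message when it is missing or blank.
    static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required setting: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        // A stand-in map keeps this demo runnable; in a real program pass System.getenv() instead.
        Map<String, String> env = new HashMap<>();
        env.put("AGORA_APP_ID", "demo-app-id"); // hypothetical variable name and value

        System.out.println(require(env, "AGORA_APP_ID")); // prints "demo-app-id"
        try {
            require(env, "AGORA_CUSTOMER_SECRET"); // hypothetical name, not set above
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Missing required setting: AGORA_CUSTOMER_SECRET"
        }
    }
}
```

With this approach, the constants in Main.java would be initialized from calls such as require(System.getenv(), "AGORA_APP_ID") instead of string literals.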

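Create and initialize the client

In the main method, create a BasicAuthCredential instance to configure the HTTP Basic Auth information, initialize a ConvoAIConfig instance, and then call the create method to create the Conversational AI Engine REST client: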

Java
public static void main(String[] args) throws Exception {
    // Create the HTTP Basic Auth credential
    Credential credential = new BasicAuthCredential(USERNAME, PASSWORD);
    // Initialize the REST client configuration
    ConvoAIConfig config = ConvoAIConfig.builder().
            appId(APP_ID).
            credential(credential).
            domainArea(DomainArea.CN).
            serverRegion(ConvoAIServiceRegionEnum.CHINESE_MAINLAND).
            build();

    // Create the Conversational AI Engine REST client
    ConvoAIClient convoAIClient = ConvoAIClient.create(config);

    // ... subsequent code
}

Create a conversational AI agent

Call the join method to create a conversational AI agent, passing in the LLM and TTS configuration:

Java
String name = APP_ID + ":" + CNAME;

JoinConvoAIRes joinConvoAIRes;
try {
    joinConvoAIRes = convoAIClient.join(JoinConvoAIReq.builder()
            .name(name)
            .properties(JoinConvoAIReq.Properties.builder()
                    .token(AGENT_RTC_TOKEN)
                    .channel(CNAME)
                    .agentRtcUId(AGENT_RTC_UID)
                    .remoteRtcUIds(new ArrayList<String>() {
                        {
                            add("*");
                        }
                    })
                    .enableStringUId(false)
                    .idleTimeout(120)
                    .advancedFeatures(JoinConvoAIReq.AdvancedFeatures.builder()
                            .enableAIVad(true)
                            .build())
                    .llmPayload(JoinConvoAIReq.LLMPayload.builder()
                            .url(LLM_URL)
                            .apiKey(LLM_API_KEY)
                            .params(new HashMap<String, Object>() {
                                {
                                    put("model", LLM_MODEL);
                                    put("max_tokens", 1024);
                                    put("username", "Jack");
                                }
                            })
                            .systemMessages(new ArrayList<Map<String, Object>>() {
                                {
                                    add(new HashMap<String, Object>() {
                                        {
                                            put("content", "You are a helpful chatbot.");
                                            put("role", "system");
                                        }
                                    });
                                }
                            })
                            .maxHistory(30)
                            .greetingMessage("Hello, how can I help you?")
                            .build())
                    .ttsPayload(JoinConvoAIReq.TTSPayload.builder()
                            .vendor(JoinConvoAIReq.TTSVendorEnum.BYTEDANCE)
                            .params(JoinConvoAIReq.BytedanceTTSVendorParams.builder().
                                    token(TTS_BYTEDANCE_TOKEN).
                                    cluster(TTS_BYTEDANCE_CLUSTER).
                                    voiceType(TTS_BYTEDANCE_VOICE_TYPE).
                                    appId(TTS_BYTEDANCE_APP_ID).
                                    speedRatio(1.0F).
                                    volumeRatio(1.0F).
                                    pitchRatio(1.0F).
                                    emotion("happy").
                                    build())
                            .build())
                    .vadPayload(JoinConvoAIReq.VADPayload.builder()
                            .interruptDurationMs(160)
                            .prefixPaddingMs(300)
                            .silenceDurationMs(480)
                            .threshold(0.5F)
                            .build())
                    .asrPayload(JoinConvoAIReq.ASRPayload.builder()
                            .language("zh-CN")
                            .build())
                    .build())
            .build()).block();
} catch (AgoraException e) {
    logger.error("Failed to start the agent,err:{}", e.getMessage());
    return;
} catch (Exception e) {
    logger.error("Unknown exception,err:{}", e.getMessage());
    return;
}

if (joinConvoAIRes == null) {
    logger.error("Failed to start the agent");
    return;
}

logger.info("Start the agent successfully, joinConvoAIRes:{}", joinConvoAIRes);

String agentId = joinConvoAIRes.getAgentId();
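
The sample above builds the LLM params map and the system message list with double-brace initialization, which creates an anonymous subclass for each collection. If you prefer plain collections, small helper methods like the ones sketched below produce the same contents. The class and method names are made up for this example, and the sketch assumes that params and systemMessages accept any Map<String, Object> and List<Map<String, Object>>, which the sample code suggests but this article does not state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative helpers, not part of the SDK.
class PayloadMapsSketch {

    // Builds the LLM request parameters as an ordinary HashMap.
    static Map<String, Object> llmParams(String model, int maxTokens) {
        Map<String, Object> params = new HashMap<>();
        params.put("model", model);
        params.put("max_tokens", maxTokens);
        return params;
    }

    // Builds a one-element list holding a single system message.
    static List<Map<String, Object>> systemMessages(String content) {
        Map<String, Object> message = new HashMap<>();
        message.put("role", "system");
        message.put("content", content);
        List<Map<String, Object>> messages = new ArrayList<>();
        messages.add(message);
        return messages;
    }

    public static void main(String[] args) {
        System.out.println(llmParams("<your LLM model>", 1024).get("max_tokens")); // prints "1024"
        System.out.println(systemMessages("You are a helpful chatbot.").get(0).get("role")); // prints "system"
    }
}
```

Under that assumption, the builder calls would become .params(PayloadMapsSketch.llmParams(LLM_MODEL, 1024)) and .systemMessages(PayloadMapsSketch.systemMessages("You are a helpful chatbot.")).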

List the current conversational AI agents

Call the list method to get the list of conversational AI agents that are currently running:

Java
Thread.sleep(3000);

// List the agent
ListConvoAIRes listConvoAIRes;
try {
    listConvoAIRes = convoAIClient.list(ListConvoAIReq.builder()
            .channel(CNAME)
            .state(2)
            .build()).block();
} catch (AgoraException e) {
    logger.error("Failed to list the agent,err:{}", e.getMessage());
    return;
} catch (Exception e) {
    logger.error("Unknown exception,err:{}", e.getMessage());
    return;
}

if (listConvoAIRes == null) {
    logger.error("Failed to list the agent");
    return;
}

logger.info("List the agent successfully, listConvoAIRes:{}", listConvoAIRes);
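
The sample waits a fixed 3 seconds before listing agents. If you would rather check repeatedly until a condition holds, a small polling helper like the sketch below is one option. It is generic Java and makes no SDK calls; the class name, the attempt count, and the delay are illustrative assumptions, and you would supply your own check (for example, one that calls list and inspects the result).

```java
import java.util.function.BooleanSupplier;

// Illustrative helper, not part of the SDK.
class PollSketch {

    // Runs `check` up to maxAttempts times, sleeping delayMs between attempts.
    // Returns true as soon as the check passes, or false if it never does
    // or if the thread is interrupted while waiting.
    static boolean pollUntil(BooleanSupplier check, int maxAttempts, long delayMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated condition that becomes true on the third check.
        boolean ready = pollUntil(() -> ++calls[0] >= 3, 5, 10L);
        System.out.println(ready + " after " + calls[0] + " checks"); // prints "true after 3 checks"
    }
}
```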

Stop the conversational AI agent

Call the leave method to stop the conversational AI agent:

Java
Thread.sleep(120000);

// Stop the agent
try {
    convoAIClient.leave(agentId).block();
    logger.info("Leave the agent successfully, agentId:{}", agentId);
} catch (AgoraException e) {
    logger.error("Failed to leave the agent,err:{}", e.getMessage());
} catch (Exception e) {
    logger.error("Unknown exception,err:{}", e.getMessage());
}
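
In the complete sample, the early return after a failed list call skips the leave call, so the agent is not stopped explicitly in that case. One way to make sure cleanup always runs is a try/finally wrapper, sketched below in generic Java. The class and method names are hypothetical; in your program the cleanup action would be the leave call shown above.

```java
// Illustrative helper, not part of the SDK.
class CleanupSketch {

    // Runs `body`, then always runs `cleanup`, even if `body` throws.
    static void runWithCleanup(Runnable body, Runnable cleanup) {
        try {
            body.run();
        } finally {
            cleanup.run();
        }
    }

    public static void main(String[] args) {
        try {
            runWithCleanup(
                    () -> {
                        System.out.println("working with the agent");
                        throw new IllegalStateException("simulated failure");
                    },
                    () -> System.out.println("stopping the agent"));
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        // prints, in order:
        // working with the agent
        // stopping the agent
        // caught: simulated failure
    }
}
```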

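Talk to the agent

This section describes how to have the agent and a user join the same RTC channel and talk to each other.

User joins the channel

In your app, join the RTC channel with a user ID that differs from the agent's, using the same token and channel name as the agent.

Info

You can also join the RTC channel with the real-time interaction Web Demo. After completing the initial setup, select Quick Start > Audio and Video Call in the left menu, then enter a user ID that differs from the agent's along with the same token and channel name to join the RTC channel.

Agent joins the channel

In IntelliJ IDEA, click the Run button in the upper-right corner (Shift + F10) to start the agent.

After a successful run, you see logs similar to the following: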

Shell
[main] INFO io.agora.Main - Start the agent successfully, joinConvoAIRes:JoinConvoAIRes{agentId='1NT29X0WPCExxxxxWIOYAX0PDQLA5EX', createTs=1742892557, status='RUNNING'}
[main] INFO io.agora.Main - List the agent successfully, listConvoAIRes:ListConvoAIRes{data=Data{count=1, list=[Agent{agentId='1NT29X0WPCExxxxxWIOYAX0PDQLA5EX', startTs=1742892560, status='RUNNING'}]}, meta=Meta{total=1, cursor=''}, status='ok'}
[main] INFO io.agora.Main - Leave the agent successfully, agentId:1NT29X0WPCExxxxxWIOYAX0PDQLA5EX

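The agent then joins the RTC channel and sends a greeting to the user.

Note

After the agent joins the channel, it leaves automatically if no other user is in the channel for a period of time (120 seconds in the sample code).

User talks to the agent

Once both have joined the channel, the user can talk to the agent directly, and the agent answers with speech.

Reference

Sample project

Agora provides an open-source sample project for reference. You can download it or browse its source code.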

agora-rest-client-java
