Baidu is all-in on AI, but Sora-style advances remain premature, says Robin Li
Baidu unveiled a slate of new AI technology at its annual event, though plans for a Sora rival remain on hold.
At this year’s edition of Baidu World, Baidu’s annual event, the spotlight fell on what defines a valuable artificial intelligence application. Robin Li, founder and CEO of Baidu, opened with a keynote addressing the Chinese tech giant’s vision for large language models (LLMs) and generative AI, positioning the company’s latest advances as foundational steps toward a more integrated AI landscape.
Ernie, Baidu’s flagship LLM, now handles more than 1.5 billion calls each day—a figure that Li highlighted as a sign of widespread demand. Imagining a future where this number could grow tenfold, Li pushed the audience to consider the ripple effects of such scale. Ernie, he noted, has experienced nearly tenfold growth in the last six months alone, underscoring an increasing appetite for accessible AI.
During his address, Li outlined several insights on AI’s evolving role. First, retrieval-augmented generation (RAG) has gained traction as a key approach across the industry, particularly in curbing “hallucinations,” where AI outputs information that is confidently incorrect. Baidu’s progress in this area, Li said, has been substantial, with RAG becoming a more reliable method for delivering accurate AI responses. He added that Baidu’s RAG advancements over the past two years have enabled more trustworthy output, steering clear of the inconsistencies that have plagued other models.
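For readers unfamiliar with the idea, RAG can be illustrated with a toy sketch: fetch the most relevant text first, then instruct the model to answer from that text alone. This is not Baidu's implementation. The function names, the word-overlap scoring, and the prompt wording below are all assumptions made for illustration, and production systems typically use vector search rather than word overlap.

```python
def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query.

    Hypothetical helper: word overlap is a crude stand-in for real retrieval.
    """
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Assemble a prompt that grounds the model in the retrieved text."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Miaoda is a no-code platform",
    "Xiaodu glasses weigh 45 grams",
]
prompt = build_prompt("how much do the glasses weigh", docs)
```

Because the prompt carries the retrieved passage, the model has less room to invent an answer, which is the sense in which RAG curbs hallucinations.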
Li also spoke to the growing prevalence of AI agents as the preferred interface for accessing content, information, and services—a trend that he likened to the role of websites in the early PC era. These agents, he explained, are built to be more human-like in their interactions, marking a shift toward interfaces that can intuitively respond to user needs. Agents emerged as one of the most discussed topics at Baidu World, with Li highlighting four primary application areas: corporate functions such as customer service, role-based uses like virtual personas for live streaming, task-oriented functionalities like generating industry reports, and specialized applications tailored to specific sectors.
On the commercial side, Li introduced Baidu’s image-based RAG technology, or iRAG, an advancement aimed at generating instant, high-fidelity images without the distortions that often affect AI-generated visuals. Positioned as a solution for real-time, precise content, iRAG is designed to meet today’s demands for responsive, image-based content.
Refining development with Miaoda and Comate
Baidu also unveiled Miaoda, a no-code application development platform set to launch in Q1 2025. Unlike typical no-code platforms, Miaoda’s development process is driven by multiple intelligent agents working in tandem. For instance, in creating a web page, one agent handles coding and deployment, while another writes the content. Meanwhile, a retrieval bot scours the internet for up-to-date information, and an image-generation agent sources visuals. Overseeing this process is a quality assurance agent, equipped with reflective capabilities to test, identify bugs, and collaborate with the coding agent to refine the output.
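The division of labor described above can be sketched in a few lines of Python. Baidu has not published Miaoda's internals, so every name here (`Page`, `retrieval_agent`, `coding_agent`, `qa_agent`, `build_page`) is hypothetical, and each "agent" is a plain function standing in for a model-backed component. The sketch only shows the control flow: specialist agents contribute their parts, then a quality assurance agent loops with the coding agent until no issues remain.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    html: str = ""
    content: str = ""
    facts: list = field(default_factory=list)
    images: list = field(default_factory=list)


def retrieval_agent(topic):
    # Stand-in for a bot that searches the web for current information.
    return [f"latest fact about {topic}"]


def content_agent(topic, facts):
    # Stand-in for the agent that writes the page copy.
    return f"{topic}: " + "; ".join(facts)


def image_agent(topic):
    # Stand-in for the agent that supplies visuals.
    return [f"{topic}.png"]


def coding_agent(page, feedback=None):
    # The first draft deliberately omits the closing tag so QA has a bug to find.
    body = f"<h1>{page.content}</h1>"
    body += "".join(f'<img src="{name}">' for name in page.images)
    closing = "</html>" if feedback else ""
    return f"<html>{body}{closing}"


def qa_agent(page):
    # Returns a list of problems; an empty list means the page passes.
    issues = []
    if not page.html.endswith("</html>"):
        issues.append("missing closing </html> tag")
    return issues


def build_page(topic, max_rounds=3):
    page = Page()
    page.facts = retrieval_agent(topic)
    page.content = content_agent(topic, page.facts)
    page.images = image_agent(topic)
    page.html = coding_agent(page)
    for _ in range(max_rounds):
        issues = qa_agent(page)
        if not issues:
            break
        # QA hands its findings back to the coding agent for another pass.
        page.html = coding_agent(page, feedback=issues)
    return page


page = build_page("tea")
```

The cap on review rounds (`max_rounds`) is a design choice of this sketch rather than anything Baidu has described. It keeps the QA and coding agents from looping indefinitely if a bug cannot be fixed.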
The multiagent framework is integral not only to Miaoda but also to Baidu’s professional development tool, Comate. Now in its third version, Comate automates tasks like code review and completion through agent-driven processes, allowing developers to focus more on creative problem-solving and innovation. According to Baidu CTO Wang Haifeng, the goal is to improve efficiency and output quality across the development workflow.
Bridging Baidu Wenku and Wangpan
In September 2024, the Baidu Wenku business was realigned under the mobile ecosystem group (MEG) division and integrated with Baidu Wangpan, a move that aims to enhance synergies between the two platforms. Wang Ying, Baidu’s vice president and head of the MEG division, said that users have consistently faced two main challenges: limited cross-format editing capabilities and the separation of public knowledge in Wenku from personal content in Wangpan, which makes it difficult to build an integrated knowledge base.
To address these issues, Baidu Wangpan has launched a canvas feature, which Li described as an intelligent whiteboard agent. This feature enables users to select, interact with, and organize content from both Wenku and Wangpan within a single interface. Leveraging a mixture of experts (MoE) and multimodal models, the canvas feature supports cross-modal content creation, allowing users to process and combine text, images, and videos for sharing on platforms such as WeChat Moments and Xiaohongshu, or for generating professional reports with embedded visual data.
As the AI tool market continues to explore monetization strategies, Wang Ying believes Baidu Wenku and Wangpan are naturally positioned for a subscription-based model. “The AI capabilities extend our product’s functional boundaries,” she told 36Kr. “More options mean more value for users, which drives up retention and subscriptions.”
Tackling hallucinations before taking on Sora
During the event, Li explained why Baidu is moving carefully on multimodal models, emphasizing the challenges of integrating RAG with image processing. “Multimodal models aren’t widely used because the hallucination issue remains unresolved,” he said, setting the tone for the company’s wait-and-see stance on a Sora rival.
Rather than rush to market, Baidu has prioritized reducing hallucinations in multimodal applications. iRAG, the image generation technology it introduced at Baidu World, is built to deliver more realistic AI-generated visuals.
Explaining iRAG’s workflow, CTO Wang detailed a multistep process for producing highly accurate images. According to Wang, the model first analyzes user requirements to set precise enhancement parameters. It then retrieves relevant elements and applies localized attention to preserve key image features, while global attention ensures clarity and sharpness.
Xiaodu’s debut in wearable AI
Building on last year’s upgrades to Xiaodu, which integrated a large model as its core AI, Baidu has now introduced its first AI-powered glasses under the brand. Weighing just 45 grams—lighter than the industry average of 49 grams—the glasses come with a 16-megapixel ultrawide lens and AI-powered anti-shake features for stable, high-quality images. To enhance audio recognition and reduce sound leakage, the glasses feature a four-microphone array and open speakers.
What sets the Xiaodu AI Glasses apart from typical eyewear is their AI-driven functionality. Powered by the Ernie LLM and the DuerOS AI-native operating system, the glasses offer hands-free capabilities such as first-person recording, Q&A, object recognition, real-time translation, smart reminders, and playlist management. Baidu vice president and Xiaodu CEO Li Ying announced that the glasses are slated for release in the first half of 2025.