Baidu is all-in on AI, but Sora-style advances remain premature, says Robin Li

Baidu unveiled a slate of new AI technology at its annual event, though plans for a Sora rival remain on hold.

At this year’s Baidu World, the spotlight fell on what defines a valuable artificial intelligence application. Robin Li, founder and CEO of Baidu, opened with a keynote on the Chinese tech giant’s vision for large language models (LLMs) and generative AI, positioning the company’s latest advances as foundational steps toward a more integrated AI landscape.

Ernie, Baidu’s flagship LLM, now handles more than 1.5 billion calls each day—a figure that Li highlighted as a sign of widespread demand. Imagining a future where this number could grow tenfold, Li pushed the audience to consider the ripple effects of such scale. Ernie, he noted, has experienced nearly tenfold growth in the last six months alone, underscoring an increasing appetite for accessible AI.

During his address, Li outlined several insights on AI’s evolving role. First, retrieval-augmented generation (RAG) has gained traction across the industry as a key approach to curbing “hallucinations,” where a model outputs information that is confidently incorrect. Li said Baidu’s RAG advancements over the past two years have made its output more trustworthy, helping it steer clear of the inconsistencies that have plagued other models.
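
For readers unfamiliar with the technique, the general idea behind RAG can be sketched in a few lines: retrieve passages relevant to a question, then instruct the model to answer only from them. The sketch below is a toy illustration and not Baidu's implementation. The word-overlap scoring, the function names, and the sample passages (drawn from figures reported in this article) are all assumptions made for the example, and no model is actually called.

```python
import re
from collections import Counter

# Hypothetical knowledge base; the facts are taken from this article.
PASSAGES = [
    "Ernie handles more than 1.5 billion calls each day.",
    "The Xiaodu AI Glasses weigh 45 grams.",
    "Miaoda is a no-code application development platform.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap(query: str, passage: str) -> int:
    """Count tokens shared by query and passage (a stand-in for a real retriever)."""
    return sum((Counter(tokenize(query)) & Counter(tokenize(passage))).values())


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return up to k passages with a non-zero overlap score, best first."""
    ranked = sorted(passages, key=lambda p: overlap(query, p), reverse=True)
    return [p for p in ranked[:k] if overlap(query, p) > 0]


def build_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Ground the prompt in retrieved sources, or say that none were found."""
    sources = retrieve(query, passages, k)
    if not sources:
        return (
            f"Question: {query}\n"
            "No supporting passage was found; say so instead of guessing."
        )
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    return (
        "Answer using only the sources below and cite them.\n"
        f"{numbered}\nQuestion: {query}"
    )
```

Calling `build_prompt("How much do the Xiaodu glasses weigh?", PASSAGES)` yields a prompt that carries only the glasses passage as a source. A question with no matching passage yields a prompt telling the model to admit it lacks support, which is the behavior that makes retrieval useful against hallucination.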

Li also spoke to the growing prevalence of AI agents as the preferred interface for accessing content, information, and services—a trend that he likened to the role of websites in the early PC era. These agents, he explained, are built to be more human-like in their interactions, marking a shift toward interfaces that can intuitively respond to user needs. Agents emerged as one of the most discussed topics at Baidu World, with Li highlighting four primary application areas: corporate functions such as customer service, role-based uses like virtual personas for live streaming, task-oriented functionalities like generating industry reports, and specialized applications tailored to specific sectors.

On the commercial side, Li introduced Baidu’s image-based RAG technology, or iRAG, an advancement aimed at generating instant, high-fidelity images without the distortions that often affect AI-generated visuals. Positioned as a solution for real-time, precise content, iRAG is designed to meet today’s demands for responsive, image-based content.

Refining development with Miaoda and Comate

Baidu also unveiled Miaoda, a no-code application development platform set to launch in Q1 2025. Unlike typical no-code platforms, Miaoda’s development process is driven by multiple intelligent agents working in tandem. For instance, in creating a web page, one agent handles coding and deployment, while another writes the content. Meanwhile, a retrieval bot scours the internet for up-to-date information, and an image-generation agent sources visuals. Overseeing this process is a quality assurance agent, equipped with reflective capabilities to test, identify bugs, and collaborate with the coding agent to refine the output.
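
The division of labor described above can be pictured as a loop in which a quality assurance step sends feedback to the coding step until no issues remain. The sketch below is a hypothetical illustration of that pattern only. Miaoda's internals have not been published, so every function name, the placeholder outputs, and the single "missing alt text" check are assumptions, and plain functions stand in for model-backed agents.

```python
from dataclasses import dataclass


@dataclass
class Draft:
    title: str
    body: str
    image: str
    html: str = ""


def content_agent(topic: str) -> str:
    """Stand-in for the agent that writes page copy."""
    return f"An overview of {topic}."


def retrieval_agent(topic: str) -> str:
    """Stand-in for the bot that fetches up-to-date information."""
    return f"Latest notes on {topic} (placeholder)."


def image_agent(topic: str) -> str:
    """Stand-in for the agent that sources a visual; returns a file name."""
    return topic.lower().replace(" ", "-") + ".png"


def coding_agent(draft: Draft, feedback: list[str]) -> str:
    """Render the page; the first pass omits alt text until QA asks for it."""
    alt = f' alt="{draft.title}"' if "missing alt text" in feedback else ""
    return (
        f"<h1>{draft.title}</h1><p>{draft.body}</p>"
        f'<img src="{draft.image}"{alt}>'
    )


def qa_agent(html: str) -> list[str]:
    """Return a list of issues; an empty list means the page passes."""
    issues = []
    if "<h1></h1>" in html:
        issues.append("empty title")
    if "alt=" not in html:
        issues.append("missing alt text")
    return issues


def build_page(topic: str, max_rounds: int = 3) -> tuple[str, int]:
    """Run the agents, looping between coding and QA until QA is satisfied."""
    body = content_agent(topic) + " " + retrieval_agent(topic)
    draft = Draft(title=topic, body=body, image=image_agent(topic))
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        draft.html = coding_agent(draft, feedback)
        feedback = qa_agent(draft.html)
        if not feedback:
            return draft.html, round_no
    raise RuntimeError(f"QA still reports issues: {feedback}")
```

In this toy run, `build_page("Baidu World")` passes QA on the second round, after the coding step acts on the first round's feedback. The `max_rounds` cap is a design choice for the sketch so that a page that never passes review fails loudly instead of looping forever.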

The multiagent framework is integral not only to Miaoda but also to Baidu’s professional development tool, Comate. Now in its third version, Comate automates tasks like code review and completion through agent-driven processes, allowing developers to focus more on creative problem-solving and innovation. According to Baidu CTO Wang Haifeng, the goal is to improve efficiency and output quality across the development workflow.

Bridging Baidu Wenku and Wangpan

In September 2024, the Baidu Wenku business was realigned under the mobile ecosystem group (MEG) division and integrated with Baidu Wangpan, a move that aims to enhance synergies between the two platforms. Wang Ying, Baidu’s vice president and head of the MEG division, said that users have consistently faced two main challenges: limited cross-format editing capabilities, and the separation of public knowledge in Wenku from personal content in Wangpan, which makes it difficult to build an integrated knowledge base.

To address these issues, Baidu Wangpan has launched a canvas feature, which Li described as an intelligent whiteboard agent. This feature enables users to select, interact with, and organize content from both Wenku and Wangpan within a single interface. Leveraging a mixture of experts (MoE) and multimodal models, the canvas feature supports cross-modal content creation, allowing users to process and combine text, images, and videos for use across platforms such as WeChat Moments, Xiaohongshu, or for generating professional reports with embedded visual data.

As the AI tool market continues to explore monetization strategies, Wang Ying believes Baidu Wenku and Wangpan are naturally positioned for a subscription-based model. “The AI capabilities extend our product’s functional boundaries,” she told 36Kr. “More options mean more value for users, which drives up retention and subscriptions.”

Tackling hallucinations before taking on Sora

During the event, Li addressed Baidu’s cautious approach to developing multimodal models, emphasizing the challenges of integrating RAG with image processing. “Multimodal models aren’t widely used because the hallucination issue remains unresolved,” he said, setting the tone for the company’s stance on a Sora rival.

Rather than rush to market, Baidu has prioritized reducing hallucinations in multimodal applications. iRAG, the image generation technology it introduced at Baidu World, is meant to deliver more realistic AI-generated visuals.

Explaining iRAG’s workflow, CTO Wang detailed a multistep process for producing highly accurate images. The model first analyzes the user’s requirements to set precise enhancement parameters. It then retrieves relevant elements and applies localized attention to preserve key image features, while global attention maintains clarity and sharpness.

Xiaodu’s debut in wearable AI

Building on last year’s upgrades to Xiaodu, which integrated a large model as its core AI, Baidu has now introduced its first AI-powered glasses under the brand. Weighing just 45 grams—lighter than the industry average of 49 grams—the glasses come with a 16-megapixel ultrawide lens and AI-powered anti-shake features for stable, high-quality images. To enhance audio recognition and reduce sound leakage, the glasses feature a four-microphone array and open speakers.

What sets the Xiaodu AI Glasses apart from typical eyewear is their AI-driven functionality. Powered by the Ernie LLM and the DuerOS AI-native operating system, the glasses offer hands-free capabilities such as first-person recording, Q&A, object recognition, real-time translation, smart reminders, and playlist management. Baidu vice president and Xiaodu CEO Li Ying announced that the glasses are slated for release in the first half of 2025.

 
