Microsoft Research Presents SIGMA: An Open-Source Platform for Mixed Reality and AI Innovation

  • Microsoft Research launches SIGMA, a platform that integrates mixed reality with advanced AI technologies.
  • SIGMA runs on HoloLens 2 and guides users step by step through procedural tasks, with steps generated by a large language model such as GPT-4 or drawn from a predefined task library.
  • The system can answer open-ended questions and identify and highlight objects in real time using vision models such as Detic and SEEM.
  • Built on the open-source Platform for Situated Intelligence (\psi), SIGMA supports rapid prototyping and the development of multimodal AI systems.
  • A client-server architecture offloads heavy computation from the HoloLens 2 to a desktop server, working around the headset's limited compute and paving the way for support of other mixed-reality devices.
  • SIGMA is still evolving but provides a solid foundation for research at the intersection of mixed reality and AI.
  • Dynamics 365 Guides is another tool by Microsoft designed for frontline workers, enhancing their workflow with AI and mixed reality.

Main AI News:

Recent progress in generative AI and in large language, vision, and multimodal models provides a foundation for open-domain knowledge, inference, and content generation, which in turn makes open-ended task assistance possible. However, generating relevant instructions and content is only the first step toward AI systems that collaborate seamlessly with humans in the real world. Such systems include mixed-reality task assistants, interactive robots, smart manufacturing systems, autonomous vehicles, and more.

To cooperate effectively, AI systems must continuously perceive and reason about their environment from multimodal data streams, going well beyond object detection and tracking. Physical teamwork requires understanding what objects are for, how they relate to one another, what spatial constraints apply, and how all of these change over time.

These systems must reason not only about the physical world but also about the people in it. In real time, this means making high-level judgments about cognitive states and the social norms of collaborative behavior, as well as lower-level assessments of body posture, voice, and actions.

Microsoft Research has unveiled SIGMA, an interactive system that combines mixed reality with AI technologies such as large language and vision models. Running on HoloLens 2, SIGMA guides users step by step through procedural tasks. The steps can be generated dynamically by a large language model such as GPT-4 or taken from a task library of manually defined steps. SIGMA can also use the large language model to answer users' open-ended questions, and it employs vision models such as Detic and SEEM to identify and highlight task-relevant objects in the user's field of view.

The system's client-server architecture supports these goals. A lightweight client application on the HoloLens 2 transmits multiple data streams, including RGB video, depth, audio, and gaze tracking, to a more powerful desktop server. The server does the heavy processing and sends instructions and data back to the client, which only needs to handle basic functionality and display content on the device. This design works around the headset's limited computing power and makes it easier to extend SIGMA to other mixed-reality devices in the future.
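
To make this division of labor concrete, here is a minimal, single-process sketch of the client-server split, written in Python purely for illustration. It is not SIGMA's code (SIGMA is built on \psi, a .NET framework) and it does no real networking: two in-memory queues stand in for the uplink and downlink, and the class names, stream names, task steps, and the assumption that audio arrives already transcribed are all hypothetical.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Any, Iterable, List

# Hypothetical stream names mirroring the modalities the headset sends.
STREAMS = ("rgb", "depth", "audio", "gaze")


@dataclass
class Message:
    stream: str       # which sensor stream the sample belongs to
    timestamp: float  # capture time in seconds (assumed shared clock)
    payload: Any      # stand-in for a frame, depth map, audio chunk, or gaze ray


class HeadsetClient:
    """Lightweight client: forwards sensor samples and displays what the server returns."""

    def __init__(self, uplink: Queue, downlink: Queue) -> None:
        self.uplink = uplink
        self.downlink = downlink
        self.displayed: List[str] = []

    def capture(self, stream: str, timestamp: float, payload: Any) -> None:
        if stream not in STREAMS:
            raise ValueError(f"unknown stream: {stream}")
        # No local processing: every sample is shipped to the server as-is.
        self.uplink.put(Message(stream, timestamp, payload))

    def render_pending(self) -> None:
        # "Display" every instruction the server has sent since the last call.
        while not self.downlink.empty():
            self.displayed.append(self.downlink.get())


class DesktopServer:
    """Heavier server: consumes all streams and decides what the headset shows."""

    def __init__(self, uplink: Queue, downlink: Queue, steps: Iterable[str]) -> None:
        self.uplink = uplink
        self.downlink = downlink
        self.steps = list(steps)  # hypothetical, manually defined task steps
        self.next_step = 0
        self.counts = {name: 0 for name in STREAMS}

    def process_pending(self) -> None:
        while not self.uplink.empty():
            msg = self.uplink.get()
            self.counts[msg.stream] += 1
            # Placeholder for the expensive work (speech recognition, vision models,
            # LLM calls). Here audio payloads are assumed to be already-transcribed text.
            wants_next = msg.stream == "audio" and msg.payload == "next"
            if wants_next and self.next_step < len(self.steps):
                step_text = self.steps[self.next_step]
                self.downlink.put(f"Step {self.next_step + 1}: {step_text}")
                self.next_step += 1


if __name__ == "__main__":
    uplink, downlink = Queue(), Queue()
    client = HeadsetClient(uplink, downlink)
    server = DesktopServer(uplink, downlink, steps=["Remove the cover", "Attach the bracket"])
    client.capture("rgb", 0.00, b"<frame>")
    client.capture("gaze", 0.01, (0.0, 0.0, 1.0))
    client.capture("audio", 0.02, "next")
    server.process_pending()
    client.render_pending()
    print(client.displayed)  # ['Step 1: Remove the cover']
```

In a real deployment the queues would be network connections and the server would run actual models on the incoming streams. The point of the split is that the headset-side class stays trivial however heavy the server-side processing becomes.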

SIGMA is built on the Platform for Situated Intelligence (\psi), an open-source framework for developing and researching multimodal AI systems. \psi provides high-performance streaming and logging infrastructure and supports rapid prototyping. Its data replay infrastructure enables data-driven application development, while Platform for Situated Intelligence Studio offers a full set of tools for visualization, debugging, and maintenance.
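
The idea behind data replay can be sketched in a few lines. The Python below is a hypothetical illustration, not \psi's API (\psi itself is a .NET framework with its own store format): it logs timestamped messages per stream and later replays them to a handler in timestamp order across all streams, so a component can be rerun on recorded data without the headset. The `Store` class, its `write` and `replay` methods, and the `_tagged` helper are invented for this example.

```python
import heapq
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Tuple


def _tagged(name: str, messages: List[Tuple[float, Any]]) -> Iterator[Tuple[float, str, Any]]:
    # Attach the stream name to each (timestamp, payload) pair.
    for timestamp, payload in messages:
        yield timestamp, name, payload


@dataclass
class Store:
    """Hypothetical in-memory log of timestamped messages, grouped by stream name."""

    streams: Dict[str, List[Tuple[float, Any]]] = field(default_factory=dict)

    def write(self, stream: str, timestamp: float, payload: Any) -> None:
        # Assumes each stream is written in non-decreasing timestamp order.
        self.streams.setdefault(stream, []).append((timestamp, payload))

    def replay(self, handler: Callable[[str, float, Any], None]) -> None:
        """Feed every recorded message to handler in timestamp order across all streams."""
        sources = [_tagged(name, msgs) for name, msgs in self.streams.items()]
        # heapq.merge lazily interleaves the already-sorted streams; on equal
        # timestamps, streams that were created earlier come first.
        for timestamp, name, payload in heapq.merge(*sources, key=lambda item: item[0]):
            handler(name, timestamp, payload)


if __name__ == "__main__":
    store = Store()
    store.write("gaze", 0.00, (0.0, 0.0, 1.0))
    store.write("gaze", 0.05, (0.1, 0.0, 1.0))
    store.write("audio", 0.02, "next")
    seen = []
    store.replay(lambda name, ts, payload: seen.append((ts, name)))
    print(seen)  # [(0.0, 'gaze'), (0.02, 'audio'), (0.05, 'gaze')]
```

Because the handler sees the same messages in the same order on every run, a recorded session can be replayed repeatedly while a perception component is being tuned, which is the kind of data-driven workflow that replay infrastructure is meant to support.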

Although SIGMA's functionality is still under development, it serves as a platform for future research at the convergence of mixed reality and AI. Datasets collected with the system can help researchers study perception problems ranging from computer vision to speech recognition.

SIGMA reflects Microsoft's ongoing commitment to exploring new AI and mixed-reality technologies. The company also offers Dynamics 365 Guides, an enterprise-ready mixed-reality tool that provides procedural guidance to frontline workers. Copilot in Dynamics 365 Guides, currently in private preview, adds AI assistance that surfaces workflow-relevant information, bringing AI and mixed reality together in a single tool.

By releasing SIGMA as open source, Microsoft aims to spare researchers the basic engineering work of building full-stack applications so they can focus on new questions in their field.

Conclusion:

The introduction of SIGMA marks a significant step in combining mixed reality and artificial intelligence and reinforces Microsoft's position at the forefront of both fields. Beyond strengthening Microsoft's own offerings, the release encourages further research and development in AI and mixed reality. By making the platform open source, Microsoft lowers the barrier to entry for other researchers, which could accelerate innovation and eventually lead to new commercial applications. For the market, this strategy both sharpens Microsoft's competitive edge and fosters a more collaborative, innovation-driven ecosystem around mixed reality and AI.