Adding voice commands and spatial mapping to our Unity model in HoloLens

In the last post we added a gaze cursor and some command support for our ABB industrial robot inside HoloLens. The next step I took was to add spatial mapping, allowing the user to select the base of the robot and move it around within the spatially mapped environs.

The Holograms 101 tutorial provided very straightforward instructions on how to implement spatial mapping. I once again had to export the "wireframe" material asset from the provided project to get it across into my own, but that was a minor detail.

Unfortunately, when testing the capability – just as with the voice commands – it didn't work. In an attempt to track down the issue, I launched the app from the debugger, at which point I spotted this information in the debug output:

Capability 'spatialPerception' is required, please enable it in Package.appxmanifest in order to enable spatial mapping functionality.
(Filename: C:\buildslave\unity\build\PlatformDependent/MetroPlayer/MetroCapabilities.cpp Line: 126)

Capability 'microphone' is required, please enable it in Package.appxmanifest in order to enable speech recognition functionality.
(Filename: C:\buildslave\unity\build\PlatformDependent/MetroPlayer/MetroCapabilities.cpp Line: 126)

 

So it was that I realised that – just like with the original issue that stopped our robot hologram from appearing in 3D – there were a couple of project settings I needed to enable to allow both voice and spatial mapping to work properly. You can find them in Unity under File –> Build Settings… –> Player Settings… –> Capabilities. The settings are called "Microphone" and "SpatialPerception".

Changing Unity project settings

Rebuilding the project magically enabled both the voice commands and spatial mapping features I'd added by following Holograms 101.
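For reference, checking those boxes causes the capabilities to be written into the generated Package.appxmanifest. Here's a rough sketch of what the relevant section should end up containing, based on the standard UWP manifest schema – the generated file will contain much more than this, and the exact namespace prefix for spatialPerception may vary with the Unity version:

```xml
<Capabilities>
  <!-- Needed for speech recognition (the "Microphone" checkbox) -->
  <DeviceCapability Name="microphone" />
  <!-- Needed for spatial mapping (the "SpatialPerception" checkbox);
       the uap2 namespace is declared on the Package element -->
  <uap2:Capability Name="spatialPerception" />
</Capabilities>
```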

Here's the SpeechManager.cs file, modified from the one in the tutorial:

using System.Collections.Generic;
using System.Linq;
using UnityEngine;
using UnityEngine.Windows.Speech;

public class SpeechManager : MonoBehaviour
{
  KeywordRecognizer keywordRecognizer = null;
  Dictionary<string, System.Action> keywords = new Dictionary<string, System.Action>();

  // Use this for initialization
  void Start()
  {
    keywords.Add("Halt", () => this.BroadcastMessage("OnStop"));
    keywords.Add("Move", () => this.BroadcastMessage("OnStart"));
    keywords.Add("Quick", () => this.BroadcastMessage("OnQuick"));
    keywords.Add("Slow", () => this.BroadcastMessage("OnSlow"));
    keywords.Add("Stop", () =>
    {
      var focusObject = GazeGestureManager.Instance.FocusedObject;
      if (focusObject != null)
      {
        // Call the OnStop method on just the focused object.
        focusObject.SendMessage("OnStop");
      }
    });
    keywords.Add("Spin", () =>
    {
      var focusObject = GazeGestureManager.Instance.FocusedObject;
      if (focusObject != null)
      {
        // Call the OnStart method on just the focused object.
        focusObject.SendMessage("OnStart");
      }
    });

    // Tell the KeywordRecognizer about our keywords.
    keywordRecognizer = new KeywordRecognizer(keywords.Keys.ToArray());

    // Register a callback for the KeywordRecognizer and start recognizing!
    keywordRecognizer.OnPhraseRecognized += KeywordRecognizer_OnPhraseRecognized;
    keywordRecognizer.Start();
  }

  private void KeywordRecognizer_OnPhraseRecognized(PhraseRecognizedEventArgs args)
  {
    System.Action keywordAction;
    if (keywords.TryGetValue(args.text, out keywordAction))
    {
      keywordAction.Invoke();
    }
  }
}
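One thing the tutorial version doesn't address – and which I haven't added above, either, so consider this an optional sketch – is tearing the recognizer down when the component goes away. KeywordRecognizer implements IDisposable, so a method along these lines could be added to SpeechManager:

```csharp
// Hypothetical addition to SpeechManager: stop and dispose of the
// recognizer when this component is destroyed.
void OnDestroy()
{
  if (keywordRecognizer != null)
  {
    if (keywordRecognizer.IsRunning)
    {
      keywordRecognizer.Stop();
    }
    keywordRecognizer.Dispose();
    keywordRecognizer = null;
  }
}
```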

 

Here's the correspondingly updated PartCommands.cs file (Rotate.cs can stay as it was before):

using UnityEngine;
using System;

public class PartCommands : MonoBehaviour
{
  // Called by GazeGestureManager when the user performs a Select gesture
  void OnSelect()
  {
    CallOnParent(
        r =>
        {
          if (r.isStopped)
          {
            r.speed = -r.speed;
          }
          r.isStopped = !r.isStopped;
        }
    );
  }

  void OnStart()
  {
    CallOnParent(r => r.isStopped = false);
  }

  void OnStop()
  {
    CallOnParent(r => r.isStopped = true);
  }

  void OnQuick()
  {
    CallOnParent(r => r.isFast = true);
  }

  void OnSlow()
  {
    CallOnParent(r => r.isFast = false);
  }

  void CallOnParent(Action<Rotate> f)
  {
    var rot = this.gameObject.GetComponentInParent<Rotate>();
    if (rot)
    {
      f(rot);
    }
  }
}

 

Now to see it in action… here's a video showing both voice control and spatial mapping in our robot project:

 

6 responses to “Adding voice commands and spatial mapping to our Unity model in HoloLens”

  1. Hi Kean,
    Pretty amazing how fast you've progressed...
    Do you think we can expect to see real-time shadows in AR soon? I've seen a few videos showing shadows in AR but no clues on how they were computed... I don't think they are computed from real lights, which would be really great! (I haven't investigated that much though)

    Thanks!

    Loic

    Suggestion from my kid: implement a voice-controlled animated pokemon (...yes, he's been caught by this game)

    1. Kean Walmsley

      Hi Loic,

      This particular model uses an environment map for lighting, but I do think it'd be amazing to do real-time shadows (it would be so cool to have light sensors on the device that inform that process :-).

      And yes - I'm pretty sure Pokemon Go is on its way to HoloLens... will be interesting to see what more they do with this platform!

      Best,

      Kean

  2. Hi Kean,
    I used PaletteSet to produce a menu, but when it's set to load automatically it always appears in the ribbon area. How should I set it up so that it appears below the ribbon? (Poor English, with translation, I hope you can understand, thank you)

    1. Kean Walmsley

      Hi volsry,

      Please post support requests to the AutoCAD .NET forum: forums.autodesk.com/. You will need to explain a bit more about the mechanism you've used (but please do that on the forum, rather than here).

      Regards,

      Kean

  3. James Maeding

    Kean, so did Google and Microsoft both come up with real-time scanning of surroundings for their VR offerings? Or are they sharing technology? Google's Project Tango seemed amazing, and I'm surprised MS also has a similar technology. Maybe the scanning is not as difficult as I am thinking, so lots of vendors will have their own real-time modeling system.
    BTW, Autodesk bought netfabb, which all 3d printer nerds use for fixing watertight meshes. I wish the netfabb abilities could be used with Civil3D for simplifying surfaces.
    thx

      1. Kean Walmsley

      Hi James,

      They each have different technology. Although - thinking about it - Johnny Lee did leave the Microsoft Kinect team to join (and drive?) Project Tango at Google. That said, Kinect v2 ended up being quite different from v1, and from Project Tango, too, I expect.

      Anyway, two Kinect-like sensors (which could probably be considered Kinect v3) are inside HoloLens, which is in itself pretty astounding.

      Yes - was happy to hear about netfabb. We'll see how the technology gets used.

      Best,

      Kean
