Adding speech recognition to our stereoscopic Google Cardboard viewer

Speech recognition, not at its best

I nearly named this post "Creating a stereoscopic viewer for Google Cardboard using the Autodesk 360 viewer – Part 4", leading on from the series introduction and then parts 1, 2 & 3. But then I decided this topic deserved its very own title. 🙂

The seed for this post was sown during the VR Hackathon, at the beginning of which I had an inspiring chat with Theo Armour. Not only does Theo have a name worthy of a gladiator – and it turns out there is a list of gladiator names on the Internet, just one more reason I love it – he has an inspiring view of technology and what it can bring us. Jeremy Tammik collaborated with Theo at the recent AEC Hackathon in New York, so I'm sure he knows what I'm talking about.

Anyway, firstly it turns out Theo is the person behind jaanga.com, and it was he who put together the template that got me started with Google Cardboard and the Autodesk Viewing & Data service. So I already owe Theo a debt of thanks for that.

Secondly, Theo has been thinking about where to go next with VR, specifically with regard to user input. A problem with holding a set of goggles in your hands is that it's hard to do much else with them. Google Cardboard does have a "button" on the side, which is really just a movable washer connected to a fixed magnet that influences the phone's magnetometer, but as you can only access that from a native Android app – not an HTML page – it's basically useless for our purposes.

I'd been looking at Leap Motion to help with this, which implies having one or more hands free but also adds a platform dependency: not only is their mobile SDK currently Android-specific, it's also supported on a limited set of devices with sufficiently powerful processors, such as the Nexus 5. I'm still planning on pre-ordering a Nexus 6 and getting it working with that, but I'm also keen to move things forward in the meantime and consider solutions that don't reduce the possible audience for this application.

Theo was clearly very excited about the potential for getting access to speech recognition in HTML5 apps. My initial reaction was "wow – surely you can't do that from HTML5!?!" but Theo was keen to pursue this direction. Before I left San Francisco, Theo very kindly invited me for a nice apéro at the ferry building – I was taking the ferry back to Marin on Wednesday afternoon – where he unveiled a working prototype of his HTML5 viewer with functioning speech recognition. Too cool!

Theo's demo app makes use of annyang, a simple JavaScript API that sits on top of the HTML5 Speech Recognition API that it turns out is exposed by most modern browsers. Who knew?
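
To give a feel for what a library like annyang does on our behalf, here is a minimal sketch of the same idea written directly against the browser's speech recognizer. It is not annyang's implementation: matchCommand and listen are hypothetical helpers, the lookup is a plain exact-phrase match (annyang's own matching is more capable), and it assumes the recognizer is exposed as SpeechRecognition or webkitSpeechRecognition.

```javascript
// Look up a spoken phrase in a table of { phrase: handler } entries.
// Hypothetical helper: a simple case-insensitive exact match.
function matchCommand(commands, transcript) {
  var phrase = transcript.trim().toLowerCase();
  for (var name in commands) {
    if (name.toLowerCase() === phrase) {
      return commands[name];
    }
  }
  return null;
}

// Wire the table to the browser's recognizer, when there is one.
// Returns the recognizer, or null when speech input is unavailable.
function listen(commands) {
  var Recognition =
    typeof window !== 'undefined' &&
    (window.SpeechRecognition || window.webkitSpeechRecognition);
  if (!Recognition) return null;

  var rec = new Recognition();
  rec.continuous = true;
  rec.onresult = function (event) {
    // Only look at the results that are new since the last event
    for (var i = event.resultIndex; i < event.results.length; i++) {
      var handler = matchCommand(commands, event.results[i][0].transcript);
      if (handler) handler();
    }
  };
  rec.start();
  return rec;
}
```

In a page you would call listen() with a table like the commands object further down. The attraction of annyang is that it hides this plumbing behind addCommands() and start().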

So I went and shamelessly copied Theo's approach, extending it for the Autodesk 360 viewer sample. I initially focused on implementing commands such as "explode", "combine", and zooming "in" and "out" – as well as "reset" and "reload" – but I also managed to find a way to make it work with the command definitions used to create our UI buttons for the front page. So you can also now switch models by saying the name of the model you want to load. A very handy enhancement.

It's worth noting that it's really best to load the first model via the UI – this allows us to force the page to fullscreen, as some UI interaction is needed for that – but after that you can simply use speech to load subsequent models.

Google Chrome does keep asking for permission to access the microphone, which is a little annoying, but it turns out that loading the page via "https" allows the browser to remember this. You just get the occasional beep, which is rather less annoying.
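
If you want to make sure visitors always land on the secure version of the page, one option is a small redirect at the top of the script. This is only a sketch and not part of the sample: httpsUrlFor is a hypothetical helper, and it assumes your server answers https on the same host (which may well not be true for a local development server on a custom port).

```javascript
// Hypothetical helper: return the https equivalent of a page location,
// or null if the page was not loaded over plain http.
function httpsUrlFor(loc) {
  if (loc.protocol !== 'http:') return null;
  return 'https://' + loc.host + loc.pathname + loc.search + loc.hash;
}

// In the browser, redirect as early as possible in the page load
if (typeof window !== 'undefined') {
  var secureUrl = httpsUrlFor(window.location);
  if (secureUrl) window.location.replace(secureUrl);
}
```

Using location.replace() rather than assigning to location.href keeps the insecure URL out of the browser history, so the back button doesn't bounce the user through the redirect again.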

The interesting part of the HTML app is, as usual, the JavaScript code. So here that is:

var viewerLeft, viewerRight;

var updatingLeft = false, updatingRight = false;

var leftLoaded, rightLoaded, cleanedModel;

var leftPos, baseDir, upVector, initLeftPos;

var initZoom;

var expFac = 0, exp = 0;

var targExp = 0.5, xfac = 0.05, zfac = 0.3;

var direction = true;

 

var buttons = {

  'robot arm' : function () {

    launchViewer(

      'dXJuOmFkc2sub2JqZWN0czpvcy5vYmplY3Q6c3RlYW1idWNrL1JvYm90QXJtLmR3Zng='   

    );

  },

  'front loader' : function () {

    launchViewer(

      'dXJuOmFkc2sub2JqZWN0czpvcy5vYmplY3Q6c3RlYW1idWNrL0Zyb250JTIwTG9hZGVyLmR3Zng=',

      new THREE.Vector3(0, 0, 1)

    );

  },

  'suspension' : function () {

    launchViewer(

      'dXJuOmFkc2sub2JqZWN0czpvcy5vYmplY3Q6c3RlYW1idWNrL1N1c3BlbnNpb24uZHdm'

    );

  },

  'house' : function () {

    launchViewer(

      'dXJuOmFkc2sub2JqZWN0czpvcy5vYmplY3Q6c3RlYW1idWNrL2hvdXNlLmR3Zng='

    );

  },

  'V8 engine' : function () {

    launchViewer(

      'dXJuOmFkc2sub2JqZWN0czpvcy5vYmplY3Q6c3RlYW1idWNrL1Y4RW5naW5lLnN0cA=='

    );

  },

  'morgan' : function () {

    launchViewer(

      'dXJuOmFkc2sub2JqZWN0czpvcy5vYmplY3Q6c3RlYW1idWNrL1NwTTNXNy5mM2Q=',

      new THREE.Vector3(0, 0, 1),

      function () {

        zoom(

          viewerLeft,

          -48722.5, -54872, 44704.8,

          10467.3, 1751.8, 1462.8

        );

      }

    );

  }

};

 

var commands = {

  'explode': function () {

    if (checkViewers()) {

      expFac = expFac + 1;

      explode(true);

    }

  },

  'combine': function () {

    if (checkViewers()) {

      if (expFac > 0) {

        expFac = expFac - 1;

        explode(false);

      }

    }

  },

  'in': function () {

    if (checkViewers()) {

      zoomInwards(-zfac);

    }

  },

  'out': function () {

    if (checkViewers()) {

      zoomInwards(zfac);

    }

  },

  'reset': function () {

    if (checkViewers()) {

      expFac = 0;

      explode(false);

 

      if (initLeftPos) {

        var trg = viewerLeft.navigation.getTarget();

        var up = viewerLeft.navigation.getCameraUpVector();

 

        leftPos = initLeftPos.clone();

        zoom(

          viewerLeft,

          initLeftPos.x, initLeftPos.y, initLeftPos.z,

          trg.x, trg.y, trg.z, up.x, up.y, up.z

        );

      }

    }

  },

  'reload': function () {

    location.reload();

  }

};

 

function initialize() {

 

  // Populate our initial UI with a set of buttons, one for each

  // function in the buttons object

 

  var panel = document.getElementById('control');

  for (var name in buttons) {

   
    var fn = buttons[name];

 

    var button = document.createElement('div');

    button.classList.add('cmd-btn');

 

    // Use the command name as the button's visible label

 

    button.innerHTML = name;

    button.onclick = (function (handler) {

      return function () { handler(); };

    })(fn);

 

    // Add the button with a space under it

 

    panel.appendChild(button);

    panel.appendChild(document.createTextNode('\u00a0'));

  }

 

  if (annyang) {

 

    // Add our buttons and commands to annyang

 

    annyang.addCommands(buttons);

    annyang.addCommands(commands);

 

    // Start listening

 

    annyang.start();

  }

}

 

function checkViewers() {

  if (viewerLeft && viewerRight)

    return viewerLeft.running && viewerRight.running;

  return false;

}

 

function launchViewer(docId, upVec, zoomFunc) {

 

  // Reset some variables when we reload

 

  if (viewerLeft) {

    viewerLeft.uninitialize();

    viewerLeft = null;

  }

  if (viewerRight) {

    viewerRight.uninitialize();

    viewerRight = null;

  }

  updatingLeft = false;

  updatingRight = false;

  leftPos = null;

  baseDir = null;

  upVector = null;

  initLeftPos = null;

  initZoom = null;

  expFac = 0;

  exp = 0;

  direction = true;

 

  // Assume the default "world up vector" of the Y-axis

  // (only atypical models such as Morgan and Front Loader require

  // the Z-axis to be set as up)

 

  upVec =

    typeof upVec !== 'undefined' ?

      upVec :

      new THREE.Vector3(0, 1, 0);

 

  // Ask for the page to be fullscreen

  // (can only happen in a function called from a

  // button-click handler or some other UI event)

 

  requestFullscreen();

 

  // Hide the controls that brought us here

 

  var controls = document.getElementById('control');

  controls.style.visibility = 'hidden';

 

  // Bring the layer with the viewers to the front

  // (important so they also receive any UI events)

 

  var layer1 = document.getElementById('layer1');

  var layer2 = document.getElementById('layer2');

  layer1.style.zIndex = 1;

  layer2.style.zIndex = 2;

 

  // Store the up vector in a global for later use

 

  upVector = upVec.clone();

 

  // The same for the optional Initial Zoom function

 

  initZoom =

    typeof zoomFunc !== 'undefined' ?

      zoomFunc :

      null;

 

  // Get our access token from the internal web-service API

 

  $.get(

    window.location.protocol + '//' +

    window.location.host + '/api/token',

    function (accessToken) {

 

      // Specify our options, including the provided document ID

 

      var options = {};

      options.env = 'AutodeskProduction';

      options.accessToken = accessToken;

      options.document = docId;

 

      // Create and initialize our two 3D viewers

 

      var elem = document.getElementById('viewLeft');

      viewerLeft = new Autodesk.Viewing.Viewer3D(elem, {});

 

      Autodesk.Viewing.Initializer(options, function () {

        viewerLeft.initialize();

        loadDocument(viewerLeft, options.document);

      });

 

      elem = document.getElementById('viewRight');

      viewerRight = new Autodesk.Viewing.Viewer3D(elem, {});

 

      Autodesk.Viewing.Initializer(options, function () {

        viewerRight.initialize();

        loadDocument(viewerRight, options.document);

      });

    }

  );

}

 

function loadDocument(viewer, docId) {

 

  // The viewer defaults to the full width of the container,

  // so we need to set that to 50% to get side-by-side

 

  viewer.container.style.width = '50%';

  viewer.resize();

 

  // Let's zoom in and out of the pivot - the screen

  // real estate is fairly limited - and reverse the

  // zoom direction

 

  viewer.navigation.setZoomTowardsPivot(true);

  viewer.navigation.setReverseZoomDirection(true);

 

  if (docId.substring(0, 4) !== 'urn:')

    docId = 'urn:' + docId;

 

  Autodesk.Viewing.Document.load(docId,

    function (document) {

 

      // Boilerplate code to load the contents

 

      var geometryItems = [];

 

      if (geometryItems.length == 0) {

        geometryItems =

          Autodesk.Viewing.Document.getSubItemsWithProperties(

            document.getRootItem(),

            { 'type': 'geometry', 'role': '3d' },

            true

          );

      }

      if (geometryItems.length > 0) {

        viewer.load(document.getViewablePath(geometryItems[0]));

      }

 

      // Add our custom progress listener and set the loaded

      // flags to false

 

      leftLoaded = rightLoaded = cleanedModel = false;

      viewer.addEventListener('progress', progressListener);

    },

    function (errorMsg, httpErrorCode) {

      var container = document.getElementById('viewLeft');

      if (container) {

        alert('Load error ' + errorMsg);

      }

    }

  );

}

 

// Progress listener to set the view once the data has started

// loading properly (we get a 5% notification early on that we

// need to ignore - it comes too soon)

 

function progressListener(e) {

 

  // If we haven't cleaned this model's materials and set the view

  // and both viewers are sufficiently ready, then go ahead

 

  if (!cleanedModel &&

    ((e.percent > 0.1 && e.percent < 5) || e.percent > 5)) {

 

    if (e.target.clientContainer.id === 'viewLeft')

      leftLoaded = true;

    else if (e.target.clientContainer.id === 'viewRight')

      rightLoaded = true;

 

    if (leftLoaded && rightLoaded && !cleanedModel) {

 

      if (initZoom) {

 

        // Iterate the materials to change any red ones to grey

 

        // (We only need this for the Morgan model, which has

        // translation issues from Fusion 360... which is also

        // the only model to provide an initial zoom function)

 

        for (var p in viewerLeft.impl.matman().materials) {

          var m = viewerLeft.impl.matman().materials[p];

          if (m.color.r >= 0.5 && m.color.g == 0 && m.color.b == 0) {

            m.color.r = m.color.g = m.color.b = 0.5;

            m.needsUpdate = true;

          }

        }

        for (var p in viewerRight.impl.matman().materials) {

          var m = viewerRight.impl.matman().materials[p];

          if (m.color.r >= 0.5 && m.color.g == 0 && m.color.b == 0) {

            m.color.r = m.color.g = m.color.b = 0.5;

            m.needsUpdate = true;

          }

        }

 

        // If provided, use the "initial zoom" function

 

        initZoom();

      }

 

      setTimeout(

        function () {

          initLeftPos = viewerLeft.navigation.getPosition();

 

          //TOREMOVE

          //viewerLeft.autocam.setCurrentViewAsFront();

 

          transferCameras(true);

        },

        500

      );

 

      watchTilt();

 

      cleanedModel = true;

    }

  }

  else if (cleanedModel && e.percent > 10) {

 

    // If we have already cleaned and are even further loaded,

    // remove the progress listeners from the two viewers and

    // watch the cameras for updates

 

    unwatchProgress();

 

    watchCameras();

  }

}

 

function requestFullscreen() {

 

  // Must be performed from a UI event handler

 

  var el = document.documentElement,

      rfs =

        el.requestFullscreen ||

        el.requestFullScreen ||

        el.webkitRequestFullScreen ||

        el.mozRequestFullScreen;

  // Do nothing if the browser has no fullscreen support

  if (rfs) rfs.call(el);

}

 

// Add and remove the per-viewer event handlers

 

function watchCameras() {

  viewerLeft.addEventListener('cameraChanged', left2right);

  viewerRight.addEventListener('cameraChanged', right2left);

}

 

function unwatchCameras() {

  viewerLeft.removeEventListener('cameraChanged', left2right);

  viewerRight.removeEventListener('cameraChanged', right2left);

}

 

function unwatchProgress() {

  viewerLeft.removeEventListener('progress', progressListener);

  viewerRight.removeEventListener('progress', progressListener);

}

 

function watchTilt() {

  if (window.DeviceOrientationEvent)

    window.addEventListener('deviceorientation', orb);

}

 

// Event handlers for the cameraChanged events

 

function left2right() {

  if (!updatingRight) {

    updatingLeft = true;

    transferCameras(true);

    setTimeout(function () { updatingLeft = false; }, 500);

  }

}

 

function right2left() {

  if (!updatingLeft) {

    updatingRight = true;

    transferCameras(false);

    setTimeout(function () { updatingRight = false; }, 500);

  }

}

 

// And for the deviceorientation event

 

function orb(e) {

 

  if (e.alpha && e.gamma) {

 

    // Remove our handlers watching for camera updates,

    // as we'll make any changes manually

    // (we won't actually bother adding them back, afterwards,

    // as this means we're in mobile mode and probably inside

    // a Google Cardboard holder)

 

    unwatchCameras();

 

    // Our base direction allows us to make relative horizontal

    // rotations when we rotate left & right

 

    if (!baseDir)

      baseDir = e.alpha;

 

    if (checkViewers()) {

 

      var deg2rad = Math.PI / 180;

 

      // gamma is the front-to-back in degrees (with

      // this screen orientation) with +90/-90 being

      // vertical and negative numbers being 'downwards'

      // with positive being 'upwards'

 

      var vert = (e.gamma + (e.gamma <= 0 ? 90 : -90)) * deg2rad;

 

      // alpha is the compass direction the device is

      // facing in degrees. This equates to the

      // left - right rotation in landscape

      // orientation (with 0-360 degrees)

 

      var horiz = (e.alpha - baseDir) * deg2rad;

 

      orbitViews(vert, horiz);

    }

  }

}

 

function transferCameras(leftToRight) {

 

  // The direction argument dictates the source and target

 

  var source = leftToRight ? viewerLeft : viewerRight;

  var target = leftToRight ? viewerRight : viewerLeft;

 

  var pos = source.navigation.getPosition();

  var trg = source.navigation.getTarget();

 

  // Set the up vector manually for both cameras

 

  source.navigation.setWorldUpVector(upVector);

  target.navigation.setWorldUpVector(upVector);

 

  // Get the up vector of the source camera

 

  var up = source.navigation.getCameraUpVector();

 

  // Get the new, sideways-offset position for the target camera

 

  var newPos = offsetCameraPos(source, pos, trg, leftToRight);

 

  // Save the left-hand camera position: device tilt orbits

  // will be relative to this point

 

  leftPos = leftToRight ? pos : newPos;

 

  // Zoom to the new camera position in the target

 

  zoom(

    target, newPos.x, newPos.y, newPos.z, trg.x, trg.y, trg.z,

    up.x, up.y, up.z

  );

}

 

function getDistance(v1,v2) {

  var diff = new THREE.Vector3().subVectors(v1, v2);

  return diff.length();

}

 

function offsetCameraPos(source, pos, trg, leftToRight) {

 

  // Use a small fraction of the distance for the camera offset

 

  var disp = getDistance(pos, trg) * 0.04;

 

  // Clone the camera and return its X translated position

 

  var clone = source.autocamCamera.clone();

  clone.translateX(leftToRight ? disp : -disp);

  return clone.position;

}

 

function orbitViews(vert, horiz) {

 

  // We'll rotate our position based on the initial position

  // and the target will stay the same

 

  var pos = leftPos.clone();

  var trg = viewerLeft.navigation.getTarget();

 

  // Start by applying the left/right orbit

  // (we need to check the up/down value, though)

 

  if (vert < 0)

    horiz = horiz + Math.PI;

 

  var zAxis = upVector.clone();

  pos.applyAxisAngle(zAxis, horiz);

 

  // Now add the up/down rotation

 

  var axis = new THREE.Vector3().subVectors(trg, pos).normalize();

  axis.cross(zAxis);

  pos.applyAxisAngle(axis, -vert);

 

  // Zoom in with the lefthand view

 

  var up = viewerLeft.navigation.getCameraUpVector();

 

  zoom(

    viewerLeft,

    pos.x, pos.y, pos.z,

    trg.x, trg.y, trg.z

  );

 

  // Get a camera slightly to the right

 

  var pos2 = offsetCameraPos(viewerLeft, pos, trg, true);

 

  // And zoom in with that on the righthand view, too

 

  zoom(

    viewerRight,

    pos2.x, pos2.y, pos2.z,

    trg.x, trg.y, trg.z,

    up.x, up.y, up.z

  );

}

 

function explode(outwards) {

  if (outwards != direction)

    direction = outwards;

 

  setTimeout(

    function () {

      exp = exp + (direction ? xfac : -xfac);

      setTimeout(function () { viewerLeft.explode(exp); }, 0);

      setTimeout(function () { viewerRight.explode(exp); }, 0);

      if ((direction && exp < targExp * expFac) ||

        (!direction && exp > targExp * expFac))

        explode(direction);

    },

    50

  );

}

 

function zoomAlongCameraDirection(viewer, factor) {

 

  var pos = leftPos.clone();

  var trg = viewer.navigation.getTarget();

 

  var disp = trg.clone().sub(pos).multiplyScalar(factor);

  pos.sub(disp);

 

  return pos;

}

 

function zoomInwards(factor) {

 

  leftPos = zoomAlongCameraDirection(viewerLeft, factor);

}

 

// Set the camera based on a position, target and optional up vector

 

function zoom(viewer, px, py, pz, tx, ty, tz, ux, uy, uz) {

 

  // Make sure our up vector is correct for this model

 

  viewer.navigation.setWorldUpVector(upVector, true);

 

  viewer.navigation.setView(

    new THREE.Vector3(px, py, pz),

    new THREE.Vector3(tx, ty, tz)

  );

 

  if (ux !== undefined && uy !== undefined && uz !== undefined) {

    var up = new THREE.Vector3(ux, uy, uz);

    viewer.navigation.setCameraUpVector(up);

  }

}
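
To make the tilt arithmetic in orb() above a little more concrete, here is the same calculation pulled out into a standalone function you can feed with plain numbers. The name tiltToAngles is hypothetical, and the sample readings are made up for illustration.

```javascript
// Same arithmetic as in orb(): turn deviceorientation readings (in
// degrees) into vertical and horizontal orbit angles (in radians).
// tiltToAngles is a hypothetical name used only for this sketch.
function tiltToAngles(alpha, gamma, baseDir) {
  var deg2rad = Math.PI / 180;
  return {
    vert: (gamma + (gamma <= 0 ? 90 : -90)) * deg2rad,
    horiz: (alpha - baseDir) * deg2rad
  };
}

// A made-up reading: facing 20 degrees away from where we started,
// with a gamma of -45 degrees
var sample = tiltToAngles(120, -45, 100);
console.log(sample.vert, sample.horiz);
```

A gamma of -90 maps to a vertical angle of 0, -45 maps to 45 degrees and +45 maps to -45 degrees. Note that the result flips sign as gamma crosses zero, which seems to be the case the vert < 0 check in orbitViews() is there to handle.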

Here's a video of how it works.

You'll note that the odd command gets dropped – and that's in a relatively noise-free environment – but I think you'll find it's mostly a very helpful addition to the viewer's feature set. Thanks again to Theo for the inspiration!

photo credit: Filmstalker via photopin cc

Update:

I fixed a logic error in the code: the zoom was being applied to the camera position post tilt transformation, and so would end up being rotated. The above, updated code works much better than the version posted originally.
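
For the curious, here is a stripped-down illustration of why the order matters. It is a 2D sketch with the orbit target at the origin, not the viewer code itself, and rotate and zoomIn are hypothetical helpers.

```javascript
// Rotate a 2D point about the origin (the orbit target in this sketch)
function rotate(p, angle) {
  var c = Math.cos(angle), s = Math.sin(angle);
  return { x: p.x * c - p.y * s, y: p.x * s + p.y * c };
}

// Move a point towards the origin by the given fraction
function zoomIn(p, fraction) {
  return { x: p.x * (1 - fraction), y: p.y * (1 - fraction) };
}

var tilt = Math.PI / 2;       // the device has been turned 90 degrees
var base = { x: 10, y: 0 };   // stored, untilted camera position

// Buggy order: zoom the already tilted position and store the result
// as the new base, so the next tilt update rotates it a second time
var buggyBase = zoomIn(rotate(base, tilt), 0.3);
var buggyView = rotate(buggyBase, tilt);

// Fixed order: zoom the stored base, then apply the tilt once
var fixedBase = zoomIn(base, 0.3);
var fixedView = rotate(fixedBase, tilt);

console.log(buggyView, fixedView);
```

With the fixed order the camera ends up 7 units from the target in the direction the device is facing. With the buggy order it is still 7 units away, but rotated a further 90 degrees, which is the kind of unexpected swing the original version produced.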
