A good capture is deliberate, well-narrated, and unhurried. The data it produces (pose, hands, object positions, transcript) is only as useful as the care taken to record it. This guide covers what to do before, during, and after a session to maximize capture quality.

Before you start

Set up your environment

  • Light the subject well. Natural or overhead lighting works best. Avoid recording with a bright window directly behind the subject; this creates silhouetting that degrades video quality and hurts object recognition.
  • Clear the workspace. Remove objects and clutter that aren’t part of the procedure. A clean field of view helps the tracker focus on the right objects and makes the footage easier to review.
  • Know the procedure in advance. Walk through what you’re going to do before you start recording. Captures of someone working out the steps mid-session produce hesitant, disjointed footage that’s hard to learn from. Treat the session as a rehearsed demonstration, not a first attempt.

Check your tracked objects

If the skill you’re recording against has tracked objects configured, confirm they all appear in the pre-recording sheet before tapping Start. Any object not listed there won’t be tracked during the session, and the list can’t be changed once recording begins. Place the objects where they’ll be used before you start; getting them into frame early gives the tracker more time to acquire a lock.

During the session

Move slowly and deliberately

This is the single most important thing you can do for a good capture. Pose tracking works best when head movements are smooth and intentional. Fast pans, sudden turns, and shaky movements introduce noise into the pose data and can cause brief tracking losses. The same applies to hand tracking: quick, jerky motions are harder to reconstruct than smooth, deliberate ones.

A good rule of thumb is to move at roughly half the speed you would use in normal work. It will feel slow. That’s correct.

When transitioning between steps (turning to pick something up or moving to a different part of the workspace), turn your whole body smoothly rather than just your head.

Narrate everything out loud

On-device transcription produces a timestamped record of everything you say during the session. This transcript becomes part of the capture data and is used to generate captions and training outputs, so a well-narrated capture is far more useful than a silent one.

Narrate as you go, not after. Describe what you’re about to do just before you do it, then describe what you did. For example:
“I’m picking up the torque wrench; this one is set to 25 foot-pounds. I’m positioning it on the left-side bolt here, and applying steady pressure clockwise until I feel the click.”
Speak clearly and at a normal pace. Avoid filler words. If you make a mistake, narrate that too (“I grabbed the wrong tool there, let me swap to the correct one”) so reviewers understand what they’re seeing. Don’t worry about sounding scripted. A clear, informative narration is far more valuable than a natural-sounding but vague one.

Pause between steps

After completing each discrete step, pause for 1–2 seconds before moving to the next. This gives the tracker time to settle, produces clean step boundaries in the data, and makes the footage easier to review and annotate later. Think in terms of chapters: complete a step, hold briefly, move on.

Keep the subject in frame

If you’re demonstrating a procedure on an object or surface, keep it in your field of view as much as possible. The stereo cameras capture what you’re looking at; if the subject drifts to the edge of your view, the useful data goes with it. For bench work, position yourself so the work surface is roughly centered in your view at a comfortable working distance. Avoid leaning in very close (under about 30 cm) or stepping far back; both extremes degrade tracking quality.

Let object tracking acquire a lock

When you first encounter a tracked object, look directly at it for 1–2 seconds before touching or moving it. This gives the tracker time to acquire a lock; the object’s highlight appears once it has. If a tracked object loses its lock mid-session (the highlight disappears), look directly at it again before continuing with that part of the procedure.

Common mistakes to avoid

| Mistake | Why it matters | Fix |
|---|---|---|
| Moving too fast | Introduces noise into pose and hand data | Slow down intentionally; half speed is a good target |
| Working in silence | Transcript is empty or sparse, so outputs are lower quality | Narrate continuously, including transitions and decisions |
| Poor lighting | Degrades video quality and object recognition | Ensure good overhead or front lighting before starting |
| Rushing through steps | No natural pause points in the data | Pause 1–2 seconds between discrete steps |
| Not checking tracked objects first | Objects aren’t tracked even though they’re present | Confirm all objects appear in the pre-recording sheet |
| Looking away from the subject | Subject drops out of frame at key moments | Keep the work surface centered in your view throughout |
| Recording a first attempt | Hesitation and mistakes produce noisy data | Rehearse the procedure before recording |

Multiple takes

Don’t rely on a single capture. Recording 2–3 takes of the same procedure gives reviewers options and makes it easier to catch steps that were unclear or poorly framed in one recording. Takes don’t need to be perfect: a take that clearly shows one sub-step is valuable even if the rest isn’t usable.

Session length

There’s no hard limit on session length, but shorter, focused captures are generally more useful than long continuous ones. If a procedure has natural break points (phases, sub-tasks, or tool changes), consider recording each phase as a separate session rather than one long take. Shorter sessions are also faster to review and upload.