Miraxis AI

Robotics Research Engineer, Video Perception & Auto-labeling

Miraxis AI United States

Miraxis builds AI-assisted data generation systems for Physical AI and robotics. We work with egocentric video, multimodal demonstrations, and human-in-the-loop review workflows to produce high-quality training and evaluation data for embodied AI systems.

Our platform uses foundation models to draft annotations, extract visual evidence, identify uncertainty, and route difficult cases to human review. Model outputs are treated as evidence and decision-support signals, not as automatic ground truth.



Miraxis AI was founded by Artem Sokolov, who is also the Founder and CEO of Humanoid AI. https://thehumanoid.ai/te%d0%b0m




Role Summary

We are hiring a Robotics Research Engineer for Video Perception & Auto-labeling, to build the visual evidence layer of our egocentric video annotation pipeline.

You will own the perception artifacts that make model-assisted annotation inspectable: masks, boxes, tracks, dense visual features, clip embeddings, similarity links, deduplication flags, evidence spans, and uncertainty cues.

This role is about making visual evidence reliable, searchable, reviewable, and useful to humans and downstream evaluation systems. You will work with models such as SAM 3.1, DINOv3, and Cosmos-Embed1, and you will help turn raw video into structured evidence that supports annotation and review.

Responsibilities

  • Integrate promptable segmentation and tracking models into the annotation evidence pipeline.
  • Produce masks, boxes, tracks, object IDs, confidence scores, and evidence references for hands, tools, objects, surfaces, fixtures, and state-changing entities.
  • Integrate dense visual feature models for visual similarity, mask-quality support, object and scene features, out-of-distribution checks, and transition-evidence support.
  • Integrate video-text embedding models for clip search, inverse video search, semantic deduplication, scenario mining, hard-negative mining, and gold-sample candidate selection.
  • Build durable evidence artifacts that attach perception outputs to claim-level annotations.
  • Create uncertainty and failure flags for occlusion, hidden hands, weak track continuity, fast motion, camera shake, object confusion, poor visibility, track drift, and ambiguous contact.
  • Build evidence packaging for reviewers, model evaluation, annotation correction, and downstream audit workflows.
  • Support reviewer workflows so humans can inspect, correct, or reject masks, tracks, and evidence spans efficiently.
  • Build visualization and review hooks using Rerun, Foxglove or equivalent tooling.
  • Work with world-model engineers to provide masks, tracks, features, and embeddings for transition-evidence and latent-residual experiments.
  • Work with platform engineers to version every model output, including checkpoint, config, preprocessing, prompt or query, frame range, artifact hash, and storage reference.
  • Ensure perception and embedding outputs remain evidence and routing features only. They should not automatically determine annotation truth or replace human review for high-risk cases.

Required Qualifications

  • Strong Python and PyTorch experience.
  • Hands-on experience with modern computer vision pipelines for video.
  • Strong understanding of segmentation, detection, tracking, feature extraction, video preprocessing, and GPU inference.
  • Experience with masks, bounding boxes, track IDs, frame sampling, temporal smoothing, video decoding, and artifact generation.
  • Experience with promptable segmentation, open-vocabulary perception, or foundation-model-based visual evidence extraction.
  • Experience turning computer vision outputs into durable product artifacts, including schemas, object storage paths, visualization artifacts, database references, and versioned metadata.
  • Experience with embedding search or vector databases such as FAISS, Milvus, Qdrant, Weaviate, or similar.
  • Ability to evaluate retrieval quality, deduplication quality, track quality, mask quality, and reviewer usefulness.
  • Comfort working with human-in-the-loop correction workflows.

Preferred Qualifications

  • Experience with SAM, SAM 2, SAM 3, Grounding DINO, DINOv2, DINOv3, CLIP-style video retrieval, Cosmos-Embed, ByteTrack, XMem, CoTracker, TAPIR, or similar systems.
  • Experience with egocentric video, robotics datasets, industrial video, human demonstration data, synchronized multimodal streams, MCAP, Rerun, or Foxglove.
  • Experience building annotation tools, review UIs, QA sampling systems, active-learning systems, or dataset curation tools.
  • Experience with privacy-aware video workflows, de-identification, EU data residency, or human-subject video data.
  • Familiarity with Physical AI evidence pipelines, claim-level provenance, review routing, and model-assisted annotation.

  • Seniority level: Not Applicable
  • Employment type: Full-time
  • Job function: Information Technology and Engineering
  • Industries: Robotics Engineering
