Robotics Research Engineer, Video Perception & Auto-labeling
Miraxis AI
United States
Miraxis builds AI-assisted data generation systems for Physical AI and robotics. We work with egocentric video, multimodal demonstrations, and human-in-the-loop review workflows to produce high-quality training and evaluation data for embodied AI systems.
Our platform uses foundation models to draft annotations, extract visual evidence, identify uncertainty, and route difficult cases to human review. Model outputs are treated as evidence and decision-support signals, not as automatic ground truth.
Miraxis AI was founded by Artem Sokolov, who is also the Founder and CEO of Humanoid AI. https://thehumanoid.ai/te%d0%b0m
Role Summary
We are hiring a Robotics Research Engineer for Video Perception & Auto-labeling to build the visual evidence layer of our egocentric video annotation pipeline.
You will own the perception artifacts that make model-assisted annotation inspectable: masks, boxes, tracks, dense visual features, clip embeddings, similarity links, deduplication flags, evidence spans, and uncertainty cues.
This role is about making visual evidence reliable, searchable, reviewable, and useful to humans and downstream evaluation systems. You will work with models such as SAM 3.1, DINOv3, and Cosmos-Embed1, and you will help turn raw video into structured evidence that supports annotation and review.
Responsibilities
- Integrate promptable segmentation and tracking models into the annotation evidence pipeline.
- Produce masks, boxes, tracks, object IDs, confidence scores, and evidence references for hands, tools, objects, surfaces, fixtures, and state-changing entities.
- Integrate dense visual feature models for visual similarity, mask-quality support, object and scene features, out-of-distribution checks, and transition-evidence support.
- Integrate video-text embedding models for clip search, inverse video search, semantic deduplication, scenario mining, hard-negative mining, and gold-sample candidate selection.
- Build durable evidence artifacts that attach perception outputs to claim-level annotations.
- Create uncertainty and failure flags for occlusion, hidden hands, weak track continuity, fast motion, camera shake, object confusion, poor visibility, track drift, and ambiguous contact.
- Build evidence packaging for reviewers, model evaluation, annotation correction, and downstream audit workflows.
- Support reviewer workflows so humans can efficiently inspect, correct, or reject masks, tracks, and evidence spans.
- Build visualization and review hooks using Rerun, Foxglove, or equivalent tooling.
- Work with world-model engineers to provide masks, tracks, features, and embeddings for transition-evidence and latent-residual experiments.
- Work with platform engineers to version every model output, including checkpoint, config, preprocessing, prompt or query, frame range, artifact hash, and storage reference.
- Ensure perception and embedding outputs remain evidence and routing features only; they should not automatically determine annotation truth or replace human review for high-risk cases.
Required Qualifications
- Strong Python and PyTorch experience.
- Hands-on experience with modern computer vision pipelines for video.
- Strong understanding of segmentation, detection, tracking, feature extraction, video preprocessing, and GPU inference.
- Experience with masks, bounding boxes, track IDs, frame sampling, temporal smoothing, video decoding, and artifact generation.
- Experience with promptable segmentation, open-vocabulary perception, or foundation-model-based visual evidence extraction.
- Experience turning computer vision outputs into durable product artifacts, including schemas, object storage paths, visualization artifacts, database references, and versioned metadata.
- Experience with embedding search or vector databases such as FAISS, Milvus, Qdrant, Weaviate, or similar.
- Ability to evaluate retrieval quality, deduplication quality, track quality, mask quality, and reviewer usefulness.
- Comfort working with human-in-the-loop correction workflows.
Preferred Qualifications
- Experience with SAM, SAM 2, SAM 3, Grounding DINO, DINOv2, DINOv3, CLIP-style video retrieval, Cosmos-Embed, ByteTrack, XMem, CoTracker, TAPIR, or similar systems.
- Experience with egocentric video, robotics datasets, industrial video, human demonstration data, synchronized multimodal streams, MCAP, Rerun, or Foxglove.
- Experience building annotation tools, review UIs, QA sampling systems, active-learning systems, or dataset curation tools.
- Experience with privacy-aware video workflows, de-identification, EU data residency, or human-subject video data.
- Familiarity with Physical AI evidence pipelines, claim-level provenance, review routing, and model-assisted annotation.
Seniority level
Not Applicable
Employment type
Full-time
Job function
Information Technology and Engineering
Industries
Robotics Engineering
Similar jobs
- Senior, ML Engineer - Offline Perception
- Senior Machine Learning/Computer Vision Engineer
- Machine Learning Engineer
- Senior AI/ML Engineer - Future Sensing, Embodied AI
- Research Engineer
- Computer Vision Engineer
- Autonomy Engineer - Perception
- Software Engineer, Perception
- Lead Perception Engineer
- Machine Learning Engineer: Perception