Yuanhong Yu
I am a Ph.D. student at Zhejiang University, supervised by Sida Peng and Xiaowei Zhou. Prior to this, I received my B.Eng. in Computer Science from Northwestern Polytechnical University, advised by Jiaqi Yang.
My research focuses on Computer Vision, 3D Vision, and Vision-Language Models. Alongside academic research, I actively build open-source tools at the frontier of AI agent infrastructure and developer experience.
Currently working on
Spatial Intelligence
3D understanding from vision
AI Agent Infrastructure
Developer tools ecosystem
Vision-Language Models
Multimodal understanding
Publications
BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation
Yuanhong Yu, Xingyi He, Chen Zhao, Junhao Yu, Jiaqi Yang, Ruizhen Hu, Yujun Shen, Xing Zhu, Xiaowei Zhou, Sida Peng
We present BoxDreamer, a novel framework that estimates 6-DoF object poses by predicting 3D bounding box corners from a single RGB image. Our approach achieves strong generalization to unseen object categories without requiring CAD models or category-specific training, enabling real-world deployment across diverse scenarios.
Preprints
EARTalking: End-to-end GPT-style Autoregressive Talking Head Synthesis with Frame-wise Control
Yuzhe Weng, Haotian Wang, Yuanhong Yu, Jun Du, Shan He, Xiaoyan Wu, Haoran Xu
We propose EARTalking, a novel end-to-end, GPT-style autoregressive model for interactive audio-driven talking head generation. Our method introduces a frame-by-frame, in-context, audio-driven streaming generation paradigm with Sink Frame Window Attention for variable-length video generation with identity consistency.
Open Source
Beyond research, I build open-source tools that make AI-assisted development safer and more observable. The agent ecosystem below spans security, observability, and workflow automation for Claude Code and beyond.
VibePortrait
AI-powered developer personality portrait generator. Analyzes coding conversations to create visual personality profiles, with MBTI mapping and famous-person matching.
VibeGuard
Real-time security scanner for AI-assisted coding. Protects against prompt injection, secret leaks, and unsafe file operations.
agentop
Lightweight observability toolkit for AI agent workflows. Monitor token usage, trace execution, and debug agent behavior in real time.