14

Zhensong Zhang

JasonCU

zhensongzhang@hotmail.com

AI & ML interests

None yet

Recent Activity

upvoted a paper 6 days ago

DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

upvoted a paper 6 days ago

4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer

upvoted a paper 7 days ago

StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

Organizations

None yet

upvoted 2 papers 6 days ago

DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

Paper • 2512.03000 • Published 9 days ago • 34

4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer

Paper • 2512.05060 • Published 7 days ago • 18

upvoted 7 papers 7 days ago

StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

Paper • 2512.01707 • Published 10 days ago • 7

Accelerating Streaming Video Large Language Models via Hierarchical Token Compression

Paper • 2512.00891 • Published 11 days ago • 14

CaptionQA: Is Your Caption as Useful as the Image Itself?

Paper • 2511.21025 • Published 16 days ago • 25

DeepEyesV2: Toward Agentic Multimodal Model

Paper • 2511.05271 • Published Nov 7 • 42

VisPlay: Self-Evolving Vision-Language Models from Images

Paper • 2511.15661 • Published 22 days ago • 42

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 315

ViDiC: Video Difference Captioning

Paper • 2512.03405 • Published 9 days ago • 26

upvoted 3 papers 8 days ago

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published 21 days ago • 106

WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

Paper • 2512.02425 • Published 9 days ago • 22

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published 16 days ago • 150

upvoted an article 8 months ago

Hugging Face to sell open-source robots thanks to Pollen Robotics acquisition 🤖

Article • Apr 14 • 48

upvoted a paper about 1 year ago

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Paper • 2409.18042 • Published Sep 26, 2024 • 40