Han-Bit Kang
hbkang
AI & ML interests
ML
Recent Activity
updated
a collection
about 9 hours ago
OCR-dataset
updated
a collection
about 9 hours ago
OCR-dataset
updated
a collection
about 9 hours ago
cool-papers
Organizations
None yet
Makeup Transfer
interesting architecture
-
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 28 -
Tensor Product Attention Is All You Need
Paper • 2501.06425 • Published • 90 -
Scalable-Softmax Is Superior for Attention
Paper • 2501.19399 • Published • 24 -
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Paper • 2502.09509 • Published • 8
talking-head-generation
-
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
Paper • 2312.13578 • Published • 29 -
Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Paper • 2312.13150 • Published • 16 -
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Paper • 2312.03029 • Published • 26 -
Relightable Gaussian Codec Avatars
Paper • 2312.03704 • Published • 33
full-body-generation
-
URHand: Universal Relightable Hands
Paper • 2401.05334 • Published • 25 -
Synthesizing Moving People with 3D Control
Paper • 2401.10889 • Published • 12 -
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Paper • 2312.02973 • Published • 1 -
ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Paper • 2312.05941 • Published • 1
cool-papers
-
Depth Anything V2
Paper • 2406.09414 • Published • 103 -
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 51 -
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper • 2406.04338 • Published • 39 -
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 119
OCR
-
PubTables-1M: Towards comprehensive table extraction from unstructured documents
Paper • 2110.00061 • Published • 3 -
Optimized Table Tokenization for Table Structure Recognition
Paper • 2305.03393 • Published • 1 -
Qwen3-VL Technical Report
Paper • 2511.21631 • Published • 120 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 106
ID-Preserving Generation
-
ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
Paper • 2404.15275 • Published -
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
Paper • 2403.13535 • Published • 23 -
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
Paper • 2309.05793 • Published • 50 -
GHOST 2.0: generative high-fidelity one shot transfer of heads
Paper • 2502.18417 • Published • 68
generative-model-training
-
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Paper • 2310.00426 • Published • 61 -
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Paper • 2310.16656 • Published • 50 -
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Paper • 2310.16825 • Published • 36 -
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Paper • 2401.11605 • Published • 23
artistic rendering
-
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
Paper • 2401.00935 • Published • 18 -
Derendering/InkSight-Small-p
Updated • 62 • 33 -
E^{2}GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
Paper • 2401.06127 • Published -
Acoustic Volume Rendering for Neural Impulse Response Fields
Paper • 2411.06307 • Published • 5
text-to-media
-
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 43 -
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 42 -
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Paper • 2402.05054 • Published • 28
OCR-dataset
OCR
-
PubTables-1M: Towards comprehensive table extraction from unstructured documents
Paper • 2110.00061 • Published • 3 -
Optimized Table Tokenization for Table Structure Recognition
Paper • 2305.03393 • Published • 1 -
Qwen3-VL Technical Report
Paper • 2511.21631 • Published • 120 -
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 106
Makeup Transfer
ID-Preserving Generation
-
ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
Paper • 2404.15275 • Published -
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
Paper • 2403.13535 • Published • 23 -
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
Paper • 2309.05793 • Published • 50 -
GHOST 2.0: generative high-fidelity one shot transfer of heads
Paper • 2502.18417 • Published • 68
interesting architecture
-
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 28 -
Tensor Product Attention Is All You Need
Paper • 2501.06425 • Published • 90 -
Scalable-Softmax Is Superior for Attention
Paper • 2501.19399 • Published • 24 -
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Paper • 2502.09509 • Published • 8
generative-model-training
-
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Paper • 2310.00426 • Published • 61 -
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
Paper • 2310.16656 • Published • 50 -
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
Paper • 2310.16825 • Published • 36 -
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Paper • 2401.11605 • Published • 23
talking-head-generation
-
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
Paper • 2312.13578 • Published • 29 -
Splatter Image: Ultra-Fast Single-View 3D Reconstruction
Paper • 2312.13150 • Published • 16 -
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Paper • 2312.03029 • Published • 26 -
Relightable Gaussian Codec Avatars
Paper • 2312.03704 • Published • 33
artistic rendering
-
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution
Paper • 2401.00935 • Published • 18 -
Derendering/InkSight-Small-p
Updated • 62 • 33 -
E^{2}GAN: Efficient Training of Efficient GANs for Image-to-Image Translation
Paper • 2401.06127 • Published -
Acoustic Volume Rendering for Neural Impulse Response Fields
Paper • 2411.06307 • Published • 5
full-body-generation
-
URHand: Universal Relightable Hands
Paper • 2401.05334 • Published • 25 -
Synthesizing Moving People with 3D Control
Paper • 2401.10889 • Published • 12 -
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Paper • 2312.02973 • Published • 1 -
ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Paper • 2312.05941 • Published • 1
text-to-media
-
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 43 -
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 42 -
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Paper • 2402.05054 • Published • 28
cool-papers
-
Depth Anything V2
Paper • 2406.09414 • Published • 103 -
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels
Paper • 2406.09415 • Published • 51 -
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Paper • 2406.04338 • Published • 39 -
SAM 2: Segment Anything in Images and Videos
Paper • 2408.00714 • Published • 119