Summary Table of NLP Models (HF-based) – Vision-Type Models

Name	Full Name	Architecture	Base Model	Developed	Training Dataset	Lib. & Framework	Use Cases	HF URL	GitHub URL
BEiT	Bidirectional Encoder representation from Image Transformers	Vision Transformer	ViT	2021	ImageNet-21k, ImageNet-1k	PyTorch, Hugging Face Transformers	Image classification, semantic segmentation	https://huggingface.co/microsoft/beit-base-patch16-224	https://github.com/microsoft/unilm/tree/master/beit
BiT	Big Transfer	ResNet	ResNet	2019	JFT-300M, ImageNet-21k	TensorFlow, Hugging Face Transformers	Image classification, transfer learning	https://huggingface.co/google/bit-50	https://github.com/google-research/big_transfer
Conditional DETR	Conditional DETR	Transformer	DETR	2021	COCO	PyTorch, Hugging Face Transformers	Object detection	https://huggingface.co/microsoft/conditional-detr-resnet-50	https://github.com/Atten4Vis/ConditionalDETR
ConvNeXT	ConvNeXT	Convolutional Neural Network	ResNet	2022	ImageNet-1k	PyTorch, Hugging Face Transformers	Image classification	https://huggingface.co/facebook/convnext-tiny-224	https://github.com/facebookresearch/ConvNeXt
ConvNeXTV2	ConvNeXT V2	Convolutional Neural Network	ConvNeXT	2023	ImageNet-22k	PyTorch, Hugging Face Transformers	Image classification	https://huggingface.co/facebook/convnextv2-tiny-1k-224	https://github.com/facebookresearch/ConvNeXt-V2
CvT	Convolutional vision Transformer	Vision Transformer	ViT	2021	ImageNet-1k	PyTorch, Hugging Face Transformers	Image classification	https://huggingface.co/microsoft/cvt-13	https://github.com/microsoft/CvT
DAB-DETR	Dynamic Anchor Boxes DETR	Transformer	Conditional DETR	2022	COCO 2017	PyTorch, Hugging Face Transformers	Object detection	https://huggingface.co/IDEA-Research/dab-detr-resnet-50	https://github.com/IDEA-Research/DAB-DETR
Deformable DETR	Deformable DETR	Transformer	DETR	2020	COCO	PyTorch, Hugging Face Transformers	Object detection	https://huggingface.co/SenseTime/deformable-detr	https://github.com/fundamentalvision/Deformable-DETR
DeiT	Data-efficient image Transformers	Vision Transformer	ViT	2020	ImageNet-1k	PyTorch, Hugging Face Transformers	Image classification	https://huggingface.co/facebook/deit-base-distilled-patch16-224	https://github.com/facebookresearch/deit
Depth Anything	Depth Anything	Vision Transformer	DPT	2024	MiDaS dataset, custom large-scale dataset	PyTorch, Hugging Face Transformers	Monocular depth estimation	https://huggingface.co/LiheYoung/depth-anything-small-hf	https://github.com/LiheYoung/Depth-Anything
Depth Anything V2	Depth Anything V2	Dense Prediction Transformer (DPT)	DINOv2	2024	595K synthetic images, 62M+ real unlabeled images	PyTorch, Hugging Face Transformers	Monocular depth estimation	https://huggingface.co/depth-anything/Depth-Anything-V2-Small-hf	https://github.com/DepthAnything/Depth-Anything-V2
DepthPro	Depth Pro	Multi-scale Vision Transformer	–	2024	Mix of real and synthetic images	PyTorch	Monocular depth estimation, AR applications	–	–
DETA	Detection Transformers with Assignment	Transformer	Swin Transformer	2023	COCO	PyTorch, Hugging Face Transformers	Object detection	https://huggingface.co/jozhang97/deta-swin-large	https://github.com/jozhang97/DETA
DETR	DEtection TRansformer	Transformer	ResNet	2020	COCO	PyTorch, Hugging Face Transformers	Object detection	https://huggingface.co/facebook/detr-resnet-50	https://github.com/facebookresearch/detr
DiNAT	Dilated Neighborhood Attention Transformer	Hierarchical Vision Transformer	NAT	2022	ImageNet-1k	PyTorch, NATTEN	Image classification, object detection, segmentation	https://huggingface.co/shi-labs/dinat-mini-in1k-224	https://github.com/SHI-Labs/Neighborhood-Attention-Transformer
DINOv2	DINO v2	Vision Transformer	ViT	2023	Curated dataset from diverse sources	PyTorch, Hugging Face Transformers	Image classification, visual feature extraction	https://huggingface.co/facebook/dinov2-base	https://github.com/facebookresearch/dinov2
DINOv2 with Registers	DINO v2 with Registers	Vision Transformer	DINOv2	2023	Same as DINOv2	PyTorch, Hugging Face Transformers	Image classification, visual feature extraction	https://huggingface.co/facebook/dinov2-with-registers-base	https://github.com/facebookresearch/dinov2
DiT	Document Image Transformer	Vision Transformer	BEiT	2022	Various document datasets	PyTorch, Hugging Face Transformers	Document image analysis, layout analysis, table detection	https://huggingface.co/microsoft/dit-base	https://github.com/microsoft/unilm/tree/master/dit
DPT	Dense Prediction Transformer	Vision Transformer	ViT	2021	Various, including NYU Depth V2	PyTorch, Hugging Face Transformers	Monocular depth estimation, semantic segmentation	https://huggingface.co/Intel/dpt-large	https://github.com/isl-org/DPT
EfficientFormer	EfficientFormer	Transformer	–	2022	ImageNet-1K	PyTorch	Image Classification, Object Detection, Segmentation	https://huggingface.co/docs/transformers/model_doc/efficientformer	https://github.com/snap-research/EfficientFormer
EfficientNet	EfficientNet	Convolutional Neural Network	MobileNetV2	2019	ImageNet	TensorFlow, PyTorch	Image classification, transfer learning	https://huggingface.co/google/efficientnet-b0	https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
FocalNet	Focal Modulation Network	Vision Transformer	–	2022	ImageNet-1K, ImageNet-22K	PyTorch	Image classification, object detection, semantic segmentation	https://huggingface.co/microsoft/focalnet-tiny	https://github.com/microsoft/FocalNet
GLPN	Global-Local Path Networks	Hierarchical mix-Transformer	SegFormer	2022	NYU Depth V2, KITTI	PyTorch, Hugging Face Transformers	Monocular depth estimation	https://huggingface.co/vinvino02/glpn-kitti	https://github.com/vinvino02/GLPDepth
Hiera	Hierarchical Vision Transformer	Vision Transformer	–	2023	ImageNet-1K	PyTorch	Image and video recognition	https://huggingface.co/facebook/hiera-base-224	https://github.com/facebookresearch/hiera
I-JEPA	Image Joint Embedding Predictive Architecture	Joint Embedding Predictive Architecture	–	2023	Large-scale image datasets	PyTorch	Self-supervised image representation learning	–	–
ImageGPT	Generative Pretraining from Pixels	GPT-2-like	GPT-2	2020	ImageNet	PyTorch, Hugging Face Transformers	Image generation, image classification	https://huggingface.co/docs/transformers/model_doc/imagegpt	https://github.com/openai/image-gpt
LeViT	LeViT	Vision Transformer	–	2021	ImageNet	PyTorch, Hugging Face Transformers	Image classification	https://huggingface.co/docs/transformers/model_doc/levit	https://github.com/facebookresearch/LeViT
Mask2Former	Masked-attention Mask Transformer	Transformer	Swin Transformer	2022	COCO, ADE20K, Cityscapes	PyTorch, Detectron2	Instance Segmentation, Panoptic Segmentation, Semantic Segmentation	https://huggingface.co/docs/transformers/model_doc/mask2former	https://github.com/facebookresearch/Mask2Former
MaskFormer	MaskFormer	Transformer	–	2021	ADE20K, Cityscapes, COCO, Mapillary Vistas	PyTorch, Hugging Face Transformers	Semantic segmentation, instance segmentation, panoptic segmentation	https://huggingface.co/facebook/maskformer-swin-base-ade	https://github.com/facebookresearch/MaskFormer
MobileNetV1	MobileNet Version 1	Convolutional Neural Network	–	2017	ImageNet	TensorFlow, PyTorch	Mobile and embedded vision applications	https://huggingface.co/google/mobilenet_v1_0.75_192	https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilene
MobileNetV2	MobileNet Version 2	Convolutional Neural Network	MobileNetV1	2018	ImageNet	TensorFlow, Keras, PyTorch	Mobile and embedded vision applications, image classification, object detection	https://huggingface.co/google/mobilenet_v2_1.0_224	https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet
MobileViT	Mobile Vision Transformer	Vision Transformer	–	2021	ImageNet	PyTorch, Hugging Face Transformers	Image classification, object detection	https://huggingface.co/apple/mobilevit-small	https://github.com/apple/ml-cvnets
MobileViTV2	Mobile Vision Transformer Version 2	Vision Transformer	MobileViT	2022	ImageNet	PyTorch, Hugging Face Transformers	Image classification, object detection	https://huggingface.co/apple/mobilevitv2-1.0	https://github.com/apple/ml-cvnets
NAT	Neighborhood Attention Transformer	Vision Transformer	–	2022	ImageNet	PyTorch	Image classification, object detection, segmentation	https://huggingface.co/shi-labs/nat-mini-in1k-224	https://github.com/SHI-Labs/Neighborhood-Attention-Transformer
PoolFormer	PoolFormer	Transformer	–	2022	ImageNet-1K	PyTorch	Image Classification	https://huggingface.co/docs/transformers/model_doc/poolformer	https://github.com/sail-sg/poolformer
PVT	Pyramid Vision Transformer	Vision Transformer	–	2021	ImageNet	PyTorch, Hugging Face Transformers	Image classification, object detection, segmentation	https://huggingface.co/microsoft/pvt-tiny-224	https://github.com/whai362/PVT
PVTv2	Pyramid Vision Transformer Version 2	Vision Transformer	PVT	2022	ImageNet	PyTorch, Hugging Face Transformers	Image classification, object detection, segmentation	https://huggingface.co/microsoft/pvt-v2-b0-224	https://github.com/whai362/PVT
RegNet	Designing Network Design Spaces	ConvNet	–	2020	ImageNet	PyTorch, pycls	Image classification, object detection	https://huggingface.co/docs/transformers/model_doc/regnet	https://github.com/facebookresearch/pycls
ResNet	Residual Network	Convolutional Neural Network	–	2015	ImageNet	PyTorch, TensorFlow, Keras	Image classification, object detection, segmentation	https://huggingface.co/microsoft/resnet-50	https://github.com/KaimingHe/deep-residual-networks
RT-DETR	Real-Time Detection Transformer	Transformer	–	2024	COCO	PyTorch, Hugging Face Transformers	Real-time object detection	https://huggingface.co/docs/transformers/model_doc/rt_detr	https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rtdetr
RT-DETRv2	Real-Time Detection Transformer Version 2	Transformer	RT-DETR	2024	COCO	PyTorch, Hugging Face Transformers	Real-time object detection	https://huggingface.co/docs/transformers/model_doc/rt_detr_v2	https://github.com/PaddlePaddle/PaddleDetection/tree/develop/configs/rtdetr
SegFormer	Segmentation Transformer	Vision Transformer	–	2021	ADE20K, Cityscapes	PyTorch, Hugging Face Transformers	Semantic segmentation	https://huggingface.co/docs/transformers/model_doc/segformer	https://github.com/NVlabs/SegFormer
SegGPT	Segmenting Everything In Context	Transformer	Painter (ViT)	2023	ADE20K, COCO, PASCAL VOC, Cityscapes, and other segmentation datasets	PyTorch	Image segmentation, in-context segmentation	https://huggingface.co/BAAI/SegGPT	https://github.com/baaivision/Painter
SuperGlue	SuperGlue	Graph Neural Network	–	2020	MegaDepth, COCO	PyTorch	Feature Matching, Image Registration	https://huggingface.co/docs/transformers/model_doc/superglue	https://github.com/magicleap/SuperGluePretrainedNetwork
SuperPoint	SuperPoint	ConvNet	–	2018	MS-COCO	PyTorch	Feature Detection, Description	https://huggingface.co/docs/transformers/model_doc/superpoint	https://github.com/magicleap/SuperPointPretrainedNetwork
SwiftFormer	SwiftFormer	Transformer-based with efficient additive attention	–	2023	ImageNet-1K	PyTorch, Hugging Face Transformers	Image classification, mobile vision applications	https://huggingface.co/MBZUAI/swiftformer-s	https://github.com/huggingface/transformers/blob/main/src/transformers/models/swiftformer/modeling_swiftformer.py
Swin Transformer	Swin Transformer	Hierarchical Transformer	–	2021	ImageNet-1K, ImageNet-22K	PyTorch, Hugging Face Transformers	Image classification, object detection, semantic segmentation	https://huggingface.co/microsoft/swin-tiny-patch4-window7-224	https://github.com/microsoft/Swin-Transformer
Swin Transformer V2	Swin Transformer V2	Hierarchical Transformer with improved training stability	Swin Transformer	2022	ImageNet-22K	PyTorch, Hugging Face Transformers	Image classification, object detection, semantic segmentation	https://huggingface.co/microsoft/swinv2-tiny-patch4-window8-256	https://github.com/microsoft/Swin-Transformer
Swin2SR	Swin2SR	Swin Transformer for Super-Resolution	Swin Transformer	2022	DIV2K, Flickr2K	PyTorch	Image super-resolution	https://huggingface.co/caidas/swin2SR-classical-sr-x2-64	https://github.com/mv-lab/swin2sr
Table Transformer	Table Transformer	Transformer-based	DETR	2022	PubTables-1M	PyTorch, Hugging Face Transformers	Table structure recognition	https://huggingface.co/microsoft/table-transformer-detection	https://github.com/microsoft/table-transformer
TextNet	TextNet	CNN-based	–	2018	SynthText, Total-Text	PyTorch	Scene text detection and recognition	https://huggingface.co/docs/transformers/model_doc/textnet	https://github.com/tonghe90/textnet
Timm Wrapper	PyTorch Image Models Wrapper	Various	–	2025	ImageNet	PyTorch, Hugging Face Transformers	Image classification	https://huggingface.co/docs/transformers/en/model_doc/timm_wrapper	https://github.com/huggingface/transformers
UperNet	Unified Perceptual Parsing Network	FPN + Pyramid Pooling Module (backbone-agnostic)	Various (e.g., Swin, ConvNeXt)	2018	ADE20K, Cityscapes	PyTorch, Hugging Face Transformers	Semantic segmentation	https://huggingface.co/docs/transformers/model_doc/upernet	https://github.com/huggingface/transformers
VAN	Visual Attention Network	Attention-based CNN	–	2022	ImageNet-1K	PyTorch, Hugging Face Transformers	Image classification	https://huggingface.co/Visual-Attention-Network/van-base	https://github.com/Visual-Attention-Network/VAN-Classification
Vision Transformer (ViT)	Vision Transformer	Transformer	–	2020	ImageNet	PyTorch, TensorFlow, Hugging Face Transformers	Image classification	https://huggingface.co/google/vit-base-patch16-224	https://github.com/google-research/vision_transformer
ViT Hybrid	Vision Transformer Hybrid	Hybrid CNN-Transformer	–	2020	ImageNet-21K, ImageNet-1K	PyTorch, Hugging Face Transformers	Image classification	https://huggingface.co/google/vit-hybrid-base-bit-384	https://github.com/google-research/vision_transformer
ViTDet	Vision Transformer for Object Detection	Transformer-based	ViT	2022	COCO	PyTorch, Detectron2	Object detection	https://huggingface.co/facebook/vit-det-base	https://github.com/facebookresearch/detectron2
ViTMAE	Vision Transformer with Masked Autoencoders	Transformer-based	ViT	2021	ImageNet-1K	PyTorch, Hugging Face Transformers	Self-supervised learning, image classification	https://huggingface.co/facebook/vit-mae-base	https://github.com/facebookresearch/mae
ViTMatte	Vision Transformer for Image Matting	Transformer-based	ViT	2023	Adobe Image Matting Dataset	PyTorch	Image matting	https://huggingface.co/hustvl/vitmatte-small-composition-1k	https://github.com/hustvl/ViTMatte
ViTMSN	Vision Transformer with Masked Siamese Networks	Transformer-based	ViT	2022	ImageNet-1K	PyTorch	Self-supervised learning, image classification	https://huggingface.co/facebook/vit-msn-small	https://github.com/facebookresearch/msn
ViTPose	Vision Transformer for Human Pose Estimation	Transformer-based	ViT	2022	COCO	PyTorch, MMPose	Human pose estimation	https://huggingface.co/open-mmlab/vit-pose-base	https://github.com/open-mmlab/mmpose
YOLOS	You Only Look at One Sequence	Transformer-based	DETR	2021	COCO	PyTorch, Hugging Face Transformers	Object detection	https://huggingface.co/hustvl/yolos-tiny	https://github.com/hustvl/YOLOS
ZoeDepth	ZoeDepth	Transformer-based	DPT	2023	NYU Depth V2, KITTI	PyTorch	Monocular depth estimation	https://huggingface.co/shariqfarooq/ZoeDepth	https://github.com/isl-org/ZoeDepth
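
Because the rows above are tab-separated, the table can be read with a standard TSV parser. The following is a minimal sketch, not a definitive tool: it takes a short excerpt of the table (trimmed to three columns for brevity), extracts the Hub repository id from each HF URL, and filters rows by use case. The helper names hub_repo_id, models_for, and load_pipeline are hypothetical, and it assumes that a two-segment huggingface.co path is a model repository, while documentation links and "–" entries are skipped.

```python
import csv
import io
from urllib.parse import urlparse

# Excerpt of the table above, trimmed to three columns for brevity.
EXCERPT = (
    "Name\tUse Cases\tHF URL\n"
    "DETR\tObject detection\thttps://huggingface.co/facebook/detr-resnet-50\n"
    "ResNet\tImage classification, object detection, segmentation\t"
    "https://huggingface.co/microsoft/resnet-50\n"
    "PoolFormer\tImage Classification\t"
    "https://huggingface.co/docs/transformers/model_doc/poolformer\n"
    "DepthPro\tMonocular depth estimation, AR applications\t–\n"
)


def hub_repo_id(url):
    """Return 'owner/name' for a Hub model URL, else None (docs links, '–')."""
    parsed = urlparse(url)
    if parsed.netloc != "huggingface.co":
        return None
    parts = [p for p in parsed.path.split("/") if p]
    # Assumption: model repositories have exactly two path segments.
    if len(parts) != 2 or parts[0] == "docs":
        return None
    return "/".join(parts)


def models_for(use_case, text=EXCERPT):
    """Map model name -> repo id for rows whose Use Cases mention use_case."""
    found = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        repo = hub_repo_id(row["HF URL"])
        if repo and use_case.lower() in row["Use Cases"].lower():
            found[row["Name"]] = repo
    return found


def load_pipeline(task, repo_id):
    """Build a Transformers pipeline; downloads weights on first use."""
    from transformers import pipeline  # lazy import: optional dependency
    return pipeline(task, model=repo_id)


if __name__ == "__main__":
    print(models_for("object detection"))
```

With the transformers package installed and network access available, a call such as load_pipeline("image-classification", "microsoft/resnet-50") would be one way to try a listed checkpoint. Some models in the table may need extra dependencies, so treat that call as illustrative only.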