- Extract descriptive labels, captions, and tags
- Enable advanced semantic and object-level search
- Power downstream filtering, QA, and automation
- Improve annotation coverage and data understanding
## Available Enrichment Models
Visual Layer provides a range of built-in models designed for diverse enrichment tasks:

| Model Name | Task Type | Description |
|---|---|---|
| VL-Object-Detector | Object Detection | Identifies and localizes objects within images or videos by drawing bounding boxes and classifying each detected object. |
| VL-Image-Tagger | Image Classification | Assigns labels or tags to an entire image, categorizing its content for identification and analysis. |
| VL-Face-Detector | Face Detection | Detects faces and extracts facial landmarks for accurate face alignment and recognition workflows. |
| VL-Image-Captioner | Image to Text | Generates descriptive text that summarizes the content and context of the entire image input. |
| VL Advanced Captioner | Image to Text | A state-of-the-art Vision-Language model that generates detailed captions and answers questions about image content (VQA). |
| VL-Object-Captioner | Object to Text | Generates descriptive text that summarizes detected objects and their interactions in the image. |
| NVILA-Lite-2B | Image to Text | A compact member of the NVILA family of open VLMs, optimized for both efficiency and accuracy on video understanding and multi-image tasks. |
| VL-Image-Semantic-Search | Semantic Image Search | Enhances image search with conceptual queries, identifying content that matches search intent and improving discovery by understanding visual context. |
| Advanced-Object-Search | Semantic Object Search | Finds objects in images or videos based on meaning and context, beyond simple tags. Quickly retrieves relevant objects using natural language queries. |
| Radiology-Image-Search | Semantic Image Search | Enhances image search with radiology understanding, improving discovery by understanding radiology images and terms. |
Some models require pre-existing enrichments before they can be applied. These dependencies include:
- VL-Object-Captioner requires Object Detection to be applied first.
- Semantic Search models require captions or embeddings from a prior enrichment step.
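These prerequisites form a simple dependency graph: a model can only run once every enrichment it depends on has been applied. The sketch below illustrates that ordering logic with Python's standard-library `TopologicalSorter`. The dependency map mirrors the notes above, but the function itself is a generic illustration, not part of any Visual Layer API.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map reflecting the notes above:
# each model lists the enrichments that must run before it.
DEPENDENCIES = {
    "VL-Object-Detector": set(),
    "VL-Image-Captioner": set(),
    "VL-Object-Captioner": {"VL-Object-Detector"},       # needs object detection first
    "VL-Image-Semantic-Search": {"VL-Image-Captioner"},  # needs captions/embeddings first
}

def enrichment_order(models):
    """Return the requested models plus their prerequisites,
    ordered so every dependency runs before its dependents."""
    ts = TopologicalSorter()
    seen = set()
    stack = list(models)
    while stack:
        model = stack.pop()
        if model in seen:
            continue
        seen.add(model)
        deps = DEPENDENCIES.get(model, set())
        ts.add(model, *deps)   # register model with its prerequisites
        stack.extend(deps)     # pull in transitive prerequisites too
    return list(ts.static_order())
```

For example, requesting only `VL-Object-Captioner` would yield a plan that runs `VL-Object-Detector` first.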
## Coming Soon
These models are in development and will be added to the enrichment catalog:

| Model Name | Task Type | Description |
|---|---|---|
| Nv-grounding dino | Object Detection | An open vocabulary zero-shot object detection model with natural language prompts. |
| Advanced-Image-Search | Semantic Image Search | Enhanced conceptual image retrieval using complex queries, identifying content that matches search intent. |
| yolov9 | Object Detection | Object detection model for fast and accurate bounding box predictions. |
## Want Early Access?

Have questions, or want to try upcoming models before release? Contact us to request access or learn more.