Visual Layer’s Enrichment Hub lets you generate high-value metadata using pre-trained models tailored for image and video datasets. These enrichment models can:
  • Extract descriptive labels, captions, and tags
  • Enable advanced semantic and object-level search
  • Power downstream filtering, QA, and automation
  • Improve annotation coverage and data understanding

Available Enrichment Models

Visual Layer provides a range of built-in models designed for diverse enrichment tasks:
| Model Name | Task Type | Description |
| --- | --- | --- |
| VL-Object-Detector | Object Detection | Identifies and localizes objects within images or videos by drawing bounding boxes and classifying each detected object. |
| VL-Image-Tagger | Image Classification | Assigns labels or tags to an entire image, categorizing its content for identification and analysis. |
| VL-Face-Detector | Face Detection | Detects faces and extracts facial landmarks for accurate face alignment and recognition workflows. |
| VL-Image-Captioner | Image to Text | Generates descriptive text that summarizes the content and context of the entire image input. |
| VL Advanced Captioner | Image to Text | A state-of-the-art Vision-Language model that generates detailed captions and answers questions about image content (VQA). |
| VL-Object-Captioner | Object to Text | Generates descriptive text that summarizes detected objects and their interactions in the image. |
| NVILA-Lite-2B | Image to Text | A family of open VLMs designed to optimize both efficiency and accuracy for video understanding and multi-image tasks. |
| VL-Image-Semantic-Search | Semantic Image Search | Enhances image search with conceptual queries, identifying content that matches search intent and improving discovery by understanding visual context. |
| Advanced-Object-Search | Semantic Object Search | Finds objects in images or videos based on meaning and context, beyond simple tags. Quickly retrieves relevant objects using natural language queries. |
| Radiology-Image-Search | Semantic Image Search | Enhances image search with radiology understanding, improving discovery by interpreting radiology images and terminology. |
Some models require pre-existing enrichments before they can be applied. These dependencies include:
  • VL-Object-Captioner requires Object Detection to be applied first.
  • Semantic Search models require captions or embeddings from a prior enrichment step.
Labels may come from user annotations, the VL-Object-Detector, or the VL-Image-Tagger.
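The dependency rules above can be sketched as a small ordering step that runs prerequisites before the model you request. This is an illustrative sketch only: the model names come from this page, but the `DEPENDENCIES` map and the `enrichment_order` function are hypothetical, not Visual Layer's actual SDK. In particular, mapping the semantic-search model to `VL-Image-Captioner` as its caption/embedding source is an assumption.

```python
# Hypothetical dependency map based on the rules above.
# The semantic-search -> captioner edge is an assumption for illustration.
DEPENDENCIES = {
    "VL-Object-Captioner": ["VL-Object-Detector"],
    "VL-Image-Semantic-Search": ["VL-Image-Captioner"],
}


def enrichment_order(model, applied=None):
    """Return the models to run, prerequisites first, skipping any
    enrichment that has already been applied to the dataset."""
    applied = set(applied or [])
    order = []

    def visit(m):
        if m in applied or m in order:
            return
        for dep in DEPENDENCIES.get(m, []):
            visit(dep)  # run prerequisites before the model itself
        order.append(m)

    visit(model)
    return order


print(enrichment_order("VL-Object-Captioner"))
# -> ['VL-Object-Detector', 'VL-Object-Captioner']
```

If object detection has already been applied, passing it via `applied` skips the redundant step and only the captioner is scheduled.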

Coming Soon

These models are in development and will be available in the enrichment catalog:
| Model Name | Task Type | Description |
| --- | --- | --- |
| Nv-grounding dino | Object Detection | An open-vocabulary, zero-shot object detection model that accepts natural-language prompts. |
| Advanced-Image-Search | Semantic Image Search | Enhanced conceptual image retrieval using complex queries, identifying content that matches search intent. |
| yolov9 | Object Detection | Object detection model for fast and accurate bounding-box predictions. |

Want Early Access?

Have questions or want to try out upcoming models early? Contact us to request access or learn more.