
SenseFoundry VL

Product Overview
In the AI 2.0 era, SenseTime has launched SenseFoundry VL, built on its 'Rixin' (Daily Innovation) large model ecosystem. Upgraded from the original 'Ark' City Open Platform, it addresses key challenges in traditional vision-based services: limited interaction modes, poor generalization across diverse scenarios, high false alarm rates in complex environments, and a steep learning curve for large model adoption. As SenseTime's first 'large model + urban services' application, the platform improves the intelligence and efficiency of city service scenarios and helps unleash new quality productive forces.
Product Features
- Requirement Instruction Conversion
  Uses Agent technology to turn requirement descriptions into concrete instructions, enabling a human-like interaction experience.
- Image-Text Semantic Retrieval
  Uses the large model's visual-semantic understanding to provide intelligent semantic retrieval and conversational analysis of image-text content.
- Rapid Model Generation
  Builds on the large model's strong generalization ability to generate both general-purpose and specialized models in one click, using only a few positive/negative samples and prompt adjustments.
Product Highlights
- Fusion Architecture & Multi-Modal Technology
  An integrated vision-language large model that combines image-text contrastive learning, text understanding and generation, and cross-modal comprehension. A hybrid architecture pairs vision models with foundation models, and AI agent integration provides end-to-end smart city solutions.
- Domestic Hardware Optimization
  Fully compatible with leading domestic chips and servers, with comprehensive computing power support for localized large model deployment.
- Configurable Smart City Solutions
  Highly customizable development for urban digitalization needs. Innovative prompt engineering with a visual-text interface ensures broad applicability and lowers the barrier to adopting the system.
Application Value

- Enhancing Government Efficiency
  Speeds up the classification and dispatch of 12345 hotline service tickets by more than 10 times, with secondary review accuracy exceeding 90%.
- Precise Video Retrieval
  Enables natural-language search across 100,000+ front-end surveillance points, returning precise results within minutes.
- Rapid Scenario Deployment
  New scenario analysis functions can be deployed within 1 day, with manual fine-tuning completed in just 3 days.




