World’s Top-Performing Japanese VLM at 15 Billion Parameters: Turing Releases “Heron-NVILA-Lite-15B”; 2B Model Runs Fast on iPhone. Turing Also Releases Part of STRIDE-QA, One of the World’s Largest Autonomous Mobility Datasets
Turing Inc. (Head Office: Shinagawa, Tokyo; CEO: Issei Yamamoto, hereinafter “Turing”) has released a new family of Japanese visual-language models: Heron-NVILA-Lite-15B/2B/1B. The 15B model, boasting 15 billion parameters, achieved a score of 73.5 on the Heron-Bench benchmark and surpassed other open-source models of similar scale across multiple Japanese visual-language evaluations. The 2B model, with 2 billion parameters, enables fully local and high-speed inference on iPhone devices.
This research was conducted as part of the Japanese Ministry of Economy, Trade and Industry (METI) and NEDO’s GENIAC (Generative AI Accelerator Challenge) program supporting generative AI research in Japan.
Turing has also released several major research assets, including:
- MOMIJI, the world’s largest※2 interleaved※1 Japanese vision–language dataset.
- STRIDE-QA, one of the world’s largest 3D datasets for autonomous mobility, combining language with spatio-temporal information.
※1: A format that preserves the original ordering of text and images, allowing models to learn the connections between text, images, and their surrounding context.
※2: According to Turing’s own research, this is the largest interleaved Japanese image–text dataset known to date.
Background
Autonomous driving requires perception and decision-making systems that can instantly interpret real-world environments for safe operation. At the core of such systems are multimodal large language models (MLLMs), which acquire human-like common sense, background knowledge, and contextual understanding by learning from data such as images (vision) and text (language), and embodied multimodal foundation models, which build on these MLLMs to learn everything from real-world sensor inputs through to control outputs. However, high-quality Japanese training data that spans both vision and language is scarce, and there are few published examples of lightweight, high-performance MLLMs and embodied multimodal foundation models designed for in-vehicle deployment.
Turing has conducted multiple research and development initiatives as part of the GENIAC program, including the advancement of multimodal models, the construction of autonomous mobility datasets incorporating three-dimensional information, and the development of embodied autonomous driving models. The resulting model files and source code are now publicly available.
About Heron-NVILA-Lite-15B
Heron-NVILA-Lite-15B is a 15B-parameter open-source Japanese visual-language model designed to understand contextual and cultural background in Japanese. It scored 73.5 on the Heron-Bench benchmark for Japanese image-language tasks, outperforming other public models of similar scale (as of May 2025, per Turing’s evaluation).
This model uses an interleaved training format, where text and images are alternated during pre-training. This technique has proven effective in Japanese-language modeling.
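The interleaved idea can be pictured as flattening a web page's alternating text and images into a single sequence, so the model sees each image in its surrounding textual context. The sketch below is purely illustrative: the `<image:…>` placeholder convention and the toy whitespace tokenizer are assumptions for explanation, not Heron-NVILA-Lite's actual preprocessing code.

```python
# Illustrative sketch of an "interleaved" training sample: text and image
# segments alternate in one sequence, so the model can learn cross-references
# between a passage and the images embedded in it. Placeholder tokens and the
# whitespace "tokenizer" are toy assumptions, not the real pipeline.

def build_interleaved_sequence(segments):
    """Flatten alternating ("text", str) / ("image", id) segments into tokens.

    Images become placeholder tokens that a vision encoder would later
    replace with patch embeddings.
    """
    tokens = []
    for kind, payload in segments:
        if kind == "text":
            tokens.extend(payload.split())       # toy whitespace tokenizer
        elif kind == "image":
            tokens.append(f"<image:{payload}>")  # stand-in for patch embeddings
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return tokens

sample = [
    ("text", "A photo of Mount Fuji in spring :"),
    ("image", "img_001"),
    ("text", "The cherry blossoms in the foreground bloom in April ."),
]
seq = build_interleaved_sequence(sample)
print(seq)
```

Because ordering is preserved, the image placeholder lands between the sentence that introduces it and the sentence that describes it, which is exactly the contextual signal the interleaved format is meant to provide.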
Model: Heron-NVILA-Lite-15B on Hugging Face
Training details: Turing Tech Blog
About the Heron App for iOS
Turing also developed the Heron App for iOS, an image-analysis AI app capable of fast local inference directly on smartphones. The app was optimized for offline performance on mobile devices by scaling the model down to 2 billion parameters.
The app will be released on the App Store soon. Technical details and optimization strategies will be shared via Turing’s Tech Blog.
About the MOMIJI Dataset
MOMIJI (Modern Open Multimodal Japanese-filtered Dataset) is the world’s largest pre-training dataset for Japanese visual-language models in interleaved format. The dataset was released as 249 million image URLs paired with text in JSONL format. Further details will be provided in an upcoming post on our tech blog.
Hugging Face: https://huggingface.co/datasets/turing-motors/MOMIJI
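A JSONL release of this kind is typically consumed line by line, with each line holding one document's image URLs and text segments. The field names below (`image_urls`, `texts`) are illustrative assumptions, not MOMIJI's actual schema; consult the dataset card for the real record layout.

```python
import json

# Hypothetical MOMIJI-style JSONL record. Field names are assumptions for
# illustration only; the real schema is documented on the dataset card.
record_line = json.dumps({
    "image_urls": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
    "texts": ["Opening paragraph ...", "Caption between images ...", "Closing text ..."],
})

def iter_records(lines):
    """Parse JSONL lines into (image_urls, texts) pairs, skipping bad rows."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate malformed lines in a web-scale dump
        yield rec["image_urls"], rec["texts"]

for urls, texts in iter_records([record_line]):
    print(len(urls), len(texts))
```

Releasing URLs rather than image bytes is a common pattern for web-scale datasets: it keeps the distribution small and leaves image fetching (and any re-filtering) to the downstream user.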
About the STRIDE-QA Dataset
STRIDE-QA (SpatioTemporal Reasoning In Driving Environments QA) is one of the world’s largest 3D datasets for autonomous mobility. It was constructed by extracting 100 hours and 20,000 scenes from more than 3,500 hours of driving data collected by Turing in central Tokyo using cameras, LiDAR, and other sensors.
Each scene is annotated with consistent IDs and 3D bounding boxes for all traffic objects, such as vehicles and pedestrians, enabling continuous spatial and temporal tracking of each object. A total of 12.63 million question–answer pairs have been generated, including both object-centric and ego-vehicle-centric questions such as “Is there a pedestrian on the crosswalk?” and “What will be the distance to the car ahead in two seconds?” These allow evaluation of an AI system’s ability to describe situations and predict future developments.
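To make the ego-centric prediction questions concrete, here is a toy sketch of how an answer like “distance to the car ahead in two seconds” could be derived from 3D annotations. The constant-velocity assumption and the position/velocity layout are illustrative simplifications, not STRIDE-QA's actual QA-generation pipeline.

```python
import math

# Toy sketch: answer an ego-centric spatio-temporal question from 3D
# annotations. Positions and velocities are in the ego frame (meters,
# meters/second); constant velocity is assumed for the prediction horizon.
# This is an illustration, not the dataset's real generation code.

def predicted_distance(position, velocity, horizon_s):
    """Euclidean distance from the ego vehicle after horizon_s seconds."""
    future = [p + v * horizon_s for p, v in zip(position, velocity)]
    return math.sqrt(sum(c * c for c in future))

# A car 20 m directly ahead, closing at 2 m/s: distance in two seconds.
d = predicted_distance(position=(20.0, 0.0, 0.0),
                       velocity=(-2.0, 0.0, 0.0),
                       horizon_s=2.0)
print(round(d, 1))  # 16.0
```

Because every object carries a consistent ID across frames, velocities like the one above can be estimated from consecutive annotated positions, which is what makes temporally grounded questions answerable at scale.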
STRIDE-QA-mini, a partial version of the dataset that includes 200 scenes and approximately 120,000 Q&A pairs, has already been released to academic institutions. The full version is planned for a future release.
Hugging Face: https://huggingface.co/datasets/turing-motors/STRIDE-QA-Mini
The results presented in this press release are based on research supported by GENIAC (Generative AI Accelerator Challenge), a project led by Japan’s Ministry of Economy, Trade and Industry (METI) and the New Energy and Industrial Technology Development Organization (NEDO), aimed at strengthening domestic generative AI capabilities.
Reference press release:https://tur.ing/en/news/20241010
Company Overview
Company Name: Turing Inc.
Headquarters: East Tower 4th floor, Gate City Ohsaki, 1-11-2 Osaki, Shinagawa-ku, Tokyo
CEO: Issei Yamamoto
Founded: August 2021
Business: Development of fully autonomous driving technologies
URL: https://tur.ing/
Careers
Turing is hiring individuals passionate about transforming the world through fully autonomous driving technology developed in Japan. Please visit our careers page to learn more. We also host regular events such as open office sessions and Tech Talks.
Media Contact
PR Representative (Abe): pr@turing-motors.com