Autonomous EV Developer Turing Releases Multimodal Learning Library “Heron” and Large-Scale Models with up to 70 Billion Parameters
Creating Composite Japanese/English Image-Language Models
Turing Inc. (Kashiwa City, Chiba; CEO: Issei Yamamoto; hereinafter “Turing”), a company committed to the development and sales of fully autonomous electric vehicles, announces the release of the multilingual large-scale multimodal learning library “Heron” and models that were trained with it. Turing is developing AI models capable of interpreting visual information and converting it into human-like language for advanced autonomous driving applications. The newly released multimodal models have up to 70 billion parameters and utilize technology and insights from Turing, furthering the development of fully autonomous driving.
About Multimodal:
Large Language Models (LLMs) are trained using large amounts of text data to acquire broad knowledge and produce human-like responses. The inputs and outputs of these LLMs are typically confined to text, posing challenges for tasks involving visual information (images, etc.).
For example, to answer the question, “What is interesting about this photo of a cat lying on a bathroom sink?”, the model must understand both image and language inputs. Handling multiple types of input in this way is referred to as “multimodal” input.
The released multimodal models consist of a “pre-trained image encoder” for image recognition, a “large-scale language model,” and an “adapter” that connects these two components. After the adapter is trained, further training is conducted on the image encoder and the large-scale language model, allowing the system to recognize images and draw on the language model’s extensive knowledge when responding.
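The encoder–adapter–LLM pipeline described above can be sketched as follows. This is an illustrative toy example, not Heron’s actual code: the dimensions, the two-layer MLP adapter, and the random weights (standing in for trained parameters) are all assumptions chosen to show the data flow only.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_adapter(vision_dim: int, llm_dim: int):
    """Hypothetical adapter: a small MLP that projects image-encoder
    features into the language model's embedding space."""
    # Random weights stand in for parameters learned during training.
    W1 = rng.standard_normal((vision_dim, llm_dim)) * 0.02
    W2 = rng.standard_normal((llm_dim, llm_dim)) * 0.02

    def adapter(features: np.ndarray) -> np.ndarray:
        hidden = np.maximum(features @ W1, 0.0)  # linear + ReLU
        return hidden @ W2                       # project to llm_dim
    return adapter

# Toy dimensions for illustration only.
adapter = make_adapter(vision_dim=32, llm_dim=64)
image_features = rng.standard_normal((2, 16, 32))   # mock image-encoder output
visual_tokens = adapter(image_features)             # shape (2, 16, 64)
text_embeddings = rng.standard_normal((2, 8, 64))   # mock LLM token embeddings

# The language model then attends over the visual tokens alongside the
# text tokens, so image content can inform the generated response.
llm_input = np.concatenate([visual_tokens, text_embeddings], axis=1)
print(llm_input.shape)  # (2, 24, 64)
```

Because the adapter is the only new component between two pre-trained models, it can be trained first on its own before the encoder and language model are fine-tuned further, as described above.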
About Multimodal Learning Library ‘Heron’:
Turing’s multimodal learning library “Heron” connects image recognition models with large-scale language models, providing datasets, code for additional training, and pre-trained models.
A key feature of Heron’s model training is its use of natural dialogue-based datasets. Previous multimodal models could only provide simple responses, but models developed using Heron can generate natural sentences, understanding and responding to the context of previous questions.
The training library allows flexible modification of the large-scale language models, leveraging the capabilities of existing models while being adaptable to new large-scale models as they are developed and released. The library is designed to systematically learn multimodal models, and the source code has been made available under the Apache License 2.0 for research and commercial use.
The released Heron-trained multimodal models are built on Llama 2-chat, ELYZA-Llama 2, and Japanese StableLM.
A demo of these models can be tested in a web browser at:
https://huggingface.co/spaces/turing-motors/heronchatblip
Turing has also released large-scale Japanese image/text datasets, created by translating approximately 150,000 English image/text pairs, together with their annotations and Q&A, into Japanese. This marks the first publication of a large-scale Japanese dataset designed for dialogue-style multimodal learning.
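The dialogue-style structure of such a dataset can be illustrated with a record in the LLaVA-Instruct style that the released dataset is derived from. The field names and the sample content below are assumptions based on the general LLaVA format, not an excerpt from the actual dataset.

```python
import json

# Hypothetical record: a multi-turn image/text conversation.
# The "<image>" token marks where the image is supplied to the model.
record = {
    "id": "000000123456",
    "image": "000000123456.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat is shown in this photo?"},
        {"from": "gpt", "value": "A cat lying on a bathroom sink."},
        {"from": "human", "value": "What is interesting about it?"},
        {"from": "gpt", "value": "The cat has chosen a sink as a place to rest."},
    ],
}

# Multi-turn records like this let a model learn to answer follow-up
# questions in context, rather than only isolated Q->A pairs.
turns = [(t["from"], t["value"]) for t in record["conversations"]]
print(len(turns))          # 4
print(json.dumps(record, ensure_ascii=False)[:40])
```

This context-carrying structure is what allows Heron-trained models to respond to follow-up questions naturally, as noted above.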
Training Library Public URL:
https://github.com/turingmotors/heron
Public URL for Multimodal Model Groups:
https://huggingface.co/turing-motors
Public URL for Training Dataset:
https://huggingface.co/datasets/turing-motors/LLaVA-Instruct-150K-JA
The Relationship between LLMs and Fully Autonomous Driving
In recent years, as AI technology has advanced, Large Language Models (LLMs) have garnered significant attention. LLMs are AI models that learn from vast amounts of text data and can generate natural, human-like text or answer questions. Turing believes that achieving fully autonomous driving requires autonomous driving AI that understands the world as well as, or better than, humans. Turing is therefore advancing the development of multimodal models built around LLMs, which comprehend the world at an extremely high level through language.
Reference Releases:
https://prtimes.jp/main/html/rd/p/000000024.000098132.html
https://prtimes.jp/main/html/rd/p/000000032.000098132.html
About Turing
Turing is a startup with the stated mission “We Overtake Tesla,” and its goal is to mass-produce fully autonomous EVs. The company was co-founded in 2021 by Issei Yamamoto, developer of the AI shogi program “Ponanza” and a specially appointed associate professor at Nagoya University, and Shunsuke Aoki, who earned his Ph.D. researching autonomous driving at Carnegie Mellon University. Turing is applying deep learning AI technology to realize a society with fully autonomous driving.
Company Overview
Company Name: Turing Inc.
CEO: Issei Yamamoto
Established: August 2021
Capital: 30 million yen (as of September 2022)
Headquarters: 4th Floor, East Tower, Gate City Osaki, 1-11-2 Osaki, Shinagawa-ku, Tokyo
Business: Development and manufacturing of fully autonomous EVs
URL: https://tur.ing
Recruitment Information
We are seeking team members to join us in creating fully autonomous driving software and EVs.
Recruitment page: https://tur.ing/jobs
Media Contact
Turing Inc. Public Relations: pr@turing-motors.com