Войти

The neural network from DeepMind has learned to solve 604 tasks of different types

1421
0
0
Image source: DeepMind

Researchers from DeepMind have developed a multimodal neural network capable of performing various types of tasks. For example, she can control a robot, play games for Atari, write text and describe photos. The article about the algorithm is published on arXiv.org, the authors also talked about it on the DeepMind website.

In 2017, researchers from Google Brain presented the neural network architecture Transformer, a distinctive feature of which was the widespread use of the attention mechanism. This allows the neural network to understand the context of words and sentences much better, which in turn has made great progress in the field of natural language processing in general. One of the most famous examples of this progress is the GPT-3 model from OpenAI. It turned out that if you train a model on a huge array of texts, it will learn a good representation of the language and how texts should look, after which it can be quickly and on a very small amount of data to train it to a specific task. Moreover, this task does not have to be textual: it turned out that GPT-3 can perform basic arithmetic operations.

In parallel with the development of universal language models, researchers are developing multimodal models that work simultaneously with different data. Researchers from DeepMind, led by Nando de Freita, have developed a new multimodal neural network Gato, which allows you to use the Transformer architecture to solve a variety of tasks.

Since Transformer was developed for language tasks, this architecture works with text tokens. Accordingly, to work with different data, Gato turns them into tokens. The developers used four tokenization schemes. The text is tokenized in a standard way, in which subwords are allocated in words and encoded with a number from 0 to 32 thousand. The images are divided into squares (16 by 16 squares), and the pixels in them are encoded from -1 to 1, and then these squares are fed into the model line by line. Discrete values are transformed into numbers from 0 to 1024, and continuous values are sampled and transformed into a number or a set of numbers from 32000 to 33024. If necessary, tokens can also be split by dividing tokens.


How the model works with different data

Image source: Scott Reed et al. / arXiv.org, 2022


After tokenization of incoming data, each token turns into embedding (in fact, a compressed vector representation of the same data) in two ways: for images, squares are passed through a convolutional neural network of the ResNet type, and for the rest of the data they are selected through a learned search table (since any token is an integer in a limited range).


Datasets used for training

Image source: Scott Reed et al. / arXiv.org, 2022


The researchers used 24 datasets with different types of data and with their help trained the model to perform 604 tasks. At the same time, the model did not achieve record results on these tasks. In some, for example, in 23 games for Atari, it copes better than people, but this is not a new result for machine learning algorithms — in 2020, DeepMind developed an algorithm that beats people in 57 games at once. In others, it clearly does not reach the level of a person, for example, in the annotation of images:


Examples of neural network image descriptions

Image source: Scott Reed et al. / arXiv.org, 2022


In fact, DeepMind demonstrated the opposite approach: instead of creating a highly specialized model that solves a specific task or a set of related tasks better than others, the developers created a universal model that solves the most tasks, but not very qualitatively.

In addition to multimodal neural networks, researchers are also working on multimodal methods of their training, that is, a single method suitable for training specialized models for working with text, images or sound. Recently we talked about such a method developed by developers from Meta.

Grigory Kopiev

The rights to this material belong to
The material is placed by the copyright holder in the public domain
  • The news mentions
Проекты
Do you want to leave a comment? Register and/or Log in
ПОДПИСКА НА НОВОСТИ
Ежедневная рассылка новостей ВПК на электронный почтовый ящик
  • Discussion
    Update
  • 26.04 00:59
  • 1043
Without carrot and stick. Russia has deprived America of its usual levers of influence
  • 26.04 00:07
  • 0
Танцы с бубном: США, Украина и военная помощь
  • 25.04 17:21
  • 2
Суд арестовал замминистра обороны РФ, на которого ФСБ пять лет собирало материалы по коррупции
  • 25.04 15:58
  • 120
A competitor of the Russian Su-75 from South Korea was presented at the exhibition for the first time
  • 25.04 13:48
  • 3
Инженеры NASA впервые за пять месяцев смогли получить данные о состоянии зонда Voyager-1
  • 25.04 13:40
  • 463
Международные расчеты, минуя доллар, по странам
  • 25.04 12:48
  • 0
«Освей»: Космическая Беларусь берется за строительство самолетов
  • 25.04 10:13
  • 478
Израиль "готовился не к той войне" — и оказался уязвим перед ХАМАС
  • 25.04 09:30
  • 2
Striking Beam: how combat lasers are developing
  • 25.04 09:27
  • 9
Российские разведывательно-ударные мультикоптеры предложат на экспорт
  • 25.04 09:12
  • 276
Космонавтика Илона Маска
  • 25.04 09:02
  • 26
Национальная политика и миграция
  • 25.04 08:46
  • 19
Commander of the US Air Force in Europe on the role of aviation in the fighting in Ukraine
  • 25.04 05:17
  • 3
Ходаковский: важно понять, для чего мы начали СВО и как будем дальше жить с побеждённой Украиной
  • 25.04 04:48
  • 2706
Как насчёт юмористического раздела?