18.05.2022

The neural network from DeepMind has learned to solve 604 tasks of different types

Image source: DeepMind

Researchers from DeepMind have developed a multimodal neural network capable of performing many different types of tasks. For example, it can control a robot, play Atari games, write text, and describe photos. The paper describing the algorithm is published on arXiv.org, and the authors also discussed it on the DeepMind website.

In 2017, researchers from Google Brain introduced the Transformer neural network architecture, whose distinctive feature is its extensive use of the attention mechanism. Attention lets the network capture the context of words and sentences much better, which in turn has led to great progress in natural language processing as a whole. One of the best-known examples of this progress is OpenAI's GPT-3 model. It turned out that a model trained on a huge corpus of text learns a good representation of the language and of what texts should look like, after which it can be adapted to a specific task quickly and with very little data. Moreover, the task does not have to be purely linguistic: GPT-3 turned out to be able to perform basic arithmetic.

In parallel with the development of universal language models, researchers are building multimodal models that work with several types of data at once. Researchers from DeepMind, led by Nando de Freitas, have developed a new multimodal neural network, Gato, which uses the Transformer architecture to solve a wide variety of tasks.

Since the Transformer was developed for language tasks, the architecture works with text tokens. Accordingly, to handle different kinds of data, Gato converts all of them into tokens. The developers used four tokenization schemes. Text is tokenized in the standard way: words are split into subwords, each encoded as a number from 0 to 32000. Images are divided into square patches of 16 by 16 pixels, the pixel values in them are normalized to the range from -1 to 1, and the patches are then fed into the model row by row. Discrete values are converted into numbers from 0 to 1024, and continuous values are discretized and converted into a number or a set of numbers from 32000 to 33024. Where necessary, groups of tokens can also be separated by separator tokens.
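The image and continuous-value schemes can be illustrated with a minimal Python sketch. It is not DeepMind's code: the function names are hypothetical, the mu-law compression step and its constants (mu = 100, m = 256) are assumptions that the text above does not state, and pixels are assumed to be 8-bit values from 0 to 255.

```python
import math

TEXT_VOCAB_SIZE = 32000  # text subword ids occupy 0..31999
NUM_BINS = 1024          # bins for discretized continuous values


def mu_law(x, mu=100.0, m=256.0):
    """Compress a real value into [-1, 1]; mu and m are assumed constants."""
    y = math.copysign(math.log(abs(x) * mu + 1.0) / math.log(m * mu + 1.0), x)
    return max(-1.0, min(1.0, y))  # clip anything that still falls outside


def tokenize_continuous(values):
    """Map floats to integer tokens in the range 32000..33023."""
    tokens = []
    for v in values:
        y = mu_law(v)
        # Uniformly split [-1, 1] into NUM_BINS bins, then shift past text ids.
        bin_index = min(int((y + 1.0) / 2.0 * NUM_BINS), NUM_BINS - 1)
        tokens.append(TEXT_VOCAB_SIZE + bin_index)
    return tokens


def image_to_patches(image, patch=16):
    """Split a grayscale image (list of rows of 0..255 ints) into
    patch x patch squares in row-by-row order, scaling pixels to [-1, 1]."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                [image[r][c] / 127.5 - 1.0 for c in range(left, left + patch)]
                for r in range(top, top + patch)
            ])
    return patches
```

In this sketch a value of 0.0 lands in the middle bin, and very large positive or negative values saturate at the two ends of the range, so every continuous input maps to a token that cannot collide with a text token.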

How the model works with different data

Image source: Scott Reed et al. / arXiv.org, 2022

After the incoming data is tokenized, each token is converted into an embedding (in essence, a compact vector representation of the same data) in one of two ways: image patches are passed through a ResNet-type convolutional neural network, while all other tokens are looked up in a learned embedding table (possible because every such token is an integer in a limited range).
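The table lookup for non-image tokens can be sketched in a few lines of Python. This is an illustration only: the function names, the embedding dimension, and the random initialization are assumptions, and in a real model the table entries would be learned during training. The ResNet path for image patches is not shown.

```python
import random

VOCAB_SIZE = 33024  # text ids plus discretized-value ids, per the ranges above
EMBED_DIM = 8       # illustrative only; real models use far larger vectors


def make_embedding_table(vocab_size, dim, seed=0):
    """Create one vector per token id (random here, learned in practice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]


def embed(tokens, table):
    """Replace each integer token with its row of the table."""
    return [table[t] for t in tokens]


table = make_embedding_table(VOCAB_SIZE, EMBED_DIM)
vectors = embed([17, 32512, 17], table)  # the same id always yields the same vector
```

Because every token is an integer in a known range, the lookup is plain indexing, which is why a single table can serve text, discrete, and discretized continuous inputs alike.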

Datasets used for training

Image source: Scott Reed et al. / arXiv.org, 2022

The researchers used 24 datasets containing different types of data and trained the model on them to perform 604 tasks. The model did not set records on these tasks. On some, such as 23 Atari games, it outperforms humans, but this is not a new result for machine learning algorithms: in 2020, DeepMind developed an algorithm that beats humans at 57 games at once. On others, such as image captioning, it clearly falls short of human level:

Examples of neural network image descriptions

Image source: Scott Reed et al. / arXiv.org, 2022

In effect, DeepMind demonstrated the opposite approach: instead of building a highly specialized model that solves one task, or a set of related tasks, better than any other, the developers built a universal model that solves a very large number of tasks, though not especially well.

Besides multimodal neural networks, researchers are also working on multimodal training methods, that is, a single method suitable for training specialized models that work with text, images, or sound. We recently wrote about one such method developed by researchers at Meta.

Grigory Kopiev

The rights to this material belong to N+1
The material is placed by the copyright holder in the public domain
