
DeepMind's neural network has learned to perform 604 tasks of different types

Image source: DeepMind

Researchers from DeepMind have developed a multimodal neural network capable of performing many different types of tasks. For example, it can control a robot, play Atari games, write text, and describe photos. The paper describing the algorithm is published on arXiv.org, and the authors also described it on the DeepMind website.

In 2017, researchers from Google Brain introduced the Transformer neural network architecture, whose distinctive feature is its heavy reliance on the attention mechanism. Attention lets the network capture the context of words and sentences much better, which in turn led to major progress across natural language processing. One of the best-known examples of this progress is OpenAI's GPT-3 model. It turned out that a model trained on a huge corpus of text learns a good representation of the language and of what texts should look like, after which it can be adapted to a specific task quickly and with very little data. Moreover, the task does not have to be purely textual: GPT-3 turned out to be able to perform basic arithmetic.

In parallel with the development of general-purpose language models, researchers are building multimodal models that work with several types of data at once. Researchers from DeepMind, led by Nando de Freitas, have developed Gato, a new multimodal neural network that uses the Transformer architecture to solve a wide variety of tasks.

Since the Transformer was designed for language tasks, the architecture operates on text tokens. To handle other kinds of data, Gato therefore converts them into tokens as well. The developers used four tokenization schemes. Text is tokenized in the standard way: words are split into subwords, each encoded as an integer from 0 to 32,000. Images are divided into patches of 16 by 16 pixels, the pixel values in each patch are scaled to the range from -1 to 1, and the patches are fed into the model row by row. Discrete values are converted into integers from 0 to 1024, and continuous values are discretized and converted into one or more integers from 32,000 to 33,024. Where necessary, tokens can also be delimited by separator tokens.
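The non-text schemes can be illustrated with a minimal sketch. It is not DeepMind's code: the function names are hypothetical, continuous values are assumed to be clipped to [-1, 1] before uniform binning, and images are assumed to be grayscale grids of 0..255 values whose sides are multiples of 16.

```python
TEXT_VOCAB_SIZE = 32000  # text subword ids occupy [0, 32000)
NUM_BINS = 1024          # discretized values are shifted to [32000, 33024)


def tokenize_discrete(values):
    """Pass discrete values (e.g. button presses) through as integers in [0, 1024)."""
    for v in values:
        if not 0 <= v < NUM_BINS:
            raise ValueError(f"discrete value out of range: {v}")
    return list(values)


def tokenize_continuous(values, low=-1.0, high=1.0):
    """Clip each value to [low, high] (an assumption), bin it uniformly,
    and shift the bin index past the text vocabulary."""
    tokens = []
    for v in values:
        clipped = min(max(v, low), high)
        bin_index = int((clipped - low) / (high - low) * NUM_BINS)
        bin_index = min(bin_index, NUM_BINS - 1)  # the top edge falls in the last bin
        tokens.append(TEXT_VOCAB_SIZE + bin_index)
    return tokens


def image_to_patches(image, patch=16):
    """Cut a grid of 0..255 pixel values into patch x patch squares in raster
    order (left to right, top to bottom) and scale pixels to [-1, 1]."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                [image[r][c] / 127.5 - 1.0 for c in range(left, left + patch)]
                for r in range(top, top + patch)
            ])
    return patches
```

For instance, `tokenize_continuous([-1.0, 0.0, 1.0])` gives `[32000, 32512, 33023]`, and a 32 by 32 image yields four patches.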


How the model works with different data

Image source: Scott Reed et al. / arXiv.org, 2022


After the incoming data is tokenized, each token is converted into an embedding (essentially a compact vector representation of the same data) in one of two ways: image patches are passed through a ResNet-style convolutional neural network, while tokens for all other data are mapped to vectors through a learned lookup table (possible because every token is an integer in a limited range).
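The two embedding paths might look roughly like the sketch below. All names and sizes here are illustrative assumptions: the table is randomly initialized rather than learned, the embedding width is tiny, and a single linear projection stands in for the ResNet, which is far simpler than the network the article describes.

```python
import random

VOCAB_SIZE = 33024  # assumed: covers the token ranges quoted in the article
EMBED_DIM = 8       # illustrative; real models use much wider vectors

rng = random.Random(0)
# One row per token id. Random here; in a real model the rows are trained.
embedding_table = [
    [rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed_tokens(token_ids):
    """Lookup-table path: each integer token simply selects its row."""
    return [embedding_table[t] for t in token_ids]


def embed_patch(patch, weights):
    """Image path, with a linear projection as a stand-in for the ResNet:
    flatten the patch and take one dot product per output dimension."""
    flat = [pixel for row in patch for pixel in row]
    return [sum(w * x for w, x in zip(w_row, flat)) for w_row in weights]
```

The lookup works only because every non-image token is an integer in a bounded range, so it can be used directly as a row index.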


Datasets used for training

Image source: Scott Reed et al. / arXiv.org, 2022


The researchers used 24 datasets with different types of data to train the model to perform 604 tasks. The model did not set records on these tasks. In some, such as 23 Atari games, it outperforms humans, but this is not a new result for machine learning algorithms: in 2020, DeepMind developed an algorithm that beats humans in 57 games at once. In others it clearly falls short of human level, for example in image captioning:


Examples of neural network image descriptions

Image source: Scott Reed et al. / arXiv.org, 2022


In effect, DeepMind demonstrated the opposite approach: instead of building a highly specialized model that solves one task or a set of related tasks better than any other, the developers created a general-purpose model that handles a very large number of tasks, though not particularly well.

In addition to multimodal neural networks, researchers are also working on multimodal training methods, that is, a single method suitable for training specialized models for text, images, or sound. We recently wrote about one such method, developed by researchers at Meta.

Grigory Kopiev

The rights to this material belong to
The material is placed by the copyright holder in the public domain