Like human brains, large language models reason about diverse data in a general way

While early language models could only process text, contemporary large language models now perform highly varied tasks on different types of data. For instance, LLMs can understand many languages, generate computer code, solve math problems, or answer questions about images and audio.

MIT researchers probed the inner workings of LLMs to better understand how they process such assorted data, and found evidence that they share some similarities with the human brain.

Neuroscientists believe the human brain has a “semantic hub” in the anterior temporal lobe that integrates semantic information from various modalities, such as visual data and tactile inputs. This semantic hub is connected to modality-specific “spokes” that route information to the hub. The MIT researchers found that LLMs use a similar mechanism by abstractly processing data from diverse modalities in a central, generalized way. For instance, a model that has English as its dominant language would rely on English as a central medium to process inputs in Japanese or reason about arithmetic, computer code, and so on. Furthermore, the researchers show that they can intervene in a model’s semantic hub using text in the model’s dominant language to change its outputs, even when the model is processing data in other languages.

These findings could help scientists train future LLMs that are better able to handle diverse data.

“LLMs are big black boxes. They have achieved very impressive performance, but we have very little knowledge of their internal working mechanisms. I hope this can be an early step toward better understanding how they work, so we can improve upon them and better control them when needed,” says Zhaofeng Wu, a graduate student in electrical engineering and computer science.

His co-authors include Xinyan Velocity Yu, a graduate student at the University of Southern California (USC); Dani Yogatama, an associate professor at USC; Jiasen Lu, a research scientist at Apple; and senior author Yoon Kim, an assistant professor of EECS at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Learning Representations.

Integrating diverse data

The researchers based the new study on prior work which hinted that English-centric LLMs use English to perform reasoning processes on inputs in other languages.

Wu and his collaborators expanded this idea, launching an in-depth study of the mechanisms LLMs use to process diverse data.

An LLM, which is composed of many interconnected layers, splits input text into words or sub-words called tokens. The model assigns a representation to each token, which enables it to explore the relationships between tokens and generate the next word in a sequence. In the case of images or audio, these tokens correspond to particular regions of an image or sections of an audio clip.
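
To make tokens and token representations concrete, here is a minimal sketch, assuming the Hugging Face Transformers library and a small open model (GPT-2 as a stand-in, not one of the models studied in the paper): it splits a sentence into tokens and inspects the hidden-state vectors the model assigns to those tokens at every layer.

```python
# Minimal sketch: tokenization and per-token hidden-state representations.
# Assumes Hugging Face Transformers; GPT-2 is only an illustrative placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, not the one used in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The cat sat on the mat."
inputs = tokenizer(text, return_tensors="pt")  # split the text into tokens
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states holds one tensor per layer (plus the embedding layer),
# each of shape (batch, num_tokens, hidden_dim): the model's per-token
# representations at that layer.
for layer, h in enumerate(outputs.hidden_states):
    print(f"layer {layer}: {tuple(h.shape)}")
```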

The researchers found that the model’s initial layers process data in its specific language or modality, like the modality-specific spokes in the human brain. Then, the LLM converts the tokens into modality-agnostic representations as it reasons about them throughout its internal layers, akin to how the brain’s semantic hub integrates diverse information.

The model assigns similar representations to inputs with similar meanings, regardless of their data type, including images, audio, computer code, and arithmetic problems. Even though an image and its text caption are distinct data types, because they share the same meaning, the LLM would assign them similar representations.

For instance, an English-dominant LLM “thinks” about a Chinese-text input in English before generating an output in Chinese. The model has a similar reasoning tendency for non-text inputs like computer code, math problems, or even multimodal data.

To test this hypothesis, the researchers passed a pair of sentences with the same meaning, but written in two different languages, through the model. They measured how similar the model’s representations were for each sentence.

They then conducted a second set of experiments in which they fed an English-dominant model text in a different language, like Chinese, and measured how similar its internal representation was to English versus Chinese. The researchers ran similar experiments for other data types.
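
As a very simplified illustration of this kind of measurement, the sketch below compares a model’s layer-by-layer representations of an English sentence and a Chinese sentence with the same meaning, using mean-pooled hidden states and cosine similarity. The model name, the pooling, and the metric are assumptions for illustration, not the paper’s exact procedure.

```python
# Rough sketch of comparing internal representations of parallel sentences.
# Assumptions: an English-dominant causal LM (the name below is a placeholder),
# mean-pooled hidden states as the "representation," and cosine similarity as
# the metric; the paper's exact setup may differ.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def layer_representations(text):
    """Return one mean-pooled vector per layer for a single input."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return [h.mean(dim=1).squeeze(0) for h in out.hidden_states]

english = "The weather is nice today."
chinese = "今天天气很好。"  # same meaning, written in Chinese

for layer, (e, c) in enumerate(
        zip(layer_representations(english), layer_representations(chinese))):
    sim = F.cosine_similarity(e, c, dim=0).item()
    print(f"layer {layer:2d}  cosine similarity: {sim:.3f}")
```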

They found that the model’s representations were similar for sentences with similar meanings. In addition, across many data types, the tokens the model processed in its internal layers were more like English-centric tokens than like the input data type.

“A lot of these input data types seem extremely different from language, so we were very surprised that we can probe out English tokens when the model processes, for example, mathematical or coding expressions,” says Wu.

Leveraging the semantic hub

The researchers think LLMs may learn this semantic hub strategy during training because it is an economical way to process diverse data.

“There are thousands of languages out there, but a lot of the knowledge is shared, like commonsense knowledge or factual knowledge. The model doesn’t need to duplicate that knowledge across languages,” Wu says.

The researchers also tried intervening in the model’s internal layers using English text while it was processing other languages. They found that they could predictably change the model’s outputs, even though those outputs were in other languages.
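
A generic way to picture such an intervention is activation steering: derive a direction in a middle layer’s hidden space from English text, then add it to the hidden states while the model processes a prompt in another language and observe how the output changes. The sketch below does exactly that; the model name, layer index, scaling factor, and steering texts are illustrative assumptions, and this is not necessarily the authors’ exact procedure.

```python
# Generic activation-steering sketch. Assumptions: a Llama-style model whose
# decoder layers live at model.model.layers; the layer index, scale, model
# name, and steering texts are illustrative, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative English-dominant model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 16    # assumed middle layer
ALPHA = 4.0   # assumed steering strength

def mean_hidden(text, layer):
    """Mean-pooled hidden state of `text` at the given layer."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[layer].mean(dim=1)  # shape (1, hidden_dim)

# A direction defined entirely by English text, e.g. "cold" minus "hot".
steer = mean_hidden("cold", LAYER) - mean_hidden("hot", LAYER)

def add_direction(module, inputs, output):
    # Add the English-derived direction to every token position at this layer.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.model.layers[LAYER].register_forward_hook(add_direction)
prompt = tok("今天的天气", return_tensors="pt")  # a non-English prompt
generated = model.generate(**prompt, max_new_tokens=20)
handle.remove()
print(tok.decode(generated[0], skip_special_tokens=True))
```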

Scientists could leverage this phenomenon to encourage the model to share as much information as possible across diverse data types, potentially boosting efficiency.

On the other hand, there could be concepts or knowledge that do not translate across languages or data types, such as culturally specific knowledge. Scientists might want LLMs to retain some language-specific processing mechanisms in those cases.

“How do you maximally share whenever possible, but also allow languages to have some language-specific processing mechanisms?” Wu asks.

In addition, researchers could use these insights to improve multilingual models. Often, an English-dominant model that learns to speak another language loses some of its accuracy in English. A better understanding of an LLM’s semantic hub could help researchers prevent this kind of language interference, he says.

“Understanding how language models process inputs across languages and modalities is a key question in artificial intelligence. This paper makes an interesting connection to neuroscience, showing that the proposed ‘semantic hub hypothesis’ holds in modern language models, where semantically similar representations of different data types are created in the model’s internal layers. The hypothesis and experiments tie together and extend findings from previous works, and could influence future research on creating better multimodal models and on studying the links between them and brain function and cognition in humans.”

This research was funded, in part, by the MIT-IBM Watson AI Lab.
