Since RankBrain is a machine learning system, it draws on its experience with previous search queries, establishes connections, and predicts what the user is looking for and how best to answer their question. This is necessary to resolve ambiguities and to derive the meaning of previously unknown terms (e.g., neologisms).
Google won’t reveal, however, how the AI system masters this challenge. SEO experts suggest that RankBrain uses word vectors to translate search queries into a form whose meaning computers can interpret.
In 2013, Google released Word2Vec, an open-source machine learning tool that maps words to mathematical representations in which semantic relations between words can be measured and compared. The analysis is based on linguistic text corpora.
In the first step, Word2Vec creates an n-dimensional vector space in which each word of the underlying text corpus (the ‘training data’) is represented as a vector, in order to ‘learn’ the contexts in which words appear. Here, n is the number of dimensions used to represent a word: the more dimensions chosen for the word vectors, the more relations to other words the program can capture.
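To make this concrete, here is a minimal sketch using the open-source gensim library (version 4.x), whose Word2Vec implementation follows Google’s original code. The toy corpus and all parameter values are illustrative assumptions, not details from the original system:

```python
from gensim.models import Word2Vec

# Each sentence is a list of tokens; a real corpus would be far larger.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
    ["the", "cat", "chases", "the", "mouse"],
]

# vector_size is the "n" discussed above: the number of dimensions each
# word vector gets. More dimensions can capture more relations to other
# words, at the cost of needing more training data and compute.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, seed=42)

vec = model.wv["king"]   # the 50-dimensional vector representing "king"
print(vec.shape)         # -> (50,)
```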
In the second step, the created vector space is fed into an artificial neural network (ANN), which adapts it by means of a learning algorithm. As a result, words that are used in the same context end up with similar word vectors. The similarity between two word vectors is calculated as the so-called cosine similarity, a value between -1 and +1.
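The cosine similarity itself is simple to compute: it is the cosine of the angle between two vectors. The following self-contained sketch uses numpy and hand-picked 3-dimensional vectors purely for illustration (real word vectors have far more dimensions, and their values are learned, not chosen):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two word vectors: +1 means they point
    the same way (words used in very similar contexts), 0 means they are
    unrelated, -1 means they point in opposite directions."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative vectors: "king" and "queen" are deliberately near-parallel.
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.82, 0.15])
ball  = np.array([-0.30, 0.10, 0.90])

print(cosine_similarity(king, queen))  # ~0.998, close to +1
print(cosine_similarity(king, ball))   # ~-0.09, near zero
```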
In short: given an arbitrary text corpus as input, Word2Vec delivers corresponding word vectors as output, which make it possible to assess the semantic proximity of the words contained in the corpus. When Word2Vec is confronted with new input, the learning algorithm allows the program to adapt its vector space, creating new meanings or revising old assumptions: the neural network is ‘trained’.
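Continuing the gensim sketch above, both behaviours can be shown: querying the model for semantically close words, and updating it with previously unseen input. The API calls are gensim 4.x; the word “doomscrolling” is simply an assumed example of a neologism:

```python
# Words used in similar contexts end up with similar vectors. (On the
# tiny toy corpus the neighbours are noisy; a real corpus gives far
# more meaningful results.)
print(model.wv.most_similar("king", topn=3))

# Feeding in new sentences lets the network adjust its vector space,
# e.g. to place a previously unknown term among related words.
new_sentences = [
    ["she", "spent", "the", "night", "doomscrolling", "on", "her", "phone"],
]
model.build_vocab(new_sentences, update=True)
model.train(new_sentences, total_examples=len(new_sentences),
            epochs=model.epochs)

print(model.wv["doomscrolling"].shape)  # the new term now has a vector
```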