How To Find Similarity Between Vectors In GloVe Embeddings

Last time, we saw how autoencoders are used to learn a latent embedding space: an alternative, low-dimensional representation of a set of data with some appealing properties. For example, we saw that interpolating in the latent space is a way of generating new examples. In particular, interpolation in the latent space generates more compelling examples than, say, interpolating in the raw pixel space.
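
As a reminder of what interpolating in the latent space means, here is a minimal sketch in plain Python. The two embedding vectors are hypothetical values chosen for illustration, and the decoder that would turn each interpolated point back into an image is not shown.

```python
def interpolate(z_start, z_end, steps):
    """Return `steps` evenly spaced points on the line from z_start to z_end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first point, 1.0 at the last
        points.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return points


# Hypothetical 2-dimensional embeddings of two examples.
z_a = [0.0, 2.0]
z_b = [4.0, -2.0]

# Five points along the straight line from z_a to z_b in the latent space.
path = interpolate(z_a, z_b, steps=5)
```

In an autoencoder setting, each point in `path` would be fed to the decoder to produce a new example that lies "between" the two originals.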

The idea of learning an alternative representation/features/embeddings of data is a prevalent one in machine learning. You already saw how we used features computed by AlexNet as a component of a model. Good representations will make downstream tasks (like generating new data, clustering, computing distances) perform much better.

With autoencoders, we were able to learn a representation of MNIST digits. In lab 4, we use an autoencoder to learn a representation of a census record. In both cases, we used a model that looks like this:

  • Encoder: data -> embedding
  • Decoder: embedding -> data

This type of architecture works well for certain types of data (e.g. images) that are easy to generate, and whose meaning is encoded in the input data representation (e.g. the pixels). Such architectures can be, and have been, used to learn embeddings for things like faces, books, and even molecules!

But what if we want to train an embedding on words? Words are different from images or even molecules, in that the meaning of a word is not represented by the letters that make up the word (the way that the meaning of an image is represented by the pixels that make up the image). Instead, the meaning of a word comes from how it is used in conjunction with other words.

word2vec models

A word2vec model learns embeddings of words using the following architecture:

  • Encoder: word -> embedding
  • Decoder: embedding -> nearby words (context)
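
To make the two components concrete, here is a minimal sketch of one common (skip-gram style) formulation: the encoder is a lookup table with one vector per word, and the decoder scores every vocabulary word as a possible context word and normalizes the scores with a softmax. The vocabulary, the dimension, and the random untrained weights are all assumptions for illustration; this is not a trained model.

```python
import math
import random

vocab = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary (hypothetical)
word_to_index = {word: i for i, word in enumerate(vocab)}
embedding_dim = 3

rng = random.Random(0)
# Encoder parameters: one embedding vector per word (a lookup table).
embeddings = [[rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab]
# Decoder parameters: one output vector per word, used to score context words.
output_vectors = [[rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab]


def encode(word):
    """Encoder: word -> embedding (a table lookup)."""
    return embeddings[word_to_index[word]]


def decode(embedding):
    """Decoder: embedding -> probability of each vocabulary word being nearby."""
    scores = [sum(e * o for e, o in zip(embedding, out)) for out in output_vectors]
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


context_probs = decode(encode("cat"))
```

Training would adjust both tables so that words that actually appear near "cat" in the corpus receive high probability; after training, only the encoder's table is kept as the word embeddings.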

Specific word2vec models differ in which “nearby words” the decoder predicts: is it the 3 context words that appeared before the input word? Is it the 3 words that appeared after? Or is it a combination of the two words that appeared before and the two words that appeared after the input word?
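
The three choices of context can be sketched as follows. The helper `context_pairs` and the example sentence are hypothetical and only illustrate how (input word, context words) training pairs could be extracted from text.

```python
def context_pairs(tokens, before, after):
    """Pair each word with up to `before` preceding and `after` following words."""
    pairs = []
    for i, word in enumerate(tokens):
        left = tokens[max(0, i - before):i]
        right = tokens[i + 1:i + 1 + after]
        pairs.append((word, left + right))
    return pairs


sentence = "the quick brown fox jumps over the lazy dog".split()

three_before = context_pairs(sentence, before=3, after=0)
three_after = context_pairs(sentence, before=0, after=3)
two_each_side = context_pairs(sentence, before=2, after=2)
```

For the input word "fox", the three variants yield the contexts "the quick brown", "jumps over the", and "quick brown jumps over" respectively. Words near the start or end of the sentence simply get a shorter context.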

These models are trained using a large corpus of text: for example the whole of Wikipedia or a large collection of news articles. We won’t train our own word2vec models in this course, so we won’t talk about the many considerations involved in training a word2vec model.

Instead, we will use a set of pre-trained word embeddings. These are embeddings that someone else took the time and computational power to train. One of the most commonly used sets of pre-trained word embeddings is the GloVe embeddings.

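GloVe is closely related to word2vec models: it also learns word embeddings from the contexts in which words appear, although it is trained on aggregated word co-occurrence counts from a corpus rather than one context window at a time. Again, the specifics of the algorithm and its training are beyond the scope of this course. You should think of GloVe embeddings similarly to pre-trained AlexNet weights. More information about GloVe is available here: https://nlp.stanford.edu/projects/glove/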

Unlike AlexNet, there are several variations of GloVe embeddings. They differ in the corpus used to train the embedding, and the size of the embeddings.

GloVe Embeddings

To load pre-trained GloVe embeddings, we’ll use a package called torchtext. The package torchtext contains other useful tools for working with text that we will see later in the course. The documentation for torchtext GloVe vectors is available at: https://torchtext.readthedocs.io/en/latest/vocab.html#glove

We’ll begin by loading a set of GloVe embeddings. The first time you run the code below, Python will download a large file (862MB) containing the pre-trained embeddings.
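
A minimal sketch of the loading step, assuming a torchtext version that provides `torchtext.vocab.GloVe`. The wrapper `load_glove` is a hypothetical helper, and the `"6B"` variant and 50-dimensional vectors are assumptions chosen for illustration; other variants and sizes exist.

```python
def load_glove(name="6B", dim=50):
    """Load pre-trained GloVe vectors through torchtext.

    `name` selects the training corpus variant and `dim` the embedding size.
    Both defaults are illustrative assumptions, not the only valid choices.
    The import happens inside the function so that defining it does not
    require torchtext and does not trigger any download.
    """
    from torchtext.vocab import GloVe

    # On first use, torchtext downloads and caches the embedding files.
    return GloVe(name=name, dim=dim)
```

Calling `glove = load_glove()` performs the download if the files are not already cached. After that, indexing with a word, as in `glove["cat"]`, returns that word's embedding as a tensor with `dim` entries.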
