AI, Ethics, and Society Homework Project #4


In this assignment, you’ll apply AI/ML algorithms in two application areas: word embeddings and
facial recognition.

Task Set #1: Here you will use distributional vectors trained using Google’s deep learning
Word2vec system.

1. Familiarize yourself with the original word2vec paper, Mikolov et al. (2013) (…compositionality.pdf).
To learn more about the system and how to train your own vectors, see (…). To learn about the
Python wrapper around Word2vec, see (https://rare…).

2. Install Gensim (for example: pip install gensim, or pip install --upgrade gensim to update an existing installation).

3. Download the provided reducedvector.bin file from Canvas. It is a pre-trained Word2vec
model based on the Google News dataset (…). Load it as follows:

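from gensim.models import KeyedVectors

# Path to the reducedvector.bin file downloaded from Canvas; adjust to where you saved it
MODEL_PATH = "reducedvector.bin"

# binary=True because the file is stored in word2vec's binary format
newmodel = KeyedVectors.load_word2vec_format(MODEL_PATH, binary=True)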

4. We can compute similarity measures between words in the model. For example, to find a word's
nearest neighbors or the similarity score between two words, we can use:

# Find the five nearest neighbors to the word man
newmodel.most_similar('man', topn=5)

# Compute the (cosine) similarity between woman and man
newmodel.similarity('woman', 'man')
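For intuition, the score returned by newmodel.similarity is the cosine similarity of the two word vectors, and most_similar ranks the other vocabulary words by that same score. The minimal sketch below recomputes both in plain Python on a made-up, three-dimensional toy vocabulary. TOY_VECTORS, cosine_similarity, and nearest_neighbors are hypothetical names, not part of Gensim; the original Google News vectors have 300 dimensions, but the arithmetic is the same.

```python
import math

# Hypothetical toy embeddings, made up for illustration; real vectors come from newmodel.
TOY_VECTORS = {
    "man": [0.9, 0.1, 0.3],
    "woman": [0.8, 0.3, 0.3],
    "king": [0.7, 0.1, 0.9],
    "queen": [0.6, 0.3, 0.9],
    "apple": [0.1, 0.9, 0.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, vectors, topn=5):
    """Rank every other word by cosine similarity to word, highest first."""
    scores = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word  # the query word itself is excluded, as in most_similar
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


print(cosine_similarity(TOY_VECTORS["woman"], TOY_VECTORS["man"]))
print(nearest_neighbors("man", TOY_VECTORS, topn=2))
```

Scores near 1 mean the vectors point in nearly the same direction; the toy numbers say nothing about what the real model returns.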

5. To complete an analogy such as "man is to woman as king is to ?", we can use:
newmodel.most_similar(positive=['king', 'woman'], negative=['man'], topn=1)
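This analogy query is vector arithmetic: Gensim looks for the word whose vector is closest, by cosine similarity, to the combination king + woman - man (each input vector unit-normalized first), skipping the query words themselves. The sketch below mirrors that idea on hypothetical toy vectors; solve_analogy is an illustrative name, not Gensim's implementation.

```python
import math

# Hypothetical toy embeddings, made up for illustration only.
TOY_VECTORS = {
    "man": [0.9, 0.1, 0.3],
    "woman": [0.8, 0.3, 0.3],
    "king": [0.7, 0.1, 0.9],
    "queen": [0.6, 0.3, 0.9],
    "apple": [0.1, 0.9, 0.0],
}


def unit(v):
    """Scale v to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(u, v):
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def solve_analogy(positive, negative, vectors, topn=1):
    """Return words closest to sum(unit(positive)) - sum(unit(negative)), skipping query words."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + x for t, x in zip(target, unit(vectors[word]))]
    for word in negative:
        target = [t - x for t, x in zip(target, unit(vectors[word]))]
    query_words = set(positive) | set(negative)
    scores = [
        (word, cosine_similarity(target, vec))
        for word, vec in vectors.items()
        if word not in query_words
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# man : woman :: king : ?
print(solve_analogy(positive=["king", "woman"], negative=["man"], vectors=TOY_VECTORS))
```

Excluding the query words matters: without it, the nearest vector to king + woman - man is often king itself.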

Q1: Use the two target words man and woman. With the pre-trained word2vec model, rank the
following 15 words from most similar to least similar to each target word. For each word and
target-word pair, provide the similarity score. Present your results in a table.
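One way to organize Q1 is sketched below, assuming you pass newmodel.similarity as the similarity function. The 15 words are not reproduced here, so the word list and toy_similarity are hypothetical stand-ins that let the sketch run without the model file; rank_by_similarity and format_table are illustrative helper names.

```python
from typing import Callable, Iterable, List, Tuple


def rank_by_similarity(target: str, candidates: Iterable[str],
                       similarity: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Return (word, score) pairs sorted from most to least similar to target."""
    scored = [(word, float(similarity(target, word))) for word in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def format_table(target: str, ranked: List[Tuple[str, float]]) -> str:
    """Render a ranking as a plain-text table with rank, word, and similarity columns."""
    lines = [f"Target word: {target}", f"{'Rank':<6}{'Word':<15}{'Similarity':>10}"]
    for rank, (word, score) in enumerate(ranked, start=1):
        lines.append(f"{rank:<6}{word:<15}{score:>10.4f}")
    return "\n".join(lines)


# Hypothetical stand-ins so the sketch runs without the model; use newmodel.similarity instead.
TOY_SCORES = {("man", "boy"): 0.68, ("man", "doctor"): 0.31, ("man", "table"): 0.12}


def toy_similarity(target: str, word: str) -> float:
    return TOY_SCORES.get((target, word), 0.0)


words = ["table", "boy", "doctor"]  # hypothetical: replace with the 15 words from Q1
print(format_table("man", rank_by_similarity("man", words, toy_similarity)))
```

With the real model, call rank_by_similarity(target, words, newmodel.similarity) once for "man" and once for "woman". newmodel.similarity raises a KeyError for a word that is not in the model's vocabulary, so check each word first.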


Q2: The word analogy task, here in the form of the Bigger Analogy Test Set (BATS), has been one of
the standard benchmarks for word embeddings since 2013.
A) Select any file from the downloaded dataset (…). For each row in your selected file, choose a
target word from the row and report the similarity between your target word and each of the other
words in the row. Remember to document which file you used.
B) Think of three words that identify membership in one of the following protected classes (choose
only one class): race, color, religion, or national origin. For each row in your selected BATS_3.0
file, compute the similarity between your target word and each of your three words. Note any rows
where the similarity scores differ noticeably depending on protected-class membership. Provide
your results in table format.
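A sketch of the Q2 workflow follows. It assumes each line of a BATS_3.0 file has the form source<TAB>answer1/answer2/... (verify this against the file you chose), that vocab is anything supporting the in operator (such as the loaded newmodel), and that similarity is newmodel.similarity. The group words, toy scores, and the 0.1 threshold for a "noticeable" difference are hypothetical placeholders; choosing your own words and justifying a threshold is part of the assignment.

```python
from typing import Callable, Container, Dict, List, Optional, Tuple


def parse_bats_line(line: str) -> Tuple[str, List[str]]:
    """Split an assumed 'source<TAB>answer1/answer2' row into the source word and its answers."""
    source, answers = line.rstrip("\n").split("\t", 1)
    return source.strip(), [a.strip() for a in answers.split("/") if a.strip()]


def load_bats_file(path: str) -> List[Tuple[str, List[str]]]:
    """Parse every non-empty line of a BATS file (format assumed as above)."""
    with open(path, encoding="utf-8") as handle:
        return [parse_bats_line(line) for line in handle if line.strip()]


def score_row(target: str, others: List[str], vocab: Container[str],
              similarity: Callable[[str, str], float],
              threshold: float = 0.1) -> Optional[Tuple[Dict[str, float], float, bool]]:
    """Score target against each in-vocabulary word in others.

    Returns (scores, spread, flagged): spread is max minus min score, and flagged marks
    a spread above the hypothetical threshold. Returns None if target is out of vocabulary.
    """
    if target not in vocab:
        return None
    scores = {w: float(similarity(target, w)) for w in others if w in vocab}
    spread = max(scores.values()) - min(scores.values()) if len(scores) > 1 else 0.0
    return scores, spread, spread > threshold


# Hypothetical placeholders: substitute your three protected-class words and the real model.
GROUP_WORDS = ["word_a", "word_b", "word_c"]
TOY_SCORES = {
    ("paris", "france"): 0.65,
    ("paris", "word_a"): 0.30, ("paris", "word_b"): 0.28, ("paris", "word_c"): 0.05,
    ("tokyo", "japan"): 0.70,
    ("tokyo", "word_a"): 0.20, ("tokyo", "word_b"): 0.22, ("tokyo", "word_c"): 0.18,
}
TOY_VOCAB = {word for pair in TOY_SCORES for word in pair}


def toy_similarity(a: str, b: str) -> float:
    return TOY_SCORES.get((a, b), 0.0)


for line in ["paris\tfrance\n", "tokyo\tjapan/nippon\n"]:
    source, answers = parse_bats_line(line)
    print(source, "part A:", score_row(source, answers, TOY_VOCAB, toy_similarity))
    print(source, "part B:", score_row(source, GROUP_WORDS, TOY_VOCAB, toy_similarity))
```

Words missing from the vocabulary are skipped rather than raising an error, so record them separately in your write-up.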