Using A.I. to generate meaningful passwords

In this tutorial, I’ll show you how to build a password generator that gives you just that: passwords made of words that mean something.

The concept

An overview of how it works:

Create a dictionary:
Use word vectors to generate a meaning-aware dictionary
Filter the list to keep only words of the desired lengths
POS-tag the words and group them by tag
Convert adjectives to adverbs and the other way around, for more balance between word types
Generate a password:
Get a random number-verb-adjective-noun or verb-number-adjective-noun combination from the dictionary
Pluralize the words where needed
Add semi-random upper-casing

All of the above would be a bit much to cover in one blog post, so in this tutorial, we’ll create a simplified version of meaningful-passwords.

What we’ll build

Create a dictionary:
Use word vectors to generate a meaning-aware dictionary
Filter the list to keep only words of the desired lengths
Generate a password:
Combine three random words from the dictionary

The DictionaryBuilder

context.json

First, we create a file called context.json that contains the words on which we’ll base our dictionary. It should look something like this:

    {
        "similar": ["awesome", "helpful"],
        "negative": ["bad", "urine"],
        "wordlist": [
            "awesome", "dope", "phat", "great", "nice", "pretty",
            "humble", "friend", "sister", "helping", "helpful",
            "supportive", "good", "interesting", "beautiful", "rich",
            "amazing", "happy", "tasteful", "brave", "bravery",
            "magnificent"
        ]
    }
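The builder will index all three keys of this file directly, so a typo in context.json only shows up as a KeyError later on. Here is a minimal sketch of a loader that checks the shape up front. Note that load_context is a hypothetical helper added for illustration; it is not part of the original project.

```python
import json


def load_context(path="context.json"):
    """Load the context file and check that it has the three expected lists.

    Hypothetical helper for illustration, not part of the original code.
    """
    with open(path, "r") as file:
        context = json.load(file)
    if not isinstance(context, dict):
        raise ValueError("context.json must contain a JSON object")
    for key in ("similar", "negative", "wordlist"):
        value = context.get(key)
        # Each key must be present and hold a list of strings.
        if not isinstance(value, list) or not all(isinstance(w, str) for w in value):
            raise ValueError(f"'{key}' must be a list of strings")
    return context
```

Calling load_context() in place of the plain json.load would make a missing or misspelled key fail early, with a message that names the key.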

The logic:

Then we create a file called dictionary_builder.py, where we will define the DictionaryBuilder class:

    import gensim.downloader as api


    class DictionaryBuilder:
        def __init__(self, num_similar_words=100, vector_type="glove-twitter-25"):
            # Downloads the pre-trained vectors on first use, then loads them
            self.model = api.load(vector_type)
            self.num_similar_words = num_similar_words

Now we create a method called create_dictionary, which will find a list of relevant words:

        def create_dictionary(self):
            with open('context.json', 'r') as file:
                context = json.load(file)
            dictionary = {}
            for word in context['wordlist']:
                if word in self.model:
                    # Merge the (word, similarity) pairs into the dictionary
                    dictionary = {**dictionary, **dict(
                        self.model.most_similar(positive=[self.model[word]] + context['similar'],
                                                negative=context['negative'],
                                                topn=self.num_similar_words))}
                else:
                    print(word, "is not in the word-vector model, skipping")
            return self.clean(dictionary)

What we just did:

Load context.json into the context variable
Combine each word in wordlist with the words in similar and negative. This shifts the query toward the meaning of the similar words and away from the meaning of the negative words; most_similar then returns the num_similar_words words that are closest to that shifted meaning.
Merge the results into an intermediate dictionary that maps each word to its similarity score
Skip words that are not found in the model
Filter out words that we don’t want, using clean, which we’ll implement soon

create_dictionary uses json, so let’s import it. We’ll also import re and add min_length and max_length to the constructor:

    import json
    import re

    import gensim.downloader as api


    class DictionaryBuilder:
        def __init__(self, num_similar_words=100, min_length=4, max_length=10, vector_type="glove-twitter-25"):
            # Reserved for POS tagging; not used in this simplified version
            self.pos_tagged_sets = {"adj": set(), "noun": set(), "verb": set()}
            self.model = api.load(vector_type)
            self.num_similar_words = num_similar_words
            self.min_length = min_length
            self.max_length = max_length

Our words still contain strange characters, and some of them are very short or very long. To filter those out, we’ll create a method called clean:

        def clean(self, dictionary):
            dictionary = dictionary.keys()
            # Keep only purely alphabetic words within the length limits
            dictionary = [
                word for word in dictionary
                if self.min_length <= len(word) <= self.max_length and re.match('^[a-zA-Z]*$', word)
            ]
            return dictionary

Now let’s write the dictionary to a file:

        def write_dictionary(self):
            dictionary = self.create_dictionary()
            with open('dictionary.json', 'w') as outfile:
                json.dump(dictionary, outfile)
            return dictionary

That’s it for the DictionaryBuilder. It can generate dictionaries of words whose meanings are influenced by context.json. Let’s move on to generating passwords.
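Before moving on, the offset step in create_dictionary can be made more concrete with a toy example. The sketch below mirrors the general idea behind a positive/negative similarity query: average the unit-length vectors, counting the negative ones with weight -1, then rank the remaining words by cosine similarity. It is an illustration of the idea only, not gensim's code; the 2-D vectors and the names toy_most_similar and TOY_VECTORS are made up for this example.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


def offset_query(positive, negative):
    """Average the unit vectors, counting the negative ones with weight -1."""
    weighted = [normalize(v) for v in positive]
    weighted += [[-x for x in normalize(v)] for v in negative]
    return [sum(column) / len(weighted) for column in zip(*weighted)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def toy_most_similar(vectors, positive, negative, topn=2):
    """Rank all words that are not part of the query by similarity to it."""
    query = offset_query([vectors[w] for w in positive],
                         [vectors[w] for w in negative])
    candidates = [w for w in vectors if w not in positive and w not in negative]
    ranked = sorted(candidates, key=lambda w: cosine(vectors[w], query), reverse=True)
    return ranked[:topn]


# Made-up 2-D vectors: axis 0 is roughly "pleasant", axis 1 "about people".
TOY_VECTORS = {
    "friend": [0.6, 0.8],
    "awesome": [1.0, 0.1],
    "bad": [-1.0, 0.1],
    "buddy": [0.7, 0.7],
    "enemy": [-0.6, 0.8],
    "great": [0.9, 0.0],
}

print(toy_most_similar(TOY_VECTORS, positive=["friend", "awesome"], negative=["bad"]))
# ['great', 'buddy']
```

Adding "awesome" and subtracting "bad" pulls the query for "friend" toward the pleasant axis, which is why "great" ends up ahead of "buddy" and "enemy" drops out of the top two.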

Generating passwords

We can generate simple passwords from this dictionary like this:

    from random import choice

    try:
        with open('dictionary.json', 'r') as file:
            dictionary = json.load(file)
    except IOError:
        # No dictionary on disk yet, so build one
        print("generating dictionary")
        builder = DictionaryBuilder()
        dictionary = builder.write_dictionary()

    print(choice(dictionary) + '-' + choice(dictionary) + '-' + choice(dictionary))

Putting all that together in one file, simple_password_generator.py, we get:

    import json
    import re
    from random import choice

    import gensim.downloader as api


    class DictionaryBuilder:
        def __init__(self, num_similar_words=100, min_length=4, max_length=10, vector_type="glove-twitter-25"):
            # Reserved for POS tagging; not used in this simplified version
            self.pos_tagged_sets = {"adj": set(), "noun": set(), "verb": set()}
            self.model = api.load(vector_type)
            self.num_similar_words = num_similar_words
            self.min_length = min_length
            self.max_length = max_length

        def write_dictionary(self):
            dictionary = self.create_dictionary()
            with open('dictionary.json', 'w') as outfile:
                json.dump(dictionary, outfile)
            return dictionary

        def create_dictionary(self):
            with open('context.json', 'r') as file:
                context = json.load(file)
            dictionary = {}
            for word in context['wordlist']:
                if word in self.model:
                    # Merge the (word, similarity) pairs into the dictionary
                    dictionary = {**dictionary, **dict(
                        self.model.most_similar(positive=[self.model[word]] + context['similar'],
                                                negative=context['negative'],
                                                topn=self.num_similar_words))}
                else:
                    print(word, "is not in the word-vector model, skipping")
            return self.clean(dictionary)

        def clean(self, dictionary):
            dictionary = dictionary.keys()
            # Keep only purely alphabetic words within the length limits
            dictionary = [
                word for word in dictionary
                if self.min_length <= len(word) <= self.max_length and re.match('^[a-zA-Z]*$', word)
            ]
            return dictionary


    try:
        with open('dictionary.json', 'r') as file:
            dictionary = json.load(file)
    except IOError:
        # No dictionary on disk yet, so build one
        print("generating dictionary")
        builder = DictionaryBuilder()
        dictionary = builder.write_dictionary()

    print(choice(dictionary) + '-' + choice(dictionary) + '-' + choice(dictionary))

Now let’s install gensim and run the password generator!
    $ pip install gensim
    $ python simple_password_generator.py
    generating dictionary
    compassion-sincerity-winning

This should take about 30 seconds. After the dictionary has been created, subsequent runs will be faster:

    $ python simple_password_generator.py
    everyone-goodnight-heavenly

Now let’s try it with a totally different context.json:

    {
        "similar": ["monkey"],
        "negative": ["engineering"],
        "wordlist": [
            "flower", "dog", "leopard", "elephant", "jungle", "water",
            "river", "mountain", "human", "insect", "butterfly",
            "termite", "ant", "cat", "lion"
        ]
    }

Delete dictionary.json and run the generator again:

    $ rm dictionary.json
    $ python simple_password_generator.py
    generating dictionary
    otter-oversized-dixie
    $ python simple_password_generator.py
    cali-shepherd-voices

Great! It works: we have meaningful passwords.
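One caveat about the generator above: Python's documentation advises against using the random module for security purposes and points to the secrets module instead. Below is a minimal sketch of the same three-word idea using secrets.choice, plus a rough estimate of how many bits of entropy a password has. The function names and the stand-in word list are assumptions made for this example; in the tutorial the words would come from dictionary.json.

```python
import math
import secrets


def generate_password(dictionary, num_words=3, separator="-"):
    """Join num_words words picked with a cryptographically secure RNG."""
    return separator.join(secrets.choice(dictionary) for _ in range(num_words))


def entropy_bits(dictionary, num_words=3):
    """Bits of entropy, assuming unique words drawn uniformly and independently."""
    return num_words * math.log2(len(set(dictionary)))


# Stand-in word list; in the tutorial this would be loaded from dictionary.json.
words = ["otter", "jungle", "river", "shepherd",
         "heavenly", "voices", "winning", "sincerity"]

print(generate_password(words))
print(round(entropy_bits(words), 1))  # 8 words, 3 picks: 9.0
```

The estimate shows why the size of the dictionary matters: every doubling of the word count adds one bit per word, so a larger num_similar_words or a longer wordlist makes the passwords harder to guess.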

Improving the algorithm

The passwords can still be improved in the following ways:

Create a dictionary:
POS-tag the words and group them by tag
Convert adjectives to adverbs and the other way around, for more balance between word types
Generate a password:
Get a random number-verb-adjective-noun or verb-number-adjective-noun combination from the dictionary
Pluralize the words where needed
Add semi-random upper-casing

These improvements have been implemented in the open-source project on which this tutorial is based. You can refer to the source code.
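To give a feel for what the password-generation improvements could look like, here is a rough sketch. It is not the project's implementation: the hand-tagged TAGGED_WORDS data, the naive pluralize rule, and the "capitalize about half the time" rule are all assumptions made for this example.

```python
import secrets

# Hand-tagged stand-in for a POS-tagged dictionary (hypothetical data).
TAGGED_WORDS = {
    "verb": ["hug", "follow", "paint"],
    "adj": ["brave", "happy", "humble"],
    "noun": ["otter", "river", "friend"],
}


def pluralize(noun):
    """Naive pluralization: only handles the regular '-s' and '-es' cases."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def semi_random_upper(word):
    """Capitalize the first letter about half of the time."""
    return word.capitalize() if secrets.randbelow(2) else word


def generate_password(tagged=TAGGED_WORDS):
    """Build a number-verb-adjective-noun or verb-number-adjective-noun password."""
    number = 2 + secrets.randbelow(8)  # 2..9, so the noun is always plural
    parts = [
        str(number),
        secrets.choice(tagged["verb"]),
        secrets.choice(tagged["adj"]),
        pluralize(secrets.choice(tagged["noun"])),
    ]
    # Half of the time, swap the number and the verb.
    if secrets.randbelow(2):
        parts[0], parts[1] = parts[1], parts[0]
    return "-".join(semi_random_upper(part) for part in parts)


print(generate_password())
```

A real version would need proper POS tagging and a pluralization rule that handles irregular nouns, which is where most of the extra work in the full project would go.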

Conclusion

You’ve learned how to:

use gensim to load a word-vector model trained on Twitter data
use word vectors to generate a meaning-aware dictionary
clean the dictionary to get rid of strange characters and of words that are too short or too long
generate passwords based on the dictionary