Python advice required

Greetings,
I’m messing around and learning Python for the first time.
I know this is not a realistic project but please bear with me.

So basically this code takes a word list and permutes it “N” times, and then writes the output to a “powerset.txt” file.
My question is: how can I prevent the generation* of results that aren’t, let’s say, of length 4? In other words, I only want results like (0,0,0,0) or (3,0,2,1).

*I don’t want to filter the list after the entries have already been created.

from itertools import permutations
import os

# GET FILE
script_dir = os.path.dirname(os.path.realpath(__file__))
wordlist_rel_path = "list.txt"
wordlist_abs_file_path = os.path.join(script_dir, wordlist_rel_path)

# READ WORD LIST FROM FILE
word_list = []
print ("do you work 2:", wordlist_abs_file_path,"\n") #test
with open(wordlist_abs_file_path) as wordlist:
     for line in wordlist:
         word_list.append(line.rstrip())

# GENERATE POWERSET
powerset_list = []
print ("do you work 3:") #test
for n in range(1, len(word_list)+1):
     for perm in permutations(word_list, n):
         powerset_list.append( "".join(perm) )
print(powerset_list)

# WRITE LIST TO FILE
powerset_rel_path = "powerset.txt"
powerset_abs_file_path = os.path.join(script_dir, powerset_rel_path)
powerset_abs_file = open(powerset_abs_file_path, 'w')
for item in powerset_list:
     powerset_abs_file.write("%s\n" % item)
powerset_abs_file.close()
print("ok")

Try removing a couple of lines by using a list comprehension:

with open(wordlist_abs_file_path) as f:
  word_list = [line.strip() for line in f]

There’s no reason to define word_list before the with block.

Do all the path string “computation” up front, before you start on the lists and math; you could probably remove a couple of lines that way as well.
e.g. wordlist_filename = os.path.join(script_dir, "list.txt")

Now, when it comes to permutations (without going into the math or whether it’s the most efficient way to generate what you need): you can’t ask for permutations longer than the input, so the length should go from 1 to len(word_list). Your range(1, len(word_list)+1) already does exactly that, because range() excludes its stop value, so keep the +1. You could use extend with a generator expression to make the loop shorter, though.

for n in range(1, len(word_list) + 1):
  powerset_list.extend("".join(perm)
                       for perm in permutations(word_list, n))
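
If you want to check which lengths a given loop bound actually produces, here is a small sketch. The three-item `words` list and the `all_lengths` helper are made up for illustration; they are not from the original script.

```python
from itertools import permutations

# Hypothetical stand-in for the contents of list.txt.
words = ["a", "b", "c"]

def all_lengths(items):
    """Join permutations of every length from 1 up to len(items)."""
    result = []
    # range() excludes its stop value, so the +1 is needed to reach len(items).
    for n in range(1, len(items) + 1):
        result.extend("".join(perm) for perm in permutations(items, n))
    return result

lengths = sorted({len(s) for s in all_lengths(words)})
print(lengths)  # [1, 2, 3]

# Asking for more items than the input holds simply yields nothing.
print(list(permutations(words, 4)))  # []
```

So going one past len(items) would not raise an error; it would just add nothing.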

If I run this with a word list containing four digits, one per line, I get every single-, double-, triple-, and quadruple-digit permutation. If you only want the quadruple-digit permutations (if I understood correctly), then I’d omit the outer loop and set the permutation length to a fixed value.

# GENERATE POWERSET
powerset_list = []
print ("do you work 3:") #test
for perm in permutations(word_list, 4):
    powerset_list.append( "".join(perm) )
print(powerset_list)

If you need flexibility later instead of hard-coding the length, it could be a function.

def generate(word_list, length):
    """Generate permutations of the passed length."""
    powerset_list = []
    for perm in permutations(word_list, length):
        powerset_list.append( "".join(perm) )
    return powerset_list
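
For larger word lists, one possible variation is to skip the intermediate list and write each permutation as it is produced. This is only a sketch: `iter_joined`, `write_permutations`, and the `sep` parameter are names I made up, not part of the original script.

```python
from itertools import permutations

def iter_joined(word_list, length, sep=""):
    """Yield joined permutations of exactly `length` items, one at a time."""
    for perm in permutations(word_list, length):
        yield sep.join(perm)

def write_permutations(path, word_list, length, sep=""):
    """Write each permutation on its own line; return how many were written."""
    count = 0
    # The with block closes the file even if something fails midway.
    with open(path, "w") as out:
        for item in iter_joined(word_list, length, sep):
            out.write(item + "\n")
            count += 1
    return count
```

You would call it as write_permutations(powerset_abs_file_path, word_list, 4). Only permutations of the requested length are ever created, so there is nothing to clean up afterwards.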

The style could definitely be more “pythonic,” but since you’re new to it, don’t worry about it much.

One of my favorite sections of the Python documentation is the itertools recipes:
https://docs.python.org/3/library/itertools.html#itertools-recipes

Sorry for the late response, our ISP had problems and was down :confused:
Wow ok, thank you all for the advice, I’ll play around for a bit more and study some more.
@dinlotty: I tried that, but then this happened.
Let’s say my word list contains the numbers 0-40 and I set N=4; I would get a result like this, for example:
0123 ****start
0132
0321
.
.
.
0123
0132
0321 ****end of file

The numbers would repeat :frowning:

There are a couple of cases I found that result in duplicate entries in powerset_list.

  1. Duplicate entries in list.txt
  2. Using the current empty-string join method.
    powerset_list.append( "".join(perm) )
    This makes “301284” ambiguous: it could be “30,12,8,4” or “30,1,28,4”. A simple solution is to use:
    powerset_list.append( ",".join(perm))
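
To see the second case in isolation, here is a minimal sketch with a made-up six-entry word list (the `words` list is an assumption; any list mixing one- and two-digit entries behaves similarly):

```python
from itertools import permutations

# Hypothetical word list mixing one- and two-digit entries.
words = ["30", "12", "8", "4", "1", "28"]

a = ("30", "12", "8", "4")
b = ("30", "1", "28", "4")
print("".join(a) == "".join(b))    # True: both become "301284"
print(",".join(a) == ",".join(b))  # False: the commas keep them apart

plain = ["".join(p) for p in permutations(words, 4)]
comma = [",".join(p) for p in permutations(words, 4)]

# With the empty join some strings collide; with commas none do.
print(len(plain) - len(set(plain)) > 0)  # True
print(len(comma) - len(set(comma)))      # 0
```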

Other than that, I used the following code to check for duplicates:

# GENERATE POWERSET
print ("do you work 3:") #test
powerset_list = []
for perm in permutations(word_list, 4):
    powerset_list.append( ",".join(perm) )
print(powerset_list)

duplicates = len(powerset_list) - len(set(powerset_list))
print("# of duplicate permutations: {}".format(duplicates))

Turning a list into a set eliminates duplicate entries, so the difference in length shows how many duplicates there are.
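
If you also want to know which entries are repeated, collections.Counter can tell you. A small sketch, assuming a made-up word list where "1" appears twice, as if list.txt had a repeated line:

```python
from collections import Counter
from itertools import permutations

# Hypothetical input with "1" listed twice.
words = ["0", "1", "1", "2"]

joined = [",".join(p) for p in permutations(words, 2)]

# Same length-difference check as above.
duplicates = len(joined) - len(set(joined))
print(duplicates)  # 5

# Counter maps each entry to how often it occurs; keep those seen more than once.
counts = Counter(joined)
repeated = {item: n for item, n in counts.items() if n > 1}
print(sorted(repeated))  # ['0,1', '1,0', '1,1', '1,2', '2,1']
```

permutations() treats items as distinct by position, not by value, which is why a repeated line in the input produces repeated output.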

With a comma-separated join, a word list of 0-40, and a length of 4, no duplicates were found.

If the entire list looks like it has been repeated, make sure you haven’t accidentally changed the file open mode to “a” for append.

Python also has set comprehensions, which you could use instead of the list comprehension to eliminate duplicates from the input file.
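
A rough sketch of what that could look like. I’m using io.StringIO with made-up contents as a stand-in for the opened word list file, so the snippet runs on its own:

```python
import io

# Stand-in for open(wordlist_abs_file_path); the contents are made up.
f = io.StringIO("3\n0\n2\n3\n1\n0\n")

# Set comprehension: duplicates vanish, but the original order is not kept.
unique_words = {line.strip() for line in f}
print(sorted(unique_words))  # ['0', '1', '2', '3']

f.seek(0)  # rewind so the lines can be read a second time
# dict.fromkeys keeps the first occurrence of each word in file order.
ordered_unique = list(dict.fromkeys(line.strip() for line in f))
print(ordered_unique)  # ['3', '0', '2', '1']
```

The dict.fromkeys variant is an alternative I’d suggest if the order of the words matters, since it affects the order in which the permutations come out.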

@dinlotty yes, you are correct. I was going from memory; I reran the script and everything was OK.
I later found out why I thought I got “duplicates”.
http://shrani.si/f/3U/Dd/7bo64Wq/1/1.jpg <- this is a screenshot of the output. I wasn’t using my head: I did a CTRL+F for 1234 and it found 720 hits, but they were false positives. For example, take 12341: if we separate the entries with commas, you can see why I was mistaken: 1,2,3,41. This is just one example, but you get the point.
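
For anyone who hits the same thing, the effect can be reproduced roughly like this. `count_hits` is a made-up helper, and it counts entries that contain the digits rather than individual CTRL+F matches, so I wouldn’t expect it to reproduce the exact number from the screenshot. It loops over all 41*40*39*38 = 2,430,480 permutations, so it takes a few seconds.

```python
from itertools import permutations

def count_hits(words, target_words):
    """Count substring hits vs. exact matches for one target permutation."""
    target = "".join(target_words)
    substring_hits = 0
    exact_hits = 0
    for perm in permutations(words, len(target_words)):
        # Roughly what a text search does: find the digits anywhere in the entry.
        if target in "".join(perm):
            substring_hits += 1
        # What was actually meant: exactly these words in exactly this order.
        if perm == tuple(target_words):
            exact_hits += 1
    return substring_hits, exact_hits

words = [str(n) for n in range(41)]  # "0" .. "40"
substring_hits, exact_hits = count_hits(words, ["1", "2", "3", "4"])
print(substring_hits, exact_hits)
```

The exact match turns up once, while the substring search also catches entries such as 1,2,3,41 or 12,3,4,0.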
