Python 3: Recursively print structured tree including hierarchy markers using depth-first search

Printing a tree in Python is easy as long as the parent-child relationships do not need to be visualized, i.e. when it is enough to print every node with an indentation that depends on its level within the tree.

To keep the code simple, let’s first define a minimal tree structure: a Node class that holds a value x and can have an arbitrary number of child nodes:

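class Node(object):
    def __init__(self, x, children=None):
        self.x = x
        # Create a fresh list per node; a mutable default ([]) would be shared by all nodes
        self.children = children if children is not None else []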
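
A detail worth checking in constructors like this one is the default value for children: a list written directly as a default argument is created only once and is then shared by every call that relies on the default. The following minimal sketch shows the effect; the class names SharedNode and SafeNode are made up for this illustration and are not part of the article's code.

```python
class SharedNode(object):
    # Pitfall: the default list is created once, when the function is defined
    def __init__(self, x, children=[]):
        self.x = x
        self.children = children


class SafeNode(object):
    # Common idiom: use None as the default and create a fresh list per instance
    def __init__(self, x, children=None):
        self.x = x
        self.children = children if children is not None else []


a, b = SharedNode(1), SharedNode(2)
a.children.append(SharedNode(3, children=[]))
print(len(b.children))  # 1: b sees the child that was appended to a

c, d = SafeNode(1), SafeNode(2)
c.children.append(SafeNode(3))
print(len(d.children))  # 0: every node has its own list
```

With the None default, each node that is created without explicit children gets its own empty list.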

To print all nodes of a tree using depth-first search, only a few lines are required:

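def printTree(root, level=0):
    # Indent each node by two spaces per level of depth
    print("  " * level + str(root.x))
    for child in root.children:
        # Depth-first: print the whole subtree of a child before its next sibling
        printTree(child, level + 1)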

# tree = Node(..., children=[Node(..., ...), Node(..., ...)])  # See the end of the article for the larger structure used in the examples.
printTree(tree)

However, the output can be hard to read. When the tree has more than a few levels, it is challenging to see the relationship between parent and child nodes. A definition of the following tree is given at the end of this article if you want to try it yourself. For now, just focus on the output:
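
To illustrate the difference on a small scale, here is a minimal sketch that draws branch markers in front of each node, so that each child is visibly attached to its parent. It is not necessarily the implementation the full article arrives at: the function name format_tree, the ASCII markers and the five-node example tree are assumptions chosen for illustration.

```python
class Node(object):
    def __init__(self, x, children=None):
        self.x = x
        self.children = children if children is not None else []


def format_tree(root, prefix="", is_last=True, is_root=True):
    """Return the lines of a tree drawing with branch markers (depth-first)."""
    if is_root:
        # The root has no marker in front of it
        lines = [str(root.x)]
        child_prefix = ""
    else:
        # "`-- " marks the last child of a parent, "|-- " every other child
        lines = [prefix + ("`-- " if is_last else "|-- ") + str(root.x)]
        # Below a last child nothing continues, otherwise keep the vertical bar
        child_prefix = prefix + ("    " if is_last else "|   ")
    for i, child in enumerate(root.children):
        last = i == len(root.children) - 1
        lines.extend(format_tree(child, child_prefix, last, is_root=False))
    return lines


tree = Node("a", [Node("b", [Node("d"), Node("e")]), Node("c")])
print("\n".join(format_tree(tree)))
# a
# |-- b
# |   |-- d
# |   `-- e
# `-- c
```

Returning a list of lines instead of printing inside the recursion keeps the function easy to test and lets the caller decide where the output goes.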


Lemmatize whole sentences with Python and nltk’s WordNetLemmatizer

Lemmatization is the process of converting words (e.g. in a sentence) to their lemma, i.e. their dictionary base form, while respecting their context. For example, the sentence “You are not better than me” would become “You be not good than me”. This is useful in NLP preprocessing, for example when preparing text to train doc2vec models. The Python module nltk.stem contains a class called WordNetLemmatizer. To get sensible results, one should pass both the word and its part-of-speech tag (adjective, noun, verb, …), because the lemma depends on the word's role in the sentence; without a tag, lemmatize() treats every word as a noun.
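
As a rough sketch of how this can look in code: the tagger in nltk returns Penn Treebank tags, while WordNetLemmatizer expects one of the WordNet letters "a", "v", "n" or "r", so a small mapping is needed in between. The helper names penn_to_wordnet and lemmatize_sentence are hypothetical, and the example assumes that nltk is installed and that the tokenizer, tagger and WordNet data have already been fetched with nltk.download.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as returned by nltk.pos_tag) to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # treat everything else as a noun


def lemmatize_sentence(sentence):
    # Imported lazily so that the mapping above works without nltk installed
    import nltk
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    tokens = nltk.word_tokenize(sentence)
    # Lemmatize every token using the POS tag predicted for it
    return " ".join(
        lemmatizer.lemmatize(word, penn_to_wordnet(tag))
        for word, tag in nltk.pos_tag(tokens)
    )


# Usage (requires nltk and its data packages):
# print(lemmatize_sentence("You are not better than me"))
```

The fallback to "n" mirrors the default of lemmatize(), so unknown tags behave as if no tag had been passed.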

Calculate power set (set of all subsets) in Python without recursion

If you want to calculate the set of all subsets of a set (also called its power set), you can either choose a recursive approach or try this iterative approach, which is faster than the recursive one.

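def get_subsets(fullset):
    # Fix an order so that every element gets a stable bit position
    listrep = list(fullset)

    subsets = []
    # Every integer i in [0, 2**n) encodes one subset as a bit mask
    for i in range(2 ** len(listrep)):
        subset = []
        for k in range(len(listrep)):
            # Element k is part of the subset if bit k of i is set
            if i & (1 << k):
                subset.append(listrep[k])
        subsets.append(subset)

    return subsets

subsets = get_subsets({1, 2, 3, 4})
print(subsets)
print(len(subsets))  # 16, i.e. 2**4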

You can also find a shorter version at the end of the article, but the algorithm above is better suited to understanding the principle.
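
A more compact formulation is possible with the standard library. The following is a sketch of one such variant based on itertools; it is not necessarily the shorter version the article refers to, and the name get_subsets_short is made up for this example.

```python
from itertools import chain, combinations


def get_subsets_short(fullset):
    # Hypothetical helper: chain together all combinations of size 0, 1, ..., n
    items = list(fullset)
    return [
        list(combo)
        for combo in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    ]


subsets = get_subsets_short({1, 2, 3})
print(subsets)
print(len(subsets))  # 8, i.e. 2**3
```

Unlike the bit mask loop, this variant returns the subsets grouped by size, starting with the empty subset.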
