Python 3: Recursively print structured tree including hierarchy markers using depth-first search

Printing a tree in Python is easy if the parent-child relationship should not be visualized as well, i.e. just printing all nodes with an indentation that depends on the level within the tree.

To keep the code easy, let’s first define a simple tree structure by creating a Node class that holds a value x and can have an arbitrary number of child nodes:

class Node(object):
    def __init__(self, x, children=[]):
        self.x = x
        self.children = children

To print all nodes of a tree using depth-first search, only few lines are required:

def printTree(root, level=0):
    print("  " * level, root.x)
    for child in root.children:
        printTree(child, level + 1)

#tree = Node(..., children=[Node(...., ...), Node(...,....)] # See end of the article for a bigger structure that is used for the examples in this article.
printTree(tree)

However, the output can be hard to read. When the tree has more than a few levels, it is challenging to see the relationship between parent and child nodes. A definition of the following tree is given at the end of this article if you want to try it yourself. For now, just focus on the output:

1
   1.1
     1.1.1
       1.1.1.1
       1.1.1.2
     1.1.2
       1.1.2.1
       1.1.2.2
       1.1.2.3
         1.1.2.3.1
     1.1.3
   1.2
     1.2.1
     1.2.2
   1.3
   1.4
     1.4.1
     1.4.2

In that case it’d be better to print connections between parent nodes and their direct child nodes:

1
+- 1.1
|  +- 1.1.1
|  |  +- 1.1.1.1
|  |  +- 1.1.1.2
|  +- 1.1.2
|  |  +- 1.1.2.1
|  |  +- 1.1.2.2
|  |  +- 1.1.2.3
|  |     +- 1.1.2.3.1
|  +- 1.1.3
+- 1.2
|  +- 1.2.1
|  +- 1.2.2
+- 1.3
+- 1.4
   +- 1.4.1
   +- 1.4.2

Changing printTree to include these connections is not as easy as you might expect. The reason is that the kind of markers on line n can depend on markers on lines > n. In the example above, there are “|” in front of nodes 1.2.1 and 1.2.2 but not in front of 1.4.1 and 1.4.2. This is because 1.2 has a sibling (1.3) which is printed after recursing into 1.2.*. But 1.4 has no siblings (it is the last child of node 1), so 1.4 should not be connected to any following node.

Similarly, node 1.1.2.3 is the last child of 1.1.2. That is why there is no “|” on the third level in front of 1.1.2.3.1. But there are “|” markers on the first two levels. That’s because 1.1.2 has a later child (1.1.3) and the same is the case for 1.1.

I wanted to find a solution that can draw the entire tree including all markers in one pass. After many tries and inefficient solutions, I was able to achieve my goal. My algorithm uses a depth-first search approach and keeps records about which markers to print on each level when recursing further down by passing a boolean array. This array is constructed based on the information if a node is the last child of its parent or not. See the algorithm below and the comments for details!

I’ll first show the new version of printTree without comments so that it is easier to compare the complexity.

def printTree(root, markerStr="+- ", levelMarkers=[]):
    emptyStr = " "*len(markerStr)
    connectionStr = "|" + emptyStr[:-1]

    level = len(levelMarkers)
    mapper = lambda draw: connectionStr if draw else emptyStr
    markers = "".join(map(mapper, levelMarkers[:-1]))
    markers += markerStr if level > 0 else ""
    print(f"{markers}{root.x}")

    for i, child in enumerate(root.children):
        isLast = i == len(root.children) - 1
        printTree(child, markerStr, [*levelMarkers, not isLast])

As you can see, only to add the connections to the output, the code length has almost tripled. The same function with comments is shown now for better understanding:

def printTree(root, markerStr="+- ", levelMarkers=[]):
    """
    Recursive function that prints the hierarchical structure of a tree including markers that indicate
    parent-child relationships between nodes.

    Parameters:
    - root: Node instance, possibly containing children Nodes
    - markerStr: String to print in front of each node  ("+- " by default)
    - levelMarkers: Internally used by recursion to indicate where to
                    print markers and connections (see explanations below)
    
    Example output:

    1
    +- 1.1
    |  +- 1.1.1
    |  |  +- 1.1.1.1
    |  |  +- 1.1.1.2
    |  +- 1.1.2
    |  |  +- 1.1.2.1
    |  |  +- 1.1.2.2
    |  |  +- 1.1.2.3
    |  |     +- 1.1.2.3.1
    |  +- 1.1.3
    +- 1.2
    |  +- 1.2.1
    |  +- 1.2.2
    +- 1.3
    +- 1.4
       +- 1.4.1
       +- 1.4.2
    """
    
    # Printed in front of nodes when there should be no connection or marker
    # See for example nodes 1.1.2.3.1, 1.4.1 and 1.4.2 above
    emptyStr = " "*len(markerStr)
    
    # Printed in front of nodes that are connected by a parent that is 
    # not the direct parent.
    # See for example nodes 1.2.1 and 1.2.2. The "|  " in front of them 
    # indicates that their parent 1.2 has a sibling (1.3) and they are 
    # connected. This string is NOT drawn for the last child of a node.
    # For example, node 1.4 is the last child of node 1 and it has two 
    # children (1.4.1 and 1.4.2). Drawing the connection string in front 
    # of 1.4.1 and 1.4.2 would be wrong as it would indicate that there 
    # is another child of node 1 coming after 1.4.2.
    connectionStr = "|" + emptyStr[:-1]

    level = len(levelMarkers)   # recursion level
    # levelMarkers indicates for each level if the empty string or the 
    # connection string should be printed. The last element of levelMarkers 
    # is ignored in this recursion step because we know that a new 
    # node has to be drawn (root). The last levelMarker will be used in
    # deeper recursions.
    mapper = lambda draw: connectionStr if draw else emptyStr
    markers = "".join(map(mapper, levelMarkers[:-1]))
    markers += markerStr if level > 0 else ""
    print(f"{markers}{root.x}")

    # After root has been printed, recurse down (depth-first) the child nodes.
    for i, child in enumerate(root.children):
        # The last child will not need connection markers on the current level 
        # (see example above)
        isLast = i == len(root.children) - 1
        printTree(child, markerStr, [*levelMarkers, not isLast])

If you want to try it yourself, here is the tree that I used in this article to produce the example outputs:

tree = Node(
    "1",
    children=[
        Node(
            "1.1",
            children=[
                Node(
                    "1.1.1",
                    children=[
                        Node("1.1.1.1"),
                        Node("1.1.1.2")
                    ]
                ),
                Node(
                    "1.1.2",
                    children=[
                        Node("1.1.2.1"),
                        Node("1.1.2.2"),
                        Node(
                            "1.1.2.3",
                            children=[
                                Node("1.1.2.3.1")
                            ]
                        ),
                    ]
                ),
                Node("1.1.3"),
            ]
        ),
        Node(
            "1.2",
            children=[
                Node("1.2.1"),
                Node("1.2.2")
            ]
        ),
        Node("1.3"),
        Node(
            "1.4",
            children=[
                Node("1.4.1"),
                Node("1.4.2")
            ]
        )
    ]
)

I’m sure that would be an interesting problem for coding interview! If you know of an easier way, e.g. one that avoids carrying the levelMarkers array, please let me know in the comments!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.