Final Examples - Wed, Dec 4

Trees & Tree Processing

Tree-Structured Data

We first built trees using data abstraction.

def tree(label, branches=[]):
    return [label] + list(branches)

def label(tree):
    return tree[0]

def branches(tree):
    return tree[1:]
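As a quick check of the abstraction, the sketch below builds a small tree with the constructor and inspects it with the selectors. The example tree and the all_labels helper are illustrative assumptions, not part of the lecture code.

```python
def tree(label, branches=[]):
    return [label] + list(branches)

def label(tree):
    return tree[0]

def branches(tree):
    return tree[1:]

def all_labels(t):
    """Hypothetical helper: list every label in t, root first."""
    result = [label(t)]
    for b in branches(t):
        result.extend(all_labels(b))
    return result

t = tree(1, [tree(2), tree(3, [tree(4), tree(5)])])
print(t)              # [1, [2], [3, [4], [5]]]
print(label(t))       # 1
print(branches(t))    # [[2], [3, [4], [5]]]
print(all_labels(t))  # [1, 2, 3, 4, 5]
```

Code that only uses tree, label, and branches keeps working even if the underlying list representation changes later.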

We later built trees using object-oriented programming.

class Tree:
    def __init__(self, label, branches=[]):
        self.label = label
        self.branches = list(branches)
    
    def is_leaf(self):
        return not self.branches

Trees aren't built into Python because a generic tree is rarely useful on its own. However, different versions of trees are used throughout computing for various tasks, so tree structures are generally built for specific use cases.

For example, we saw that all Scheme expressions are tree-structured. We also saw, in Monday's lecture, that natural language processing (NLP) can represent sentences as trees. HTML is another example: it is a hierarchical language, so its documents can be represented as trees as well.

Tree processing often involves recursive calls on subtrees. Recursion is discussed so heavily in this class because tree-structured data is prevalent in CS, and processing these structures almost always involves recursion.
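As one illustration of recursion over subtrees, here is a minimal sketch that treats a Scheme-style arithmetic expression such as (+ 1 (* 2 3)) as a Tree and evaluates it. The OPS table and the evaluate function are assumptions made for this example; this is not how the course's Scheme interpreter is written.

```python
from functools import reduce
import operator

class Tree:
    def __init__(self, label, branches=[]):
        self.label = label
        self.branches = list(branches)

    def is_leaf(self):
        return not self.branches

# Hypothetical operator table: only + and * are supported in this sketch.
OPS = {'+': operator.add, '*': operator.mul}

def evaluate(t):
    """Evaluate an expression tree: leaves are numbers, other labels are operators."""
    if t.is_leaf():
        return t.label
    # Recursive calls on the subtrees produce the operand values.
    values = [evaluate(b) for b in t.branches]
    return reduce(OPS[t.label], values)

expr = Tree('+', [Tree(1), Tree('*', [Tree(2), Tree(3)])])
print(evaluate(expr))  # 7
```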

Solving Tree Problems

Implement bigs, which takes a Tree instance t containing integer labels. It returns the number of nodes in t whose labels are larger than all labels of their ancestor nodes.

def bigs(t):
    """Return the number of nodes in t that are larger than all their ancestors.
    
    >>> a = Tree(1, [Tree(4, [Tree(4), Tree(5)]), Tree(3, [Tree(0, [Tree(2)])])])
    >>> bigs(a)
    4
    """
    ## YOUR CODE HERE ##

A good first step is to ignore any starter code and try to understand what the problem is asking. A good second step is to draw out the tree(s) in the doctest(s) to get a clearer image of what you're dealing with.

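Drawing out the tree, we see that the nodes to count are the root 1, both branches of 1 (4 and 3), and the branch 5 of the first 4. The other 4 is not counted because it is not larger than its parent 4, and 0 and 2 are both smaller than their ancestor 3. The total is 4, which matches the doctest.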

The root has no ancestors, so it always counts. For every other node, we need to keep track of the largest label among its ancestors.

The standard tree processing procedure looks like this:

if t.is_leaf():
    return ___
else:
    return ___([___ for b in t.branches])
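To make the template concrete, here are two ways its blanks could be filled in. The functions count_leaves and height are illustrative choices, not part of the bigs problem.

```python
class Tree:
    def __init__(self, label, branches=[]):
        self.label = label
        self.branches = list(branches)

    def is_leaf(self):
        return not self.branches

def count_leaves(t):
    """Return the number of leaves in t."""
    if t.is_leaf():
        return 1
    else:
        return sum([count_leaves(b) for b in t.branches])

def height(t):
    """Return the number of edges on the longest path from the root to a leaf."""
    if t.is_leaf():
        return 0
    else:
        return 1 + max([height(b) for b in t.branches])

a = Tree(1, [Tree(4, [Tree(4), Tree(5)]), Tree(3, [Tree(0, [Tree(2)])])])
print(count_leaves(a))  # 3
print(height(a))        # 3
```

In both cases the answer for a node is computed only from the answers for its branches.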

But this doesn't work here. The standard procedure combines information from a node's descendants, but here each node needs information about its ancestors. So we pass that information down as an extra argument to a helper function.

def bigs(t):
    def f(a, x):
        if a.label > x:
            return 1 + sum([f(b, a.label) for b in a.branches])
        else:
            return sum([f(b, x) for b in a.branches])
    return f(t, t.label - 1)
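The sketch below runs this solution on the doctest tree and on a few small cases chosen for illustration (a single leaf, a decreasing chain, and an increasing chain). The Tree class is repeated so that the block runs on its own.

```python
class Tree:
    def __init__(self, label, branches=[]):
        self.label = label
        self.branches = list(branches)

    def is_leaf(self):
        return not self.branches

def bigs(t):
    def f(a, x):
        # x is the largest label among a's ancestors.
        if a.label > x:
            return 1 + sum([f(b, a.label) for b in a.branches])
        else:
            return sum([f(b, x) for b in a.branches])
    # Starting below the root's label guarantees the root is counted.
    return f(t, t.label - 1)

a = Tree(1, [Tree(4, [Tree(4), Tree(5)]), Tree(3, [Tree(0, [Tree(2)])])])
print(bigs(a))                                    # 4
print(bigs(Tree(7)))                              # 1
print(bigs(Tree(3, [Tree(2, [Tree(1)])])))        # 1
print(bigs(Tree(1, [Tree(2, [Tree(3)])])))        # 3
```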

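Because DeNero picks the variable names in the starter code, it helps to note what each one means: f is the helper function, a is the current node, and x is the largest label among a's ancestors (largest_ancestor). The initial call passes t.label - 1 so that the root, which has no ancestors, is always counted.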

Recursive Accumulation

Let's do the same problem with recursive accumulation.

def bigs(t):
    n = 0
    def f(a, x):
        nonlocal n
        if a.label > x:
            n += 1
        for b in a.branches:
            f(b, max(a.label, x))
    f(t, t.label - 1)
    return n
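A variation on the same idea, sketched here as an assumption rather than as the lecture's solution: instead of counting with a nonlocal integer, the helper can append to a list defined in the enclosing function. Mutating a list does not rebind its name, so no nonlocal statement is needed. The name big_labels is hypothetical.

```python
class Tree:
    def __init__(self, label, branches=[]):
        self.label = label
        self.branches = list(branches)

    def is_leaf(self):
        return not self.branches

def big_labels(t):
    """Return the labels of nodes in t that are larger than all their ancestors."""
    found = []
    def f(a, x):
        if a.label > x:
            found.append(a.label)  # mutation, so nonlocal is not required
        for b in a.branches:
            f(b, max(a.label, x))
    f(t, t.label - 1)
    return found

a = Tree(1, [Tree(4, [Tree(4), Tree(5)]), Tree(3, [Tree(0, [Tree(2)])])])
print(big_labels(a))       # [1, 4, 5, 3]
print(len(big_labels(a)))  # 4
```

Collecting the labels also makes it easy to compare the result against the hand-drawn tree.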

Again, it's a good idea to note that n is the number_of_bigs.

Designing Functions

From Problem Analysis to Data Definitions

Identify the information that must be represented and how it is represented in the chosen programming language. Formulate data definitions and illustrate them with examples.

Signature, Purpose Statement, Header

State what kind of data the desired function consumes and produces. Formulate a concise answer to the question of what the function computes. Define a stub that lives up to the signature.

Functional Examples

Work through examples that illustrate the function’s purpose.

Function Template

Translate the data definitions into an outline of the function.

Function Definition

Fill in the gaps in the function template. Exploit the purpose statement and the examples.

Testing

Articulate the examples as tests and ensure that the function passes all. Doing so discovers mistakes. Tests also supplement examples in that they help others read and understand the definition when the need arises—and it will arise for any serious program.

For more on how to design functions, see How to Design Programs.
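As a rough illustration of these steps in Python, the sketch below applies them to a small function. The function largest_label and its examples are assumptions chosen for this illustration. The examples are written as doctests and then run as tests with the standard doctest module.

```python
import doctest

# Data definition: a Tree has an integer label and a list of Tree branches.
class Tree:
    def __init__(self, label, branches=[]):
        self.label = label
        self.branches = list(branches)

    def is_leaf(self):
        return not self.branches

# Signature: Tree of integers -> integer
# Purpose: return the largest label anywhere in t.
def largest_label(t):
    """Return the largest label in t.

    >>> largest_label(Tree(7))
    7
    >>> largest_label(Tree(1, [Tree(4, [Tree(9)]), Tree(3)]))
    9
    """
    # Template: a Tree is a label plus branches, so the function
    # combines t.label with the result for each branch.
    return max([t.label] + [largest_label(b) for b in t.branches])

# Testing: run the docstring examples; nothing is printed when all pass.
doctest.run_docstring_examples(
    largest_label, {"Tree": Tree, "largest_label": largest_label})
```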

Application

Implement smalls, which takes a Tree instance t containing integer labels. It returns the non-leaf nodes in t whose labels are smaller than all labels of their descendant nodes.

def smalls(t):
    """Return the non-leaf nodes in t that are smaller than all their descendants.
    
    >>> a = Tree(1, [Tree(2, [Tree(4), Tree(5)]), Tree(3, [Tree(0, [Tree(6)])])])
    >>> sorted([t.label for t in smalls(a)])
    [0, 2]
    """
    result = []
    def process(t):
        ## YOUR CODE HERE ##
    process(t)
    return result

Let's start with a base case.

def process(t):
    if t.is_leaf():
        return t.label
    else:
        ## YOUR CODE HERE ##

A leaf is never added to result, because smalls only collects non-leaf nodes. However, process still has to return a useful value: each call returns the smallest label in its subtree, and the call on the parent node uses those return values. Only smalls ignores a return value, namely that of the first call to process, which would be the smallest label in t.

def process(t):
    if t.is_leaf():
        return t.label
    else:
        smallest = min([process(b) for b in t.branches])
        if t.label < smallest:
            result.append(t)
        return min(smallest, t.label)
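Putting the pieces together, the complete function might look like the following sketch. It repeats the Tree class from earlier so that it runs on its own, and checks the result against the doctest.

```python
class Tree:
    def __init__(self, label, branches=[]):
        self.label = label
        self.branches = list(branches)

    def is_leaf(self):
        return not self.branches

def smalls(t):
    """Return the non-leaf nodes in t that are smaller than all their descendants."""
    result = []
    def process(t):
        """Return the smallest label in t, recording qualifying nodes in result."""
        if t.is_leaf():
            return t.label
        else:
            # Smallest label among all descendants of t.
            smallest = min([process(b) for b in t.branches])
            if t.label < smallest:
                result.append(t)
            return min(smallest, t.label)
    process(t)
    return result

a = Tree(1, [Tree(2, [Tree(4), Tree(5)]), Tree(3, [Tree(0, [Tree(6)])])])
print(sorted([t.label for t in smalls(a)]))  # [0, 2]
```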