A collection of computer, gaming and general nerdy things.

Sunday, February 8, 2015

Musings on Problem Solving

I just started reading a book on algorithms and data structures in Python. I've also commited myself to reading every chapter, taking notes and doing every exercise at the end of each chapter (okay, Chapter 1 is all "This is Python, this is the syntax, here's how you loop..." so I skimmed it) except for the ones I can't reasonably fit in a single IPython Notebook cell (I'm not trying shove a calculator into a notebook) or are just plain boring ("Print a statement 100 times!" -- seriously, this is an exercise).

Most of the problems are okay, made me think some. Some are ones I've done before. Others made me think about how to decompose the problem into smaller, more managable bits. And I'd like to share my process on one in particular.

Our Task

To paraphrase the exercise: We want to accept three integer inputs and determine if either a = f(b,c) or f(a,b) = c, where f is some mathematical function (maybe addition, modulo but log and pow are candidates as well). Right now, if you feel this is a complex challenge, I want you attempt to code it and come back and see how your code and my code differs. If you get stuck, don't feel bad and come along for the ride.

The Steps

The first thing I like to try to do is break a problem down into steps. If a step is ambigous, I try to decompose the step into smaller bits. Eventually, I have kinda procedural pseudo-code.

  1. Collect Operations
  2. Collect Integers
  3. Map inputs to operations
  4. Check if at least one operation returned true
  5. Put the whole thing together.

To me, those are pretty clear cut, atomic steps. Sure, they're composed of smaller steps internally, but they represent a single unit of work each function of the program needs to do. Step 5 is usually implied, but I find it's nice to list anyways.

Step 1: Collect Operations

We'll just make a list of all the operations we want to check against. Since Python has first class functions (meaning functions can be treated like data), this is really easy. We'll also make a simple root function for calculating things like cubic roots as well.

In [11]:
from operator import add, sub, mul, mod, truediv
from math import log

# roots are just fractional exponentials when you get down to it
# ...or maybe exponentials are fractional roots...
def root(base, power=2):
    return base ** (1/power)

operations = [add, sub, mul, mod, truediv, log, root, pow]

Super simple. Now, let's move on.

Step 3: Map Inputs To Operations

"Alec," you say, "You skipped a step!" And I did. Because I/O is nasty and I'd like to do it last. For now, let's assume the integers we collected are 1, 2 and 3.

In [3]:
integers = [1,2,3]

For this step we need to make our operation to a potential output. Basically: f(a, b) == c. Again, this is easy. Even if we decide to check if a == f(b,c) in the same step. Like I said before, Python's functions can be treated just like data and this includes passing them as arguments to other functions.

In [4]:
def either_side_true(op, a, b, c):
    return a == op(b, c) or op(a,b) == c

Step 4: Determine if any check returned True

There's many ways we could do this. But the key here is the word any -- which happens to be a Python built-in function. What any does is accept an iterable of values and checks if at least one of them is True. There's also its sibling all which checks that none of the values are False. What's really nice about any and all is they allow collapsing a bunch of and/or groupings in an if statement into a single expression.

Say you wanted to check if any variable in a given group was equal to something. You might do (ignoring that you'd actually have a list and do if 1 in list_of_vars:)...

In [6]:
a = 2
b = 1
c = 5

if a == 1 or b == 1 or c == 1:
    print("We've got a winner!")
We've got a winner!

...but with any we can simply do this...

In [7]:
if any(v == 1 for v in [a,b,c]):
    print("I said We've got a winner!")
I said We've got a winner!

Of course, the true power of any and all comes when you combine them with generators, which will allow you to lazily compute your check. These aren't a catch all, sometimes you want to store the output of a check or you already have a list built, in which case x in my_list is superior, but these should be tools you consider from time to time.

Now, we want to check if any of our operations returns True if a == f(c,b) or c == f(a,b).

In [8]:
if any(either_side_true(op, *integers) for op in operations):
    print("Heck yeah at least one did.")
Heck yeah at least one did.

Alternatively, we could store that as a separate function if we needed to fiddle with it more, but for this case, I feel that's overkill.

Step 2: Collect Integers

Okay, now we'll worry about actually grabbing some input from the user. We need to collect three integers, probably using STDIN. But they could come from a file, a database or anywhere. But let's focus on just grabbing them from STDIN.

In [9]:
def collect_ints_from_stdin(n=3):
    ints = []
    
    while len(ints) < n:
        try:
            v = int(input("Please type an integer: "))
        except ValueError:
            continue
        else:
            ints.append(v)
    
    return ints

Normally, I'd wince at a while loop used to build a fixed length list, but we have to trouble with users doing something like trying to enter the number a (well...you could use hex...but who does that?). Since we're just passing on the error, our list of integers could end up short!

You might also notice that I've used try/else. I am of the opinion that a try block should be isolated to just the code that could raise an exception. except handles cleaning up the exception. And then there's else which allows us to run code on the condition we didn't encounter a problem. Mentally, if you replace except ValueError with if catch(ValueError) where catch is some magic framehack or something, it begins to make sense. I'll also note, for completion's sake that there's also an optional finally clause which lets us run code regardless of an exception occuring. But the full try/except/else/finally concept is for another time.

Step 5: Put it altogther

Finally, we'll take these individual components and piece them together into a cohesive answer.

In [12]:
def answer(operations, num_ints=3, collector=collect_ints_from_stdin):
    ints = collector(num_ints)
    if any(either_side_true(op, *ints) for op in operations):
        print("Heck yeah at least one did!")
    else:
        print("D: Nope!")

If you add if __name__ == "__main__": answer(operations) and count spacing and imports, the whole thing's less than forty lines!

The Take Away

  1. Break a problem down into steps as small as possible
  2. Small, functional pieces are more manageable

Breaking down a problem is something that seems so obvious but it's a skill that needs to be practiced and honed just like any other. After all, programming isn't so much about code as solving problems. And part of solving problems is finding the smaller problems hidden inside them.

What looked liked a complicated task on the surface, ended up being just five simple tasks on the inside. That's not to say that every problem will immediately break down to simple tasks. Sometimes you'll end up with a list of steps that's nested five levels deep! But if you tackle each step one at a time, sprinkle in some Python idioms and a touch of abstraction, your application will come together.

The other, writing smaller more functional code pieces allows your code to be modular. What if tomorrow, we want to grab the integers from a database? Well, since collect_ints_from_stdin is separated from the main function, we can easily. Just build a callable that queries the database for integers and feeds them into the operations. Or if we need to change from integers to strings, we just swap out the collector and the operations and move on with our day -- or better, pass a callback to coerce the type for us!

But consider if we had just written a thirty line function that did all this? Ugh, it'd be a nightmare to untangle the pieces. Honestly, we'd probably have two functions that did the same thing, but just grabbed data from other locations. And then two more functions because maybe we need to work on strings. Quickly, this spirals out of control as the number of functions grows at (Data Types) * (Sources). We have four sources and four data types? That's 16 functions! I don't want to write the same function 16 times and I doubt you do either.

Instead we can write four functions to grab the data, pass our coercion callback and maintain the lists of operations.

Going Further

There are problems. What happens when we input 0 and our checks make it to the division, modulo, log or root operations? We get a nasty exception. :( But handling it is easy (hint: ZeroDivisionError and ValueError would needed to be caught). Also, we should be able to tell the user which operations returned true.

Of course, I'm not going to play my full hand. I'll leave these as an exercise to you to figure out.