Lab 2: Breadth-First Search

Objective

To create Python script to execute breadth first search algorithm.

Problem to be solved

The search problem we will focus on during this lab is Nick’s route-finding problem in Romania, starting in Arad to reach Bucharest. The road map of Romania, which is the state space of the problem is given as follows:

Translating the state space

First we need to define the state space in our problem.
The important information from the state space is the connections between states and the step costs between connected states.
Notice that in this problem the connections are reversible. Therefore in our state space the connection from Arad to Zerind and the connection from Zerind to Arad are identical, hence only one instance of that connection is needed.
The most straightforward way of defining the state space is by using a nested array, in which each inner array consists of the two connected states and its cost. Open a script file and define the variable state_space. The following code shows the first three elements in the nested array. Complete the definition of the variable by referring to the state space provided.
```
state_space = [
  ['Arad', 'Zerind', 75],
  ['Arad', 'Timisoara', 118],
  ['Timisoara', 'Lugoj', 111],
  ...
]
```
Define two variables initial_state and goal_state to hold the information from the problem.
```
initial_state = 'Arad'
goal_state = 'Bucharest'
```

Actions and transition model

Actions and transition model provide us the way to identify the children of a node in a search tree.
In this problem, the actions and transition model are straightforward. The actions are limited by the states connected to node and the transition model is directly defined by the action. Therefore we can create a function to search through the state space to find the children of a node, which would be suffice as actions and transition model.

To achieve this, we will loop through the state_space to check for any connection linked to the node, the other state on the connection will be the child of the node in the search tree. In the same script file, define the following function:

def expandAndReturnChildren(state_space, node):
  children = []
  for [m,n,c] in state_space:
    if m == node:
      children.append(n)
    elif n == node:
      children.append(m)
  return children

This function provides the children of a node given the state_space of the problem.

Breadth-first search algorithm

Next we will create a function to execute the search algorithm. The first algorithm we will use is the breadth-first search (BFS).
As we want to separate the problem definition from the algorithm definition, the algorithm function will have the inputs of the state space, initial state and goal state.
```
def bfs(state_space, initial_state, goal_state):
```

In this function, we need two empty arrays to store the frontier and the explored states.

def bfs(state_space, initial_state, goal_state):
  frontier = []
  explored = []

At initial stage, the initial state is our frontier.

def bfs(state_space, initial_state, goal_state):
  frontier = []
  explored = []
  frontier.append(initial_state)

When we are generating the search tree, we will continually expanding the first node (FIFO) in the frontier until we generate a node with goal state. Therefore we need to have a loop to repeatedly expanding the first node in the frontier until the goal node is generated.
```
def bfs(state_space, initial_state, goal_state):
  frontier = []
  explored = []
  found_goal = False
  frontier.append(initial_state)

  while not found_goal:
    ...
```

In the loop we will first expand the first node in the frontier to obtain the children, and move the expanded node from frontier to the explored set.

  while not found_goal:
    # expand the first in the frontier
    children = expandAndReturnChildren(state_space, frontier[0])
    # copy the node to the explored set
    explored.append(frontier[0])
    # remove the expanded node from the frontier
    del frontier[0]

Check if the children generated should be added to the frontier or discarded. If any child is expanded previously, i.e. in the explored set, or if it is generated previously but not expanded yet, i.e. in the frontier, then it should be discarded. Else, it should be added to the frontier.

    del frontier[0]
    # loop through the children
    for child in children:
      # check if a node was expanded or generated previously
      if not (child in explored) and not (child in frontier):
        # add child to the frontier
        frontier.append(child)

Before a child is added to the frontier, it should be tested for goal. If it has the goal state, the BFS algorithm should in fact be terminated and return the solution.

      if not(child in explored) and not (child in frontier):
        # goal test
        if child == goal_state:
          found_goal = True
          break
        # add child to the frontiers
        frontier.append(child)

Oops, something is not right

Now, if you are following, you may notice that we have a way to end the algorithm but we have failed to store our solution.
One way to store the solution while generating the search tree is to store the frontier as the paths from the initial state to the leaf nodes instead of just the leaf nodes. Therefore we would be able to retain the memory of the explored section of the search tree.

First we need to modify the expandAndReturnChildren function.

def expandAndReturnChildren(state_space, path_to_leaf_node):
  children = []
  for [m,n,c] in state_space:
    if m == path_to_leaf_node[-1]:
      children.append(path_to_leaf_node + [n])
    elif n == path_to_leaf_node[-1]:
      children.append(path_to_leaf_node + [m])
  return children

4. The bfs function is also modified to accommodate to this change.

def bfs(state_space, initial_state, goal_state):
  frontier = []
  explored = []
  found_goal = False
  solution = []
  frontier.append([initial_state])

  while not found_goal:
    # expand the first in the frontier
    children = expandAndReturnChildren(state_space, frontier[0])
    # copy the node to the explored set
    explored.append(frontier[0][-1])
    # remove the expanded frontier
    del frontier[0]
    # loop through the children
    for child in children:
      # check if a node was expanded or generated previously
      if not (child[-1] in explored) and not (child[-1] in [f[-1] for f in frontier]):
        # goal test
        if child[-1] == goal_state:
          found_goal = True
          solution = child
        # add children to the frontier
        frontier.append(child)
    print("Explored: ")
    print(explored)
    print("Frontier:")
    for f in frontier:
      print(f)
    print("Children: ")
    print(children)
    print("")

  return solution

Noted that in line 12 we are only saving the leaf node that has just been expanded to the explored set since the information of the path is unnecessary to be stored in the explored set.
In line 18, [f[-1] for f in frontier] is used to obtain a list of the leaf nodes in the frontier. This is the pythonic way of extracting from a list to form another list. The one-liner is equivalent to
```
leaf_nodes = []
for f in frontier:
  leaf_nodes.append(f[-1])
```
f[-1] is used since currently we are saving the path from initial state to the respective leaf node in the frontier.

Running the algorithm

Now we have two functions expandAndReturnChildren and bfs, alongside with the variables state_space, initial_state, and goal_state.

To run a script to execute the bfs function, we can have the script file structured as such:

def expandAndReturnChildren(...):
  ...

def bfs(...):
  ...

state_space = [
  ...
]

initial_state = 'Arad'
goal_state = 'Bucharest'

print('Solution: ' + str(bfs(state_space, initial_state, goal_state)))

Beware that in Python, a .py file can also be used to define a Python library/module. To prevent the commands outside the function to be executed when the file is used as a library instead of a script, we can implement an extra condition check.

def expandAndReturnChildren(...):
...

def bfs(...):
...

if __name__ == '__main__':
  state_space = [
  ...
  ]

  initial_state = 'Arad'
  goal_state = 'Bucharest'

  print('Solution: ' + str(bfs(state_space, initial_state, goal_state)))

__name__ is a special variable in Python that evaluates to the name of the current module. This variable has the value of '__main__' if it's called as the main program rather than a module or library.
This part is essentially the main function in other programming languages like C++ and Java.
Compile the program and resolve any error.

Redundant storing of states?

When you run the script from previous section, you may wonder, if there is a more efficient way of memory usage instead of saving the path to each leaf node in the frontier. Currently there is a lot of redundant states being stored in the variable frontier.

Using a class

The alternative and more space-efficient way would be to declare each node in the search tree to be an object that stores its name, parent, and children. To do so, we need to first define a class.
```
class Node:
  def __init__(self, state=None, parent=None):
    self.state = state
    self.parent = parent
    self.children = []

  def addChildren(self, children):
    self.children.extend(children)
```
In Python, to define a class, the class needs to have the function __init__ with at least one input self. This function is called when an object is initiated with this class. The internal variable self defines the properties of the object. The input arguments apart from self for the function __init__ are the parameters to be passed when initiating a new object.

In a class, additional functions can also be defined, and these will be the methods for the object of this class.

An example of initiating a new object for this class and use the function is:

a_distinct_node = Node('state name', 'parent of this node')
a_distinct_node.addChildren(['child 1', 'child 2'])

With the new class, we can update the function expandAndReturnChildren to return the children as a list of Node objects.

def expandAndReturnChildren(state_space, node):
  children = []
  for [m,n,c] in state_space:
    if m == node.state:
      children.append(Node(n, node.state))
    elif n == node.state:
      children.append(Node(m, node.state))
  return children

With the availability of the Node class, we can save the nodes as Node objects in the frontier instead of paths to the leaf nodes. When the goal state is found, the goal node is used to trace back to its predecessor (a.k.a. parent, which will be now in the explored set) which is accessible using the property .parent of the goal node. Then the predecessor to the parent of the goal node can be traced similarly. This process is thus repeated until the initial state (with no parent) to obtain the solution and its cost.

def bfs(state_space, initial_state, goal_state):
  frontier = []
  explored = []
  found_goal = False
  goalie = Node()
  solution = []
  # add initial state to frontier
  frontier.append(Node(initial_state, None))

  while not found_goal:
    # expand the first in the frontier
    children = expandAndReturnChildren(state_space, frontier[0])
    # add children list to the expanded node
    frontier[0].addChildren(children)
    # add to the explored list
    explored.append(frontier[0])
    # remove the expanded frontier
    del frontier[0]
    # add children to the frontier
    for child in children:
      # check if a node was expanded or generated previously
      if not (child.state in [e.state for e in explored]) and not (child.state in [f.state for f in frontier]):
        # goal test
        if child.state == goal_state:
          found_goal = True
          goalie = child
        frontier.append(child)
    print("Explored:", [e.state for e in explored])
    print("Frontier:", [f.state for f in frontier])
    print("Children:", [c.state for c in children])
    print("")

  solution = [goalie.state]
  path_cost = 0
  while goalie.parent is not None:
    solution.insert(0, goalie.parent)
    for e in explored:
      if e.state == goalie.parent:
        path_cost += getCost(state_space, e.state, goalie.state)
        goalie = e
        break
  return solution, path_cost

def getCost(state_space, state0, state1):
  for [m,n,c] in state_space:
    if [state0,state1] == [m,n] or [state1,state0] == [m,n]:
      return c

The compiled program will be structured as such. Compile and run the program.

class Node:
  ...

def expandAndReturnChildren(...):
  ...

def bfs(...):
  ...

def getCost(...):
  ...

if __name__ == '__main__':
  state_space = [
    ...
  ]

  initial_state = 'Arad'
  goal_state = 'Bucharest'

  [solution, cost] = bfs(state_space, initial_state, goal_state)
  print("Solution:", solution)
  print("Path Cost:", cost)

Preparation for next week

Fully understand the code as you will have to modify the code for other problem/search algorithms in next labs.
How would you modify the code to run other uninformed search algorithms such as uniform-cost search, depth-first search, etc.? Which part(s) of the code do you need to modify? Think about it before you work on that in Lab 3 next week.