In this blog post I will discuss how to implement Depth-first search (DFS) and Breadth-first search (BFS) in the functional programming language Haskell.
Before we get started by defining types for search problems and algorithms, let us import some useful library functions.
module Search where
import Data.List (find,nub)
import Data.Maybe (listToMaybe,mapMaybe)
import Debug.Trace (trace)
A classic way to show the difference between DFS and BFS uses binary trees:
But search problems and algorithms are more general than binary trees.
What is a search problem?
In general, a search problem is given by:
- a
start
value where we begin our search, - an
expand
method to compute possible new values, - a property
isDone
telling us whether a given value is or suffices as a goal.
data SearchProb a =
SearchProb { start :: a
expand :: a -> [a]
, isDone :: a -> Bool } ,
Here is a very simple search problem: How can we go from 23 to 42 by repeatedly adding 2 or 5?
findNumber :: SearchProb Int
= SearchProb
findNumber = 23
{ start = \n -> concat [ [ n+2, n+5] | n < 42 ]
, expand = (==42) } , isDone
In order to see what is happening and in which order, we use trace
:
findNumber' :: SearchProb Int
= SearchProb
findNumber' = 23
{ start = \n -> trace (show n) concat [ [ n+2, n+5] | n < 42 ]
, expand = (==42) } , isDone
Another example: How can we move from (0,0) to (4,4) in a two-dimensional grid?
(Note that here we use trace
again to print intermediate steps.)
grid :: SearchProb (Int,Int)
= SearchProb start expand isDone where
grid = (0,0)
start = \(x,y) -> trace (show (x,y)) $ [ (x+1,y) | x < 4 ] ++ [ (x,y+1) | y < 4 ]
expand = (== (4,4)) isDone
And another example, with a binary tree:
data BinaryTree a = Empty | Node a (BinaryTree a) (BinaryTree a) deriving (Show)
findInTree :: BinaryTree a -> (a -> Bool) -> SearchProb (BinaryTree a)
= SearchProb start subtrees treeIsDone where
findInTree start isDone Empty = False
treeIsDone Node x _ _) = isDone x
treeIsDone (Empty = []
subtrees Node _ a b) = [a, b]
subtrees (
exampleTree :: BinaryTree String
= Node "Hay"
exampleTree Node "Hay"
(Node "Hay" Empty Empty)
(Node "Needle A" Empty Empty))
(Node "Needle B"
(Node "Hay" Empty Empty)
(Empty)
findNeedle :: SearchProb (BinaryTree String)
= findInTree exampleTree (("Needle" ==) . take 6) findNeedle
What is a Search Algorithm?
Well, something that hopefully solves a search problem. So a search algorithm takes a search problem and then maybe returns a value:
type SearchAlgo a = SearchProb a -> Maybe a
DFS and BFS in Haskell
dfs :: SearchAlgo a
SearchProb start expand isDone) = loop start where
dfs (| isDone x = Just x
loop x | otherwise = listToMaybe $ mapMaybe loop (expand x)
To summarise the definition of loop
: If the current x
is what we wanted to find, then stop with Just x
. Otherwise, apply the expand function to x and then map the loop function over this list.
Here it is crucial that Haskell is lazy: the resulting list expand x
will not be fully evaluated before the loop function is called again.
Instead, loop
will be applied to the first element of expand x
and only if the result is Nothing
the next element will be tried.
(By the way, this means dfs
can work even if we have an infinitely branching search problem where expand x
is an infinite list.)
BFS also uses a loop.
But crucially, the parameter of loop
in bfs
is a list xs
instead of a single point x
.
bfs :: SearchAlgo a
SearchProb start expand isDone) = loop [start] where
bfs (| any isDone xs = find isDone xs
loop xs | otherwise = loop (concatMap expand xs)
DFS and BFS also differ in which function is mapped:
- In DFS we map
loop
over the result ofexpand x
. - In BFS we map
expand
over allxs
and apply loop to the result.
Examples
λ> dfs findNumber
Just 42
λ> bfs findNumber
Just 42
Well, that is not very informative, both algorithms find the same result of course.
But the way they find it is different, and we can see that if we use findNumer'
which prints intermediate steps.
λ> dfs findNumber'
23
25
27
29
31
33
35
37
39
41
43
46
44
Just 42
λ> bfs findNumber'
23
25
28
27
30
30
33
29
32
32
35
32
35
35
38
31
34
34
37
Just 42
Looking at the grid
example we can also see that our bfs implementations is very inefficient.
It visits many fields in the grid multiple times.
λ> dfs grid
(0,0)
(1,0)
(2,0)
(3,0)
(4,0)
(4,1)
(4,2)
(4,3)
Just (4,4)
λ> bfs grid
(0,0)
(1,0)
(0,1)
(2,0)
(1,1)
(1,1)
(0,2)
(3,0)
(2,1)
(2,1)
(1,2)
(2,1)
(1,2)
(1,2)
(0,3)
(4,0)
(3,1)
(3,1)
(2,2)
(3,1)
(2,2)
(2,2)
(1,3)
(3,1)
(2,2)
(2,2)
(1,3)
(2,2)
(1,3)
(1,3)
(0,4)
(4,1)
(4,1)
(3,2)
(4,1)
(3,2)
(3,2)
(2,3)
(4,1)
(3,2)
(3,2)
(2,3)
(3,2)
(2,3)
(2,3)
(1,4)
(4,1)
(3,2)
(3,2)
(2,3)
(3,2)
(2,3)
(2,3)
(1,4)
(3,2)
(2,3)
(2,3)
(1,4)
(2,3)
(1,4)
(1,4)
(4,2)
(4,2)
(4,2)
(3,3)
(4,2)
(4,2)
(3,3)
(4,2)
(3,3)
(3,3)
(2,4)
(4,2)
(4,2)
(3,3)
(4,2)
(3,3)
(3,3)
(2,4)
(4,2)
(3,3)
(3,3)
(2,4)
(3,3)
(2,4)
(2,4)
(4,2)
(4,2)
(3,3)
(4,2)
(3,3)
(3,3)
(2,4)
(4,2)
(3,3)
(3,3)
(2,4)
(3,3)
(2,4)
(2,4)
(4,2)
(3,3)
(3,3)
(2,4)
(3,3)
(2,4)
(2,4)
(3,3)
(2,4)
(2,4)
(2,4)
(4,3)
Just (4,4)
We can improve this a bit with a nub
to remove duplicates from the list passed to "loop".
slightlyBetterBfs :: Eq a=> SearchAlgo a
SearchProb start expand isDone) = loop [start] where
slightlyBetterBfs (| any isDone xs = find isDone xs
loop xs | otherwise = loop (nub $ concatMap expand xs)
λ> slightlyBetterBfs grid
(0,0)
(1,0)
(0,1)
(2,0)
(1,1)
(0,2)
(3,0)
(2,1)
(1,2)
(0,3)
(4,0)
(3,1)
(2,2)
(1,3)
(0,4)
(4,1)
(3,2)
(2,3)
(1,4)
(4,2)
(3,3)
(2,4)
(4,3)
Just (4,4)
Getting back to binary trees, we can see that DFS and BFS will find different answers:
λ> dfs findNeedle
Just (Node "Needle A" Empty Empty)
λ> bfs findNeedle
Just (Node "Needle B" (Node "Hay" Empty Empty) (Node "Hay" Empty Empty))
Finally, it is very easy to define search problems on which DFS will run forever.
λ> dfs (SearchProb (1::Int) (\n -> [n*3,n*2]) even)
^CInterrupted.
λ> bfs (SearchProb (1::Int) (\n -> [n*3,n*2]) even)
Just 2
The order in which expand
generates new values also matters for the result.
λ> bfs (1::Int) (\n -> [n*3,n*2]) (\n -> even n && n > 200)
Just 486
λ> bfs (1::Int) (\n -> [n*2,n*3]) (\n -> even n && n > 200)
Just 216
λ> dfs (1::Int) (\n -> [n*2,n*3]) (\n -> even n && n > 200)
Just 256
λ> dfs (1::Int) (\n -> [n*3,n*2]) (\n -> even n && n > 200)
^CInterrupted.