\input itemize \def\title{Binary tree insertion} \def\topofcontents{\null\vfill \titlefalse % include headline on the contents page \def\rheader{\hfil} \centerline{\titlefont Inserting into an ordered binary tree} \vfill} @*Preliminaries. Our object is to design a verified binary tree insertion routine. To make our lives easier, we will employ some simplifications: \itemize \item We will use iteration instead of recursion. \item We will use Dijkstra's language of guarded commands \item We will assume the existence of a tree data type, such that if |t| is a tree either |t| is empty (|t=emptyset|) or |t| has a left subtree, a right subtree, and a datum (|t=<>|). If |t=<>|, we may write |t.l| for |l|, |t.d| for |d|, and |t.r| for |r|. \item We will assume sequences; if |s| is a sequence then either |s| is empty (|s=empty|) or |s| is an element followed by a sequence (|s=z Z|). \item We have a membership operator |member| that will test for membership in sequences or trees. \enditemize @ We want to talk about ordered binary trees, so we'll want the notion. @c ordered.emptyset == true; ordered.<>==ordered.l & ordered.r &@| (forall y:y member l: y<=d) &@| (forall y: y member r: y>=d); @ Our mission is to insert a datum |x| into a binary tree |T| such that the resulting tree is ordered. We'll do this by creating a new tree |t| which is ordered and the membership of which is the union of the membership of |t| with $\{x\}$. We don't care exactly what happens if |x| is already in the tree, although we could add a postcondition |x member T ==> T=t|. If we state our problem formally, we have @c {PRE: ordered.T} {POST: (forall y::y member t <=> y member T | y=x) & ordered.t} @ Let us develop some notion of ``insertion at a node.'' Perhaps we can use that as a guide to weakening the postcondition and finding a loop invariant. In essence what we will want to do is select some empty subtree of |T|, and replace it with the tree |<>|. Let us generalize, and imagine that we have a variable |s| that represents |T| and points at some node of |T| ``to be replaced.'' Let |attach.t.s| denote the result of substituting |T| for the subtree whose root is that node. In particular if |s| points to the root of |T|, then |attach.t.s=t|. @ How can we compute |attach|? We've already seen that if |s| points to the root of |T|, then |attach.t.s=t|. Suppose |s| does not point to the root of |T|. Then there is an |s`| that points to the parent of the node pointed to by |s|. Then, there is some |t`| such that |attach.t.s=attach.t`.s`|, and, furthermore, either |t`=<>| or |t`=<>|, for some |u| and |d|. So, let |s| be a sequence of records, and let each record contain: \itemize \item A choice (|left| or |right|) \item A datum |d| \item A tree |u| \enditemize and define @c attach.t.empty == t; attach.t.(<> s) == attach.<>.s; attach.t.(<> s) == attach.<>.s; @ Now we can imagine our insertion problem as being broken into two parts. First, we find the right place to insert |x|. This means computing an |s| such that |attach.<>.s| is what we want. Second, we have to compute |t| so that |t=attach.<>.s|. Let's imagine that we already have an |s|. Then, we can change our postcondition by substituting |attach.t.s| for |t|, and make that our loop invariant. We then start out with |t=<>|, and write a loop that adds to |t| while removing from |s|. Since |s=empty| at the end of the loop, and since |attach.t.empty=t|, the invariant conjoined with the negation of the guard gives us the postcondition. I leave it for the reader to show that the loop body leaves the quantity |attach.t.s| unchanged. @c t := <>; {invariant (forall y::y member attach.t.s <=> y member T | y=x) & ordered.(attach.t.s)} {bound #s} do s != empty -> let choice, data, tree, S satisfy <> S = s; if choice = left -> t,s := <>, S [] choice = right -> t,s := <>, S fi od {POST: (forall y::y member t <=> y member T | y=x) & ordered.t} @ It remains for us to compute a suitable |s| in the first part of the program. Let us write |X| for |<>|. Then we must assign to |s|, establishing $$\hbox{| (forall y::y member attach.X.s <=> y member T || y=x) & ordered.(attach.X.s)|.% }$$ Since |y=x <=> y member X|, that is equivalent to $$\hbox{ | (forall y::y member attach.X.s <=> y member T || y member X) & ordered.(attach.X.s)|.}$$ Let us imagine we have a variable |t| such that |T=attach.t.s|. Then we need $$\hbox{|y member attach.X.s <=> y member attach.t.s || y member X|}$$ Now we make use of a property of |attach|, viz.\ |(forall X::y member attach.X.s <=> y member X || y member attach.emptyset.s)|. (The proof is by induction on the length of |s|.) From this we can get $$ \hbox{ |t=emptyset ==> (forall y::y member attach.X.s <=> y member attach.t.s || y member X)|.}\eqno(1)$$ This suggests the following code fragment: @c t,s := T,empty; {invariant T=attach.t.s & ordered.(attach.X.s)} do t != emptyset -> /* loop body */@; od {t=emptyset & T=attach.t.s & ordered.(attach.X.s)} /* which, by (1), implies */ {(forall y::y member attach.X.s <=> y member T | y=x) & ordered.(attach.X.s)} @ The question is now what loop body will maintain the invariant. Since in our earlier loop we made |t| larger and |s| smaller, we can imagine inverting that loop to get the new loop. We also invert the proof that the value of |attach.t.s| remains unchanged. @c {invariant T=attach.t.s & ordered.(attach.X.s)} do t != emptyset -> if <<@tfirst guard@>>>@; -> t,s := t.l, <> s [] <<@tsecond guard@>>>@; -> t,s := t.r, <> s fi od @ The question remains as to what the guards must be to make the whole thing work. Taking the first branch, the weakest precondition of |ordered.(attach.X.s)| is @c ordered.(attach.X.(<> s)) !<=> ordered.(attach.<>.s) !<= /* by a lemma to follow */ ordered.(attach.X.s) & ordered.(attach.t.s) & x <= t.d !<= ordered.(attach.X.s) & ordered.T & T=attach.t.s & x <= t.d @ So if we strengthen our invariant to include |ordered.T|, we can make the first guard |x<=t.d|, and the second guard |x>=t.d|, giving @c {invariant T=attach.t.s & ordered.(attach.X.s) & ordered.T} {bound depth.t} do t != emptyset -> if x <= t.d -> t,s := t.l, <> s [] x >= t.d -> t,s := t.r, <> s fi od {t=emptyset & T=attach.t.s & ordered.(attach.X.s)} /* which, by (1), implies */ {(forall y::y member attach.X.s <=> y member T | y=x) & ordered.(attach.X.s)} @ Now we have to finish the proof we started earlier. We use a clever trick involving inorder traversals. For reference, the inorder traversal is defined by $$\eqalign{ |in.emptyset|&|==empty|\cr |in.<>|&|==in.l d in.r|\cr }$$ Suppose that $$ (\forall s::( \exists l,r:: (\forall t:: |in.(attach.t.s)=l in.t r|))) \eqno (2)$$ (which we will show in just a moment). Given |s|, choose such an |l| and |r|. Then we have the lemma we need: @c ordered.(attach.<>.s) !<=> /* by (2) */ sorted.(l x t.d in.(t.r) r) !<= sorted.(l x r) & sorted.(l in.(t.l) t.d in.(t.r) r) & x <= t.d !<=> ordered.(attach.X.s) & ordered.(attach.t.s) & x <= t.d @ We prove (2) by induction on the length of |s|. If |s=empty|, then |l=empty| and |r=empty| satisfy |in.(attach.t.s)=l in.t r|. If |s!=empty|, then write |s=z Z|. Our induction hypothesis is |(exists l`,r`:: (forall t::in.(attach.t.Z)=l` in.t r`))|. If |z=<>|, then let |l=l`| and |r=data in.tree r`|. Then, for any |t|, $$\eqalign{ in.(attach.t.s) &= |in.(attach.t.(<> Z))|\cr &=|in.(attach.<>.Z)|\cr &=|l` in.t data in.tree r`|\cr &=|l in.t r|,\cr}$$ and that's the induction step. @*The finished program. Here we put the whole program together: @c {PRE:ordered.T} t,s,X := T, empty, <>; {invariant T=attach.t.s & ordered.(attach.X.s) & ordered.T} {bound depth.t} do t != emptyset -> if x <= t.d -> t,s := t.l, <> s [] x >= t.d -> t,s := t.r, <> s fi od; {(forall y::y member attach.X.s <=> y member T | y=x) & ordered.(attach.X.s)} t := X; {invariant (forall y::y member attach.t.s <=> y member T | y=x) & ordered.(attach.t.s)} {bound #s} do s != empty -> let choice, data, tree, S satisfy <> S = s; if choice = left -> t,s := <>, S [] choice = right -> t,s := <>, S fi od {POST: (forall y::y member t <=> y member T | y=x) & ordered.t} @*Index.