Category Archives: Data Structures

The heap data structure

A sample implementation of the heap data structure is at:

https://github.com/Khanna111/DS/tree/master/Heap/src/com/khanna111

You would find two classes that implement the “siftDown” approach that creates a heap from an input array of “n” elements in O (n) complexity.

For details on the “siftDown” approach and the complexity being O (n), please refer to:

http://en.wikipedia.org/wiki/Heapsort

To generate all N size strings (of 0s and 1s)

Goal: To generate N size strings made up of two constituents:

  • 0 and
  • 1

This program is a rudimentary means to illustrate backtracking and recursion.

If we have N=2 then these are the N size strings:

  1. 00
  2. 01
  3. 10
  4. 11

And the code in Java is provided here:

Strongly connected components of a Graph – Kosaraju in Java

The Kosaraju algorithm can be used that if the underlying store is an AdjacencyList can be blazing fast with O (V+E) time.

Implemented as per wiki:
Kosaraju

  1. Let G be a directed graph and S be an empty stack.
  2. While S does not contain all vertices:
    • Choose an arbitrary vertex v not in S. Perform a depth-first search starting at v. Each time that depth-first search finishes expanding a vertex u, push u onto S.
  3. Reverse the directions of all arcs to obtain the transpose graph.
  4. While S is nonempty:
    • Pop the top vertex v from S. Perform a depth-first search starting at v in the transpose graph. The set of visited vertices will give the strongly connected component containing v; record this and remove all these vertices from the graph G and the stack S. Equivalently, breadth-first search (BFS) can be used instead of depth-first search.

Source Code:

Recursion versus Iterative code development

Earlier we had looked at the structure of a linked list and demonstrated an iterative java program to reverse it. In this entry, the stress would be on recursively implementation.

What is recursion and why is it used?

Recursion is the process of solving a complex problem by breaking it into smaller subproblems that are easy to solve. It is a decompositional technique and a variety of computer problems make their way into easily being resolved through the art of recursion. For instance: divide and conquer algorithms such as merge sort, reversing a linked list, towers of hanoi etc.

The easiest way to provide a recursive solution to a complex problem is to break down the problem into subproblems and so on till the subproblems are easier to comprehend and consequently easier to solve. For instace, lets break down the problem of reversing a linked list.

We will examine a similar linked list to that in an earlier blog entry (A->B->C->D).

if size = 1 then there is no need to do anything

if size = 2 then traverse to the node 2 and then reverse pointer from 2 to 1

if size = 3 then traverse to node 3 and then reverse pointer from 3 to 2 and then perform the steps outlined above

and so on. This leads us to a solution of the problem recursively

static Node reverseRecursive(Node n, Node previous, Node end)
{
    if (n.next != null)
    {
        end = reverseRecursive(n.next, n, end);
    }
    else 
    {
        end = n;
    }
    n.next = previous;
    return end;
}

The primary advantage of recursion is that it is simple to implement as the mathematical concepts can be easliy translated into a recursive solution.

The disadvantage is the fact that there is a lot of overhead of the recursive method invoking itself repeatedly. This leads to method arguments, method return addresses etc being stored in stack.

Iterative program to reverse a linked list has been elucidated in an  earlier blog entry.

Linked Lists

Linked List sample codebase with reversing the list in place (using Java):

There are two ways to reverse a linked list in place. One is to utilize recursion and the other is the iterative approach. Utilizing recursion is straight forward and intuitive but suffers from the drawback of performance issues et al. We will discuss benefits of recursion vis a vis the iterative approach and vice-versa in another blog entry.

In this blog entry, we will utilize the iterative approach. The logic of reversing a linked list (singly linked list) is simple and let me explain it using this example of a linked list:

A->B->C->D->E

Here node A has a next pointer to node B and so on. Node E has the next pointer set to NULL since it is the end of this list.

What we need to do: We need to pass the head of this list (node A) to the reverse method and it should return us node E with the list reversed as in:

E->D->C->B->A

The logic is to go one node at a time and changing the direction of the arrow. This is done by setting the next pointer of node B to be A and so on. This is demonstrated in the Java program below (please see the reverse operation in bold).

 public class Reverse {

    public static void main(String[] args) throws Exception
    {
        LinkedList l = new LinkedList();

        for (int i = 0; i < 11; i++)
        {
            Node n = new Node(new String("" + i), null);
            l.addToEnd(n);
        }

        Node n = l.getStartingNode();
        while (n != null)
        {
            System.out.println("\t" + n);
            n = n.next;
        }

        n = reverse(l.getStartingNode());
        while (n != null)
        {
            System.out.println("\t" + n);
            n = n.next;
        }

    }

    static Node reverse(Node startingNode) throws Exception
    {
        Node n = startingNode;
        Node n1 = n.next;
        n.next = null;

        while (true)
        {
            if (n1 == null)
            {
                return n;
            }

            Node n2 = n1.next;
            n1.next = n;

            if (n2 == null)
            {

                return n1;
            }
            n = n1;
            n1 = n2;

        }
    }

}

class LinkedList {
    Node begin = null;
    Node last = null;

    boolean addToEnd(Node a)
    {
        if (begin == null)
        {
            begin = a;
            last = a;
        }
        else
        {
            last.next = a;
            last = a;

        }
        return true;
    }

    Node getStartingNode()
    {
        return begin;
    }

    Node getLastNode()
    {
        return last;
    }

}

class Node implements Cloneable {
    Object value;
    Node next;

    Node() {

    }

    Node(Object v, Node n) {
        this.value = v;
        this.next = n;

    }

    @Override
    public String toString()
    {
        return value.toString();
    }

    @Override
    protected Object clone() throws CloneNotSupportedException
    {
        return super.clone();
    }

}

Binary Trees

Binary Tree:

A binary tree (T) is defined as a set of elements called nodes such that:

  1. T is empty (called the empty tree or null tree), or
  2. T contains a distinguished node R, called the root of T, and the remaining nodes of T form an ordered pair of disjoint binary trees T1 and T2.

This implies that a binary tree could have just a Root node, or it could have a root node and a empty T1 but a concrete T2 and so on.

The height of the empty binary tree is -1. The level (which starts from 0) is also referred to the depth of that element. The height of the nonempty binary tree is the depth of the farthest leaf.

2-Tree, Full and Complete Binary Tree:

A binary tree is a 2-tree if each node N has either 0 or 2 children. 

A binary tree is full if

  • T is empty or
  • There is a distinguished node R with either 0 or 2 children. This implies that the height of the children is the same.

A complete binary tree is full till the next to last level and the lowest level has leaves farthest to the left. All full binary tress are complete but the converse is not true.

For a complete binary tree, we have the following formulaes:

  • The maximun no of elements => 2^(i+1) -1 where i = height of the tree.
  • The maximum no of nodes at a particular level =  2^k where k represents the level

Memory Representation: 

A binary tree can be represented in the memory in the following ways:

  1. As a linked list.
  2. As an indexed array.

I would prefer the indexed array for a complete or a nearly complete binary tree since access time to look up the children of any node is constant. If the tree is far from complete then space would be wasted and this approach would not meet the space requirements for an algorithm.

For instance, in case of the array:

  • The root node is at Tree[0]
  • For a node at Tree[k], the left child if any is at Tree[2*k+1] and the right child if any is at [2*k+2]

 External Path Length of a binary tree:

It is the sum of all root to leaf path lenghts. Formally:

If t is a non empty binary tree, then the external path length E(t), is the sum of the depths of all leaves in that tree. This result is important for sorting algorithms.

TODO: the external path length (E(t) >= (k/2) floor(logk) proof.

 Traversals:

InOrder

Characterized by LNR (Left, Node, Right). It gets its name from the fact that for the binary search tree, this algo will traverse the elements in order (ascending)

PreOrder (also called DFS)

Characterized by NLR. It produces the prefix notation. To implement iteratively use a stack and then push right tree and then left tree. Note that the order of pushing is reversed.

PostOrder

Characterized by LRN. Produces the postfix notation. 

BFS

Use a queue to implement this one.

Binary Search Tree

A binary search tree is useful since it requires only Log(n) time on average, for inserting, removing or searching for an element. The worst is of the order of n.

However, if the binary search tree is balanced then the worst is of the order of Log (n). There is no data structure that provides better search performance than that.

Java Collections and Insertion Sort

Insertion Sort:

It is used for smaller size arrays. In Java Collections, insertion sort is applied to an array of size 7 or else. This cutoff is based on empirical studies. The best choice for the cutoff is actually machine ]dependent.

Method implementation:

public void insertionSort(int x[])

{

   int off = 0, len = x.length;

   for (int i = off; i < len; i++)

      for (int j = i; j > off && x[j-1] > x[j]; j–)

          swap(x, j, j-1);

}

public void swap (int x[]; int a, int b)

{

    int t = x[a]; x[a] = x[b]; x[b] = t;

}

Analysis:

The outer for loop runs n times (where n is the length of the array).

The inner for loop runs 0, 1, 2, 3 … (n-1) for the worst case scenario wherein the array is in descending order.

Therefore the worst case scenario = > n(n-1)/2

At every inner for loop step a decision is made whether to swap. If we assume that at each step there is a 50% chance of swapping, then the average case scenario would be: (n(n-1)/2 ) / 2 approx.

The best case behavior would be visible when the array is already sorted => insertion sort would be linear in n.

Bubble Sort:

Unlike insertion sort wherein the smaller element flows towards the beginning, here the larger element flows towards the end. Although the name might be catchy but it is not used industrially – there are other sorting algorithms that provide better perfomance.