Abstract Data Structures ( HL Part of Paper 1 20%)

This topic is heavily tested in the examination and Option D. The decision of which data structures to use are often looked at first by system designers as data is the main component of most systems. That decision will depend on how you want to access the data and what operations you want to do on it. Abstract here means that the basic essential methods and structure are provided and it is up to the developer to create concrete implementations.

Recursion

Before we look at the strengths and weaknesses of each data structure we look at recursion first as many data structure methods fit naturally with the technique (and it's popular with the exam board).

recursion

A recursive algorithm or method is one that calls itself. The call code to itself is refered to as the recursive case and the condition code to stop it calling itself ( it would keep going until an out of memory message) is called the base case.

The main idea is that you use recursion to solve a problem by dividing it into smaller (more easily solved) instances of itself.

The advantage of using recursion is that sometimes it is the most natural solution (nature uses it) and there will be less code. Handy in an exam!

The disadvantage is that it uses a lot of memory. A recursive call involves the use of stacks;
for storing/pushing on/popping out data/ return addresses/return values etc and if many recursive calls are made, the memory usage can be very large. In nearly every case an iterative version can be used instead and would run probably 3 times faster.

A method to write recursive algorithm.

1.Write method header

2. Work out base case . Is it when n=1 or n=0 which are the most common

3. To work out the recursive case just go ONE LESS than the first call

4. Work out any processing/artimetic/string handling that needs to be done

NB Steps 3 and 4 can be swapped. If the recursive call is last then this is known as tail recursion

Let's see this method in action with a program to work out the factorial of a number

5! = 5 * 4 * 3 *2 *1 = 120

One way of solving 5! is divide it into smaller problems so 5! = 5 *4! and 4 ! = 4 *3! and so on.

1. factorial(n) {

2. 0! and 1! =1 so base case is: if n<=1

3. the recursive call is 1 less than n so: n * factorial(n-1) eg. 5 * 4!

4. No other processing other than returning 1 which stimulates the reqinding of the aglorithm

  • factorial(int n)
  • {
  • if (n >= 1) //base case
  • return n*factorial(n-1); //recursive case
  • else
  • return 1;
  • } Or a tail recusion version
    • factorial(int n)
    • {
    • if (n = 1)
    • return 1;
    • return n*factorial(n-1); //recursive case
    • }
  • Tracing a recursive algorithm

    The way to think about how a recursive algorithm works is to imagine a kitchen timer. Every time you turn the timer it represents a recursive call. When the you release the timer it uncoils back up which represents the results of each call being passed back up to the previous call.

    timer

    The trace below isa nested approach and is one way

    ..........Winding Up

    .........................................recoil

    Another way is to use a table. The return column goes bottom to top

    Call # call n n>=1 return
    1 fact(5) 5 true 120
    2 5 *fact(4) 4 true 24
    3 4 *fact(3) 3 true 6
    4 3 *fact(2) 2 true 2
    5 2* fact(1) 1 true 1
      fact(0) 0 false 1

    Here is an actual past exam question with a trace mirroring a stack which recursion uses anyway

    Example 2

    Reverse a string so "kit cat" will output "tac tik"

    1. reverse(str, n) // str is the string and n is the length

    2. base case: if n = 0 then return empty string

    3. return string[n-1]+ reverse(str, n-1) //note the one less again

    function reverseString(str, n){
        //If the length is 0     //then return an empty string
     if(n == 0){      
       return '' }  //Call the function recursively with one character less 
    return str[n-1] + reverseString(str, --n); }

    Now recoil from bottom to get "tac tik"

     

    You may be asked to convert a recursive algorithm to an iterative one or vice versa. if the algorithm is tail recursion then it is pretty straight forward. Split the recursive call into 2 and use a while loop .

    • factorial(int n) {
    • fact = 1
    • while(n>0) {
      • fact =fact*n
      • n =n-1; }
    • print fact
    • }


    There will be more examples as we discuss different structures below.

    Abstract data structures are collections of data that include arrays, stacks, queues, linked lists and binary trees. Other than arrays they can grow and shrink and are therefore dynamic.

    They can be dynamic because as well as the data, nodes contain a link to other nodes using pointer references to locations in RAM. Dynamic structures therefore use memory more efficiently, there are no overflow issues and we can manipulate them more readily than static structures like arrays. Remember that static structures can randomly access elements and are in contiguous RAM locations.

    You will need to know when to use them and how to complete the most common data processing operations on each of the basic data structures which are:

    1. Addition
    2. Deleting
    3. Retrieval
    4. Traversal
    5. Sorting

    The most common taskis iterating over a collection either to print out the elements or to search for an element with a certain value

    I tend to use the enhanced for loop as it stops you going out of bounds. Taking a linked list as an example:

    for (String temp : linkedList) {
    System.out.println(temp);
    }

    The exam board are more likely to use a while loop:
    int i =0;
    while (i<linkedList.size()) {
    System.out.println(linkedList.get(i));
    i++;
    }


    Did you note the .size and get(i) methods?

    What if we want to iterate over a collection of students to find the first highest level (9) in a GCSE subject? If we hit a 9 then we can stop.

    Some notes to consider.

    When you find the first 9 we need a system to stop. Otherwise the loop will keep going to the end even after we found the first 9. We would use a flag set initially to false which changes to true if a 9 is found and we test for this in an if statement. A while loop is needed.

    boolean found = false;
    firstTopStudent = new Student(); //set up
    topStudent.setlevel =0;
    while (i < studentList.size() && (!found))
    {
    current = studentList.get(i);
    if (current.getLevel()>firstTopStudent.getLevel()) {


    result = i;
    }


    or simply, put the condition in the while loop with an AND.

    while ((i <StudentList.size() && (current.getLevel()! =9))

    Arrays

    We have covered arrays in the programming course and in SL. For SL and HL we need to know a few more techniques. Use an arraylist if there is any chance of having to add to it. Arrays are fixed size static structures (not dynamic) with an index allowing direct access to elements. Arrays elements are stored in contigous storage locations. Searching is O(1) but inserting and deleting are problematic.

    We can declare an array in a number of ways. Let's declare an array of 10 customers.
    I know! Who has just 10 customers? Which is why an arraylist or linked list would be better, but lets stick with it. You could create an array of 10,000 but what if you only do use 10. You have wasted space.

    private Customer[] customers = new Customer [10];


    array
    To increase the size of a existing array we have to declare a suitably sized new one and copy every element over.

    Arrays are fixed so to add another element to the next vacant place is not easy. We have to iterate over the array until we find an empty slot by using an index variable to keep track of the next empty slot

    Object[] arr = new Object[10];
    int index = 0;   
     public void add(Object obj)  { 
         if (index < arr.length)          
    			arr[index++] = obj; // add an object to the first empty spot, if the array is not full  }

    Iterating over an array

    See programming course

    To check if an array is empty

    boolean isEmpty = array == null || array.length == 0;



    What if the array is full? We will need to copy the original into another larger array

    int[] newArray = Arrays.copyOf(array, array.length + 1);
    newArray[newArray.length - 1] = newItem;



    What if we want to place a new item between two others, say between indicies 4 and 5?

    If you put the new object in 4 you will lose the element that was there so the trick would be to start at position 8 (the last placed element) and move that along 1 place into 9 and then reduce the indicies iterator by 1 and move that into where 8 was and so o
    array add

    To do this we need the index 5 in a variable called pos and another called end which has the first empty position (see algorithm above) and add 1

    end++;
    for (int i =end-1; i >pos; i--) // index wii go down to pos +1 {
    arr[i+1] =arr[i]; //going from higher index to lower
    }
    arr[pos] = ?// whatever new value


    If you import the ArrayUtils class then you can use >

    int[] largerArray = ArrayUtils.insert(2, array, 77);


    We have to specify the index in which we want to insert the value, and the output will be a new array containing a larger number of elements.


    What if you want to swap 1 and 4 over?

    swap

    We have to put the 4 in a temporary variable and then put the 1 in position 0 and temp in position 1. To make it more useful we can add source and destination variables.
    swap code
    Sorting arrays

    // Sort in ascending order
    Arrays.sort(arr);

    / Sorts arr[] in descending order         Arrays.sort(arr, Collections.reverseOrder());

    past paper arraysanswers

    Trees

    Trees are popular with examinations and interviews because they can be used to test your knowledge of recursion and Big O run time analysis. They are simple to implement within the time constraints of the exam

    A tree is made up of nodes (data elements) with zero , one or several child pointers to other elements. Notice how each node has only one other node pointing to or referencing it.

    Image result for computer science trees

    Tree terminology:

    Root: The top level - Monaco

    Parent: A node that points to other nodes. Paris is the parent of Norwich

    Child: A node is the child of any node that points to it Lewisham is the child of Monaco

    Descendant: All the nodes that can be reached from one node. Norwich, New York,

    Vatican, Rome and Partington are descendants of Paris

    Ancestor: Any node that can be reached  by following the children . Monaco, Norwich, Paris are the ancestors of  New York

    Leaf: leaves have no children – New York, Lewisham and Partingdon


    Binary Trees

    binary tree

    A binary tree is a tree where each node has no more than two child nodes. We would normally refer to the children as left and right nodes. Pointers link each node.

    When an element does not have a left or right node the pointer will be set to NULL.

    A binary search tree (BST) is one that is ordered or sorted. The value of each nodes left child is less than or equal to its value AND the right child is > or = to it. Trees have sub trees. See how 3 is a subtree as well as 6 for example.

    The big advantage of a BST structure is that searches are fast and simple which is useful for data storage

    bst

    Where are they used?

    Trees are unlike the other data structures as they are a hierarchical in structure. This makes them good for representing file systems. They would be quicker to search for an item than arrays say making them useful for data storage. Could be used for a merge sort.

    However, compared to linked lists it can be more difficult to manipulate references and pointers in a binary tree, making it harder to add, delete etc nodes.

    Searching binary trees

    Looking for 4 we go left as 4 is not 8 but less than 8. Thus doing away with half of the tree. 4 is greater than 3 so go right. Six is greater than 4 so go left. And we have found it.

    For each iteration we half the tree.  If we get down to 1 node we know if we found the number or not. The number of times (x) we have to half the tree to get down to this one number can be worked out using a logarithm.  n (number of elements)  = 2X.

    Big O is  O - (log(n)) if the tree is balanced. This is fast because we eliminate half the nodes from the search on each iteration by going either left or right.

    If it is not balanced it could be like a linked list and near O(n)

    Pseudocode

    Trees have the data plus a node.

    Start at the root node

    loop while current node is not NULL

    if the current node = value searched for
    return current node

    if current less than value looked for

    Make the left node current node

    if current node greater than value looked for

    make right node curent code

    end loop

    Each iteration gets smaller and smaller as you move through the sub trees but is the same procedure so recursion is used.

    public static boolean isElementinTree(int num, BinaryTreeNode root)   {
          return root != null && (root.getData() == num ||                            
          isElementInTree(num, root.getLeft()) ||                             
          isElementInTree(num, root.getRight()));  }
    	  

    bin tree

    Look at how easy it is to find the smallest number and the largest number using recursion. (going left is the one less in our method)

    So the steps for finding the node with minimum value in a Binary search tree are as follows-

    1. Starting from the root node go to its left child.
    2. Keep traversing the left children of each node until a node with no left child is reached. That node is a node with minimum value.
     // Finding node with min value     
     public Node findMinimum(Node node){         
        if(node.left != null){              
            return findMinimum(node.left);    }          
        return node;      } 
    

    Same way steps for finding the node with maximum value in a Binary search tree are as follows-

    1. Starting from the root node go to its right child.
    2. Keep traversing the right children of each node until a node with no right child is reached. That node is a node with maximum value.

    Task: Write the code for this yourself

    Note how that a BST consists of a number of sub trees and the procedure is the same for each sub tree.

    As the search continues the number of times this happens gets smaller -  down eventually to 1

    Such an algorithm lends itself to a recursive solution. Popular with exams but less so with programmers who are more likely to use plain old iteration.

    Memory can be organised in stacks or heaps. Heaps are a form of binary trees. Each node’s value is less than its parent node’s value.  So you can go straight to the highest value in the tree or sub trees

    A and E departments at a hospital use a heap to work out when patients are seen. Each patient is given a priority  and put on the heap.

    Heaps are used to store references to OOP type object instances whereas generally other variables and parameters will go on a stack.

    Traversing BST - algorithms and examples

    preorder postorder inorder
    Pre Order Traversal
    Post Order Traversal
    In Order Traversal

    In the examination we focus on depth first type traversals pre,post and inorder

    Linked Lists

    A linked list is a dynamic list which means it can grow or shrink. Access to elements are linear. Each node has data or a record and a next pointer. The data and next pointers can be in non contigous storage locations and so also needs pointers to the start or root nodeand to the end of the list (NULL). If the list is empty then these pointers will be the same.

    Image result for singly linked list

    Linked lists are one of the least used dynamic data structures. Java has more powerful dynamic data structures built into its standard library.

    They are insanely popular though with examination boards and interviewers, only in the same way as recursion is.  Questions can be completed in 10 minutes or so. Talking about recursion, as linked lists are searched in a linear way a recursive approach to manipulating them would probably result in lots of calls and stack overflows a possibility.

    You will need to know the three main types of linked lists. In interviews or examinations if they just say “Linked List” they probably mean a singly linked one. There are doubly linked list and circular linked lists.

    If a question states a function takes a linked list as an argument, then they mean it takes the head pointer as the argument.

    A search in a linked list for a value has to be done sequentially. WATCH out for the fact that by the end if there is no value searched you may crash the program.

    Use a while (element) { conditional loop to prevent this.

    Where are they used?

    Exam questions and interviews - Sorry!

    Linked lists can be used to create stacks or memory heaps in low level memory management. Also can be used for representing queues and lists. In the examination you can justify the use of a linked list by saying space can be allocated dynamically as elements in an application are added or removed. Hotel guests or hospital waiting rooms for are examples from the past.

    Compared with arrays and array lists

    Deleting and adding new elements will be more efficient than an array or arraylist as you don't have to shift elements along or have to re-size itself with a complex algorithm.

    So removing an element is Big O(1)

    Array are designated to memory at design time whereas in a linked list this is done at run time. Extra memory is needed for the pointers in a linked list.

    Arrays use cache effectively (adjacent data) and have random access O(1) whereas linked lists do not.
    O(n)

    Typical Java instantiation for a list of objects using the JavaUtils library

    private LinkedList<Customer> CustomerList = new LinkedList<Customer>

    Typical method to add a new node to the end

    public void add(Customer C) {

    CustomerList.addLast(P); }

    Please note that in Option D questions you will probably use the JavaUtils methods which does all the hard work like adding nodes at a particular place for you. However in the Paper 1 examination they are more likely to get you to write Pseudocode for manipulating the pointers like this:

    remove

    Method

    1: Do a linear search to find 5 starting from the root unti we get to NULL

    2. If found the then change the next pointer in previous node to the next node of the 5.

    3. If not found then return Null

    removed node

    code remove

    You need a special case for deleting the first element in a linked list.

    Test to see if the elementt to be removed is the head.

    If so:

    The heads next point becomes the head pointer

    Instead of a diagram a linked list can be represented in a table.

    Node No order No next
    0 154 1
    1 157 2
    2 155 3
    3 156 NULL

    If we needed to add 159 between node 0 and node 1 we keep the Node number order but obviously change the next pointers

    Node No order No next
    0 154 4
    1 157 2
    2 155 3
    3 156 Null
    4 159 1

    You may find it helps to draw out the table to work it out

    Common JavaUtils methods to use Linked List

    Doubly Linked Lists

    Circular Linked Lists

    Queues

    A queue is a data structure in which elements are removed in the same order as they were entered. (FIFO). A queue requires 2 pointers. One for the front and one for the back.

    Methods

    Enqueue : puts an item at end of the q

    Dequeue: removes and returns first item

    isEmty: Returns true if the q is empty.

    Where are they used?

    To create any schedule based on a first come first served basis. Printer spooling , a process and also routers queue up packets of data prior to being sent.

    Queues can be circular

    Below we have some common real-world examples where circular queues are used:

    1. Computer controlled Traffic Signal System uses circular queue.
    2. CPU scheduling and Memory management.

    Theory slides Stacks and Queues

    Stacks

    A stack is a static data structure where the last item added is the first off the list. (LIFO). Normally used to store temporary data say during an interrupt . Only one pointer is needed to maintain a stack which identifies the top of the stack.

    If you try to add (push) an item to an already full stack you will get a stack overflow error. If you try to remove (pop) an item from an empty list you will get a stack underflow error.

    stack

    Where are they used?

    I have already mentioned interrupts where pointers and contents of registers are pushed onto a stack and the popped when the interrupt is over. They are also ideal for applying redo actions by simply storing the last action and the popping it if needed. They are also used in recursion where variables and address registers of calls are stored and then popped when the base case is reached.

    You could use a stack to reverse the order of a queue. This is a popular question.

    In the exam use the correct terminology enqueue, pop etc.

    Managing a stack with an array

    Managing a stack with a linked list

    Comprehensive data structure multiple choice questions