Imagine that you are hired by company XYZ to organize all of their records into a computer database. The first thing you are asked to do is create a database of names with all the company’s management and employees. To start your work, you make a list of everyone in the company along with their position (Tab.4-4).
Tab.4-4 List of management and employees
But this list only shows one view of the company. You also want your database to represent the relationships between management and employees at XYZ. Although your list contains both name and position, it does not tell you which managers are responsible for which workers and so on. After thinking about the problem for a while, you decide that a tree diagram as shown in Fig.4-4 is a much better structure for showing the work relationships at XYZ.
Fig.4-4 Relationships between management and employees
These two diagrams are examples of different data structures . In one of the data structures, your data is organized into a list. This is very useful for keeping the names of the employees in alphabetical order so that we can locate the employee’s record very quickly. However, this structure is not very useful for showing the relationships between employees. A tree structure is much better suited for this purpose.
In computer science, data structures are an important way of organizing information in a computer. Just like the diagrams above illustrate, there are many different data structures that programmers use to organize data in computers. Some data structures are similar to the tree diagram because they are good for representing relationships between data. Other structures are good for ordering data in a particular way like the list of employees. Each data structure has unique properties that make it well suited to give a certain view of the data.
During these lessons, you will learn how data structures are created inside a computer. You will find there is quite a difference between your mental picture of a data structure and the actual way a computer stores a data structure in memory. You will also discover that there are many different ways of creating the same data structure in a computer. These various approaches are tradeoffs that programmers must consider when writing software. Finally, you will see that each data structure has certain operations that naturally fit with the data structure. Often these operations are bundled with the data structure and together they are called a data type. By the end of this study, you should be able to do the following.
(1) Show how data structures are represented in the computer.
(2) Identify linear and nonlinear data structures.
(3) Manipulate data structures with basic operations.
(4) Compare different implementations of the same data structure.
The most common linear data structure is the list . Now we are going to look at a particular kind of list: an ordered list. Ordered lists are very similar to the alphabetical list of employee names for the XYZ company. These lists keep items in a specific order such as alphabetical or numerical order. Whenever an item is added to the list, it is placed in the correct sorted position so that the entire list is always sorted.
1) Ordered List
Before we consider how to implement such a list, we need to consider the abstract view of an ordered list. Since the idea of an abstract view of a list may be a little confusing, let’s think about a more familiar example. Consider the abstract view of a television. Regardless of who makes a television, we all expect certain basic things like the ability to change channels and adjust the volume. As long as these operations are available and the TV displays the shows we want to view, we really don’t care about who made the TV or how they chose to construct it. The circuitry inside the TV set may be very different from one brand to the next, but the functionality remains the same. Similarly, when we consider the abstract view of an ordered list, we don’t worry about the details of implementation. We are only concerned with what the list does, not how it does it.
Suppose we want a list that can hold the following group of sorted numbers: [2 4 6 7]. What are some things that we might want to do with our list? Well, since our list is in order, we will need some way of adding numbers to the list in the proper place, and we will need some way of deleting numbers we don’t want from the list. To represent these operations, we will use the following notation:
AddListItem(List, Item)
RemoveListItem(List, Item)
Each operation has a name and a list of parameters the operation needs. The parameter list for the AddListItem operation includes a list (the list we want to add to) and an item (the item we want to add). The RemoveListItem operation is very similar except this time we specify the item we want to remove. These operations are part of the abstract view of an ordered list. They are what we expect from any ordered list regardless of how it is implemented in the computer.
2) Stack
Another common linear data structure is the stack . With the stack, we have restricted the access to one end of the list by using the pop and push operations. The result of this restriction is that items in the list pile one on top of the other. To get to the bottom item, we must first remove all the items above it. This behavior is sometimes described as “last-in, first-out” or LIFO since the last item to enter the stack is the first item to leave the stack. With the stack, the top item is always the last item to enter the stack and it is always the first item to leave the stack since no other items can be removed until the top item is removed.
We will represent these two operations that can be performed on a stack with the following notation:
Item PushStackItem(Stack, Item)
Item PopStackItem(Stack)
The PushStackItem operation has two parameters which are a stack and an item. This operation adds the item to the top of the specified stack. The PopStackItem operation only takes one parameter which is a stack. However, notice that this operation has the keyword item listed to the left. This keyword represents the item that is removed from the top of the stack when the PopStackItem operation is done. These two operations are part of the abstract view of a stack. They are what we expect from any stack regardless of how it is implemented in the computer.
3) Queue
Like the stack, the queue is a type of restricted list. However, instead of restricting all the operations to one end of the list as a stack does, the queue allows items to be added at one end of the list and removed at the other end.
The restrictions placed on a queue cause this structure to be a “first-in, first-out” or FIFO structure. This idea is similar to customer lines at a grocery store. When customer X is ready to check out, he or she enters the tail of the waiting line. After the preceding customers have paid, then customer X pays and exits the head of the line. The check-out line is really a queue that enforces a “first come, first serve” policy.
We will represent these two operations that can be performed on a queue with the following notation:
Item EnqueueItem(Queue, Item)
Item DequeueItem(Queue)
These two operations are very similar to the operations we learned for the stack data structure. Although the names are different, the logic of the parameters is the same. The EnqueueItem operation takes the Item parameter and adds it to the tail of Queue. The DequeueItem operation removes the head item of Queue and returns this as Item. Notice that we represent the returned item with a keyword located to the left of the operation name. These two operations are part of the abstract view of a queue. Regardless of how we choose to implement our queue on the computer, the queue must support these two operations.
Tree is a common nonlinear data structure . We have already seen an example of a tree when we looked at the employee hierarchy from the XYZ company. Let’s take another look at this diagram with some of the important features of trees highlighted, see Fig.4-5.
In this diagram, we can see that the starting point, or the root node , is circled in blue. A node is a simple structure that holds data and links to other nodes. In this case, our root node contains the data string “John” and three links to other nodes. Notice that the group of nodes circled in red does not have any links. These nodes are at the ends of the branches and they are appropriately called leaves or leaf nodes . In our diagram, the nodes are connected with solid black lines called arcs or edges. These edges show the relationships between nodes in the tree. One important relationship is the parent/child relationship. Parent nodes have at least one edge to a node lower in the tree. This node is called the child node . Nodes can have more than one child, but children can only have a single parent. Notice that the root node has no parent, and the leaf nodes have no children. The final feature to note in our diagram is the subtree . At each level of the tree, we can see that the tree structure is repeated. For example, the two nodes representing “Charles” and “Rick” compose a very simple tree with “Charles” as the root node and “Rick” as a single leaf node.
Fig.4-5 Tree diagram
Now let’s examine one way that trees are implemented in the computer’s memory. We will begin by introducing a simple tree structure called a binary tree . Binary trees have the restriction that nodes can have no more than two children. With this restriction, we can easily determine how to represent a single binary node in memory, as shown in Fig.4-6. Our node will need to reserve memory for data and two pointers .
Fig.4-6 Structure of binary node
Using our binary node, we can construct a binary tree. In the data cell of each node, we will store a letter. The physical representation of our tree might look something like Fig.4-7.
Fig.4-7 Physical representation of tree
The last data structure that we will study in this section is the graph . Graphs are similar to trees except they do not have as many restrictions. In the previous lesson, we saw that every tree has a root node, and all the other nodes in the tree are children of this node. We also saw that nodes can have many children but only one parent. When we relax these restrictions, we get the graph data structure. The logical representation of a typical graph might look something like Fig.4-8.
Fig.4-8 A graph
Notice that our graph does not have a root node like the tree data structure did. Instead, any node can be connected with any other node. Nodes do not have a clear parent/child relationship like we saw in the tree. Instead nodes are called neighbors if they are connected by an edge. For example, node A above has three neighbors: B, C, and D.
(4-2) It is not hard to imagine how the graph data structure could be useful for representing data. Perhaps each of the nodes above could represent a city and the edges connecting the nodes could represent roads. Or we could use a graph to represent a computer network where the nodes are workstations and the edges are network connections. Graphs have so many applications in computer science and mathematics that several algorithms have been written to perform standard graph operations such as searching the graph and finding the shortest path between nodes of a graph.
Now that you have a basic idea of the logical representation of graphs, let’s take a look at one way that graphs are commonly represented in computers. The representation is called an adjacency matrix , and it uses a two-dimensional array to store information about the graph nodes. The adjacency matrix for our graph (in Fig.4-8) is given in Fig.4-9.
Fig.4-9 Adjacency matrix for a graph
Notice that the matrix has six rows and six columns labeled with the nodes from the graph. We mark a “1” in a cell if there exists an edge from the two nodes that index that cell. For example, since we have a edge between A and B, we mark a “1” in the cells indexed by A and B. These cells are marked with a dark gray background in the adjacency matrix. With adjacency matrix, we can represent every possible edge that a graph can have.