Linked lists are used within QDOSMSQ to hold details of the directory devices installed on the system, interrupt routines and so on, but what are they exactly?
Imagine that you are writing a program, and you decide that you need some storage for some data, let's say a list of people's names and addresses. So, how about an array? Well, the problem with that is how many entries are you going to allow? If you don't allow enough entries, you won't have much of an address book. If you have too many entries then you are wasting space. If you sell the program, or give it away, then you need to consider the needs of people other than yourself - some will need a few entries and others, much more. How do you cope?
Well, a linked list could be the answer. You start off with no storage defined at all, except for a single, maybe two, variables which hold the address in memory of the beginning (and maybe the end) of your list of addresses. As you add new contacts to your address book, each one is created at some 'random' location in memory and linked into your existing list of contacts. Hence, you have a linked list.
In a linked list, each entry is called a node, and the pointer to the very first entry in the list is known as the root node.
In memory an array, of 10 entries of 100 byte long strings, is consecutive. Don't forget the strings have a word at the start defining their length, so each entry is actually 102 bytes long. If the first entry is located at address 1000 then the next entry is at address 1102, the next at 1204 and so on. There are no gaps between entries and you can quickly calculate the start address of any particular entry as 1000 + (INDEX * 104) where INDEX is the entry you are looking for, starting at zero.
In a linked list, the nodes are all over the place, the first might be at address 1000, the second at 2000, the third at 1200 and so on. There is no logical order to the locations and you cannot calculate the address of a particular node using any formula as you can with arrays.
What you can do, however, is store away the address of the first node in a special node known as the root node, and from that, you can navigate along the list from start to finish by finding the address of the next node from the data stored in the individual nodes. Our 100 byte long strings would be 106 bytes allowing 4 bytes to store the memory address of the 'next' entry in the list and the obligitory 2 byte legth word. However, think about that 102 bytes in each entry of the array - you might not need all 102 bytes. In our linked list, each node will have 4 bytes for the pointer and only as much space as is required by the data, so each node need not be 102 + 4 bytes long. Another saving over the array.
A linked list can be thought of like an old program on UK TV, Treasure Hunt, where Aneka Rice used to zoom around the country in a helicopter picking up clues in one location which told her where to go for the next clue and so on, until she found the 'treasure' at the end of the list of clues. This is exacly what a linked list is.
If we have a node in our list defined as follows, then we can see how it looks in memory below. Each node in the list will look like this :
+------+ 4 bytes at the start of each node hold the address of the NEXT node in the | NEXT | list. +------+ | DATA | The data bytes can be any size and hold any data we like. It can even hold a +------+ linked list itself. The possibilities are endless.
The root node, as mentioned above, is special. It has no data part, only the pointer part.
The conceptual layout in memory is a bit like this (using the addresses mentioned above and assuming the root node lives at address $ABCD) :
+-------+ +------+ +------+ +------+ | FIRST |--------> | NEXT |--------> | NEXT |--------> | NEXT | +-------+ +------+ +------+ +------+ ABCD | DATA | | DATA | | DATA | +------+ +------+ +------+ 1000 2000 1200
In physical terms, there are no handy arrows. Using real values as described above in the pointer locations, it would look like this :
+-------+ +------+ +------+ +------+ | 1000 | | 2000 | | 1200 | | 0000 | +-------+ +------+ +------+ +------+ ABCD | DATA | | DATA | | DATA | +------+ +------+ +------+ 1000 2000 1200
You can see the address of the following node in each node's first 4 bytes above. The address of FIRST is actually somewhere in your program and your program only needs to allocate storage for the 4 bytes it takes to hold the address of the initial node in the list. FIRST is, of course, the root node of the list.
You must store the value zero in there before you go off adding nodes, you'll see this reason why below in the code to add a node.
Adding a new node is simple, you allocate it on the heap, fill in the data part and add it to the front of the list. It is far easier to add a node at the start - address 1000 in the above example - than to have to work through the entire list to find the current end, and then add it there. This method takes longer and longer to carry out as you add extra entries to the list. Adding at the start of the list takes the same time regardless of how many entries are in the list.
As you add each node to the list, you copy the value in FIRST into the new node's NEXT pointer and put the address of the new node into FIRST. Sounds complicated, but here it is in code. If we assume that A0.L has the address of FIRST and that A1.L has the address of the node to be inserted into the list, as contrivingly demonstrated by these two lines :
Prelude lea FIRST,a0 ; Pointer to storage of first node address lea NewEntry,a1 ; Address of new node
Then adding a new node to a list is as simple as this :
AddNode move.l (a0),(a1) ; Save current first node in new node's NEXT area move.l a1,(a0) ; Store address of new node in FIRST storage area rts
Nothing to it. The new node is always added at the start of the list, so the value in FIRST always points to the most recently added node. As you need to have zero in the NEXT pointer of the final node in the list, you can see why it was important to initialise the value in your programs FIRST variable to zero before adding any nodes. If you didn't have zero, you'd never know when the list was finished.
One thing, you don't want to allow the user to add the root node to its own list at any time, so best change the above code to prevent this from happening.
AddNode cmpa.l a0,a1 ; Don't allow the root node to be added again beq.s AddExit ; Bale out quietly if attempted move.l (a0),(a1) ; Save current first node in new node's NEXT area move.l a1,(a0) ; Store address of new node in FIRST storage area AddExit rts
Another problem is when you try to add a node that is already there. So to be really careful, you could call the FindNode routine (coming soon - have patience !) prior to adding it in. However, as this scans the entire list until it finds or doesn't find the new node, it could add quite a lot of time to the simple exercise of adding a new node.
If you wrote the program, and you are allocating nodes on the heap each time, then don't bother attempting to find the node in the list before you add it.
Deleting a node is slightly more difficult. The node to be deleted could be anywhere in the list, or not even in the list. How to find the correct node is the main problem. However, for the same of argument, assume that we have the node address to be removed in A1.L and the address of FIRST in A0.L after a successful 'find' operation, then removing the node at A1.L requires that we must navigate the list, as in the following explanation.
We must navigate the list because we don't know where in memory the node prior to the one we wish to delete is. We need to find it, because it has a NEXT pointer holding the address of the 'deleted' node and this has to be changed or we lose everything in the list after the deleted node.
As ever, the value in the NEXT area of the very last node in the list is always zero. That way, we know when we have hit the end of the list. Here's the pseudo code to delete the node at A1.L from the list beginning at (A0.L)
If the node to be deleted is the root node (the list pointer in A0) then don't allow it to be deleted.
Start of the main loop.
If the value stored in the address that A0.L points to is equal to zero, we have been passed an incorect node address to delete. Exit from the loop with an error.
If the value stored in the address that A0.L points to is not the same as the value in A1.L then copy the value in the address that A0.L points to into A0 and restart the main loop. Basically we have replaced the address in A0 with the NEXT address from the node we were just looking at.
If the value stored in the address that A0.L points to is equal to the value in A1.L then we have found the node PRIOR to the node we wish to delete and so the node we are looking at has to have the NEXT address updated to bypass the node we wish to delete so that it now points to the NEXT address which is currently stored in the node we are deleting. Exit from the loop with no errors.
End of main loop.
That's the pseudo code, here's the actual code. Using the same preliminary stuff as above to sort out initial values of A0.L and A1.L and a little bit extra to show whether errors have been detected or not, we begin with this :
Prelude lea FIRST,a0 ; Pointer to storage of first node address lea OldNode,a1 ; Address of node to delete moveq #ERR_EF,d0 ; End of file = node not found = error
Now, here's the actual code to find and remove the requested node.
DelNode cmpa.l a0,a1 ; Don't allow the root node to be deleted beq.s DelExit ; Bale out with error if attempted DelLoop cmp.l #0,(a0) ; Reached the end yet ? beq.s DelExit ; Yes, node not found, exit with error cmp.l (a0),a1 ; Found the PRIOR node yet ? bne.s DelNext ; No, skip over the deletion code & try again DelFound move.l (a1),(a0) ; PRIOR node's NEXT = the deleted node's NEXT address moveq #0,d0 ; Node found and deleted ok bra.s DelExit ; Bale out with no errors DelNext move.l (a0),a0 ; A0 now holds the NEXT node in the list bra.s DelLoop ; Go around again DelExit tst.l d0 ; Set zero flag for success, unset for error rts
The above code returns with the Z flag set if the node was deleted from the list, and unset if the node was not in the list. This allows the calling code to handle and errors correctly.
The first thing you must do when deleting a node is to actually find it. The code above assumes that A1 holds a valid node address in the list defined by A0. Having said that, the code is robust enough to know that programmers make errors and it can handle the problem of a node address being passed which is not in the list by virtue of the fact that it scans the list until it finds the node prior to the one we wish to delete. It has to work that way because we need to adjust the NEXT pointer in the prior node to point past the deleted node to its NEXT node - if you catch my drift?
The code to find an node in a list is dependant on the sort of data stored in each node. If you store strings, the some form of string comparison routine needs to be built in - does it compare on an equality basis ('AAA' = 'AAA') or nearly equal basis ('AAA' == ' aaa') and so on. You can use the built in QDOSMSQ routines to do the comparisons.
If the data in the nodes are numbers (integers of word or long length) then you can compare them directly. If they are QDOSMSQ floating point format numbers, you can use the built in arithmetic routines to compare them. Regardless of which method is used, you need to write your own code to compare two nodes, or a node and a value so that the find routine knows when it has found the correct entry.
Of course, it is quite simple to build a FindNode routine which doesn't know or care what sort of data the individual nodes contain, provided it is passed the address of a routine which does know and care. If the specification for said routine requires the Z flag to be set for found and unset for not found, it could look something like the following peseudo code.
Assume that A0.L holds the address of FIRST, A1.L holds a pointer to a routine which compares the node with a given value and A2.L holds a pointer to that value. The data that A2.L points to can be anything, the routine at (A1.L) does the working out, our FindNode simply calls the routine once for each node in the list until such time as it gets a set Z flag on return. The comparison routine gets passed a node address in A3.L.
Start of the main loop.
If the value stored in the address that A0.L points to is equal to zero, we have not found a node with the desired value. Exit the main loop with a NOT FOUND error.
Copy the address at (A0.L) into A3 and call the routine to compare data. If it returns with the Z flag set, the address in (A0.L) is the address of the node prior to the node we were looking for, however, the address in A3.L is the address of our required node as it is taken from the NEXT pointer. Remember, we passed the NEXT address (A0.L) over to the routine, not the address of THIS node - A0.L. Exit from the loop with the Z flag set to indicate a found node.
Copy the NEXT address from the node we are looking at into A0.L and go back to the start of the loop.
End of main loop.
And here's the real code to do the finding for use. As ever, we start off with some contrived values.
Prelude lea FIRST,a0 ; Pointer to storage of first node address lea Compare,a1 ; Address of node comparison routine lea Required,a2 ; Address of the data we are looking for moveq #ERR_NF,d0 ; Node not found = error
Now, here's the actual code to find a node in the list which holds the required value.
FindNode cmp.l #0,(a0) ; Reached the end yet ? beq.s DelExit ; Yes, node not found, exit with error move.l (a0),a3 ; Fetch the NEXT node address into A3.L jsr (a1) ; And jump into the comparison routine beq.s FindFound ; Looks like we found our node FindNext move.l (a0),a0 ; A0 now holds the NEXT node in the list bra.s FindNode ; Go around again FindFound moveq #0,d0 ; Clear the error flag FindExit tst.l d0 ; Set zero flag for success, unset for error rts
The following is an example of a compare routine to look at a long word of data in the node at (A3.L) and see if it is equal to the long word of data stored at (A2.L). Don't forget, the comparison routine must preserve A0, A1, A2 and D0 or it will all go horribly wrong. The following routine does exactly that, by the simple method of not actually using those registers at all!
NData equ 4 ; Offset form start of node to the data part Compare cmp.l NData(A3),(A2) ; Is the data in the node = the value we want? rts ; Exit with Z set if so, unset otherwise.
If an attempt is made to 'find' the root node, then it will fail.
So there you have three short but extremely powerful routines which make linked lists possible. At this point I have to mention that there are actually routines built into QDOSMSQ to do exactly the same work as the AddNode and DelNode routines above, but there is nothing like FindNode - which is a shame. However, you now know how to build linked lists and add and delete nodes. You also know how to find an entry in a linked list so that you can process it in some way.
Putting all of the above together and tying in some extras to allocate nodes etc, here is a small, but perfectly formed program to create a linked list. The following is a wrapper that we shall use to demonstrate first the single linked lists as explained above. Later on, when other types of linked list are explained, we shall drop in only the code we need for the demo
* ======================================================================= * A test harness 'job' for our linked lists code. What's the point of all * the explanations if you can't test the code ? * * This code is simply a wrapper to allow different demos to be 'slotted' * in to demonstrate the real code in the chapter, as opposed to the job * code. * * The code being demonstrated is located at DEMO below. As new demos are * required, only that bit should (!) need changing. * ======================================================================= * ----------------------------------------------------------------------- * These are offsets from the start of the job's dataspace where working * variable are stored. The dataspace is held at (A4) in the job's code. * ----------------------------------------------------------------------- con_id equ 0 ; Id for title channel con_id2 equ 4 ; Id for main output * ----------------------------------------------------------------------- * These are simply user friendly names instead of numbers for various * bits and bobs, colours etc. * ----------------------------------------------------------------------- black equ 0 ; Colour code for mode 4 black red equ 2 ; Red green equ 4 ; Green white equ 7 ; White linefeed equ 10 ; Linefeed character oops equ -1 ; General error code for sub-routines err_nc equ -1 ; NOT COMPLETE error code * ----------------------------------------------------------------------- * Constants for use with job control commands. (It doesn't matter if I * have two names with the same value ! ) * ----------------------------------------------------------------------- infinite equ -1 ; Infinite timeout me equ -1 ; Id for 'this' job * ----------------------------------------------------------------------- * Code starts here. * ----------------------------------------------------------------------- start bra.s LinkList ; 2 bytes short jump dc.l 0 ; 4 bytes padding dc.w $4afb ; Job identifier must be at location 6 dc.w 11 ; Bytes in job's name dc.b 'LinkedLists',0 ; Bytes of job's name plus padding LinkList adda.l a6,a4 ; A4.L = start of dataspace bsr Mode4 ; Set the screen mode bsr Title ; Open the title window bsr Output ; Open the output window bsr Headings ; Display headings bsr Demo ; Do the demo code bsr Finished ; Advise user that we are done * ----------------------------------------------------------------------- * Code ends here. * ----------------------------------------------------------------------- all_done moveq #mt_frjob,d0 ; Force Remove a job moveq #me,d1 ; The current job move.l d0,d3 ; Any error code to send to Superbasic trap #1 ; Kill this job, its channels and its memory bra.s all_done ; Should never get here - sanity check !
The following code will be replaced by either the singly linked list or the doubly linked list demo code which follows at the end of this chapter. For now, however, it is a place holder.
* ----------------------------------------------------------------------- * The DEMO code starts here. * ----------------------------------------------------------------------- Demo rts * ----------------------------------------------------------------------- * The DEMO code ends here. * -----------------------------------------------------------------------
The following code is common to both demos.
* ----------------------------------------------------------------------- * Set mode 4 if not already set. Do not change from TV to monitor or * vice versa. We must preserve the display type if we reset the mode. * ----------------------------------------------------------------------- Mode4 moveq #mt_dmode,d0 moveq #-1,d1 ; Read current mode moveq #-1,d2 ; Read current display type trap #1 ; Do it tst.l d0 ; Did it work ? bne all_done ; No, bale out, cannot continue tst.b d1 ; 0 in D1.B = Mode 4 beq.s ModeExit ; No need to set mode 4 moveq #mt_dmode,d0 clr.l d1 ; We need mode 4 trap #1 ; Set mode 4 (d2 = display type) tst.l d0 ; Check it bne all_done ; Bale out if errors detected ModeExit rts ; Done. * ----------------------------------------------------------------------- * Mode 4 is current mode. Open the title window at the top of the screen. * ----------------------------------------------------------------------- Title lea con_def,a1 ; Window definition movea.w ut_con,a2 ; Utility to define a window etc jsr (a2) ; Do it tst.l d0 ; Did it work ok ? bne all_done ; No, exit program move.l a0,con_id(a4) ; Store console id for title channel rts ; Done *------------------------------------------------------------------------ * Definition for title window channel *------------------------------------------------------------------------ con_def dc.b red ; Border colour dc.b 1 ; Border width dc.b white ; Paper/strip colour dc.b black ; Ink colour dc.w 448 ; Width dc.w 24 ; Height dc.w 32 ; Start position x dc.w 16 ; Start position y * ----------------------------------------------------------------------- * Open the output window underneath the title one. * ----------------------------------------------------------------------- Output lea con_def2,a1 ; Output window definition movea.w ut_con,a2 ; Utility again jsr (a2) ; Do it tst.l d0 ; Did it work ? bne all_done ; No, exit routine move.l a0,con_id2(a4) ; Store console id for output channel moveq #0,d0 ; No errors detected rts *------------------------------------------------------------------------ * Definition for output window channel *------------------------------------------------------------------------ con_def2 dc.b red ; Border colour dc.b 1 ; Border width dc.b white ; Paper/strip colour dc.b black ; Ink colour dc.w 448 ; Width dc.w 200 ; Height dc.w 32 ; X org dc.w 40 ; Y org * ----------------------------------------------------------------------- * Print the headings * ----------------------------------------------------------------------- headings movea.l con_id(a4),a0 ; Title channel id bsr.s cls ; Clear screen lea mes_title,a1 ; Title string bsr.s prompt ; Print title string rts mes_title dc.w mes_end-mes_title-2 dc.b 'Single Linked Lists' mes_end equ * * ----------------------------------------------------------------------- * Sign off message * ----------------------------------------------------------------------- Finished movea.l con_id2(a4),a0 ; Title channel id lea end_title,a1 ; Title string bsr.s prompt ; Print title string bsr.s input ; Wait for ENTER rts end_title dc.w end_end-end_title-2 dc.b linefeed,linefeed,'Press ENTER to quit : ' end_end equ * * ======================================================================= * CLS : * ======================================================================= * 1. Clear the (screen) channel whose id is in A0. * ======================================================================= cls moveq #sd_clear,d0 ; CLS moveq #infinite,d3 ; Infinite timeout trap #3 ; CLS title window rts * ======================================================================= * Prompt : * ======================================================================= * 1. Print the string at (A1) to the channel in A0. * * Z set if all ok, unset if not. * ======================================================================= prompt movea.w ut_mtext,a2 ; Print a string utility jsr (a2) ; Print it tst.l d0 ; Check for errors rts * ======================================================================= * Input : * ======================================================================= * Wait for user input from the channel id in A0. * * Returns the input length (not counting the ENTER character) in D1.W * Returns the address of the first character in the buffer in A1.L * Preserves the channel id in A0.L * Z set if all ok, unset if not. * ======================================================================= input lea buffer+2,a1 ; Our buffer address plus 2 move.l a1,-(a7) ; Save it on the stack moveq #io_fline,d0 ; Input some bytes (inc linefeed) moveq #60,d2 ; Buffer size maximum moveq #infinite,d3 ; Inifinite timeout trap #3 move.l (a7)+,a1 ; Restore buffer pointer subq.w #1,d1 ; Subtract the linefeed character move.w d1,-2(a1) ; Store length in buffer tst.l d0 ; Did it all work ? rts buffer ds.w 31 ; 60 chars for input plus 1 word for size. * ======================================================================= * hex_l : * ======================================================================= * Convert a 4 byte value in D4.L to Hex in a buffer. Use the input * buffer for the output and DOES NOT store the length word ! * * Expects D4.L to hold the value. * ======================================================================= hex_l swap d4 ; $ABCD -> $CDAB in D4 bsr.s hex_w ; Convert the $AB part first swap d4 ; $CDAB -> $ABCD again * drop into hex_w to convert the $CD part * ======================================================================= * hex_w : * ======================================================================= * Convert a 2 byte value in D4.W to Hex in a buffer. * * Expects D4.W to hold the value. * Expects A1.L to point at the buffer. * ======================================================================= hex_w ror.w #8,d4 ; $DE -> $ED in D4 bsr.s hex_b ; Convert the $D part first rol.w #8,d4 ; $ED -> $DE again * drop into hex_b to convert the $E part * ======================================================================= * hex_b : * ======================================================================= * Convert a 1 byte value in D4.B to Hex in a buffer. * * Expects D4.B to hold the value. * Expects A1.L to point at the buffer. * ======================================================================= hex_b ror.b #4,d4 ; Swap lower and higer nibbles bsr.s hex_nibble ; Print high nibble first rol.b #4,d4 ; Swap back again * drop into hex_nibble to print the lower nibble * ======================================================================= * hex_nibble : * ======================================================================= * Convert a 4 bit value in D4.B to Hex in a buffer. * * Expects D4.B to hold the value. * Expects A1.L to point at the buffer. * ======================================================================= hex_nibble move.b d4,-(a7) ; Save value in both nibbles andi.b #$0f,d4 ; D4.B now = 0 to 15 addi.b #'0',d4 ; Now = '0' to '?' (ascii only) cmpi.b #'9',d4 ; Is this a digit ? bls.s nib_digit ; Yes addi.b #7,d4 ; Add offset to UPPERCASE letters nib_digit move.b d4,(a1)+ ; Store in buffer move.b (a7)+,d4 ; Restore original value rts * ======================================================================= * print_hex : * ======================================================================= * Convert D4 into 8 hex characters, then print it to the channel in A0.L * * Expects D4.L to hold the value. * Expects A0.L to hold the channel id. * ======================================================================= print_hex lea buffer,a1 ; Output buffer for address move.w #8,(a1)+ ; We know the result is 8 bytes bsr hex_l ; Convert 4 bytes to text lea buffer,a1 ; Text to print bsr prompt ; Print it rts ; All done. (Error code in D0) * ======================================================================= * End of test harness * =======================================================================
The above code does absolutely nothing, but if you assemble it and exec the resulting file, you should see a pair of windows one with a message 'Single Linked Lists' and a prompt in the other to 'Press ENTER to quit'. Once you press the ENTER key, the job will finish. So far so good.
The reason that it does nothing is shown below :
* ======================================================================= * The DEMO code starts here. * ======================================================================= Demo rts * ======================================================================= * The DEMO code ends here. * =======================================================================
The code at Demo, does nothing but return to the caller. Our linked list code will be slotted in to replace the lines of code shown above.
To demonstrate linked lists, we need only add some code to replace the lines above. In the following two chapters we do just that and code to demonstrate single and doubly linked lists follows in the next two chapters.
The above description, and code, is for a Single Linked List, so called because there is a single link in each node which points to the next entry in the list. This is simple to code up - as we have seen - and is fairly simple to understand, at least it is if I've done my job correctly.
The problem with a linked list created in the above fashion is that you always have to scan the list from start to some undetermined entry when you want to delete a node. And this can add serious delays to the processing of your application when a lot of nodes have to be traversed each time you need to delete one.
There is an answer, Doubly Linked Lists.