2. More Branches.

At the end of part 2, I left you with a promise that the DBcc instructions would be explained in this part, but just before we do that, there is the BSR instruction. This means 'Branch to Sub-Routine' and acts very much like GOSUB in SuperBasic (an instruction I have never used in SuperBasic, but use in almost every program in assembler - strange that.)

BSR comes in 2 sizes - byte and word. The format is :

BSR.S    label

or

BSR      label

Label is the address of the subroutine to be executed. BSR is a PC relative instruction: the destination is stored as an offset from the program counter, although it does not really look like it in the source.

The size of the instruction, byte or word, defines the size of the displacement stored in it. When the BSR executes, the PC holds the address of the BSR instruction plus 2 (for BSR.S this is simply the address of the following instruction). The displacement is added to this PC value and the next instruction executed is the one at that address (PC + displacement). As the displacement is signed, the byte sized BSR can 'gosub' -128 to +127 bytes from the PC while the word sized BSR can 'gosub' -32,768 to +32,767 bytes from the PC.
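The displacement arithmetic can be sketched in Python. This is a model only, not 68000 code, and the helper names `sign_extend` and `bsr_target` are hypothetical. The raw displacement bits are read as a signed (two's complement) number and added to the address of the BSR plus 2.

```python
def sign_extend(raw, bits):
    """Read the low `bits` bits of raw as a two's complement number."""
    raw &= (1 << bits) - 1
    if raw & (1 << (bits - 1)):   # top bit set: the value is negative
        raw -= 1 << bits
    return raw


def bsr_target(bsr_addr, raw_disp, bits):
    """Destination of a BSR at bsr_addr (bits=8 for BSR.S, 16 for BSR).

    The sign-extended displacement is added to bsr_addr + 2.
    """
    return bsr_addr + 2 + sign_extend(raw_disp, bits)


print(bsr_target(40000, 0x0C, 8))     # 40014: forward 12 bytes
print(bsr_target(40000, 0xF0, 8))     # 39986: $F0 is -16, so backwards
print(bsr_target(40000, 0x8000, 16))  # 7234: most negative word displacement
```

The same sign extension gives the ranges quoted above: $7F is +127 and $80 is -128 for a byte, while $7FFF is +32,767 and $8000 is -32,768 for a word.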

At this point, a small example will maybe make things a bit clearer. Consider this chunk of (useless) code. It serves no useful purpose apart from showing the use of BSR (and a few of the other instructions we have already discussed).

Read through the following code and at the end I shall explain what it is doing. The only instruction not yet explained is RTS (ReTurn from Subroutine) which, for now, you can think of as 'Return To Sender' - similar to RETURN or END DEF (sort of) in SuperBasic.

Example 3.1. BSR Example

Start     MOVEQ     #0,D1

Again     BSR.S     Addon
          CMPI.L    #10,D1
          BNE.S     Again
          MOVEQ     #0,D0
          RTS

Addon     ADDQ.L    #1,D1
          RTS

The code starts by setting D1 to zero in all 32 bits - it is a long sized move. The label 'Start' simply identifies the start of the code fragment and need not be called start - it could be called fred. It acts like a line number in SuperBasic.

The second line of code calls a sub-routine called 'Addon' which lives only a few bytes further on - for this reason the byte sized variant of BSR is used and this makes the program smaller and slightly quicker - as explained later. Had the distance to the sub-routine been more than 127 bytes (or less than -128) then the assembler would have complained and the source would have had to be amended to remove the '.S' from the instruction.
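How an assembler might decide whether the '.S' form can be used is sketched below in Python, assuming Example 3.1 starts at address 0. The instruction sizes are the standard 68000 encodings (MOVEQ, BSR.S, BNE.S, ADDQ and RTS are 2 bytes each; CMPI.L with a 32-bit immediate is 6 bytes); the helper names are hypothetical and real assemblers such as GWASL have their own logic. One 68000 detail the range alone hides: a byte displacement of 0 cannot be used, because that value in the opcode tells the processor that a 16-bit displacement follows.

```python
# Sizes in bytes of the instructions in Example 3.1 (standard 68000 encodings).
LAYOUT = [
    ("Start", "MOVEQ #0,D1", 2),
    ("Again", "BSR.S Addon", 2),
    (None, "CMPI.L #10,D1", 6),  # opcode word + 32-bit immediate
    (None, "BNE.S Again", 2),
    (None, "MOVEQ #0,D0", 2),
    (None, "RTS", 2),
    ("Addon", "ADDQ.L #1,D1", 2),
    (None, "RTS", 2),
]


def assign_addresses(layout, origin=0):
    """Return ({label: address}, [address of each instruction])."""
    labels, addresses, addr = {}, [], origin
    for label, _text, size in layout:
        if label:
            labels[label] = addr
        addresses.append(addr)
        addr += size
    return labels, addresses


def short_branch_ok(branch_addr, target_addr):
    """True if a .S branch at branch_addr can reach target_addr."""
    disp = target_addr - (branch_addr + 2)
    return -128 <= disp <= 127 and disp != 0


labels, addresses = assign_addresses(LAYOUT)
bsr_disp = labels["Addon"] - (labels["Again"] + 2)
print(labels, bsr_disp, short_branch_ok(labels["Again"], labels["Addon"]))
```

Here 'Addon' lands 12 bytes beyond the PC of the BSR.S, comfortably within range, and the BNE.S back to 'Again' needs only -10.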

The second line also has a label - 'Again'. Labels are used in assembler programs to mark significant places in the code. In SuperBasic every line must have a number - in assembler only the lines referenced elsewhere in the code need a label, but there is no problem putting labels wherever they make the code more readable.

Following on, CMPI.L #10,D1 checks whether the value in D1.L is 10 (decimal). It subtracts 10 from D1.L without storing the result and sets the flags, so the Zero flag is set only if D1.L equals 10. Then BNE.S (Branch if Not Equal) tests the Zero flag. If the value in D1 is not 10, the Zero flag will not have been set and so the code will start executing from the label 'Again'. If D1.L does equal 10 then the branch to 'Again' is not taken.

The next line sets D0.L to zero. This is because any code that runs on a QL either as a result of a CALL address or EXECing a file returns any error codes to QDOS in D0.L and zero shows that no error has taken place. All this will be explained in a later article.

The RTS instruction ends a subroutine and means return to where you came from (almost). If the above code - beginning at 'Start' - was called from SuperBasic, the RTS would return us to SuperBasic. If it was called from some other part of the assembler program, it would return us to the next instruction in that program.

The subroutine called from the second line begins at the label 'Addon'. It is very simple and adds 1 to the value in D1.L before the RTS returns to the place where it was called from.

Put simply, the code above loops around adding 1 to D1.L until such time as D1.L equals 10. At this point the code returns to wherever it was called from.

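This is not quite precise. BSR pushes the address of the instruction that follows it onto the stack before jumping, and RTS pops that address back into the PC. So the RTS at the end of Addon returns to the instruction that follows the BSR, which means the above code goes on to execute the CMPI.L #10,D1 instruction after running the code in the Addon subroutine.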
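The BSR/RTS pairing can be modelled in Python with a list acting as the stack: BSR pushes the position of the following instruction and RTS pops it. This is a toy model of Example 3.1 only, under stated simplifications: list indices stand in for addresses (the real 68000 keeps its stack pointer in A7), and the only flag modelled is the Zero flag set by CMPI. The names `PROGRAM` and `run` are hypothetical.

```python
# Example 3.1 as (label, opcode, operand) tuples; indices stand in for addresses.
PROGRAM = [
    ("Start", "MOVEQ", ("D1", 0)),
    ("Again", "BSR", "Addon"),
    (None, "CMPI", ("D1", 10)),
    (None, "BNE", "Again"),
    (None, "MOVEQ", ("D0", 0)),
    (None, "RTS", None),
    ("Addon", "ADDQ", ("D1", 1)),
    (None, "RTS", None),
]


def run(program):
    labels = {lab: i for i, (lab, _, _) in enumerate(program) if lab}
    regs = {"D0": None, "D1": None}
    stack = ["caller"]  # return address left by whoever called Start
    zero = False        # only CMPI's effect on the Zero flag is modelled
    pc, calls = labels["Start"], 0
    while True:
        _, op, arg = program[pc]
        next_pc = pc + 1
        if op == "MOVEQ":
            regs[arg[0]] = arg[1]
        elif op == "ADDQ":
            regs[arg[0]] += arg[1]
        elif op == "CMPI":
            zero = regs[arg[0]] == arg[1]
        elif op == "BNE":
            if not zero:
                next_pc = labels[arg]
        elif op == "BSR":
            stack.append(next_pc)   # push the instruction after the BSR
            next_pc = labels[arg]
            calls += 1
        elif op == "RTS":
            next_pc = stack.pop()   # resume after the matching BSR
            if next_pc == "caller":
                return regs, calls
        pc = next_pc


regs, calls = run(PROGRAM)
print(regs, calls)  # {'D0': 0, 'D1': 10} 10
```

Each RTS inside Addon pops the index of the CMPI.L, which is why the comparison runs after every call; the final RTS pops the caller's entry and the run ends.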

Now that we have a few more instructions under our belts, there will be more bits of code appearing in the rest of the series. This allows the reader to alleviate the boredom of these articles and allows me to illustrate some examples of what I am trying to say !

For D0 = 10 to -1 step -1 ...

Looks a bit like SuperBasic that, but you can do the very same in assembler as well. The above code illustrating the BSR instruction can be rewritten to use the DBcc or 'Decrement and Branch' instructions. These are very similar to the Bcc instructions from part 2 of the series but they have an additional purpose. They allow a loop to be executed a set number of times and can also cause an exit from the loop if a certain condition occurs while executing the loop.

It might be better if these instructions were called DBUcc as in 'Decrement and Branch UNTIL condition' because that is actually what they do. The full set of DBcc instructions is :

DBCC - Carry clear.
DBCS - Carry set.
DBEQ - Equal (Zero flag set).
DBF (or DBRA) - False. The condition is never true, so only the counter can end the loop.
DBGE - Greater or equal (signed).
DBGT - Greater than (signed).
DBHI - Higher (unsigned).
DBLE - Less or equal (signed).
DBLS - Lower or same (unsigned).
DBLT - Less than (signed).
DBMI - Minus (Negative flag set).
DBNE - Not equal (Zero flag not set).
DBPL - Plus (Negative flag clear).
DBT - True. Very strange instruction, see below !
DBVC - Overflow clear.
DBVS - Overflow set.

The format of the instruction is :

DBcc      Dn,label

The counter is always a data register, D0 to D7, and only its lowest word is used and affected. The label is encoded as a signed 16 bit displacement which is added to the PC (the address of the DBcc instruction plus 2) to find the branch destination. The displacement is, as usual, signed, allowing branches of between -32,768 and +32,767 bytes.

This instruction does not affect the condition codes. They remain the same as they were before the instruction.

The operation of the instruction is in three parts :

First, the condition is tested to determine if the termination condition of the loop has been detected. This is the cc part. So a DBCS checks to see if carry is set. If the condition is detected, no branch will be performed and no decrement of the data register will be carried out either.

Second, if the condition is not detected, the lowest 16 bits of the data register are decremented by 1. If this results in a value of -1, then the loop is also terminated and no branch takes place.

Third, if the counter has not reached -1, the branch is taken to the label specified (PC relative).
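The three parts can be modelled in Python as a single step function. This is a sketch, not 68000 code: `dbcc` is a hypothetical name, `dn` is the full 32-bit register value (0 to $FFFFFFFF), and `condition_true` is the result of the cc test, which the caller works out from the flags.

```python
def dbcc(condition_true, dn):
    """One execution of DBcc Dn,label.

    Returns (new_dn, branch_taken). Only the low word is decremented;
    the upper word is left unchanged. No flags are modelled or changed.
    """
    # Part 1: termination condition met -> fall through, no decrement.
    if condition_true:
        return dn, False
    # Part 2: decrement the low 16 bits only (0 wraps round to $FFFF).
    low = (dn - 1) & 0xFFFF
    dn = (dn & 0xFFFF0000) | low
    # $FFFF is -1 as a signed word: the counter has run out, no branch.
    if low == 0xFFFF:
        return dn, False
    # Part 3: take the PC relative branch to the label.
    return dn, True


print([hex(v) if isinstance(v, int) and not isinstance(v, bool) else v
       for v in dbcc(False, 0x00010000)])  # ['0x1ffff', False]
```

Note the last call: the low word goes from 0 to $FFFF (-1) and the loop ends, but the upper word ($0001) is untouched, which is what "only the lowest word is affected" means in practice.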

Another example :

Example 3.2. DBEQ Example

Start     MOVE.L    #1000,D1
          MOVEQ     #0,D2
Loop      ADDQ.L    #1,D2
          CMPI.L    #100,D2
          DBEQ      D1,Loop

More      More code here ...

D1.L is initialised with 1,000 and D2.L is set to zero. MOVEQ cannot be used to load D1, because its immediate data must lie between -128 and +127, so a MOVE.L is used instead. Then comes the start of the loop (at label 'Loop') where 1 is added to D2.L. Following the addition, D2 is checked to see if it equals 100. The DBEQ instruction ('decrement and branch until equal') checks the Zero flag and, if it is not set - therefore D2 is not yet 100 - subtracts 1 from D1.W and, if this does not result in D1.W becoming -1, branches to the label 'Loop' to go round again.

At the label 'More', how can you tell which of the two cases ended the loop ? As you know, the loop is ended when the condition is detected or the counter reaches -1. As the DBcc instructions do not change the flags, you can make a simple check on the Zero flag (set only if the last CMPI.L found D2 equal to 100) or test D1.W to see if it is -1 or not. So the code that goes in at label 'More' will be this :

More      BEQ.S     Got_100
Not_100   :         Process D1 = -1 here
          :
Got_100   :         Process D2 = 100 here
          :

Obviously, in a loop that can run up to 1,001 times, with D1 counting down from 1000 to -1 while 1 is added to D2 on each pass, D2 must equal 100 at some point, and that will be the only termination of the loop. D1 will never get to -1.
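The loop can be checked away from the QL with a small Python model of Example 3.2 (a simulation with a hypothetical name, not assembler). It models both DBEQ and DBNE to show the 'until' reading of DBcc: DBEQ keeps looping until the compare sets the Zero flag, whereas DBNE stops on the very first pass, because 1 is already not equal to 100.

```python
def run_example_3_2(stop_when_equal=True):
    """Model the Example 3.2 loop.

    stop_when_equal=True models DBEQ (loop until D2 = 100);
    False models DBNE (loop until D2 <> 100).
    Returns (passes, D1.W, D2, Zero flag at exit).
    """
    d1, d2, passes = 1000, 0, 0
    while True:
        d2 += 1                   # ADDQ.L #1,D2
        passes += 1
        zero = (d2 == 100)        # CMPI.L #100,D2 sets Z only if equal
        condition = zero if stop_when_equal else not zero
        if condition:             # DBcc part 1: condition met, no decrement
            break
        d1 = (d1 - 1) & 0xFFFF    # part 2: decrement the low word of D1
        if d1 == 0xFFFF:          # counter reached -1
            break
    # DBcc leaves the flags alone, so Z still says why the loop ended.
    return passes, d1, d2, zero


print(run_example_3_2(True))   # (100, 901, 100, True)
print(run_example_3_2(False))  # (1, 1000, 1, False)
```

With DBEQ the loop ends on the 100th pass with D1.W still at 901 and the Zero flag set, so the BEQ.S at 'More' is taken.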

There are two 'interesting' DBcc instructions. These are 'DBF' (Decrement and Branch Until False) and 'DBT' (Decrement and Branch Until True). What is so interesting about these two ?

DBF is commonly written as DBRA which is more meaningful as it implies that a decrement will be done followed by a branch. This is exactly what happens. The condition F (false) can never be true, so the instruction always branches until the counter becomes -1.

DBT is the opposite. It never branches, and never even decrements the counter, because the condition is always detected. I have never seen a DBT instruction used in any program I have read, written or disassembled.
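The difference between the two can be shown with a short Python model of a loop body followed by DBF/DBRA or DBT (a sketch; `body_runs` is a hypothetical name and the counter is the low word of the register).

```python
def body_runs(condition_always_true, counter):
    """Count how often a loop body placed before DBT or DBF runs.

    condition_always_true=True models DBT, False models DBF/DBRA.
    Returns (times the body ran, final counter low word).
    """
    runs = 0
    while True:
        runs += 1                        # the loop body
        if condition_always_true:        # DBT: condition met, fall through
            return runs, counter         # counter never decremented
        counter = (counter - 1) & 0xFFFF
        if counter == 0xFFFF:            # DBF/DBRA: only the counter ends it
            return runs, counter


print(body_runs(False, 9))  # (10, 65535): DBRA ran the body 10 times
print(body_runs(True, 9))   # (1, 9): DBT just falls through
```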

Note that the loop is terminated when the counter becomes -1. This means that the above loop would have 1,001 iterations if D2 never became 100. This can cause confusion to programmers used to processors that stop at zero. I learned on a Z80 (Sinclair ZX81) and there was a DJNZ instruction which subtracted 1 from the B register and branched if it was non-zero.

To loop around 10 times you set B to 10 and that was it. On the 68000 series, you would set the counter to 9, not 10. Some programmers do this and others set the counter to 10 but jump straight to the DBRA on the first pass. The following two examples do the same thing :

Example 3.3. Looping Example

Start     MOVEQ     #10,D0
          BRA.S     Skip
Loop      BSR       Useful_code
Skip      DBRA      D0,Loop

Example 3.4. Another Looping Example

Start     MOVEQ     #9,D0
Loop      BSR       Useful_code
          DBRA      D0,Loop

In Example 3.3 the programmer sets the counter to the number of times the loop is to be executed but then jumps over the loop body straight to the DBRA at the end of the loop. The counter is reduced to 9 and the loop is entered properly this time. The subroutine at label 'Useful_code' will be executed when the counter has values 9,8,7,6,5,4,3,2,1,0 or 10 times.

In Example 3.4 the programmer sets the counter to 9 and then executes the code as normal. Once again the subroutine Useful_code will be executed 10 times, with the values 9,8,7,6,5,4,3,2,1 and 0 in the counter register D0. In both cases, Useful_code must leave D0.W unchanged or the count will go wrong.
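Both examples, plus the Z80 DJNZ style for comparison, can be modelled in Python to confirm that each calls the subroutine 10 times. This is a sketch with hypothetical function names; each returns the list of counter values seen on each call, and it assumes Useful_code leaves D0 alone.

```python
def example_3_3():
    """MOVEQ #10,D0 / BRA.S Skip / Loop BSR Useful_code / Skip DBRA D0,Loop."""
    seen, d0 = [], 10
    while True:
        d0 = (d0 - 1) & 0xFFFF    # Skip: DBRA D0,Loop (reached first by BRA.S)
        if d0 == 0xFFFF:
            return seen
        seen.append(d0)           # Loop: BSR Useful_code sees this D0


def example_3_4():
    """MOVEQ #9,D0 / Loop BSR Useful_code / DBRA D0,Loop."""
    seen, d0 = [], 9
    while True:
        seen.append(d0)           # Loop: BSR Useful_code
        d0 = (d0 - 1) & 0xFFFF    # DBRA D0,Loop
        if d0 == 0xFFFF:
            return seen


def djnz_style(b=10):
    """Z80 DJNZ for comparison: decrement B, branch while it is non-zero."""
    seen = []
    while True:
        seen.append(b)
        b = (b - 1) & 0xFF
        if b == 0:
            return seen


print(example_3_3())  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(example_3_4())  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
print(djnz_style())   # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

All three run the body 10 times; the 68000 versions see the counter values 9 down to 0, while DJNZ sees 10 down to 1.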

Note

George Gwilt (the author of the GWASL assembler we are using in this series) points out that while the second example (Example 3.4) is better in terms of readability, there could be problems if the value in the counting register is zero. As George says, the method of subtracting one from the counter then dropping into the loop could lead to a loop that runs 65,536 times rather than zero times - how can this be ?

Assume that this subroutine is called from another part of some program with the loop counter in D1.W :

Example 3.5. Potentially Bug-ridden Looping Example

loopy_bit   SUBQ.W  #1,D1
loop        BSR     do_something
            DBRA    D1,loop
            RTS

Obviously, the problem is only apparent when the loop counter is set by some calculation elsewhere in the program, not when setting it directly with immediate data as in my examples above.

Why would this fail, or more to the point, when ?

Imagine that D1.W is 1 when the above subroutine is called. What would happen ? Well, remember how the DBcc instructions operate in three parts :

  • First, the condition, if any, is tested to see if it is true. In this case there is nothing to test: the DBRA condition is never true, so this step never ends the loop.

  • Second, the lowest word of D1 is decremented by one and then tested to see if it is -1 yet. If it is, the branch is not taken and the RTS is executed.

  • Third, if the counter register is not -1, the branch is taken to the code at label 'loop'.

So, with D1.W set to 1 on entry, the SUBQ.W #1,D1 instruction adjusts it to zero and the loop is carried out once. The DBRA then decrements D1.W to -1 and the loop terminates. No worries here.

What happens if D1 was set to zero on entry ?

D1.W would be set to -1 by the SUBQ.W instruction, then the code at 'do_something' would be executed - but we had a zero count so this is wrong straight away. On return, there is no condition to test with DBRA, so D1.W would be decremented to -2. This does not equal -1 so the branch would be taken, and taken again and again, as D1.W counts down through -32,768, wraps round to +32,767 and carries on down to zero before finally becoming -1 once more. By then do_something would have been executed 65,536 times instead of not at all !
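The wrap-round can be confirmed with a Python model of Example 3.5 that counts how often do_something is called for a given D1.W on entry. This is a sketch; `example_3_5_calls` is a hypothetical name and the 16-bit wrap is done with `& 0xFFFF`.

```python
def example_3_5_calls(d1):
    """Count calls to do_something made by Example 3.5 for entry value D1.W."""
    d1 = (d1 - 1) & 0xFFFF        # SUBQ.W #1,D1 (0 becomes $FFFF, i.e. -1)
    calls = 0
    while True:
        calls += 1                # loop: BSR do_something
        d1 = (d1 - 1) & 0xFFFF    # DBRA D1,loop
        if d1 == 0xFFFF:          # stops only when the low word is -1
            return calls


print(example_3_5_calls(1))   # 1
print(example_3_5_calls(10))  # 10
print(example_3_5_calls(0))   # 65536
```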

So beware. I can highly recommend the following code instead :

Example 3.6. Bug-fixed Looping Example

loopy_bit   BRA.S   skippy_bit
loop        BSR     do_something
skippy_bit  DBRA    D1,loop
            RTS

Which will always avoid the above problem. Now, if D1.W is zero, it is decremented to -1 when the code skips to the DBRA instruction, and this correctly terminates the loop without executing the code in the do_something sub-routine.
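The same Python style of model applied to Example 3.6 shows that the call count now always equals the value of D1.W on entry, including zero. Again a sketch, with a hypothetical function name.

```python
def example_3_6_calls(d1):
    """Count calls to do_something made by Example 3.6 for entry value D1.W."""
    d1 &= 0xFFFF
    calls = 0
    while True:
        # skippy_bit: DBRA D1,loop (reached first via BRA.S)
        d1 = (d1 - 1) & 0xFFFF
        if d1 == 0xFFFF:          # counter hit -1: fall through to RTS
            return calls
        calls += 1                # loop: BSR do_something


print([example_3_6_calls(n) for n in (0, 1, 10)])  # [0, 1, 10]
```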

So keep in mind the fact that the loop stops when the counter reaches -1 and that the counter is decremented before testing for -1. Also bear in mind that George is a far better assembler programmer than I am - if he says something, believe it !!

Which is the best to use ? It's up to you. Sometimes I use the first form and sometimes the second. As far as reading source code is concerned, I prefer the second method because you can write something like :

Start     MOVEQ     #10-1,D0
          :

Which at least shows better that the loop will be executed 10 times. Unfortunately, when you disassemble the above instruction, you find that the assembler has already calculated that 10 - 1 is 9 and it has once again become :

Start     MOVEQ     #9,D0
          :

The first method, where the loop counter is initialised with the actual iteration count and the code then jumps to the DBRA, loses out slightly. The extra BRA.S instruction uses up 2 bytes every time it is used, and before the loop proper begins the processor has to execute both the BRA.S and one extra pass through the DBRA - all of this takes time.