Q L   H A C K E R ' S   J O U R N A L
===========================================
Supporting All QL Programmers
===========================================
#27                            January 1998

The QL Hacker's Journal (QHJ) is published by Tim Swenson as a service to the QL Community. The QHJ is freely distributable. Past issues are available on disk, via e-mail, or via the Anon-FTP server, garbo.uwasa.fi. The QHJ is always on the lookout for article submissions.

    QL Hacker's Journal
    c/o Tim Swenson
    38725 Lexington St. #230
    Fremont, CA 94536
    swensontc@geocities.com
    http://www.geocities.com/SilconValley/Pines/5865/

EDITOR'S FORUM

The QHJ is back. After a year of taking a break, I'm back in the programming spirit again. Of course, I have not been inactive during that time, as any reader of QL Today can attest. I just have not felt like writing any programs for a while. I guess I did get burnt out a bit. Now we'll see how long before I get burnt out again.

Having recently purchased Qliberator, I have found its manual similar to the original QL manual: full of information, but it is hard to find what you need without reading the whole manual. I'm all for reading the whole manual, but sometimes it takes a while to figure out exactly how to apply what you are reading. I sometimes like manuals that are more "If you want to do this, this is how to do it."

From this thought came the idea for the "Qlib Source Book", which will be something similar to the "Z88 Source Book". The Z88 Source Book was a collection of existing knowledge about the Z88. Most of the Z88 Source Book came from older published sources. With Qlib, there does not seem to be a wealth of published material helping the beginning Qlib user.

So, time to send out a query and ask for material. If you are an experienced Qlib user and have a few tricks that you would like to pass on, please send them to me (either hard copy, disk or e-mail). If you are a beginning Qlib user and you have questions that you would like to see answered, send them too.
Since I do not have the knowledge to really do the subject well, I will play the role of editor. I'll collect the different submissions and put them in an organized document.

The Qlib Source Book will be Freeware in its electronic form. Like the Z88 Source Book, a hard copy version will probably be available at minimal cost. With the Z88 Source Book the price of the book covered the cost of production and a small profit to FWD Computing. It was my way of supporting the primary US QL dealer.

Through the QHJ and QL Today I'll keep QLers informed of my progress. I've already volunteered Dilwyn Jones to help in writing some parts. Dilwyn has a number of years of experience with Qliberator and producing commercial software.

So, here is the long awaited next issue of the QHJ. Feel free to send any comments, complaints, articles, large denomination bills, etc. Enjoy.

REGULAR EXPRESSIONS

In all the years that I've been dealing with Unix, one of the things that I have not taken the time to really learn is Regular Expressions. Regular expressions are a mini-language used for pattern matching in a number of Unix search utilities. The most well known of these programs is grep and its variations fgrep and egrep. The name 'grep' itself comes from the ed editor command g/re/p, which globally searches for a regular expression and prints the matching lines.

No matter what operating system you have used, you have probably run across something like a regular expression. Most operating systems have a way of understanding something like this: "dir *.txt". In MS-DOS this means to list all files that end with a .txt extension. In QDOS, the equivalent phrase would be "wdir flp1__txt". The asterisk or star, "*", is a wild card and means to match all strings. The asterisk is really a metacharacter. Metacharacters are special characters that mean different things in the regular expression language. Wild cards like these are a simpler cousin of regular expressions, and some of the same symbols behave a little differently in grep.

More experienced users of MS-DOS may have used something like this: "dir *.e??". This matches all files whose three-letter extension starts with an e.
It will match .exe, .efs, .exx, and others. The question mark is a wild card that matches any single character.

So what does all this mean to QDOS users? Well, a version of grep has been ported to the QL and comes with the C68 distribution. Grep is a very powerful and popular utility that can fill a number of needs. It is used to extract lines of text from files, and with its handling of regular expressions, it can be very smart about what it extracts. Once you know how grep works and how to use it, you will probably remember a time when it would have been useful to you.

With grep, the output can go to one of two places: to standard output, or redirected to a file. Since the QL does not have standard output, the QL version of grep opens a window to display its results. It also supports file redirection. This means that you can send the output of grep to a file to be dealt with later.

To demonstrate the file redirection, let's take a look at a short grep example. In this example we have a text file and we want to find all lines that have the word QL in them:

    exec flp1_grep;"QL flp1_file_in > flp1_file_out"

Since we are using arguments, we have to put them in quotes after the grep command. The results of the grep will now be in the file flp1_file_out.

Before we go too far, let's talk about three major concepts in regular expressions: characters, metacharacters, and character classes. A character is basically a byte, be it a text byte or binary byte. Metacharacters are a set of characters that are part of the regular expression language. In the examples above, the asterisk is a metacharacter. A character class is a way of matching a group of characters.

Let's take a look at the metacharacters:

A character matches itself. Any character or string of characters is taken as a literal. If you want to find the string "ing" in a file you would use the regular expression "ing". Most of the time when I am using grep, I use only literal characters.

A dot (.) matches any character, but only 1 character, similar to the question mark in MS-DOS. If you want to find a word in a text file that has three letters, starts with a B and ends with D, then you would use the regular expression B.D (grep is case sensitive; upper case lettering has only been used to highlight the example).

The caret (^) means the beginning of a line. If you want to find all lines that start with the word "The", you would use the regular expression "^The".

The dollar sign ($) means the end of a line. If you want to find all lines that end with the word "end", you would use the regular expression "end$".

The question mark (?) makes the item just before it optional. If you wanted to find the word "color" but don't know if the British spelling "colour" is used, the regular expression "colou?r" would work. The ? makes the u optional. (On many Unix systems the ?, +, | and () metacharacters are only understood by egrep, so check which ones your version of grep supports.)

The plus (+) matches one or more of the item just before it. If you want to find the words helper or helps, but not just help, you would use the regular expression "help.+", which is help followed by one or more of any character. The plus must match at least one character or it will fail.

The asterisk (*) is used like +, but it allows a null match: zero or more of the item just before it. To find the words helper, helps and help, the regular expression "help.*" would work. The asterisk allows for no character at all, as in the case of just help.

To get a little more power out of regular expressions, there is a metacharacter for the logical OR, the pipe symbol (|). Say you have a text file with a bunch of e-mail messages and you want to find all of the From and Subject lines; you would use the regular expression "From|Subject".

Now that you know how to use the OR metacharacter, you will find that you need to limit the OR. That's where the parentheses () come in. Using the last example of finding the From and Subject lines from e-mail messages, the regular expression "From|Subject" will also find lines with either word anywhere in them. With e-mails, the From in the From line is always followed by a colon: "From:". The same goes for Subject.
Now how do we write a regular expression for this? One way is this: "From:|Subject:". This will work, but a "cleaner" approach is this: "(From|Subject):". Since AND's are assumed in regular expressions, what you get is this: "( From OR Subject ) AND :". Just like in math, the parentheses control the bounds of the OR condition.

The backslash (\) is used to make a metacharacter a literal. If you want to look for all lines that end with a full sentence, meaning they end with a period, you could try the following regular expression: ".$". But since the period is a metacharacter, you will find all lines that end with any character. To get grep to use the period as a period, you need to use the backslash like this: "\.$". The backslash tells grep to take the next character as a literal and not to interpret it.

Character classes are used as a way to search for groups of characters. Say you wanted to match the digits 1 to 3. You could do this with "(1|2|3)". Using the brackets, you could also create a character class "[123]".

The true power of the character class comes when using the dash. Inside a class, the dash creates a range of characters (metacharacters mean something else when in a character class). In the last example, the character class could also be written as "[1-3]", meaning all characters from 1 to 3. To define the lower case letters of the alphabet the character class would be "[a-z]". Since grep is case sensitive, a better character class would be "[a-zA-Z]".

You can mix up characters in a character class any way you like. Say you have to find all occurrences of numerical dates in a file. Dates could be written as 7-23-97, or 7/23/97, or even 7.23.97. You want to find any dates with a dash, slash, or period. You would create the character class "[-/.]". The dash comes first so that it is taken as a literal dash and not as a range. Remember that the period means only itself when inside a character class and does not mean to match a single character. So to find our dates, we would use the regular expression "7[-/.]23[-/.]97".
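
For readers who want to try the grouping and date patterns without a QL to hand, here is a minimal sketch using Python's re module rather than grep. Python agrees with egrep on these two patterns, but the C68 grep may differ in details. The sample lines and the helper name matching_lines are made up for illustration.

```python
import re

def matching_lines(pattern, lines):
    """Return the lines that contain a match for pattern, as grep would."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical sample text, not taken from a real file.
mail = [
    "From: tim@example.com",
    "Subject: QHJ #27",
    "I got this From a friend",
    "Meeting moved to 7/23/97",
    "Old date style 7-23-97",
    "Version 7.23.97 released",
    "Part number 7x23y97",
]

# Grouping limits the OR, so the colon applies to both words.
headers = matching_lines(r"(From|Subject):", mail)

# The class [-/.] matches exactly one dash, slash, or period.
dates = matching_lines(r"7[-/.]23[-/.]97", mail)

for line in headers + dates:
    print(line)
```

The line "I got this From a friend" is skipped because no colon follows the From, and "7x23y97" is skipped because x and y are not in the class.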

The caret (^) means something else when used in a character class; it means to negate the class. If you want to match anything but digits, you would create the character class "[^0-9]". The caret negates only when it comes immediately after the opening bracket. If it is used after that it only means itself. The character class "[-.^]" matches only a dash, period, or caret.

If you are interested in learning more, check out the book "Mastering Regular Expressions" by Jeffrey Friedl.

END-OF-FILE FINDING

A lot of the programs that I like to write are filters. They take a text file as input, do something to the file, and output the results to a second file. Doing this involves reading a file one line at a time. A way of doing this would be something like this:

    REPeat loop
       INPUT #4,in$
       IF EOF(#4) THEN EXIT loop
       PRINT in$
    END REPeat loop

This algorithm will work, except that it will not output the last line. When I first tried this, I could not figure out why the last line was not being output. It was all based on how I saw the program being executed. I thought that the INPUT statement would read in the end-of-file (EOF) marker and then do a compare. What is really happening is that the last line is read in, then the EOF check is made. Since the file pointer advanced after reading in the last string, it is now pointing at the EOF marker. When the EOF check is done, it returns TRUE and the EXIT loop is done before the last line is printed. A better example would be this:

    REPeat loop
       IF EOF(#4) THEN EXIT loop
       INPUT #4,in$
       PRINT in$
    END REPeat loop

This will print out the last line of the file. But this algorithm also has its faults. It assumes that there is an end-of-line (EOL) marker at the end of the last line. If there was no EOL and only the EOF, an error would occur reading in the last line.

A better routine would read in each character and put the line together while constantly checking for an EOF.
Here is an example. It is written as a function, since it returns a value, and it stops at either the end of the line or the end of the file:

    DEFine FuNction read_line$
       in$ = ""
       REPeat loop
          IF EOF(#4) THEN EXIT loop
          byte$ = INKEY$(#4,-1)
          REMark stop at the end-of-line marker
          IF byte$ = CHR$(10) THEN EXIT loop
          in$ = in$ & byte$
       END REPeat loop
       RETurn in$
    END DEFine read_line$

It would be used like this:

    next_line$ = read_line$

If using Qliberator, you can use the Q_ERR function to locate EOF. Q_ERR can only trap for EOF after the fact. You keep reading through the file until you get an EOF error, which is trapped by Q_ERR. This means that you would check for Q_ERR/EOF after an INPUT statement. An example is:

    Q_ERR_ON "INPUT"
    REPEAT loop
       INPUT #4,in$
       IF Q_ERR = -10 THEN EXIT loop
       PRINT in$
    END REPEAT loop
    Q_ERR_OFF

BACKGROUND PROGRAMS

Back in the heyday of MS-DOS, before MS-Windows, there was a neat type of program called "Terminate and Stay Resident" (TSR). The program could be loaded up at boot time, remain in memory while other programs were running, and could be called up at any time. The program would stay in the background until a funny key sequence was typed in, then it would pop up in front of the current program and be ready to do something. Sidekick was the first popular program to do this. Since MS-DOS could not multitask, how this was done is still a mystery to me.

In the QDOS world, where multitasking is a reality, a program like this is fairly easy to do. Since SuperBasic will not multitask, the end program has to be compiled in some way. For this article, I'll use Qliberator to compile SuperBasic.

A background job is designed to be hidden and not appear until it needs to. This means that the program will not immediately open any windows and will only open them when necessary.

When compiling this with Qliberator, be sure to turn the WINDS option off. The program will open its own windows. If you have WINDS turned on, the program will execute, but you will need to do a CTRL-C to get back to QDOS. If anybody knows exactly what I'm doing wrong, please let me know.

    100 job = Q_MYJOB
    110 QP job,128
    120 x = KEYROW(7)
    130 IF x = 20 THEN
    140    BEEP 1000,10
    150    OPEN #3,con_50x50a100x100_32
    160    PAPER #3,0: INK #3,2: BORDER #3,4,2: CLS #3
    170    PRINT #3,"Hello"
    180    x$ = INKEY$(#3,-1)
    190    CLOSE #3
    200 END IF
    210 GO TO 120

MICROEMACS LINE NUMBERING

I've been meaning to tinker around with MicroEmacs macros for some time, but never got around to it. Recently I decided to take the time to really give it a try.

Of all of the text editors available for the QL, I think MicroEmacs is the most powerful. Its macro language is the most robust of the editors. Both QED and ED have macros that can automate keystroke commands, but they don't have any logic (IF..THEN) or structure (WHILE) features. MicroEmacs has looping and logic controls.

As an example, I thought that a line number macro would be nice. The following macro goes to the beginning of the file and starts putting line numbers on each line. Before it does this it queries you for a starting line number; the numbers are then incremented in 10's.

To determine when to stop processing, I had to know when the macro had reached the end of the file. Since there is no end-of-file checking mechanism, I had to move to the end of the file and get the line number of the last line. This was then used in the while loop. If there are lots of empty lines at the bottom of the file, the macro will number them also. A check could be put in to see if the current line is empty, but this would not work if a line had only white space in it (tabs and/or spaces).

I noticed two differences between the execution of MicroEmacs and ED/QED macros. One, ED/QED macros are kind of slow and take a while to run. MicroEmacs macros are very fast. Total run time for this macro on a 20 line routine was about 1-2 seconds. Two, when executing ED/QED macros you can see what is going on as it happens. The screen updates with each command. With MicroEmacs, the screen seems to update only at the end of the macro.
When the macro went to the bottom of the file and then returned to the top, I thought it would display the movement, but it did not. If you do want to update the display while a macro is executing, there is a redraw screen command that you can use.

The documentation for the MicroEmacs macros is good at documenting the different commands, but it falls short of providing many examples. I used other macros that came with MicroEmacs to learn from. This can slow down the learning process, but there is no other alternative. In some ways I use this same technique in other languages. I keep bits of code around so I don't have to memorize how to do a routine in a particular language; I just go through my old code.

    ; Line Numbering Macro
    set %line_num @"Starting Line Number? "
    end-of-file
    set %tot_lines $curline          ;LET tot_lines=line number @ EOF
    beginning-of-file
    !while &less $curline %tot_lines
       beginning-of-line
       insert-string %line_num
       insert-string " "
       set %line_num &add %line_num 10    ;LET line_num=line_num+10
       next-line
    !endwhile
    beginning-of-file

ADDING CONFIG BLOCKS TO QLIB PROGRAMS

BasConfig is a utility, written by Oliver Fink, that creates config blocks for Qliberator compiled programs. For those that don't know, config blocks are extra chunks of data added to programs that are changeable by the user, using the program "config". In other words, if you have a program and you want the user to be able to change the size of the program's window, you can put the variables for the window size in a config block and let the user configure them anytime they want.

Config blocks are part of the executable and do not interfere with the running of the program. The "config" program knows where in the executable the config block is and knows how to change it.

Another way of looking at the config block is as an object that has some data that is used by your program and is separate from your program.
In fact, until your program is compiled, the config block is a separate file from your SuperBasic program. This block is accessible from both your program and the "config" program.

BasConfig creates a file that has the config block and some SuperBasic extensions that allow the program access to the block. These extensions need only be LRESPRed when you are developing your program. They can be compiled into your program and become part of the executable.

Before you use BasConfig, you need to define what type of data you want the user to be able to change. There are 7 different data types that are allowed in config blocks:

    String
    Long Word
    Word
    Byte
    Select
    Code
    Char

BasConfig does not support the Long Word or Select data types. I don't have any documentation on config, so I can't say exactly what the difference is between the types other than what is obvious.

To access the data in the config block, there is a function for each data type supported by BasConfig:

    C_STR$(n)  - String
    C_WORD(n)  - Word
    C_BYTE(n)  - Byte
    C_CODE(n)  - Code
    C_CHAR(n)  - Char

Each function returns the Nth item of its data type in the config block. If you have two CHAR's and one STRING in the config block and you wanted to get the second CHAR, you would do something like this:

    var = C_CHAR(2)

If your config block does not have a CHAR data type, you should get back some sort of error (I have not tested this).

To learn how all of this works, I created a SuperBasic program that opens a window and displays the contents of the two BYTE data items in a config block. The example code is:

    100 REMark $$asmb=ram1_test_cfg,0,10
    110 EXT_FN "C_BYTE"
    120 OPEN #3,scr_100x100a50x50
    130 PAPER #3,0: INK #3,4: CLS #3
    140 item1 = C_BYTE(1)
    150 item2 = C_BYTE(2)
    160 PRINT #3,"Item #1 = ";item1
    170 PRINT #3,"Item #2 = ";item2
    180 PAUSE 500
    190 CLOSE #3

Note the $$asmb directive that links the config block into the program. It is BasConfig that creates this block, which includes the 5 functions to access the config block.
The EXT_FN command tells Qliberator that the references to C_BYTE will be resolved at link time.

To create the config block, exec basconfig_obj. The program will ask you how many different config items you want. For this example, I entered 2. Next you are asked to enter the name of your final program and its version number. This is used by the "config" program to let the user know exactly what program they are configuring. These two items cannot be changed by the user.

Now the program will query you for the data type of the first data item. You can scroll through the data types by hitting the left arrow key. I scrolled over to "Byte" and hit return. Since each data type is different, the next few questions will be different for each data type. In the case of the "Byte" data type the items were: Initial value, Minimum value, and Maximum value. The Min and Max values give you control of the changes the user can make, so that a "bad" configuration can't be made. For this example, I gave the first item an initial value of 10 and the second a value of 20.

Once I answered all of the questions for the second config item, the program asked for a file to store the config block. It looks like the convention for config block file name extensions is _cfg.

Now, the documentation for BasConfig is very sparse. I had to figure out how to get the data out of the config block by reading the source code for BasConfig. So, I have done just enough to get a fair idea of what is going on and how to get it to work.
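
The way the access functions count items per data type, and the way the Min and Max values keep a setting in range, can be pictured with a small model. This is only a sketch, written in Python rather than SuperBasic. Everything in it (the ConfigItem class, the nth_of_type and set_value helpers, the 0 to 100 limits) is hypothetical. It mimics the behaviour described above and says nothing about the real config block layout or how BasConfig and config work inside.

```python
from dataclasses import dataclass

@dataclass
class ConfigItem:
    kind: str          # "BYTE", "CHAR", "STRING", ... (names are illustrative)
    value: object
    minimum: int = 0   # only meaningful for numeric items in this model
    maximum: int = 255

def nth_of_type(block, kind, n):
    """Return the n-th item (counting from 1) of the given data type."""
    matches = [item for item in block if item.kind == kind]
    if not 1 <= n <= len(matches):
        raise LookupError(f"no {kind} item number {n} in config block")
    return matches[n - 1]

def set_value(item, new_value):
    """Change an item only if the new value is inside its Min/Max range."""
    if not item.minimum <= new_value <= item.maximum:
        raise ValueError(f"{new_value} is outside {item.minimum}..{item.maximum}")
    item.value = new_value

# Mirrors the example: two BYTE items with initial values 10 and 20.
# The 0..100 limits are made up for this sketch.
block = [
    ConfigItem("BYTE", 10, 0, 100),
    ConfigItem("BYTE", 20, 0, 100),
]

print("Item #1 =", nth_of_type(block, "BYTE", 1).value)
print("Item #2 =", nth_of_type(block, "BYTE", 2).value)
```

In this model, asking for a CHAR item when the block holds none raises an error, and an out-of-range change is refused instead of being stored.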