Q L H A C K E R ' S J O U R N A L =========================================== Supporting All QL Programmers =========================================== #26 December 1996 The QL Hacker's Journal (QHJ) is published by Tim Swenson as a service to the QL Community. The QHJ is freely distributable. Past issues are available on disk, via e-mail, or via the Anon-FTP server, garbo.uwasa.fi. The QHJ is always on the look out for article submissions. QL Hacker's Journal c/o Tim Swenson 38725 Lexington St. #230 Fremont, CA 94536 swensontc@geocities.com http://www.geocities.com/SilconValley/Pines/5865/index.html Editor's forumn It's hard to believe that the last QHJ came out last May. What have I been doing? Well, let me tell you. Since May I have had a number of life changes that have kept me busy. The first is a change in jobs. I decided to leave the Air Force and seek employment else where. I spent a number of months looking through technical career newspapers and various technical job related web pages, looking for job openings. I found the San Jose Mercury News Talent Center to be about the best place to look, esp. for the SF Bay Area. Related to the first change, was leaving my job. I had to finish a few tasks and then document my job so I could pass it along to someone else. Documenting what you know is not as easy as it sounds. I also had to spend some time outprocessing from the service. It takes paperwork to get in the sevice, and it takes even more to get out. The final and biggest change was moving from Dayton, OH, back to the SF Bay Area. Getting the house ready for moving and getting it ready to sell took a while. I had to do some painting, replace a few doors (one cracked and one warped), patch some mortar on the brick outside of the house, and a few other household chores. This all left very little time for hobbies. About the only time I used the QL was writing cover letters and printing Resumes. And since the move my access to the Net, esp. USENET has been limited. I am waiting for my house to sell in Ohio, so I moved into an apartment. This meant that I had to put a number of household goods in storage. The movers did not do a good job of putting the right stuff in the right boxes so I could get what I needed off the moving van and put the stuff I did not need in storage. This meant that my QL is with me, but the disk drives, power supply, mouse, and modem cable are in storage. I've had to borrow disk drives, a PC power supply, and a QL power supply to get the QL up and running. I still have to make a modem cable. I'm using my Z88 for my telecomm. needs, and it's tough finding an Internet Service Provider that supports 8 lines of display (real tough). Once I get a modem cable built I should be able to read comp.sys.sinclair. Speaking of the Z88, most of this issue has been written on the Z88 while riding BART (the local commuter rail system) to work. I have about a 50 minute BART ride, so I have lots of time to put to good use. And also speaking of work, I am now working for a company in Berkeley called Project Technology. They were founded by Sally Schelor and Steve Melor, creators of the SM Object Oriented Analysis Method. My job is to maintain the Sun Unix boxes and the PC's. While I've been busy doing non-QHJ things, I noticed that no one sent me e-mail asking where the next QHJ issue was. I'm not too sure if this is a good sign or not. Granted it was nice not to be bugged, but then I have to wonder if the QHJ was missed. One thing you will notice with this issue is the number of articles with no code. I have not had the time to sit and code at the QL, so I've written some articles and covered what code was necessary with pseudo-code. Well, that's about enough for me. Oh, since I have just moved, please note the new snail mail address, but don't write it down in ink. I hope to buy a house sometime around the March or April '97 timeframe. Here now the newsletter. Exclusive OR Encryption I've always been interested in encryption. Keeping my files safe from prying eyes has been more of a want than a need. Plus encryption is a neat programming problem to solve. Many years ago I wrote a probram called QL Crypt that was my first look at encryption. In QHJ #XX there was Complex Ascii Rotation (CAR) that was aimed at encrypting mail messages just enough to make them secure from casual observers. There are many other ways to encrypt files, each with it's own level of safety. Encryption is based on two parts, the Method and the Key. The Method is what various computations are perfomed to get from the clear text to the encrypted text. This is equivalent to a lock. The Key is the chunk of data used to make one encryption different than an other. Since the encryption Method does not change, it is the Key that makes your text encrypted different from somebody elses. This is the equivilent to, well, a key. A specific model of lock is manufactured into a thousands of individual locks. These locks all look and work the same. It is the key that makes each one secure and different from the others. There are many methods used in encryption, from the very easy to break, to the damn near impossible. The harder to break, the more computation necessary to encrypt. If you are worried about wasting computational cycles, then you need only implement the Method that secures the information to the level you need it. Securing a Christmas gift list is different than securing company trade secrets. QL Crypt and CAR both used a character rotation Method for encryption. As each character was read in, a value of 1-4 would be added to their chracter value (CHR$), based on the Key, and then output to the resultant file. QL Crypt allowed the encryption of binary files, CAR stayed with pure ASCII text so that it could be sent in e-mail. Each one of these Methods, and many more, require the use of two functions that are the opposite of each other. In character rotation, a value would be added to enrypt, and subtracted to decrypt. What ever gyrations you go through to encrypt you must reverse to decrypt. Exclusive OR encryption does not have two opposite functions because Exclusive OR is the oppostie of itself. Exclusive OR (XOR) Bit 1 Bit 2 XOR 0 0 0 1 0 1 0 1 1 1 1 0 When using Exclusive OR with a bit pattern, what you XOR it with is usually called the Mask. To show you how XOR is the opposite of itself let take a look at the binary pattern 010110 XORed with the mask 111111. Bit Mask XOR Bit Mask XOR 0 1 1 1 1 0 1 1 0 0 1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 1 0 Notice that after XORing the bit pattern with the mask and then XORing the resultant bit pattern with the mask the original bit pattern returns. This means that writing the program to implement XOR encryption does not require the writing of an encryption routine and a decryption routine, only one is XOR routine is needed. The Mask that is used in the XOR routine is derived from the Key. How secure you data is, is dependent on the Key and its length. If you use a Key of length one (1 byte) then it would take only 256 tries to break the encryption. The longer the Key, the more tries necessary to break the encryption. QL Crypt used the random number table in the QL as the key. A password was entered from the user, which then was used as the sed value for the random number table. This makes for very strong encryption (as the random number table is fairly large and makes a long Key), but it make it impossible to port to other platforms. Even differences in QL ROMs could cause the program to fail. CAR used a ASCII password entered by the user. This makes the program very portable, but also makes it a weaker form of encryption. If the user typed in a fairly long password, then the level of secureness would go up. Constructing a Spell Checker A spell checker is usually comprised of two parts: 1) word lookup (to see if a word is spelled correctly) and 2) word suggestion (to suggest the correct spelling of the word). Past issues of the QHJ have looked at different algorithms to tell how close two words are, a key part of word suggestion. This article will focus on word lookup. The key thing to decide in creating the word lookup algorithm is the data structure for storing the words and quickly looking them up. If the word list was fairly short, a brute force method would work. Since most spell checkers will need a word list in the tens of thousands, the lookup algorithm will need to be smarter. We also need to keep in mind that the words will be of many different lengths. At first the most obvious data structure would be a tree structure. A word would walk down the tree structure letter by letter. When it reached the end of its length it would check the current tree node to see if it is a valid word. Let's take a look at three words, bar, bard, and barr, and the following tree structure. b / \ a b / \ r m / \ d n With the word BAR, the B is valid, with leads to an A which is valid and it leads to an R which is valid. The R node will have a value of 1 to signify that it is the end of a valid word. This way the structure can parse both BAR and BARN and distinguise between the two. When parsing BARR, the B is fine, which leads to A, which leads to R, but now there is no R path in the tree and the word is determined to be invalid. The problem with this data structure is two fold: one, you need to construct it out of the dictionary file at run time, which can take some time, or you need to find a way to store it so it can be read in easily. The second problems is that the language we are going to contruct the spell checker in is SuperBasic, which does not easily support making tree structures. They are easily created with C structures or Pascal records. We could use a hashing algorithm since it is designed for very quick look up, but with a very static list of words, our hashing algorithm may require more dataspace than we really need. We need to come up withh some data structure that is tailored to our needs. One that will provide a fairly quick look up and minimize on the data space needed to store the word list. Here is a suggestion: Store the words in a flat array. The words will be pre-sorted on disk, first by the length of the word and then alphabetically. This means that all of the two letter words will be grouped together and sorted alphabetically, then the three letter words, etc. Word length is one way to distinquish one word from an other. Create a two-dimensional array called start_array(x,y). The X value will be LENGTH and the Y value will be FIRST_CHAR. As the words are read in, the array will be used to keep track of where the first 2 letter starts in the array, where the first three letter word starts, and so on. It will also keep track of where words start by the first letter. When you need to do a lookup of the word BAR, LENGTH is 3, FIRST_CHAR is equal to B, so you would look up start_array(3,'B'). this will return where the first 3 letter word that starts with B is stored in the word array. From there the search can be a simple brute force search that compares all three letter B words to see if they match BAR. To determine where the search should end, you will also need to know where the first three letter C word is at. This can also be looked up in the start_array. Below is a little pseudo code showing how this would work. start = start_array(3,'B') stop = start_array(3,'C') FOR x = start TO stop IF word_array$(x) = BAR THEN EXIT success NEXT x EXIT fail