Wednesday, January 28, 2009

Signal Processing 101

Stumbling across the Old Computers site the other day I was pleasantly reminded of the era of the first wave of home computers which began, roughly, with the Commodore PET and ended, rudely, with the the IBM PC. (Rather astonishingly, this site even has a description of the Compukit UK101, a machine which came into my possession, in kit form, in 1981 or 82.)

A characteristic of these machines was that they used compact cassettes as mass storage. Although floppy disk drives were available (for the Apple II, for instance) their price made them unaffordable to most nerdy teenage boys on a pocket-money budget. Cassette tapes however were not something teenage boys, whether nerdy or not, were short of in those days, for reasons embarrassingly obvious to anyone who's read High Fidelity.

So, one day in 1997 the idea of building a software emulator of the UK101 occurred to me. The first step would be to attempt to recover various programs written and bought for it and naturally stored on aging cassette tapes. In those days, the best source of information on old computers was, of course, the USENET group alt.folklore.computers.My query for information about the Kansas City tape standard led to this thread. (Nowadays of course, one's first port of call is Google, and indeed without Google's USENET archive to prompt my memory, I doubt I would have remembered enough about this effort to write anything about it at all.)

The Kansas City tape standard specifies that 0s are encoded as 4 cycles of a 1200Hz sine-wave while 1s are encoded as 8 cycles of 2400Hz. Bytes are transmitted in little-endian order, framed by a single 0 start-bit and two 1 stop-bits. This gives a data-rate of 300 baud. (A subsequent hardware modification to the UK101 doubled the above frequencies with no loss of fidelity.)

Armed with a walkman, a jack-to-jack cable, a PC with a sound-capturing sound-card and a program such as wavrec, lossless WAV files corresponding to the audio tracks containing the original programs can be prepared relatively easily. Typically these are of CD quality: 16-bit samples at 44.1kHz.

With a theoretically-perfect 300 baud signal, each bit is encoded in 147 samples of the WAV file. (This is 44100/1200*4 or 44100/2400*8.) In other words, we'd expect the first and last samples to be of zero amplitude and to see 7 zero-crossings between them for a 0-bit. Observing a 1-bit we'd expect 15 zero-crossings. The crucial insight therefore (prompted by a line in a response to the original request on alt.folklore.computers) was simply to count the zero-crossings, in software rather than hardware.

Heartened by the fact that no FFTs were required, a small C program was written to test the theory out. In practice, however, it was discovered that cassette tapes can undergo a couple of types of damage which cause deviations from the ideal: `drop-outs' which cause a diminution of signal amplitude and creasing which tends to compress or stretch the tape, increasing or decreasing the frequency of the signal. The first made it harder to determine zero-transitions, while the second caused 1s to be mistaken for 0s and vice-versa.

(A more `complete' solution to the same problem can be found here. Judging by the quality of the output, its error-correction was much better than the primitive approach just described.)

For BASIC programs, which were in the majority, single (or even multiple) bit-errors didn't prove fatal since the program was saved in ASCII format and thus they could usually be spotted by eye. The relatively low bit-rate also helped to keep errors short. (Assembler programs or those saved in a custom binary format were less tractable unfortunately.) Where possible, errors were corrected manually, using Emacs as a binary editor.

A fragment of a successfully-recovered program is shown below. (The line-terminating NUL characters provide a delay, allowing the interpreter to scan the line it's just read --- the language is Microsoft BASIC.)
^M^@^@^@^@^@^@^@^@^@^@
0 PRINTCHR$(26):DIMD(70),B(70):Q=226:GOTO31^M^@^@^@^@^@^@^@^@^@^@
1 G$=STR$(G):L=LEN(G$)^M^@^@^@^@^@^@^@^@^@^@
2 FORI=1TOL:POKE53364-I,ASC(MID$(G$,L-I+1,1)):NEXTI^M^@^@^@^@^@^@^@^@^@^@
3 S=53463:BP=1:DP=1^M^@^@^@^@^@^@^@^@^@^@
4 FC=0^M^@^@^@^@^@^@^@^@^@^@
5 IFPEEK(S-65)=QTHENFC=FC+1^M^@^@^@^@^@^@^@^@^@^@
6 IFPEEK(S-64)=QTHENFC=FC+1^M^@^@^@^@^@^@^@^@^@^@
7 IFPEEK(S-63)=QTHENFC=FC+1^M^@^@^@^@^@^@^@^@^@^@
8 IFPEEK(S-1)=QTHENFC=FC+1^M^@^@^@^@^@^@^@^@^@^@
9 IFPEEK(S+1)=QTHENFC=FC+1^M^@^@^@^@^@^@^@^@^@^@
Ironically the program which produced this is now lost; however I remember a lot of tweaking to get it to produce such output, which probably caused considerable deviation from the idealised description above. Such highly-specialised (to be charitable) software is one-shot, indicative of craftsmanship rather than engineering, rather like bespoke tools used in restoration as opposed to industrial-strength engineering disciplines.

Incidentally this project provides a good example of the important lesson that the best is the enemy of the good. Without the insight that counting zero-crossings of the sample might be just good enough, and certainly simple enough to try, this project might never have been started in the first place, and I'd have saved several hours producing this pointless rambling. (Actually I've just been moved to wonder whether that saying, a favourite of my father's, is a quotation, and have discovered that it's Voltaire. Nice.)