In the prior blog post, I described a four-word program for the Bendix G-15 that would clear all of memory, the software-accessible registers, and the overflow and double-precision sign flip-flops. At the time, no source or object code for the program existed, but I was able to reconstruct a working version from a post by the late Jim Horning in his blog, "The Way it Was: Tales from a life in computing," and get it to run in my retro-g15 emulator.

That was three years ago, and a lot has happened in the G-15 community since then. Most importantly, there is actually a robust G-15 community now. When I started working on my emulator in 2021, there were few people who remembered the system, and fewer still who were interested in it. An earlier post in this blog describes how I knew about the G-15 and why I was interested in it.

Around the time I started working on the emulator, the System Source Computer Museum near Baltimore, Maryland (US) acquired two complete G-15 systems and a ton (literally) of documents, paper tapes, and other miscellanea for the system. Rob Kolstad has been curating the collection of documents and paper tapes in preparation for making the collection available publicly. One of the G-15s is currently on static display at the museum. Then in mid-2023, the museum shipped the older of its G-15s to David Lovett in Texas for restoration.

David is the creator of the Usagi Electric channel on YouTube, where he rotates his attention among various electronic and electro-mechanical restoration projects, releasing new videos approximately weekly. You can subscribe to early access for his videos on Patreon. There is also an active G-15 channel on the Usagi Electric Discord server.

David has done an amazing job with the restoration of this G-15 and has had the system working quite well since the end of 2024. His work has stimulated much of the recent growth of interest in the G-15. His current plan is to clean up some loose ends of the restoration, ship that system back to the museum later this year, and then receive their other G-15 for its restoration.

A Significant Find

All of the foregoing is background for the subject of this post. In February 2024, while plowing through the large collection of G-15 documents at System Source, Rob Kolstad encountered a one-page document titled "Example IV. Routine for Clearing Memory." Knowing of my prior blog post on the subject, he sent me a scanned PDF of the page.

That page does indeed describe the original four-word memory clear program, or at least one version of it. The document is not dated, but from its appearance was clearly prepared on a typewriter and is probably from the late 1950s or early '60s.

For the impatient, this original version of the program can be run in either of two ways: type the following string of hexadecimal digits into the G-15 typewriter, using ENABLE fc7fq to preset the processor state and enable TYPE IN (see the "In Summary" section of the prior post for what those codes mean), or prepare a paper-tape image from the digit string and load it using ENABLE p.

060u35z08051vy8303yvxx040u778s

I found it interesting to compare the four words of G-15 machine language in this document with my reconstruction of it from 2022. It was gratifying to see that I had reconstructed the mechanism of the program entirely correctly. I had the same instructions and in the same order, but it was a shock to see how much more sophisticated the timing of this version was. In the prior post, I had written:

Once those locations were determined, the values for the N fields were obvious (well, except for that weird N=105 in the command that cleared AR—that came later), and the values for the T fields were straightforward to calculate from the command locations.

Well, yeah, determining the values for the N and T fields of the instructions was straightforward, if you didn't care how fast the program ran. The original version runs almost exactly twice as fast as my reconstructed one.

Analyzing the Timing Differences

To understand why the original version is significantly faster, let's analyze the two versions and their respective timing. To understand the following discussion, I strongly recommend that you first read the prior blog post, if you haven't done so already, as it discusses many details about the G-15 and my reconstructed version that will be touched on only lightly here. Most importantly, it discusses the G-15 instruction word formats and the technique of precession, a word you will see multiple times in the following.

Comparing the Code

My reconstructed version, after the initial precession that converts the program from its loaded form to its executing form, is this:

L	WT	Hex	T	N	CH	S	D	Description
0	0	060235z	6	2	0	26	31	INCREMENT AR by 3
1	0	02693vw	2	105	0	29	28	Set AR to 0 (not used here)
2	2	0403uz7	4	3	2	23	23	SWAP location 3 with AR
3	3	040237x*	4	2	0	27*	29*	CLEAR a line to zero
3	3	0000000	0	0	0	0	0	NOOP (copy line 0 to itself)

See the prior post for definitions of the instruction fields. This table has an extra column, though, labeled WT, the word-time at which the instruction executes. For the reconstructed version of the program, it is the same as L, the physical location of the word on line 23 of the drum, but that is not the case in the original version below, and as we will see, that makes all the difference.

The original version, after the initial precession into its executing form, is this:

L	WT	Hex	T	N	CH	S	D	Description
0	0	060u35z	6	10	0	26	31	INCREMENT AR by 3
1	0	82053vw	2	5	0	29	28	Set AR to 0 (not used here)
2	10	0w0zuz7	12	15	2	23	23	SWAP location 3 with AR
3	15	100u37x*	16	10	0	27*	29*	CLEAR a line to zero
3	15	0000000	0	0	0	0	0	NOOP (copy line 0 to itself)

Comparing the two tables row by row shows the differences between the two versions. Note that the only differences are in the T and N fields of the instructions and in the word-times at which two of the instructions are executed. Those word-times are determined by the N fields of their preceding instructions in execution sequence, but note that they are congruent modulo-4 with the physical drum locations.

Also note, although it is not shown here, that all commands in my reconstructed version run in Immediate-mode. In the original version, the command that clears AR to zero (which is executed only once during initialization) is Deferred-mode and the other three are Immediate-mode. That one command being in Deferred-mode has no effect on the overall timing or behavior of the program. It could as well have been coded as an Immediate-mode command. For both versions that command is located at L=1 after the precession, but it was executed at word-time 0 before the precession.
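The Hex column in both tables can be cross-checked against the T, N, CH, S, and D columns with a few lines of Python. This is only a sketch: the bit positions below are an assumption inferred from the table values themselves (they reproduce every row), the function name `decode` is made up for illustration, and the prior post remains the reference for the actual word format.

```python
# G-15 notation writes the hex digits 10..15 as u..z.
G15_HEX = "0123456789uvwxyz"


def decode(hex_word):
    """Split a 7-digit G-15 hex command into its fields.

    Assumed layout of the 28 bits, high to low (inferred from the tables):
    mode(1) T(7) unused-here(1) N(7) CH(2) S(5) D(5).
    """
    value = 0
    for ch in hex_word:
        value = value * 16 + G15_HEX.index(ch)
    return {
        "deferred": bool((value >> 27) & 0x1),  # 1 = Deferred, 0 = Immediate
        "T": (value >> 20) & 0x7F,
        "N": (value >> 12) & 0x7F,
        "CH": (value >> 10) & 0x3,
        "S": (value >> 5) & 0x1F,
        "D": value & 0x1F,
    }


if __name__ == "__main__":
    # Hex values copied from the two tables (footnote asterisks dropped).
    for word in ("060235z", "02693vw", "0403uz7", "040237x",
                 "060u35z", "82053vw", "0w0zuz7", "100u37x"):
        print(word, decode(word))
```

Under that assumed layout, only the original version's `82053vw` decodes with the mode bit set, which agrees with the one Deferred-mode command described above.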

The critical thing to appreciate about G-15 instruction timing is the difference between the physical drum location at which an instruction is stored (L) and the word-time at which that instruction is executed (WT). These programs run from line 23 on the drum, which is a 4-word line. You would think that an instruction addressed by the preceding instruction's N field of, say, 10, would be executed the next time that a physical location of 10 mod 4 (i.e., 2) passed under line 23's read head, but that's not how the G-15 works.

All timing for the drum is based on full 108-word drum cycles. If you want to execute an instruction on a four-word line at word-time 10, you have to wait for word #10 of 108 on the drum's circumference to rotate around to the read head, even though physical word #2 on that four-word line may pass the read head multiple times before that. The actual delay is determined by the difference between the N field of the prior instruction and the word-time at which the prior instruction terminated its transfer state.

For example, if the prior instruction ended at word-time 4, you will need to wait for five word-times (5, 6, 7, 8, and 9, bypassing the desired physical word at word-time 6) until word-time 10 on the drum arrives at the read head. If, however, the prior instruction ended at word-time 36, you will need to wait for 81 word-times until word-time 10 arrives at the read head, bypassing the physical word you are trying to read at word-times 38, 42, 46, ... 102, 106, 2, and 6, for a total of 20 missed opportunities. Hence, how you code word-times in the T and N fields of instructions can have a big impact on timing.
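The two examples above come down to modulo-108 arithmetic, which can be sketched in Python. The function names are hypothetical and exist only for this illustration; the sketch assumes a 108-word drum cycle and a four-word line, as described in the text.

```python
DRUM_WORDS = 108   # word-times in one full drum cycle
LINE_WORDS = 4     # words in a short line such as line 23


def wait_for_word_time(ended_at, n):
    """Word-times spent waiting after the prior instruction ended at
    word-time `ended_at` until word-time `n` reaches the read head."""
    return (n - ended_at - 1) % DRUM_WORDS


def missed_passes(ended_at, n):
    """How many times the desired physical word of the four-word line
    passes the read head during that wait without being used."""
    missed = 0
    for step in range(1, wait_for_word_time(ended_at, n) + 1):
        word_time = (ended_at + step) % DRUM_WORDS
        if word_time % LINE_WORDS == n % LINE_WORDS:
            missed += 1
    return missed


if __name__ == "__main__":
    for ended_at in (4, 36):
        print(ended_at,
              wait_for_word_time(ended_at, 10),
              missed_passes(ended_at, 10))
```

Because 108 is a multiple of 4, the physical word on the short line keeps the same position modulo 4 across the drum wraparound, which is why the second example counts word-times 2 and 6 as missed passes as well.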

I knew this when I was working on the reconstruction of the program three years ago, but I think it's fair to say that I didn't appreciate it very well. I appreciate it much better now, but I still don't think I could have come up with a solution as optimal as that of the original version of the program.

Comparing the Instruction Timing

To see how optimal the original version is, let's examine the detailed timing of both versions. The following table shows the timing for my reconstructed version:

L	WT	T	N	CH	S	D	Desc	RC	TR	WRC	TT
0	0	6	2	0	26	31	INCR	1	5	104	110
1	0	2	105	0	29	28	AR=0	1	1	102	104
2	2	4	3	2	23	23	SWAP	1	1	107	109
3	3	4	2	0	27*	29*	CLEAR	1	108	106	215
3	3	0	0	0	0	0	NOOP	1	104	0	105

The next table shows the timing for the original version of the program:

L	WT	T	N	CH	S	D	Desc	RC	WTR	TR	WRC	TT
0	0	6	10	0	26	31	INCR	1	0	5	4	10
1	0	2	5	0	29	28	AR=0	1	1	1	2	5
2	10	12	15	2	23	23	SWAP	1	0	1	3	5
3	15	16	10	0	27*	29*	CLEAR	1	0	108	102	211
3	15	0	0	0	0	0	NOOP	1	0	92	0	93

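The left-most eight columns in both tables are the same as the ones with corresponding headings in the two tables of instructions presented earlier. The remaining columns show the timing of the states that each instruction goes through. The table for the reconstructed version has no WTR column because all of its commands are Immediate-mode, for which WTR is always zero: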

RC (Read Command):
The number of word-times necessary to read a command and load it into the processor's internal registers. This is always one.
WTR (Wait to Transfer):
The number of word-times necessary to wait after a command is read until Transfer state (execution of the command) can begin. This is always zero for Immediate-mode commands and at least one for Deferred-mode commands.
TR (Transfer):
The number of word-times necessary to execute the instruction. This is always one for single-precision Deferred-mode commands and T–WT–1 for Immediate-mode commands. If the result of that calculation is less than one, add 108 to compensate for address wraparound on the drum.
WRC (Wait to Read Command):
The number of word-times necessary to wait after termination of Transfer state until the next instruction to be executed (at word-time N) arrives at the read head.
TT (Total Time):
The total number of word-times for the instruction (RC+WTR+TR+WRC).
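For Immediate-mode commands, the definitions above can be turned into a short Python sketch. The TR rule is the one stated in the list. The WRC rule (wait from the first word-time after Transfer ends until word-time N, modulo 108) is an assumption made for this illustration, although it does reproduce the Immediate-mode loop rows of both timing tables. The function name is hypothetical, and Deferred-mode commands are deliberately left out.

```python
DRUM_WORDS = 108  # word-times per drum cycle


def immediate_timing(wt, t, n):
    """Timing states for an Immediate-mode command read at word-time
    `wt` with fields T=`t` and N=`n`."""
    rc = 1                      # reading the command always takes one
    wtr = 0                     # Immediate-mode commands never wait
    tr = t - wt - 1             # transfer runs until word-time T
    if tr < 1:
        tr += DRUM_WORDS        # compensate for drum wraparound
    next_free = wt + rc + wtr + tr          # first word-time after Transfer
    wrc = (n - next_free) % DRUM_WORDS      # assumed rule, see lead-in
    return {"RC": rc, "WTR": wtr, "TR": tr, "WRC": wrc,
            "TT": rc + wtr + tr + wrc}


if __name__ == "__main__":
    loop_rows = {
        "reconstructed": [("INCR", 0, 6, 2), ("SWAP", 2, 4, 3),
                          ("CLEAR", 3, 4, 2), ("NOOP", 3, 0, 0)],
        "original": [("INCR", 0, 6, 10), ("SWAP", 10, 12, 15),
                     ("CLEAR", 15, 16, 10), ("NOOP", 15, 0, 0)],
    }
    for version, rows in loop_rows.items():
        for desc, wt, t, n in rows:
            print(version, desc, immediate_timing(wt, t, n))
```

The CLEAR rows show why the wraparound rule matters: T is one more than WT, so T-WT-1 is zero and the transfer runs for a full 108 word-times.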

Note that all of the Total Times for the original version are lower than the corresponding ones for my reconstructed version, often by quite a lot.

Calculating Total Run Times

Understanding the timing of the individual instructions is necessary, but to determine the total run time for each version, we next need to determine how long it takes to clear each line or destination. The two instructions that clear AR to zero and precess the words in line 23 into the form in which they clear the drum are executed only once at the beginning of the program, so they can be ignored for the moment. The remaining instructions are executed in a five-instruction loop (NOOP, INCR, SWAP, CLEAR, and SWAP again) with one iteration per drum line or destination. Referring to the Desc and TT columns of the two tables immediately above, the timing of the two versions for each line or destination is this:

Desc	Reconstructed Word-Times	Original Word-Times
NOOP	105	93
INCR	110	10
SWAP	109	5
CLEAR	215	211
SWAP	109	5
Destination Total	648	324

There are a total of 32 destinations (0-31), but the program clears only 30 of them. As discussed in the prior blog post, the destinations are cleared in the sequence 0, 3, 6, ... 20, 23, but once line 23 is cleared to zero, the program consists of nothing but NOOP instructions, so the system continues to run, copying line 0 to line 0 until it is halted manually. Thus, the program never gets to the remaining destinations in the sequence, 26 and 29. As the prior post explains, that doesn't matter, because destination 26 was already cleared as a side effect of clearing line 25 earlier. Destination 29 is not a separate drum line, but a special destination that adds the source value to AR, so skipping it simply avoids adding zero to zero, and thus has no effect.
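The count of 30 can be checked with a small Python sketch. It assumes that stepping the destination by 3 wraps around modulo 32, which is consistent with the sequence given above but is stated here as an assumption; the constant names are made up for the illustration.

```python
DESTINATIONS = 32   # destination numbers 0-31
STEP = 3            # the INCR command advances the destination by 3
PROGRAM_LINE = 23   # the line the program itself runs from

# Assumed wraparound: the destination number advances by 3 modulo 32.
sequence = [(STEP * i) % DESTINATIONS for i in range(DESTINATIONS)]

# Everything up to and including line 23 is cleared; the rest is never reached.
cleared = sequence[: sequence.index(PROGRAM_LINE) + 1]
skipped = sequence[len(cleared):]

if __name__ == "__main__":
    print("cleared:", len(cleared), cleared)
    print("never reached:", skipped)
```

Because 3 and 32 have no common factor, the sequence visits every destination exactly once, and line 23 happens to be the 30th entry.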

To determine the total run time for the program, we therefore need to multiply the per-destination times by 30, which yields the values in the Drum Total row in the table below. In addition, we need to account for the two initial instructions:

Clear AR to zero requires 105 word-times for the reconstructed version and five word-times for the original version.
Precessing the program into the form that actually clears memory requires 114 word-times for the reconstructed version and 10 word-times for the original version.

Therefore, the total time that each version should run is this:

Desc	Reconstructed Word-Times	Original Word-Times
Destination Total	648	324
×30 = Drum Total	19,440	9,720
+ Initialization	219	15
Grand Total	19,659	9,735
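The totals in the table above are simple sums and products, and can be reproduced with a short Python sketch that uses only the figures already given in this post. The dictionary and function names are hypothetical.

```python
LOOP = ["NOOP", "INCR", "SWAP", "CLEAR", "SWAP"]   # one iteration per destination
DESTINATIONS_CLEARED = 30

# Total Time (TT) per instruction, in word-times, from the timing tables.
TT = {
    "reconstructed": {"NOOP": 105, "INCR": 110, "SWAP": 109, "CLEAR": 215},
    "original": {"NOOP": 93, "INCR": 10, "SWAP": 5, "CLEAR": 211},
}

# One-time initialization: clear AR, then precess the program.
INIT = {"reconstructed": 105 + 114, "original": 5 + 10}


def per_destination(version):
    return sum(TT[version][name] for name in LOOP)


def grand_total(version):
    return per_destination(version) * DESTINATIONS_CLEARED + INIT[version]


if __name__ == "__main__":
    for version in TT:
        print(version, per_destination(version), grand_total(version))
    print("speed ratio:", grand_total("reconstructed") / grand_total("original"))
```

The per-destination loop times differ by exactly a factor of two (648 versus 324), and the much larger initialization cost of the reconstructed version pushes the overall ratio slightly above two.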

Thus, the original version of the program clears the drum slightly more than twice as fast as my reconstructed version. As pleased with myself as I was after figuring out how to reconstruct the program and get it to work, this result is humbling.

Measuring Actual Run Times

How does this analysis compare to actual run-times for both versions? To determine that, I had to have a way to stop the program once it finished clearing everything rather than have it continuously execute NOOPs. Therefore, I modified a local copy of the retro-g15 emulator (v1.08) to detect when a command word of all zeroes (the NOOP instruction) was executed at word-time 0, and to halt the system when that happened.

I also modified the Diagnostic Panel so that I could reset its word-time counter and run-time clock, and modified the Diagnostic Trace feature to write only one line per instruction, prefixed by the current word-time counter so that the actual instruction sequences could be examined and their timing accurately determined. Lightly annotated copies of the trace output are available from the project's GitHub software repository for the reconstructed and original versions.

I then ran both versions of the program several times in Firefox version 140.0.4 (64-bit) under Microsoft Windows 10. After loading the program from a paper-tape image file, I reset the word-time counter and run-time clock on the Diagnostic Panel, set the COMPUTE switch to GO, and waited until the system halted. Then I recorded the timings as shown on the Diagnostic Panel, yielding these overall results:

For the reconstructed version:

Word-times ranged between 19,775 and 19,871 with a mean of 19,832.
Run times ranged between 5.32 and 5.36 seconds.

For the original version:

Word-times ranged between 9,937 and 10,041 with a mean of 9,990.
Run times ranged between 2.67 and 2.70 seconds.

The mean word-time counts are higher than the calculated counts above, by about 173 for the reconstructed version and about 255 for the original version, but it turns out there are two reasons for that, as discussed below for the original version.

First, the program finishes by clearing line 23 (where the program is running) to zero, but the N field of the instruction that does that clear is 10, so the processor dutifully proceeds to execute the command at word-time 10.

Normally that would be the second SWAP command, which runs for five word-times and then branches to the NOOP at word-time 0 to start the next cycle. But now the word at word-time 10 has been cleared to zero, i.e., it's a NOOP with a T field of zero and an N field of zero. Thus that NOOP requires 98 word-times instead of the five required by the SWAP, for a net increase of 93 word-times.
That NOOP completes the 30th iteration of the clear loop. The temporary modification to halt the program that I put into the emulator detects a NOOP being executed from word-time 0, but this NOOP was executed from word-time 10. Thus, the program continues running, branching to the location in the N field of the NOOP, which is 0 and addresses another NOOP. The emulator detects that NOOP as the halt condition, but the halt does not take place until the end of the instruction, which requires an additional 108 word-times.
Thus, the final iteration of the original version requires 93+108=201 additional word-times.
A similar analysis for the reconstructed program shows that the final iteration requires an additional 105 word-times.

Second, when the program is started, the drum is at a random location, but loading the program from paper tape implicitly sets the starting word-time to 0. Therefore, there will be an initial WRC delay ranging from 0 to 107 word-times before the first instruction of the program can be read and executed. Thus, we should expect that the run times as measured in the emulator should fall into the following ranges:

For the reconstructed program: 19,659+105+0=19,764 to 19,659+105+107=19,871 word-times.
For the original program: 9,735+201+0=9,936 to 9,735+201+107=10,043 word-times.

As shown above, the measured word-times do indeed fall within those calculated ranges.
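That comparison can be written out as a small Python check, using only the calculated totals, the extra word-times for the final iteration, and the measured ranges reported above. The names are hypothetical and the sketch does nothing beyond the arithmetic already described.

```python
GRAND_TOTAL = {"reconstructed": 19_659, "original": 9_735}
FINAL_ITERATION_EXTRA = {"reconstructed": 105, "original": 201}
MAX_START_WAIT = 107   # drum can be up to 107 word-times away from word-time 0

# Measured (minimum, maximum) word-times from the emulator runs.
MEASURED = {"reconstructed": (19_775, 19_871), "original": (9_937, 10_041)}


def expected_range(version):
    low = GRAND_TOTAL[version] + FINAL_ITERATION_EXTRA[version]
    return low, low + MAX_START_WAIT


def measured_within_expected(version):
    low, high = expected_range(version)
    measured_low, measured_high = MEASURED[version]
    return low <= measured_low and measured_high <= high


if __name__ == "__main__":
    for version in GRAND_TOTAL:
        print(version, expected_range(version), MEASURED[version],
              measured_within_expected(version))
```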

In Conclusion

I am very pleased that Rob Kolstad found that one-page description of the original version of the program, and that it has given me the opportunity to analyze its performance and how it differs from my reconstructed version. I was particularly impressed by the way the initial precession that rotates the entire program by one word in line 23 was coded. Although it uses the same technique as my reconstructed version, its T and N values yield a much more elegant solution to the problem.

The original version of the program is a tour de force in clever use of G-15 instructions and optimization of drum rotational latency. Frankly, I think it outshines anything that Mel was reported to have done.

The project's G15-software repository contains the materials for both versions, including loadable paper-tape images of the code, static disassemblies of the code, the aforementioned trace files, a text-only transcription of the page describing the original version, a script showing the PPR commands used to assemble the original version, and a spreadsheet detailing the instruction timing calculations.

I hope that the discussion here will prove to be useful to those interested in learning about the G-15 and the techniques that were used to squeeze as much performance as possible from a relatively slow, drum-based computer system. It's a shame that so much thought and effort had to be put into managing drum latency rather than being invested in addressing larger and more complex application problems. As Harry Huskey mentioned in his 1982 lecture on the G-15 (starting at 52:40 in the video), once magnetic core memory became viable, that spelled the end for drum-based memory and the programming techniques that it required.

From that point onward, programmers have been able to concentrate on thinking in straight lines rather than circles.

Retro Emulation

Monday, July 14, 2025

The REAL G-15 Four-word Memory Clear Program