osimplay

PAGE DATE

July 2001/Jan 2002

keywords

osimplay COWPP parent/self/child pre-named locals shasm compembler

overview

osimplay is shell "functions" implementing a "compembler" somewhere between a compiler and an assembler for a generic one-stack machine with re-thought opcode names varying in terseness from "=" to "loadmachinestatusword".

Possible names for osimplay's level of abstraction are "mid-level-language" (Randy Hyde), meta-assembler, or compembler. In terms of text-per-binary-opcode, it winds up outputting about as much object code as C. Osimplay is based on my asmacs assembly macros, and the version of osimplay documented here is on top of shasm, my infamous 386 assembler in GNU Bash. Something like osimplay should be easy to implement on top of dedicated assemblers, using something like m4, Forth or Lisp, in which case its performance would be acceptable for large batch assembly jobs, which it currently isn't.

osimplay tries to bring some of the simplicity and elegance of Forth to native one-stack code without the performance/complexity disadvantages (on register machines) of a virtual machine that's fundamentally different from a register machine. As such, it emphasizes a virtual machine as its conceptual continuity*, rather than a "language" or syntax. There are some "language constructs" though, which is one way osimplay isn't just an assembler. Its language-ness is very bottom-up though. It's still a set of features within an assembler, and its syntax is inherited from whatever it's implemented in, the unix shell in the demo implementation. Like Forth, osimplay is trying to be a machine control system, not a "language". I'm starting to call osimplay commands "words" though, a Forth habit. osimplay "familial variables" also admit of Forth-like stack diagrams.

BCPL intermediate code does quite well on 3 registers. At this point I'm thinking the 386 as a whole, the 8 basic registers anyway, is constricted enough to be a usable "virtual machine" for most other CPUs. Other aspects of the 386 are more fumblesome, and I'm not familiar with other architectures, so portability is a back-burner thing. Well, side burner. osimplay code is strikingly generic-looking for assembly, and familial variables may help with portability.

Familial variables are locals that come in three flavors: parent, self and child. These reflect the parent (calling) routine's stack frame, the current stack frame, and the stack frame of the most recently called child routine. An interesting byproduct of a 3-frame interface is that just coding as you would normally will in most cases implement copy-on-write parameter passing, which has the delightful acronym of COWPP, or "cow pee pee". COWPP is quite efficient, and is more or less an accident of the parent/self/child locals. This is the sort of synergy one also sees in Forth. BCPL does something similar, but in osimplay it's just what you'd do with familial variables anyway, not compiler elegance.

Most shasm and osimplay commands have their own interactive help via [command] h, so I'll try to concentrate on non-specifics for a bit. Some of osimplay's features are pretty high-level, but some deliberate omissions remain. I have managed to avoid the temptation to do flow-control abstractions like IF/ELSE/WHILE/FOR... (Actually I'm not even tempted.) I suspect those are treacherous vis-a-vis remaining an assembler. osimplay's when conditional branch compembler is pretty nifty though, and xrays are a flow-control abstraction distantly related to SWITCH/CASE.

osimplay doesn't have data types. Like BCPL and the ANS Standard for Forth, there is a cell concept, the size of a native machine pointer, but that's just a size, not a type in the C sense. ($cell is, BTW, how osimplay keeps track of what x86 mode it's in.) Enumerations are numberings, the clump facility is a vestige of C structs, and there are strands, which are a very general type of array. Strands can also serve as strings, masked-index rings, and other goodies. Host functionality is available via a Linux word analogous to "sys" in BCPL and similar, with a big wad of defined Linux syscall names. Also like Forth, the osimplay overlay is not a lot of code, not counting the underlying assembler. Linux is a bit more flexible than C parameter-passing.

features

Once you have a working base, you enter the "The Joy of Coding" phase. This can be a bad thing. Some of the side-effects of that are listed below.

# OSIMPLAY COMPEMBLER                                     ]]]]]   ()
ELF             () { # build simple but complete ELF header
osimplay        () { # main(). e.g. osimplay <your source file>
beam            () { # Fill a range of an xray with a jump address
cell            () { # name/allot a cell-size storage location
copyrange       () { # plural, range-to-range copy, handles overlaps
clump           () { # C struct() kinda, data associations namer
entrance        () { # name/begin a reentrant procedure
leave           () { # return from the current reentrant procedure
fill            () { # plural, copies A across range @ DI       -x86 STOSD
flag            () { # assert the zero/sign flags of a register's value
enter           () { # how you call an osimplay reentrant procedure
heap            () { # start uninitialized data sub-section
maskbyte        () { # mask arg down to a byte
maskdual        () { # mask arg down to its lowest-significance 16 bits
match           () { # plural, range-range compare, zero flag=match
max             () { # BROKE  inline, max bla zay  to zay
min             () { # BROKE    inline, min bla zay   to zay
numbering       () { # C enum, but not strictly constants and no commas
quad            () { # name/allot a 4-byte storage location
quadtohex       () { # quad value to ASCII hexadecimal string
range           () { # name/allot some count-cell prefixed memory
scan            () { # plural, compare A to memory range @ DI until hit/miss
strand          () { # name/allot a general array and 8 cells of metadata
sum             () { # plural, additive checksum
text            () { # name/allot some text
xjump           () { # indexed jump into an xray execution array
xray            () { # name an execution array for beam, yarx and xjump
xsum            () { # plural, XOR-ing checksum
yarx            () { # finish compembling an xray and its beams
zero            () { # simple convenience to set (a) register(s) to 0
                                                                # ()
                                                                # ()
# HOST OS DEPENDENT     ()
Linux           () { # syscalls, e.g.  Linux $read, many available
print           () { # write all of a given range's net data to stdout
regspew         () { # raw binary register dump to stderr
newline         () { # write a (unix) newline to stdout

ELF

ELF is used at the beginning of an assembly to make an ELF executable command. It makes one loader segment that is r/w/x, and supports a .bss-like subsection via the heap directive. .bss/heap lets uninitialized data avoid taking up space in the stored program image.

ELF uses 0 for the program load address. This makes things much simpler than using Linux's typical 0x08048000, which appears to be purely a traditional holdover from SysV. argv and env still seem to work too.

In other words, osimplay doesn't have a proper linker. That's getting to be a problem already with something like the 6k bs command in the included demos, because osimplay is so slow, but I'm not sure of the appropriate solution yet.

plurals

The x86 direction flag and the single-instruction looping constructs using the opcode prefix called REP in Intel-ese have been wrapped into several osimplay "plurals". It appears that just a few such composites or macros hide the REP completely and still do everything you can do with REP. If so this is a nice little portability coup, and probably has some bizarre relationship with the similar Forth words.

clumps

C structs create name trees, i.e. hierarchical names, for data. So do clumps. That's all clumps do though. Structs have other niceties. A C name like process.buffer.count.thingy is process_buffer_count_thingy in osimplay. Or maybe that's really process->buffer.count.thingy in C. I dunno, and I don't care :o)))) C unions, of course, make no sense at all in assembly.
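As a rough illustration of the name-tree idea, here is a minimal Bash sketch. demo_clump is a hypothetical helper written for this page, not the real clump word; its argument order, the 4-byte cell size and the _size name are all assumptions (see clump h for the actual usage). It only shows how flattened, underscore-joined names can live as ordinary shell variables holding offsets.

```shell
#!/bin/bash
# Hypothetical stand-in for a clump-like namer; NOT osimplay's clump.
# usage: demo_clump <prefix> <field>...
# Each field becomes the shell variable <prefix>_<field>, holding a byte offset.
cell=4                       # assumed cell size in bytes (32-bit mode)

demo_clump () {
    local prefix=$1 offset=0 field
    shift
    for field in "$@"; do
        # printf -v assigns to a variable whose name is built at run time
        printf -v "${prefix}_${field}" '%d' "$offset"
        offset=$(( offset + cell ))
    done
    # the total size is recorded too, under an assumed name
    printf -v "${prefix}_size" '%d' "$offset"
}

demo_clump process_buffer count head tail
echo "$process_buffer_count $process_buffer_head $process_buffer_tail $process_buffer_size"
```

Nothing is assembled here; the only effect is on shell state, so set | grep process_buffer shows everything the sketch defined.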

entrance/enter/leave

The facilities for defining and handling osimplay reentrant procedures are somewhat distinctive. The assembler state is informed that a new routine is being assembled with the entrance word, as follows:

	entrance newname

 
The name is affiliated with the current assembly address. newname remains the current procedure being defined until the next entrance. A routine can be ended with a leave, or not. When a leave does occur in the source, a stack-frame-rewinding return is assembled. There may be 0, 1 or more leaves, but most routines will have one. A routine assembled this way is properly called with enter.

	enter newname

assembles a caller-hikes preamble to the actual call, and thus we have stack frame maintenance requiring one extra instruction over a machine call instruction, as in BCPL. We haven't passed newname any parameters yet. We gave it a 16-cell stack frame, but there's currently no valid data in it, just leftover random bits.

This is something I suspect may be fairly unique about osimplay. It is caller-hikes, callee-passes. This creates copy-on-write parameter passing, and is the result of osimplay's means of dealing with routine stack frames as three levels of local variable. osimplay provides macros to refer to cells in the parent, current, and most recently exited child routines as pre-named local variables. The current routine's locals are as, bs, cs...; the s implies "self". The parent routine's locals can be referenced from the current routine as pa, pb, pc..., and similarly those of the last child routine the current word entered are ca, cb, cc, cd...

The loss of naming flexibility isn't so bad for locals, which often get terse names anyway, and the flexibility created is notable. Accessing the locals of a child routine is a form of multiple return value. Accessing the locals of a parent routine can involve moving their value to the A register, i.e. the accumulator, or not. If a parent value doesn't need to actually be moved into the current routine's frame to be of use, such as a condition test of it, there is a performance benefit. Values to be passed from parent to child through current must be moved though. Also, any naming annoyances in osimplay should be easily offset by the fact that it's just a script, and can be seasoned to taste in seconds.
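The three-frame idea can be modelled in a few lines of Bash. This is a conceptual sketch only, not osimplay code: a Bash array stands in for the machine stack, frames grow toward higher indices (the real x86 stack grows downward), slot 0 of each 16-cell frame is assumed to hold the return address, and the helper names slot, self_ix, parent_ix, child_ix, hike and drop are all invented here.

```shell
#!/bin/bash
# Conceptual model of parent/self/child locals; all names are hypothetical.
frame=16                  # cells per stack frame, as in the text
stack=()
fp=0                      # array index of the current ("self") frame

# Map a local's letter (a..o) to a slot 1..15; slot 0 is assumed reserved.
slot ()      { local l=abcdefghijklmno i; i=${l%%"$1"*}; echo $(( ${#i} + 1 )); }
self_ix ()   { echo $(( fp + $(slot "$1") )); }
parent_ix () { echo $(( fp - frame + $(slot "$1") )); }
child_ix ()  { echo $(( fp + frame + $(slot "$1") )); }

hike () { fp=$(( fp + frame )); }   # caller-hikes: one adjustment before the call
drop () { fp=$(( fp - frame )); }   # what a frame-rewinding return amounts to

double_if_positive () {             # plays the part of a child routine
    # Test the parent's local a in place; no copy is needed for a test.
    if (( ${stack[$(parent_ix a)]} > 0 )); then
        # Copy only now that a changed value is wanted: copy-on-write.
        stack[$(self_ix a)]=$(( ${stack[$(parent_ix a)]} * 2 ))
        stack[$(self_ix b)]=1       # second result: "did something"
    else
        stack[$(self_ix a)]=0
        stack[$(self_ix b)]=0
    fi
}

stack[$(self_ix a)]=21              # the caller's "as"
hike; double_if_positive; drop      # roughly: enter, routine body, leave
# Back in the caller, the child's frame is still readable as ca and cb.
echo "ca=${stack[$(child_ix a)]} cb=${stack[$(child_ix b)]}"
```

The caller never pushes an argument; the child reads pa directly, and the caller picks up two results from the child's leftover frame, which is the multiple-return-value effect described above.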

All this talk about the internals of the language at this early point is a bit abnormal, but you wind up having to know all this nonsense to use something like C well anyway. I feel that attempts to abstract these things out of systems programming languages are not their greatest success. Forth (on conventional hardware) gives you a simple virtual machine to deal with as directly as possible. osimplay tries to simplify by giving you a subset of your actual machine as directly as possible, hopefully with some portability side-benefits, while leaving all the specifics of your actual machine available right there in the rest of the assembler osimplay is implemented in/on. This is the case with just assuming that all stack frames are 16 cells. You can hand-roll whatever, but 15 locals is a lot, and all 15 slots cost a total of one instruction. Having said all that, you only need entrance when you need reentrance.

use

osimplay features are at all times merely optional addenda to the regular assembler, shasm in this case. Thus you have all the usual data declaration facilities and directives of assembly before osimplay enters the picture. See the shasm help also. Like shasm, every osimplay command is a shell command. If you use osimplay/shasm sourced into your shell state, i.e. interactively, you can for example do


	[your_shell_prompt]binary h


Convert the binary representation of a number to decimal. Accepts
its single argument in multiple segments, like for bytes in a quad.

                        binary 0101010011010011

Bash math is 32 bit.
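The conversion that help text describes can be approximated in plain Bash with its base#digits arithmetic notation. demo_binary below is a hypothetical stand-in written for illustration, not the shasm/osimplay source.

```shell
#!/bin/bash
# Hypothetical re-creation of a binary-to-decimal helper; not the real "binary".
# Segments are joined first, so "01010100 11010011" and
# "0101010011010011" mean the same thing.
demo_binary () {
    local bits
    bits=$(printf '%s' "$@")      # concatenate all arguments, no separators
    echo $(( 2#$bits ))           # Bash reads base#digits inside $(( ))
}

demo_binary 0101010011010011      # prints 21715
demo_binary 01010100 11010011     # same value, given as two byte segments
```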

 
You can also use Bash's "type" facility, which is very nice. As in Forth, interactivity pays. Try things. Define something and see what happens. Assuming you have grep (or I got osimplay "scan" working), you can

	grep "()" osimplay 

 
for a list of all osimplay routines, as with shasm. A clump doesn't assemble anything, it just alters assembler state. That also happens to be your shell's variables, so you can

	set | grep clumpname 

to see what got defined and then

	 echo $whatever 

 
to see what it got defined to. There's also reading the script itself, the listing file and looking at the actual machine code in output with a binary editor. You can assemble instructions one at a time interactively if you set pass=2, and then read listing. You can also set $listing to stdout via /proc in the osimplay sourcefile.

conditionals

Randy Hyde's High Level Assembler does flow-control structures like IF-THEN-ELSE. osimplay doesn't. Those things are a lot of work, and their meanings are not standardized. I suspect they also tend to compromise performance. osimplay remains just an assembler in this regard, but otherwise offers the xray execution array facility. For normal conditional branches based on the FLAGS register, the various forms have been exploded in osimplay into a mini-parser. The x86 has flags for zero, sign and so on, and those names can be combined with either IF or when, and not, into allowable flags and combinations. This happens to be trivial to implement on x86 because it reflects how the opcodes are constructed. For example,

	when not zero 		some_branch_target_label_name

is a typical osimplay conditional branch. As usual, you have to be sure the flags are current yourself, using their state from math and logical ops, or osimplay words like flag.
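To show why this is cheap on x86, here is a small Bash sketch of such a mini-parser. It is not osimplay's when; demo_when is a hypothetical name, and the flag names other than zero and sign are assumed. It relies on the fact that the short conditional jump opcodes are 0x70 plus a condition code whose low bit inverts the test, and it prints only the opcode byte, not the branch displacement.

```shell
#!/bin/bash
# Hypothetical mini-parser in the spirit of "when"; not osimplay's code.
# x86 short conditional jumps: opcode = 0x70 + condition code, and the
# low bit of the condition code inverts the test.
demo_when () {
    local invert=0 cc
    if [ "$1" = not ]; then invert=1; shift; fi
    case $1 in
        overflow) cc=0 ;;
        carry)    cc=2 ;;
        zero)     cc=4 ;;
        sign)     cc=8 ;;
        parity)   cc=10 ;;
        *) echo "demo_when: unknown flag '$1'" >&2; return 1 ;;
    esac
    printf '%02X\n' $(( 0x70 + cc + invert ))
}

demo_when zero         # 74, Intel JZ
demo_when not zero     # 75, Intel JNZ
demo_when not carry    # 73, Intel JNC
```

Handling not is a single added bit, which is the sense in which the word order mirrors the opcode construction.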

The x86 also has a set of words conditional in part on the C (ECX) register. These are separate instructions from the above, which in Intelese are the Jxx instructions. The Intel names for them are pretty bad. LOOP, for example, does forward branches just fine. These have also been folded into the when switches like


		when C-1 branch_target 

There are also now some libc-like macros for the basic things you'd want to use LOOP and friends for. LOOP doesn't hide completely like REP though.

strands

strands embody several ideas I've wanted to pursue for array-like data structures that bear some discussion independent of their osimplay-ness. Strands implement nested metadata. If you have a simple data structure like a count-cell-prefixed "string", and you add metadata in an organized way, you gain a certain generality and reusability. This is the reason for the strand header format. Code that wants to deal with strings needs to know about the first metadata cell before the string. If code knows about the next preceding metadata cell, the "ply" cell, it can treat the strand in question like an array of ply N. Or a string. If the metadata cells are in a consistent format there is only one data structure definer needed for a wide variety of uses, with a slight space-consumption hit. The fields already decided upon in a strand prefix are, proceeding toward lower addresses from the nominal address of the strand instance: size in bytes, ply, mask. "mask" is for ring buffers. Other possible cell reservations would be indexes, pointers to affiliated strands, and so on, which would allow stacks, deques, list elements, and so on with maximal code reuse.
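The header layout above can be modelled with a Bash array standing in for memory. This is a sketch under stated assumptions, not the strand word itself: memory is indexed by cell rather than by byte, the ply value is stored without being interpreted, the mask is taken to be element count minus one, and demo_strand, strand_count and ring_put are invented names.

```shell
#!/bin/bash
# Conceptual model of a strand header; all helper names are hypothetical.
# Layout follows the text: size, then ply, then mask, toward lower addresses.
cell=4                    # assumed cell size in bytes
mem=()

demo_strand () {          # usage: demo_strand <address> <cells> <ply>
    local at=$1 n=$2
    mem[at-1]=$(( n * cell ))   # size in bytes
    mem[at-2]=$3                # ply, stored as given
    mem[at-3]=$(( n - 1 ))      # mask; only meaningful for power-of-two counts
}

# String-style code needs just the size cell.
strand_count () { echo $(( mem[$1-1] / cell )); }

ring_put () {             # usage: ring_put <address> <index> <value>
    local at=$1 i
    i=$(( $2 & mem[at-3] ))     # masked index wraps around the ring
    mem[at+i]=$3
}

demo_strand 8 4 1
ring_put 8 5 77           # index 5 wraps to 5 & 3 = 1
echo "${mem[9]} $(strand_count 8)"
```

Code that only knows the size cell and code that also knows the mask cell both work on the same instance, which is the reuse argument made above.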

performance

The performance of shasm/osimplay is ludicrously bad. shasm is about two orders of magnitude slower than gas, or about one order of magnitude slower than gcc. Checkmate leaves no weaknesses, however. The code osimplay produces is as good as you. A gas/m4/osimplay would also be another matter. The osimplay enter mechanism seems quite good. leave uses the x86's RET imm16 instruction, which does a stack frame drop in parallel with a return, so an entrance frame fixup is free. The copy-on-write parameter-passing aspect of words appears to also be a win. (x86 Intel "LEAVE" is "frame" in osimplay, in case you want to write Pascal.)

versus HLL's

I think the most valuable thing a one-stack compiler does is subroutine frame management. I haven't done an algebraic expression parser, and probably won't. Fortran-style assignments don't thrill me at all, and one can easily convert such expressions to sequential instructions with the simple rule of "innermost first", which you had to do in your mind anyway to write the formulaic form. In osimplay for many formulae you'll need more than just A, so use a couple locals. You won't have Forth's RPN "stack-dancing", nor will you have the performance hit of a strict stack machine model. I personally feel that stack machines are actually superior to register machines, but my PC is a register machine.
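As a worked instance of "innermost first", here is y = (a + b) * (c - d) unrolled into one operation per line. Plain Bash arithmetic stands in for machine instructions; the variable A plays the accumulator and as plays a self local. None of this is osimplay syntax.

```shell
#!/bin/bash
# Plain Bash arithmetic standing in for sequential machine instructions.
# Formula: y = (a + b) * (c - d)
a=2 b=3 c=10 d=4

A=$a                 # innermost group first: a + b
A=$(( A + b ))
as=$A                # park the partial result in a local
A=$c                 # next group: c - d
A=$(( A - d ))
A=$(( A * as ))      # combine; A now holds y
echo "$A"            # prints 30
```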

Nov 2001
I don't think reentrant subroutine frames are as important as I used to, but they're still in osimplay if you need them, without getting in your way.

osimplay currently has just a trace of high-level-ness, in the sense of instructions assembled per keyword. rangewrite does a few instructions, Linux does a few, copyrange does 5 and loops over one of them, and so on.

All in all, osimplay is very competitive with HLL's like C for high-performance code. In terms of ease of use, osimplay, once again like Forth, may not do a lot for you, but it never gets in your way either, and you can extend it arbitrarily. This might be advantageous in a systems programming context. It bears mention also that a complete osimplay systems development system fits on one floppy. The shasm version is just a shell script. Putty in your hands. In fact, like Forth, writing your own compembler on top of an existing assembler is an over-the-weekend thing. HLL's are supposed to be 10 times more productive than assembly. I can only guess that that is versus some very bad assemblers, and allowing an inferior end-result in the comparison. Assemblers haven't changed much since, oh, 1965, but the CPUs they control have seen explosive improvement. An assembler that reflects the power of the CPU is called for. Just the transliteration of Intel non-mnemonics to readable names in asmacs/shasm was a big help.

As of Jan. 2002, osimplay can build several little Linux utils and an x86 bootsector. I'm saying "beta". Heck, I'll even say "useable".

author

Rick Hohensee
rickh@capaccess.org

 

* the crux of the biscuit