Possible names for how high-level osimplay is include "mid-level-language" (Randy Hyde), meta-assembler, or compembler. In terms of text-per-binary-opcode, it winds up outputting about as much object code as C. Osimplay is based on my asmacs assembly macros, and the version of osimplay documented here is on top of shasm, my infamous 386 assembler in GNU Bash. Something like osimplay should be easy to implement on top of dedicated assemblers, using something like m4, Forth or Lisp, in which case its performance would be acceptable for large batch assembly jobs, which it currently isn't.
osimplay tries to bring some of the simplicity and elegance of Forth to native one-stack code without the performance/complexity disadvantages (on register machines) of a virtual machine that's fundamentally different from a register machine. As such, it emphasizes a virtual machine as its conceptual continuity*, rather than a "language" or syntax. There are some "language constructs" though, which is one way osimplay isn't just an assembler. Its language-ness is very bottom-up though. It's still a set of features within an assembler, and its syntax is inherited from whatever it's implemented in, the unix shell in the demo implementation. Like Forth, osimplay is trying to be a machine control system, not a "language". I'm starting to call osimplay commands "words" though, a Forth habit. osimplay "familial variables" also admit of Forth-like stack diagrams.
BCPL intermediate code does quite well on 3 registers. At this point I'm thinking the 386 as a whole, the 8 basic registers anyway, is constricted enough to be a usable "virtual machine" for most other CPUs. Other aspects of the 386 are more fumblesome, and I'm not familiar with other architectures, so portability is a back-burner thing. Well, side burner. osimplay code is strikingly generic-looking for assembly, and familial variables may help with portability. Familial variables are locals that come in three flavors: parent, self and child. These correspond to the parent (calling) routine's stack frame, the current stack frame, and the stack frame of the most recently called child routine. An interesting byproduct of a 3-frame interface is that just coding as you would normally will in most cases implement copy-on-write parameter passing, which has the delightful acronym of COWPP, or "cow pee pee". COWPP is quite efficient, and is more or less an accident of the parent/self/child locals. This is the sort of synergy one also sees in Forth. BCPL does something similar, but in osimplay it's just what you'd do with familial variables anyway, not compiler elegance.
Most shasm and osimplay commands have their own interactive help via [command] h, so I'll try to concentrate on non-specifics for a bit. Some of osimplay's features are pretty high-level, but some deliberate omissions remain. I have managed to avoid the temptation to do flow-control abstractions like IF/ELSE/WHILE/FOR... (Actually I'm not even tempted.) I suspect those are treacherous vis-a-vis remaining an assembler. osimplay's when conditional branch compembler is pretty nifty though, and xrays are a flow-control abstraction distantly related to SWITCH/CASE.
osimplay doesn't have data types. Like BCPL and the ANS Standard for Forth, there is a cell concept, the size of a native machine pointer, but that's just a size, not a type in the C sense. ($cell is, BTW, how osimplay keeps track of what x86 mode it's in.) Enumerations are numberings, the clump facility is a vestige of C structs, and there are strands, which are a very general type of array. Strands can also serve as strings, masked-index rings, and other goodies. Host functionality is available via a Linux word analogous to "sys" in BCPL and similar, with a big wad of defined Linux syscall names. Also like Forth, the osimplay overlay is not a lot of code, not counting the underlying assembler. Linux is a bit more flexible than C parameter-passing.
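The "masked-index ring" idea mentioned above can be tried out in plain Bash. This is only a sketch of the general technique (wrap an ever-increasing index with a power-of-two mask), not osimplay's strand code; ring_size, ring_put and ring_get are names made up for the illustration.

```shell
#!/usr/bin/env bash
# Masked-index ring sketch. All names here are hypothetical, not osimplay words.
ring_size=8                      # assumption: size must be a power of two
ring_mask=$((ring_size - 1))     # 8 - 1 = 7 = binary 111
ring=()                          # the storage cells
ring_head=0                      # total number of writes so far, never wrapped

ring_put () {                    # store $1 in the next slot, wrapping by mask
	ring[$((ring_head & ring_mask))]=$1
	ring_head=$((ring_head + 1))
}

ring_get () {                    # fetch the cell that logical index $1 maps to
	echo "${ring[$(($1 & ring_mask))]}"
}

# Ten writes into eight cells: the last two overwrite slots 0 and 1.
for n in 0 1 2 3 4 5 6 7 8 9; do ring_put "item$n"; done
ring_get 9                       # 9 & 7 = 1
ring_get 2                       # slot 2 was written only once
```

The point of the mask is that wrapping costs one AND instruction and no compare-and-branch, which is presumably why it suits an assembler-level array facility.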
# OSIMPLAY COMPEMBLER ]]]]] ()
ELF () { # build simple but complete ELF header
osimplay () { # main(). e.g. osimplay <your source file>
beam () { # Fill a range of an xray with a jump address
cell () { # name/allot a cell-size storage location
copyrange () { # plural, range-to-range copy, handles overlaps
clump () { # C struct() kinda, data associations namer
entrance () { # name/begin a reentrant procedure
leave () { # return from the current reentrant procedure
fill () { # plural, copies A across range @ DI -x86 STOSD
flag () { # assert the zero/sign flags of a register's value
enter () { # how you call an osimplay reentrant procedure
heap () { # start uninititialized data sub-section
maskbyte () { # mask arg down to a byte
maskdual () { # mask arg down to it's lowest-significance 16 bits
match () { # plural, range-range compare, zero flag=match
max () { # BROKE inline, max bla zay to zay
min () { # BROKE inline, min bla zay to zay
numbering () { # C enum, but not strictly constants and no commas
quad () { # name/allot a 4-byte storage location
quadtohex () { # quad value to ASCII hexadecimal string
range () { # name/allot some count-cell prefixed memory
scan () { # plural, compare A to memory range @ DI until hit/miss
strand () { # name/allot a general array and 8 cells of metadata
sum () { # plural, additive checksum
text () { # name/allot some text
xjump () { # indexed jump into an xray execution array
xray () { # name an execution array for beam, yarx and xjump
xsum () { # plural, XOR-ing checksum
yarx () { # finish compembling an xray and it's beams
zero () { # simple convenience to set (a) register(s) to 0
# ()
# ()
# HOST OS DEPENDANT ()
Linux () { # syscalls, e.g. Linux $read, many available
print () { # write all of a given range's net data to stdout
regspew () { # raw binary register dump to stderr
newline () { # write a (unix) newline to stdout
The ELF word uses 0 for the program load address. This makes things much simpler than using Linux's typical 0x08048000, which appears to be purely a traditional holdover from SysV. argv and env still seem to work too.
In other words, osimplay doesn't have a proper linker. That's getting to be a problem already with something like the 6k bs command in the included demos, because osimplay is so slow, but I'm not sure of the appropriate solution yet.
    entrance newname
The name is affiliated with the current assembly address. newname remains the current procedure being defined until the next entrance. A routine can be ended with a leave, or not. When an entrance does occur in the source, a stack-frame-rewinding return is assembled. There may be 0, 1 or more leaves, but most routines will have one. A routine assembled this way is properly called with enter.
    enter newname
assembles a caller-hikes preamble to the actual call, and thus we have stack frame maintenance requiring one extra instruction over a machine call instruction, as in BCPL. We haven't passed strandtoken any parameters yet. We gave it a 16-cell stack frame, but there's currently no valid data in it, just leftover random bits. This is something I suspect may be fairly unique about osimplay. It is caller-hikes, callee-passes. This creates copy-on-write parameter passing, and is the result of osimplay's means of dealing with routine stack frames as three levels of local variable. osimplay provides macros to refer to cells in the parent, current, and most recently exited child routines as pre-named local variables. The current routine's locals are as, bs, cs.... s implies "self". The parent routine's locals can be referenced from the current routine as pa, pb, pc..., and similarly for the last child routine the current word entered, a la ca, cb, cc, cd...
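The three-frame idea can be modelled in a few lines of Bash, which may make the caller-hikes, callee-passes arrangement easier to picture. This is a toy simulation under stated assumptions, not osimplay code: the flat mem array, the upward-growing stack, the numeric cell indices (standing in for as/bs, pa/pb, ca/cb) and the names toy_enter, self, parent, child and setself are all invented here, and the model ignores whatever cell the real frame spends on the return address.

```shell
#!/usr/bin/env bash
# Toy model of fixed 16-cell frames with parent/self/child views.
# Every name and the stack direction are assumptions for illustration only.
frame_cells=16
mem=()                      # the stack, as a flat array of cells
fp=16                       # first cell of the current frame

self ()    { echo "${mem[$((fp + $1))]:-0}"; }                # like as, bs, ...
parent ()  { echo "${mem[$((fp - frame_cells + $1))]:-0}"; }  # like pa, pb, ...
child ()   { echo "${mem[$((fp + frame_cells + $1))]:-0}"; }  # like ca, cb, ...
setself () { mem[$((fp + $1))]=$2; }

toy_enter () {              # caller hikes the frame, calls, then drops back
	fp=$((fp + frame_cells))
	"$@"
	fp=$((fp - frame_cells))
}

divmod () {                 # reads its arguments in place from the parent frame
	local a b
	a=$(parent 0)
	b=$(parent 1)
	setself 0 $((a / b))    # quotient stays in this routine's own cell 0
	setself 1 $((a % b))    # remainder stays in this routine's own cell 1
}

setself 0 17                # the caller's own locals double as the arguments
setself 1 5
toy_enter divmod
echo "quotient $(child 0) remainder $(child 1)"
```

Nothing is copied on the way in: divmod reads the caller's cells where they sit and only writes into its own frame, and the caller then picks up both results from the child frame, which is the multiple-return-value effect.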
The loss of naming flexibility isn't so bad for locals, which often get terse names anyway, and the flexibility created is notable. Accessing the locals of a child routine is a form of multiple return value. Accessing the locals of a parent routine can involve moving their value to the A register, i.e. the accumulator, or not. If a parent value doesn't need to actually be moved into the current routine's frame to be of use, such as a condition test of it, there is a performance benefit. Values to be passed from parent to child through current must be moved though. Also, any naming annoyances in osimplay should be easily offset by the fact that it's just a script, and can be seasoned to taste in seconds.
All this talk about the internals of the language at this early point is a bit abnormal, but you wind up having to know all this nonsense to use something like C well anyway. I feel that attempts to abstract these things out of systems programming languages are not their greatest success. Forth (on conventional hardware) gives you a simple virtual machine to deal with as directly as possible. osimplay tries to simplify by giving you a subset of your actual machine as directly as possible, hopefully with some portability side-benefits, while leaving all the specifics of your actual machine available right there in the rest of the assembler osimplay is implemented in/on. This is the case with just assuming that all stack frames are 16 cells. You can hand-roll whatever, but 15 locals is a lot, and all 15 slots cost a total of one instruction. Having said all that, you only need entrance when you need reentrance.
    [your_shell_prompt]binary h
    Convert the binary representation of a number to decimal.
    Accepts it's single argument in multiple segments, like for bytes in a quad.
    binary 0101010011010011
    Bash math is 32 bit.
You can also use Bash's "type" facility, which is very nice. As in Forth, interactivity pays. Try things. Define something and see what happens. Assuming you have grep (or I got osimplay "scan" working), you can
    grep "()" osimplay
for a list of all osimplay routines, as with shasm. A clump doesn't assemble anything, it just alters assembler state. That also happens to be your shell's variables, so you can
    set | grep clumpname
to see what got defined and then
    echo $whatever
to see what it got defined to. There's also reading the script itself, the listing file and looking at the actual machine code in output with a binary editor. You can assemble instructions one at a time interactively if you set pass=2, and then read listing. You can also set $listing to stdout via /proc in the osimplay sourcefile.
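These poking-around habits work on any script written in the same style, so here is a small self-contained stand-in to try them on. It is a sketch only: toybinary and toyclump are invented for this example and are not osimplay's binary or clump, the 4-byte field spacing is an assumption, and the temporary file simply plays the part of the osimplay script.

```shell
#!/usr/bin/env bash
# Toy stand-in for an assembler-in-a-script. None of these names are osimplay's.
toy=$(mktemp)
cat > "$toy" <<'EOF'
toybinary () {		# binary digits, in one or more segments, to decimal
	if [ "$1" = h ]; then
		echo "Convert binary digits to decimal. Segments are concatenated."
		return
	fi
	local digits
	digits=$(printf '%s' "$@")	# glue the segments together
	echo $((2#$digits))		# Bash base-2 arithmetic constant
}
toyclump () {		# name a group of fields as shell variables holding offsets
	local name=$1 offset=0 field
	shift
	for field in "$@"; do
		printf -v "${name}_${field}" '%s' "$offset"
		offset=$((offset + 4))	# assumption: 4-byte fields
	done
}
EOF
. "$toy"                      # load the "assembler" into the current shell

grep "()" "$toy"              # one line per routine, comment included
toybinary h                   # the [command] h help convention
toybinary 01010100 11010011   # same digits as 0101010011010011
toyclump point x y z          # assembles nothing, only alters shell state
set | grep '^point_'          # see what got defined
echo "$point_y"               # see what it got defined to
type toyclump                 # Bash prints the function body back
rm -f "$toy"
```

Because the definitions are ordinary shell functions and variables, the shell itself is the debugger: grep finds the routines, set shows the state, and type shows the code.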
    when not zero some_branch_target_label_name
is a typical osimplay conditional branch. As usual, you have to be sure the flags are current yourself, using their state from math and logical ops, or osimplay words like flag.
The x86 also has a set of words conditional in part on the C (ECX) register. These are separate instructions from the above, which in Intelese are the Jxx instructions. The Intel names for them are pretty bad. LOOP, for example, does forward branches just fine. These have also been folded into the when switches like
    when C-1 branch_target
There are also now some libc-like macros for the basic things you'd want to use LOOP and friends for. LOOP doesn't hide completely like REP though.
Nov 2001
I don't think reentrant subroutine frames are as important as I used to,
but they're still in osimplay if you need them, without getting in your
way.
osimplay currently has just a trace of high-level-ness, in the sense of instructions assembled per keyword. rangewrite does a few instructions, Linux does a few, copyrange does 5 and loops over one of them, and so on.
All in all, osimplay is very competitive with HLLs like C for high-performance code. In terms of ease of use, osimplay, once again like Forth, may not do a lot for you, but it never gets in your way either, and you can extend it arbitrarily. This might be advantageous in a systems programming context. It bears mention also that a complete osimplay systems development system fits on one floppy. The shasm version is just a shell script. Putty in your hands. In fact, like Forth, writing your own compembler on top of an existing assembler is an over-the-weekend thing. HLLs are supposed to be 10 times more productive than assembly. I can only guess that that is versus some very bad assemblers, and allowing an inferior end-result in the comparison. Assemblers haven't changed much since, oh, 1965, but the CPUs they control have seen explosive improvement. An assembler that reflects the power of the CPU is called for. Just the transliteration of Intel non-mnemonics to readable names in asmacs/shasm was a big help.
As of Jan. 2002, osimplay can build several little Linux utils and an x86 bootsector. I'm saying "beta". Heck, I'll even say "useable".
* the crux of the biscuit