(     Title:  Convert among raw fp unformatted hex string,
              raw fp data stack, and IEEE 754 formats              
       File:  rawhexfloat.fs
  Test file:  rawhexfloat-test.fs
   Log file:  mixfloat.log
    Version:  0.9.6
    Revised:  January 20, 2021
     Author:  David N. Williams
    License:  LGPL

We believe any of this code derived from other authors to be either in
the public domain or otherwise compatible with the LGPL.  For the sake
of the LGPL, the rest is
)
\ Copyright (C) 2003, 2005, 2009, 2020, 2021 David N. Williams
(
This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or at
your option any later version.

This library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser
General Public License for more details.

If you take advantage of the option in the LGPL to put a particular
version of this library under the GPL, the author[s] would regard it as
polite if you would put any direct modifications under the LGPL as well,
and include this paragraph in your license notice.  A direct
modification is one that enhances or extends the library in line with
its original concept, as opposed to developing a distinct application or
library which might use it.
)

\ ***OVERVIEW
(
This library assumes a machine floating point memory representation
corresponding to an IEEE 754 binary32, binary64, binary80, or
binary128 format. The machine storage format is assumed to be bit-wise
contiguous in memory, all little-endian or all big-endian, and with the
same endianess as machine integers.

This library was formerly part of a larger library in mixfloat.fs,
which aimed to provide access to the bits of IEEE 754 binary
floating-point formats.

The central mxifloat format was a Forth text string version of the IEEE
754 hex-fp format, with a mantissa of hex digits and a 2's exponent of
decimal digits.  There were direct conversions between this and what we
called "raw" representations for common choices of the precision and
maximum exponent parameters in the IEEE 754 binary32, binary64,
binary80, and binary128 formats.

There were further conversions between each of the four raw formats and
a corresponding unformatted hex string format, and between the host
machine format and the appropriate one of the four raw or unformatted
string formats.

The conversions between raw and unformatted hex string formats are
relatively simple.  They depend only on the respective storage widths,
32, 64, 80, or 124 bits, and are agnostic about sign, exponent, and
mantissa layouts.  They require no floating point calculations.

The conversion between the machine format and the appropriate one of the
raw or unformatted string formats is also relatively simple.  It does
depend on the layout, but only implicitly, via F@ and F!.

This library covers conversions among the raw, the unformatted hex
string, and the host machine formats, and omits the mixfloat format.  It
also does fetches and stores of the raw formats between memory and the
data stack.

Here are more details about the three covered formats.

RAW BINARY CELLS:  A faithful representation of any of the four IEEE 754
  binary formats which organizes them into sufficiently many contiguous
  Forth cells on the data stack or in memory, with the most significant
  cell on the top of the stack or lower in memory.  Unused bits are
  padded by zeroes in the most significant part of the most significant
  cell, conforming with the Forth convention for narrow data on the
  stack.

  When the number of bits per cell versus the number of bits needed for
  the native fp representation requires two cells, this is the usual
  significance ordering style for double numbers in Forth.  For more
  than two cells, it extends that style.

  This format facilitates multicell operations such as shifting, and is
  moderately useful for input and display.

UNFORMATTED HEX STRING:  A faithful representation of any of the four
  IEEE 754 binary formats as a string of characters containing hex
  digits, ordered from the most to the least significant digit.  For an
  IEEE storage width of x bits, the corresponding hex string contains
  exactly x/4 hex characters, plus blank characters.  An input string
  may contain any number of leading, embedded, or trailing blanks.  An
  output string is formatted in 4-digit, blank separated groups.

  This format is intended for easy input and display of unformatted
  binary formats.

MACHINE STORAGE:  Assumed to correspond to the indicated storage width
  for an IEEE 754 binary32, binary64, binary80, or binary128 format. 
  Assumed to be bit-wise contiguous in memory, all little-endian or all
  big-endian, and with the same endianess as machine integers.

The test file rawhexfloat-test.fs contains syntax examples.

This library uses both the integer and floating-point stacks.

Upon inspection, neither here nor in rawfloat-test.fs do we see
words with mixed data/float inputs, outputs, or intermediate
calculations.  Everything should work whether or not the
floating-point stack is separate from the data stack, and has
been checked with both options in pfe.  See the section
"ENVIRONMENT" for how to select an integrated stack in pfe.

This library refuses to load if cells are neither 32 nor 64 bits, and it
works with either case.

It also refuses to load if chars are not 8 bits, and if 8 bits is not
the address unit.

It eventually refuses to load if the default float format is not 32, 64,
80, or 128 bits.

There is an environmental dependence on lower case.

Notation:
"s" or "something.s" in stack comments stands for an ANS Forth string
    [addr len] pair.
)
BASE @


\ *** PUBLIC WORDS

\ SIGNATURE
\   RAW-HEX-FLOAT

\ USER INTERFACE
\   raw!  raw@  raw-drop
\   hex>raw  raw>hex  raw>f  f>raw  hex>f  f>hex

\ ENVIRONMENTAL CONSTANTS
\   BITS/AU  BITS/CELL  HEX-DIGS/D   
\   BITS/FLOAT  LITTLE-ENDIAN?  FLITTLE-ENDIAN?  (from rawfloat.fs)

\ UTILITY
\   lshift>ud  (qlshift)
\   -spaces    hex>ud   ?hex>ud   hex>uq  ?hex>uq
\   qfaligned  qfalign  qfcreate  qfpad
 
\ HEX/RAW CONVERSIONS
\   hex8>raw32  hex16>raw64  hex20>raw80  hex32>raw128
\   raw32>hex8  raw64>hex16  raw80>hex20  raw128>hex32

\ RAW FETCH/STORE
\   raw32!  raw64!  raw80!  raw128!
\   raw32@  raw64@  raw80@  raw128@

\ RAW/FLOAT CONVERSIONS
\   raw32>f  raw64>f  raw80>f  raw128>f
\   f>raw32  f>raw64  f>raw80  f>raw128


\ *** SIGNATURE

: RAW-HEX-FLOAT ;


\ *** ENVIRONMENTAL CONSTANTS

decimal

[UNDEFINED] BITS/AU [IF]
s" ADDRESS-UNIT-BITS" ENVIRONMENT? 0=
[IF] cr .( ***Can't determine ADDRESS-UNIT-BITS.) ABORT [THEN]
CONSTANT BITS/AU
[THEN]

bits/au 1 chars * #8 <> [IF] cr
  .( ***1 CHARS is not 8 bits.) ABORT [THEN]

bits/au #8 <> [IF]
  cr .( ***bits/au is ) bits/au . .( bits, not 8 ) ABORT [THEN]

[UNDEFINED] BITS/CELL [IF]
bits/au 1 cells * CONSTANT BITS/CELL
[THEN]

bits/cell #32 <> bits/cell #64 <> and [IF]
  cr .( ***bits/cell is ) bits/cell . .( bits, not 32 or 64 ) ABORT [THEN]

bits/cell 2/ CONSTANT HEX-DIGS/D

\ *** UTILITY

: lshift>ud  ( u +n -- ud' )
(
Assume +n <= BITS/CELL.  Logical shift u to the left by +n bits into an
unsigned double number.
)
  LOCALS| n u |
  n 0=          IF u 0 EXIT THEN
  n bits/cell = IF 0 u EXIT THEN
  u n lshift u bits/cell n - rshift ;

: (qlshift)  ( q +n -- q' )
(
Assume +n <= BITS/CELL.  Logical shift the quadruple q to the left by +n
bits.
)
  LOCALS| n q0 q1 q2 q3 |
  n bits/cell = IF 0 ELSE q0 n lshift THEN ( q0'.hi)
  q1 n lshift>ud ( q0'.hi q1'.hi q0'.lo)
  rot + ( q0') to q0
  ( q1'.hi) q2 n lshift>ud ( q1'.hi q2'.hi q1'.lo)
  rot + ( q1') to q1
  ( q2'.hi) q3 n lshift>ud ( q2'.hi q3'.hi q2'.lo)
  rot + q1 q0 ;

: -spaces  ( s -- s' )
(
Remove all blanks from the ANS Forth string s and leave the compacted
string s'.  The new string body address is at PAD.  This works even if
the input string s starts somewhere in PAD, but then it may get
partially written over.
)
  pad >r
  BEGIN
    ( len) dup 0>
  WHILE
    over c@ bl <>
    IF over c@ r@ c! r> 1+ >r THEN
    1 /string
  REPEAT 2drop
  pad r> pad - ;

: hex>ud  ( s -- ud flag ) 
(
If the string contains only hexadecimal digits with no sign, or is
empty, leave true and the unsigned double number conversion [0 if there
are no digits].  Otherwise leave an undefined ud result and false.
)
  base @ >r hex
  0 0 2swap >number ( ud rest.s)
  nip 0=   r> base ! ;

: ?hex>ud  ( s -- ud )  hex>ud 0= ABORT"  ***Not a hex string" ;

: hex>uq  ( s -- uq flag ) 
(
If the string contains only hexadecimal digits with no sign, or is
empty, leave true and the unsigned quadruple number conversion [0 if
there are no digits].  Otherwise leave an undefined uq result and false.
)
  ( len) dup hex-digs/d <=
  IF over 0 2swap ELSE hex-digs/d - 2dup + hex-digs/d THEN ( hi.s lo.s)
  LOCALS| lo.len lo.addr hi.len hi.addr |
  base @ >r hex
  0 0 lo.addr lo.len >number
  IF
    ( lo.ud addr) 0 false
  ELSE  
    ( lo.ud addr) drop 0 0 hi.addr hi.len >number
    ( lo.ud hi.ud addr' len') nip 0=
  THEN  
  r> base ! ;

: ?hex>uq  ( s -- uq )  hex>uq 0= ABORT" ***Not a hex string" ;


\ *** HEX/RAW CONVERSIONS

: num>s  ( n -- n.s )
(
Convert the number according to the current BASE to a Forth string in
transient memory.  Traditional.
)
  <# dup >r abs s>d #s r> sign #> ;

(
Because char-counted strings are deprecated, we use "measured strings"
[mstrings], which are strings with an implementation-defined count-field
size.  Our library defaults to cell-size.

In stack comments, a $ suffix indicates the address of an mstring.
)

: m$>s  ( m$ -- s )  \ COUNT for cell count-field
  >r r@ cell+ ( addr) r> @  ( len) ;

: s-m$+  ( s m$ -- )
(
Append the Forth string to the mstring.  Transliteration of Wil Baden's
APPEND.
)
  2dup 2>r   m$>s +   swap move 2r> +! ;

create pad$ 128 chars allot

: 0pad$  ( -- )    0 pad$ ! ;
: pad$>s ( -- )    pad$ m$>s ;
: pad+   ( s -- )  pad$ s-m$+ ;
: bl+    ( -- )    s"  " pad+ ;

: digs+  ( +num #digs -- )
  ( #digs) >r num>s
  r> over ( len) - dup 0< ABORT" ***Too many digits."
  ( #digs-len) 0 ?DO s" 0" pad+ LOOP
  ( num.s) pad+ ;

: bl-4digs+  ( +4.digit.num -- )  bl+ #4 digs+ ;

: hdigs+      ( -- )  base @ >r hex digs+     r> base ! ;
: bl-4hdigs+  ( -- )  base @ >r hex bl-4digs+ r> base ! ;

\ HEX8>RAW32, HEX16>RAW64, HEX20>RAW80, HEX32>RAW128
(
Translate a Forth string representation of a raw floating-point datum to
the corresponding raw format on the data stack.  These words are for
convenience in expressing raw data without concern for right to left
stack notation, and with embedded blanks for readability.  They all
ABORT if the string does not contain exactly the required number of hex
digits.
)

: hex8>raw32  ( hex8.s -- raw32 )
  -spaces dup #8 <> ABORT" ***Not 8 hex digits."
  ?hex>ud ( hi) drop ;

: hex16>raw64  ( hex16.s -- raw64 )
  -spaces dup #16 <> ABORT" ***Not 16 hex digits."
  ?hex>ud [ bits/cell #64 = ] [IF] drop [THEN] ;

: hex20>raw80  ( hex20.s -- raw80 )
  -spaces dup #20 <> ABORT" ***Not 20 hex digits."
  ?hex>uq [ bits/cell #64 = ] [IF] 2drop [ELSE] drop [THEN] ;

: hex32>raw128  ( hex32.s -- raw128 )
  -spaces dup #32 <> ABORT" ***Not 32 hex digits."
  ?hex>uq [ bits/cell #64 = ] [IF] 2drop [THEN] ;

\ RAW32>HEX8, RAW64>HEX16, RAW80>HEX20, RAW128>HEX32
(
Translate the raw formats to hex strings of hex digits formatted
in 4-digit, blank-separated groups.
)

bits/cell #32 = [IF]

: raw32>hex8 ( raw32 -- hex8.s )  \ raw32 has 1 cell
  0pad$ #16 lshift>ud #4 hdigs+
  #16 lshift>ud bl-4hdigs+ ( ud'.lo) drop pad$>s ;

: raw64>hex16  ( raw64 -- hex16.s )  \ raw64 has 2 cells
  0pad$ #16 lshift>ud #4 hdigs+
  #16 lshift>ud bl-4hdigs+ ( ud'.lo) drop
  #16 lshift>ud bl-4hdigs+
  #16 lshift>ud bl-4hdigs+ ( ud'.lo) drop pad$>s ;

: raw80>hex20  ( raw80 -- hex20.s )  \ raw80 has 2.5 cells
  #16 lshift  \ move top.cell.lo16 to top.cell.hi16
  0pad$ #16 lshift>ud #4 hdigs+  ( ud'.lo) drop
  #16 lshift>ud bl-4hdigs+
  #16 lshift>ud bl-4hdigs+ ( ud'.lo) drop
  #16 lshift>ud bl-4hdigs+
  #16 lshift>ud bl-4hdigs+ ( ud'.lo) drop pad$>s ;

: raw128>hex32  ( raw128 -- hex32.s )  \ raw128 has 4 cells
  0pad$ #16 lshift>ud #4 hdigs+
  #16 lshift>ud bl-4hdigs+ ( ud'.lo) drop
  #3 0 DO #16 lshift>ud bl-4hdigs+
         #16 lshift>ud bl-4hdigs+ ( ud'.lo) drop
  LOOP pad$>s ;

[ELSE]  \ 64-BITS/CELL

: raw32>hex8 ( raw32 -- hex8.s )  \ raw32 has 1 cell
  0pad$ #48 lshift>ud #4 hdigs+
  #16 lshift>ud bl-4hdigs+ ( ud'.lo) drop pad$>s ;

: raw64>hex16  ( raw64 -- hex16.s )  \ raw64 has 1 cells
  0pad$ #16 lshift>ud 4 hdigs+
  #3 0 DO #16 lshift>ud bl-4hdigs+ LOOP ( ud'.lo) drop pad$>s ;

: raw80>hex20  ( raw80 -- hex20.s )  \ raw80 has 1.25 cells
  #48 lshift  \ move top.cell.lo16 to top.cell.hi16
  0pad$ #16 lshift>ud #4 hdigs+ ( ud'.lo) drop
  #4 0 DO #16 lshift>ud bl-4hdigs+ LOOP ( ud'.lo) drop pad$>s ;

: raw128>hex32  ( raw128 -- hex32.s )  \ raw128 has 2 cells
  0pad$ #16 lshift>ud #4 hdigs+
  #3 0 DO #16 lshift>ud bl-4hdigs+ LOOP ( ud'.lo) drop
  #4 0 DO #16 lshift>ud bl-4hdigs+ LOOP ( ud'.lo) drop pad$>s ;

[THEN]


\ *** USER INTERFACE

[UNDEFINED] RAW-FLOAT [IF] INCLUDE rawfloat.fs [THEN]

: hex>raw  ( hex.s -- raw.default )
  [ bits/float #32 = ]  [IF] hex8>raw32 [ELSE]
  [ bits/float #64 = ]  [IF] hex16>raw64 [ELSE]
  [ bits/float #80 = ]  [IF] hex20>raw80 [ELSE]
  [ bits/float #128 = ] [IF] hex32>raw128 [THEN] [THEN] [THEN] [THEN] ;

: raw>hex  ( raw64.default -- hex.s )
  [ bits/float #32 = ]  [IF] raw32>hex8 [ELSE]
  [ bits/float #64 = ]  [IF] raw64>hex16 [ELSE]
  [ bits/float #80 = ]  [IF] raw80>hex20 [ELSE]
  [ bits/float #128 = ] [IF] raw128>hex32 [THEN] [THEN] [THEN] [THEN] ;

\ default floats
: hex>f  ( hex.default.s -- f: r )     hex>raw raw>f ;
: f>hex  ( f: r -- s: hex.default.s )  f>raw raw>hex ;

BASE !