Current group: comp.lang.forth

PARSE-WORD and 4tH again

From: Hans Bezemer
Subject: PARSE-WORD and 4tH again
Date: 11 Jan 2005 05:36:19 -0800

> What's flexible about that? Your PARSE-WORD cannot work with
> white-space as delimiter (there are several white-space characters).
> And concerning the flexibility of being able to use only blanks or
> only non-white-space characters, that seems pretty useless to me.
> In any case, why use the name PARSE-WORD for this word? How about
> giving it a different name, like OMIT-PARSE?

Note that the 4tH runtime doesn't have to parse source code; it has a
dedicated compiler to do that. At runtime, PARSE-WORD is used to parse
text files, especially data text files (e.g. CSV format). In that
environment OMIT is very useful (e.g. for skipping leading spaces). I
chose the name PARSE-WORD because it replaces WORD (4tH has no WORD,
since virtually every string routine returns an addr/count) and works
much like PARSE. Since it is not part of the ANS standard, I can do
pretty much as I like in that respect.

However, if I had to do whitespace parsing, I'd either make a more
flexible word that uses a vector (deferred word) to a scanning routine,
or I'd replace every single whitespace character in the input buffer by
a space (using TRANSLATE or equivalent). I did the former for a special
version of SCAN, SKIP and SPLIT, and that is _very_ flexible. I don't
like reinventing lots of words with a slightly different meaning every
time. Factor, factor, factor...!
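
To make the second alternative concrete, here is a minimal sketch in
standard Forth of replacing whitespace by blanks before parsing. It is
not the TRANSLATE word from the 4tH library; BLANK-WHITESPACE is a
hypothetical name, and treating every character code below BL as
whitespace is an assumption made for brevity.

```forth
\ Sketch only: BLANK-WHITESPACE is a hypothetical helper, not a 4tH word.
\ Assumption: any character code below BL (TAB, LF, CR, ...) is whitespace.
: blank-whitespace ( a n -- )
  chars over + swap          \ loop from a up to, not including, a+n
  ?do
    i c@ bl < if bl i c! then
  1 chars +loop ;

create sample  9 c,  char a c,  10 c,  char b c,   \ TAB a LF b
sample 4 2dup blank-whitespace type                \ shows " a b"
```

After this pass a parser that only knows about blanks as delimiter
sees ordinary spaces, so no second family of whitespace-aware words is
needed.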

Given two alternatives, I always take the more flexible and reusable
one. I don't like a hundred too-much-alike words cluttering my
namespace...

BTW given the functionality, I'd rather have chosen Wil Baden's
'NEXT-WORD'. He has always been a master of names. It reflects its
function better and isn't easily confused with WORD and PARSE.

Hans Bezemer

Examples:
\ 4tH library - TOKENIZE - Copyright 2004 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ Load definitions when needed
[UNDEFINED] /STRING [IF]
[NEEDS lib/anstring.4th]
[THEN]

[UNDEFINED] IS-TYPE [IF]
[UNDEFINED] ?NOT [IF]
DEFER ?NOT
: (NO) NOT ;
: (YES) ;
[THEN]

DEFER IS-TYPE ( c -- f)

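\ Advance from the start of a1/n1 while the flag from IS-TYPE,
\ passed through ?NOT, is true; stop when the string is empty.
: (TOKENIZE) ( a1 n1 xt -- a2 n2)
  IS ?NOT BEGIN DUP IF OVER C@ IS-TYPE ?NOT ELSE DUP THEN
  WHILE 1 /STRING REPEAT
;

\ Same test, but on the last character; shortens the string from the end.
: (-TOKENIZE) ( a1 n1 xt -- a2 n2 )
  IS ?NOT BEGIN DUP IF 2DUP 1- CHARS + C@ IS-TYPE ?NOT ELSE DUP THEN
  WHILE 1- REPEAT
;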

: SCAN ['] (YES) (TOKENIZE) ; ( a1 n1 -- a2 n2 )
: -SCAN ['] (YES) (-TOKENIZE) ; ( a1 n1 -- a2 n2 )
: SKIP ['] (NO) (TOKENIZE) ; ( a1 n1 -- a2 n2 )
: -SKIP ['] (NO) (-TOKENIZE) ; ( a1 n1 -- a2 n2 )
: SPLIT 2DUP SCAN ROT >R ROT >R DUP 2R> ROT - ;
: -SPLIT TUCK -SCAN TUCK 2DUP 2>R CHARS + -ROT - 2R> ;
( a1 n1 -- a2 n2 a3 n3)

[DEFINED] 4TH# [IF]
forget (TOKENIZE)
forget (-TOKENIZE)
[THEN]
[THEN]

Usage example:
['] is-digit is is-type \ setup for -SPLIT
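
The word is-digit is not defined in this post; it presumably comes
from elsewhere in 4tH. As a rough, self-contained illustration of the
deferred-predicate idea in standard Forth (Forth-2012 DEFER and IS),
the sketch below plugs two different character tests into the same
scanning words. All names in it (CHAR-TYPE?, DIGIT?, WHITE?, SKIP-TYPE,
SCAN-TYPE, NEXT-TOKEN) are hypothetical stand-ins, not the library
words listed above.

```forth
\ Sketch only: hypothetical names, standard Forth, not 4tH library code.
defer char-type? ( c -- f )

: digit? ( c -- f ) [char] 0 [char] 9 1+ within ;
: white? ( c -- f ) bl 1+ < ;   \ assumption: blank or any lower code

\ Drop leading characters for which CHAR-TYPE? is true.
: skip-type ( a1 n1 -- a2 n2 )
  begin dup while over c@ char-type? while 1 /string repeat then ;

\ Drop leading characters until CHAR-TYPE? is true or the string ends.
: scan-type ( a1 n1 -- a2 n2 )
  begin dup while over c@ char-type? 0= while 1 /string repeat then ;

\ Split off the first token: a2/n2 is the rest, a3/n3 the token.
: next-token ( a1 n1 -- a2 n2 a3 n3 )
  skip-type 2dup scan-type 2swap 2 pick - ;

: demo ( -- )
  ['] white? is char-type?
  s"   12 apples" next-token type cr 2drop   \ shows "12"
  ['] digit? is char-type?
  s" 12 apples" skip-type type cr ;          \ shows " apples"
demo
```

Only the predicate changes between the two calls; the scanning words
themselves stay the same, which is the point of using a vector.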

Example:
\ 4tH library - TRANSLATE - Copyright 2004 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

[UNDEFINED] TRANSLATE [IF]
\ Translates one group of characters, identified by a2/n2,
\ in string a1/n1 into another set, identified by a3/n3.
\ Always returns a1/n1. If the translation strings are not
\ of equal length or have zero length, no translation is done.

: translate ( a1 n1 a2 n2 a3 n3 -- a1 n1)
  dup 0> >r rot over = r> and          ( a1 n1 a2 a3 n2 f)
  if                                   ( a1 n1 a2 a3 n2)
    rot swap 2>r >r 2dup r> -rot       ( a1 n1 a3 a1 n1)
    r> -rot r> -rot chars bounds       ( a1 n1 a3 a2 n2 a4 a5)
    ?do                                ( a1 n1 a3 a2 n2)
      dup >r -rot r> 0                 ( a1 n1 n2 a3 a2 n2 0)
      do                               ( a1 n1 n2 a3 a2)
        dup i chars + c@ j c@ =        ( a1 n1 n2 a3 a2 f)
        if over i chars + c@ j c! leave then
      loop rot                         ( a1 n1 a3 a2 n2)
    loop                               ( a1 n1 a3 a2 n2)
  then 2drop drop                      ( a1 n1)
;
[THEN]
   
