> What's flexible about that. Your PARSE-WORD cannot work with
> white-space as delimiter (there are several white-space characters).
> And concerning the flexibility of being able to use only blanks or
> only non-white-space characters, that seems pretty useless to me.
> In any case, why use the name PARSE-WORD for this word. How about
> giving it a different name, like OMIT-PARSE.
Note that the 4tH runtime doesn't have to parse source code; it has a dedicated compiler to do that. PARSE-WORD is used to parse text files, especially data files (e.g. CSV format). In that environment OMIT is very useful (e.g. for leading spaces). I chose the name PARSE-WORD because it replaces WORD (4tH has no WORD, since virtually every string routine returns an addr/count pair) and works much like PARSE. Since it is not part of the ANS standard, I can do pretty much as I like for that matter.
However, if I had to do whitespace parsing, I'd do one of two things. Either I'd make a more flexible word that uses a vector (deferred word) to a scanning routine. I did that for a special version of SCAN, SKIP and SPLIT, and it is _very_ flexible. I don't like reinventing lots of words with a slightly different meaning every time. Factor, factor, factor...! Or I'd replace every single whitespace character in the input buffer by a space (using TRANSLATE or equivalent).
Given two different alternatives, I always take the more flexible and reusable one. I don't like a hundred too-much-alike words cluttering my namespace...
BTW given the functionality, I'd rather have chosen Wil Baden's 'NEXT-WORD'. He has always been a master of names. It reflects its function better and isn't easily confused with WORD and PARSE.
Hans Bezemer
Examples:

\ 4tH library - TOKENIZE - Copyright 2004 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ Load definitions when needed
[UNDEFINED] /STRING [IF] [NEEDS lib/anstring.4th] [THEN]
[UNDEFINED] IS-TYPE [IF]
[UNDEFINED] ?NOT [IF]
DEFER ?NOT
: (NO) NOT ;
: (YES) ;
[THEN]

DEFER IS-TYPE                          ( c -- f)

: (TOKENIZE)                           ( a1 n1 xt -- a2 n2)
  IS ?NOT
  BEGIN
    DUP IF OVER C@ IS-TYPE ?NOT ELSE DUP THEN
  WHILE
    1 /STRING
  REPEAT
;

: (-TOKENIZE)                          ( a1 n1 xt -- a2 n2 )
  IS ?NOT
  BEGIN
    DUP IF 2DUP 1- CHARS + C@ IS-TYPE ?NOT ELSE DUP THEN
  WHILE
    1-
  REPEAT
;

: SCAN ['] (YES) (TOKENIZE) ;          ( a1 n1 -- a2 n2 )
: -SCAN ['] (YES) (-TOKENIZE) ;        ( a1 n1 -- a2 n2 )
: SKIP ['] (NO) (TOKENIZE) ;           ( a1 n1 -- a2 n2 )
: -SKIP ['] (NO) (-TOKENIZE) ;         ( a1 n1 -- a2 n2 )
: SPLIT 2DUP SCAN ROT >R ROT >R DUP 2R> ROT - ;
: -SPLIT TUCK -SCAN TUCK 2DUP 2>R CHARS + -ROT - 2R> ;
                                       ( a1 n1 -- a2 n2 a3 n3)

[DEFINED] 4TH# [IF]
forget (TOKENIZE)
forget (-TOKENIZE)
[THEN]
[THEN]
Usage example:

['] is-digit is is-type                \ setup for -SPLIT
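To tie this back to the whitespace question, here is a minimal sketch of a whitespace classifier plugged into IS-TYPE. The names IS-WHITE, USE-WHITE and DEMO are made up for illustration and are not part of the library, and the set of characters treated as whitespace (blank, tab, LF, CR) is my assumption. Reading the definitions as posted, SCAN steps over the leading characters for which IS-TYPE is true, so with this classifier it should step past a leading run of whitespace.

```forth
\ Hypothetical helper, not part of the TOKENIZE library:
\ true for blank, tab (9), line feed (10), carriage return (13).
: is-white                             ( c -- f)
  dup bl = swap                        ( f1 c)
  dup 9 = swap                         ( f1 f2 c)
  dup 10 = swap 13 =                   ( f1 f2 f3 f4)
  or or or                             ( f)
;

\ Install the classifier; wrapped in a definition so ['] is
\ used in compiled code.
: use-white ['] is-white is is-type ;  ( --)

\ Hypothetical demo: step past leading whitespace, show the rest.
: demo                                 ( --)
  use-white
  s"    42 apples" scan type cr
;
```

Swapping in a different classifier changes what SCAN, SKIP and SPLIT do without adding any new parsing words, which is the point of the vectored design.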
Example:

\ 4tH library - TRANSLATE - Copyright 2004 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

[UNDEFINED] TRANSLATE [IF]
\ Translates one group of characters, identified by a2/n2
\ in string a1/n1 into another set, identified by a3/n2.
\ Always returns a1/n1. If the translation strings are not
\ of equal length or have zero length, no translation is done.

: translate                            ( a1 n1 a2 n2 a3 n2 -- a1 n1)
  dup 0> >r rot over = r> and          ( a1 n1 a2 a3 n2 f)
  if                                   ( a1 n1 a2 a3 n2)
    rot swap 2>r >r 2dup r> -rot       ( a1 n1 a3 a1 n1)
    r> -rot r> -rot chars bounds       ( a1 n1 a3 a2 n2 a4 a5)
    ?do                                ( a1 n1 a3 a2 n2)
      dup >r -rot r> 0                 ( a1 n1 n2 a3 a2 n2 0)
      do                               ( a1 n1 n2 a3 a2)
        dup i chars + c@ j c@ =        ( a1 n1 n2 a3 a2 f)
        if over i chars + c@ j c! leave then
      loop rot                         ( a1 n1 a3 a2 n2)
    loop                               ( a1 n1 a3 a2 n2)
  then 2drop drop                      ( a1 n1)
;
[THEN]
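The second alternative, turning every whitespace character into a blank before parsing, could look roughly like this. It is only a sketch: WHITE-SET, BLANK-SET and WHITE>BLANK are hypothetical names, the tables are built ANS-style with CREATE and C, (4tH may need a different idiom for byte tables), and the choice of tab, LF and CR is my assumption. It relies only on the stack effect documented in the TRANSLATE header comment.

```forth
\ Assumed ANS-style character tables; adapt to your dialect.
create white-set  9 c, 10 c, 13 c,     \ tab, line feed, carriage return
create blank-set  bl c, bl c, bl c,    \ one blank per entry above

\ Hypothetical wrapper: replace tab/LF/CR in a1/n1 by blanks.
\ TRANSLATE stores into the string, so a1/n1 must be writable.
: white>blank                          ( a1 n1 -- a1 n1)
  white-set 3 blank-set 3 translate
;
```

After that, a parser that only knows about blanks can be used on the buffer unchanged.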