|
|
 | | From: | Sriyansa | | Subject: | Handling streaming input | | Date: | 30 Dec 2004 00:55:37 -0500 |
|
|
 | Hi,
I am trying to write a parser module which takes in a streaming input and converts it to an operation stack. All the steps (char stream -> token stream -> operation stack) are asynchronous and the modules read from and write into incomplete data structures.
The grammar for the parsing is itself not very complicated and I could use ANTLR to generate the parser. However, embedding the code for the customized I/O (a significant piece) in the grammar file does not look feasible.
Now the questions:
1. Is there any other parser generator more suitable for such a task? 2. Or am I missing something in ANTLR irself? 3. If I have to change the ANTLR source to allow for such I/O where do I start?
Thanks, Sriyansa
|
|
 | | From: | Chris F Clark | | Subject: | Re: Handling streaming input | | Date: | 31 Dec 2004 13:04:00 -0500 |
|
|
 | Sriyansa asked: > I am trying to write a parser module which takes in a streaming input > and converts it to an operation stack. All the steps (char stream -> > token stream -> operation stack) are asynchronous and the modules read > from and write into incomplete data structures.
I cannot attest to how easy it is to do in ANTLR, but with the right object-model and the right parsing model, it is not difficult to do. In partciular, we structure our parsing library, Yacc++ and the Language Objects Library around exactly such a methaphor.
So, what is the right object-model. Fairly simple. The "input object" must be a separate class from the lexer and parser objects and the class must be polymorphic. I think the ANTLR and JavaCC both have input objects as a separate class and access that class polymorphicly.
Now, what is the right parsing model. There are two choices.
1) One can have an "event-driven" parsing model--that's what Yacc++ uses. In an event-driven model, the control flow is inverted, the input object drives the lexer when it has new input and the lexer drives the parser when it has new tokens. Part of successfully having an event driven model is having all the state of the driven objects be in the object and none of it implicit in call stack (i.e. no persistent local variables and no use of calling hierarchy to encode position). The second restriction rules out recursive descent, since that methodology is rooted in keeping the parser position in the call hierarchy.
2) If one doesn't have an event-driven parsing model, the other possibility is using buffers (and possibly processes) to simulate it. I've seen several parsers follow this approach. In the buffer model, each level, doesn't directly communicate with the next level "up", but instead fills a buffer (i.e. it is a producer). The level above also doesn't read from the level below, it reads from the buffer (i.e. it is a consumer). When the buffer is empty the top levels "stall" and wait for the lower level to put something in the buffer. If one is using separate processes, one uses some form of processes syncronization (e.g. semaphores) to control access to the buffer. However, one can also implement the scheme in a single process environment. For example, I've seen several compilers that run the lower level anytime the buffer is empty and when running the lower level, run it until the buffer is full (or nearly full).
Note, in actuality the Language Objects Library does both. It uses inverted control flow because that has certain nice properties (in particular, it gives most of the advantages of having separate processes without the overhead--the inverted flow control acts like the semaphores causing the top-level processes to run exactly when they have data available) and it has buffers that fill and empty. The only thing not implemented in the Language Objects Library is putting the producers and consumers in separate processes (or threads).
In any case, the input object is separate from the grammar and there are no grammar changes to use a streaming methodology. The effect of streaming is entirely in the "run-time library". If you want to adapt ANTLR to support such an input model, you won't change anything in your grammar. It will all be in ancillary code.
Of course, the next question comes down to how much of the ANTLR support is contained in the code generated for each grammar and how much is in the run-time library. With Yacc++, very little is actually generated by the parser generator--it is effectively mostly some tables of integers and some class declarations. Other parser generators have struck other balances. In general, recursive descent parser generators tend to generate more of the mechanics as part of the generated code and less in a library, because they depend on the "host language semantics" to get the implicit parser position via the calling hierarchy.
However, I know that ANTLR has some routines, that do the lesxer parser interface in a "library", so it is not doomed to be hopeless. On a similar positive note, ANTLR has more the one "code generator" which means it is feasible to change the generated output if the semantic hooks do need to be embedded there.
I wish I could give you a more concrete answer as to how easy the job will be. It may be trivial as someone may have already done the work for you (i.e. someone else may have needed the exact model you desire and made the appropriate changes). It may also be quite difficult, not from the conceptual level, but only due to the "irrelevant" details you may have to resolve to make such a hack.
Note, if the support isn't trivial, I would look at some other parser generators, as it is possible that someone has written a parser generator/library that does exactly what you want. Now, not all of them will be of ANTLR's quality, which is worth considering, but they may be good enough for your needs.
Best of luck, -Chris
***************************************************************************** Chris Clark Internet : compres@world.std.com Compiler Resources, Inc. Web Site : http://world.std.com/~compres 23 Bailey Rd voice : (508) 435-5016 Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours) ------------------------------------------------------------------------------
|
|
|