knowledge-database (beta)

Current group: comp.programming

what is the correct data structure to use with this problem?

what is the correct data structure to use with this problem?  
Dylan
 Re: what is the correct data structure to use with this problem?  
spinoza1111 at yahoo.com
 Re: what is the correct data structure to use with this problem?  
spinoza1111 at yahoo.com
 Re: what is the correct data structure to use with this problem?  
Dylan
 Re: what is the correct data structure to use with this problem?  
Casey Hawthorne
 Re: what is the correct data structure to use with this problem?  
Dylan
From:Dylan
Subject:what is the correct data structure to use with this problem?
Date:Tue, 18 Jan 2005 08:43:48 +0000

I have an algorithm that must iterate through a rulebase as quickly as
possible to eliminate all rules that are not applicable within the
current context. Each rule consists of one or more boolean context
queries (P1, P2, .... Pn) many of which are duplicated in other rules.
The following is a very simple example of a rulebase containing 3
rules:

R1 = {P1, P5, P7}
R2 = {P1, P4, P5, P9, P11}
R3 = {P1, P2, P7, P11}

This is a very simple example (the rulebases I'm working with contain
many more rules with hundreds or thousands of different context
queries) but I hope it clarifies.

Each of the context queries is a function object that queries the
world state and returns a Boolean value. Some of these queries contain
lengthy calculations so I want to avoid duplication and make as few
function calls as possible.

I'm thinking that some sort of tree can be utilized to reduce the
amount of function call overhead but I cannot imagine how it would
look.

I would like to know if I'm thinking along the correct lines and if
so, what type of tree I should use. I'd also be interested in any
other ideas you guys may have for representing this kind of rulebase
in a way that allows it to be iterated through quickly.

Thanks for any help.
From:spinoza1111 at yahoo.com
Subject:Re: what is the correct data structure to use with this problem?
Date:20 Jan 2005 19:17:30 -0800
It would be a structure containing data, a left node index and a right
node index.

The top level would contain no data, a left node index either null or
pointing to the first real node (the top of the tree) and a null right
node. This design takes care of the need to represent an empty tree.

The traditional way to represent is an array of structures and if
deletions are unnecessary this is OK.

But if deletes are necessary and you are in an OO environment, an array
isn't needed. Instead create and delete objects representing nodes as
the triplet (data, left node, right node). Here left node and right
node are not numeric indexes as they are in arrays but object
references (they are not "pointers").

Creating an empty tree:

tree = new node(null, null, null)

Adding a member to an empty tree:

tree.leftNode = new node(memberData, null, null)

Adding a left node to node:

node.leftNode = new node(memberData, null, null)

Deletions are a bit tricky since it's not enough to delete a node. This
simply breaks all meaningful communication with the node's descendants
resulting in clutter and eventual crash. Instead, you have to
recursively delete the node and its descendants, going all the way to
the outermost child, deleting him and then deleting his ancestors one
after the other.

However, your problem doesn't seem to involve this.

The key in an OO environment is that you no longer are responsible for
managing a memory space and in this memory space, simulating a
tree...with all the nonsense that involves, of restricting the size of
the tree to a hard coded constant and pissing me off when your software
does not scale. The OO environment allows you to make each node
including the top a proxy for the rest of the structure, whose
management is responsibility of the OO heap manager.

I know many computer science profs still want you to emulate a tree as
an array but this is nonsense.

It may be necessary to have an n-way and not binary tree:

new node(data, null, null ... null)

That's cool. Any such tree can be proven equivalent to a binary tree by
adding dummy nodes but in actual practice it seems to be better to use
an n-way tree. Sure, the nodes are an array, but it is a fixed array,
and, in typical environments, a fixed array of four byte unsigned
pointers.

Of course, in your problem, it might be a variable length array and
here this could be problematic. The solution is to make the tree node
structure two way.

Any time you have to add node 3, you instead set the right node to a
new node whose left node node 2 and whose left node is node 3, and set
the right node of its parent to the new node. The data at the new node
becomes something amusing such as Nothing, null, 0, or whussup.
From:spinoza1111 at yahoo.com
Subject:Re: what is the correct data structure to use with this problem?
Date:19 Jan 2005 04:23:42 -0800
Dylan, the simplest idea would be to have a lookaside table accessed on
a hash or direct basis by the unique number or name of each rule. Each
entry would have in it true, false, or not evaluated as the value of
query Px.

In the example you present, P1, P5 and P7 would all be evaluated in
evaluating R1. I assume from the implied logic of what you present that
IFF at least one is False, you proceed to the next rule. A further
optimization would be to abandon the evaluation "lazily" at the first
false query and move to the next, retaining the fact that the false
rule is, well, false.

Suppose P1 is True but P5 is False. P7 remains Unknown but P5 is in the
lookaside as False. You save time in each rule evaluating P1 assuming
your direct or hash table algorithm is efficient and also in rule 2.

This without an elaborate "tree".

A further refinement would be to "optimize" the rules by sorting them.
Sort them column-wise from "easy" to "hard" and sort them rowwise by
length...the number of queries in a row, which, I fancy, would
correspond to the difficulty of the row. Or, if you like, assign each
row a weight, the sum of the hardness of each query.

Of course, this would require a separate data base giving the
complexity of each query UNLESS the "source code" of each query is
available at run time.

Of course, if the query values CHANGE, my lookaside table is useless.
But if they can CHANGE you are in deep doo doo anyway for you have to
go to the user, and say, Mister User, shall I use the first value, the
last value, or what?

No, it's probably the case that you can evaluate each query once and
record its Boolean value in a hash table keyed on query name, or a
direct access table keyed on query number.

A tree, properly constructed, would give you the theoretical maximum
performance. A lookaside would give you good enough for government work
performance.

Dylan wrote:
> I have an algorithm that must iterate through a rulebase as quickly
as
> possible to eliminate all rules that are not applicable within the
> current context. Each rule consists of one or more boolean context
> queries (P1, P2, .... Pn) many of which are duplicated in other
rules.
> The following is a very simple example of a rulebase containing 3
> rules:
>
> R1 = {P1, P5, P7}
> R2 = {P1, P4, P5, P9, P11}
> R3 = {P1, P2, P7, P11}
>
> This is a very simple example (the rulebases I'm working with contain
> many more rules with hundreds or thousands of different context
> queries) but I hope it clarifies.
>
> Each of the context queries is a function object that queries the
> world state and returns a Boolean value. Some of these queries
contain
> lengthy calculations so I want to avoid duplication and make as few
> function calls as possible.
>
> I'm thinking that some sort of tree can be utilized to reduce the
> amount of function call overhead but I cannot imagine how it would
> look.
>
> I would like to know if I'm thinking along the correct lines and if
> so, what type of tree I should use. I'd also be interested in any
> other ideas you guys may have for representing this kind of rulebase
> in a way that allows it to be iterated through quickly.
>
> Thanks for any help.
From:Dylan
Subject:Re: what is the correct data structure to use with this problem?
Date:Wed, 19 Jan 2005 16:12:55 +0000
Hi Spinoza,

Thanks for your input. Most of your suggestions are ones I've already
considered and some are already in place (such as complexity rating
and lazy evaluation).

I really do need this to be as fast as possible, which is why I think
a tree is best... I'm just stumped with how the tree should look...



On 19 Jan 2005 04:23:42 -0800, spinoza1111@yahoo.com wrote:

>Dylan, the simplest idea would be to have a lookaside table accessed on
>a hash or direct basis by the unique number or name of each rule. Each
>entry would have in it true, false, or not evaluated as the value of
>query Px.
>
>In the example you present, P1, P5 and P7 would all be evaluated in
>evaluating R1. I assume from the implied logic of what you present that
>IFF at least one is False, you proceed to the next rule. A further
>optimization would be to abandon the evaluation "lazily" at the first
>false query and move to the next, retaining the fact that the false
>rule is, well, false.
>
>Suppose P1 is True but P5 is False. P7 remains Unknown but P5 is in the
>lookaside as False. You save time in each rule evaluating P1 assuming
>your direct or hash table algorithm is efficient and also in rule 2.
>
>This without an elaborate "tree".
>
>A further refinement would be to "optimize" the rules by sorting them.
>Sort them column-wise from "easy" to "hard" and sort them rowwise by
>length...the number of queries in a row, which, I fancy, would
>correspond to the difficulty of the row. Or, if you like, assign each
>row a weight, the sum of the hardness of each query.
>
>Of course, this would require a separate data base giving the
>complexity of each query UNLESS the "source code" of each query is
>available at run time.
>
>Of course, if the query values CHANGE, my lookaside table is useless.
>But if they can CHANGE you are in deep doo doo anyway for you have to
>go to the user, and say, Mister User, shall I use the first value, the
>last value, or what?
>
>No, it's probably the case that you can evaluate each query once and
>record its Boolean value in a hash table keyed on query name, or a
>direct access table keyed on query number.
>
>A tree, properly constructed, would give you the theoretical maximum
>performance. A lookaside would give you good enough for government work
>performance.
>
>Dylan wrote:
>> I have an algorithm that must iterate through a rulebase as quickly
>as
>> possible to eliminate all rules that are not applicable within the
>> current context. Each rule consists of one or more boolean context
>> queries (P1, P2, .... Pn) many of which are duplicated in other
>rules.
>> The following is a very simple example of a rulebase containing 3
>> rules:
>>
>> R1 = {P1, P5, P7}
>> R2 = {P1, P4, P5, P9, P11}
>> R3 = {P1, P2, P7, P11}
>>
>> This is a very simple example (the rulebases I'm working with contain
>> many more rules with hundreds or thousands of different context
>> queries) but I hope it clarifies.
>>
>> Each of the context queries is a function object that queries the
>> world state and returns a Boolean value. Some of these queries
>contain
>> lengthy calculations so I want to avoid duplication and make as few
>> function calls as possible.
>>
>> I'm thinking that some sort of tree can be utilized to reduce the
>> amount of function call overhead but I cannot imagine how it would
>> look.
>>
>> I would like to know if I'm thinking along the correct lines and if
>> so, what type of tree I should use. I'd also be interested in any
>> other ideas you guys may have for representing this kind of rulebase
>> in a way that allows it to be iterated through quickly.
>>
>> Thanks for any help.
From:Casey Hawthorne
Subject:Re: what is the correct data structure to use with this problem?
Date:Wed, 19 Jan 2005 08:11:04 GMT
Isn't "parsing" of rule-bases what Prolog is good at?

There are compiled versions of Prolog for faster results.

--
Regards,
Casey
From:Dylan
Subject:Re: what is the correct data structure to use with this problem?
Date:Wed, 19 Jan 2005 09:57:07 +0000
On Wed, 19 Jan 2005 08:11:04 GMT, Casey Hawthorne
wrote:

>Isn't "parsing" of rule-bases what Prolog is good at?
>
>There are compiled versions of Prolog for faster results.


I have no idea Casey and I don't have the time to learn.
   

Copyright © 2006 knowledge-database   -   All rights reserved