knowledge-database (beta)

Current group: comp.simulation

Choice of data structure for large event queues?

Choice of data structure for large event queues?  
Chris McDonald
 Re: Choice of data structure for large event queues?  
Thad Smith
 Re: Choice of data structure for large event queues?  
Chris McDonald
 Re: Choice of data structure for large event queues?  
Thad Smith
 Re: Choice of data structure for large event queues?  
Martijn
 Re: Choice of data structure for large event queues?  
Martijn
 Re: Choice of data structure for large event queues?  
Nicolaas Vroom
 Re: Choice of data structure for large event queues?  
moi
 Re: Choice of data structure for large event queues?  
moi
From:Chris McDonald
Subject:Choice of data structure for large event queues?
Date:Thu, 23 Dec 2004 11:02:27 -0600
Hello, I have been implementing and measuring calendar-queues to manage
time-ordered event queues in a simulation.

The basic functions of enqueuing and dequeuing work very well (very fast)
for simulations involving thousands, even hundreds of thousands,
of pending events.

However, my simulation also needs to occassionally, maybe 2% of the time,
remove a "random" event from the queue - keeping the the queue
balanced/ordered correctly for the more frequent enqueuing and dequeuing.
Of course, with calendar-queues, this degenerates to linear behaviour -
which is woeful.

Does anyone have suggestions for other data structures that I should
examine? Thanks in advance,

______________________________________________________________________________
Dr Chris McDonald E: chris@csse.uwa.edu.au
Computer Science & Software Engineering W: http://www.csse.uwa.edu.au/~chris
The University of Western Australia, M002 T: +618 6488 2533
Crawley, Western Australia, 6009 F: +618 6488 1089
From:Thad Smith
Subject:Re: Choice of data structure for large event queues?
Date:Thu, 23 Dec 2004 22:56:19 -0600
Chris McDonald wrote:
>
> Hello, I have been implementing and measuring calendar-queues to manage
> time-ordered event queues in a simulation.
>
> The basic functions of enqueuing and dequeuing work very well (very fast)
> for simulations involving thousands, even hundreds of thousands,
> of pending events.
>
> However, my simulation also needs to occassionally, maybe 2% of the time,
> remove a "random" event from the queue - keeping the the queue
> balanced/ordered correctly for the more frequent enqueuing and dequeuing.
> Of course, with calendar-queues, this degenerates to linear behaviour -
> which is woeful.

I haven't used calendar queues, but did some reading from the net.
How many buckets do you use for the events and what are the average
number of elements per bucket? The cost of deleting an element should
be basically the cost of finding the element to be deleted. If you
have a pointer to the item to be deleted, then the cost is very
small. If you have to search for it, then, to my understanding,
finding the element is proportional to the number of items in the
bucket, which is the same cost as choosing the minimum if the bucket
is unsorted or doing an insertion if the bucket is sorted. Either
way, the cost should be similar to insertion or minimum item removal.

Thad
From:Chris McDonald
Subject:Re: Choice of data structure for large event queues?
Date:Sat, 25 Dec 2004 18:35:24 -0600
Thad Smith writes:

>I haven't used calendar queues, but did some reading from the net.
>How many buckets do you use for the events and what are the average
>number of elements per bucket? The cost of deleting an element should
>be basically the cost of finding the element to be deleted. If you
>have a pointer to the item to be deleted, then the cost is very
>small. If you have to search for it, then, to my understanding,
>finding the element is proportional to the number of items in the
>bucket, which is the same cost as choosing the minimum if the bucket
>is unsorted or doing an insertion if the bucket is sorted. Either
>way, the cost should be similar to insertion or minimum item removal.

Thanks for the reply Thad.
Here's my reply to another response to another reply, which may better
explain my problem.

-------
However, I probably didn't explain my problem well enough. I have
hundreds or thousands of events to be maintained in time-sorted ordered
(often termed a simulation's pending event set).
Each event has a time value, used to (keep it) sorted in the queue, and
one or more data fields, for example an integer part-number.

My problem is not in enqueuing or dequeuing, as that runs in O(1) time
with the calendar queue. Similarly deleting a found event is easy -
either mark it as a zombie and leave it in the queue, or adjust each
bucket's queue and resize the whole calendar if necessary.

My problem is first finding an event with a given, say, part-number. I
do not have the time of the event, and so must locate the required
part-number with a linear search. Woeful when the queue holds tens of
thousands of events, and a few thousand are to be processed per second.

Ideally I'd like to keep the calendar queue for the O(1) enqueuing and
dequeuing, but will also need to "merge" it with another of data structure
used to record the additional keys (of, of say, the part-numbers).
I can anticipate with other fields are required for the deltions.
As the random deltions (based on part-numbers) are far less frequent,
something O(log n) or O(sqrt n) would be fine.

For many simulations I think this would be a relatively common
requirement, so I wonder what others may use. Off to investigate threaded
hash-tables for the secondary keys....

______________________________________________________________________________
Dr Chris McDonald E: chris@csse.uwa.edu.au
Computer Science & Software Engineering W: http://www.csse.uwa.edu.au/~chris
The University of Western Australia, M002 T: +618 6488 2533
Crawley, Western Australia, 6009 F: +618 6488 1089
From:Thad Smith
Subject:Re: Choice of data structure for large event queues?
Date:Sun, 26 Dec 2004 18:23:05 -0600
Chris McDonald wrote:

> Thad Smith writes:
>
>>I haven't used calendar queues, but did some reading from the net.
>>How many buckets do you use for the events and what are the average
>>number of elements per bucket? The cost of deleting an element should
>>be basically the cost of finding the element to be deleted.
>
> My problem is not in enqueuing or dequeuing, as that runs in O(1) time
> with the calendar queue. Similarly deleting a found event is easy -
> either mark it as a zombie and leave it in the queue, or adjust each
> bucket's queue and resize the whole calendar if necessary.
>
> My problem is first finding an event with a given, say, part-number. I
> do not have the time of the event, and so must locate the required
> part-number with a linear search. Woeful when the queue holds tens of
> thousands of events, and a few thousand are to be processed per second.

My recommendation would be to either
1) build an index of part numbers in the queue. The index would point
to the queue entry. A linked hash table may work well. The queue entry
should point to the hash entry for fast removal. I think that is what
you suggested.

2) maintain a set of items to be deleted. As each item is removed from
the queue, it is checked against the set. A tree, skip list, or linked
hash table would probably work OK. If a hash table, make it larger than
the expected number of deleted entries in queue to make searching fast.

In the second case the set is much smaller, since it contains only
queued but deleted items, and would probably be faster if the deleted
item check is fast.

BTW, this is an area outside my normal work. Can you explain how O(1)
enqueuing and dequeuing is accomplished?

Thad
From:Martijn
Subject:Re: Choice of data structure for large event queues?
Date:Tue, 28 Dec 2004 09:28:56 -0600
> 2) maintain a set of items to be deleted. As each item is removed
> from the queue, it is checked against the set. A tree, skip list, or
> linked hash table would probably work OK. If a hash table, make it
> larger than the expected number of deleted entries in queue to make
> searching fast.

This is what I would do, I guess. But 2% of a large number is still a
pretty large number. Would binary trees not be an applicable solution for
fast searching?


On antother node, if I understand correctly, each event "points" to a part.
If a "random" event needs to be deleted, because something happened to that
part, maybe it is best to store that information in the part itself. So
mark a part as "zombie" or whatever and if necessary mark at what time this
happened.

This would be harder to clean up, probably, but that can be solved by using
yet another memory structure which contains the zombie'd part along with the
time of the last event that is in the chain (that way you are sure that all
pending events for that part are caught). There are different options for
this data structure as well, but I would use something like a combination of
a circular and expandable array (an array consiting of multiple smaller -
but still rather large - arrays in memory, and making this construction
circular - faster possibilities may exists which I have not experienced with
yet or simply more suitable for your exact problem). This chain is
automatically sorted by time due to its nature, so adding and removing items
should only take O(1).

But I think it is almost inevitable that you are going to sacrifice memory
over speed increase (given an "optimal" implementation of a certain
algorithm).


Just my 2 cents...

--
Martijn
http://www.sereneconcepts.nl
From:Martijn
Subject:Re: Choice of data structure for large event queues?
Date:Wed, 29 Dec 2004 02:15:33 -0600
Sorry 'bout my post last night, I must have been so completely wasted my
software even forgot to put in the signature! :P

--
Martijn
http://www.sereneconcepts.nl
From:Nicolaas Vroom
Subject:Re: Choice of data structure for large event queues?
Date:Sat, 08 Jan 2005 06:58:03 -0600
Hello Chris

Reading your signature file I would expect you are an expert
in this subject and as such I have great doubts If I can be of any
help, but still let me give you a try.

If I understand you correct your general problem is how to
retrieve a particular record out of a collection of for example
100 thousand records in one random access file.

First let me describe the fields of each record in file 1:
record #, record file 1 #, date, time, Juliandate, part number, description

Let me assume that new records are stored at the end of the file
not sorted i.e. in random order as created.
record # and record file 1 # are the same.
Julian date is a real number, combination of date and time
See for Details:
Astronomy with your Personal computer by Peter Duffett Smith
or Astronomical Algorithms by jean Meeus

Suppose you want to find a record with a particular Juliandate
what is your best strategy
IMO you should create the following sorted file:
record #, record file 1 #, Juliandate, next record #
This sorted file contains the Juliandates in ascending
order.
Initially next record # = record # +1
Suppose this file contains 100 sorted records and you want
to add record 101.
Record 101 should have been stored at record position 57.
What do you do?
Record 101 is stored at position 101.
next record # of record 56 is changed from 57 to 101.
next record # of record 101 is changed from 102 to 57

You can do the same for new records 102, 103, 104 etc
The problem is this file contains still 100 sorted items
but the tail of the file becomes messy and finding files
in the tail becomes time consuming.
The only solution is to do a clean up of this "sorted" file,
(with a type of copy new and delete old operation)
remove all holes and put the records in the tail in the correct
order.

This clean up does not effect random access file 1

Suppose you also want to find a record with a particular
part number what is your best strategy ?
IMO you should than create the following sorted file:
record #, record file 1 #, partnumber, next record #
Adding a record is the same as with Juliandate
except that the next record numbers calculated will be different
because the ordered position will be different.

To find a particular Juliandate in the ordered Juliandate
file is rather straight forward, of course I hope that
those dates are equal distributed and are not clustered
around certain specific values.
(If requested I will give more details)
The same for partnumbers or any key you want.

What is the language you are using ?
Visual Basic ?

Hope this helps

Nicolaas Vroom
http://users.pandora.be/nicvroom/


"Chris McDonald" schreef in bericht
news:cql0tf$r7k$2@enyo.uwa.edu.au...
> Thad Smith writes:
>
> >I haven't used calendar queues, but did some reading from the net.
> >How many buckets do you use for the events and what are the average
> >number of elements per bucket? The cost of deleting an element should
> >be basically the cost of finding the element to be deleted. If you
> >have a pointer to the item to be deleted, then the cost is very
> >small. If you have to search for it, then, to my understanding,
> >finding the element is proportional to the number of items in the
> >bucket, which is the same cost as choosing the minimum if the bucket
> >is unsorted or doing an insertion if the bucket is sorted. Either
> >way, the cost should be similar to insertion or minimum item removal.
>
> Thanks for the reply Thad.
> Here's my reply to another response to another reply, which may better
> explain my problem.
>
> -------
> However, I probably didn't explain my problem well enough. I have
> hundreds or thousands of events to be maintained in time-sorted ordered
> (often termed a simulation's pending event set).
> Each event has a time value, used to (keep it) sorted in the queue, and
> one or more data fields, for example an integer part-number.
>
> My problem is not in enqueuing or dequeuing, as that runs in O(1) time
> with the calendar queue. Similarly deleting a found event is easy -
> either mark it as a zombie and leave it in the queue, or adjust each
> bucket's queue and resize the whole calendar if necessary.
>
> My problem is first finding an event with a given, say, part-number. I
> do not have the time of the event, and so must locate the required
> part-number with a linear search. Woeful when the queue holds tens of
> thousands of events, and a few thousand are to be processed per second.
>
> Ideally I'd like to keep the calendar queue for the O(1) enqueuing and
> dequeuing, but will also need to "merge" it with another of data structure
> used to record the additional keys (of, of say, the part-numbers).
> I can anticipate with other fields are required for the deltions.
> As the random deltions (based on part-numbers) are far less frequent,
> something O(log n) or O(sqrt n) would be fine.
>
> For many simulations I think this would be a relatively common
> requirement, so I wonder what others may use. Off to investigate threaded
> hash-tables for the secondary keys....
>
>
____________________________________________________________________________
__
> Dr Chris McDonald E: chris@csse.uwa.edu.au
> Computer Science & Software Engineering W:
http://www.csse.uwa.edu.au/~chris
> The University of Western Australia, M002 T: +618 6488 2533
> Crawley, Western Australia, 6009 F: +618 6488 1089
>
From:moi
Subject:Re: Choice of data structure for large event queues?
Date:Sun, 26 Dec 2004 17:31:52 -0600
Chris McDonald wrote:

> My problem is first finding an event with a given, say, part-number,I
> do not have the time of the event, and so must locate the required
> part-number with a linear search. Woeful when the queue holds tens of
> thousands of events, and a few thousand are to be processed per second.
>

So basically the event queue and the part-thingies are stored in
different data structures, because they live in different worlds.
- queue: sorted on time, contains enough 'key'-data to find the
corresponding thingies.

- thingies: identifies by 'part-number'. they contain no foreign key.

My solution is simple: add a foreign-key field to the thingies.
either (a pointer to) the queue-node or the 'time-scheduled' .

BTW if the 'part-number' does not change during flight, you could also
choose some lineair adressing scheme.

Finally: My gutfeeling is that your queue is heterogenous, which could
make a smart-union like solution possible.

> For many simulations I think this would be a relatively common
> requirement, so I wonder what others may use. Off to investigate threaded
> hash-tables for the secondary keys....

(in my view, the event-queue is the secondary key. The first beeing
the'part-number' )

That would make the O a lot bigger. In a previous reply (which got
lost), I suggested a _separate_ hash table for the events to drop on
retrieval. Would be a lot leaner. (but kind of ugly ;-)

> ______________________________________________________________________________
> Dr Chris McDonald E: chris@csse.uwa.edu.au
> Computer Science & Software Engineering W: http://www.csse.uwa.edu.au/~chris
> The University of Western Australia, M002 T: +618 6488 2533
> Crawley, Western Australia, 6009 F: +618 6488 1089
>

HTH,
AvK
From:moi
Subject:Re: Choice of data structure for large event queues?
Date:Sun, 26 Dec 2004 17:31:46 -0600
Thad Smith wrote:

> I haven't used calendar queues, but did some reading from the net.
> How many buckets do you use for the events and what are the average
> number of elements per bucket? The cost of deleting an element should
> be basically the cost of finding the element to be deleted. If you
> have a pointer to the item to be deleted, then the cost is very
> small. If you have to search for it, then, to my understanding,
> finding the element is proportional to the number of items in the
> bucket, which is the same cost as choosing the minimum if the bucket
> is unsorted or doing an insertion if the bucket is sorted. Either
> way, the cost should be similar to insertion or minimum item removal.
>
> Thad
>

I agree.

Some more suggestions:
1) create an 'invalid' field inside the node, test it on retrieving.
this assumes you have a pointer to the node (or can find it), so
it will perform the same as thad's suggestion.
2) add a prev-pointer to the node, or preferably a handle-field (pointer
to the parent pointer). this will make insertions and retrievals
costlyer.

3) make a separate "ignore" table (typically a hash). Instead of
deleting the actual item, you remember to forget it. You do need to have
good "keyfields", and retrieval will be a bit more costly.


NB I 'd never heard of the calendar queue before. At first sight it
seemed to me thar the 'resize' operation is very costly. Could a more
efficient 'rehash' be invented ?

HTH,
AvK
   

Copyright © 2006 knowledge-database   -   All rights reserved