|
|
 | | From: | Ethan T | | Subject: | Handling auto_increment primary keys in MySQL | | Date: | Tue, 18 Jan 2005 16:28:27 -0500 |
|
|
 | Hello all,
I'm working on an application that uses normal auto_increment primary keys, but I'm also looking to the future when the value of this key exceeds its bounds. At the moment, I'm working around it by simply using an INT type, which is huge, but I'd rather set it to a more reasonable type. The main problem is that this app often deletes big chunks of data and then re-adds them, which means the key increases very quickly.
I suppose I could run a query before each insert to find the lowest available value, but I'm wondering if anyone has a better way of handling automatic primary keys other than the simple auto_increment.
-- eth'nT
|
|
 | | From: | Christopher Browne | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | 19 Jan 2005 04:59:04 GMT |
|
|
 | Oops! Ethan T was seen spray-painting on a wall: > Hello all, > > I'm working on an application that uses normal auto_increment primary > keys, but I'm also looking to the future when the value of this key > exceeds its bounds. At the moment, I'm working around it by simply using > an INT type, which is huge, but I'd rather set it to a more reasonable > type. The main problem is that this app often deletes big chunks of data > and then re-adds them, which means the key increases very quickly. > > I suppose I could run a query before each insert to find the lowest > available value, but I'm wondering if anyone has a better way of > handling automatic primary keys other than the simple auto_increment.
You might want to check with the vendor documentation to see if there is some way to either: a) Use another type b) Use an object type that "cycles", perhaps using something resembling the SQL standard "sequence" objects c) Change that primary key to combine the (evidently NOT honestly unique) "ID" field with something else that might allow it to be unique.
Alternatively, if these generated values are unsuited for use as primary keys, you might consider extracting primary keys from some other source.
For instance, there are schemes for generating globally unique identifiers aka "guids;" one was standard as part of the DCE project, and was subsequently adopted for COM/DCOM. Most OSes have some library available to allow you to generate guids. -- output = reverse("gro.gultn" "@" "enworbbc") http://cbbrowne.com/info/rdbms.html MULTICS MAN!!!! With his power ring PL-1, backed by the mighty resources of the powerful H-6880, his faithful sidekick, the Fso Eagle, and his trusted gang: "The System Daemons", he fights a never-ending battle for truth, security, and the Honeywell Way! -- T. Kenney
|
|
 | | From: | Ethan T | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | Wed, 19 Jan 2005 09:54:03 -0500 |
|
|
 | Christopher Browne wrote: > You might want to check with the vendor documentation to see if there > is some way to either: > a) Use another type > b) Use an object type that "cycles", perhaps using something > resembling the SQL standard "sequence" objects > c) Change that primary key to combine the (evidently NOT > honestly unique) "ID" field with something else that might > allow it to be unique.
Sounds good - I like choice B best, so I'll see if any such option exists in MySQL.
-- eth'nT
|
|
 | | From: | andrewst at onetel.com | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | 19 Jan 2005 10:09:55 -0800 |
|
|
 | --CELKO-- wrote: > >> Your entire post seems to assume that my primary key is > user-controlled, yet I distinctly said it's an AUTO_INCREMENT key, > which means the > database determines it. << > > You missed the whole point. The database cannot determine a key. The > data model and the reality determine a key.
All right, I'll bite (and I know I'm going to regret this!)
Think of all the many situations in which data is assigned an arbitrary, usually numeric, key. Like a posting on an internet forum, or an accounting transaction, or a sales receipt, or a customer identifier, etc. These can be useful, e.g. so that a customer can phone up and say "my reference number is 1242137".
If we have a need for such arbitrary keys, why should the DBMS not generate them for us? Maybe doing so is orthogonal to managing the data, but it is damned convenient.
> Then this is an exposed physical locator and not a logical key at all
No it isn't - it is an arbitrary number that bears NO RELATION WHATEVER to a physical (or even a logical) location within the database.
> But then what is the logical key? The real PRIMARY KEY?
I do agree that real logical keys should be used instead of auto-generated keys when possible. But it isn't ALWAYS possible, is it? What is the natural key for a customer for example?
|
|
 | | From: | --CELKO-- | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | 19 Jan 2005 13:12:53 -0800 |
|
|
 | >> All right, I'll bite (and I know I'm going to regret this!) <<
I am not that bad! If I had any friends you could ask them :)
>> Think of all the many situations in which data is assigned an arbitrary, usually numeric, key. Like a posting on an internet forum, or an accounting transaction, or a sales receipt, or a customer identifier, etc. These can be useful, e.g. so that a customer can phone up and say "my reference number is 1242137". <<
Those are all very different cases. Accounting transactions use a huge number of industry standards --ever see an IBAN? ABA number? Customer identifiers usually have some intelligence in them and these days are more likely to be your email address if you hate shopping at a real store. Internet sites use timestamps, forum id and thread ids to locate a message, etc.
A few months ago, someone made the same argument for his CD collection. He argued that there was no industry standard code, completely forgetting about the UPC code on the box and the Schwann catalog numbers. The recording industry likes to track things .. all the way to your hard drive :) 90+% of the time, you can find an ANSI. ISO, or industry standard for things in a real database if you look. In the old days, we had the excuse that we did not have the time to do the research; today, you cannot serioulsy say "I do not want to talk to my client and learn his business" or "I cannot spell 'GOOGLE', but I am a professional programmer!"
>> If we have a need for such arbitrary keys, why should the DBMS not generate them for us? Maybe doing so is orthogonal to managing the data, but it is damned convenient.<<
Convenient? To whom? This job ain't about being convenient to the programmer -- amateurs amuse themselves; professional protect the data integrity of their client or employer.
For that <5% of the cases where you really are doing something new, I have several chapter in a few books about how to create encoding schemes. How many newbies even know what a check digit is?
>> No it isn't - it is an arbitrary number that bears NO RELATION WHATEVER to a physical (or even a logical) location within the database. <<
Wrong. If I put three rows into a table with an auto-numbering pseudo-column, they get auto-numbered in temporal sequence by a counter in the database. Very physical. if I insert a set of rows, they are auto-numbered in the physical order they reside in the hardware, not in one of the other (n!) ways they might have been inserted. How do I insert the set in parallel? You cannot; they have to be queued up to get their auto-number.
Think about it for two seconds. If they have neither logical nor physical relationships to the row, then you would not be able to use them to locate the row. Better yet, think about a hashing function. The hash number has nothing to do with the data (most of them are built from the binary representation of the data) or the physical location on a disk drive. But you go from the hash number to the hash table to a pointer that gives you the physical location on a disk drive. Think of IDENTITY as bad hashing algorithm that has no collisions.
>> I do agree that real logical keys should be used instead of auto-generated keys when possible. But it isn't ALWAYS possible, is it? What is the natural key for a customer for example? <<
What is the best drug to prescribe for a patient? And I want one and only one answer!! Silly, unh? Or very, very dangerous.
There is no single, magic answer. If my customers are businesses, then I'd use their DUNS number. If my customers are website users, I'd use their email address (and vaidate it with a password). If my customers are credit card users, I'd use the credit card number (and perhaps tie multiple credit cards together into a customer_id).
It takes SIX YEARS of full time employment to become a Journeyman union carpenter in New York State. Not a Master carpenter. But any kid who can pass an MS certification or read a book can get a job to design a database and write code for a hospital, power plant, etc. in a year or two.
How many of those kids would take the time to learn about and to write a diherdal five check digit function when correctness and safety matters in their system? From a more practical side, sequential numbering tends to cluster physical records and lead to index tree re-balancing. A non-repeating, near-random series is better. Ever see anyone write the algorithms for this in their code when they could just stick in an IDENTITY?
|
|
 | | From: | --CELKO-- | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | 20 Jan 2005 07:01:22 -0800 |
|
|
 | >> Given, say, a circuit analysis class, there are several bits of data needed to make a unique record [sic]. Name: ECE 2040, section A, lecture 2. I could use a primary key of "ece2040a2" .. <<
Or you could do it right and use a multi-column PRIMARY KEY. If the business is that those columns have to be unique, then you have to enforce it. Where did you get the idea that a key is one column?
In tree-index systems, the order of the columns in the key declaration is important can buy you a lot of extra power; index (a,b,c) can also search on (a,b) and (a). This is much better than a single column.
|
|
 | | From: | --CELKO-- | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | 19 Jan 2005 07:26:11 -0800 |
|
|
 | >> Your entire post seems to assume that my primary key is user-controlled, yet I distinctly said it's an AUTO_INCREMENT key, which means the database determines it. <<
You missed the whole point. The database cannot determine a key. The data model and the reality determine a key.
>> In fact, each time the user saves the data in question, the database will recreate those keys and they'll change every time. That's the whole problem - the numeric value of the key keeps increasing. <<
Then this is an exposed physical locator and not a logical key at all
>> AUTO_INCREMENT exists chiefly when generating primary keys in MySQL, so I seriously doubt "such a creature cannot be a key". That's its entire purpose, to be an invisible surrogate. <<
If it is invisible to the user, fine -- it is just like a pointer chain or hash bucket (Codd has a long, boring quote about this). The system takes care of it and you never worry about it. But then what is the logical key? The real PRIMARY KEY?
|
|
 | | From: | Ethan T | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | Wed, 19 Jan 2005 11:05:42 -0500 |
|
|
 | --CELKO-- wrote: > If it is invisible to the user, fine -- it is just like a pointer chain > or hash bucket (Codd has a long, boring quote about this). The system > takes care of it and you never worry about it. > But then what is the logical key? The real PRIMARY KEY?
I see what you mean, but I'm not sure there really is a logical primary key that can be summed up into a single value without reproducing the rest of the data in the record. These are classes assigned to a particular classroom (via its ID, another auto_increment, but the 'classrooms' table changes much less often). Given, say, a circuit analysis class, there are several bits of data needed to make a unique record. Name: ECE 2040, section A, lecture 2. I could use a primary key of "ece2040a2", but
a) that's a string, which I would think is slower than an integer, and b) it would have to be altered if the structure of the classes was altered (like getting rid of sections).
Since there's no clear number that makes the record unique, I just use an auto_increment primary key. The value isn't even referenced, I just have it there because primary keys are needed for administration. Modifying a record is much easier if there's a primary key, as you probably know. I have other indices on the more useful fields.
I'm not just blindly shooting down your points, they're good ones, but I think it's possible that such a "key" as you state doesn't always exist.
-- eth'nT
|
|
 | | From: | Lee Fesperman | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | Fri, 21 Jan 2005 02:04:22 GMT |
|
|
 | --CELKO-- wrote: > > >> All right, I'll bite (and I know I'm going to regret this!) << > > I am not that bad! If I had any friends you could ask them :)
Hey Joe, I'll vouch for you as a friend!
-- Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com) ============================================================== * The Ultimate DBMS is here! * FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)
|
|
 | | From: | ak_tiredofspam at yahoo.com | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | 19 Jan 2005 15:25:54 -0800 |
|
|
 | >> >> No it isn't - it is an arbitrary number that bears NO RELATION WHATEVER to a physical (or even a logical) location within the database .. << Wrong. If I put three rows into a table with an auto-numbering pseudo-column, they get auto-numbered in temporal sequence by a counter in the database. Very physical. if I insert a set of rows, they are auto-numbered in the physical order they reside in the hardware, not in one of the other (n!) ways they might have been inserted. How do I insert the set in parallel? You cannot; they have to be queued up to get their auto-number
....<<
Joe, this is incorrect even if an auto-numbering pseudo-column (such as IDENTITY is MS SQL Server) is used. If there is a clustering index, it will dictate the physical order. Yet the idenity values will auto-increment in the same way as if clustering index did not impose a specific physical order.
besides, there are other ways to generate a surrogate key So this time your opponent is right
|
|
 | | From: | --CELKO-- | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | 18 Jan 2005 18:03:44 -0800 |
|
|
 | >> I'm working on an application that uses normal auto_increment primary keys,.. <<
Actually, BY DEFINITION such a creature cannot be a key and has nothing to do with normalizing your schema. Why don't you find an industry standard or a natural key and use it instead of trying to mimic a sequential tape file system in SQL? You seem need a course in data modeling. Here are some quotes:
"Identity primary keys are for people who believe there's never time to design a table right but there's always time to do it over." -- Ron Soukup, former SQL Server project manager
A quote from Dr. Codd: "..Database users may cause the system to generate or delete a surrogate, but they have no control over its value, nor is its value ever displayed to them ..."(Dr. Codd in ACM TODS, pp 409-410) and Codd, E. (1979), Extending the database relational model to capture more meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434.
This means that a surrogate ought to act like an index; created by the user, managed by the system and NEVER seen by a user. That means never used in queries, DRI or anything else that a user does.
Dr. Codd also wrote the following:
"There are three difficulties in employing user-controlled keys as permanent surrogates for entities.
(1) The actual values of user-controlled keys are determined by users and must therefore be subject to change by them (e.g. if two companies merge, the two employee databases might be combined with the result that some or all of the serial numbers might be changed.).
(2) Two relations may have user-controlled keys defined on distinct domains (e.g. one uses social security, while the other uses employee serial numbers) and yet the entities denoted are the same.
(3) It may be necessary to carry information about an entity either before it has been assigned a user-controlled key value or after it has ceased to have one (e.g. and applicant for a job and a retiree).
These difficulties have the important consequence that an equi-join on common key values may not yield the same result as a join on common entities. A solution - proposed in part [4] and more fully in [14] - is to introduce entity domains which contain system-assigned surrogates. Database users may cause the system to generate or delete a surrogate, but they have no control over its value, nor is its value ever displayed to them....." (Codd in ACM TODS, pp 409-410).
References
Codd, E. (1979), Extending the database relational model to capture more meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434
|
|
 | | From: | Ethan T | | Subject: | Re: Handling auto_increment primary keys in MySQL | | Date: | Tue, 18 Jan 2005 22:32:55 -0500 |
|
|
 | --CELKO-- wrote: > A quote from Dr. Codd: "..Database users may cause the system to > generate or delete a surrogate, but they have no control over its > value, nor is its value ever displayed to them ..."
Your entire post seems to assume that my primary key is user-controlled, yet I distinctly said it's an AUTO_INCREMENT key, which means the database determines it. In fact, each time the user saves the data in question, the database will recreate those keys and they'll change every time. That's the whole problem - the numeric value of the key keeps increasing.
AUTO_INCREMENT exists chiefly when generating primary keys in MySQL, so I seriously doubt "such a creature cannot be a key". That's its entire purpose, to be an invisible surrogate.
I appreciate the length of your post, but I think you misunderstand my question. Am I mistaken?
-- eth'nT
|
|
|