knowledge-database (beta)

Current group: comp.databases.

Handling auto_increment primary keys in MySQL

Handling auto_increment primary keys in MySQL  
Ethan T
 Re: Handling auto_increment primary keys in MySQL  
Christopher Browne
 Re: Handling auto_increment primary keys in MySQL  
Ethan T
 Re: Handling auto_increment primary keys in MySQL  
andrewst at onetel.com
 Re: Handling auto_increment primary keys in MySQL  
--CELKO--
 Re: Handling auto_increment primary keys in MySQL  
--CELKO--
 Re: Handling auto_increment primary keys in MySQL  
--CELKO--
 Re: Handling auto_increment primary keys in MySQL  
Ethan T
 Re: Handling auto_increment primary keys in MySQL  
Lee Fesperman
 Re: Handling auto_increment primary keys in MySQL  
ak_tiredofspam at yahoo.com
 Re: Handling auto_increment primary keys in MySQL  
--CELKO--
 Re: Handling auto_increment primary keys in MySQL  
Ethan T
From:Ethan T
Subject:Handling auto_increment primary keys in MySQL
Date:Tue, 18 Jan 2005 16:28:27 -0500
Hello all,

I'm working on an application that uses normal auto_increment primary
keys, but I'm also looking to the future when the value of this key
exceeds its bounds. At the moment, I'm working around it by simply using
an INT type, which is huge, but I'd rather set it to a more reasonable
type. The main problem is that this app often deletes big chunks of data
and then re-adds them, which means the key increases very quickly.

I suppose I could run a query before each insert to find the lowest
available value, but I'm wondering if anyone has a better way of
handling automatic primary keys other than the simple auto_increment.

--
eth'nT
From:Christopher Browne
Subject:Re: Handling auto_increment primary keys in MySQL
Date:19 Jan 2005 04:59:04 GMT
Oops! Ethan T was seen spray-painting on a wall:
> Hello all,
>
> I'm working on an application that uses normal auto_increment primary
> keys, but I'm also looking to the future when the value of this key
> exceeds its bounds. At the moment, I'm working around it by simply using
> an INT type, which is huge, but I'd rather set it to a more reasonable
> type. The main problem is that this app often deletes big chunks of data
> and then re-adds them, which means the key increases very quickly.
>
> I suppose I could run a query before each insert to find the lowest
> available value, but I'm wondering if anyone has a better way of
> handling automatic primary keys other than the simple auto_increment.

You might want to check with the vendor documentation to see if there
is some way to either:
a) Use another type
b) Use an object type that "cycles", perhaps using something
resembling the SQL standard "sequence" objects
c) Change that primary key to combine the (evidently NOT
honestly unique) "ID" field with something else that might
allow it to be unique.

Alternatively, if these generated values are unsuited for use as
primary keys, you might consider extracting primary keys from some
other source.

For instance, there are schemes for generating globally unique
identifiers aka "guids;" one was standard as part of the DCE project,
and was subsequently adopted for COM/DCOM. Most OSes have some
library available to allow you to generate guids.
--
output = reverse("gro.gultn" "@" "enworbbc")
http://cbbrowne.com/info/rdbms.html
MULTICS MAN!!!!
With his power ring PL-1, backed by the mighty resources of the powerful
H-6880, his faithful sidekick, the Fso Eagle, and his trusted gang: "The
System Daemons", he fights a never-ending battle for truth, security, and
the Honeywell Way!
-- T. Kenney
From:Ethan T
Subject:Re: Handling auto_increment primary keys in MySQL
Date:Wed, 19 Jan 2005 09:54:03 -0500
Christopher Browne wrote:
> You might want to check with the vendor documentation to see if there
> is some way to either:
> a) Use another type
> b) Use an object type that "cycles", perhaps using something
> resembling the SQL standard "sequence" objects
> c) Change that primary key to combine the (evidently NOT
> honestly unique) "ID" field with something else that might
> allow it to be unique.

Sounds good - I like choice B best, so I'll see if any such option
exists in MySQL.

--
eth'nT
From:andrewst at onetel.com
Subject:Re: Handling auto_increment primary keys in MySQL
Date:19 Jan 2005 10:09:55 -0800
--CELKO-- wrote:
> >> Your entire post seems to assume that my primary key is
> user-controlled, yet I distinctly said it's an AUTO_INCREMENT key,
> which means the
> database determines it. <<
>
> You missed the whole point. The database cannot determine a key.
The
> data model and the reality determine a key.

All right, I'll bite (and I know I'm going to regret this!)

Think of all the many situations in which data is assigned an
arbitrary, usually numeric, key. Like a posting on an internet forum,
or an accounting transaction, or a sales receipt, or a customer
identifier, etc. These can be useful, e.g. so that a customer can
phone up and say "my reference number is 1242137".

If we have a need for such arbitrary keys, why should the DBMS not
generate them for us? Maybe doing so is orthogonal to managing the
data, but it is damned convenient.

> Then this is an exposed physical locator and not a logical key at
all

No it isn't - it is an arbitrary number that bears NO RELATION WHATEVER
to a physical (or even a logical) location within the database.

> But then what is the logical key? The real PRIMARY KEY?

I do agree that real logical keys should be used instead of
auto-generated keys when possible. But it isn't ALWAYS possible, is
it? What is the natural key for a customer for example?
From:--CELKO--
Subject:Re: Handling auto_increment primary keys in MySQL
Date:19 Jan 2005 13:12:53 -0800
>> All right, I'll bite (and I know I'm going to regret this!) <<

I am not that bad! If I had any friends you could ask them :)

>> Think of all the many situations in which data is assigned an
arbitrary, usually numeric, key. Like a posting on an internet forum,
or an accounting transaction, or a sales receipt, or a customer
identifier, etc. These can be useful, e.g. so that a customer can phone
up and say "my reference number is 1242137". <<

Those are all very different cases. Accounting transactions use a huge
number of industry standards --ever see an IBAN? ABA number? Customer
identifiers usually have some intelligence in them and these days are
more likely to be your email address if you hate shopping at a real
store. Internet sites use timestamps, forum id and thread ids to
locate a message, etc.

A few months ago, someone made the same argument for his CD collection.
He argued that there was no industry standard code, completely
forgetting about the UPC code on the box and the Schwann catalog
numbers. The recording industry likes to track things .. all the way
to your hard drive :) 90+% of the time, you can find an ANSI. ISO, or
industry standard for things in a real database if you look. In the
old days, we had the excuse that we did not have the time to do the
research; today, you cannot serioulsy say "I do not want to talk to my
client and learn his business" or "I cannot spell 'GOOGLE', but I am a
professional programmer!"

>> If we have a need for such arbitrary keys, why should the DBMS not
generate them for us? Maybe doing so is orthogonal to managing the
data, but it is damned convenient.<<

Convenient? To whom? This job ain't about being convenient to the
programmer -- amateurs amuse themselves; professional protect the data
integrity of their client or employer.

For that <5% of the cases where you really are doing something new, I
have several chapter in a few books about how to create encoding
schemes. How many newbies even know what a check digit is?

>> No it isn't - it is an arbitrary number that bears NO RELATION
WHATEVER
to a physical (or even a logical) location within the database. <<

Wrong. If I put three rows into a table with an auto-numbering
pseudo-column, they get auto-numbered in temporal sequence by a counter
in the database. Very physical. if I insert a set of rows, they are
auto-numbered in the physical order they reside in the hardware, not in
one of the other (n!) ways they might have been inserted. How do I
insert the set in parallel? You cannot; they have to be queued up to
get their auto-number.

Think about it for two seconds. If they have neither logical nor
physical relationships to the row, then you would not be able to use
them to locate the row. Better yet, think about a hashing function.
The hash number has nothing to do with the data (most of them are built
from the binary representation of the data) or the physical location on
a disk drive. But you go from the hash number to the hash table to a
pointer that gives you the physical location on a disk drive. Think of
IDENTITY as bad hashing algorithm that has no collisions.

>> I do agree that real logical keys should be used instead of
auto-generated keys when possible. But it isn't ALWAYS possible, is it?
What is the natural key for a customer for example? <<

What is the best drug to prescribe for a patient? And I want one and
only one answer!! Silly, unh? Or very, very dangerous.

There is no single, magic answer. If my customers are businesses, then
I'd use their DUNS number. If my customers are website users, I'd use
their email address (and vaidate it with a password). If my customers
are credit card users, I'd use the credit card number (and perhaps tie
multiple credit cards together into a customer_id).

It takes SIX YEARS of full time employment to become a Journeyman union
carpenter in New York State. Not a Master carpenter. But any kid who
can pass an MS certification or read a book can get a job to design a
database and write code for a hospital, power plant, etc. in a year or
two.

How many of those kids would take the time to learn about and to write
a diherdal five check digit function when correctness and safety
matters in their system? From a more practical side, sequential
numbering tends to cluster physical records and lead to index tree
re-balancing. A non-repeating, near-random series is better. Ever see
anyone write the algorithms for this in their code when they could just
stick in an IDENTITY?
From:--CELKO--
Subject:Re: Handling auto_increment primary keys in MySQL
Date:20 Jan 2005 07:01:22 -0800
>> Given, say, a circuit analysis class, there are several bits of data
needed to make a unique record [sic]. Name: ECE 2040, section A,
lecture 2. I could use a primary key of "ece2040a2" .. <<

Or you could do it right and use a multi-column PRIMARY KEY. If the
business is that those columns have to be unique, then you have to
enforce it. Where did you get the idea that a key is one column?

In tree-index systems, the order of the columns in the key declaration
is important can buy you a lot of extra power; index (a,b,c) can also
search on (a,b) and (a). This is much better than a single column.
From:--CELKO--
Subject:Re: Handling auto_increment primary keys in MySQL
Date:19 Jan 2005 07:26:11 -0800
>> Your entire post seems to assume that my primary key is
user-controlled, yet I distinctly said it's an AUTO_INCREMENT key,
which means the
database determines it. <<

You missed the whole point. The database cannot determine a key. The
data model and the reality determine a key.

>> In fact, each time the user saves the data in
question, the database will recreate those keys and they'll change
every time. That's the whole problem - the numeric value of the key
keeps increasing. <<

Then this is an exposed physical locator and not a logical key at all

>> AUTO_INCREMENT exists chiefly when generating primary keys in MySQL,
so
I seriously doubt "such a creature cannot be a key". That's its entire
purpose, to be an invisible surrogate. <<

If it is invisible to the user, fine -- it is just like a pointer chain
or hash bucket (Codd has a long, boring quote about this). The system
takes care of it and you never worry about it.
But then what is the logical key? The real PRIMARY KEY?
From:Ethan T
Subject:Re: Handling auto_increment primary keys in MySQL
Date:Wed, 19 Jan 2005 11:05:42 -0500
--CELKO-- wrote:
> If it is invisible to the user, fine -- it is just like a pointer chain
> or hash bucket (Codd has a long, boring quote about this). The system
> takes care of it and you never worry about it.
> But then what is the logical key? The real PRIMARY KEY?

I see what you mean, but I'm not sure there really is a logical primary
key that can be summed up into a single value without reproducing the
rest of the data in the record. These are classes assigned to a
particular classroom (via its ID, another auto_increment, but the
'classrooms' table changes much less often). Given, say, a circuit
analysis class, there are several bits of data needed to make a unique
record. Name: ECE 2040, section A, lecture 2. I could use a primary key
of "ece2040a2", but

a) that's a string, which I would think is slower than an integer, and
b) it would have to be altered if the structure of the classes was
altered (like getting rid of sections).

Since there's no clear number that makes the record unique, I just use
an auto_increment primary key. The value isn't even referenced, I just
have it there because primary keys are needed for administration.
Modifying a record is much easier if there's a primary key, as you
probably know. I have other indices on the more useful fields.

I'm not just blindly shooting down your points, they're good ones, but I
think it's possible that such a "key" as you state doesn't always exist.

--
eth'nT
From:Lee Fesperman
Subject:Re: Handling auto_increment primary keys in MySQL
Date:Fri, 21 Jan 2005 02:04:22 GMT
--CELKO-- wrote:
>
> >> All right, I'll bite (and I know I'm going to regret this!) <<
>
> I am not that bad! If I had any friends you could ask them :)

Hey Joe, I'll vouch for you as a friend!

--
Lee Fesperman, FFE Software, Inc. (http://www.firstsql.com)
==============================================================
* The Ultimate DBMS is here!
* FirstSQL/J Object/Relational DBMS (http://www.firstsql.com)
From:ak_tiredofspam at yahoo.com
Subject:Re: Handling auto_increment primary keys in MySQL
Date:19 Jan 2005 15:25:54 -0800
>> >> No it isn't - it is an arbitrary number that bears NO RELATION
WHATEVER
to a physical (or even a logical) location within the database
.. <<
Wrong. If I put three rows into a table with an auto-numbering
pseudo-column, they get auto-numbered in temporal sequence by a counter
in the database. Very physical. if I insert a set of rows, they are
auto-numbered in the physical order they reside in the hardware, not in
one of the other (n!) ways they might have been inserted. How do I
insert the set in parallel? You cannot; they have to be queued up to
get their auto-number

....<<

Joe, this is incorrect even if an auto-numbering pseudo-column (such as
IDENTITY is MS SQL Server) is used. If there is a clustering index, it
will dictate the physical order. Yet the idenity values will
auto-increment in the same way as if clustering index did not impose a
specific physical order.

besides, there are other ways to generate a surrogate key
So this time your opponent is right
From:--CELKO--
Subject:Re: Handling auto_increment primary keys in MySQL
Date:18 Jan 2005 18:03:44 -0800
>> I'm working on an application that uses normal auto_increment
primary keys,.. <<

Actually, BY DEFINITION such a creature cannot be a key and has nothing
to do with normalizing your schema. Why don't you find an industry
standard or a natural key and use it instead of trying to mimic a
sequential tape file system in SQL? You seem need a course in data
modeling. Here are some quotes:

"Identity primary keys are for people who believe there's never time to
design a table right but there's always time to do it over." -- Ron
Soukup, former SQL Server project manager

A quote from Dr. Codd: "..Database users may cause the system to
generate or delete a surrogate, but they have no control over its
value, nor is its value ever displayed to them ..."(Dr. Codd in ACM
TODS, pp 409-410) and Codd, E. (1979), Extending the database
relational model to capture more meaning. ACM Transactions on Database
Systems, 4(4). pp. 397-434.

This means that a surrogate ought to act like an index; created by the
user, managed by the system and NEVER seen by a user. That means never
used in queries, DRI or anything else that a user does.

Dr. Codd also wrote the following:

"There are three difficulties in employing user-controlled keys as
permanent surrogates for entities.

(1) The actual values of user-controlled keys are determined by users
and must therefore be subject to change by them (e.g. if two companies
merge, the two employee databases might be combined with the result
that some or all of the serial numbers might be changed.).

(2) Two relations may have user-controlled keys defined on distinct
domains (e.g. one uses social security, while the other uses employee
serial numbers) and yet the entities denoted are the same.

(3) It may be necessary to carry information about an entity either
before it has been assigned a user-controlled key value or after it has
ceased to have one (e.g. and applicant for a job and a retiree).

These difficulties have the important consequence that an equi-join on
common key values may not yield the same result as a join on common
entities. A solution - proposed in part [4] and more fully in [14] -
is to introduce entity domains which contain system-assigned
surrogates. Database users may cause the system to generate or delete
a surrogate, but they have no control over its value, nor is its value
ever displayed to them....." (Codd in ACM TODS, pp 409-410).

References

Codd, E. (1979), Extending the database relational model to capture
more meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434
From:Ethan T
Subject:Re: Handling auto_increment primary keys in MySQL
Date:Tue, 18 Jan 2005 22:32:55 -0500
--CELKO-- wrote:
> A quote from Dr. Codd: "..Database users may cause the system to
> generate or delete a surrogate, but they have no control over its
> value, nor is its value ever displayed to them ..."

Your entire post seems to assume that my primary key is user-controlled,
yet I distinctly said it's an AUTO_INCREMENT key, which means the
database determines it. In fact, each time the user saves the data in
question, the database will recreate those keys and they'll change every
time. That's the whole problem - the numeric value of the key keeps
increasing.

AUTO_INCREMENT exists chiefly when generating primary keys in MySQL, so
I seriously doubt "such a creature cannot be a key". That's its entire
purpose, to be an invisible surrogate.

I appreciate the length of your post, but I think you misunderstand my
question. Am I mistaken?

--
eth'nT
   

Copyright © 2006 knowledge-database   -   All rights reserved