Saturday, February 25, 2012

Auto-increment PK - Use MS or Grow your own ?

SQL 2000

I thought I would throw this out there for some feedback from others.

I'd like to know if you feel using MS auto-increment field is a good
solution these days or should one grow their own ?

Thanks,

Me.Definitely use SQL Servers to auto-increment the primary key field.
Then there is no chance of duplicates.|||I can give you a definite Maybe.

I've used both depending on:

* Do you care that numbers may be missing in the sequence if insert
transactions get rolled back?
* Are you making the Primary Key the table's clustered index?
* Do you need to populate the value of that key into other tables within
the same procedure?

<csomberg@.dwr.com> wrote in message
news:1109115558.223984.69950@.z14g2000cwz.googlegro ups.com...
> SQL 2000
> I thought I would throw this out there for some feedback from others.
> I'd like to know if you feel using MS auto-increment field is a good
> solution these days or should one grow their own ?
> Thanks,
> Me.|||This question is bound to start a few of the regulars off on one of
their favorite arguements. Still to add my tuppence worth, I'd say it's
OK to use it under some circumstances. When and where depends entirely
on the solution you are providing and the data you are working with.

Surrogate keys in my opinion are valid and often much more simple than
a primary key of several columns. Having said that, you do get
instances with 'gaps' in the number sequence so this may not be
something you want. You could go down the route of generating a
surrogate key yourself (your original question), but I would hesitate
if IDENTITY offers you the same advantages as you may introduce more
problems depending on your approach. If possible and where logical, use
a properly defined primary key, but use your head and decide what is
most appropriate.

I know full well that some of the regulars will not agree with me and
that some will. However, it is my opinion and they are entitled to
theirs. I would suggest looking up 'Occams Razor' for anyone who
disagrees ( http://pespmc1.vub.ac.be/ASC/OCCAM'_RAZOR.html )

Going back to your question, if you 'grow your own' then this seems
reasonable if you are doing something that using IDENTITY doesn't allow
you. If it's the same, then why bother ?

Ryan

csomberg@.dwr.com wrote:
> SQL 2000
> I thought I would throw this out there for some feedback from others.
> I'd like to know if you feel using MS auto-increment field is a good
> solution these days or should one grow their own ?
> Thanks,
> Me.|||If you want an artificial key then use the feature provided - it's the
most efficient method. The harder question is, do you really want an
artificial key?

--
David Portas
SQL Server MVP
--|||(csomberg@.dwr.com) writes:
> I thought I would throw this out there for some feedback from others.
> I'd like to know if you feel using MS auto-increment field is a good
> solution these days or should one grow their own ?

Depends. If you need consecutive numbers, forget about IDENTITY. If you
want high scalability and don't want to have a hot spot on the current
key value, use IDENTITY.

Another situation where IDENTITY is difficult is when you insert many rows
in one table, and then need to know the values for inserts into a child
table.

If none of this applies, it's a toss-up. IDENTITY is somewhat simpler to
use, but there are some gotchas in odd situations. Rolling your own
is simple as well.

A general remark is that whatever method you use, don't use it for
every table. For "top" concepts like customers, orders, products it
may be inevitable. But for derived concepts that refers to other
concepts, there is rarely any need for artificial keys.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||On 23 Feb 2005 01:16:27 -0800, "Ryan" <ryanofford@.hotmail.com> wrote:

>This question is bound to start a few of the regulars off on one of
>their favorite arguements. Still to add my tuppence worth, I'd say it's

Don't you mean "tupple's worth"? <g>|||On 22 Feb 2005 15:39:18 -0800, csomberg@.dwr.com wrote:

>SQL 2000
>I thought I would throw this out there for some feedback from others.
>I'd like to know if you feel using MS auto-increment field is a good
>solution these days or should one grow their own ?
>Thanks,
>Me.

I often use a mix.

As has been pointed out here, if you need to ensure there no gaps, IDENTITY is
not your friend. To me, however, the kind of numbering that needs no gaps
should be a logical key, not a physical/surrogate key, so that's no issue.

Where I have found IDENTITY to be limiting are as follows.

1. I want to be able to merge data sets from 2 or more separate databases
without using a proprietary replication system. In this case, I like to use
some kind of variation on the GUID. Using the global key as the primary key
keeps merge processes simpler than if the global key is used in addition to a
local primary key

2. I need to generate sequences of master-detail sets in stored procedures.

In case #2, the problem is that we want to try to use set operations, not
cursors, and there's no good way to figure out the IDs of the master records
created in one query, so you can use them to create matching details in a
subsequent query. On the other hand, if you have a shared counter, you obtain
a count of the master records to be added, get the current counter value, and
update the counter, adding the master-count to its value. Reading and
updating the counter can be in its own short transaction to reduce blocking
overhead, since we should not care if we add a gap, but don't end up adding
the records with those keys.|||"Steve Jorgensen" <nospam@.nospam.nospam> wrote in message
news:1f4q11d4hnd9sa440mst33f4chnis0ffma@.4ax.com...
> On 23 Feb 2005 01:16:27 -0800, "Ryan" <ryanofford@.hotmail.com> wrote:
>>This question is bound to start a few of the regulars off on one of
>>their favorite arguements. Still to add my tuppence worth, I'd say it's
> Don't you mean "tupple's worth"? <g
From the definition of the DIF file format:
It uses the terms vector and TUPLE.
You may generally interpret vector as column and tuple as row.

From the Mary Poppins Sound Track:
With TUPPENCE for paper and strings
You can have your own set of wings
With your feet on the ground
You're a bird in a flight
With your fist holding tight
To the string of your kite

I hope I've cleared that up,

Oh my ... It seems I'm out of Pinot.|||On Thu, 24 Feb 2005 05:02:55 GMT, "David Rawheiser" <rawhide58@.hotmail.com>
wrote:

>"Steve Jorgensen" <nospam@.nospam.nospam> wrote in message
>news:1f4q11d4hnd9sa440mst33f4chnis0ffma@.4ax.com...
>> On 23 Feb 2005 01:16:27 -0800, "Ryan" <ryanofford@.hotmail.com> wrote:
>>
>>>This question is bound to start a few of the regulars off on one of
>>>their favorite arguements. Still to add my tuppence worth, I'd say it's
>>
>> Don't you mean "tupple's worth"? <g>
>From the definition of the DIF file format:
> It uses the terms vector and TUPLE.
> You may generally interpret vector as column and tuple as row.
>From the Mary Poppins Sound Track:
> With TUPPENCE for paper and strings
> You can have your own set of wings
> With your feet on the ground
> You're a bird in a flight
> With your fist holding tight
> To the string of your kite
>I hope I've cleared that up,
>Oh my ... It seems I'm out of Pinot.

LOL|||Oh no ! I'm going to be humming that all day now ! :-)

> From the Mary Poppins Sound Track:
> With TUPPENCE for paper and strings
> You can have your own set of wings
> With your feet on the ground
> You're a bird in a flight
> With your fist holding tight
> To the string of your kite|||>> I'd like to know if you feel using MS auto-increment field is a good
solution these days or should one grow their own ? <<

What did you want to use it for?

It is fine for adding a reference number to a cursor, which is a
sequential file structure. But you would never use it inside the
schema for anything. That would that your data model is all screwed up
and has the PHYSICAL state of the machine mixed with the LOGICAL data
model.

Newbies often use IDENTITY and the like to substitute for the pointer
chains they had in IDMS, IMS, TOTAL and other pre-RDBMS databases.

if they are really screwed up, they use them for keys and do not have
natural keys. There is no way to verifiy or validate the data and the
schema loses data integrity.

If I told you that the best key is the 17 digit Hebrew number which God
assigned to all things in creation, you would think I was nuts. There
is no magic universal key; you actually have to do some work when you
design a schema.

We have a lot of problems with terminology on this one, so let me get
that out of the way.

There is no such thing as a "universal, one-size-fits-all" key. Just
as no two sets of entities are the same, the attributes that make them
unique have to be found in the reality of the data. Here is my
classification of types of keys:

natural artificial exposed surrogate
================================================== ================
Constructed from reality |
of the data model | Y N N Y
|
verifiable in reality | Y N N N
|
verifiable in itself | Y Y N N
|
visible to the user | Y Y Y N

1) A natural key is a subset of attributes which occur in a table and
act as a unique identifier. They are seen by the user. You can go to
the external reality and verify them. you would also like to have some
validation rule. Example: UPC codes on consumer goods (read the
package barcode) and validate them with a check digit or a
manufacturer's website, geographical co-ordinates (get a GPS).

2) An artificial key is an extra attribute added to the table which is
seen by the user. It does not exist in the external reality, but can
be verified for syntax or check digits inside itself.

Example: the open codes in the UPC scheme which a user can assign to
his own stuff. The check digits still work, but you have to verify
them inside your own enterprise.

If you have to construct a key yourself, it takes time to deisgn them,
to invetn a validation rule, etc.

3) An "exposed physical locator" is not based on attributes in the data
model and is exposed to user. There is no way to predict it or verify
it. The system obtains a value thru some physical process in the
storage hardware totally unrelated to the logical data model. Example:
IDENTITY columns, other proprietary, non-relaitonal auto-numbering
devices.

Technically, these are not really keys at all, sinc they are attributes
of the PHYSICAL storage and are not even part of the LOGICAL data
model. But they are handy for lazry, non-RDBMS programmers who don't
want to research or think! This is the worst way to program in SQL.

4) A surrogate key is system generated to replace the actual key behind
the covers where the user never sees it. It is based on attributes in
the table. Example: Teradata hashing algorithms, pointer chains.

The fact that you can never see it or use it for DELETE and UPDATE or
create it for INSERT is vital. When users can get to them, they will
screw up the data integrity by getting the real keys and these physical
locators out of synch. The system must maintain them.

** Notice that people get "exposed physical locator" and surrogate
mixed up; they are totally different concepts. **

An appeal to authority, with a quote from Dr. Codd: "..Database users
may cause the system to generate or delete a surrogate, but they have
no control over its value, nor is its value ever displayed to them
..."(Dr. Codd in ACM TODS, pp 409-410) and Codd, E. (1979), Extending
the database relational model to capture more meaning. ACM
Transactions on Database Systems, 4(4). pp. 397-434.

This means that a surrogate ought to act like an index; created by the
user, managed by the system and NEVER seen by a user. That means never
used in queries, DRI or anything else that a user does.

Codd also wrote the following:

"There are three difficulties in employing user-controlled keys as
permanent surrogates for entities.

(1) The actual values of user-controlled keys are determined by users
and must therefore be subject to change by them (e.g. if two companies
merge, the two employee databases might be combined with the result
that some or all of the serial numbers might be changed.).

(2) Two relations may have user-controlled keys defined on distinct
domains (e.g. one uses social security, while the other uses employee
serial numbers) and yet the entities denoted are the same.

(3) It may be necessary to carry information about an entity either
before it has been assigned a user-controlled key value or after it has
ceased to have one (e.g. and applicant for a job and a retiree).

These difficulties have the important consequence that an equi-join on
common key values may not yield the same result as a join on common
entities. A solution - proposed in part [4] and more fully in [14] -
is to introduce entity domains which contain system-assigned
surrogates. Database users may cause the system to generate or delete
a surrogate, but they have no control over its value, nor is its value
ever displayed to them...." (Codd in ACM TODS, pp 409-410).

References

Codd, E. (1979), Extending the database relational model to capture
more meaning. ACM Transactions on Database Systems, 4(4). pp. 397-434

The steps for finding a key are

1) Look for an industry standard and the trusted source that maintains
it.

2) Look for a natural key in the attributes. Example: (longitude,
latitude) makes a good key for a geographical location.

3) If you must design a new identifier, plan it carefully -- especially
if people will see and use it. You have to be able to validate it in
application programs, so you need a regular expression, other syntax
rule and/or check digits. You have to be able to be verify in the
reality of the model or with a trusted source.

No comments:

Post a Comment