Posts Tagged ‘Composite Key’

I ran into an interesting situation today helping a colleague with her homework.  This person does not have a background in programming or databases and is taking a series of courses related to HIT (health information technology).  In this course she was presented with an ERD (entity relationship diagram) which she had to modify based on the requirements of a given CCD (continuity of care document).  This is the diagram this person was given.

ERD Example

We have to remember this diagram was given to someone with almost no relational database experience.  I can clearly see a few issues where I would have done things differently.  I would assert that the technology one is using, and perhaps more importantly, the technology one has had experience with alters your view.  What do I mean by this?  A developer with 10 or more years has probably spent a lot of time with raw SQL before moving on to working with entities where a developer with only a year or two of experience may have never even worked with raw SQL.  A developer in a small firm with little QA that has had to run analysis queries at 3am versus a 9-5 developer at a large corporation with CMMI level 5 documentation and more than adequate QA will also change your view. This also can be determined by the technology.  I had a very experienced developer of 20+ years who had spent almost all of his time working on the Microsoft stack.  He only knew the very rudimentary basics of SQL as he always had a DBA to assist him with performance and optimization of queries (in fact I believe he just gave the DBA the basic of what was needed and depended on him).  This developer was able to easily pick up working with straight SQL and eventually created some advanced queries that were ingenious.

Getting back on topic, this diagram got me to thinking about the drawbacks of not using a sequence number on the row data and instead using a composite primary key.  In one of my past life’s we had a sequence number (acting as a persistent object identifier) on almost every table in the database.  I saw a discussion recently which was quite involved where the DBA’s where eschewing the concept of an identity column.  What are the advantages and disadvantages of using a sequence number on a table? From what I can tell:

Advantages:

  • Normalization
    • Join Dependency – DBA’s will often argue that a sequence number is superfluous and not needed.  A developer will often complain that it convolutes the join they need to code.  It really gets down to the details of the table.  In the ERD above to perform a join on the encounter to patient we have to reference two columns to obtain a truly unique id.  If the sequence number is a primary key we would be able to simplify this greatly.  In a typical entity this is a non-issue.
    • Consistency
      • Consistency across tables isn’t a big issue when using entities, you code once, and load the entity in the persistence manager and you’re done.  However, at 3am when a developer needs to perform an analysis quickly, half asleep it becomes an issue.  Say that you have to find a count of encounter per encounter date for patients with the last name starting with ‘A’.
 select count(e.encounter_id), to_char(e.encounter_date, ‘MM/dd/yyyy’) from person p
left outer join patient pa on pa.person_oid=p.person_oid and pa.person_id=p.person_id
left outer join encounter e on e.patient_oid=pa.patient_oid and e.patient_id=pa.patient_id
where upper(p.last_name) like ‘A%’ group by to_char(e.encounter_date, ‘MM/dd/yyyy’) 

Ugh, that’s gnarly. Why the separate table for a patient object? Remember I didn’t make these. Reformatted for simplicity we have.

Our SQL query now appears as:

select count(e.seqno), to_char(e.encounter_date, ‘MM/dd/yyyy’) from patient pa
left outer encounter e on e.patient=pa.seqno
where upper(pa.last_name) like ‘A%’ group by to_char(e.encounter_date, ‘MM/dd/yyyy’)

The query is simplified and is clearer to the developer at 3am.  Having consistent columns for uniqueness is advantageous.  The developer of course still has to recognize what index’s exist on the table but knowing the seqno is always an indexed column that can be directly referenced is advantageous.  In this case the LAST_NAME column would also most likely be indexed on the PATIENT table.

  • Abstraction from business modeling
    • Using a sequence number as a persistent object identifier abstracts your identity for the row from the business model logic.  Changing the business model logic will not affect the key and no changes will be necessary to other tables.  In the initial example the OID and ID combination’s may change.  A year later a new object model is implemented for the system, you could possibly have the same patient twice.  Once for the previous OID and another for the new OID.