They always say data conversions are a labor of love – you pour your heart and soul into the program, to get the most data converted as possible, and then, when you actually do run the code in production – it’s a one time event; toss the source files in the bit bucket.
Well, that’s what it was [almost] like in my first job – we sold Property Management systems along with implementation services, and those services invariably included a conversion from the customer’s legacy system. I say [almost], because we were reasonably thoughtful about the process; the code evolved into a data pump-like structure, where data from any source was converted into a standard format, then imported into our tables. And, of course, when we converted multiple customers from the same legacy systems (there were 5-10 we saw regularly), we could re-use that code.
Later on, at a different company, I saw the same “patterns” emerge in all the integration code we would create; after all, aren’t integrations a slightly more complicated form of conversion? I’m still parsing and transforming based on business rules; it’s just that I need to create stuff that traps better for exceptions and errors (and the scripts should be schedulable). You may be doing it in Gentran, but at the end of the day, it’s just a glorified filter program.
The same kind of program structures exist in our data warehouse work as well. ETL tools are just another flavor of BizTalk or GenTran or webMethods – even good ol’ DTS in SQL Server. They just add transactional smarts to the basic process.
Anyway, we’re currently implementing changes in the software (internal development) that supports our PMO. I am pushing the developers to build conversion processes that can be run and re-run on demand. The basic data pulls are in place, and we just add fields every time we need the next piece of information.
I always refer to this “design pattern” as If Exist Update Else Add …[code lang=”plain”]IF record exists THEN
UPDATE changed fields in record
ADD whole new record[/code]
If I’m writing code in VB or something like that, the pseudocode above gets the point across. In SQL, we had to do it backwards (If Not Exist Add, Then Update) …[code lang=”plain”]/* Add a record to tblContacts if we have a new Contact from the import file */
INSERT INTO dbo.tblContacts (ContactID)
SELECT ImpContactID FROM tblImportContactInfo AS s WHERE NOT EXISTS (
SELECT 1 FROM tblContacts AS d WHERE d.ContactID = s.ImpContactID
/* Update matching rows in tblContacts */
UPDATE tblContacts SET ContactName = s.FullName, ContactAddress = s.AddressLine1, ContactPhone =s.PhoneNum
FROM tblContacts AS d INNER JOIN tblImportContactInfo AS s
ON d.ContactID = s.ImpContactID WHERE (
(d.ContactName <> s.FullName)
OR (d.ContactAddress <> s.AddressLine1)
OR (d.ContactPhone <> s.PhoneNum)
OR 1d.ContactName is null) and (s.FullName is NOT null
OR 2d.ContactAddress is null) and (s.AddressLine1 is NOT null
OR 3d.ContactPhone is null) and (s.PhoneNum is NOT null
This is a pretty powerful pattern, in that it only updates the records that need to be updated – good for speed. In addition, it’s very extensible – we just add fields to this list every time we want a little more information, and they’ll all be picked up in the next run through the records.
AtDhVaAnNkCsE for your help!
Notes: [ + ]
|1.||↩||d.ContactName is null) and (s.FullName is NOT null|
|2.||↩||d.ContactAddress is null) and (s.AddressLine1 is NOT null|
|3.||↩||d.ContactPhone is null) and (s.PhoneNum is NOT null|