Forum Discussion

OwenMorgan's avatar
OwenMorgan
Qrew Member
6 years ago

Tool for merging records

We are importing 60k records that are currently one ITEM record, we are splitting these into CONTACTS. CASES and *ITEMS.

Because no contacts existed previously we will get one contact created for every ITEM

Some contacts have a postcode or email address which will show us if they are, for example the same person
eg. 
A Jones (andyjones@emailaccount.com)
Andy Jones (andyjones@emailaccount.com)

We need a tool that will identify possible matches, allow us to agree which matches are correct and reject erroneous ones and then update all relevant records with the CONTACT :[RECORD ID] that is to be kept and delete the duplicate contacts. 

NB we are already working with a QBSP, but we are opening to buying in a third party tool if someone has devloped one. 



------------------------------
Owen Morgan
------------------------------
  • Owen,

    You are cleaning up and normalizing your data.  There is a lot of ways to do this but I am not aware of a single product that claims to do this.  I am sure that the major ETL tools like MuleSoft or Talend can solve the issue, but you will have to know how to develop the logic in those to extract and clean the data.  However you are already subscribed to a the most flexible platform in the PaaS market, Quick Base.

    Create a new Quick Base application.  I always call it Normalization.   
    Import all 60K records into a blank table called Raw Data and have it auto create the fields.

    Now you need to build the reports and additional tables to clean up your Raw Data

    Assume that in Raw Data you have the field [Email Address] and this is going to determine if you have a unique Contact
    Create a Table Called Contact
    Use Excel to DeDup your 60K records on the field Email Address.   You will now have less than 60K records
    Import those to the Contacts Table
    Relate the Contact Table to the Raw Data table but use Email Address as the Reference Field

    Now you will have all unique Contacts and they will be related to full array of Demographic information in the Raw Data table


    Add a field to Raw Data [Approved Contact Record]
    Now reviewing the Contact Records you can do a Grid Edit and mark which record in Raw Data is going to be approved to go to your other application.


    Repeat this for Cases and Items

    This will take time and effort but you can systematically build the rules and relationships to go through and clean up your Raw Data for Export from the Normalization application to your production system with multiple tables.

    On the hand if you want to write out your rules, I know a Mule developer that can knock this out, but it is a standard development project.


    ------------------------------
    Don Larson
    Paasporter
    Westlake OH
    ------------------------------
    • OwenMorgan's avatar
      OwenMorgan
      Qrew Member
      Thanks Don, 
      We were considering this way of doing it but I also know that many bespoke databases already come with such features built in. I was really hoping we could do it another way. Partly because if duplicates are created in future we would be able to continue to combine/merge records.
      Thanks for these clear instructions which no doubt will be helpful if we are unable to find a tool 
      Owen

      ------------------------------
      Owen Morgan
      ------------------------------
      • DonLarson's avatar
        DonLarson
        Qrew Elite
        Owen,

        Quick Base does have some features that can help you but does not have a workflow that automatically does this.   On each field you can set whether it is required and unique.



        You will probably want these set in your production application.  However if this is a Sales app, your sales team might have a strong opinion on requiring the data.

        To your second point, new records after you go live will probably need a different process.   Cleaning full data sets for initial load is not the same as adding a record that will take on life cycle from scratch.  You need a data validation strategy for new records so that when they get entered you have new, non duplicative info.    Assuming that everything starts with a Contact and there is a business process where the Case and Item develop over time, requiring unique email addresses might solve it for you.

        Here is another strategy.  I have got a client with a large sales staff.  People constantly come and go.  A couple years ago it became clear that starting their data in the main system was a disaster.   We created a dedicated Onboard Offboard application for managing people.  Once the work flows were complete there validating a bunch , then the people are entered into the main system by the Ops team.




        ------------------------------
        Don Larson
        Paasporter
        Westlake OH
        ------------------------------