Discussions

 View Only
  • 1.  Building an Importer - Pipeline

    Posted 20 days ago

    I have data (individual units/jobs) scraped from builders' websites and want to import it into a specific table. I have built an "Import" table because I want my pipeline to check for duplicates before it imports into my "Units" table. The data that I scrape is often the same units plus new ones.

    After checking for duplicates I want it to only import the unique or new units.

    Does anybody have a good reference to help, or know how I should build my pipeline? I assume I need to start with an import, then run a query against the import table and the units table, and then create records only where no match is found on a specific field that serves as the "SKU" for the unit.



    ------------------------------
    Ryan Broderick
    ------------------------------


  • 2.  RE: Building an Importer - Pipeline

    Posted 20 days ago

    Are you trying to avoid importing if the unit already exists in your target table, or are you looking for duplicates within the import set itself, or both?



    ------------------------------
    Mark Shnier (Your Quickbase Coach)
    mark.shnier@gmail.com
    ------------------------------



  • 3.  RE: Building an Importer - Pipeline

    Posted 20 days ago

    Thanks for the reply! 

    I am trying to avoid importing if the unit already exists. I was planning on using that unique ID as the indicator of whether it is already in my "Units" table. I know I will end up having duplicates in my "Import" table, but I only want one in the target table.



    ------------------------------
    Ryan Broderick
    ------------------------------



  • 4.  RE: Building an Importer - Pipeline

    Posted 20 days ago

    The technique is to create an admin record in a table with just one record and it will be record ID number one.  That gives you a convenient place to build buttons so that you can do various actions with just a click.  

    Then you would also create a scratchpad table where you can stage your scraped data before importing. You make sure that the field names match exactly with the column headings for the file you are importing.  

    Then on your admin record you can make formula URL buttons, for example one to clear the scratchpad table and another to open the import window to reload the scratchpad.

    You then construct a formula field on your scratchpad table to build the Key field of your target table. Then I like to make a formula checkbox field on your target table, called something like [Unit Exists?], with a formula of just true. Look that field up down to the scratchpad table. That gives you a convenient hook to know whether or not you want to import any individual scraped record.
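    As a sketch only (the field names here are hypothetical, since the actual key construction depends on your scraped data), the two formula fields might look like:

      // Scratchpad table: formula text field that rebuilds the Key of the target table
      Trim([Builder]) & "-" & Trim([Unit Number])

      // Target Units table: formula checkbox field [Unit Exists?]
      true

    You then relate the Units table to the scratchpad on that key and look [Unit Exists?] up down to the scratchpad; a checked scratchpad row is one that is already in Units.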

    So that deals with the issue of not overwriting existing records.

    Then build a saved table to table import to import from the scratchpad and map the fields into your target table. Then make a button on the admin table to run that import.  
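    For example (API_PurgeRecords and API_RunImport are real Quickbase API calls, but the table variable and saved-import ID here are hypothetical placeholders):

      // Button to clear the scratchpad table
      URLRoot() & "db/" & [_DBID_SCRATCHPAD] & "?a=API_PurgeRecords"

      // Button to run saved table-to-table import #1 into the target table
      URLRoot() & "db/" & Dbid() & "?a=API_RunImport&id=1"

    Depending on your app's security settings, you may also need to append an apptoken parameter to these URLs.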

    OK, that's all great, but you are bound to have duplicates in your scraped data table, and the import will fail if you have duplicates. That is where you need a formula query to detect duplicates so that you only import the first of each set of duplicates.

    If you have not been exposed to formula queries they're very powerful but the syntax is a bit more challenging.  

    Not fully tested, but you can make a field, called for example [Record ID#s of all Units similar to this one], as a formula text field that returns a semicolon-delimited list of the [Record ID#]s of all the siblings of this unit.

    ToText(
      GetFieldValues(
        GetRecords("{99.EX.'" & [Unit Identifier] & "'}"),
        3))

    Replace the 99 with the field ID of the field containing the unit identifier (the 3 is the built-in field ID of [Record ID#]).

    Observe the results: where you have duplicates, it should be a list of the Record IDs, in Record ID sequence, of the records where the unit is duplicated.

    Then you could make another field, say a formula checkbox called [Is First Duplicate?], which identifies whether the record is the first occurrence:

    ToNumber(Trim(Left([Record ID#s of all Units similar to this one], ";"))) = [Record ID#]

    So then, in your saved table-to-table import, you would filter so that the unit doesn't already exist and also so that it is the first duplicate.

       



    ------------------------------
    Mark Shnier (Your Quickbase Coach)
    mark.shnier@gmail.com
    ------------------------------



  • 5.  RE: Building an Importer - Pipeline

    Posted 20 days ago

    By the way, there is probably a whole other way to do this using pipelines but that's probably enough typing for me for one day. I prefer this method because everything happens instantly when you click. The user doesn't have to hang around waiting for a pipeline to run.  



    ------------------------------
    Mark Shnier (Your Quickbase Coach)
    mark.shnier@gmail.com
    ------------------------------



  • 6.  RE: Building an Importer - Pipeline

    Posted 19 days ago

    If you want a simpler solution with a pipeline, you can still use the staging table approach and still detect whether the unit already exists. But since your data probably includes duplicates, you can't just have the pipeline import every staged record, because that's going to throw a lot of errors whenever it hits a duplicate. The pipeline runs what they call asynchronously, which means multithreaded, all at once.

    So a way around that is: after a search step retrieves all of the staged records, inside the for-each loop you would create a bulk upsert, add one record to it, and then commit the upsert. I know it seems incredibly inefficient to create a bulk upsert for just one record at a time, but that way, when the threads run asynchronously, if the unit already exists the second occurrence will just merge in and not throw errors.
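    The loop described above might be outlined like this (step names paraphrase the Pipelines UI, and the merge field is assumed to be the unit's key field):

      Trigger: button click (or schedule)
      A. Quickbase: Search Records in the staging table
      B. For each record from A:
         B1. Quickbase: create a Bulk Upsert, target = Units table,
             merge field = [Unit Key]
         B2. Add the current record's fields as the single upsert row
         B3. Commit the upsert

    Because each commit carries a single row keyed on the merge field, a second occurrence of the same unit simply updates the row the first occurrence created, instead of failing on a duplicate key.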



    ------------------------------
    Mark Shnier (Your Quickbase Coach)
    mark.shnier@gmail.com
    ------------------------------