Discussions

 View Only
  • 1.  Building an Importer - Pipeline

    Posted 20 days ago

    I have data (individual units/jobs) scraped from builders' websites and want to import it into a specific table. I have built an "Import" table because I want my pipeline to check for duplicates before it imports into my "Units" table. The data that I scrape is often the same units plus new ones.

    After checking for duplicates I want it to only import the unique or new units.

    Does anybody have a good reference to help, or know how I should build my pipeline? I assume I need to start with an import, then run a query against the import table and the units table, and then create records only where no match is found on a specific field that serves as the "SKU" for the unit.



    ------------------------------
    Ryan Broderick
    ------------------------------


  • 2.  RE: Building an Importer - Pipeline

    Posted 20 days ago

    Are you trying to avoid importing if the unit already exists in your target table, or are you looking for duplicates within the import set itself, or both?



    ------------------------------
    Mark Shnier (Your Quickbase Coach)
    mark.shnier@gmail.com
    ------------------------------



  • 3.  RE: Building an Importer - Pipeline

    Posted 20 days ago

    Thanks for the reply! 

    I am trying to avoid importing if the unit already exists. I was planning on using that unique ID as the indicator of whether it is already in my "Units" table. I know I will end up having duplicates in my "Import" table, but I only want one in the target table.



    ------------------------------
    Ryan Broderick
    ------------------------------



  • 4.  RE: Building an Importer - Pipeline

    Posted 20 days ago

    The technique is to create an admin record in a table with just one record and it will be record ID number one.  That gives you a convenient place to build buttons so that you can do various actions with just a click.  

    Then you would also create a scratchpad table where you can stage your scraped data before importing. You make sure that the field names match exactly with the column headings for the file you are importing.  

    Then on your admin record you can make formula URL buttons, for example one to clear the scratchpad table and another to open the import window to reload the scratchpad.

    You then construct a formula field on your scratchpad table to build the Key field of your target table. Then I like to make a formula checkbox field on your target table, called something like [Unit Exists?], with a formula of just true. Look that field up down to the scratchpad table. That gives you a convenient hook to know whether or not you want to import any individual scraped record.
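    As a sketch only (the field names here are hypothetical, since the actual key construction depends on your scraped data), the two formula fields might look like:

      // Scratchpad table: formula text field that rebuilds the Key of the target table
      Trim([Builder]) & "-" & Trim([Unit Number])

      // Target Units table: formula checkbox field [Unit Exists?]
      true

    You then relate the Units table to the scratchpad on that key and look [Unit Exists?] up down to the scratchpad; a checked scratchpad row is one that is already in Units.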

    So that deals with the issue of not overwriting existing records.

    Then build a saved table to table import to import from the scratchpad and map the fields into your target table. Then make a button on the admin table to run that import.  
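    For example (API_PurgeRecords and API_RunImport are real Quickbase API calls, but the table variable and saved-import ID here are hypothetical placeholders):

      // Button to clear the scratchpad table
      URLRoot() & "db/" & [_DBID_SCRATCHPAD] & "?a=API_PurgeRecords"

      // Button to run saved table-to-table import #1 into the target table
      URLRoot() & "db/" & Dbid() & "?a=API_RunImport&id=1"

    Depending on your app's security settings, you may also need to append an apptoken parameter to these URLs.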

    OK, that's all great, but you are bound to have duplicates in your scraped data table, and the import will fail if you have duplicates. That is where you need a formula query to detect duplicates so that you only import the first of each set of duplicates.

    If you have not been exposed to formula queries they're very powerful but the syntax is a bit more challenging.  

    Not fully tested, but you can make a field, called for example [Record ID#s of all Units similar to this one], as a formula text field that returns a semicolon-delimited list of the [Record ID#]s of all the siblings of this unit.

    ToText(
      GetFieldValues(
        GetRecords("{99.EX.'" & [Unit Identifier] & "'}"),
        3))

    Replace the 99 with the field ID of the field containing the unit identifier (the 3 is the built-in field ID of [Record ID#]).

    Observe the results: where you have duplicates, it should be a list of the Record IDs, in Record ID sequence, of the records where the unit is duplicated.

    Then you could make another field, say a formula checkbox called [Is First Duplicate?], which identifies whether the record is the first occurrence:

    ToNumber(Trim(Left([Record ID#s of all Units similar to this one], ";"))) = [Record ID#]

    So then, in your saved table-to-table import, you would filter so that the unit doesn't already exist and also so that it is the first duplicate.

       



    ------------------------------
    Mark Shnier (Your Quickbase Coach)
    mark.shnier@gmail.com
    ------------------------------



  • 5.  RE: Building an Importer - Pipeline

    Posted 20 days ago

    By the way, there is probably a whole other way to do this using pipelines but that's probably enough typing for me for one day. I prefer this method because everything happens instantly when you click. The user doesn't have to hang around waiting for a pipeline to run.  



    ------------------------------
    Mark Shnier (Your Quickbase Coach)
    mark.shnier@gmail.com
    ------------------------------



  • 6.  RE: Building an Importer - Pipeline

    Posted 19 days ago

    If you want a simpler solution with a pipeline, you can still use the staging table approach and still detect whether the unit already exists. But since your data probably includes duplicates, you can't just have the pipeline import every staged record, because that's going to throw a lot of errors whenever it hits a duplicate. The pipeline runs what they call asynchronously, which means multithreaded, all at once.

    So a way around that is: after a search step retrieves all of the staged records, inside the for-each loop you would create a bulk upsert, add one record to it, and then commit the upsert. I know it seems incredibly inefficient to create a bulk upsert for just one record at a time, but that way, when the threads run asynchronously, if the unit already exists the second occurrence will just merge in and not throw errors.
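    The loop described above might be outlined like this (step names paraphrase the Pipelines UI, and the merge field is assumed to be the unit's key field):

      Trigger: button click (or schedule)
      A. Quickbase: Search Records in the staging table
      B. For each record from A:
         B1. Quickbase: create a Bulk Upsert, target = Units table,
             merge field = [Unit Key]
         B2. Add the current record's fields as the single upsert row
         B3. Commit the upsert

    Because each commit carries a single row keyed on the merge field, a second occurrence of the same unit simply updates the row the first occurrence created, instead of failing on a duplicate key.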



    ------------------------------
    Mark Shnier (Your Quickbase Coach)
    mark.shnier@gmail.com
    ------------------------------