Forum Discussion

MartinSuske1
Qrew Member
4 years ago

Questions on data import

Hello Quickbase Community,

I'm looking for some experienced opinions on how to approach a current requirement.
I basically need to upsert 25,000 data rows each day.

I have little to no control over the output format of the source systems, so my main concern is:
Will I be able to perform the necessary transformations within Quickbase, or should I rely on an additional tool that sits between the source systems and Quickbase and transforms the data into a format Quickbase can handle?

I have already worked with connected tables and found them pretty smart for the task.
However, they only process comma-separated files whose headers match the Quickbase table fields.

Then I tested the pipeline CSV Handler with 25 rows of test data and had to wait almost 5 minutes for them to load. That doesn't seem to be the way to go with 25,000 rows, although it did let me flexibly change the separator and map source to target fields.

I also had a look at the pipeline "Bulk Record Set - Import with CSV", which seems suited for big loads of data, but I read about a 10,000-row limit per pipeline execution, which would mean further preparation of my source data too.
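
Just to make that limit concrete, here is the kind of extra preparation step I mean: splitting the daily file into chunks of at most 10,000 rows before a pipeline could pick them up. This is only a rough sketch; the file names are made up.

```python
# Sketch: split the daily export into chunks the 10,000-row pipeline limit can handle.
# File names are placeholders.
import csv

CHUNK_SIZE = 10_000

def write_chunk(header, rows, part):
    with open(f"daily_export_part{part}.csv", "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)

def split_csv(src_path, chunk_size=CHUNK_SIZE):
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        chunk, part = [], 1
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                write_chunk(header, chunk, part)
                chunk, part = [], part + 1
        if chunk:
            write_chunk(header, chunk, part)

split_csv("daily_export.csv")
```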

My question is: 
Is there a way of dealing with data transformation and mass data imports in Quickbase, or should I definitely use an ETL framework upfront? 

Any input is highly appreciated.

Thanks and best regards,
Martin



------------------------------
Martin Suske
------------------------------
  • Hi Martin,
    You may want to check out EZ File Importer from Juiced Tech.
    I know it is easy to use and powerful.
    Here is the link for this add-on: https://www.juicedtech.com/ez-file-importer
    I hope it helps.

    ------------------------------
    Razi D.
    Desta Tech LLC
    ------------------------------
  • Martin,
    This can be done natively without Pipelines. 
    1. Create a "scratch" temporary table with the same columns as your raw source data.
    2. Create a set of fields by formula to massage the data into the format you need. Some fields may be OK as is; some may need to be replaced with formula fields.
    3. Create a saved Table to Table import to map the fields across to your real table. You say an upsert, so that would be a "Merge" selection in the saved T2T import. The saved T2T import may also have filters to strip out rows that are not to be imported.

    I suggest setting up a single record in a new table with buttons that do actions like the following (a rough API sketch follows the list):

    Clear the scratch pad,
    Import the scratch pad,
    Count the # of records in the scratch pad, and count how many will be added vs. how many will be merged in to update existing records.
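
    As a rough sketch only, here is how those actions could also be driven from a script, assuming the classic Quickbase XML API calls API_PurgeRecords (to clear the scratch pad) and API_RunImport (to run the saved T2T import). The realm URL, table DBID, user token, and saved-import ID below are placeholders, not real values.

    ```python
    # Sketch: clear the scratch table, then run the saved Table-to-Table import.
    # All identifiers below are placeholders.
    import requests

    REALM = "https://yourrealm.quickbase.com"   # placeholder realm
    SCRATCH_DBID = "bqxxxxxxx"                  # placeholder scratch table DBID
    USER_TOKEN = "b1234_xxxx_yyyy"              # placeholder user token
    SAVED_IMPORT_ID = "10"                      # placeholder saved T2T import ID

    def purge_scratch_table():
        """Clear the scratch pad before loading the next file."""
        resp = requests.get(
            f"{REALM}/db/{SCRATCH_DBID}",
            params={"a": "API_PurgeRecords", "usertoken": USER_TOKEN,
                    "query": "{3.GT.0}"},       # Record ID# > 0, i.e. all rows
        )
        resp.raise_for_status()

    def run_saved_import():
        """Run the saved T2T (Merge) import into the real table."""
        resp = requests.get(
            f"{REALM}/db/{SCRATCH_DBID}",
            params={"a": "API_RunImport", "usertoken": USER_TOKEN,
                    "id": SAVED_IMPORT_ID},
        )
        resp.raise_for_status()

    purge_scratch_table()
    # ...load the day's rows into the scratch table here...
    run_saved_import()
    ```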

    Feel free to post back if you have questions.

    ------------------------------
    Mark Shnier (YQC)
    mark.shnier@gmail.com
    ------------------------------
    • MartinSuske1
      Qrew Member
      Hi Mark,

      Thanks for your answer. The T2T method sounds interesting and would solve the field mapping and transformation issues.

      I wonder, though, what you would suggest for the initial import. I assume I'll have to deal with CSV files separated by semicolon, tab, or comma, and I need to fill that temporary table with 25,000 rows in the first place. Which interface would provide the necessary performance and flexibility in file structure?
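
      To illustrate what I mean by flexibility in file structure: if Quickbase cannot handle it directly, I would need a pre-processing step roughly like the sketch below, which guesses the separator and renames headers to the Quickbase field names (the header mapping here is made up).

      ```python
      # Sketch: detect the separator and rewrite the file as comma-separated
      # with headers mapped to Quickbase field names. Mapping is made up.
      import csv

      HEADER_MAP = {"Kunde": "Customer", "Datum": "Date", "Menge": "Quantity"}

      def normalize_csv(src_path, dst_path):
          with open(src_path, newline="", encoding="utf-8") as src:
              sample = src.read(4096)
              src.seek(0)
              dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")
              rows = list(csv.reader(src, dialect=dialect))

          header, data = rows[0], rows[1:]
          with open(dst_path, "w", newline="", encoding="utf-8") as dst:
              writer = csv.writer(dst)
              writer.writerow([HEADER_MAP.get(h, h) for h in header])
              writer.writerows(data)

      normalize_csv("daily_export.csv", "quickbase_ready.csv")
      ```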

      I should have mentioned: In the end the whole import process should run automatically or based on some trigger. 

      Best regards,
      Martin

      ------------------------------
      Martin Suske
      ------------------------------
      • MarkShnier__You
        Qrew Legend
        For a fully automated process, you will want the scratch table to be a connected Sync table. Your automated process would drop the file there each day, into an SFTP path or into cloud storage such as Dropbox, Box, or Google Drive.

        Then, on a scheduled basis, a Pipeline can run to do the T2T copy and delete the scratch table records.
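
        As a rough illustration of the "put the file there each day" step, assuming SFTP is the drop location, something like this could run on a schedule on your side (host, credentials, and paths are placeholders):

        ```python
        # Sketch: upload the day's CSV to the SFTP path that the connected
        # Sync table watches. All connection details are placeholders.
        import paramiko

        HOST = "sftp.example.com"
        USERNAME = "qb_import"
        PASSWORD = "change-me"          # key-based auth would be better in practice
        LOCAL_FILE = "quickbase_ready.csv"
        REMOTE_PATH = "/uploads/quickbase_ready.csv"

        def upload_daily_file():
            transport = paramiko.Transport((HOST, 22))
            try:
                transport.connect(username=USERNAME, password=PASSWORD)
                sftp = paramiko.SFTPClient.from_transport(transport)
                sftp.put(LOCAL_FILE, REMOTE_PATH)   # overwrites yesterday's file
                sftp.close()
            finally:
                transport.close()

        upload_daily_file()
        ```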

        ------------------------------
        Mark Shnier (YQC)
        mark.shnier@gmail.com
        ------------------------------
  • Martin,

    Is the output format consistent? I have a client that gets data from 50 or so suppliers, and every one of them formats it differently AND will change the format without any warning. That last part makes normalizing the data particularly challenging.

    We built a normalization app where every supplier has their own table, so there are unique formulas to get the data the way it is needed. It also includes some robust error checking to look for surprises each month.
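
    Just to give a flavor of that error checking (this is a generic sketch, not our actual app; supplier names and columns are made up): each month the layout of every supplier's file is verified against what we expect before anything gets normalized.

    ```python
    # Sketch: flag layout "surprises" per supplier before normalizing.
    # Supplier names and expected columns are made up.
    import csv

    EXPECTED_COLUMNS = {
        "supplier_a": ["Item", "Qty", "Price"],
        "supplier_b": ["ArticleNo", "Amount", "UnitCost"],
    }

    def check_layout(supplier, csv_path):
        """Return a list of problems, e.g. columns that appeared or vanished."""
        with open(csv_path, newline="", encoding="utf-8") as f:
            header = next(csv.reader(f))
        expected = EXPECTED_COLUMNS[supplier]
        problems = []
        for col in expected:
            if col not in header:
                problems.append(f"{supplier}: expected column '{col}' is missing")
        for col in header:
            if col not in expected:
                problems.append(f"{supplier}: unexpected new column '{col}'")
        return problems
    ```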

    In hindsight I wish we had used a full ETL tool like Mule.

    https://www.mulesoft.com/

    The process would have been slower but the tools are much more sophisticated.


    ------------------------------
    Don Larson
    ------------------------------
    • MartinSuske1
      Qrew Member
      Good point, Don. I cannot be sure the output will remain constant once multiple data sources are attached in the future.
      Data quality can also be an issue, so error checking will be important in the process. Thanks for your input.

      ------------------------------
      Martin Suske
      ------------------------------
      • MarkShnier__You
        Qrew Legend
        Martin, if you have different data sources in the future, you would just replicate your process: have a different Sync table for each data source, do your mapping, and feed it into the centralized table in a purified, transformed form.

        ------------------------------
        Mark Shnier (YQC)
        mark.shnier@gmail.com
        ------------------------------