Forum Discussion

JenniferJuhasz's avatar
JenniferJuhasz
Qrew Cadet
2 years ago

Transitioning to UTF-8

Hi everyone,

Can any of you share with me how you navigated migrating your realm to UTF-8?

We've only had QB for 2 years, but apparently, UTF-8 wasn't enabled when we started.  Over the past year, I'm watching errors pile up because a lot of our data is in written form (vs. hard numbers).

For example, we'll have survey data and even the questions being asked contain apostrophe's and parentheses.  QB isn't handling them well, and all of these characters are generating errors.

To correct this, QB Tech Support had me install and run an encoding scanner to identify the errors in our realm, and it looks to be numbering in the thousands; nearly every record and a majority of our fields are flagged as problematic.

Some of the issues I see are that the encoding scanner doesn't tell me "what's" wrong with our data, just that this record or that field is an issue.

QB Tech Support said that to turn UTF-8 on will affect the entire realm AND I need to have all the errors fixed before they do this.

Our staff is inputting data daily; so the number of errors grows daily.

We cannot afford to spend thousands on an external company to help us scrub the special characters.  I was debating telling our staff to pause for a long weekend, so I can download the data to excel and do a find and replace function and then re-upload the data...but I'm just worried about breaking or losing data that we need.

Can any of you recommend a better way?  What did you do to manage this?

Thank you!!!

Jen



------------------------------
Jennifer Juhasz
FamilySmart
BC, Canada
------------------------------

8 Replies

  • Hi Jennifer,

    We are currently on a similar journey. When UTF-8 is enabled and you are using another 3rd party cloud service, like Workato, any non-acsii characters sent to the third party service will cause errors, this is the reason to clear the non-ascii characters. Here is what we are doing to repair the data. 

    I made a rich text formula field and entered the formula in the link below. The link below describes the formula and also the reference ascii Website where this information can be found. I have since removed several of the ascii characters in that formula that do not throw the error but this will get you started.

    What the formula does is display all the characters that will cause an error. You then can build a report and use one of three tools to clean up the data namely: Grid Edit, Find/Replace or (Export-correct-Import). You can test by turning UTF-8 on one App at a time and looking for the black diamond with the question mark. This process has to be done for each table.

    Of course if you're not using third party cloud services UTF-8 can be enabled anytime and then the data can be cleaned up when it is discovered but you won't know is hiding behind the black diamond question mark thingy. 

    Finding non- windows-1252 Characters in Text fields using a formula

    https://community.quickbase.com/discussion/finding-non-windows-1252-characters-in-text-fields-using-a-formula#bm7085aeaa-a2e5-46e1-98f5-0185cb476521

    Good luck.



    ------------------------------
    Jim Harrison
    transparency = knowledge + understanding : The Scrum Dudes
    ------------------------------
    • JenniferJuhasz's avatar
      JenniferJuhasz
      Qrew Cadet

      Hi Jim,

      Thank you so much for replying!  I think your reply was buried in my inbox with all of the other QB discussion threads, and I apologize for the delay in responding to you.

      I wasn't aware I can implement UTF-8 on an app-by-app basis; QB Tech Support told me they had to do it at the realm level - which made the project seem far more daunting (because then everything has to be perfect with no new imperfect data coming in).

      While I'm not using 3rd party apps that are directly importing to QB; we do use forms on our website to collect data, that I am building into Sync Files in Google Drive and syncing that data weekly.  

      Once I have enabled UTF-8, when importing data that contains non-ascii characters such as quotation marks, apostrophe's, or parentheses (these seem to be the biggest challenges in our data), will they continue to cause issues (in your experience and opinion)?

      Thank you again! This gives me the ability to develop a clear plan of action on how to tackle what otherwise felt like an overwhelming and impossible task.

      Take care,

      Jen



      ------------------------------
      Jennifer Juhasz
      FamilySmart
      BC, Canada
      ------------------------------
      • JimHarrison's avatar
        JimHarrison
        Qrew Champion

        Hi Jennifer,

        Enable UTF-8 per App by going to the App properties=>Advanced Settings and check the UTF-8 checkbox.

        Once UTF-8 is enabled all "newly entered" non-ascii characters display normally. The problem is dealing with the "old" non-ascii characters currently in the text fields of your Apps. All those characters need to be replaced or removed. 



        ------------------------------
        Jim Harrison
        transparency = knowledge + understanding : The Scrum Dudes
        ------------------------------
  • Just thinking out loud, but if I were to approach this, I would probably use the API to loop through all my apps, and within each app, loop through all my tables, and just convert the string fields of each table to utf-8 on the fly before inserting back into the table. If you have someone on your team that's familiar with python or could hire someone for cheap, they could probably whip something up in an hour or two.  Depends on how many tables you have and many records are in each, but the runtime for this could potentially be pretty quick.

    • JimHarrison's avatar
      JimHarrison
      Qrew Champion

      Hi Tyler,

      I think this is a great idea!
      Here are all the Quickbase objects that may contain non-ascii 1252 characters we have located so far. Some of these cannot be modified via API.

      Field values
      Field names
      Text-Multiple Choice (values even after the type has changed)
      Report names
      Report Descriptions
      Personal Report names
      Table names
      Notification customized text

      I wonder how the proposed script handles non-breaking spaces? 
      We have found the only way to locate and clear non-breaking spaces is to export to excel and then use ctrl+0160 in Find and Replace.

      There is a lot of hand work but I would guess the process could potentially take care of a bulk of the problems. 

      I will add that in some cases we found we could remove the non-ascii value (double-spaces) while others we replaced the value (single-space). It is hard to know when programmatically to choose one over the other without visually confirming the change.

      Lastly one fun filled example we used Quickbase Find/Replace to remove a non-ascii character that removed every space in the field value. A quick table restore put it back but it was a surprise to expect 200 fields (which is what the display said) to get updated and then watch as 200000 values were updated with no way to stop the process. In cases like this export/import is the most efficient.

      Anyhow, we talked with Data Collaborative about why their tool in the QB Exchange doesn't replace and the response was it was too complex.

      I'd be interested to try out a bulk removal tool if you have one, let me know and I'll give it a go. But keep in mind we are 4 Apps away from done and then hopefully we never look back.



      ------------------------------
      Jim Harrison
      transparency = knowledge + understanding : The Scrum Dudes
      ------------------------------

  • hhersch's avatar
    hhersch
    Quickbase Staff

    Hi Jennifer & others. I wanted to jump in and let everyone know that we've developed a tool which can significantly reduce the effort required to standardize data encoding. This tool will convert from Windows-1252 to UTF-8 in one sweep at an application level. While we can't guarantee it will have a 100% match rate, it should reduce the burden massively. The reason Data Collaborative couldn't do this was indeed complexity. To develop this against our APIs would require management of triggers (like notifications or integrations) as well as last modified/date modified and could cause issues for users. We've built this tool in a special way to bypass those issues. This is available in a private beta right now and should go GA next month. If you're interested in helping test it, please ensure you are registered in our beta app and reach out to me directly.



    ------------------------------
    Harrison Hersch
    ------------------------------