Forum Discussion

JeffLove's avatar
JeffLove
Qrew Trainee
3 years ago

stripping hidden characters from text strings to enable field equality

I am using a staging table to compare a .CSV import to a bunch of live data.
I have maybe 50 fields to compare and I am trying to check a few thousand records regularly and just flag the fields that have changed.

This is typically pretty straightforward as they are text fields and using a List function and a bunch of if statements, I can easily display the field names when Live Field A <> Import Field A.

My issue is lots of text fields are appearing as not equal, though they look the same when displaying them.  I have used the tricks I know to tidy up hidden characters...

var text A = SearchAndReplace([Live Field A],"\n"," ");
var text C = SearchAndReplace($B,"<br>"," ");

Trim(Upper($B))

This removes next line and carriage returns and makes all uppercase and trims extra spaces.  I apply this to both fields, they have the same length of characters, look identical, even copy/pasting them to Notepad++ and turning on hidden characters they look the same but according to QB they are not equal.

I have one more question path of enquiry... I have tried swapping between Text and Rich Text.  The <br> I think works in Rich Text but \r in text gets converted when I save to something else, perhaps that is the issue?

Any text gurus out there?
Ideas or assistance would be appreciated...

Kind Regards,
Jeff Love

3 Replies

  • I have fought this issue many times and here is what I have found as I hope this helps others out!

    The issue is with non-printing characters & there are different ways of dealing with "new lines" as Windows, Mac & Unix/Linux variants can deal with this differently. 

    Some only add <carriage return>, <New Line> or Both of them at the end of each line. yet there are actually 3 possibilities:

    • Carriage return means to return to the beginning of the current line without advancing downward. The name comes from a printer's carriage, as monitors were rare when the name was coined. This is commonly escaped as "\r", abbreviated CR, and has ASCII value 13 or 0xD.
    • Linefeed means to advance downward to the next line; however, it has been repurposed and renamed. Used as "newline", it terminates lines (commonly confused with separating lines). This is commonly escaped as "\n", abbreviated LF or NL, and has ASCII value 10 or 0xA. CRLF (but not CRNL) is used for the pair "\r\n".
    • Form feed means advance downward to the next "page". It was commonly used as page separators, but now is also used as section separators. Text editors can use this character when you "insert a page break". This is commonly escaped as "\f", abbreviated FF, and has ASCII value 12 or 0xC.

     

    So you might be able to get away with a formula like this: (Replaces the "newline" with "|"
       SearchAndReplace(
          SearchAndReplace(
             SearchAndReplace( [Notes], "r\n", "|"),
          "r", "|"),
       "\n", "|")

    If that does not work, you have other non-printable characters at play so use this site to help you see what your text actually has:
    https://www.soscisurvey.de/tools/view-chars.php

    Then you would need to UUEncode the text to "see" all of the problem characters, then run a SearchandReplace() for EACH DIFFERENT non-printable character, then UUDecode it back to text.  It is not pretty but it works well to find and remove Non-printing ASCII, Unicode & emoji Characters which cause problems, if your users & upstream processes can add these characters.

    NOTEs:

    • Quickbase HTML processing will modify what is displays on the view form, to see what is actually in the field you must EDIT the record and get the data from the field.
    • There is some issue that I nor support have been able to identify, where unicode characters in a record will cause Connected tables to throw update errors with existing records. I have found that I have to delete the offending record in Quickbase to get the new data. Luckily is it not frequent that this happens for me.

     

    I hope this will point you to a solution.

  • Ooops on my formula
    var text A = SearchAndReplace([Live Field A],"\n"," ");
    var text B = SearchAndReplace($A,"<br>"," ");

    Trim(Upper($B))


    ------------------------------
    Thanks, Jeff
    ------------------------------
    • JeffLove's avatar
      JeffLove
      Qrew Trainee

      I did find a solution to this eventually, in case others have a similar problem. 
      It turns out using the urlencode() function at the start converts all the strange characters, and there are a bunch of them into things like %0D%0A  %0A  %3A
      So you can basically trim out the garbage by SearchandReplacing all the different weird characters​ and replacing them with a space "%20"

      Upper(Trim(SearchAndReplace(SearchAndReplace(SearchAndReplace(SearchAndReplace(URLEncode([My TEXT Field]), "%0D%0A", "%20"), "%0A", "%20"), "%20", " "), "%3A", ":")))

      Then the text comparison is equivalent.



      ------------------------------
      Thanks, Jeff
      ------------------------------