Extracting a block of text from a large block of text

  • 0
  • 1
  • Question
  • Updated 7 months ago
  • Answered
We have a database that collects records from a Microsoft Exchange Connected data source.

The body of the email can vary in text volume. But one consistency is this block of information:

 

Incident Reported : 04/19/2018 11:30AM

Operating Company : Company Name

Location : 9999

Address : 12345 Anywhere Rd.

City, State, Zip : Arlington, VA 22206

Associate Name : JOHN DOE

Telephone Number : 512-123-4567

Description : DESCRIPTION OF THE INCIDENT BEING REPORTED. CAN BE ANY LENGTH OF CHARACTERS.

Notified Vendor: N

First Vendor Contacted: None

Brand Name: A NAME

UPC:

 

We’d like to be able to extract and isolate the above block to its own text field, then manipulate it from there.

Any help is appreciated.

Stephanie
Photo of Stephanie

Stephanie

  • 596 Points 500 badge 2x thumb

Posted 7 months ago

  • 0
  • 1
Photo of QuickBaseCoach App Dev./Training

QuickBaseCoach App Dev./Training, Champion

  • 51,346 Points 50k badge 2x thumb
You are in luck. You have a semi colon as s delimiter

Trim(
Part([My Text field], 2, ":"))

Just keep increasing the 2 to get the different parts.
Photo of Stephanie

Stephanie

  • 596 Points 500 badge 2x thumb
Thank you. I'll try that.
Photo of Ⲇanom the ultimate (Dan Diebolt)

Ⲇanom the ultimate (Dan Diebolt), Champion

  • 26,262 Points 20k badge 2x thumb
Bad idea to parse on "semicolons" (I think you mean colons).

Colons can be part of the data as in "11:30AM".

The way to do this is with script using regular expressions. Additionally using formulas will wastefully recompute what should be a one time parsing.

CORRECTION: QuickBase probably does not recompute formulas where the inputs do not change.

I  put this on my to do list.


(Edited)
Photo of Stephanie

Stephanie

  • 596 Points 500 badge 2x thumb
Since any amount of information can be present ahead of the desired block of information, the provided solution only works after the isolation of the block. We need to plow past any amount of data that is ahead of the desired block first, like CCs and BCCs, etc. We are also capturing the RawBody of the email. The block we want to isolate is prefaced by “<b>” markup.

There is also any amount of data after the desired block. Our objective for now is to isolate that needed block of information into its own field, and then from there we will be able to use the Part() function to pull it apart even more.

Any help is appreciated.

Photo of QuickBaseCoach App Dev./Training

QuickBaseCoach App Dev./Training, Champion

  • 51,346 Points 50k badge 2x thumb
Did you try the low tech method I described?
Photo of Stephanie

Stephanie

  • 596 Points 500 badge 2x thumb
Yes
Photo of Ⲇanom the ultimate (Dan Diebolt)

Ⲇanom the ultimate (Dan Diebolt), Champion

  • 26,522 Points 20k badge 2x thumb
It is almost always a better idea to clean up and parse your data before importing it into a new system (here QuickBase). I am going to assume you  have already looked at this option and are accepting the fact that the un-parsed data has landed in your application.

You used the word "consistency" in describing the information you want to parse out of the text field. More than likely your data is a lot less consistent than you assume as information could appear out of order, could be missing, or contain unexpected characters. One example of the data is insufficient to derive its actual format and structure. Because of this variability is probably best to initially manually supervise the parsing of the data rather than completely automate the process.

Here is a quick demo - Click the Parse button to parse the fields and click the Reset button to reset the fields:

Regexp Capturing Groups
https://haversineconsulting.quickbase.com/db/bnm9bcsys?a=er&rid=1

Pastie Databsase
https://haversineconsulting.quickbase.com/db/bgcwm2m4g?a=dr&rid=652

Notes:

(1) To simplify the maintenance of the demo I have made the [Text] field readonly on the HTML form and provided a Reset button to clear the parsed fields.

(2) For the City, State and Zip information I have parsed the string into individual fields.

(3) This is just a simple demo of parsing data out of a field using script and regular expressions. In fact I wrote the script to allow a user unfamiliar with regular expressions to modify the script without having to understand every detail. However, regular expressions are so powerful that it could easily be adopted for much more complex parsing requirements.

If you need further assistance with implementing this solution feel free to contact me off-world using the information in my profile:

https://getsatisfaction.com/people/dandiebolt/
(Edited)
Photo of Stephanie

Stephanie

  • 596 Points 500 badge 2x thumb

This works, but, you have to copy over the text from the source to a regular text box, save the record, then go back in and click the parse button.

Is there a line of jQuery we can put in the module.js that will automatically copy over the Connected data field [body] to the regular text field [Bodytwo]?

thank you,

Photo of Stephanie

Stephanie

  • 596 Points 500 badge 2x thumb
Also, can we automate selecting the parse button?
Photo of Ⲇanom the ultimate (Dan Diebolt)

Ⲇanom the ultimate (Dan Diebolt), Champion

  • 26,522 Points 20k badge 2x thumb
>This works, but ...

All that can be automated by binding the button to a script that runs a loop over all the records in a query, parses out the results and imports the parsed text in the appropriate fields / table. Those details are specific to your setup. Additionally, you will still need a button to kick off the automatic parsing process (the loop) as I don't know of a way to run script after a connected table updates.

However, I would still approach the problem but using a button on a single records so you can supervise and observe the parsing. It is unlikely that that one example of data captures the true variability of you incoming data and my regular expression may fail in special cases. For example, (1) missing or extra spaces in the City, Stat and Zip information or (2) multiple newline characters in the Description information will cause the current regular expressions to fail. Regular Expression are powerful enough to address the variance in your data but you have to learn these characteristics by manually supervising the script parsing before going for a full automation.

If you need further assistance with implementing this solution feel free to contact me off-world using the information in my profile:

https://getsatisfaction.com/people/dandiebolt/
Photo of Stephanie

Stephanie

  • 596 Points 500 badge 2x thumb
Thank you. We will try this and see how it goes.