Create a Random sort based on the Record ID#

Question · Updated 8 months ago · Answered
I have a table that will have 40,000 records loaded and flushed out each day, so the [Record ID#] will reach 6 or 7 digits within a few days of going live.

Is there an easy formula to create a field to use as a random sort? I do not want that sort to change during the day, so it cannot be based on anything like Now(); it should be random based only on the [Record ID#].

Tomorrow will be a new day, a new set of 40,000 records and a new random sort.

I figure it would have something to do with the Mod() function. The Record IDs are likely to be created in time-of-day sequence, and I want to be able to audit a sample of the first X records on the report, but I need them spread out all through the day so the sample is not just the lowest Record ID#s.

Would it just be something like Mod([Record ID#], "and a big prime number here")?

My sample size is in the ball park of 600 records to be audited per day, so I need 600 of the 40,000 randomly at the top of the report.
QuickBaseCoach App Dev./Training, Champion

Posted 8 months ago
Matt NZ
If your records are populated steadily throughout the day rather than all at once, you could sort on a text field with a formula something like this:

var TimeOfDay DT = ToTimeOfDay([Date Created]);

Second($DT) & "." & Minute($DT) & "." & Hour($DT)
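Outside QuickBase, the effect of this reversed second.minute.hour key can be sanity-checked with a quick simulation (a sketch, not QuickBase code; the steady-arrival assumption is the one stated above, and the 40,000-record day is illustrative):

```python
# Sketch: mirror the second.minute.hour text key for records arriving
# steadily through the day (illustrative assumptions, not QuickBase code).
def reversed_time_key(t: int) -> str:
    """Build the same text key as the formula above for a time t in seconds."""
    return f"{t % 60}.{(t // 60) % 60}.{t // 3600}"

# 40,000 creation times spread evenly over the 86,400 seconds of a day.
times = [i * 86400 // 40000 for i in range(40000)]
top600 = sorted(times, key=reversed_time_key)[:600]
hours = {t // 3600 for t in top600}  # the first 600 rows span many hours
```

One caveat: the key is text, so values sort lexicographically (e.g. "10.…" before "9.…"); zero-padding each component would give strict numeric order, though for the purpose of spreading records across the day it arguably makes little difference.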
QuickBaseCoach App Dev./Training, Champion
The records will be loaded all at once by a Connected Table Sync process first thing in the morning.

So that is why I wanted to base the randomizing on the Record ID field.
QuickBaseCoach App Dev./Training, Champion
I have been thinking about this.


My current plan is to take the Record ID, divide by a prime number, then take the first 5 digits after the decimal. That should give me a number between 0 and 99,999 to use as my sort. I’m not sure if it matters what prime I divide by, but maybe a prime of at least 4 digits.

https://en.m.wikipedia.org/wiki/List_...
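As a quick sanity check of that plan outside QuickBase (a sketch; the prime 7919 and the consecutive IDs 1 through 40,000 are assumptions for illustration):

```python
# Sketch: simulate "divide Record ID by a prime, keep the first 5 decimal
# digits" for consecutive IDs. PRIME and the ID range are illustrative.
PRIME = 7919  # a 4-digit prime

def sort_key(record_id: int) -> int:
    """First 5 digits after the decimal point of record_id / PRIME."""
    frac = (record_id % PRIME) / PRIME  # fractional part of the division
    return int(frac * 100_000)          # an integer 0 .. 99,999

keys = {rid: sort_key(rid) for rid in range(1, 40_001)}
sample_ids = sorted(keys, key=keys.get)[:600]  # 600 smallest keys = the sample
```

The 600 sampled IDs do span the whole range, but note how they cluster into bands near multiples of the prime (the smallest residues of each cycle): the sample is spread across the day, yet deterministic and correlated rather than truly random.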
Ⲇanom the ultimate (Dan Diebolt), Champion
There is no source of entropy (randomness) within QuickBase's formula language or other platform features that is going to allow you to do proper sampling. So you are going to have to add that entropy before, during or after the import process through script or some other mechanism. Since you are importing through a connected table the entropy is going to have to be added before or after importing because there is no way to tap into the connected table import process. I am going to assume you looked into adding the entropy - random number - before importing and ruled that option out. So you need a script that will add a random value to the 40,000 records which I assume will have consecutive [Record ID#]s as they are imported at the same time. 

No matter what option you pursue, I think you have to account for the possibility that the import will fail or only partially complete, as 40,000 records could be a lot of data. There could be many fields in the imported data, and we are talking about a second import through script that adds an additional random field to help sample the 600 records.

I know you are smitten with this idea of using a prime number and modulo arithmetic on the [Record ID#], but this will not produce random sampling. It might appear random because no pattern is discernible to the human eye, but the sampling will be biased and correlated.

This is the most important point: in a situation where 40,000 records are imported and purged every day, I have to assume the whole point of the import is to track some statistical information. You didn't say what the imported records represent, but it could be manufacturing tolerances, stock prices, or other critical data. In some cases it might even be negligent not to use proper sampling.

Bottom line: I would write a short script that populates a numeric field with a random number after the connected table import, and make additional workflow plans for what to do when there is a problem with either of the two imports.
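A minimal sketch of the random-assignment part of such a script (the write-back to QuickBase is only hinted at in a comment, since the exact API call, table ID, and field ID depend on your app and are assumptions here):

```python
import random

def random_sort_keys(record_ids):
    """Map each Record ID# to an independent random sort key.
    Unlike a deterministic function of the ID, this is real entropy."""
    rng = random.SystemRandom()  # OS-level entropy source
    return {rid: rng.randint(0, 999_999) for rid in record_ids}

# Illustrative: assume the day's import produced consecutive IDs 1..40,000.
keys = random_sort_keys(range(1, 40_001))
# The 600 records with the smallest keys form an unbiased simple random sample.
sample = sorted(keys, key=keys.get)[:600]
# These key values would then be written back to a numeric field in the table
# (e.g. via a QuickBase API upsert keyed on [Record ID#]) after the connected
# table import completes, with error handling for the failure cases above.
```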