Forum Discussion
_anomDiebolt_
8 years ago · Qrew Elite
There is no source of entropy (randomness) within QuickBase's formula language or other platform features that will allow you to do proper sampling, so you are going to have to add that entropy before, during, or after the import process through a script or some other mechanism. Since you are importing through a connected table, the entropy will have to be added before or after importing because there is no way to tap into the connected table import process. I am going to assume you looked into adding the entropy (a random number) before importing and ruled that option out. So you need a script that will add a random value to the 40,000 records, which I assume will have consecutive [Record ID#]s since they are imported at the same time.
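Here is a minimal sketch of what that script might look like against the JSON RESTful API. The realm hostname, user token, table ID, and the field ID of the new numeric [Random] field (fid 7 below) are all placeholders you would substitute with your own values, and I am assuming the default upsert behavior where including [Record ID#] (fid 3) in a row updates the existing record. Treat it as a starting point, not a drop-in solution:

```typescript
// Sketch: stamp every record in the connected table with a random number.
// Assumptions (substitute your own values): realm hostname, user token,
// table ID, fid 3 = [Record ID#], fid 7 = the new numeric [Random] field.
const REALM = "yourrealm.quickbase.com";   // placeholder
const TOKEN = "QB-USER-TOKEN your_token";  // placeholder
const TABLE = "bxxxxxxxx";                 // placeholder table ID
const RANDOM_FID = 7;                      // placeholder field ID

const headers = {
  "QB-Realm-Hostname": REALM,
  "Authorization": TOKEN,
  "Content-Type": "application/json"
};

async function stampRandomField(): Promise<void> {
  // 1. Pull the [Record ID#]s of the freshly imported records.
  const queryRes = await fetch("https://api.quickbase.com/v1/records/query", {
    method: "POST",
    headers,
    body: JSON.stringify({ from: TABLE, select: [3] })
  });
  const { data } = await queryRes.json();

  // 2. Upsert keyed on [Record ID#], writing Math.random() into the
  //    [Random] field, batched so a 40,000 row payload does not blow
  //    past request size limits.
  const rows = data.map((r: any) => ({
    3: { value: r["3"].value },
    [RANDOM_FID]: { value: Math.random() }
  }));

  const BATCH = 5000;
  for (let i = 0; i < rows.length; i += BATCH) {
    const res = await fetch("https://api.quickbase.com/v1/records", {
      method: "POST",
      headers,
      body: JSON.stringify({ to: TABLE, data: rows.slice(i, i + BATCH) })
    });
    if (!res.ok) throw new Error(`Upsert batch failed: ${res.status}`);
  }
}

stampRandomField().catch(console.error);
```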
No matter what option you pursue, I think you have to account for the possibility that the import will fail or partially complete, as 40,000 records could be a lot of data. There could be many fields in the imported data, and we are talking about a second import through script that will add an additional random field to help sample the 600 records.
I know you are smitten with this idea of using prime numbers and modulo arithmetic on the [Record ID#], but this will not produce random sampling. You might think it appears random because no pattern is discernible to the human eye, but the sampling will be biased and correlated: a modulo rule deterministically picks the same positions in the import every single day, and if the import order has any periodic structure that interacts with your modulus, entire slices of the data will never be sampled at all.
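To make that concrete, here is a small simulation (the data is invented purely for illustration): the imported rows carry a hidden periodic pattern whose period matches the chosen prime, so the modulo "sample" lands on the same phase of the cycle every time and its mean is wildly off, while a truly random sample of the same size is not:

```typescript
// Simulate 40,000 imported rows with a hidden period-67 structure,
// e.g. a feed that interleaves sources in a fixed rotation (invented data).
const N = 40000;
const values = Array.from({ length: N }, (_, i) => (i % 67 < 10 ? 1 : 0));
const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

// "Sample" via prime + modulo on consecutive Record ID#s: every record
// selected sits at the same phase of the cycle, so the sample sees only 1s.
const moduloSample = values.filter((_, i) => i % 67 === 0);

// Proper random sample of the same size for comparison.
const randomSample = Array.from(
  { length: moduloSample.length },
  () => values[Math.floor(Math.random() * N)]
);

console.log("population mean:", mean(values).toFixed(3));          // ~0.149
console.log("modulo sample mean:", mean(moduloSample).toFixed(3)); // 1.000
console.log("random sample mean:", mean(randomSample).toFixed(3)); // ~0.149
```

The prime does nothing to help here; it just makes the pattern harder to see with the naked eye.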
This is the most important point: in a situation where 40,000 records are imported and purged every day, I would have to assume the whole point of the import is to track some statistical information. You didn't say what the imported records represent, but it could be manufacturing tolerances, stock prices, or other critical data. I think in some cases it might even be negligent to not use proper sampling.
Bottom line: I would write a short script that populates a numeric field with a random number after the connected table import, and build in additional workflow considerations for what to do when there is a problem with either of the two imports.
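Once the [Random] field is populated, pulling the 600 is just a query sorted ascending on that field with a top limit, and you can sanity-check the import at the same time by comparing the record count against the 40,000 you expect. Same caveats as above: the field IDs, table ID, and expected count are assumptions you would adjust:

```typescript
// Sketch: sanity-check the import, then pull the 600-record random sample.
// Reuses REALM/TOKEN/TABLE/RANDOM_FID and headers from the sketch above.
const EXPECTED = 40000; // what a complete import should contain (assumption)

async function drawSample(): Promise<any[]> {
  const res = await fetch("https://api.quickbase.com/v1/records/query", {
    method: "POST",
    headers,
    body: JSON.stringify({
      from: TABLE,
      select: [3],
      sortBy: [{ fieldId: RANDOM_FID, order: "ASC" }],
      options: { top: 600 }
    })
  });
  const body = await res.json();

  // If the connected table import failed or only partially completed,
  // refuse to sample rather than silently report on bad data.
  if (body.metadata.totalRecords !== EXPECTED) {
    throw new Error(
      `Expected ${EXPECTED} records, found ${body.metadata.totalRecords}; ` +
      "import may have failed or partially completed."
    );
  }
  return body.data; // the 600 lowest random values = a uniform random sample
}
```

Taking the 600 smallest values of an independently assigned random number is exactly a simple random sample without replacement, which is the statistical property the modulo trick cannot give you.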