User Profile

MikeHammond

Quickbase Staff

Joined 6 years ago

4 Posts, 9 Likes

Contributions

"Floating Point" Problems, Explanations, and Workarounds
We get a lot of questions about surprising behavior when numbers have digits after the decimal point. Sometimes numbers don't look right; sometimes they don't seem to behave right mathematically. Often someone thinks they've found a bug, and it's difficult to explain how the behavior is normal and that nearly all computer software shares the same behavior.

As a software QA guy with a mathematics background (see my introduction in the second paragraph of my first Quickbase community post), I find this kind of question particularly interesting. Let me explain a little bit about how computers represent numbers and do math. Then I'll show a few common questions that people ask and walk through explanations - and workarounds, where possible.

A quick note: most of the examples discussed here involve formulas. That is just because formulas are usually the easiest way to illustrate the point. The same caveats and information apply to any part of Quickbase where you are comparing numbers. This includes, but is not limited to, report filters, custom data rules, and permissions.

Floating-Point Arithmetic

The phrase that computer people throw around to describe how computers do math is "floating-point representation." This phrase refers to a standard, called ANSI/IEEE 754, that describes how computers are expected to represent fractional numbers (I'm only giving you this link so you can fact-check me later if you want; you don't need to go there now). Most popular computer chips, operating systems, and languages have been following this standard, at least more-or-less, since the early 1990s. I think it's fair to say that all popular computer operating systems and languages follow this standard today. That means that most software applications, like spreadsheets, databases, Quickbase, and so on, also follow this standard.

Let me show you one example in JavaScript (outside of Quickbase). Use this w3schools sample on the JavaScript “toFixed” function.
Change var n = num.toFixed(2) to var n = num.toFixed(16)
Click “Run” at the top.
Click “Try it” on the right.

Notice that your browser shows the value 5.5678900000000002 even though we gave it the number 5.56789.

Let me explain the two main things that are going on with this standard for computer mathematics.

There's a limited amount of space to store a number. The amount of space a computer has to store a number means it can store about sixteen digits' worth of stuff before it has to chop off the rest. So here's a very simple example: Say you have a formula that computes 1/3. This cannot be stored exactly. The computer stores the number as something very close to 0.3333333333333333 and has to chop it off there. Computer mathematics has no special notion of fractions that are somehow simple; all numbers are stored the same way, as "just numbers".

Computers don't do math in decimal; they do it in binary. Binary writes every number using only the digits 1 and 0. The result is that most numbers that have digit(s) after the decimal point cannot be exactly represented in binary. This is a little harder for some people to understand and accept than the idea of limited space. It's pretty easy to accept that 1/3 can't be represented exactly, because you can see how it looks in decimal. But it turns out that most numbers that only have a few digits after the decimal point can't be represented exactly in binary.

If you are already familiar with this, skip down to the next point ("It's easy to forget that displayed decimals and actual precision are different things."). If you are interested in some more context, let's dig into this a little more.

Say you have two measuring sticks. One is a super-precise meter stick. It has a big mark at one meter, smaller marks every decimeter, smaller marks every centimeter, smaller marks every millimeter, and so on down to sixteen levels' worth of marks.
Any (decimal) number that you can write with sixteen or fewer digits after the decimal point will correspond exactly to some mark on this meter stick. The smallest marks are 1/10000000000000000 of a meter apart. (That big number is a one followed by sixteen zeros.)

Now say you have a super-precise yardstick. Let's ignore the big marks at each foot, and start with the smaller marks every inch. Below that, there are smaller marks every half-inch, smaller marks every quarter-inch, even smaller marks every eighth-inch, and so on. If this stick has about fifty-two different sizes of marks, the interval between two of the tiniest lines will be 1/9007199254740992 of an inch. See how there are sixteen digits in that big number? That means that there are about as many marks between each inch, on this stick, as there are between meters on the other one.

But the difference is very important. One difference is that we can't represent, say, 1/10 of an inch exactly:

It's less than a half an inch (0.5").
It's less than a quarter inch (0.25").
It's less than 1/8 of an inch (0.125").
It's more than 1/16 of an inch (0.0625").
It's more than 3/32 of an inch (0.09375").
It's less than 7/64 of an inch (0.109375").
It's less than 13/128 of an inch (0.1015625").
It's more than 25/256 of an inch (0.09765625").
It's more than 51/512 of an inch (0.099609375").
It's less than 103/1024 of an inch (0.1005859375").
It's less than 205/2048 of an inch (0.10009765625").
It's more than 409/4096 of an inch (0.099853515625").
It's more than 819/8192 of an inch (0.0999755859375").

I'm going to stop there, but I want to make two points:

(1) This eventually settles into a pattern. If you write 1/10 out in binary, it's 0.0001100110011001100… where that "1100" repeats, and this corresponds to the pattern of "less than" and "more than" on this yardstick.
(2) Even if you aren't fully aware of how this settles into a pattern, look at how the decimal expression of those fractional inches is running away. Each step of this process, we get one more digit, always ending with a five. As we get closer and closer to 0.1 inches, we're picking up more and more digits at the very end there.

Another way to say this is that the only numbers we can represent exactly on this yardstick are numbers whose fractional representation has a denominator that is some power of two. And since 1/10 has a denominator that is not a power of two, we're never going to be able to represent it exactly on this yardstick. The same goes for 1/100, 1/1000, and so on. So the vast majority of numbers that only take a few digits after the decimal point are not exactly representable to a computer, since the computer is "using a yardstick" (binary) instead of "using a meter stick" (decimal).

It's easy to forget that displayed decimals and actual precision are different things. Most software applications allow you some way to choose how many digits you wish to display after the decimal point. Many systems automatically choose to display fewer digits than would be possible when the value is very close to a short value. For example, if you have the number 0.3499999999999999, many systems will automatically choose to display this value as "0.35". In Quickbase, if you go to the field properties page for a numeric field, you'll find in the "Display" section a setting called "Decimal places". Remember that this is only changing the maximum number of digits the application uses to show you the approximate value - it does not change the actual underlying value.

Bringing that back to Quickbase, let's combine both of the above concepts. We can look at a scenario where we key in one of the above numbers. Rounded off, this "looks like" it is .10 - but it really isn't.
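The same effect can be seen outside Quickbase. The following is a minimal Python sketch, not Quickbase code; it assumes only the standard library and ordinary IEEE 754 double-precision floats. It prints the exact value that is stored when you type 0.1, confirms that the stored value is a fraction over a power of two, and shows that formatting for display leaves the stored value untouched.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) reveals the exact value the binary float holds,
# rather than the short decimal text that was typed in.
stored = Decimal(0.1)
print(stored)

# The stored value is an exact fraction, and its denominator is a
# power of two (a mark on the "yardstick"), not a power of ten.
ratio = Fraction(0.1)
is_power_of_two = ratio.denominator & (ratio.denominator - 1) == 0
print(ratio, is_power_of_two)

# Formatting to two decimal places changes only what is shown;
# the underlying value is still the slightly-off binary number.
displayed = f"{0.1:.2f}"
print(displayed)

# A classic symptom: two values that "look" equal in decimal are not.
looks_equal = (0.1 + 0.2 == 0.3)
print(looks_equal)
```

The last line prints False, which is the behavior the questions in this post keep running into.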
Frequently Asked Questions, with Explanations and Workarounds

So now that you're picturing computer arithmetic as being on a very (but not infinitely) precise yardstick, and now that you're keeping in mind that displayed decimals are different from mathematical precision, let's get into some typical questions and discuss workarounds.

I have a formula that does some math and the computer's getting the last digit wrong. What's up with that? Is that a problem?

That's just a normal outcome of the fact that computers have a limited amount of space to represent a number. The easy example to think through is if you have a formula that does 1 / 3 * 3. It's relatively easy to picture the computer doing the 1/3 part, getting 0.3333333333333333, and having to chop it off there. Once you picture that, it should be pretty easy to see that when it does the *3 part, the answer will be 0.9999999999999999 instead of exactly 1. The computer has "forgotten" that last little piece of the number after the sixteenth digit.

The trickier situation is when you do some math on fractional numbers and it looks like it should work out based on the display values you are staring at on screen. Say you have a formula that does 1 / 10 * 10. When you're thinking in decimal, it seems that the 1/10 part should just be 0.1, and then when you multiply it by 10 the answer should just be 1. But remember the computer is doing math on a yardstick. So the 1/10 part is .0001100110011001100… in binary, which has to get rounded off somewhere, just like the above example. When that number gets rounded off, and then you multiply it by 10, the little error that crept in because of the rounding off can remain. So when you do 1 / 10 * 10, you can get an answer like 1.0000000000000001 rather than simply 1, because the closest binary number to 1/10 is just a little bit bigger.

A simple visualization of this in Quickbase is mileage reimbursement. This looks quite straightforward.
But after keying in the request, we can see a few issues manifest.

Workaround: If you're only concerned about how the number looks, this is a great place to use the "Displayed decimals" property of the field. Say you reduce the displayed decimals of the result to eight digits. Quickbase will (in a manner of speaking) round off the answer to .10000000, recognize it does not need to display the trailing zeroes, and display the number as "0.1". If you're concerned about how the number behaves mathematically, keep reading.

These two numbers sure look the same to me. Why doesn't the "=" in this formula say they're the same?

This is illustrated in the mileage example above, and usually happens when at least one of the numbers is the result of some calculation - especially when you're comparing it to a fixed value with only a few decimal places, like "[Total Cost] = 19.98". Remember that the value ".98" is not exactly representable on the computer's yardstick. Nor are most of the cost values you're adding up to get to this total. Since all of these numbers are being rounded off a little bit before they get added up, it's possible we could run into a set of numbers where more of them are getting rounded in the same direction, and their sum is just a little bit different from how 19.98 gets rounded.

Saying this another way - if you were considering writing a formula that said "[Total Cost] = 33.33333333333333", and you knew your formula took simple numbers and divided them by three before adding them up, you would probably be a little wary about expecting it to work. Remember not to be fooled by a number that looks simple in decimal, like 19.98, because in binary it's going to have to get rounded off just the same.

Workaround: There are two common strategies to work around this problem.

(1) Whenever you're comparing numbers with decimal places, compare them to some kind of tolerance.
So, for example, instead of saying

If ( [Cost] = 1.1, "Yes", "No" )

in a formula, you might consider saying

If ( [Cost] > 1.09999 and [Cost] < 1.10001, "Yes", "No" )

(2) Round the values to some number of decimal places before comparing. You should round both sides of the equality to the same number of decimal places - even if one of them is just a constant! - and you should still be aware that, with this strategy, there could be some very rare cases where things don't behave exactly as you'd expect.

If ( ROUND([Cost],.00001) = ROUND(1.1,.00001), "Yes", "No" )

When I round a number to a particular decimal place, it's not handling the "point fives" consistently or correctly. Why is that?

(For example, if you're rounding to two decimal places, you might notice that 0.265 rounds up to 0.27, and 0.275 rounds up to 0.28, but 0.285 rounds down to 0.28.)

This is another side effect of the fact that the computer stores fractional numbers in binary, not decimal. That number that looks like 0.265 when you display it in decimal might actually be just a tiny bit more, so it rounds up. That number that looks like 0.285 might actually be just a tiny bit less, so it rounds down.

Workaround: The general strategy here is to round numbers as late as possible, to as many digits as possible. One example we've seen a few times now is when someone is computing a unit price for a large order. Some math gets done that comes up with a small price per item, that looks like it's got exactly half a cent in it (like the 0.285 example above). The application developer rounds this rate to the nearest cent before multiplying by the number of units. The business owner expects this to be $0.29 per unit, but Quickbase computes it as $0.28 per unit, and the one-cent difference times tens of thousands of units comes up to a hundreds-of-dollars "discrepancy". In this case, we suggest that you don't round the unit rate to two digits.
Consider rounding it to three or four digits, or even not rounding it at all and just displaying it to three or four digits, and then round the price after you multiply by the number of units.

When I display a number to a particular number of decimal places, sometimes the last digit is wrong. Sometimes it's different from what I get when I round the number to the same number of decimal places. What's happening?

Quickbase goes through different code paths when it is rounding numbers and when it is choosing how to display numbers. All it takes is a tiny little difference in the algorithms to cause rounding and display to make different decisions about that last digit.

Workaround: There really isn't a direct workaround. The only thing I know how to suggest is that you learn to expect variability in the very last digit of any fractional number. This is really the most important principle of the whole story, right here. If you learn to not expect that last digit to be exactly right, you will recognize and figure out specific workarounds to any problems like these you encounter in the future.

I have a custom key field (or I'm trying to merge on a numeric field). I'm getting duplicate entries. What's the problem?

This is another symptom of the fact that two fractional numbers can look the same, even when displayed to full precision, but be mathematically different way down in the smallest bit or two. Remember that a value that looks simple in decimal, like 1.4, is not exactly representable in binary. The value already stored in a record might have come from some mathematical operation and be the binary number just bigger than 1.4, and when you type 1.4 in directly it might be the binary number just smaller than 1.4. Those numbers are not equal, so Quickbase thinks you're adding a new record, not editing an existing one.

Workaround: As with the previous question, there is no direct workaround.
If you use fractional values in an existing key field, you are almost guaranteed to eventually run into this problem. So the first rule is don't use fractional values in a numeric key field, or other matching criteria. If it turns out that a field that has fractional values in it is natural to use as a key field, or a merge field, the best recommendation I can give you is to tweak how you define the field so that its value is always an integer. For example, if you have a [Cost] field that contains values that look like dollars and cents, and for some reason you need to use this as a merge field or a key field, I recommend that you redesign your application so that you have a [Cost in pennies] field instead, whose values are all integers. This will be safe to use as a key field, merge value or matching criteria.

Hopefully this helps. We encourage you to reach out to our Care team for assistance with specific build patterns.

------------------------------
J. Michael Hammond
Senior Software Engineer in Test
Quickbase, Inc.
------------------------------
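The workarounds described in the post above (compare within a tolerance, round both sides before comparing, and keep key values as integers) can be sketched in Python. This is an illustration under stated assumptions, not Quickbase formula code: the function names nearly_equal, equal_when_rounded, and to_pennies are hypothetical, and the tolerance of 0.00001 simply mirrors the formula examples in the post.

```python
from decimal import Decimal, ROUND_HALF_UP


def nearly_equal(a, b, tolerance=0.00001):
    """Workaround (1): treat two values as equal if they differ by less than a tolerance."""
    return abs(a - b) < tolerance


def equal_when_rounded(a, b, places=5):
    """Workaround (2): round BOTH sides to the same number of places, then compare."""
    return round(a, places) == round(b, places)


def to_pennies(amount_text):
    """Turn a dollars-and-cents string such as "19.98" into an integer count of pennies.

    Parsing the text with Decimal avoids ever storing the amount as a binary float,
    so the result is safe to use as a key or matching value.
    """
    pennies = (Decimal(amount_text) * 100).to_integral_value(rounding=ROUND_HALF_UP)
    return int(pennies)


total = 0.1 + 0.2                       # a computed value that "looks like" 0.3
exact_match = (total == 0.3)            # plain "=" comparison
print(exact_match)
print(nearly_equal(total, 0.3))
print(equal_when_rounded(total, 0.3))
print(to_pennies("19.98"))
```

Here the plain comparison fails while both workarounds succeed, and the penny conversion yields the integer 1998.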
5 years ago Place Quickbase Discussions
863 Views
5 likes
0 Comments
Re: Can't use pandas.read_json() with new QB RESTFul API
Hello Alexander!

I do a lot of Python coding against Quick Base's RESTful JSON API, but I use the requests module. I wasn't familiar at all with the pandas module. As you can imagine, we don't know every language and every module out there, and it's tough for me to get too deeply into something that's happening within a customer's code. I was curious, though, so I went and looked up some documentation on pandas. I'm guessing here, but I think your problem is one of two things.

It's possible that pandas assumes that a JSON response has one of a specific set of structures, and our JSON simply doesn't conform to any of their expectations. If this is the case, I'd suggest you use requests to just convert our response to a Python dictionary and then do your own coding (like you're already showing us) to re-format it into whatever structure pandas requires for subsequent operations.

It's also possible that if you pick the right values of the orient and typ parameters in pandas.read_json() you can get it to work. That error message you show suggests that you were running a Quick Base report, and that pandas picked up the data element in our response but then didn't know what to do with the fields and metadata elements.

Let me know if that helps!

------------------------------
J. Michael Hammond
Senior Software Engineer in Test
Quick Base
------------------------------
6 years ago Place Quickbase Discussions
3 Views
1 like
0 Comments
Re: RESTful API: An Engineering Perspective on Designing Intuitive API Responses
Hi Mark!

It's a poker story, and the one-sentence version is "I once caught a jack of diamonds on the river to win a bad-beat jackpot." There are different ways I can tell it as a live story, depending on the situation and the poker knowledge of the other people. I haven't figured out how to write out a one-paragraph version. It's either one sentence or a ten-minute read. I'll stick with API reference documentation for now - enforced writing discipline!

Thank you!

------------------------------
J. Michael Hammond
------------------------------
6 years ago Place Quickbase Discussions
52 Views
0 likes
1 Comment
RESTful API: An Engineering Perspective on Designing Intuitive API Responses
"As an API user, I would like an HTTP 200 with line-by-line errors in 'Add/Update Record'."

I hope that didn't scare you off. It's the summary line of a ticket I filed recently while testing Quick Base's new RESTful JSON API, and it needs some explanation. I'm going to use it as the excuse to introduce myself, and to discuss what it's like to be the QA guy on a team that is designing and implementing a new API layer on top of a platform that's been around as long as Quick Base has. This specific technical point deserves explaining, of course, but we'll be sure to get it into the API portal documentation somewhere as well.

About Me

I go by "JMike", which is my first initial and middle name. It sounded odd to me at first, in my teens, but after a few years it grew on me. Forty years down the road, now, it sounds completely natural.

My first software job was to find bugs in MATLAB at The MathWorks in 1991. After a few years of that, I took a break to get a master's degree in numerical analysis and take some time off to play cards. That jack of diamonds in my profile picture isn't just for "J" Mike; I have a pretty good poker story about that card if you ever want to hear it.

I came back to software testing and have been working in the Boston area ever since. I tend to work on products that are scientific or financial applications or software tools, and as a software tester I try very hard to achieve and maintain the perspective of a power user of the product. I use that perspective to help guide my judgment.

That judgment is important, especially in the early going of the development of new functionality. It's easy and comfortable to write a bug ticket saying "When I pass a negative number to foobar() it crashes", because that's entirely a verifiable observation of fact that will probably be seen as useful and important. It gets more delicate when you want to say something more like "This whole new function is undocumented.
I'm not sure what your intent was for this parameter, but can I suggest that it's not working like anyone would expect it to be used, so can you do something about that. If you're open to suggestion, how about something like this?", because there's so much judgment wrapped up in it.

Development is a Team Effort

I've been at Quick Base for almost a year now, and I've spent the last four months with the team working on our new RESTful APIs. There's a lot of legacy information to learn here! But I'm finding bugs and having productive discussions with the team, so I feel welcome and useful.

For our new RESTful APIs, one thing I've been pushing is to sharpen up the syntax throughout. Sometimes, when you're primarily the user of an API rather than the developer of that API, it's easier for you to be objective and to see its errors, gaps, or just things that make you go "huh?". I'd like to think that my background makes me a little more sensitive to syntactic consistency than some people are - and as a result I file a lot of picky syntax tickets and the rest of the team plows through them.

So with all the background, here's a chance to segue into this specific technical point. I'll try to go from general to specific here. Bear with me.

Being RESTful

When we say that the new Quick Base API is "RESTful", we're basically saying that we understand the structural and design concepts of the Representational State Transfer (REST) architecture, and that we're developing with those concepts in mind. That means doing our best to provide a system that behaves according to those concepts, and to provide an API to that system that satisfies clients' expectations about the functionality and syntax available to the user of such a system.

But that gets pretty deep. I guess the main abstract thing I've learned to be aware of in a RESTful system is that the server side might be layered, and some layer might be caching a bit of information.
And the main concrete things I've learned are what the different HTTP methods mean, what URLs tend to look like, and what HTTP response codes mean.

The result is that, in the new API, we are presenting operations in terms of standard HTTP methods, and using them in a way that we think will make intuitive sense to its users. If you're retrieving information, it will be an HTTP GET. If you're causing computation on the server, it will be an HTTP POST. If you're deleting something, it will be an HTTP DELETE.

HTTP Status Codes

If you are a builder who uses the Quick Base XML API, you may know that you're building and executing an HTTP POST call, sometimes with an XML body wrapped up in a <qdbapi> tag. You may have also seen that you usually get a response from our API with an HTTP 200 and a chunk of XML. (By "usually", I mean that if you set the X_QUICKBASE_RETURN_HTTP_ERROR parameter as described in https://help.quickbase.com/api-guide/optional_parameters.html, Quick Base will respond with an HTTP 400 when there's an API error. But that parameter is not commonly used by most of our builders today.)

To make things more intuitive, we are making an effort to return a standard HTTP return status to all calls, rather than defining - and requiring you to learn - something like our own unique home-brewed <errcode> list from the XML API. In the broadest sense this means we will return something in the 200s (usually just 200, but sometimes more specific codes when appropriate) when your request has succeeded, something in the 400s when the request fails for a reason that is usually on the client's (your) side, and something in the 500s when your request has failed for a reason that is usually on the server's (our) side.
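A client can act on that broad 200s / 400s / 500s split without knowing every individual code. The following is a minimal Python sketch of that idea; the function name classify_status and its return labels are hypothetical, chosen only for illustration, and nothing here calls the Quick Base API.

```python
def classify_status(status_code):
    """Map an HTTP status code onto the broad ranges described above."""
    if 200 <= status_code < 300:
        return "success"          # the request succeeded
    if 400 <= status_code < 500:
        return "client error"     # usually a problem on the caller's side
    if 500 <= status_code < 600:
        return "server error"     # usually a problem on the server's side
    return "other"                # informational, redirect, or unknown codes


for code in (200, 400, 404, 500):
    print(code, classify_status(code))
```

In practice a caller might retry on "server error", fix the request on "client error", and only parse the response body as a result on "success".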
We also have come up with a general agreement that when you call something in the new RESTful API, if we return a 400- or 500-level response, we will promise that nothing has changed on the server side - in other words, nothing has changed in your app - as a result of this request. We will also send you some additional details in the JSON body of the response:

{
  "message": a brief error message,
  "details": more details about the error
}

Responding to Record Adds/Updates

So let's get to the specific API Insert/Update records. In database terminology this is an "upsert" API, which allows you to create records and update others, even in the same API call. (I can't resist a little digression here - note that it is possible you might specify an existing record and "update" it with the same data it already has, in which case it doesn't change anything.)

We have made the general architectural decision that, when you attempt to upsert records, we will allow successful row(s) to go through, while reporting specific errors on other row(s) that could not be processed.

The fact that we are subdividing the request into individual rows means that each row might have a different status. And there's no industry consensus on exactly what HTTP code(s?) to return, in what structure or format, when some or all of the individual records fail in a combined request.

So we made the decision to return an HTTP 200 if an insert/update records request has the correct format and "makes sense". We then process all rows that have no error, and we include a lineErrors element in the JSON body to indicate the specific error(s) in the specific row(s) that failed.

Let me give you a specific example. Say your app has a table with ten records in it, whose record IDs run from 1 through 10. Say field 6 is the only required field in the table, and it is a numeric field.
If you submit a request to "Insert/Update Records" with this JSON body:

{
  "to": table_dbid,
  "data": [
    {"6": {"value": 100}},
    {"6": {"value": "illegal string value"}},
    {"6": {"value": 200}}
  ]
}

you will get this response:

{
  "data": [],
  "metadata": {
    "createdRecordIds": [ 11, 12 ],
    "lineErrors": {
      "2": [ "Incompatible value for field with ID \"6\"." ]
    },
    "totalNumberOfRecordsProcessed": 3,
    "unchangedRecordIds": [],
    "updatedRecordIds": []
  }
}

The way this lineErrors element got into the response is a pretty good illustration of my role on the team. Very early on in the development of this API, if you submitted a request with an error in one or more lines, you wouldn't get any direct indication about which line(s) had failed.

This fits into a pattern I've seen many times in my many years as a software tester, by the way: when you are working with an API, the normal success cases are generally much better polished than the error cases. (Sometimes it's an oversight, and sometimes - as it turns out was true in this case - the developers had thought about this case but hadn't yet decided on what the result should look like.) And when a specific piece of information is missing in an API in a specific error situation, it's sometimes a lot easier for the user of the interface to see that gap in a specific situation than it was for the builder to have anticipated the error.

So I filed a ticket for the developers - one of those uncomfortable ones that is about halfway between dry fact and possibly-controversial opinion. "As an API user, I would like record-by-record error messaging from the upsert API." I didn't have a specific syntax in mind; I just knew that the user would require the information.

This ticket started a good conversation among the team. A solution was designed and implemented, I did a few more test iterations to find smaller problems, and now we have an error response that seems reasonable and natural.
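To show how a client might use the lineErrors element, here is a minimal Python sketch that works on the sample response above as an ordinary dictionary, with no network call. The helper names failed_rows and rows_to_retry are hypothetical. The sketch also assumes that the keys of lineErrors are 1-based positions in the submitted "data" list, which is how the sample reads (the second row failed and the key is "2") but is an inference rather than something the post states outright.

```python
# The sample response from the post, written as a Python dictionary.
SAMPLE_RESPONSE = {
    "data": [],
    "metadata": {
        "createdRecordIds": [11, 12],
        "lineErrors": {"2": ['Incompatible value for field with ID "6".']},
        "totalNumberOfRecordsProcessed": 3,
        "unchangedRecordIds": [],
        "updatedRecordIds": [],
    },
}


def failed_rows(response_body):
    """Return {row_number: [messages]} using integer row numbers."""
    line_errors = response_body.get("metadata", {}).get("lineErrors", {})
    return {int(row): messages for row, messages in line_errors.items()}


def rows_to_retry(submitted_rows, response_body):
    """Pick out the submitted rows that were reported in lineErrors.

    Assumption: lineErrors keys are 1-based positions in the submitted list.
    """
    errors = failed_rows(response_body)
    return [row for position, row in enumerate(submitted_rows, start=1)
            if position in errors]


submitted = [
    {"6": {"value": 100}},
    {"6": {"value": "illegal string value"}},
    {"6": {"value": 200}},
]

errors = failed_rows(SAMPLE_RESPONSE)
retry = rows_to_retry(submitted, SAMPLE_RESPONSE)
print(errors)
print(retry)
```

Because the overall response is an HTTP 200, a client has to look inside the body like this to learn that one of the three rows was rejected.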
What's Next

My point in bringing up this specific example is to show that our team sees API syntax as an important point, because it affects you, our customers. As you begin to learn and use this new API, you may come up with a syntax idea of your own, or see a confusing result that you wish had more documentation in the API portal. If that happens, I encourage you to let us know. Feel free to report this kind of issue as you would any others - if something in API syntax is broken or incorrect, report it to Care; if you have a suggestion for improvement you can let us know by voting or adding a new idea in Uservoice.

As always, we will be listening for new ideas, and looking for ways to make your experience better than ever.

------------------------------
J. Michael Hammond
------------------------------
6 years ago Place Quickbase Discussions
APIs and custom code
92 Views
3 likes
3 Comments