Module talk:TableTools: Difference between revisions
→removeDuplicate does not remove duplicate NaN: what is intended usage?
Revision as of 03:26, 3 February 2014
removeDuplicate does not remove duplicate NaN
<source lang="lua">
function p.removeDuplicates(t)
    checkType('removeDuplicates', 1, t, 'table')
    local isNan = p.isNan
    local ret, exists = {}, {}
    for i, v in ipairs(t) do
        if isNan(v) then
            -- NaNs can't be table keys, and they are also unique, so we don't need to check existence.
            ret[#ret + 1] = v
        else
            if not exists[v] then
                ret[#ret + 1] = v
                exists[v] = true
            end
        end
    end
    return ret
end
</source>

This should be:

<source lang="lua">
function p.removeDuplicates(t, uniqueNan)
    checkType('removeDuplicates', 1, t, 'table')
    local ret, isNan, exists, hasNan = {}, p.isNan, {}, nil
    for _, v in ipairs(t) do
        -- NaNs can't be table keys in exists[], and they are also equal to each other in Lua.
        if isNan(v) then
            -- But we may want only one Nan in ret[], and there may be multiple Nan's in t[].
            if uniqueNan == nil or uniqueNan == false or uniqueNan == true and not hasNan then
                hasNan = true
                ret[#ret + 1] = v
            end
        else
            if not exists[v] then
                exists[v] = true
                ret[#ret + 1] = v
            end
        end
    end
    return ret
end
</source>

-- verdy_p (talk) 07:50, 2 February 2014 (UTC)
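For readers unfamiliar with the NaN behaviour that both versions work around, here is a minimal standalone sketch. It assumes plain Lua run outside Scribunto; `isNan` below is a local stand-in and not necessarily how `p.isNan` is implemented in the module, and `checkType` is left out.

```lua
-- Stand-in NaN test (an assumption; the module's p.isNan may differ).
local function isNan(v)
    return type(v) == 'number' and v ~= v
end

local nan = 0/0

-- A NaN never compares equal to anything, including itself.
assert(nan ~= nan)
assert(not (nan == nan))
assert(isNan(nan) and not isNan(1) and not isNan('x'))

-- A NaN cannot be used as a table key: the assignment raises an error.
local keys = {}
local ok = pcall(function() keys[nan] = true end)
assert(not ok)

-- Simplified copy of the first version above (checkType omitted).
local function removeDuplicates(t)
    local ret, exists = {}, {}
    for _, v in ipairs(t) do
        if isNan(v) then
            ret[#ret + 1] = v
        elseif not exists[v] then
            ret[#ret + 1] = v
            exists[v] = true
        end
    end
    return ret
end

-- The duplicate 1 is dropped, but both NaNs survive.
local result = removeDuplicates{1, nan, 1, nan, 2}
assert(#result == 4)
```

This is the behaviour the section title describes: ordinary duplicates are removed, while every NaN in the input is copied to the result.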
- @Verdy p: This was by design, as comparing two NaNs always results in <code>false</code>. My reasoning was that since two NaNs can never be equal to each other, even if they were made by the exact same calculation, they shouldn't be treated as duplicates by the algorithm. Although if there's some sort of precedent for doing things a different way, please let me know. I'm fairly new to the world of NaNs, after all. -- Mr. Stradivarius ♪ talk ♪ 08:01, 2 February 2014 (UTC)
- That's the Lua interpretation anyway. Even if it has a single NaN value (no distinction between signaling and non-signaling ones, or NaNs carrying an integer type, as in the IEEE binary 32-bit float and 64-bit double formats; neither does Java...), there are some apps that depend on using NaN as a distinctive key equal to itself, but still different from nil.
- The other kind of usage of NaN is "value not set, ignore it": when computing averages, for example, NaNs must be neither summed nor counted, so all NaNs should be removed from the table. For this case, maybe there should be an option to either preserve all NaNs or nuke them all from the result: the kill option would be tested in the if-branch of your first version, and a second, alternate option tested after it would make NaNs unique in the result. The first case is quite common in statistics, where NaN means "unset" while nil means something else (such as "compute this value before determining whether it's a NaN", nil also being used for weak references that can be retrieved from another, slow data store, with the table storing nil acting as a fast cache of that slow data store). verdy_p (talk) 08:29, 2 February 2014 (UTC)
- I had a quick look at the functions, but cannot work out the intended usage; that usage would probably determine what should happen with NaNs. The docs for <code>removeDuplicates</code> say that keys that are not positive integers are ignored, but what it means is that such keys are ''removed''. On that principle, I think it would be better to default to removing NaNs. I cannot imagine a usage example where I would want to call this function and have a NaN in the result; what would I do with it? If it were desirable to test for the presence of NaN members, why not return an extra value that is true if one or more NaNs are encountered and omitted? How would it help to ever have both <code>0/0</code> and <code>-0/0</code> (or multiple instances of <code>0/0</code>) in the result?
- Regarding verdy_p's code: I don't think it is a good idea to deviate from Lua's idioms, and if there were a <code>uniqueNan</code> parameter, it should not be tested for explicit "false" and "true" values. The function regards "hello" as neither false nor true and while that is a very defensible position, it's not how Lua code is supposed to work. Johnuniq (talk) 03:26, 3 February 2014 (UTC)
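As a rough illustration of the alternative raised in the last two comments (drop NaNs by default, report their presence through an extra return value, and test any option by ordinary Lua truthiness), a sketch might look like the following. It is not the module's code: the `keepOneNan` parameter, the second return value, and the local `isNan` stand-in are all hypothetical names chosen for this example, and `checkType` is omitted so the snippet runs on its own.

```lua
-- Stand-in NaN test (an assumption; the module's p.isNan may differ).
local function isNan(v)
    return type(v) == 'number' and v ~= v
end

-- Hypothetical variant: keepOneNan and the second return value are
-- illustrative only and not part of the module.
local function removeDuplicates(t, keepOneNan)
    local ret, exists, hadNan = {}, {}, false
    for _, v in ipairs(t) do
        if isNan(v) then
            -- Ordinary truthiness test: anything except nil or false
            -- enables the option, so "hello" counts as true.
            if keepOneNan and not hadNan then
                ret[#ret + 1] = v
            end
            hadNan = true
        elseif not exists[v] then
            exists[v] = true
            ret[#ret + 1] = v
        end
    end
    -- The second value tells the caller whether any NaN was encountered.
    return ret, hadNan
end

local nan = 0/0

-- Default: every NaN is omitted, and the flag reports that some were seen.
local list, sawNan = removeDuplicates{1, nan, 1, -nan, 2}
assert(#list == 2 and sawNan == true)

-- With a truthy option, only the first NaN is kept.
local kept = removeDuplicates({1, nan, 1, -nan, 2}, 'hello')
assert(#kept == 3 and isNan(kept[2]))
```

Whether dropping NaNs should be the default still depends on the intended usage, which the thread leaves open.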