Jump to content

Optimizing rotation (normalization and NLERP)


2Tessa
 Share

Recommended Posts

Yesterday I was looking for ways to normalize rotations faster and i found this way that increase the speed by ~20%:

instead to execute 4 divisions:

rotation NormRot(rotation Q)
{
    float MagQ = llSqrt(Q.x*Q.x + Q.y*Q.y +Q.z*Q.z + Q.s*Q.s);
    return <Q.x/MagQ, Q.y/MagQ, Q.z/MagQ, Q.s/MagQ>;
}

you can execute 4 multiplication by the inverse

rotation NormRot(rotation Q)
{
    float iMagQ = 1/llSqrt(Q.x*Q.x + Q.y*Q.y +Q.z*Q.z + Q.s*Q.s);
    return <Q.x*iMagQ, Q.y*iMagQ, Q.z*iMagQ, Q.s*iMagQ>;
}

 

I also tried to find ways to compute the SQRT faster for NLERP rotation interpolation

rotation NLERP(rotation a, rotation b, float t) {
    float ti = 1-t;
    rotation r = <a.x*ti, a.y*ti, a.z*ti, a.s*ti>+<b.x*t, b.y*t, b.z*t, b.s*t>;
    float m = llSqrt(r.x*r.x+r.y*r.y+r.z*r.z+r.s*r.s); // normalize
    return <r.x/m, r.y/m, r.z/m, r.s/m>;
}

and in C there are plenty ways, but they use operators which are not available in LSL, or they lose the speed when interpreted.

Who have in-depth knowledge can understand if any of these ways can be effective in LSL.

The fastest SQRT hacks the 32bit memory representation of the float value.

float sqrt7(float x)
 {  // convert float to bitwise but LSL don't have ADDRESS type and unary operators * and &
   unsigned int i = *(unsigned int*) &x; 
   i  += 127 << 23;
   i >>= 1; 
   return *(float*) &i;
 }   

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;  
    i  = 0x5f3759df - ( i >> 1 );  
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );  

    return y;
}

float fastsqrt(float val)  {
        union
        {
                int tmp;
                float val;
        } u;
        u.val = val;
        u.tmp -= 1<<23;
        u.tmp >>= 1; 
        u.tmp += 1<<29; 
        return u.val;
}

func NewtonSqrt(x float64) float64 {
  z := 1.0 // let initial guess to be 1
  for i := 1; i <= 10; i++ {
    z -= (z*z - x) / (2*z) // MAGIC LINE!!
  }
  return z
}

double ApproximateSqrtBabylonian(double x, int precision) {
    double res = x;    
    for (int n = 0; n < precision; n++) {
        res = 0.5f * (res + x / res);
    }    return res;
}

/* by Jim Ulery */
static unsigned julery_isqrt(unsigned long val) {
    unsigned long temp, g=0, b = 0x8000, bshft = 15;
    do {
        if (val >= (temp = (((g << 1) + b)<<bshft--))) {
           g += b;
           val -= temp;
        }
    } while (b >>= 1);
    return g;
}

double fsqrt (double y) {
    double x, z, tempf;
    unsigned long *tfptr = ((unsigned long *)&tempf) + 1;

    tempf = y;
    *tfptr = (0xbfcdd90a - *tfptr)>>1; /* estimate of 1/sqrt(y) */
    x =  tempf;
    z =  y*0.5;                        /* hoist out the “/2”    */
    x = (1.5*x) - (x*x)*(x*z);         /* iteration formula     */
    x = (1.5*x)  (x*x)*(x*z);
    x = (1.5*x)  (x*x)*(x*z);
    x = (1.5*x)  (x*x)*(x*z);
    x = (1.5*x)  (x*x)*(x*z);
    return x*y;
    }

 

Edited by 2Tessa
Link to comment
Share on other sites

  • 2Tessa changed the title to Optimizing rotation (normalization and NLERP)

i think the best we can get without using any LSL library function is the Newton  example. Probably using the precision parameter from the babylonian example for finer control on the result

not sure tho that it would be any faster than just using llSqrt

the quake method (Q_rsqrt) might be possibly faster if there is a substitute method for the magic number 0x5f3759df, but I dunno what that method might be, or even if there is a substitute at all

 

  • Thanks 1
Link to comment
Share on other sites

3 minutes ago, elleevelyn said:

the quake method (Q_rsqrt) might be possibly faster if there is a substitute method for the magic number 0x5f3759df, but I dunno what that method might be, or even if there is a substitute at all

There's no substitute for it.

The trick relies on type-punning the value with pointers.

&i ; creates an integer pointer to the value of i.

(float *) &i ; typecasts the integer pointer to a float pointer.

* (float *) &i ; dereferences the float pointer into the value being pointed to.

The "pun" here is that the integer value of "i" is now interpreted as if it was a float all along. And since LSL has no pointers...

  • Like 1
Link to comment
Share on other sites

Like Wulfie said, most of these are no help without type punning and/or unions. There's also the fact that user-defined functions are *slow*.

From a quick test, 100000 square roots takes less than a second; 100000 user-defined square root function calls to a single-iteration Newton approximation (which is obviously terrible and useless) already takes more than twice as long. With a reasonable number of iterations, we're talking ~10x as long.

Consider what you are using these rotations for, too: nlerp is already meant to be fast and loose, functions that don't require normalization (like plain old llSetRot/PRIM_ROT) will work even if the results might not be accurate, that probably wasn't the goal in the first place. Functions that do require normalization (llSetKeyframedMotion comes to mind) won't be satisfied with a sloppy approximation, they need the magnitude of the rotation to be close enough to 1.

TL;DR stick with multiplying by 1/llSqrt(mag) for accuracy, leave it out entirely for speed if it works.

  • Thanks 1
Link to comment
Share on other sites

As alternatives of:

llSqrt( llFrand(1) )

I tried:

llPow( e , 0.5 * llLog( llFrand(1) ) )

llPow( 10 , 0.5 * llLog10( llFrand(1) ) )

llPow( llFrand(1) , 0.5 )

but even the single call of them [ llPow(), llLog(), llLog10() ] is 50-100% slower than llSqrt()  😆

Link to comment
Share on other sites

6 hours ago, Frionil Fang said:

From a quick test, 100000 square roots takes less than a second; 100000 user-defined square root function calls to a single-iteration Newton approximation (which is obviously terrible and useless) already takes more than twice as long. With a reasonable number of iterations, we're talking ~10x as long.

 

i haven't done any testing, but I think with Newton then is about up to 24 iterations  (precision) on a 32-bit float type, breaking early on match

thinking about the quake method a bit more then we can get a union  of some kind using fui() and uif() from the LSL Combined Library which I am not going to test as would be terribly slower (two calls to llPow) than llSqrt

this said, even if we did this then as I haven't tested it (so not sure what the fui() bit arrangement is) then if the bit arrangement is something other than a C-like union then the magic number would be something other than quake which would be discoverable but yeah

so agree with your TLDR. Use llSqrt

  • Thanks 1
Link to comment
Share on other sites

Posted (edited)

Very interesting to know that float can be stored into integer 😀

//=================================================================
// Float Union Integer (see LSL wiki)

integer fui(float input)
{
    if((input) != 0.0){
        integer sign = (input < 0) << 31;
        if((input = llFabs(input)) < 2.3509887016445750159374730744445e-38)
            return sign | (integer)(input / 1.4012984643248170709237295832899e-45);
        integer exp = llFloor((llLog(input) / 0.69314718055994530941723212145818));
        return   (0x7FFFFF & (integer)(input * (0x1000000 >> sign))) 
               | (((exp + 126 + (sign = ((integer)input - (3 <= (input /= 
                 (float)("0x1p"+(string)(exp -= ((exp >> 31) | 1)))))))) << 23 ) | sign);
    }
    return ((string)input == (string)(-0.0)) << 31;
}

float iuf(integer input)
{
    return llPow(2.0, (input | !input) - 150) 
           * (((!!(input = (0xff & (input >> 23)))) << 23) 
           | ((input & 0x7fffff))) * (1 | (input >> 31));
}

//-----------------------------------------------------------------
float e = 2.71828182845904523536028747135266249;
integer inte;
float floe;

default
{
    touch_start(integer tot_num)
    {
        llOwnerSay("Float Union Integer (float stored into integer and way back)");
        llOwnerSay("e: "+Float2Sci(e)); // does not truncate the decimals
        inte = fui(e);
        llOwnerSay("inte = fui(e): "+(string)inte);
        floe = iuf(inte);
        llOwnerSay("floe = iuf(inte): "+Float2Sci(floe)); // does not truncate the decimals
        llOwnerSay("floe = iuf(inte): "+(string)floe+" (typecasted)");
    }
}

 

Edited by 2Tessa
Link to comment
Share on other sites

Posted (edited)
int_value = fui(fl_value);
int_value += 127 << 23;
int_value = int_value >> 1; 
fl_result = iuf(int_value);

this simple code works as SQRT for large float values:

llSqrt( 3.40282346e38 ) gives: 1.84467429e19

bitwiseSqrt( 3.40282346e38 ) gives: -1.84467429e19   only the sign is wrong

but lose precision for small values:

llSqrt( 2 ) gives: 1.41421353

bitwiseSqrt( 2 ) gives: 1.5  😆

Edited by 2Tessa
Link to comment
Share on other sites

Please sign in to comment

You will be able to leave a comment after signing in



Sign In Now
 Share

×
×
  • Create New...