Optimizing rotation (normalization and NLERP)

2Tessa · March 15

Yesterday I was looking for ways to normalize rotations faster and i found this way that increase the speed by ~20%:

instead to execute 4 divisions:

rotation NormRot(rotation Q)
{
    float MagQ = llSqrt(Q.x*Q.x + Q.y*Q.y +Q.z*Q.z + Q.s*Q.s);
    return <Q.x/MagQ, Q.y/MagQ, Q.z/MagQ, Q.s/MagQ>;
}

you can execute 4 multiplication by the inverse

rotation NormRot(rotation Q)
{
    float iMagQ = 1/llSqrt(Q.x*Q.x + Q.y*Q.y +Q.z*Q.z + Q.s*Q.s);
    return <Q.x*iMagQ, Q.y*iMagQ, Q.z*iMagQ, Q.s*iMagQ>;
}

I also tried to find ways to compute the SQRT faster for NLERP rotation interpolation

rotation NLERP(rotation a, rotation b, float t) {
    float ti = 1-t;
    rotation r = <a.x*ti, a.y*ti, a.z*ti, a.s*ti>+<b.x*t, b.y*t, b.z*t, b.s*t>;
    float m = llSqrt(r.x*r.x+r.y*r.y+r.z*r.z+r.s*r.s); // normalize
    return <r.x/m, r.y/m, r.z/m, r.s/m>;
}

and in C there are plenty ways, but they use operators which are not available in LSL, or they lose the speed when interpreted.

Who have in-depth knowledge can understand if any of these ways can be effective in LSL.

The fastest SQRT hacks the 32bit memory representation of the float value.

float sqrt7(float x)
 {  // convert float to bitwise but LSL don't have ADDRESS type and unary operators * and &
   unsigned int i = *(unsigned int*) &x; 
   i  += 127 << 23;
   i >>= 1; 
   return *(float*) &i;
 }   

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;  
    i  = 0x5f3759df - ( i >> 1 );  
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );  

    return y;
}

float fastsqrt(float val)  {
        union
        {
                int tmp;
                float val;
        } u;
        u.val = val;
        u.tmp -= 1<<23;
        u.tmp >>= 1; 
        u.tmp += 1<<29; 
        return u.val;
}

func NewtonSqrt(x float64) float64 {
  z := 1.0 // let initial guess to be 1
  for i := 1; i <= 10; i++ {
    z -= (z*z - x) / (2*z) // MAGIC LINE!!
  }
  return z
}

double ApproximateSqrtBabylonian(double x, int precision) {
    double res = x;    
    for (int n = 0; n < precision; n++) {
        res = 0.5f * (res + x / res);
    }    return res;
}

/* by Jim Ulery */
static unsigned julery_isqrt(unsigned long val) {
    unsigned long temp, g=0, b = 0x8000, bshft = 15;
    do {
        if (val >= (temp = (((g << 1) + b)<<bshft--))) {
           g += b;
           val -= temp;
        }
    } while (b >>= 1);
    return g;
}

double fsqrt (double y) {
    double x, z, tempf;
    unsigned long *tfptr = ((unsigned long *)&tempf) + 1;

    tempf = y;
    *tfptr = (0xbfcdd90a - *tfptr)>>1; /* estimate of 1/sqrt(y) */
    x =  tempf;
    z =  y*0.5;                        /* hoist out the “/2”    */
    x = (1.5*x) - (x*x)*(x*z);         /* iteration formula     */
    x = (1.5*x) – (x*x)*(x*z);
    x = (1.5*x) – (x*x)*(x*z);
    x = (1.5*x) – (x*x)*(x*z);
    x = (1.5*x) – (x*x)*(x*z);
    return x*y;
    }

Edited March 15 by 2Tessa

elleevelyn · March 15

i think the best we can get without using any LSL library function is the Newton example. Probably using the precision parameter from the babylonian example for finer control on the result

not sure tho that it would be any faster than just using llSqrt

the quake method (Q_rsqrt) might be possibly faster if there is a substitute method for the magic number 0x5f3759df, but I dunno what that method might be, or even if there is a substitute at all

Wulfie Reanimator · March 15

3 minutes ago, elleevelyn said:

the quake method (Q_rsqrt) might be possibly faster if there is a substitute method for the magic number 0x5f3759df, but I dunno what that method might be, or even if there is a substitute at all

There's no substitute for it.

The trick relies on type-punning the value with pointers.

&i ; creates an integer pointer to the value of i.

(float *) &i ; typecasts the integer pointer to a float pointer.

* (float *) &i ; dereferences the float pointer into the value being pointed to.

The "pun" here is that the integer value of "i" is now interpreted as if it was a float all along. And since LSL has no pointers...

Frionil Fang · March 15

Like Wulfie said, most of these are no help without type punning and/or unions. There's also the fact that user-defined functions are *slow*.

From a quick test, 100000 square roots takes less than a second; 100000 user-defined square root function calls to a single-iteration Newton approximation (which is obviously terrible and useless) already takes more than twice as long. With a reasonable number of iterations, we're talking ~10x as long.

Consider what you are using these rotations for, too: nlerp is already meant to be fast and loose, functions that don't require normalization (like plain old llSetRot/PRIM_ROT) will work even if the results might not be accurate, that probably wasn't the goal in the first place. Functions that do require normalization (llSetKeyframedMotion comes to mind) won't be satisfied with a sloppy approximation, they need the magnitude of the rotation to be close enough to 1.

TL;DR stick with multiplying by 1/llSqrt(mag) for accuracy, leave it out entirely for speed if it works.

2Tessa · March 15

As alternatives of:

llSqrt( llFrand(1) )

I tried:

llPow( e , 0.5 * llLog( llFrand(1) ) )

llPow( 10 , 0.5 * llLog10( llFrand(1) ) )

llPow( llFrand(1) , 0.5 )

but even the single call of them [ llPow(), llLog(), llLog10() ] is 50-100% slower than llSqrt() 😆

elleevelyn · March 15

6 hours ago, Frionil Fang said:

From a quick test, 100000 square roots takes less than a second; 100000 user-defined square root function calls to a single-iteration Newton approximation (which is obviously terrible and useless) already takes more than twice as long. With a reasonable number of iterations, we're talking ~10x as long.

i haven't done any testing, but I think with Newton then is about up to 24 iterations (precision) on a 32-bit float type, breaking early on match

thinking about the quake method a bit more then we can get a union of some kind using fui() and uif() from the LSL Combined Library which I am not going to test as would be terribly slower (two calls to llPow) than llSqrt

this said, even if we did this then as I haven't tested it (so not sure what the fui() bit arrangement is) then if the bit arrangement is something other than a C-like union then the magic number would be something other than quake which would be discoverable but yeah

so agree with your TLDR. Use llSqrt

2Tessa · March 16

Very interesting to know that float can be stored into integer 😀

//=================================================================
// Float Union Integer (see LSL wiki)

integer fui(float input)
{
    if((input) != 0.0){
        integer sign = (input < 0) << 31;
        if((input = llFabs(input)) < 2.3509887016445750159374730744445e-38)
            return sign | (integer)(input / 1.4012984643248170709237295832899e-45);
        integer exp = llFloor((llLog(input) / 0.69314718055994530941723212145818));
        return   (0x7FFFFF & (integer)(input * (0x1000000 >> sign))) 
               | (((exp + 126 + (sign = ((integer)input - (3 <= (input /= 
                 (float)("0x1p"+(string)(exp -= ((exp >> 31) | 1)))))))) << 23 ) | sign);
    }
    return ((string)input == (string)(-0.0)) << 31;
}

float iuf(integer input)
{
    return llPow(2.0, (input | !input) - 150) 
           * (((!!(input = (0xff & (input >> 23)))) << 23) 
           | ((input & 0x7fffff))) * (1 | (input >> 31));
}

//-----------------------------------------------------------------
float e = 2.71828182845904523536028747135266249;
integer inte;
float floe;

default
{
    touch_start(integer tot_num)
    {
        llOwnerSay("Float Union Integer (float stored into integer and way back)");
        llOwnerSay("e: "+Float2Sci(e)); // does not truncate the decimals
        inte = fui(e);
        llOwnerSay("inte = fui(e): "+(string)inte);
        floe = iuf(inte);
        llOwnerSay("floe = iuf(inte): "+Float2Sci(floe)); // does not truncate the decimals
        llOwnerSay("floe = iuf(inte): "+(string)floe+" (typecasted)");
    }
}

Edited March 16 by 2Tessa

2Tessa · March 16

int_value = fui(fl_value);
int_value += 127 << 23;
int_value = int_value >> 1; 
fl_result = iuf(int_value);

this simple code works as SQRT for large float values:

llSqrt( 3.40282346e38 ) gives: 1.84467429e19

bitwiseSqrt( 3.40282346e38 ) gives: -1.84467429e19 only the sign is wrong

but lose precision for small values:

llSqrt( 2 ) gives: 1.41421353

bitwiseSqrt( 2 ) gives: 1.5 😆

Edited March 16 by 2Tessa

Optimizing rotation (normalization and NLERP)

Recommended Posts

2Tessa

Link to comment

Share on other sites

elleevelyn

Link to comment

Share on other sites

Wulfie Reanimator

Link to comment

Share on other sites

Frionil Fang

Link to comment

Share on other sites

2Tessa

Link to comment

Share on other sites

elleevelyn

Link to comment

Share on other sites

2Tessa

Link to comment

Share on other sites

2Tessa

Link to comment

Share on other sites

Please sign in to comment

Linden Lab

Tilia

Second Life

Connect With Us

Partner With Us

Forums

Blogs

Activity