static/stationary source text packer/unpacker

irihapeti · June 4, 2014

this is an example static/stationary source text packer/unpacker

it is not a compressor that uses a predictor model. It just encodes the position of each char in an alphabet buffer using arithmetic coding (ari). Can mod it to use a predictor model as you like
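
A rough sketch of what "encodes the position of chars" means, written in Python rather than LSL so it can be run anywhere. It is not part of the packer. The function name `positions` is made up for illustration, and it only mirrors the alphabet bookkeeping of Pack() (buffer `b`, position `p`, magnitude `m`), not the arithmetic coding itself.

```python
def positions(s, alphabet=""):
    """Return the (p, m) pairs a coder would be fed for each char of s.

    p is the index of the char in the alphabet buffer, or m - 1 (the
    'newfound char' slot) when the char is not in the buffer yet.
    m is the number of equally weighted slots at that step.
    """
    b = alphabet
    i = 0
    if b == "":            # no static alphabet: the first char seeds the buffer
        b = s[:1]
        i = 1
    m = 1 + len(b)         # known chars plus one slot for a newfound char
    out = []
    for c in s[i:]:
        p = b.find(c)
        if p < 0:
            p = m - 1      # not in buffer, so use the newfound slot
        out.append((p, m))
        if p == m - 1:     # grow the alphabet after coding the position
            b += c
            m += 1
    return out

print(positions("abab"))   # [(1, 2), (0, 3), (1, 3)]
```

Every slot gets the same share of the range, which is why there is no predictor here: a char seen a hundred times costs the same as one seen once.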

+

the ari code is an LSL port of this guy's code: Mark Nelson
http://marknelson.us/1991/02/01/arithmetic-coding-statistical-modeling-data-compression/

who ported it from these guys: Ian H. Witten, Radford Neal, and John Cleary
http://dl.acm.org/citation.cfm?id=214762.214771

who got it off this guy: Jorma Rissanen
http://domino.watson.ibm.com/tchjr/journalindex.nsf/4ac37cf0bdc4dd6a85256547004d47e1/53fec2e5af172a3185256bfa0067f7a0?OpenDocument

who got the idea from this guy: Claude Shannon
http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html

+

i havent tested it on all possible UTF-16 text inputs bc there can be 1,114,112 different Unicode code points. But seems ok so far. if you break it on some combination then I will fix it if I can repro it

+

also about pigeons: no packer can make every possible input smaller

http://en.wikipedia.org/wiki/Pigeonhole_principle
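
The counting behind the pigeons can be checked with a few lines of Python. This is only a sketch of the general argument, not anything taken from the packer, and the function names are made up for illustration.

```python
def strings_of_length(k, n):
    """How many different strings of exactly n chars exist over k symbols."""
    return k ** n

def strings_shorter_than(k, n):
    """How many different strings of 0..n-1 chars exist over k symbols."""
    return sum(k ** i for i in range(n))

k, n = 2, 3
print(strings_of_length(k, n), strings_shorter_than(k, n))  # 8 7
```

There are always fewer shorter strings than strings of length n (for two or more symbols), so at least one input of every length has no shorter output to go into. That is why Pack() has to be able to say "cant pack this one".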

+

the codes:

ETA: missing mask in put(). so 1b

ETA again: 1c. Changed the license text, see convo below. Also added a mask to the flush, bc without the mask there is a rare case where an extra UTF char is sent to the encoded output when it is not necessary

ETA: 1d: fixes for 3 catatonic conditions

1e: fix for (q * (p + 1)) where q = 65536 and p = 32767. The product is 2^31, which is one more than a signed 32-bit LSL integer can hold. In practice this condition wont be reached, but is messy to leave and not fix
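
A small Python check of the 1e case. Python ints dont overflow, so the helper `to_int32` (a made-up name) imitates a signed 32-bit integer, assuming two's-complement wraparound.

```python
def to_int32(v):
    """Wrap a Python int the way a signed 32-bit integer would wrap."""
    v &= 0xFFFFFFFF
    if v & 0x80000000:
        return v - 0x100000000
    return v

q = 65536                      # full ari range: (0xFFFF - 0) + 1

# p = 32767: q * (p + 1) is exactly 2^31 and no longer fits
print(to_int32(q * (32767 + 1)))   # -2147483648

# p = 32766: the product still fits
print(to_int32(q * (32766 + 1)))   # 2147418112
```

With the range at its widest, 32767 is the first position where the multiply goes negative, which matches the input length limit checked at the top of Pack().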

// example static/stationary UTF-16 string packer/unpacker
// version: packer 1e
// Public Domain June 2014 by irihapeti
// credits:
// - ari encoder: is a LSL port of the Mark Nelson coder
// - chr() and ord() are LSL derivatives of code owned by 
//   Unicode Consortium
// notes:
// - while this example packer uses an arithmetic encoder it
//   does not include a predictor. Mod if you want this for
//   own usecase
// - it wont pack every possible string bc pigeons
// - works on SL viewer 3.7.9 (290131) May 19 2014 18:13:17
//   and Windows 8.1 Update. dunno about earlier or later
//   versions or LSL Editor or TPVs or other OS
// usage:
//   e = Pack(s)
//   s = Unpack(e)
// when s cant be packed then Pack() returns empty string ""
//
// pkA is static alphabet. General case is alphabet = ""
// - if have usecase where inputs typically use same chars:
//   punctuation, math, common language symbols, etc; then
//   when include in a static alphabet those chars dont need
//   to be transmitted to unpacker when it has the same
//   static alphabet as packer, for some further space saving
// - chars not found in static alphabet are added to encoded
//   output so they can be decoded
// - example static alphabets:
//   string pkA = " .,;:!?'\"()-etaoinshrdlumcféóù";  // multilang convo
//   string pkA = " .,$-0123456789";                  // currency
//   string pkA = "w.htps:/comrgne";                  // url

string pkA = "";

string chr(integer n)
{   // map n to UTF-16 char range >= 0x800
    return llBase64ToString(llIntegerToBase64(
        0xE0808000 | (((n + 0x800) & 0xF000) << 12) |
        (((n + 0x800) & 0x0FC0) << 10) | ((n & 0x003F) << 8) ));
}

integer ord(string s)
{   // unmap n from UTF-16 char range >= 0x800
    integer n = llBase64ToInteger(llStringToBase64(s));
    return (((n >> 12) & 0xF000) | ((n >> 10) & 0x0FC0) |
        ((n >> 8) & 0x003F)) - 0x800;
}

integer gpB;  // bit buffer
integer gpC;  // count of bits in bit buffer
integer gpI;  // index into encoded string
string  gpS;  // encoded string

integer get()
{   // get a bit from encoded string
    if (gpC == 0)
    {
        gpB = ord(llGetSubString(gpS, gpI, gpI));
        gpI++;
        gpC = 15;
    }
    return (gpB >> (--gpC)) & 0x1;
}

put(integer n)
{   // put a bit to encoded string
    gpB = (gpB << 1) | (n & 0x1);
    gpC++;
    if (gpC == 15)
    {
        gpS += chr(gpB & 0x7FFF);
        gpC = 0;
    }
}

// --- packer ---

string Pack(string s)
{
    integer w = llStringLength(s);
    if ((w < 1) || (w > 32767)) return "";  // > 32767 will overflow

    // initialise
    integer i = 0;   // index of current char in input s
    string b = pkA;  // set alphabet buffer
    if (b == "")     // get 1st char into buffer and inc to next char
    {
        b = llGetSubString(s, 0, 0);
        i = 1;
    }
    integer m = 1 + llStringLength(b);  // ari magnitude
    integer y = 0xFFFF;                 // ari high
    integer x = 0;                      // ari low
    integer u = 0;                      // ari underflow
    integer q;                          // ari range

    // encode
    for(; i < w; i++)
    {
        string c = llGetSubString(s, i, i);
        integer p = llSubStringIndex(b, c);
        if (p < 0)  // is not in b so point to length of b
            p = m - 1;
        q = (y - x) + 1;                    // get the range
        y = x + (((q * (p + 1)) / m) - 1);  // set ari high
        x += ((q * p) / m);                 // set ari low
        for( ; 1 ; )
        {
            if ((x & 0x8000) == (y & 0x8000))
            {   // send same MSB and any underflow to output
                q = ((x & 0x8000) == 0x8000);
                put(q);
                for (; u > 0; u--) put(!q);
            }
            else if (((x & 0x4000) == 0x4000) && ((y & 0x4000) == 0))
            {   // x = 01.. y = 10.. so
                u++;               // inc underflow count
                x = (x & 0x3FFF);  // deduct from ari low
                y = (y | 0x4000);  // add to ari high
            }
            else jump break;       // done when above not true
            // remove MSBs bc no longer needed and mask bc LSL int32
            x =  (x << 1) & 0xFFFF;
            y = ((y << 1) | 0x1) & 0xFFFF;  // and add 1 to ari high
        }
        @break;
        
        if (p == (m - 1)) // is newfound char c not in buffer so
        {
            b += c;  // append newfound c to alphabet buffer
            m++;     // inc ari magnitude
        }
    }

    // flush ari
    q = ((x & 0x4000) == 0x4000);
    put(q);
    for ( ; u >= 0; u--) put(!q);    
    while(gpC) put(0);  // align/fill last output char

    // finish
    // when packed output is less than input return packed else return empty
    s = "";
    b = llGetSubString(b, llStringLength(pkA), m - 1);
    m = llStringLength(b);
    if (2 + m + llStringLength(gpS) < w)
        // length(unpackedinput) + length(extraalphabet) + extraalphabet + encoded
        s = chr(w) + chr(m) + b + gpS;
    // free global memory and tidy
    gpS = ""; gpC = 0;

    return s;
}
// --- end packer ---

// --- unpacker ---

string Unpack(string s)
{
    // initialise
    integer w = ord(llGetSubString(s, 0, 0));  // length(unpackedoutput)
    integer m = ord(llGetSubString(s, 1, 1));  // length(extraalphabet)

    if ((w < 1) || (w > 32767) || (m > w)) return "";

    string b = pkA;  // set alphabet buffer
    if (m > 0)       // append extraalphabet to buffer
        b += llGetSubString(s, 2, 1 + m);  // end index is inclusive: m chars
    gpS = llDeleteSubString(s, 0, 1 + m);  // set feed to 1st encoded char
    s = "";                                // reuse for output

    // prep ari
    integer y = 0xFFFF;  // ari high
    integer x = 0;       // ari low
    integer z = 0;       // ari code
    integer q;           // ari range

    integer i;
    for ( ; i < 16; i++)
        z = (z << 1) | get();

    i = 0;       // init output counter
    m = 1 + llStringLength(pkA);      // ari magnitude
    if (m == 1)  // is no static alphabet so
    {
        s = llGetSubString(b, 0, 0);  // send 1st char to output
        i = 1;                        // inc output counter
        m = 2;                        // inc ari magnitude
    }

    // decode
    for ( ; i < w; i++)
    {
        q = (y - x) + 1;                            // get ari range
        integer p = ((((z - x) + 1) * m) - 1) / q;  // get position of char in buffer b
        y = x + (((q * (p + 1)) / m) - 1);          // set ari high
        x = x + ((q * p) / m);                      // set ari low
        for ( ; 1 ; )
        {
            if ((x & 0x8000) == (y & 0x8000)){}     // skip for below
            else if (((x & 0x4000) == 0x4000) && ((y & 0x4000) == 0))
            {
                x = (x & 0x3FFF);  // deduct from ari low
                y = (y | 0x4000);  // add to ari high
                z = (z ^ 0x4000);  // flip ari code bit
            }
            else jump break; // done when above not true
            // remove MSBs bc no longer needed and mask bc LSL int32
            x =  (x << 1) & 0xFFFF;
            y = ((y << 1) | 1) & 0xFFFF;      // and add 1 to ari high
            z = ((z << 1) | get()) & 0xFFFF;  // and get new bit from encoded
        }
        @break;
        
        s += llGetSubString(b, p, p);  // append char in b to output s
        if (p == (m - 1)) m++;         // inc ari magnitude when newfound char
    }

    // finish
    gpS = ""; gpC = 0; gpI = 0;        // free global memory and tidy
    
    return s;
}
// --- end unpacker ---


// --- some test codes ---

default
{
    touch_end(integer total_number)
    {
        string s = "\"Ca va Mademoiselle? :-) Parlez-vous francais?\" Perdóname, yo sólo hablo Inglés. \"Anglais! Enchanté! vous êtes belle. ;-)\" Parli più lentamente per favore, non parlo molto bene l'italiano. \"Non! lol je suis francais!\" really? \"Oui! really.\" like relly really? \"Why you make it difficult for me? lol.\" nã te mea me aha hoki q; (: \"grr! ;-p\" jejejje (:";
        
        llOwnerSay("begin... pack/unpack check");
        llOwnerSay(s);
        llOwnerSay("length input: " + (string)llStringLength(s));

        string e = Pack(s);
        llOwnerSay(e);
        llOwnerSay("length packed: " + (string)llStringLength(e));

        string u = Unpack(e);
        llOwnerSay(u);
        llOwnerSay("length unpacked: " + (string)llStringLength(u));

        if (u == s)
            llOwnerSay("ok: unpacked is the same as input");
        else
            llOwnerSay("err: unpacked is not the same as input");

        llOwnerSay("...end");
    }
}
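
For anyone reading chr() and ord() outside SL, here is a Python sketch of the same bit layout. It assumes, as I read the script, that the LSL base64 calls treat the integer as 4 big-endian bytes and that the string conversion decodes them as UTF-8 and drops the trailing zero byte. The names `chr15` and `ord15` are made up for illustration.

```python
def chr15(n):
    """Map a 15-bit value to one char in U+0800..U+87FF via raw UTF-8 bytes."""
    cp = n + 0x800
    word = (0xE0808000                      # 1110xxxx 10xxxxxx 10xxxxxx 00000000
            | ((cp & 0xF000) << 12)         # top 4 bits into byte 1
            | ((cp & 0x0FC0) << 10)         # middle 6 bits into byte 2
            | ((n & 0x003F) << 8))          # low 6 bits into byte 3
    return word.to_bytes(4, "big")[:3].decode("utf-8")

def ord15(s):
    """Inverse of chr15: recover the 15-bit value from one char."""
    word = int.from_bytes(s.encode("utf-8") + b"\x00", "big")
    return (((word >> 12) & 0xF000)
            | ((word >> 10) & 0x0FC0)
            | ((word >> 8) & 0x003F)) - 0x800

print(ord15(chr15(12345)))   # 12345
```

The offset of 0x800 keeps every output char in the 3-byte UTF-8 range, so each char of the packed string carries exactly 15 bits, and the range stops well short of the surrogates.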

LepreKhaun · June 4, 2014

I'm delighted that you are following my suggestion to properly comment your code, but if you wish to be taken seriously as a programmer you should correctly use licenses when including or modifying the work of other people. Disregarding the terms of both the CC-BY-3.0 (which covers anything published under that license) and the CC-BY-SA (which covers anything extracted from the LSL Portal wiki) might lead beginners to believe that these can be ignored with impunity, which is definitely not the case. Full, correct attribution and following licensing requirements is a sign of professionalism.

Here's a link to an introduction to these licenses and how to correctly use them. Please take the time to study it and have enough respect for your fellow content creators and other beginners to follow them properly.

irihapeti · June 5, 2014

LepreKhaun wrote:

I'm delighted that you are following my suggestion to properly ...

LOLcat - Betterthanyouness.jpg

Qwalyphi Korpov · June 5, 2014

Thank you for gently and politely pointing out some of the licensing issues of this item.

It's a puzzle to see a mention of CC-BY-3 and then no one listed for attribution. ("chr() and ord() are ccby3 license: LSL Common Library") I'm unable to find an LSL Common Library. Perhaps it was the Combined Library.

I'll make a final tiny point - The LSL Portal WIKI is CC-BY-SA-3 unless otherwise noted.

irihapeti · June 6, 2014

true

i have removed the ref to LSL and replaced it with the proper credit

+

all of the UTF mapping codes in the LSL Combined Library (and every other UTF codes found on the LSL wiki) can be sourced to Unicode Consortium (as can every other derivative codes in whatever language ever written)

references to Unicode Consortium sourcecode are:

http://www.unicode.org/faq/utf_bom.html
http://gears.googlecode.com/svn/trunk/third_party/convert_utf/ConvertUTF.c

other reference Unicode Consortium members sources include:

IBM: http://site.icu-project.org/

Apple: http://www.opensource.apple.com/source/JavaScriptCore/JavaScriptCore-721.26/wtf/unicode/UTF8.cpp

+

you mention multiple issues? if have another specific one then I be happy to address as well

Qwalyphi Korpov · June 6, 2014

irihapeti wrote:

you mention multiple issues? if have another specific one then I be happy to address as well

First I should say that I'm not a lawyer or copyright licensing expert. So no one should rely on my interpretation or sue me. Also copyright licensing is a swamp that I won't spend much time in.

One other issue was declaring a derivative of CC-BY work to be Public Domain. The Creative Commons people recommend against doing that.

A more difficult issue is your way of handling attribution. You call it giving proper credit.

Copyright provides protection for original works of authorship. It doesn't protect ideas, facts, or methods of operation. (http://www.copyright.gov/help/faq/faq-protect.html) Scripts (Computer Software) can be copyrighted. However there are limitations to what copyright protects. (http://www.copyright.gov/circs/circ61.pdf) For example: Ideas, program logic and algorithms are not copyright protected.

Creative Commons licenses are copyright licenses. (http://en.wikipedia.org/wiki/Creative_Commons_license) CC-BY licenses require proper attribution of the copyright holder.

Now when you list a "proper credit" it's unclear whether you are attributing a source of ideas (not copyright protected) or attributing regarding a copyrighted source you derived from. Perhaps it's both. Which makes it difficult for users of your contribution to properly attribute for copyright licensing.

You appear to believe that because content in the LSL Combined Library is in some way based on content of the Unicode Consortium you are free to ignore its CC-BY licensing. This makes me think that your ideas about licensing and copyright have little connection to mine.

irihapeti · June 8, 2014

about copyrights

copyrights dont override the licenses of codes from which a written work is derived

the issue isnt the texts of copyright notices or the opinions of people who use Creative Commons or even the designers of the CCs themselves or any other copyright notice writers and users

the issue is the mis-application of copyright notices applied to derivative written works that subvert the intent of property holders licenses whether the subversion is unintentional or not

if you not sure what this means then please consult a actual IP lawyer

+

about algos

i dunno where you are in the RL but in the USA algos/processes are patentable. So need be careful about that when typing stuff up

+

about public domain

Public Domain isnt a use license or a copyright. Is a declaration. I declare that as the author of this written work I give up all legal and moral rights that I may have to it now and forever. Is not a transfer of rights to anyone. Is an abandonment/forfeiture by me the author of any lawful rights and claims to this property that I do have

a person taking possession need be aware that taking this property dont confer/transfer any rights I had to them. They are only the possessor of a abandoned/forfeited property

their taking dont give them license or ownership of any property within it that belongs to someone else who has not abandoned their rights. In this case Unicode Consortium use licenses. The publishing rights of Mr Nelson and the intellectual properties owned by IBM as they relate to this LSL port of arithmetic encoding. A new possessor of this work can use it and do whatever with it as long as they secure for themself the appropriate use rights/licenses/permissions from those owners

they are free tho to rip out the headers and any mention of me in their own workings or representations. They have no obligations to me and I have none to them either

+

about the chr() and ord()

i get the chr() from the author in a convo about something else sometime ago now. Was pretty cool that he done that. I since modded it for different purposes and other things. After one mod I also reverse that mod to make the ord() bc code symmetry

when he gave me he never confer property rights on me. nor did he assert property rights for himself. he just gave bc nice person and he thought was interesting what I was doing and how it might help to make that go better. So I was pretty happy about that. At the time I tell him that and I would see what else can be done with it

this packer/unpacker codes is a what else

this and the other stuffs I have posted are I think only really of academic interest to people interested in these kinds of things. That somebody might find a actual use for them is cool but is not really why I do stuff like this. I just like puzzles and how might a puzzle be solved. Is not always the best/optimum answer but it is a answer and can be explored further by people interested

+

another what else

can rip out the ari encoder/decoder and replace with canonical huffman or a bitstuff of some kind which are unencumbered by others licenses and property rights

a example bitstuff encoder is something like: (i havent tested this tho. just type it up here off the top my head. but is how it goes generally)

// Public Domain June 2014 by irihapeti
w = llStringLength(s);
b = llGetSubString(s, 0, 0);
m = 1;
q = 1;
for (i = 1; i < w; i++)
{
    c = llGetSubString(s, i, i);
    p = llSubStringIndex(b, c);
    if (p < 0) p = m;
    for (j = 0; j < q; j++)
        put((p >> j) & 0x1);
    if (p == m)
    {
        b += c;
        m++;
        if (m > ((1 << q) - 1)) q++;
    }
}
while (gpC) put(0);

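it dont pack as well as ari does in all cases tho, bc each position always costs a whole number of bits. Only in the case where m (the ari magnitude, meaning the number of positions to choose from) is a power of 2 is it as good as stationary ari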
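
A Python sketch of the bitstuff idea with a matching decoder, so the round trip can be checked. This is an illustration and not a port of anything tested in LSL. It assumes the first char and the newfound chars travel alongside the bits (like the extra alphabet in the packer), and the names `bitstuff_encode` and `bitstuff_decode` are made up.

```python
def bitstuff_encode(s):
    """Return the list of position bits for non-empty s (LSB first)."""
    b = s[0]                  # alphabet buffer, seeded with the first char
    m, q = 1, 1               # alphabet size, bits per position
    bits = []
    for c in s[1:]:
        p = b.find(c)
        if p < 0:
            p = m             # position m means 'newfound char'
        bits.extend((p >> j) & 1 for j in range(q))
        if p == m:
            b += c
            m += 1
            if m > (1 << q) - 1:
                q += 1        # positions 0..m no longer fit in q bits
    return bits

def bitstuff_decode(first, new_chars, bits, n):
    """Rebuild n chars from the first char, the newfound chars and the bits."""
    b, out = first, first
    m, q, pos = 1, 1, 0
    fresh = iter(new_chars)
    for _ in range(n - 1):
        p = sum(bits[pos + j] << j for j in range(q))
        pos += q
        if p == m:            # newfound char: take the next one sent
            c = next(fresh)
            b += c
            m += 1
            if m > (1 << q) - 1:
                q += 1
        else:
            c = b[p]
        out += c
    return out

s = "hello world"
new_chars = "".join(dict.fromkeys(s))[1:]   # chars in order of first use
bits = bitstuff_encode(s)
print(bitstuff_decode(s[0], new_chars, bits, len(s)) == s)   # True
```

The decoder grows `m` and `q` at exactly the same steps as the encoder, which is what lets it know how many bits to read for each position without any extra signalling.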

Qwalyphi Korpov · June 8, 2014

Your explanations give me no confidence in your script licensing information.

irihapeti · June 8, 2014

a Public Domain declaration is not a license. but then I already said that

irihapeti · June 10, 2014

Qwalyphi Korpov wrote:

I'll make a final tiny point - The LSL Portal WIKI is CC-BY-SA-3 unless otherwise noted.

i been meaning to pick up on this point so I do now

it perpetuates a myth about the LSL ToS as it relates to this topic

a bit like the myth about posting opensource scripts to the LSL scripting forum that took hold on here for months. Which led some forumites to indulge themselves in quite toxic and unpleasant behaviours toward new resident scripters for months. Until was pointed out to them that their toxic behaviours were actually based on a myth

+

anybody who thinks that the LL ToS relating to the LSL wiki CCbySA3 applies to all codes unless otherwise stated needs to rethink this

this is trivially provable by example

take a GPL or Apache or BSD or any other licensed code. Retype it in LSL, mod it, and post to the wiki without credit or reference to the license source

the absence of these references (as meant by the ToS) does not mean that the license of the sources from which the port/mod has been derived is somehow negated and CCbySA3 can be used with impunity in lieu

nor can the reader infer that a wiki contributor's own attached "license" or "copyright notice" makes it somehow ok in the absence of those references. Whether that inference is invited unintentionally or not

the ToS means that you cannot post codes to the LSL wiki derived from licensed works without crediting the sources and informing the wiki reader of this

the ToS means that only original works can be posted to the LSL wiki without reference. And is only those codes to which CCbySA3 applies when the creator of the original work does not state otherwise

sometimes tho people who are hobby/recreational scripters are unaware of this. So is understandable when they do post stuff like this unintentionally

but for industry workers who do this then is not understandable or acceptable. Due diligence yes??

+

is really weird to still find people on the LSL scripting forums who will baw in support of the illegitimate claims to rights of wiki contributors when those contributions are not sourced themselves to the licensed works from which they are derived

is kinda like LL should like do something about them copybotters what like stole my stuff that I like stole off someone else. like totally ! totes even

Qwalyphi Korpov · June 23, 2014

irihapeti wrote:

Qwalyphi Korpov wrote:

I'll make a final tiny point - The LSL Portal WIKI is CC-BY-SA-3 unless otherwise noted.

i...

I won't comment on your comments. I will say that my statement is based on the following line, which appears at the bottom of every page of the LSL Portal WIKI:

"Creative Commons Attribution-Share Alike 3.0 unless otherwise noted."

irihapeti · June 25, 2014

Qwalyphi Korpov wrote:

I won't comment on your comments. I will say that my statement is based on the following line, which appears at the bottom of every page of the LSL Portal WIKI:

"Creative Commons Attribution-Share Alike 3.0 unless otherwise noted."

can accept that this can give people the impression that you have

is not true tho. LL cannot change the license of a derived work just bc the poster/contributor didnt note that their posted work is covered by an existing license given to the poster by the licensor

was a little chat about this in a other thread. Where I think it gets messy is when we mix up our copyrights and licenses

in the other chat was mentioned how HiFi are doing it. Each script posted to their example library has both a copyright notice and a license. Which is a good thing I think

+

i just reiterate what I said about this script I posted. I had copyright until I declare it public domain. The IP in it is not mine. Never was. The IP is covered by the license of the respective owners. Free use license for any purpose from Unicode. Commercial fee (patent) payable license from IBM if use their IP commercially. Is no fee payable if used for academic purposes

if dont want to enter into a commercial relationship with IBM then can mod to use the bitstuff algo instead. Is based on Shannon-Fano algo to which no IP/patent applies

Qwalyphi Korpov · June 25, 2014

irihapeti wrote:

can accept that this can give people the impression that you have

...

I responded to your previous post for two reasons:

You quoted me out of context.
You removed a portion of my post without indication.

The statement you quoted was a small clarification related to this statement by LepreKhaun:

"Disregarding the terms of both the CC-BY-3.0 (which covers anything published under that license) and the CC-BY-SA (which covers anything extracted from the LSL Portal wiki) might lead beginners to believe that these can be ignored with impunity, which is definitely not the case."

My post in full which you quoted in part was:

"Thank you for gently and politely pointing out some of the licensing issues of this item.

It's a puzzle to see a mention of CC-BY-3 and then no one listed for attribution. ("chr() and ord() are ccby3 license: LSL Common Library") I'm unable to find an LSL Common Library. Perhaps it was the Combined Library.

I'll make a final tiny point - The LSL Portal WIKI is CC-BY-SA-3 unless otherwise noted."

Now, rather than accept my statement for what it was, you continue to distort it. I see that further responses will be of no use.

irihapeti · June 26, 2014

i never distorted anything. i looked at what you wrote. I investigate the assertion of CC-by-3/x to the LSL codes dealing with unicodes on the wiki

can be shown that they are derivatives. So i change the text in the posted script to reflect ownership

LepreKhaun · June 28, 2014

Qwalyphi Korpov wrote:

... I see that further responses will be of no use.

The OP apparently has some agenda they are attempting to further here or some point they are trying to make that precludes any appeal to reason or fact. It's also obvious at this point that they thrive on argument, dissension and the attention they get by flouting the rules. The only way I see to deal with that is to do no more than point out their misbehavior so others don't get the idea that it's appropriate in our forums and, if they persist, report it to the moderators. That would at least lower the incidence of disruptive noise until this problem is resolved.

For those that do wish to be recognized as creditable programmers, modifying or incorporating other people's work into your own efforts is allowable, but only as long as you follow the licensing that applies to your source. It's not difficult to do and shows others that you take enough pride in your creation to show due respect to the original author(s) whose work you are building upon.

irihapeti · June 29, 2014

LepreKhaun wrote:

For those that do wish to be recognized as creditable programmers, modifying or incorporating other people's work into your own efforts is allowable, but only as long as you follow the licensing that applies to your source. It's not difficult to do and shows others that you take enough pride in your creation to show due respect to the original author(s) whose work you are building upon.

agree
