Trying to pull data but no dice

Laufiair Hexicola · January 23, 2018

Morning everyone, I thought I'd try my hand at making a translator, and so far it hasn't been going well. For now I have a pre-loaded string set up for Google Translate instead of local speech. Here's what I'm working with.

key http_request_id;
 
default
{
    state_entry()
    {
        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test", [], "");
    }
 
    http_response(key request_id, integer status, list metadata, string body)
    {
        if (request_id == http_request_id)
        {
            list mine = llParseString2List(body,["result_box"],[" "]);
            llSetText(mine, <1,1,1>, 1);
        }
    }
}

This is supposed to call out to the results page, put the page into a list, find the result_box clause, and put it into llSetText. The full clause that gives the translated result is <span id=result_box class="short_text"></span>. When it compiles, though, it flags the alpha amount of llSetText with "Function call mismatches type or number of arguments". I tried changing it to 0, with no result. If I comment out llSetText, it compiles fine. Moving llSetText doesn't work because then it can't find the list mine. I know I'm doing something wrong here, I just don't know what.

Love Zhaoying · January 23, 2018

You are trying to pass a list variable “mine” as the first llSetText() parameter. That needs to be a string parameter. If you know which string element of “mine” to get, you could use llList2String() to get the list entry.

http://wiki.secondlife.com/wiki/LlSetText
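To make the type mismatch concrete, here is a minimal sketch of the fix described above. The list contents are made-up stand-ins for the parsed response, and index 0 is only an assumption about which entry you want.

```lsl
default
{
    state_entry()
    {
        // Hypothetical stand-in for the list built from the HTTP response.
        list mine = ["first piece", "second piece"];

        // llSetText expects (string text, vector color, float alpha),
        // so pull one entry out of the list as a string first.
        llSetText(llList2String(mine, 0), <1.0, 1.0, 1.0>, 1.0);
    }
}
```

Passing the list itself as the first argument is what triggers the compiler error; llList2String returns an empty string if the index is out of range.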

Love Zhaoying · January 23, 2018

P.S. Once you get the script to compile with llSetText (see my reply above), you may find this thread on llHTTPRequest() useful:

Laufiair Hexicola · January 23, 2018

Yeah, I just thought of that and was playing with it, only I think the list or string is truncated.

key http_request_id;
float gap = 5.0;
 
default
{
    state_entry()
    {
        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test", [HTTP_USER_AGENT, "LSL_(Mozilla Compatible)"],"");
        llSetTimerEvent(gap);
    }
    
    http_response(key request_id, integer status, list metadata, string body)
    {
        if (request_id == http_request_id)
        {
            list result = llParseString2List(body,["<html><body>", ",", "</body></html>"],[]);
            string songtitle = llList2String(result,90);
            llSetText ((string)songtitle,<1,1,1>,1);
        }
    }
    
    timer()
    {
        llResetScript();
    }
}

Is there a way to search specifically for the <span id=result_box class="short_text"></span> clause? I did try passing the list to whisper but it gave an error, and attempting to add the span I'm looking for after the <html><body> tags gave an error because the class is quoted, but it won't search for it otherwise. Also, attempting to find the clause using llList2String gave blank results after index 40, so I figured it was truncated.

Edited January 23, 2018 by Laufiair Hexicola

Rolig Loon · January 23, 2018

From the LSL wiki:

"The response body is limited to 2048 bytes by default, see HTTP_BODY_MAXLENGTH above to increase it. If the response is longer, it will be truncated."

The default length is 2048, but you can set to a maximum of 16384.

Edited January 23, 2018 by Rolig Loon

Laufiair Hexicola · January 23, 2018

13 minutes ago, Rolig Loon said:

From the LSL wiki:

"The response body is limited to 2048 bytes by default, see HTTP_BODY_MAXLENGTH above to increase it. If the response is longer, it will be truncated."

The default length is 2048, but you can set to a maximum of 16384.

Morning Rolig, I did see that but couldn't figure out where to put it so that it didn't give me a syntax error (even after defining the integer).

Love Zhaoying · January 23, 2018

4 minutes ago, Laufiair Hexicola said:

Morning Rolig, I did see that but couldn't figure out where to put it where it didn't give me a syntax error(even after defining the integer).

It goes in the parameter list, which you have left empty. You don’t need to “define” the constant. You would pass it like this, for example: [HTTP_BODY_MAXLENGTH, 16384]. The parameter list is made up of key/value pairs.

http://wiki.secondlife.com/wiki/LlHTTPRequest

Laufiair Hexicola · January 23, 2018

OK, got that situated and still no dice. The page is over the limit (it's a Google page). Is there a way to filter the list so there isn't as much in it? As previously stated, attempting to add the span I'm looking for to the llParseString2List after HTML and BODY doesn't return anything. Heck, attempting to look at the list itself doesn't do anything either.

Edited January 23, 2018 by Laufiair Hexicola

Rolig Loon · January 23, 2018

nvm, Love got it.

Edited January 23, 2018 by Rolig Loon

Laufiair Hexicola · January 23, 2018

That does work, but as it's a Google document filled with fluff, it pushes the document's size over the limit. I tried whispering and instant messaging the document to myself and it gave me a stack-heap error. There has to be a better way to search for the span id I'm looking for.

Love Zhaoying · January 23, 2018

52 minutes ago, Laufiair Hexicola said:

that does work but as it's a google document filled with fluff, it pushes the document's size over the limit. Tried whispering and instant messaging me the document and it gave me a stack_heap error. There has to be a better way to search for the span id i'm looking for.

This feels like a long shot, but make sure you have "compile as Mono" checked. It doesn't sound to me like that small script + 16384 bytes of data should cause a stack-heap error. Non-Mono ("LSO") scripts only get 16384 bytes of total memory, while Mono scripts get 65536.

Love Zhaoying · January 23, 2018

1 hour ago, Rolig Loon said:

nvm, Love got it.

Even a broken clock is right twice a day!

Love Zhaoying · January 23, 2018

Have you considered googling for existing LSL scripts to see how they do it?

Laufiair Hexicola · January 23, 2018

I have, and they're outdated, like 10 years outdated. Most translators broke when Google changed their API. Of course it'd also be easier if there was a language pack to use, but that's difficult with the language I'd like to use this for.

Innula Zenovka · January 23, 2018

1 hour ago, Laufiair Hexicola said:

that does work but as it's a google document filled with fluff, it pushes the document's size over the limit. Tried whispering and instant messaging me the document and it gave me a stack_heap error. There has to be a better way to search for the span id i'm looking for.

I've just tried it with

        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test", [HTTP_USER_AGENT, "LSL_(Mozilla Compatible)",HTTP_BODY_MAXLENGTH,16384 ],"");

and it's worked to the extent that it's not crashed with a stack-heap collision error. However, my cube now displays the hovertext ".gbnd. gbmt", which may not be the intended result.

I'm sorry but I don't think this is going to work. The caveats in the wiki say

Quote

The response body is limited to 2048 bytes by default, see HTTP_BODY_MAXLENGTH above to increase it. If the response is longer, it will be truncated.

I take that to mean that anything in the body after the first 16384 bytes is discarded by the server before the response reaches the script, so if what you're looking for comes after the cut-off point there's nothing to be done by the time the script receives the response.

Laufiair Hexicola · January 23, 2018

13 minutes ago, Innula Zenovka said:
I've just tried it with
        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test", [HTTP_USER_AGENT, "LSL_(Mozilla Compatible)",HTTP_BODY_MAXLENGTH,16384 ],"");
and it's worked to the extent that it's not crashed with a stack-heap collision error. However, my cube now displays the hovertext ".gbnd. gbmt", which may not be the intended result.

I'm sorry but I don't think this is going to work. The caveats in the wiki say

I take that to mean that anything in the body after the first 16384 bytes is discarded by the server before the response reaches the script, so if what you're looking for comes after the cut-off point there's nothing to be done by the time the script receives the response.

Yeah, I'm realizing that more and more; the limit is there from a lag and security point of view. Looks like I get to go translate the entire English dictionary so I have an offline translator (wa-hoo). This process used to be a lot simpler from what I remember (being able to remove items from the list so that other characters can fill in).

Love Zhaoying · January 23, 2018

Instead of converting to a list, why not just search for a string constant in the response, then if found parse everything past that?

Laufiair Hexicola · January 23, 2018

I admit I'm not sure how to do that.

Love Zhaoying · January 23, 2018

It’s ok..wouldn’t matter if your desired Data isn’t contained in the response due to space. Point of why I suggested it is that converting the whole response to a list probably uses a lot of memory. You’ll get some pointers from others also, I don’t want to hog the responses.

Laufiair Hexicola · January 23, 2018

You've been very helpful, Love, and I've appreciated it. The only other thing I can think of is to filter out the body so that I can get the next 16k of data (which is what removing characters from the list would do, I think), but I wasn't able to view what's in the list to begin with lol. I'd email it to myself, but that's throttled also.

Innula Zenovka · January 23, 2018

As an initial test, why not simply look for a particular string --

		if(~llSubStringIndex(body, "whatever")){
			//if the string "whatever" is to be found in the body
		}

Then if it's there -- which I doubt it will be, but no harm in looking -- then we can try to think of ways of parsing the body into a list.
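If the marker does turn up, one way to pull out just the text that follows it, without building a list from the whole body, might look like the sketch below. The helper name extractBetween and the sample string are made up for illustration; the markers are taken from the span quoted earlier in the thread.

```lsl
// Returns the text between the first startTag and the next endTag,
// or "" if either marker is missing or nothing sits between them.
string extractBetween(string src, string startTag, string endTag)
{
    integer start = llSubStringIndex(src, startTag);
    if (start == -1) return "";

    // Drop everything up to and including the start marker.
    src = llDeleteSubString(src, 0, start + llStringLength(startTag) - 1);

    integer end = llSubStringIndex(src, endTag);
    if (end <= 0) return ""; // end marker missing, or the span is empty

    return llGetSubString(src, 0, end - 1);
}

default
{
    state_entry()
    {
        // Hypothetical sample body, not a real Google response.
        string body = "<html><span id=result_box class=\"short_text\">tesuto</span></html>";
        string found = extractBetween(body, "<span id=result_box class=\"short_text\">", "</span>");
        llOwnerSay("found: " + found);
    }
}
```

Working on the string directly avoids holding both the body and a large list in script memory at the same time. It cannot help if the span lies beyond the truncation point, since that part of the page never reaches the script.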

Edited January 23, 2018 by Innula Zenovka

Laufiair Hexicola · January 23, 2018

10 minutes ago, Innula Zenovka said:
As an initial test, why not simply look for a particular string --
		if(~llSubStringIndex(body, "whatever")){
			//if the string "whatever" is to be found in the body
		}
Then if it's there -- which I doubt it will be, but no harm in looking -- then we can try to think of ways of parsing the body into a list.

It won't be. I went through the entire 16k of data that is there and the string isn't in it (if there were a way to set two HTTP requests to pull different parts of the body, it'd be in the second call, but I wasn't able to find out if that is even possible). I purchased a translator that works around the Google change and messaged the creator, so I'll wait to hear back how they did it. This is on hiatus at the moment. Thanks for all the pointers and stuff, folks, appreciated it.

Xiija · January 23, 2018

not sure if this will help..... i tried the api site.. and this kinda works?

key  XMLRequest;
string sourceLang = "en";
string targetLang = "de";
string msg = "a bunny";
string url2;
default
{
    state_entry()
    {   // llEscapeURL encodes the space in msg so the query string stays valid
        url2 = "http://translate.googleapis.com/translate_a/single?client=gtx&sl=" +
        sourceLang + "&tl=" + targetLang + "&dt=t&q=" + llEscapeURL(msg) + "&ie=UTF-8&oe=UTF-8";
    }
    touch_start(integer total_number)
    {
       XMLRequest =
             llHTTPRequest( url2 , [HTTP_USER_AGENT, "XML-Getter/1.0 (Mozilla Compatible)",
             HTTP_METHOD, "GET",
             HTTP_MIMETYPE, "text/html;charset=utf-8",
             HTTP_BODY_MAXLENGTH,16384,
             HTTP_PRAGMA_NO_CACHE,TRUE], "");
    }
    http_response(key k,integer status, list meta, string body)
    {
        if(k ==  XMLRequest)
        {    string playing =  body ;
             playing = llUnescapeURL( playing );
             // split on the square brackets only; the first piece holds the translation
             // (no "." spacer, so a period in the translated text does not cut it short)
             list my_list = llParseString2List(playing,[ "[", "]" ],[]);
             list tmp = llCSV2List( llList2String(my_list,0) );
             string one = llList2String(tmp,0);
             llOwnerSay("got: \n" + playing);
             llOwnerSay("parsed: \n" + one);
        }
    }
}
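Since the reply from that endpoint appears to be a nested JSON array, llJsonGetValue may be a tidier way to pick out the translation than splitting on brackets. This is only a sketch: the sample body below is an assumed shape inferred from the parsing above, not a captured response, and the endpoint is not a documented API, so the layout could change.

```lsl
default
{
    state_entry()
    {
        // Assumed response shape, written by hand for illustration.
        string body = "[[[\"ein Hase\",\"a bunny\",null,null,1]],null,\"en\"]";

        // Walk into the first element of each nested array.
        string translated = llJsonGetValue(body, [0, 0, 0]);

        if (translated == JSON_INVALID)
            llOwnerSay("Unexpected response shape");
        else
            llOwnerSay("parsed: " + translated);
    }
}
```

llJsonGetValue returns the string without its surrounding quotes, which the CSV approach leaves in place.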

Callum Meriman · January 24, 2018

As Xiija mentions, rather than trying to scrape the web page, use Google's REST interface:

https://cloud.google.com/translate/docs/reference/translate

You will need to register a Google account to use it, but you get back simple JSON and don't have to worry about trying to find a needle in a haystack.
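A rough sketch of what a call might look like from LSL follows. The endpoint, query parameters, and response path are my understanding of the v2 REST interface and should be checked against the reference linked above; API_KEY is a placeholder, and the request will fail until a real key is supplied.

```lsl
string API_KEY = "YOUR_API_KEY"; // placeholder, not a working key
key gRequest;

default
{
    touch_start(integer total_number)
    {
        // Assumed v2 endpoint and parameter names; verify against Google's docs.
        string url = "https://translation.googleapis.com/language/translate/v2"
            + "?key=" + API_KEY
            + "&source=en&target=ja&format=text"
            + "&q=" + llEscapeURL("a test");
        gRequest = llHTTPRequest(url, [HTTP_METHOD, "GET", HTTP_BODY_MAXLENGTH, 16384], "");
    }

    http_response(key id, integer status, list meta, string body)
    {
        if (id != gRequest) return;
        if (status != 200)
        {
            llOwnerSay("Request failed with status " + (string)status);
            return;
        }
        // Assumed response layout: {"data":{"translations":[{"translatedText":"..."}]}}
        string text = llJsonGetValue(body, ["data", "translations", 0, "translatedText"]);
        if (text == JSON_INVALID)
            llOwnerSay("Could not find translatedText in the response");
        else
            llSetText(text, <1.0, 1.0, 1.0>, 1.0);
    }
}
```

Because the reply is a short JSON document rather than a full web page, it should fit comfortably inside the 16384-byte body limit.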

Edited January 24, 2018 by Callum Meriman

Laufiair Hexicola · January 24, 2018

I'll look into it in the morning, but it looks promising, thanks to both of ya. The owner of the translator HUD said his calls go out to a remote server first, which then sends the commands to Google and receives the replies. The process sounded very cumbersome and laggy, to be honest.
