
Trying to pull data but no dice


Laufiair Hexicola

You are about to reply to a thread that has been inactive for 2320 days.

Please take a moment to consider if this thread is worth bumping.


Morning everyone. I thought I'd try my hand at making a translator, and so far it hasn't been going well. For now I have a pre-loaded string set up from Google Translate instead of local speech. Here's what I'm working with:

key http_request_id;
 
default
{
    state_entry()
    {
        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test", [], "");
    }
 
    http_response(key request_id, integer status, list metadata, string body)
    {
        if (request_id == http_request_id)
        {
            list mine = llParseString2List(body,["result_box"],[" "]);
            llSetText(mine, <1,1,1>, 1);
        }
    }
}

This is supposed to call out to the results page, put the page into a list, find the result_box clause and put it into llSetText. The full clause that holds the translated result is <span id=result_box class="short_text"></span>. When I compile, though, it says the alpha argument of llSetText gives "Function call mismatches type or number of arguments". I tried changing the alpha to 0, with no result. If I comment out llSetText, it compiles fine. Moving llSetText doesn't work because then it can't find the list mine. I know I'm doing something wrong here; I just don't know what.
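A likely explanation for the compile error: the error is reported against the llSetText call as a whole, and the mismatched argument is the first one. llSetText expects a string, but mine is a list. The sketch below (a minimal fix, not the only one) joins the list back into a string with llDumpList2String before displaying it. Separately, everything after # in that URL is a fragment, which HTTP clients do not send to the server, so this request most likely fetches only Google Translate's generic page rather than a translation.

```lsl
key http_request_id;

default
{
    state_entry()
    {
        // Same request as in the post; the "#en/ja/a%20test" fragment is not sent to the server.
        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test", [], "");
    }

    http_response(key request_id, integer status, list metadata, string body)
    {
        if (request_id == http_request_id)
        {
            list mine = llParseString2List(body, ["result_box"], [" "]);
            // llSetText takes (string text, vector color, float alpha),
            // so convert the list to a string first.
            llSetText(llDumpList2String(mine, " "), <1.0, 1.0, 1.0>, 1.0);
        }
    }
}
```

If only one piece of the list is wanted, llList2String(mine, index) also returns a string that llSetText will accept.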


Yeah, I just thought of that and was playing with it, only I think the list or string is truncated.

key http_request_id;
float gap = 5.0;
 
default
{
    state_entry()
    {
        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test", [HTTP_USER_AGENT, "LSL_(Mozilla Compatible)"],"");
        llSetTimerEvent(gap);
    }
    
    http_response(key request_id, integer status, list metadata, string body)
    {
        if (request_id == http_request_id)
        {
            list result = llParseString2List(body,["<html><body>", ",", "</body></html>"],[]);
            string songtitle = llList2String(result,90);
            llSetText ((string)songtitle,<1,1,1>,1);
        }
    }
    
    timer()
    {
        llResetScript();
    }
}

Is there a way to search specifically for the <span id=result_box class="short_text"></span> clause? I did try passing the list to llWhisper, but it gave an error. Adding the span I'm looking for after the <html><body> separator also gave an error because the class value is quoted, and without the quotes it won't match. Also, llList2String returned blank results for every index past 40, so I figured the body was truncated.

Edited by Laufiair Hexicola
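One way to search for the span directly, sketched under assumptions: double quotes inside an LSL string literal must be escaped as \", which is probably what caused the compile error. The marker string below is copied from the post and is hypothetical; the real page markup may differ (for example a quoted id or extra attributes), and because the result is likely filled in by JavaScript in the browser, the span may be empty or absent in the HTML the script receives.

```lsl
// Hypothetical marker, taken from the post, with the inner quotes escaped as \"
string MARKER = "<span id=result_box class=\"short_text\">";

key http_request_id;

default
{
    state_entry()
    {
        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test",
            [HTTP_USER_AGENT, "LSL_(Mozilla Compatible)", HTTP_BODY_MAXLENGTH, 16384], "");
    }

    http_response(key request_id, integer status, list metadata, string body)
    {
        if (request_id != http_request_id) return;

        // Search the raw string instead of splitting it into a list first.
        integer start = llSubStringIndex(body, MARKER);
        if (start == -1)
        {
            llOwnerSay("Marker not found in the " + (string)llStringLength(body)
                + " characters received.");
            return;
        }

        // Drop everything up to and including the opening tag.
        string rest = llDeleteSubString(body, 0, start + llStringLength(MARKER) - 1);

        // Keep the text before the next closing tag. Guard stop > 0 because
        // llGetSubString(rest, 0, -1) would return the whole remaining string.
        integer stop = llSubStringIndex(rest, "</span>");
        string translated = "";
        if (stop > 0)
            translated = llGetSubString(rest, 0, stop - 1);

        llSetText(translated, <1.0, 1.0, 1.0>, 1.0);
    }
}
```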

13 minutes ago, Rolig Loon said:

From the LSL wiki:

"The response body is limited to 2048 bytes by default, see HTTP_BODY_MAXLENGTH above to increase it. If the response is longer, it will be truncated."

The default length is 2048, but you can set it to a maximum of 16384.

Morning Rolig, I did see that but couldn't figure out where to put it without getting a syntax error (even after defining the integer).


4 minutes ago, Laufiair Hexicola said:

Morning Rolig, I did see that but couldn't figure out where to put it without getting a syntax error (even after defining the integer).

It goes in the parameter list, which you have left empty. You don’t need to “define” the constant. You would pass it as, for example, [HTTP_BODY_MAXLENGTH, 16384]. The parameter list is made of key/value pairs.

http://wiki.secondlife.com/wiki/LlHTTPRequest
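A sketch of the key/value parameter list in a full request, which also reports how much of the page actually arrived. It assumes, based on my reading of the LSL wiki's http_response page, that a truncated body is flagged in metadata by HTTP_BODY_TRUNCATED followed by the truncation point; treat that check as something to verify in-world.

```lsl
key http_request_id;

default
{
    state_entry()
    {
        // Each parameter constant is followed by its value.
        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test",
            [HTTP_USER_AGENT, "LSL_(Mozilla Compatible)",
             HTTP_BODY_MAXLENGTH, 16384], "");
    }

    http_response(key request_id, integer status, list metadata, string body)
    {
        if (request_id != http_request_id) return;

        llOwnerSay("Status " + (string)status + ", received "
            + (string)llStringLength(body) + " characters.");

        // Assumption: when the body is cut short, metadata holds
        // [HTTP_BODY_TRUNCATED, truncation point].
        integer i = llListFindList(metadata, [HTTP_BODY_TRUNCATED]);
        if (i != -1)
            llOwnerSay("Body truncated at " + llList2String(metadata, i + 1));
    }
}
```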


OK, got that situated and still no dice. The page is over the limit (it's a Google page). Is there a way to filter the list so there isn't as much in it? As previously stated, adding the span I'm looking for to llParseString2List after <html> and <body> doesn't return anything. Heck, even trying to look at the list itself doesn't show anything.

Edited by Laufiair Hexicola

52 minutes ago, Laufiair Hexicola said:

That does work, but as it's a Google document filled with fluff, it pushes the document's size over the limit. I tried whispering and instant messaging the document to myself, and it gave me a stack-heap error. There has to be a better way to search for the span id I'm looking for.

This feels like a long shot, but make sure you have "compile as Mono" checked. It doesn't sound to me like that small script plus 16384 bytes of data should cause a stack-heap error. Non-Mono ("LSO") scripts only allow 16384 bytes of total memory, whereas Mono scripts get 65536.


1 hour ago, Laufiair Hexicola said:

That does work, but as it's a Google document filled with fluff, it pushes the document's size over the limit. I tried whispering and instant messaging the document to myself, and it gave me a stack-heap error. There has to be a better way to search for the span id I'm looking for.

I've just tried it with 

        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test", [HTTP_USER_AGENT, "LSL_(Mozilla Compatible)",HTTP_BODY_MAXLENGTH,16384 ],"");

and it's worked to the extent that it's not crashed with a stack-heap collision error.   However, my cube now displays the hovertext ".gbnd. gbmt", which may not be the intended result.

I'm sorry but I don't think this is going to work.  The caveats in the wiki say 

  • The response body is limited to 2048 bytes by default, see HTTP_BODY_MAXLENGTH above to increase it. If the response is longer, it will be truncated.

I take that to mean that anything in the body after the first 16384 bytes is discarded by the server before the response reaches the script, so if what you're looking for comes after the cut-off point, there's nothing to be done by the time the script receives the response.


13 minutes ago, Innula Zenovka said:

I've just tried it with 


        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test", [HTTP_USER_AGENT, "LSL_(Mozilla Compatible)",HTTP_BODY_MAXLENGTH,16384 ],"");

and it's worked to the extent that it's not crashed with a stack-heap collision error.   However, my cube now displays the hovertext ".gbnd. gbmt", which may not be the intended result.

I'm sorry but I don't think this is going to work.  The caveats in the wiki say 

I take that to mean that anything in the body after the first 16384 bytes is discarded by the server before the response reaches the script, so if what you're looking for comes after the cut-off point, there's nothing to be done by the time the script receives the response.

Yeah, I'm realizing that more and more; it's done for lag and security reasons. Looks like I get to go translate the entire English dictionary so I have an offline translator (woo-hoo). From what I remember, this process used to be a lot simpler (you could remove items from the list so that other characters could fill in).


It’s OK; it wouldn’t matter if the data you want isn’t in the response because of the size limit. The reason I suggested it is that converting the whole response to a list probably uses a lot of memory. You’ll get some pointers from others too; I don’t want to hog the responses.
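To check the memory point for yourself, here is a rough sketch that prints the script's memory use before and after splitting the body into a list. It assumes the script is compiled as Mono, where llGetUsedMemory reports the bytes currently in use; the numbers are approximate, and the comma separator is just the one used earlier in the thread.

```lsl
key http_request_id;

default
{
    state_entry()
    {
        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test",
            [HTTP_BODY_MAXLENGTH, 16384], "");
    }

    http_response(key request_id, integer status, list metadata, string body)
    {
        if (request_id != http_request_id) return;

        // Approximate bytes in use while only the body string is held.
        llOwnerSay("Memory with body only: " + (string)llGetUsedMemory());

        // Splitting creates a second copy of the text, plus per-element overhead.
        list parts = llParseString2List(body, [","], []);
        llOwnerSay("Memory after parsing into " + (string)llGetListLength(parts)
            + " elements: " + (string)llGetUsedMemory());

        parts = []; // release the list again
    }
}
```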


You've been very helpful, Love, and I've appreciated it. The only other thing I can think of is to filter out the body so that I can get the next 16K of data (which is what removing characters from the list would do, I think), but I wasn't able to view what's in the list to begin with, lol. I'd email it to myself, but that's throttled too.
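For simply viewing what came back, one option is to print the body to yourself in small pieces instead of whispering or IMing it in one go. A sketch, assuming llOwnerSay's 1024-byte message limit; the chunk size of 256 characters is a hypothetical choice that stays under that limit even if every character needed 4 bytes of UTF-8.

```lsl
key http_request_id;

// Hypothetical chunk size: 256 characters x 4 bytes max = 1024 bytes.
integer CHUNK = 256;

default
{
    state_entry()
    {
        http_request_id = llHTTPRequest("https://translate.google.com/#en/ja/a%20test",
            [HTTP_BODY_MAXLENGTH, 16384], "");
    }

    http_response(key request_id, integer status, list metadata, string body)
    {
        if (request_id != http_request_id) return;

        integer len = llStringLength(body);
        integer i;
        for (i = 0; i < len; i += CHUNK)
        {
            // An end index past the last character is clamped by llGetSubString.
            llOwnerSay(llGetSubString(body, i, i + CHUNK - 1));
        }
    }
}
```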


As an initial test, why not simply look for a particular string -- 

		if(~llSubStringIndex(body, "whatever")){
			//if the string "whatever" is to be found in the body
		}

Then if it's there -- which I doubt it will be, but no harm in looking -- then we can try to think of ways of parsing the body into a list.

Edited by Innula Zenovka

10 minutes ago, Innula Zenovka said:

As an initial test, why not simply look for a particular string -- 


		if(~llSubStringIndex(body, "whatever")){
			//if the string "whatever" is to be found in the body
		}

Then if it's there -- which I doubt it will be, but no harm in looking -- then we can try to think of ways of parsing the body into a list.

It won't be. I went through the entire 16K that is there and the string isn't in it (if there were a way to set up two HTTP requests that pull different parts of the body, it would be in the second call, but I wasn't able to find out whether that's even possible). I purchased a translator that works around the Google change and messaged its maker, so I'll wait to hear back about how they did it. This is on hiatus at the moment. Thanks for all the pointers and stuff, folks; I appreciate it.


Not sure if this will help... I tried the API site, and this kinda works?

key  XMLRequest;
string sourceLang = "en";
string targetLang = "de";
string msg = "a bunny";
string url2;
default
{
    state_entry()
    {   // llEscapeURL encodes the space in msg as %20 so the URL stays valid
        url2 = "http://translate.googleapis.com/translate_a/single?client=gtx&sl=" +
            sourceLang + "&tl=" + targetLang + "&dt=t&q=" + llEscapeURL(msg) + "&ie=UTF-8&oe=UTF-8";
    }
    touch_start(integer total_number)
    {
       XMLRequest =
             llHTTPRequest( url2 , [HTTP_USER_AGENT, "XML-Getter/1.0 (Mozilla Compatible)", 
             HTTP_METHOD, "GET", 
             HTTP_MIMETYPE, "text/html;charset=utf-8", 
             HTTP_BODY_MAXLENGTH,16384,
             HTTP_PRAGMA_NO_CACHE,TRUE], "");     
    }
    http_response(key k,integer status, list meta, string body)
    { 
        if(k ==  XMLRequest)   
        {    string playing =  body ;    
             playing = llUnescapeURL( playing );   
             list my_list = llParseString2List(playing,[ "[", "]" ],["."]);            
             list tmp = llCSV2List( llList2String(my_list,0) );
             string one = llList2String(tmp,0);
             llOwnerSay("got: \n" + playing);
             llOwnerSay("parsed: \n" + one);
        }
    }
}
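Since this endpoint appears to answer with a JSON array rather than HTML, LSL's built-in llJsonGetValue may be a sturdier way to pull out the translation than splitting on brackets. This is a sketch under loud assumptions: translate_a/single is unofficial and undocumented, and the code assumes the body is well-formed JSON shaped like [[["translated text","source text",...]],...], so that path [0, 0, 0] holds the translation of the first sentence.

```lsl
key XMLRequest;

default
{
    touch_start(integer total_number)
    {
        // Same unofficial endpoint as above; llEscapeURL encodes the space.
        string url = "http://translate.googleapis.com/translate_a/single?client=gtx"
            + "&sl=en&tl=de&dt=t&q=" + llEscapeURL("a bunny");
        XMLRequest = llHTTPRequest(url, [HTTP_BODY_MAXLENGTH, 16384], "");
    }

    http_response(key k, integer status, list meta, string body)
    {
        if (k != XMLRequest) return;

        // Assumed shape: first array -> first sentence -> first field (translation).
        string translated = llJsonGetValue(body, [0, 0, 0]);
        if (translated == JSON_INVALID)
            llOwnerSay("Unexpected response: " + llGetSubString(body, 0, 200));
        else
            llOwnerSay("Translated: " + translated);
    }
}
```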

 


As Xiija mentions, rather than trying to scrape the web page, use Google's REST interface:

https://cloud.google.com/translate/docs/reference/translate

You will need a Google Cloud account (and an API key) to use it, but you get back a small, structured JSON response instead of having to find a needle in a haystack.

Edited by Callum Meriman
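A sketch of calling the official Cloud Translation API (v2) from LSL. API_KEY is a hypothetical placeholder for your own key, and the code assumes the v2 response layout {"data":{"translations":[{"translatedText":"..."}]}}; check both against Google's current reference before relying on it. format=text asks for plain text rather than HTML-escaped output.

```lsl
// Hypothetical placeholder: substitute your own Cloud Translation API key.
string API_KEY = "YOUR_API_KEY";

key request;

default
{
    touch_start(integer total_number)
    {
        string url = "https://translation.googleapis.com/language/translate/v2"
            + "?key=" + API_KEY
            + "&source=en&target=ja&format=text"
            + "&q=" + llEscapeURL("a test");
        request = llHTTPRequest(url, [HTTP_METHOD, "GET"], "");
    }

    http_response(key k, integer status, list meta, string body)
    {
        if (k != request) return;

        if (status != 200)
        {
            llOwnerSay("HTTP " + (string)status + ": " + llGetSubString(body, 0, 200));
            return;
        }

        // Assumed v2 layout: data -> translations -> [0] -> translatedText
        string text = llJsonGetValue(body, ["data", "translations", 0, "translatedText"]);
        llSetText(text, <1.0, 1.0, 1.0>, 1.0);
    }
}
```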
