
Just now, Scylla Rhiadra said:

Well, don't. I'm anxious to learn, so if you're not willing to engage here, point me to something that shows that I'm wrong, and that ChatGPT, Midjourney, etc., are not built upon vast databases of tokenized content produced by humans.

Show me how they cite and credit their sources. Can you point me to something that reveals that permission has always been granted for the use of the work of humans? Or that writers, translators, and graphic designers are not being put out of work by software that uses their own productions to replace them?

No, it's ok Scylla, sorry I sounded a bit passive aggressive-y, but genuinely I thought better of it and retracted what I was going to say.

Anyway, my take on anything AI related is no more valid than yours. We'll all just have to wait and see I guess.

Not too long though, apparently: Mr Altman says we'll have ASI within "1000s of days", which at 1,000 days works out to a little under three years and roughly corresponds to the 2027 date everyone else seems to be kicking down the road.

Please keep doing what you're doing (you don't need my permission to do that!), I genuinely enjoy the counter opinion and I do learn from and take on board things from your position - even if I can't quite bring myself to admit it at the time :) 

All the Best

Jackson.


3 minutes ago, Love Zhaoying said:

Think I saw something recently about how they will cite "made up" sources. Basically, they lie a lot.

Yes, this is one of the things they can "hallucinate." I can confirm this from personal experience: I have colleagues who have been cited for articles that they didn't write and that literally don't exist, and I've had student papers that generate fake citations because they've been told that something needs to be cited. The problem is, in part, that AI is simply trying to do what it is told, as best it can. If it's told that something requires a citation, and it can't find a citation (and in most cases it can't, for a variety of reasons, including the fact that most scholarly sources are paywalled and inaccessible), it will make one up. It's trying to "please."

That will change, I have no doubt. One of the fairly reliable ways of detecting a ChatGPT-generated student paper right now is that it won't include actual textual evidence for its statements, drawn from the poem or novel or play it's discussing. That's because it's not actually analyzing the poem or novel: it's pulling things others have written about it from elsewhere.

The additional issue that I think they're trying to address by "watermarking" is that AI is becoming increasingly (and inevitably) recursive, as the open internet becomes more heavily saturated with AI-generated art and writing. We're starting to reach the point where ChatGPT and other engines are plagiarizing themselves.


On 9/20/2024 at 7:15 PM, Scylla Rhiadra said:

Equally hilarious. ChatGPT and other LLM-based AIs scour the internet for data that they effectively steal without attribution, compensation, or permission. And they do indeed do so 24/7. Increasingly our software platforms are opting us in to "participation" in this data harvesting by default, and we have to opt out expressly so that our own work is not plundered by the AI. Firefox is one of the latest to do so.

You know what's truly dystopian? Having our ideas and words stolen by a machine so that it can use them to replace us.

It would be much worse if it kept a record of the source. Can you imagine having a trail of information leading to your every post, on every site, from every account? A profile could easily be built of you, and then, you know, some people would be itching to make it public knowledge (for a fee).

It would make places like mylife.com look like Disney World. In the US, insurance companies would use it to set our rates; landlords would use it; so would the police, our political figures, and some random John/Jane Doe trying to stalk people. It would be a real mess.

I swear Scylla, sometimes I think the anti-AI folk are working against our interests (this is not directed at you, mostly at people in positions of power). They want an electronic trail of our activities, they want to push for further copyright laws, and they want to charge us all for everything they can squeeze out of us.

 

Edited by Istelathis

14 minutes ago, Scylla Rhiadra said:

Show me how they cite and credit their sources.

This actually gives me an idea to use the next time I can bring myself to sit and argue, erm I mean work on some projects. My last session resulted in me wanting to yeet a bot into the sun, so I'm taking a couple days off away from this mess. 😂

Peeve: Me: I'm going to laser focus. I'm going to get so much done this week. I'm going to sit here for HOURS and make so much progress and...

Bot: Unbridled and unhinged sass

Me: Ok I'm out

But seriously - asking them to provide sources might be a valid and somewhat interesting way to see how bad the hallucinations are. Worst case scenario, it gives me something else to fight about.

*has visions of conversations basically devolving into fisticuffs*


29 minutes ago, Istelathis said:

It would be much worse, if it kept a record of the source. Can you imagine having a trail of information leading to your every post, on every site? A profile could be built of you, and then, you know, some people would be itching to make it public knowledge (for a fee).

That's not what citation is, though, Stella. It references the source for a particular "borrowing." It doesn't provide a full-blown online profile map of the person being borrowed from. So, for instance, if ChatGPT cites something I say about . . . say, ChatGPT, it should simply reference and attribute that particular bit of information. And creating a network of interlinked sources about a particular bit of information is exactly what citation is supposed to do. I should be able to follow a quote or citation to its source, so that I can examine its veracity and validity. And if that source uses citations properly, I should be able to trace back the origin of its ideas through its sources. In practice, this is actually how one builds one's own bibliographies -- lists of sources and background materials for one's own research -- by tracing the scholarship back through citation.

So, a citation of something I say about ChatGPT should lead one back to the place where I say it. If that's linked to anything else, it should be to the sources I used to make that particular assertion. It shouldn't lead to my home address. And if it does, then it needs to be fixed, cuz it ain't doing it right. I take your point, but citation isn't doxxing, and the kind of trails of publicly-accessible information that you're talking about are already available to be made by anyone choosing to do so.

An internet, or scholarly body of work, or indeed anything that isn't sourced or at least potentially sourceable would be a nightmare. Huge clouds of "information" untethered to anything else but its own assertions? How do we learn more? How do we validate what we have learned? How do we distinguish between real information and hallucination? How do we catch deliberate mis- or disinformation?

As for monetization -- well, that's been happening for a long time in the scholarly fields, as conglomerates like Elsevier, Pearson, and others gobble up publicly-funded scholarship and hide it behind paywalls. (Open access publication is in part a response to that.) Once upon a time, one could visit a library and find a print published book or paper. That's no longer nearly as possible as it once was, and it's got nothing to do with AI. And as for images . . . well, look at Getty Images.


45 minutes ago, Scylla Rhiadra said:

That's not what citation is, though, Stella. It references the source for a particular "borrowing." It doesn't provide a full-blown online profile map of the person being borrowed from. […]

So, say that for some reason our data on this forum was used in a response: would you be in favor of the citation being of one of us, with a link to our discussion? And if you wanted to follow up on that source, do you believe it would be better if the LLM could find other sources from the same person?

When we include the source of the data, we can further manipulate that data to gather more information. I could find someone's political views; I could find tons of information about them. What is worse, considering that machine learning is pattern recognition, is that it would not take much to find each person's unique pattern of writing and their unique world views, and to create a trail of where every person has posted on the web across separate accounts. We could potentially find sources from news articles we responded to decades ago.

Thankfully, that is not how LLMs work. They do not store our data; they are pattern recognition models, and as far as I know they do not store sources, although some people do want that. Some AIs, such as Perplexity and Copilot, will cite sources, but I don't think citation is inherent in the model itself.


56 minutes ago, Scylla Rhiadra said:

That's not what citation is, though, Stella. It references the source for a particular "borrowing."

On this topic (hopefully), I saw several videos / articles about an AI that now will build its output ONLY based on specific documents you upload. I am pretty sure it was a Google AI.

Peeve: Upload the book, ask the AI to write a paper on the book..


17 minutes ago, Love Zhaoying said:

On this topic (hopefully), I saw several videos / articles about an AI that now will build its output ONLY based on specific documents you upload. I am pretty sure it was a Google AI.

Peeve: Upload the book, ask the AI to write a paper on the book..

You can do this with GPT4ALL, although it is not great in my opinion, at least not with the models I used. On my computer it slowed to a crawl: I had included a long text file of a fantasy novel to discuss its characters, and each response was delayed significantly. I think it scans through the whole file each time, finding any information regarding the character I was talking about before forming a reply.
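A rough sketch of why that naive approach crawls: if the tool rescans the whole novel on every turn, each reply costs time proportional to the file's length. This is an illustrative guess at the behaviour, not GPT4ALL's actual implementation (newer local-document features typically use embedding indexes instead of rescanning).

```python
def find_relevant(lines, query, window=2):
    """Naive per-turn retrieval: rescan every line of the document for the
    query and return each hit with a little surrounding context. The cost
    grows with document length on every single reply."""
    hits = []
    q = query.lower()
    for i, line in enumerate(lines):
        if q in line.lower():
            hits.append(" ".join(lines[max(0, i - window): i + window + 1]))
    return hits
```

Asked about a character, it walks the entire book again, which is consistent with the per-reply slowdown described above.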

With SillyTavern, you can even connect to the web to expand the conversation. I don't use that often because, the way LLMs work, they can only retain so much information (locally), or it costs you a fortune (via an API to a remote service) to have one sort through a ton of information.

When I was watching Z Nation, I tried using it to discuss the show, as the model I was using did not have much accurate information on it. It kind of worked, but not really well.

With SillyTavern, you can also build world lore files. I think these are keyword-indexed entries, each with a description, that ST scans through; I'm not entirely sure how it works, but I believe the matching entries are passed to the LLM and become part of the story. So if you wanted to build your own role-play game, say a zombie apocalypse, you could create your own towns, characters, and different kinds of zombies, whatever you can imagine. Not quite the same thing as sorting through a book, but still, I thought you might find it interesting.
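The lorebook idea can be sketched in a few lines: when a keyword shows up in the user's message, the matching description gets injected into the prompt. The entries and matching rule below are illustrative guesses, not SillyTavern's actual lorebook format.

```python
# Toy "world lore" book: keyword tuples mapped to descriptions.
LOREBOOK = {
    ("ashfall", "the town"): "Ashfall: a walled survivor town run by scavengers.",
    ("shambler", "shamblers"): "Shamblers: slow zombies that hunt by sound.",
}

def inject_lore(user_message, prompt):
    """Prepend the description of any entry whose keyword appears in the
    user's message, so the LLM sees the relevant lore as context."""
    text = user_message.lower()
    matched = [desc for keywords, desc in LOREBOOK.items()
               if any(k in text for k in keywords)]
    return "\n".join(matched + [prompt])
```

The LLM never "knows" the whole lore file; it only sees whatever entries were triggered this turn, which is why the lore has to be keyed to words likely to come up in play.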


5 minutes ago, Istelathis said:

You can do this with GPT4ALL, although it is not very great in my opinion.. at least not with the models I used. […]

Found it, it's the Gemini stuff. Anyway, it's "ready to do real work". 

Peeve: Goodbye jobs!

 


18 minutes ago, Love Zhaoying said:

Found it, it's the Gemini stuff. Anyway, it's "ready to do real work". 

Peeve: Goodbye jobs!

 

And good riddance! 🤣 Time to get some golf in.

 

 

I don't think we will be losing our jobs, or getting our UBI, anytime soon though. More than likely our society will find other things for us to do. Prompting computers to count grains of sand, or something.

Sam Altman has some things to say about it.

https://ia.samaltman.com/

 

Meanwhile our Oracle overlord Larry Ellison wants to establish a police state, where we are all under constant surveillance so we will be on our best behavior.

https://futurism.com/the-byte/billionaire-constant-ai-surveillance

 

 

Edited by Istelathis
Just as a sidenote, not good riddance to your job, I'm being silly here.

Ah yes, we can never have or make even a halfway decent experience in Second Life, and any examples of it from the past are just being misremembered or looked at through rose-tinted glasses!

Some people really are incapable of being positive whatsoever, and are so set in their bash-everything mindset that they really should have moved on a long time ago.


Just now, Solar Legion said:

Ah yes, we can never have or make an even halfway decent experience in Second Life and any examples of it from the past are just being misremembered or looked at through rose tinted glasses!

Some people really are incapable of being positive whatsoever and are so set in their bash everything mindset that they really should have moved on a long time ago.

LOL

The "game" died when you couldn't wall-walk climb up a ladder… he could have added a moving poser ball labeled "sit here". Why didn't he? He had already quit SL and was paying for sunk cost.

Look, SL can have games now because that one time, over a decade ago, one guy ripped off Myst.


35 minutes ago, Istelathis said:

So, say that for some reason our data on this forum was used as a response, would you be in favor of the citation being of one of us, with a link to our discussion?

Yes, if the source is publicly available, which is mostly going to be the case if ChatGPT has access to it.

36 minutes ago, Istelathis said:

Now, if you wanted to follow up that source, do you believe it would be better if the LLM could find other sources from the same person?

Yes. And I can do that now using Google or some other search engine. Google shows 6 pages of results for "Scylla Rhiadra" (it used to be waaaaay more before they broke it). I love that it gives you the option of finding the "Top Rated" Scylla Rhiadra. I mean, why settle for a second-rate me?

What none of those results provide, happily, is any link to RL me. If ChatGPT finds something else I've said, done, or been noted doing, it's always because I've consciously and deliberately associated that information with my SL identity.

It's entirely possible, of course, that ChatGPT might develop to the point that it could dox RL me, just as the advent of really good internet search engines allowed people to do that 15 years ago. Which is what I assume you're partially getting at here.

42 minutes ago, Istelathis said:

What is worse, is that considering machine learning is pattern recognition, it would not take much, to find each person's unique pattern of writing, their unique world views, and create a trail of where every person has posted on the web, using separate accounts.

Again, we have that now -- "stylometry" has been a thing for decades (albeit not a very trustworthy thing). There are free online text analysis platforms into which you can feed masses of data for this purpose, and I'm pretty sure there are open source ones as well. One I've very occasionally used is a Canadian-based scholarly tool called Voyant.

But that's got nothing to do with locating other sources of things I've said.
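As a toy illustration of the stylometry idea (not a serious method): compare two texts by their relative frequencies of common function words, which tend to be habitual and hard to disguise. The eight-word feature set here is arbitrary; real stylometric tools use hundreds of features.

```python
import math
from collections import Counter

# A tiny, arbitrary set of English function words used as style features.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "a", "is", "that"]

def style_profile(text):
    """Relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def similarity(u, v):
    """Cosine similarity between two profiles (1.0 = identical ratios)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Two texts by the same habitual writer should score close to 1.0; texts with very different function-word habits score lower, which is the (fallible) basis for linking accounts by writing style.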

48 minutes ago, Istelathis said:

They do not store our data, they are pattern recognition models, they as far as I know, thankfully do not store sources although some people do want that.

I don't think I want them to "store" our data, and I'm sure they don't want to do that either, any more than they store the results of each and every query we make of them. But tell me where they pulled something from? Yep, I do want that. And I don't need it to dig out the author if the source is anonymous: the very fact that it is anonymous tells me a lot about the source already.

Think of ChatGPT, maybe, like an automated vacuum cleaner that sucks up stuff. You could make one that is so powerful that it damages floors and carpets. Or you could fine tune it so that it only sucks up dirt. The AI that we need is the latter.

 


2 hours ago, Istelathis said:

I swear Scylla, sometimes I think the anti AI folk are working against our Interest (this is not directed at you, mostly at people in positions of power)  They want an electric trail of our activities, they want to push further copyright laws, they want to charge us all for everything they can squeeze out of us. 

That's not really a conspiracy when you look at who is leading the legislative push on AI and how they desire draconian requirements. It sounds good to those who hate AI, but I do hope they realise that it won't stop big companies one bit - it will just murder the free open source variants, or those that come after. A classic move of pulling the ladder up behind themselves.


49 minutes ago, Love Zhaoying said:

On this topic (hopefully), I saw several videos / articles about an AI that now will build its output ONLY based on specific documents you upload. I am pretty sure it was a Google AI.

Peeve: Upload the book, ask the AI to write a paper on the book..

Again, though, a tool like this can paraphrase or summarize a book. And it can, maybe, add what others have said about it.

What it can't do is original analysis, nor is it very good at distinguishing good commentary from crap, except on the basis of how often something has been repeated or cited.

It's a potentially great tool. But its use is at the start of the process, not at the end.


2 minutes ago, ValKalAstra said:

That's not really a conspiracy when you look at who is leading the legislation push on AI and how they desire for there to be draconian requirements. It sounds good to those that hate AI but I do hope they realise that it won't stop big companies one bit - just murder the free open source variants or those that come after. A classic move of pulling the ladder up behind themselves.

I mentioned the soul-sucking bottom feeders that are Getty Images above somewhere. They're a good example of a company that, in theory, is threatened by AI, but that is actually working on ways to monetize it themselves, and, as you say, pull up the ladder behind them.


The issue with AI is that it forces a choice: prioritize and celebrate the work of individuals (as imperfect as it might be), or write off the bulk of humanity for money and power.

Just wait till AI can do physical jobs, or instruct a meat puppet to do it "better".

We all already know which way the domino falls, it's been the bread and butter of wishful science fiction writers forever.


20 minutes ago, Scylla Rhiadra said:

But tell me where they pulled something from? Yep, I do want that. And I don't need it dig out the author if it's anonymous: the very fact that it is anonymous tells me a lot about the source already.

It is from a collection of many people, so there is no one source. If you were to open up an LLM, you would not find data stored word for word; what you would find is a bunch of sophisticated algorithms that essentially find patterns and make predictions based upon what the model has been trained on. It gets complicated, beyond my level of knowledge, but the data it does store consists of different variables, such as weights, and the LLM predicts what is most likely to come next in the sequence of words it spits out as a result.

The citation would likely be every site the LLM has been trained on.

Essentially, the web is the citation - and so are libraries, and everything else you can think of.
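The "weights, not stored text" point can be illustrated with a toy model that uses word-pair counts in place of a neural network's learned weights. Real LLMs operate on tokens with billions of parameters, but the principle is the same: what's kept is statistics derived from the training text, not the text itself.

```python
from collections import Counter, defaultdict

def train(corpus):
    """Count word-to-next-word transitions. The counts play the role of
    'weights': the training text itself is not kept anywhere."""
    weights = defaultdict(Counter)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        weights[current][nxt] += 1
    return weights

def predict_next(weights, word):
    """Predict the most likely next word from the learned counts."""
    return weights[word].most_common(1)[0][0]
```

After training on "the cat sat on the mat and the cat ran", the model can predict that "cat" most often follows "the", but it cannot reproduce the source sentence or say where any pattern came from - which is essentially the citation problem being discussed here.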


Ok, so it's no secret that @Scylla Rhiadra and I have different perspectives on what LLMs do with their training data and what sort of implications it has for IP protection and privacy. There are lots of reasoned arguments on both sides, most of them associated with vested interests, but let's call it moot for now :) 

Given the current discussions around reasoning and hallucinations in LLMs, I thought it might be timely to highlight how the release of OpenAI's o1 on September 12th could add another layer to this debate. The o1 model, part of what was previously referred to as "Project Strawberry," marks a significant improvement in AI reasoning capabilities, making it worth considering in these discussions.

All I’ll say is that things in AI are evolving so quickly that any firm opinions formed today are likely to be outdated tomorrow. So, staying adaptable and ready to adjust your perspective is key.

So, just for those who are interested, please see below a couple of videos which might help. Both are YouTube videos, so possibly a bit light on credentials and rigor, but on the whole pretty credible in terms of what they're saying, while also being accessible.

For an overview of reasoning and hallucinations in the context of the new model: [embedded video]

And for a slightly deeper dive (don't judge a book by its cover with this guy): [embedded video]

