• BetaDoggo_@lemmy.world · 7 months ago

    This article is grossly overstating the findings of the paper. It’s true that bad generated data hurts model performance, but that’s true of bad human data as well. The paper used OPT-125M as its generator model, a very small research model with fairly low-quality and often incoherent outputs. The higher-quality generated data that makes up the majority of generated text online is far less of an issue. Using generated data to improve output consistency is common practice for both text and image models.

  • pixxelkick@lemmy.world · 7 months ago

    I’ve been calling this for a while now.

    I’ve been calling it the Ouroboros effect.

    There are even bigger factors at play that the paper didn’t dig into, namely selection bias due to human intervention.

    See, at first let’s say an AI has 100 unique outputs for a given prompt.

    However, humans will favor, let’s say, half of them. Humans will naturally regenerate a couple of times and pick their preferred “cream of the crop” result.

    This will then ouroboros for an iteration.

    Now the next iteration only has, say, 50 unique responses, as half of them have been ouroboros’d away by humans picking the ones they like more.

    Repeat, each time “half-lifing” the originality (see the sketch at the end of this comment).

    Over time, everything will get more and more same-ish. Models will degrade in originality as everything muddles into corporate speak.

    You know how every corporate website uses the same useless, “doesn’t mean anything” jargon to say a lot without actually saying anything?

    That’s the local minimum AI is heading toward too, as it keeps getting selectively “bred” to speak in an appealing and nonspecific way for the majority of online content.
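
    A minimal toy sketch of the “half-lifing” described above. The numbers and the keep_fraction parameter are hypothetical, purely to illustrate the selection effect the comment describes, not anything measured in the paper:

    ```python
    import random

    def simulate_selection(unique_outputs=100, keep_fraction=0.5, generations=5):
        """Toy model: each generation, humans keep only the outputs they favor,
        and the next model effectively trains on those survivors alone."""
        pool = list(range(unique_outputs))  # distinct outputs, identified by id
        for gen in range(1, generations + 1):
            survivors = max(1, int(len(pool) * keep_fraction))
            pool = random.sample(pool, survivors)  # human curation: keep the favored half
            print(f"generation {gen}: {len(pool)} unique outputs remain")

    simulate_selection()
    # generation 1: 50 unique outputs remain
    # generation 2: 25 unique outputs remain
    # generation 3: 12 unique outputs remain
    # generation 4: 6 unique outputs remain
    # generation 5: 3 unique outputs remain
    ```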

      • Lvxferre@mander.xyz · 7 months ago

        Nah. It’s degrading the internet, for sure, but not killing it. We had a similar event in September 1993 (the Eternal September) and the internet survived fine.

        • spujb@lemmy.cafe · 7 months ago

          “similar”

          lol. a massive growth in real, human users is not “similar” to a massive growth in fake, undependable data with zero to negative value.

  • spujb@lemmy.cafe · 7 months ago (edited)

    i miss when gpt was kept unpublished because it was “too dangerous”. i wish we could have released it in a more mature way.

    because we were right. we couldn’t be trusted, and we immediately ruined the biggest wonder of humanity by having it generate thousands to millions of articles for a quick buck. the toothpaste is out of the tube now and it can never go back in.