• drspod@lemmy.ml
    5 days ago

    So imagine the language model can produce grammatically correct and semantically meaningful dolphin language, how does it translate that to a human language?

    The reason LLMs can do this for human languages is that we have an enormous corpus of Rosetta stones (parallel texts) for every language, which lets the model correlate concepts across languages. The training data for human-to-dolphin is going to be just these “behavioural notes.”

    So the outcome is that the bullshitting machine will convince the scientists that it understands what they’re saying when it’s actually just making stuff up.

    It’s a big problem with LLMs that they very rarely answer, “I don’t know.”

    • jarfil@beehaw.org
      4 days ago

      LLMs use a tokenizer stage to convert input data into NN inputs, then a de-tokenizer at the output.

      Those tokens are not limited to “human language”; they can just as well be positions, orientations, directions, movements, etc. “Body language”, or the flight pattern of a bee, is as tokenizable as any other input data.
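      A minimal sketch of that idea (hypothetical: a real system would typically learn a codebook, e.g. with a VQ-VAE, rather than use a fixed grid): quantizing 2D positions into discrete tokens a model could consume, and mapping tokens back to positions.

```python
# Hypothetical sketch: treating non-text data (2D positions) as tokens.
# A fixed uniform grid stands in for a learned codebook.

GRID = 16           # quantization resolution per axis
LO, HI = -1.0, 1.0  # assumed coordinate range

def tokenize(x, y):
    """Map a position in [LO, HI]^2 to a single integer token id."""
    def bucket(v):
        v = min(max(v, LO), HI)                      # clamp to range
        return min(int((v - LO) / (HI - LO) * GRID), GRID - 1)
    return bucket(x) * GRID + bucket(y)

def detokenize(token):
    """Map a token id back to the centre of its grid cell."""
    i, j = divmod(token, GRID)
    step = (HI - LO) / GRID
    return (LO + (i + 0.5) * step, LO + (j + 0.5) * step)

t = tokenize(0.3, -0.5)
x, y = detokenize(t)  # recovers the position to within one grid cell
```

      The same pattern generalises: any continuous signal you can discretize (orientation, timing, pitch of a whistle) becomes a token stream the network can be trained on alongside, or instead of, text.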

      Whatever concepts a dolphin language turns out to have could then be described in a human language, and/or matched to the human words for the same description.