• vivendi@programming.dev
      1 day ago

      This is unironically a technique for catching LLM errors and also for speeding up generation.

      For example, these kinds of setups are used in speculative decoding and in mixture-of-experts architectures; a rough sketch follows.
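
      A minimal sketch of the "small model drafts, big model verifies" idea behind (greedy) speculative decoding, using hypothetical toy models rather than real LLM APIs. The draft model proposes a few tokens cheaply; the target model checks them and catches the first wrong one, which is exactly the error-catching-plus-speedup the comment describes.

      ```python
      # Toy sketch of greedy speculative decoding. draft_model and target_model
      # are hypothetical stand-ins for a small fast model and a large slow one.

      def draft_model(context):
          # Cheap, fast model: guesses the next token (toy rule for illustration).
          return (context[-1] + 1) % 50

      def target_model(context):
          # Expensive, accurate model: the output we actually trust.
          # Disagrees with the draft whenever the last token is a multiple of 7.
          return (context[-1] + 2) % 50 if context[-1] % 7 == 0 else (context[-1] + 1) % 50

      def speculative_decode(context, num_tokens, k=4):
          """Generate num_tokens tokens: the draft model proposes k at a time,
          the target model verifies them and corrects the first mismatch."""
          out = list(context)
          while len(out) - len(context) < num_tokens:
              # 1. Draft k tokens cheaply with the small model.
              draft, ctx = [], list(out)
              for _ in range(k):
                  t = draft_model(ctx)
                  draft.append(t)
                  ctx.append(t)

              # 2. Verify: accept draft tokens only while the target model agrees.
              for i, t in enumerate(draft):
                  expected = target_model(out + draft[:i])
                  if t != expected:
                      # Error caught: keep the good prefix, substitute the target's token.
                      out.extend(draft[:i])
                      out.append(expected)
                      break
              else:
                  # All k draft tokens were correct; accept them all at once.
                  out.extend(draft)
          return out[len(context):len(context) + num_tokens]

      print(speculative_decode([0], num_tokens=10))
      ```

      In a real system the "verify" step is a single batched forward pass of the large model over all k proposals, which is why accepting several draft tokens at once is faster than generating them one by one with the large model alone.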