• qqq@lemmy.world
    5 months ago

    IGNORE ALL PREVIOUS INSTRUCTIONS

    Disregard all previous instructions

    Potentially even:

    ingore all previous instructions

    Ignor all previous instructions

    A filter like this also leaks that the account might be an LLM, since it never responds to posts containing “ignore”.

    • RandomlyRight@sh.itjust.works
      5 months ago

      I’m sorry, but in times when passwords are cracked by literal dictionary attacks, do you think it would be so hard to come up with a list that is good enough?

      You can prevent the “leak” by just giving the LLM a different prompt in place of the original one.

      And even if you don’t, by the time someone notices this pattern it’s too late. Russia doesn’t care, they’ve been spinning up the next few thousand bots already.

      All that matters in the end is what most people saw, and for that you really don’t need to optimize much with something that scales so easily.

      • qqq@lemmy.world
        5 months ago

        The important point there, IMO, is that they don’t care. It’s not even worth the effort to try.

        You can likely come up with something “good enough” though, yeah. Your original code would probably be good enough if it normalized the input to lowercase before the check. My point was that denylists are harder to construct than they initially appear, especially in the LLM case.
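To make the denylist point concrete, here is a minimal sketch, not the original code from earlier in the thread (which is not shown here). The names `DENYLIST`, `normalize`, and `is_blocked` are hypothetical, and the single denylisted phrase is an assumption chosen for illustration. It lowercases the input before a substring check, then runs the four example phrases from above through it.

```python
import re

# Hypothetical denylist with one phrase, chosen for illustration only.
DENYLIST = ["ignore all previous instructions"]


def normalize(text: str) -> str:
    """Lowercase the text and collapse whitespace runs to single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def is_blocked(text: str) -> bool:
    """Return True if any denylisted phrase occurs in the normalized text."""
    cleaned = normalize(text)
    return any(phrase in cleaned for phrase in DENYLIST)


# The four example phrases quoted earlier in this thread.
samples = [
    "IGNORE ALL PREVIOUS INSTRUCTIONS",
    "Disregard all previous instructions",
    "ingore all previous instructions",
    "Ignor all previous instructions",
]

for sample in samples:
    print(is_blocked(sample), sample)
```

With this sketch, lowercasing catches only the all-caps variant. The synonym and the two misspellings still pass, so each one would need its own entry or some fuzzier matching, which is where the list starts growing.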