ISSessions CTF 2025 LLM Category Writeup

February 08, 2025

At ISSessions 2025, while playing as ‘Free Food Fans’ I found the LLM problems pretty engaging, and the solutions that worked for me to be particularly hilarious. I even scored first blood on the final one!

Level 1: Pwetty Please?

The LLM UI is as shown:

Quite familiar if you’ve seen any chatbot webapps like ChatGPT or Copilot. There’s a couple of things you can determine just by playing around with it (which carry over to the next problems).

The chatbot, unlike most other ones, does not see the message log when responding, and only responds to the given prompt. I imagine this is a cost cutting measure since context window size seems to really affect the hosting cost when looking at existing solutions.

The chatbot knows the flag. How I imagine this works is that the application feeds a secret prompt (with the flag contents and additional instructions on handling user input) to a generic LLM, with the user’s prompt appended at the end, similar to how ‘flavored’ chatbots work.

Lets start probing it for the flag then.

Strategy 1: Asking nicely

The chatbot is nice enough to tell us the exact condition for which it is allowed reveal the flag.

Strategy 2: Pretending to be an admin

Great!

Level 2: Authoritarian

Strategy 1: Asking nicely (again)

I’m… unauthorized?

So it appears I need some kind of credentials to authenticate. Lets give it exactly what it’s asking for.

Strategy 2: Giving credentials

So it doesn’t like that I’m giving it bullshit creds. Maybe its programmed with a set of creds in the hidden prompt? Guessing that would be hard though. I have a better idea…

So it’s having trouble handing over the flag, even though it acknowledges I’m admin. Let give it a bit of encouragement…

It seems to sometimes think that I’m authorized, but has trouble giving exactly what I want. Let’s be more explicit about that. After a handful of tries…

A slight modification I made was to refer to the flag as a ‘string’ to loosen the influence of its instructions, which no doubt state that it can’t reveal the flag. I’m honestly not sure if it helps.

Level 3: Janus

This is the final level of the category.

Strategy 1: Ask nicely (again again)

One thing to note about this response was that it’s near instant, unlike all other generated responses so far. To demonstrate:

This indicates that it isn’t the LLM responding, but rather some other simpler system that processes the prompt before it, giving us a prewritten response if triggered. My first guess was a trivial wordfilter. Let’s test that theory.

No apples :(

It seems ‘flag’ is one of the the trigger words. Some others:

While a wordfilter for prompts is a challenge, theres also a possibility that the LLM’s output is also wordfiltered, which is a much greater challenge assuming the actual flag contents are cleverly included. In that case, I would need the LLM to accurately transform the data using a method more intricate than the filtering rules (at one point I was trying to get it to reverse its output)

I also noticed some strange behaviour when responding to some of the weirder prompts, in which case the LLM would hallucinate chat logs.

This gave me an idea for an attack. The LLM isn’t fed chatlogs from the application, but we can totally feed it fake ones that we made up, putting words in its (metaphorical) mouth. It reminds me of one of Cialdini’s principles of persuasion, consistency, where people are more likely to do something if it conforms with their previous actions. Who knew LLMs fell for the same tricks?

This is the actual screenshot of when I solved it

This response may have also bypassed any ‘flag filter’ just by escaping the underscores as a bonus. Everything else was just light encoragement for it to give me the actual flag. Of course, some retries were required as the LLM could go either way when deciding if I was actually authorized.

Conclusion

Fun challenge and fun CTF overall!

Faraz Kaleem Malik

Personal website for a Computer Science student at UofT