arXiv:2510.13586v1 Announce Type: cross
Abstract: The emergence of large language models (LLMs) has opened new opportunities for creating dynamic non-player characters (NPCs) in gaming environments, enabling both functional task execution and persona-consistent dialogue generation. In this paper, we (Tu_Character_lab) report our participation in the Commonsense Persona-Grounded Dialogue Challenge (CPDC) 2025 Round 2, which evaluates agents across three tracks: task-oriented dialogue, context-aware dialogue, and their integration. Our approach combines two complementary strategies: (i) lightweight prompting techniques in the API track, including a Deflanderization prompting method to suppress excessive role-play and improve task fidelity, and (ii) fine-tuned large models in the GPU track, leveraging Qwen3-14B with supervised fine-tuning (SFT) and Low-Rank Adaptation (LoRA). Our best submissions ranked 2nd on Task 1, 2nd on Task 3 (API track), and 4th on Task 3 (GPU track).
BC
October 17, 2025
The “Deflanderization” concept is actually pretty clever – it tackles a real issue I’ve seen when testing LLMs for game dialogue. When you assign a character a specific personality, it often goes overboard. For example, if you tell it to be a “gruff warrior,” suddenly every response is “ARRGH, BY MY BLADE!” even when they’re just giving directions to the tavern. It gets old fast and breaks immersion.
The dual approach makes sense from a practical perspective. The API method with lightweight prompting is what most indie developers would actually use – not everyone can afford GPUs for fine-tuning. When testing different prompt strategies locally, I’ve found that explicitly telling models to “tone down” certain personality traits works better than expected. For instance, “be helpful first, personality second” can fix many overacting issues.
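For anyone curious what that looks like in practice, here’s a minimal sketch of the “helpful first, personality second” idea, assuming an OpenAI-compatible chat endpoint. The prompt wording, the character, and the model name are my own illustrations, not the paper’s actual Deflanderization prompt:

```python
# Minimal sketch of a "tone it down" system prompt for an NPC.
# Assumes an OpenAI-compatible API; prompt text and model name are illustrative.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

system_prompt = (
    "You are Brom, a gruff warrior NPC. Stay lightly in character, "
    "but be helpful first and personality second: answer the player's "
    "question directly, keep flavor phrases to one short clause at most, "
    "and never let role-play omit task-critical information."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Where can I find the tavern keeper?"},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)
```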
The fine-tuning results with Qwen3-14B are intriguing. That’s a model size that can practically run on consumer hardware (barely – you need 24GB VRAM for decent performance). The fact they achieved competitive results indicates you don’t need massive 70B+ models for game NPCs. In my experience, 14B models strike a good balance — they are coherent enough for dialogue while still small enough to run locally without costing a fortune.
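On the GPU-track side, here’s a rough sketch of what LoRA-based SFT on a 14B model typically looks like with the Hugging Face stack. The rank, target modules, and 4-bit loading are my assumptions for squeezing into roughly 24GB of VRAM, not the authors’ actual configuration:

```python
# Rough sketch: LoRA adapters on a 4-bit quantized Qwen3-14B base.
# Hyperparameters are guesses, not the paper's reported setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "Qwen/Qwen3-14B"

# Quantize frozen base weights to 4-bit so base model + adapters fit on one GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)

# Train only small low-rank adapters on the attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # usually well under 1% of total weights
```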
Their 2nd and 4th place finishes reveal the real challenge — balancing “stay in character” with “actually help the player complete the quest” is tricky. I’ve seen NPCs get so caught up in their backstory that they forget to give the player the key item they need. The “deflanderization” approach seems like a step in the right direction, but there’s clearly room for more improvement.