Interview with Xinwei Song: strategic interactions in networked multi-agent systems

In this interview series, we’re meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. We hear from Xinwei Song about the two main research threads she’s worked on so far, plans to expand her investigations, and what inspired her to study AI.
Could you start with a quick introduction – where are you studying, and what is the topic of your research?
I’m Xinwei Song, a second-year PhD student in a joint program of ShanghaiTech University (Shanghai, China) and the Beijing Institute for General Artificial Intelligence (BIGAI, Beijing, China). My research primarily focuses on strategic interactions in networked multi-agent systems.
Could you give us an overview of the research you’ve carried out so far during your PhD?
My research to date consists of two main threads, which complement each other in exploring strategic interactions from different perspectives.
The first thread is rooted in algorithmic game theory, an interdisciplinary field spanning computer science and economics. In game theory, strategic participants follow the rules of the game while acting to maximize their own payoff. For instance, in an auction, an agent may misreport her true valuation for the item to secure a lower price; and in preference-based item exchange, an agent may misreport her preferences over the available goods to obtain a more preferred allocation. My research, however, takes the reverse approach of mechanism design: we aim to design rules for a group of strategic, rational (individual-payoff-maximizing) players such that they cannot manipulate the system, and only truthful reporting leads to an optimal outcome. Specifically, we focus on the housing market, a house exchange problem where each agent owns a house and has preferences over others’ houses. Without a well-designed exchange rule, agents may misreport their preferences to secure better housing. We also account for a practical consideration: not all participants are present in the matching market initially, so a diffusion process over social networks expands the game’s scope. Through theoretical analysis, we proved that this additional freedom of action invalidates previously proposed algorithms. We then established new theoretical boundaries for this setting and designed strategy-proof algorithms. Additionally, we extended this work to two-sided matching problems, with stable marriage as a canonical example.
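For context, the classical baseline for the housing market (without network diffusion) is Gale’s Top Trading Cycles (TTC) mechanism, which is strategy-proof: no agent can gain by misreporting her preferences. The sketch below is a minimal illustration of TTC, not the mechanisms developed in this research; the agent labels and data layout are chosen for the example.

```python
def top_trading_cycles(preferences):
    """Top Trading Cycles for a housing market.

    preferences: dict mapping each agent to a list of agents whose
    houses she prefers, in descending order (her own house included).
    Returns: dict mapping each agent to the owner of the house she gets.
    """
    remaining = set(preferences)
    allocation = {}
    while remaining:
        # Each remaining agent points at the owner of her favourite remaining house.
        points_to = {
            a: next(h for h in preferences[a] if h in remaining)
            for a in remaining
        }
        # Follow the pointers from any agent until a cycle is found.
        seen = []
        a = next(iter(remaining))
        while a not in seen:
            seen.append(a)
            a = points_to[a]
        cycle = seen[seen.index(a):]
        # Agents in the cycle trade along it and leave the market.
        for agent in cycle:
            allocation[agent] = points_to[agent]
            remaining.discard(agent)
    return allocation
```

For example, with preferences `{1: [2, 1, 3], 2: [1, 2, 3], 3: [1, 2, 3]}`, agents 1 and 2 swap houses in the first cycle, and agent 3 keeps her own. The diffusion setting studied here is harder precisely because agents can also manipulate who enters this market.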
The first thread assumes fully rational agents with discrete action spaces, where non-learning methods are effective. In real-world scenarios, however, not all participants are perfectly rational; they often learn the game rules and optimal behaviors gradually. This naturally led me to my second thread: Multi-Agent Reinforcement Learning (MARL), where each agent uses reinforcement learning algorithms to optimize its cumulative reward. A common challenge in MARL is that agents easily learn myopic policies—especially in mixed-motive games and social dilemmas (e.g., the tragedy of the commons), where groups of RL agents often fall into mutual defection or exploitation, resulting in near-zero rewards for all. Existing work typically adds intrinsic rewards to reshape the payoff matrix, making mutual cooperation easier to learn. I wanted to push this further by removing such reward shaping, aiming for more generalizable results that adapt to multiple scenarios. To achieve this, I drew inspiration from human social intelligence—specifically, reputation and indirect reciprocity—as an incentive mechanism for MARL groups. Our work integrates three modules: gossip-based reputation learning, interaction-based reputation updating, and reputation-based policy learning, which together inject reputation awareness into AI agents. We then conduct end-to-end training to co-learn these modules, enabling agents to cooperate without relying on artificial reward shaping.
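The intuition behind indirect reciprocity can be shown with a classic non-learning toy, the “image scoring” donation game: agents who help others gain reputation, and agents condition their help on the recipient’s reputation, so defectors are quickly excluded. This is only an illustrative sketch (the actual work co-learns reputation and policy end-to-end with RL); the parameters and hand-coded strategies here are my own simplification.

```python
import random

def donation_game(n_agents=20, n_defectors=5, rounds=4000,
                  benefit=2.0, cost=1.0, seed=0):
    """Toy indirect-reciprocity ('image scoring') simulation.

    Discriminators help only recipients whose public reputation is
    non-negative; unconditional defectors never help. Helping raises
    the donor's reputation, refusing lowers it, so defectors soon
    acquire bad reputations and stop receiving help.
    Returns the average payoff of (discriminators, defectors).
    """
    rng = random.Random(seed)
    defector = [i < n_defectors for i in range(n_agents)]
    reputation = [0] * n_agents   # public image scores, capped to [-5, 5]
    payoff = [0.0] * n_agents
    for _ in range(rounds):
        donor, recipient = rng.sample(range(n_agents), 2)
        helps = (not defector[donor]) and reputation[recipient] >= 0
        if helps:
            payoff[donor] -= cost
            payoff[recipient] += benefit
            reputation[donor] = min(reputation[donor] + 1, 5)
        else:
            reputation[donor] = max(reputation[donor] - 1, -5)
    disc = [payoff[i] for i in range(n_agents) if not defector[i]]
    defs = [payoff[i] for i in range(n_agents) if defector[i]]
    return sum(disc) / len(disc), sum(defs) / len(defs)
```

In this toy, discriminators end up far better off than defectors because they keep receiving help from one another while defectors are shut out. The research question addressed in this thread is whether RL agents can discover such reputation-conditioned behavior themselves, without it being hand-coded or induced by shaped rewards.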
Is there an aspect of your research that has been particularly interesting?
What I find most interesting is my focus on networked systems, where agents not only take actions but also diffuse information through social networks. This adds an extra strategic dimension that presents unique challenges: for mechanism design, it becomes harder to design incentives that motivate agents to report truthful information about their social relationships; for MARL tasks, the expanded action space makes the learning process more prone to non-stationarity, making it difficult to converge to a globally optimal policy. Navigating these challenges—balancing information diffusion with strategic behavior—has been both intellectually stimulating and rewarding.
What are your plans for building on your research so far during the PhD – what aspects will you be investigating next?
Moving forward, I plan to explore intersections between my current research and human-AI interaction, or incorporate LLM-based agents. For example, I aim to design incentives for LLM agents to enhance their consistency, enabling them to better assist humans in decision-making. Alternatively, I want to use LLM agents as incentive carriers—when they interact with humans, they can diffuse prosocial incentives to promote better group outcomes.
I feel that LLMs are currently the dominant focus of the AI field, and research not centered on them requires more effort to stand out and gain recognition. While LLM-related technical trends are exciting, they also raise questions: should one shift research interests to follow these trends, and is sticking to one’s original focus a sign of stubbornness or of personal research integrity? It’s a nuanced decision, and I’m still navigating how to balance innovation in my core area with engagement with emerging trends.
How was the AAAI Doctoral Consortium, and the AAAI conference experience in general?
I enjoyed it! Attending conferences allows me to connect with people from diverse cultures and countries who share similar research interests. Additionally, traveling to new places for conferences is a wonderful perk, serving as a “reward” amid the challenges of research, which often involves repeated feedback and rejections. The AAAI Doctoral Consortium was particularly well-organized: the committee was very supportive and arranged a range of events, including a welcome dinner, invited talks, and panel discussions on topics such as career choices. It was invaluable to meet other PhD students at similar stages of their journey, as we could share our concerns, experiences, and aspirations.
What made you want to study AI?
My motivation to study AI stems from a curiosity about several interconnected questions: How can we enable systems (and humans) to make wise decisions? How do strategic interactions work, and how can we guide or nudge groups toward better social outcomes? How can autonomous agents cope with complex, real-world scenarios with implicit social semantics? And finally, how can we account for the irrational aspects of human behavior when designing AI systems? These questions drive my research and my desire to advance AI in ways that are both technically rigorous and socially impactful.
What do you enjoy doing outside of the PhD?
Outside of AI, I enjoy travelling and observing birds. It’s a relaxing hobby that allows me to step back from research and appreciate the natural world, which often inspires fresh perspectives. It also helps me to focus on the present and enjoy daily life. Occasionally, I also read popular science books related to behavioral economics and animal behavior, which subtly inform my research on strategic interactions and human irrationality.
About Xinwei

Xinwei Song is a second-year PhD student in the joint program of ShanghaiTech University and the Beijing Institute for General Artificial Intelligence (BIGAI), supervised by Professor Dengji Zhao and Researcher Xue Feng. Her research focuses on strategic interactions in networked multi-agent systems via algorithmic game theory and multi-agent reinforcement learning. For matching markets, she studies strategy-proof mechanisms under network diffusion, where agents may strategically manipulate both their preferences and the market size, posing additional challenges for the design of incentive-compatible mechanisms. She also investigates how reputation awareness can encourage robust cooperation among RL agents in sequential social dilemmas, without relying on extra reward shaping.