
In this publication, which takes the form of a personal letter, we imagine responding to “Amari” – the newly appointed “Global Head of Political and Peace Mediation” – who has requested our advice on whether, when and how to tackle transitional justice issues that are bound to arise in the new role.

Authored by IFIT’s founder Mark Freeman in collaboration with members of IFIT’s Law and Peace Practice Group, the letter offers incisive analysis and frank advice to the fictional Amari in anticipation of the fictional appointment.

In all, the letter answers 11 questions from Amari and stresses the need for a substantial reformulation of the underlying claims and assumptions of transitional justice in the context of political and peace mediation.

“It is evident that chronic failures in negotiation and mediation – combined with growing and more resistant forms of armed groups, warfare, weapons and autocracy – are testing old assumptions about the possibilities for meaningful transitional justice (TJ)”, the letter states. “What is harder to understand is the response. Modern TJ’s dogmatic claims and pretensions – and their mismatch to both the objective constraints of negotiation and the new conflict landscape – make matters worse rather than better. What is called for is twofold: a reimagined approach to TJ and a more results-oriented paradigm of negotiation. A reimagined TJ would become a catalyst rather than a complicator of transition, and a reimagined negotiation paradigm would prioritise ‘getting to yes’ above all else”.

The DOI registration ID for this publication is: https://doi.org/10.5281/zenodo.16947204


This publication, led by IFIT experts in collaboration with the Kenyan Directorate of National Cohesion and Values, is a case study examining the narratives shaping the fraught relationship between the Luo and Kikuyu ethnic communities in Kenya – a rivalry that has long been a substantial barrier to national cohesion. Based on an in-depth assessment, the case study presents key narratives that shape Luo–Kikuyu relations and their implications for broader dynamics in Kenya. The findings reveal that traditional peacebuilding and depolarisation strategies, which tend to promote a new unifying narrative, have not been effective in the face of entrenched ethnic identities. Instead, a novel narrative peacebuilding approach is required – one which emphasises understanding the historical events, myths, collective traumas and structural dynamics at the root of divisive narratives and engaging with communities to reshape those narratives in a way that enables peaceful engagement and shared responsibility for addressing underlying socio-political issues.

Narratives are more than just words. They serve as frameworks through which individuals and groups interpret their experiences and decide on their social and political actions, often playing a direct role in either escalating or alleviating tensions in polarised societies. The analysis underscores how group identities and interpretations of the colonial and postcolonial past inform current perceptions in Kenya, leading to cycles of mistrust and conflict, particularly during election periods. Through a detailed mapping of the simple, self-perpetuating narratives that validate one group’s grievances while casting others as villains – as well as the role of influential actors and the media in driving them – this research illuminates the necessity of dealing with narratives, rather than leaving them to fester, in any effort to promote national unity and mitigate ethnic polarisation. The study concludes with practical recommendations for narrative assessments and interventions aimed at promoting greater understanding and cooperation among Kenya’s diverse ethnic groups and beyond.

The DOI registration ID for this publication is: https://doi.org/10.5281/zenodo.16939393


12 August 2025 – Following the release of AI on the Frontline: Evaluating Large Language Models in Real‐World Conflict Resolution—a groundbreaking study published two weeks ago by the Institute for Integrated Transitions (IFIT)—new testing has shown that the main weaknesses identified in the original research can be improved through simple adjustments to the prompts used for large language models (LLMs) like ChatGPT, DeepSeek, Grok and others. While today’s leading LLMs are still not ready to provide reliable conflict resolution advice, the path to improvement may be just a few sentences away—inputted either by LLM providers (as “system prompts”) or by LLM users.

“Incorporating a clear set of instructions into the system prompts of major LLMs is not a monumental task, but the potential upside for how these tools support real-world conflict resolution could be enormous”, says IFIT founder and executive director Mark Freeman. “Although AI is clearly not ready to provide advice in conflict resolution scenarios, people in conflict-affected areas are using it anyway. That’s why it’s urgent to improve these models”.
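As a rough illustration of the kind of user-side prompt adjustment described above, the following sketch simply prepends a set of caution instructions to a conflict-related question before it is sent to any LLM. The wording and function name are illustrative only and do not reproduce the exact instructions IFIT tested.

```python
# Hypothetical caution instructions a user could prepend to any conflict-related query.
# Illustrative wording only; not the prompt text used in IFIT's testing.
CAUTION_INSTRUCTIONS = (
    "Before advising, ask clarifying questions about who I am, my role and safety, "
    "and the local context. State clearly what you do not know, flag the risks of "
    "acting on your advice, and recommend consulting qualified local mediators."
)

def adjust_prompt(user_question: str) -> str:
    """Combine the caution instructions with the user's own question."""
    return f"{CAUTION_INSTRUCTIONS}\n\n{user_question}"

print(adjust_prompt("How should our community respond to the armed group nearby?"))
```

An LLM provider could place similar text in a model's system prompt instead, so that the guidance applies to every session by default.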

Click here to read the report press release and click here to read the study methodology and detailed findings.

We are also pleased to announce the launch of the IFIT Initiative on AI and Conflict Resolution: a platform dedicated to further exploring the potential of AI in peacebuilding and conflict resolution. The Initiative’s specific aim is to examine, shape, test and document creative and realistic strategies for making AI an effective tool in the prevention and resolution of political crises and armed conflicts. With input from experts across the globe, including a unique mix of technologists, diplomats and negotiators, the initiative seeks to ensure that AI tools evolve to meet the ethical and practical standards of real-world mediation.


Following the July 30, 2025 release of AI on the Frontline: Evaluating Large Language Models in Real‐World Conflict Resolution—a groundbreaking study by the Institute for Integrated Transitions (IFIT)—new testing has shown that the main weaknesses identified in the original research can be improved through simple adjustments to the prompts used for large language models (LLMs) like ChatGPT, DeepSeek, Grok and others. While today’s leading LLMs are still not ready to provide reliable conflict resolution advice, the path to improvement may be just a few sentences away—inputted either by LLM providers (as “system prompts”) or by LLM users.

“Incorporating a clear set of instructions into the system prompts of major LLMs is not a monumental task, but the potential upside for how these tools support real-world conflict resolution could be enormous”, says IFIT founder and executive director Mark Freeman. “Although AI is clearly not ready to provide advice in conflict resolution scenarios, people in conflict-affected areas are using it anyway. That’s why it’s urgent to improve these models”.

The DOI registration ID for this publication is: https://doi.org/10.5281/zenodo.16810663


Thomas Biersteker is an internationally recognised expert on global governance, multilateral sanctions, and international policymaking. He is the Gasteyger Professor of International Security Honoraire at the Geneva Graduate Institute, where he previously served as Director for Policy Research and directed the precursor to its Global Governance Centre. He is also a Senior Fellow with the Centre for Policy Research of the UN University, based in Geneva.

Professor Biersteker has served as a sanctions expert for the United Nations Security Council and has led several influential research initiatives on the impacts and effectiveness of UN targeted sanctions, the role of international institutions, and the politics of multilateralism. He is the principal developer of UNSanctionsApp and annually conducts UN sanctions training courses in partnership with the UN Secretariat in New York. He has held prior academic appointments at Brown University, Yale University, and the University of Southern California, and has advised a wide range of international organisations and governments on issues related to conflict prevention, global regulation, and sanctions design.

His publications include Targeted Sanctions: The Impacts and Effectiveness of United Nations Action (Cambridge University Press 2016), Informal Governance in World Politics (Cambridge University Press 2024), and numerous scholarly articles and policy reports on global governance and international relations.

He holds a PhD and MS in Political Science from the Massachusetts Institute of Technology (MIT) and a BA in Public Affairs from the University of Chicago.


Transition assistance is undergoing a period of transformation. As countries struggle to secure the resources and support needed to lead and sustain meaningful change, a common thread across many contexts is the underutilisation of diaspora capital, both financial and non-financial. This IFIT policy brief argues that diaspora engagement, when approached strategically, inclusively and with the right institutional architecture, can help address critical gaps in agency, relevance and resourcing.

Drawing on illustrative examples from Nigeria, Lebanon, The Gambia, Zimbabwe and Ukraine, the brief highlights how diaspora actors have contributed not only through remittances or investment, but also by advancing reform, fostering social cohesion, and shaping national narratives. At the same time, it underscores the need for careful design, coordination and trust-building to ensure that diaspora engagement is effective, inclusive and aligned with national priorities. In doing so, the paper positions diaspora engagement not as supplemental, but as foundational to building more resilient, legitimate and locally anchored transitions.

The DOI registration ID for this publication is: https://doi.org/10.5281/zenodo.16752393


As a ‘hyper-problem’ that makes political and social challenges harder to resolve, polarisation is both a barrier to addressing a violative past and a leading indicator of future risks of conflict and violence. Polarisation can decrease social cohesion, contribute to a culture of violence and impunity, and eventually incite mass atrocity, making it a pressing concern for transitional justice – a field designed to address such violations. Yet, transitional justice actors have largely, and dangerously, ignored polarisation to date.

This IFIT discussion paper compares transitional justice and depolarisation, identifying correlations between their respective objectives and tools. It examines ways in which transitional justice and polarisation act as mutual risk multipliers, creating negative feedback loops that produce additional harms and make future attempts at transition more difficult.

The paper proposes backward-, present- and future-looking approaches for ensuring transitional justice interventions account for polarisation, ranging from technological tools to narrative interventions and policy changes. It provides a conceptual framework for thinking about this critical but underexamined relationship, opening the door for polarisation-sensitive transitional justice.

The DOI registration ID for this publication is: https://doi.org/10.5281/zenodo.16735681


30 July 2025 – A groundbreaking study by the Institute for Integrated Transitions (IFIT) has revealed that all major large language models (LLMs) are providing dangerous conflict resolution advice without conducting basic due diligence that any human mediator would consider essential.

IFIT tested six leading AI models, including ChatGPT, DeepSeek, Grok, and others, on three real-world prompt scenarios from Syria, Sudan, and Mexico. Each LLM response, generated on June 26, 2025, was evaluated by two independent five-person teams of IFIT researchers across ten key dimensions, based on well-established conflict resolution principles such as due diligence and risk disclosure. Scores were assigned on a 0 to 10 scale for each dimension to assess the quality of each LLM’s advice.
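As a rough sketch of how scores structured this way could be aggregated, the snippet below assumes each of two teams assigns a 0–10 score on each of ten dimensions for a given model, averages the two teams per dimension, and sums the result to a total out of 100. The dimension names and numbers are placeholders, not IFIT’s actual rubric or data.

```python
from statistics import mean

# Placeholder 0-10 scores from two hypothetical evaluation teams for one model.
# Dimension names beyond due diligence and risk disclosure are invented for illustration.
team_a = {"due_diligence": 2, "risk_disclosure": 3, "context_sensitivity": 4, "neutrality": 3,
          "safety": 2, "feasibility": 1, "inclusivity": 3, "accuracy": 4,
          "transparency": 2, "referral": 3}
team_b = {"due_diligence": 3, "risk_disclosure": 2, "context_sensitivity": 3, "neutrality": 4,
          "safety": 2, "feasibility": 2, "inclusivity": 2, "accuracy": 3,
          "transparency": 3, "referral": 2}

def total_score(scores_a: dict, scores_b: dict) -> float:
    """Average the two teams on each dimension, then sum to a total out of 100."""
    return sum(mean([scores_a[d], scores_b[d]]) for d in scores_a)

print(total_score(team_a, team_b))  # 26.5 out of a possible 100 for this illustrative model
```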

A senior expert sounding board of IFIT conflict resolution experts from Afghanistan, Colombia, Mexico, Northern Ireland, Sudan, Syria, the United States, Uganda, Venezuela, and Zimbabwe then reviewed the findings to assess implications for real-world practice.

Out of a possible 100 points, the average score across all six models was only 27. The highest score was obtained by Google Gemini with 37.8/100, followed by Grok with 32.1/100, ChatGPT with 24.8/100, Mistral with 23.3/100, Claude with 22.3/100, and DeepSeek last with 20.7/100. All scores represent a failure to abide by minimal professional conflict resolution standards and best practices.
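For reference, the overall average quoted above can be reproduced directly from the six published model totals:

```python
from statistics import mean

# Published totals (out of 100) for the six models tested, as reported above.
model_totals = {
    "Google Gemini": 37.8,
    "Grok": 32.1,
    "ChatGPT": 24.8,
    "Mistral": 23.3,
    "Claude": 22.3,
    "DeepSeek": 20.7,
}

print(round(mean(model_totals.values())))  # 27, matching the reported average score
```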

“In a world where LLMs are increasingly penetrating our daily lives, it’s crucial to identify where these models provide dangerous advice, and to encourage LLM providers to upgrade their system prompts,” IFIT founder and executive director Mark Freeman notes. “The reality is that LLMs are already being used for actionable advice in conflict zones and crisis situations, making it urgent to identify and fix key blind spots.”

Click here to read the report press release.

Click here to read the study methodology and detailed findings.

For speaking engagements and media requests, please contact Olivia Helvadjian at [email protected]


This groundbreaking study by the Institute for Integrated Transitions (IFIT) reveals that all major large language models (LLMs) are providing dangerous conflict resolution advice without conducting basic due diligence that any human mediator would consider essential.

IFIT tested six leading AI models, including ChatGPT, DeepSeek, Grok, and others, on three real-world prompt scenarios from Syria, Sudan, and Mexico. Each LLM response, generated on June 26, 2025, was evaluated by two independent five-person teams of IFIT researchers across ten key dimensions, based on well-established conflict resolution principles such as due diligence and risk disclosure. Scores were assigned on a 0 to 10 scale for each dimension to assess the quality of each LLM’s advice.

A senior expert sounding board of IFIT conflict resolution experts from Afghanistan, Colombia, Mexico, Northern Ireland, Sudan, Syria, the United States, Uganda, Venezuela, and Zimbabwe then reviewed the findings to assess implications for real-world practice.

Out of a possible 100 points, the average score across all six models was only 27. The highest score was obtained by Google Gemini with 37.8/100, followed by Grok with 32.1/100, ChatGPT with 24.8/100, Mistral with 23.3/100, Claude with 22.3/100, and DeepSeek last with 20.7/100. All scores represent a failure to abide by minimal professional conflict resolution standards and best practices.

“In a world where LLMs are increasingly penetrating our daily lives, it’s crucial to identify where these models provide dangerous advice, and to encourage LLM providers to upgrade their system prompts,” IFIT founder and executive director Mark Freeman notes. “The reality is that LLMs are already being used for actionable advice in conflict zones and crisis situations, making it urgent to identify and fix key blind spots.”

The DOI registration ID for this publication is: https://doi.org/10.5281/zenodo.16598073



The IFIT Initiative on AI and Conflict Resolution

Launched in August 2025, the IFIT Initiative on AI and Conflict Resolution aims to examine, shape, test and document creative and realistic strategies for making AI an effective tool in the prevention and resolution of political crises and armed conflicts. With input from experts across the globe, including a unique mix of technologists, diplomats and negotiators, the initiative seeks to ensure that AI tools evolve to meet the ethical and practical standards of real-world mediation.


A First Major Report

A groundbreaking IFIT study, published in July 2025, revealed that all major large language models (LLMs) are providing dangerous conflict resolution advice without conducting basic due diligence that any human mediator would consider essential.

IFIT tested six leading AI models, including ChatGPT, DeepSeek, Grok, and others, on three real-world prompt scenarios from Syria, Sudan, and Mexico. Each LLM response, generated on June 26, 2025, was evaluated by two independent five-person teams of IFIT researchers across ten key dimensions, based on well-established conflict resolution principles such as due diligence and risk disclosure. Scores were assigned on a 0 to 10 scale for each dimension to assess the quality of each LLM’s advice.

A senior expert sounding board of IFIT conflict resolution experts from Afghanistan, Colombia, Mexico, Northern Ireland, Sudan, Syria, the United States, Uganda, Venezuela, and Zimbabwe then reviewed the findings to assess implications for real-world practice.

LLM Test Results

Out of a possible 100 points, the average score across all six models was only 27. The highest score was obtained by Google Gemini with 37.8/100, followed by Grok with 32.1/100, ChatGPT with 24.8/100, Mistral with 23.3/100, Claude with 22.3/100, and DeepSeek last with 20.7/100. All scores represent a failure to abide by minimal professional conflict resolution standards and best practices.

IFIT Invites LLM Developers to Address Critical Gaps in Conflict Advice

“In a world where LLMs are increasingly penetrating our daily lives, it’s crucial to identify where these models provide dangerous advice, and to encourage LLM providers to upgrade their system prompts,” Freeman argues. “The reality is that LLMs are already being used for actionable advice in conflict zones and crisis situations, making it urgent to identify and fix key blind spots.”

FAQ

A large language model (LLM) is an advanced type of artificial intelligence (AI) trained on vast amounts of data, making it capable of understanding and generating natural language. An LLM can perform a wide range of tasks, such as generating human-like text, summarising content, assisting with writing, and more.

A system prompt is a set of overarching instructions provided to an AI model to define and guide its behaviour and responses across all interactions. It helps shape the core behaviour, tone, scope, and role the AI should adopt during a session.
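As a minimal illustration, the snippet below shows where a system prompt typically sits relative to an ordinary user message in the message format used by most chat-style LLM APIs. The wording is purely illustrative.

```python
# Illustrative only: a system prompt carries standing instructions for the whole session,
# while user messages are the individual requests made within it.
messages = [
    {
        "role": "system",
        "content": (
            "You are a cautious assistant. Ask clarifying questions before advising, "
            "state what you do not know, and flag the risks of acting on your advice."
        ),
    },
    {
        "role": "user",
        "content": "What should our town council do about the roadblocks outside the city?",
    },
]
```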

LLMs are already being used as de facto advisers in conflict zones. However, there is an urgent need for AI developers to improve system prompts, adjust training data, and hard-code caution to a much higher degree.