Hello Folks,
It’s been quite some time (almost 3 months) since my last blog post. But finally, I’m back, and let’s get started! Moving forward, my blogs will primarily focus on interesting research papers in the LLM and GenAI space. I’ll be discussing problem statements that I encounter in my day-to-day life, in what we like to call “story time,” as many of you might remember from my past blogs. This will be followed by a deep dive into the technical aspects of those problem statements. In addition to explaining the research papers, I’ll share experiences and practical examples, and I’ll also elaborate on technical details that the papers might skip, assuming the reader already knows them. So, let’s dive in!
Just a few days ago, one of my family friends visited our place. They have a lovely 8-year-old daughter. It was August 15th, India’s Independence Day, and her school had given her an assignment to write an essay on Independence Day with a strict requirement of “at least 10,000 words”. Now that’s really a lot! I don’t know whether to call that an essay or a mini-book for an 8-year-old child! As usual, the parents started drafting it on behalf of their child, and the first thing that came to mind was ChatGPT or something similar. At first, the parents were very relaxed and thought, “Let’s start drafting this on August 14th, just a day before, since it’s just a matter of ‘prompting the LLM model’ and getting the output.” On the night of August 14th, they did just that, but any guesses what happened? The model, though it gave a good output, struggled to maintain the following: Relevance, Accuracy, Coherence, Clarity, Breadth and Depth, and Reading Experience. Additionally, when asked to output strictly 10k words, the model repeats itself and drifts significantly off topic.
Now, you all might be wondering: what are these six dimensions? For that, let’s dive into the problem statement of “limitations of current long-context large language models (LLMs) in generating ultra-long outputs.” In this blog, we’ll explore an interesting research paper titled “LONGWRITER: UNLEASHING 10,000+ WORD GENERATION FROM LONG CONTEXT LLMS.” Even though these models can process inputs up to 100,000 tokens, they typically struggle to produce outputs longer than 2,000 words. The primary reason for this limitation is attributed to the supervised fine-tuning (SFT) datasets, which lack examples of long outputs, capping the models’ ability to generate extended text. So in this blog, let’s understand the intriguing technique the authors have used to improve long output responses and make sure parents’ lives become easier in the future! And what about the kids? These days, I leave that up to their destiny with the advancements in AI and the way life has become easier for them with limited use of their mental capabilities! Anyway, let’s get started.
Introduction
Now, let’s get a high-level understanding of the paper. It kicks off by highlighting an interesting challenge with long-context LLMs. These models, which can process over 100,000 tokens of input, still struggle to generate outputs longer than 2,000 words. This is a significant issue because, in some cases, more than 1% of user requests actually need longer responses.
The core problem? The supervised fine-tuning (SFT) datasets that train these models just don’t include enough examples of long outputs. So, even though the models are capable of handling long inputs, they haven’t been trained to produce long outputs effectively. This limitation has stuck around because many LLMs rely on these same datasets.
To tackle this, the authors introduce AgentWrite — a new approach that helps these models generate longer texts by breaking down the task into smaller parts. This method can push output lengths up to 20,000 words, far beyond what’s usually possible.
The paper also brings in LongWriter-6k and LongBench-Write, a dataset and benchmark created to train and test models on their ability to generate these ultra-long texts. The idea is to push the boundaries of what LLMs can do, making them more capable of handling tasks that require extended output.
Now let’s understand what AgentWrite is and how it works:
Step I: Plan
First things first, AgentWrite starts with a plan — just like how you’d outline an article before diving into writing. The model creates a detailed outline based on the given instructions, laying out the main content and specifying word counts for each section. Think of it as the model’s roadmap. For instance, if tasked with writing a 30,000-word piece on the Roman Empire, the plan might look something like this:
Paragraph 1: Introduction to the origins of the Roman Empire (700 words)
Paragraph 2: Founding of the Roman Empire (800 words)
…
Paragraph 15: Summary of the Roman Empire’s history (500 words)
This structured approach ensures the model knows exactly where it’s headed, making it easier to manage the task of generating lengthy outputs. Below, you can see how the authors structure the input:
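To make the planning step concrete, here is a minimal sketch in Python of what it might look like in practice. The prompt wording is a paraphrase of AgentWrite’s Step I, not the paper’s exact prompt, and `build_plan_prompt`/`parse_plan` are hypothetical helper names for illustration:

```python
import re

def build_plan_prompt(instruction: str) -> str:
    """Ask the model to outline the piece before writing it.

    Paraphrase of AgentWrite's Step I: each outline line names one
    paragraph's main point and a target word count.
    """
    return (
        "Help me break down the following writing task into subtasks. "
        "Write an outline where each line gives the main point of one "
        "paragraph and its word count, e.g. "
        "'Paragraph 1: Introduction to ... (700 words)'.\n\n"
        f"Task: {instruction}"
    )

def parse_plan(outline: str) -> list[tuple[int, str, int]]:
    """Extract (paragraph number, topic, word count) from each outline line."""
    pattern = re.compile(r"Paragraph\s+(\d+):\s*(.+?)\s*\((\d+)\s*words?\)")
    return [(int(n), topic, int(w)) for n, topic, w in pattern.findall(outline)]

# Example outline text, as the model might return it (abridged):
plan_text = """Paragraph 1: Introduction to the origins of the Roman Empire (700 words)
Paragraph 2: Founding of the Roman Empire (800 words)
Paragraph 15: Summary of the Roman Empire's history (500 words)"""

sections = parse_plan(plan_text)
total_words = sum(words for _, _, words in sections)
```

Once parsed like this, each `(number, topic, words)` tuple becomes one sub-writing task, which is what lets the per-paragraph outputs add up to lengths far beyond a single generation.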