AI21 Labs juices up gen AI transformers with Jamba

Jamba isn’t just a new Jurassic take from AI21 Labs

AI21 Labs has a particular focus on gen AI for enterprise use cases. The company raised $155 million in August 2023 to support its growing efforts.

The company’s enterprise tools include Wordtune, an optimized service that helps enterprises generate content matching an organization’s tone and brand. AI21 Labs told VentureBeat in 2023 that it often competes directly against gen AI giant OpenAI for enterprise business, and wins.

To date, AI21 Labs’ LLM technology has relied on the transformer architecture, like nearly every other LLM. Just over a year ago, the company introduced its Jurassic-2 LLM family, which is part of the AI21 Studio natural language processing (NLP)-as-a-service platform and is also available via APIs for enterprise integrations.

Jamba is not an evolution of Jurassic; it’s something quite different: a hybrid state space model (SSM) and transformer model.

Attention isn’t all you need, you also need context

Transformers have dominated the gen AI landscape to date, but still have some shortcomings. Most notable is the fact that inference generally slows as context windows grow.

As the AI21 Labs researchers note, a transformer’s attention mechanism scales with sequence length and slows down throughput, since each new token must attend to the entire sequence that came before it. This puts long context use cases outside the scope of efficient production.
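To make that scaling concrete, here is a rough back-of-the-envelope sketch in Python. It is not AI21’s code, and the 4,096 model dimension is an arbitrary assumption for illustration:

```python
# Rough illustration (not AI21's code): at generation step t, attention must
# compare the new token's query against the keys of all t tokens already in
# context, so per-token work grows with context length.

def attention_ops_per_token(context_len: int, d_model: int = 4096) -> int:
    """Approximate multiply-adds to score one new token against its context."""
    return context_len * d_model  # one dot product per previous token

for context_len in (4_000, 32_000, 256_000):
    ops = attention_ops_per_token(context_len)
    print(f"{context_len:>7} tokens of context -> ~{ops / 1e6:.0f}M ops for the next token")
```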

The other issue highlighted by AI21 Labs is the large memory footprint requirement for scaling transformers. The transformer memory footprint scales with context length, making it challenging to run long context windows or numerous parallel batches without extensive hardware resources.
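For a sense of scale, the sketch below estimates the key-value (KV) cache, the slice of transformer memory that grows with context length. The layer count, head sizes and 16-bit precision are illustrative assumptions, not figures for any specific model:

```python
# Hypothetical KV-cache estimate: keys and values are stored for every layer
# and every cached token, so memory grows linearly with context length.
# All sizes below are illustrative assumptions.

def kv_cache_bytes(context_len: int,
                   n_layers: int = 32,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_value: int = 2) -> int:  # 2 bytes for fp16/bf16
    # 2x for keys and values, stored per layer, per head, per cached token
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_len

for context_len in (8_000, 128_000, 256_000):
    gib = kv_cache_bytes(context_len) / 2**30
    print(f"{context_len:>7}-token context -> ~{gib:.1f} GiB of KV cache per sequence")
```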

The context and memory issues are two concerns that the SSM approach looks to solve.

The Mamba SSM architecture was originally proposed by researchers at Carnegie Mellon and Princeton universities. It has a smaller memory footprint and swaps the standard attention mechanism for an alternative designed to handle large context windows.
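The core idea can be shown with a toy recurrence in the spirit of an SSM. This is a heavy simplification rather than Mamba’s actual selective mechanism, and the matrix sizes are arbitrary:

```python
# Toy discretized state space model: the entire history is compressed into a
# fixed-size state h, so per-token memory stays constant no matter how long
# the context gets. (Real Mamba makes A, B and C input-dependent; fixed
# random matrices keep this sketch simple.)
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in = 16, 4                      # fixed state size, input size (illustrative)
A = rng.normal(scale=0.1, size=(d_state, d_state))
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_in, d_state))

def ssm_step(h, x):
    """Process one token: update the fixed-size state, emit an output."""
    h = A @ h + B @ x                      # the state carries everything seen so far
    return h, C @ h

h = np.zeros(d_state)
for x in rng.normal(size=(100_000, d_in)):  # 100,000 tokens, constant memory
    h, y = ssm_step(h, x)
print("state shape stays", h.shape, "regardless of sequence length")
```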

However, the Mamba approach struggles to match the output quality of a transformer model. The Jamba hybrid SSM-transformer approach is an attempt to combine the resource and context optimization of the SSM architecture with the strong output capabilities of a transformer.
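Conceptually, a hybrid stack interleaves many cheap SSM layers with occasional attention layers. The sketch below is illustrative only; the article does not describe Jamba’s actual layer layout, and the 3:1 mixing ratio and 32-layer depth are assumptions:

```python
# Conceptual layer plan for a hybrid SSM-transformer stack (illustrative only):
# SSM layers do most of the sequence mixing in linear time, while occasional
# attention layers preserve transformer-style token-to-token interactions.

def build_layer_plan(n_layers: int = 32, ssm_per_attention: int = 3) -> list:
    plan = []
    while len(plan) < n_layers:
        plan.extend(["ssm"] * ssm_per_attention)  # cheap, fixed-size state
        plan.append("attention")                  # full pairwise interactions
    return plan[:n_layers]

plan = build_layer_plan()
print(plan.count("ssm"), "SSM layers,", plan.count("attention"), "attention layers")
```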

AI21 Labs’ Jamba model offers a 256K context window and can deliver 3x throughput on long contexts compared to Mixtral 8x7B. AI21 Labs also claims that Jamba is the only model in its size class that fits up to 140K context on a single GPU.

Of note, just like Mixtral, Jamba uses a mixture of experts (MoE) approach. However, Jamba applies MoE within its hybrid SSM-transformer architecture, which allows for a further level of optimization.

Specifically, Jamba’s MoE layers allow it to draw on just 12B of its available 52B parameters at inference, making those 12B active parameters more efficient than a transformer-only model of equivalent size, according to AI21 Labs.
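A minimal sketch of the MoE mechanism shows why only a fraction of the parameters does work per token. The router, expert count and sizes below are toy assumptions, not Jamba’s actual configuration:

```python
# Toy top-k mixture-of-experts layer: a router picks a few experts per token,
# so only those experts' weights participate in the computation.
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k = 16, 2                   # route each token to 2 of 16 experts
d_model, d_ff = 64, 256                    # toy sizes
experts = [(rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
           for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts))

def moe_layer(x):
    """Route one token's hidden state through its top-k experts."""
    scores = x @ router
    chosen = np.argsort(scores)[-top_k:]                  # indices of the top-k experts
    weights = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    out = np.zeros_like(x)
    for w, idx in zip(weights, chosen):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0) @ w_out)      # only the chosen experts run
    return out

y = moe_layer(rng.normal(size=d_model))
print(f"ran {top_k} of {n_experts} experts -> only a fraction of expert weights active")
```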

It’s still early days for Jamba, and it’s not yet part of an enterprise offering from AI21 Labs. The company plans to offer an instruct version on the AI21 Platform as a beta soon.