TomTom and BentoML are advancing location-based AI together
Co-authored by Chaoyu Yang, Founder & CEO @ BentoML and Sherlock Xu, Informational Architect @ BentoML
BentoML is a unified AI application framework for building reliable, scalable, and cost-efficient AI applications. It provides an end-to-end solution for streamlining the deployment process, incorporating everything users need for model serving, application packaging, and production deployment. BentoML simplifies the transition from a machine learning model to a fully operational AI service, making it a comprehensive tool for modern AI-driven solutions.
Advancing Generative AI with location technology
Over the past year, the AI industry has witnessed significant advancements in Generative AI (GenAI) technologies. Large Language Models (LLMs) like GPT-4 and open-source alternatives such as Llama 2 have made building AI apps more accessible and user-friendly, reducing the need to invest years into developing deep ML skills. This transformation has caught the attention of not only AI experts but also business professionals, drawing attention to the unique aspects and potential of LLMs compared with traditional ML projects.
Amid these trends, TomTom has actively engaged with AI advancements. As a company deeply rooted in maps and navigation technologies, TomTom is data-centric by nature, which positions it well for AI and ML innovations. This aligns with our mission to provide global real-time maps and navigation services. Specifically, TomTom’s strategic response to these developments has included democratizing innovation across teams, prioritizing impactful projects, and establishing a GenAI center of excellence. In addition, we’re working with academia and startups on early R&D, and with cloud providers on foundational models and infrastructure. These efforts have helped us quickly get started and experiment with the latest AI technologies, ensuring TomTom stays at the forefront of the evolving tech landscape.
One of TomTom’s early partners was BentoML, whose unified AI application framework has helped the mapmaker get experiments off the ground, especially in model serving and deployment.
TomTom and BentoML: A Strategic Partnership
Working on AI experiments with BentoML has helped TomTom stay focused on its core competency in maps and navigation services while quickly trying out the latest AI technologies. TomTom selected BentoML as a partner for rapid experimentation and innovation for the following main reasons:
- LLM serving and deployment: BentoML’s strong capabilities in serving and deploying LLMs made it an ideal choice, providing faster, easier and more efficient model inference. OpenLLM, an important component in the BentoML ecosystem, offers a high-performing and user-friendly solution to LLM deployment, with advanced features such as continuous batching and token streaming.
- Accelerated AI application development: BentoML’s framework facilitates the rapid development of AI initiatives. Developers can quickly get started with any model or framework using BentoML, which allows for seamless integration and efficient composition of various models into a cohesive service.
- Community and industry adoption: The thriving open-source community backing BentoML and its widespread adoption in the industry provide a reliable and tested foundation for TomTom’s AI endeavors.
For BentoML, the partnership with TomTom represents a significant opportunity. Working with a global leader in navigation technologies validates BentoML’s strong capabilities in LLM serving and deployment in production. It allows the BentoML team to refine the project further for real-world LLM use cases.
How BentoML Helps TomTom
TomTom’s exploration into LLM-powered services involves addressing various challenges. While starting with Azure OpenAI APIs is fantastic for quickly getting a prototype out the door, you’ll probably want more control to improve upon your prototype.
For example, while LLMs can effectively perform tasks like data classification, scaling these services requires more cost-effective and efficient strategies. Experimenting with different, possibly open-source, models, replacing large models with smaller ones, and applying optimizations such as mini-batching therefore become essential.
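To make the mini-batching idea concrete, here is a minimal sketch in plain Python. It is not TomTom's implementation and does not use BentoML's API; `classify_in_batches`, `mini_batches` and `toy_classifier` are hypothetical names, and the keyword-based classifier is only a stand-in for a real model call. The point is that the model is invoked once per group of inputs rather than once per input.

```python
from typing import Callable, Iterator, List, Sequence


def mini_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def classify_in_batches(
    items: Sequence[str],
    classify_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Classify all items, calling the model once per mini-batch."""
    labels: List[str] = []
    for batch in mini_batches(items, batch_size):
        batch_labels = classify_batch(batch)
        # Guard against a model that drops or duplicates outputs.
        if len(batch_labels) != len(batch):
            raise ValueError("model returned a different number of labels than inputs")
        labels.extend(batch_labels)
    return labels


def toy_classifier(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for a real model: labels text by a keyword."""
    return ["address" if "street" in text.lower() else "other" for text in batch]
```

With a real model, `classify_batch` would wrap a single batched inference call. The right batch size depends on the model and hardware, so it would need to be tuned by experiment.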
In some cases, our approach to improving LLM-powered apps emphasizes using different models for specific tasks. By analyzing the problematic areas of a system, whether in quality, latency or cost, we can replace individual parts with more suitable alternatives and streamline the application. However, this requires a glue solution for coordinating the pipeline.
This is where BentoML comes in. It provides a straightforward way to integrate various models, simplifying the process of model composition and inference. BentoML, particularly with its serverless platform BentoCloud, acts as a cohesive agent in AI apps, enabling developers to focus on core functionalities without getting bogged down by extensive microservice architecture and complex infrastructure.
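The idea of swapping one part of a pipeline for a more suitable alternative can be illustrated with a small, framework-free sketch. This is plain Python and deliberately not BentoML's API; the `Pipeline` class and the stage functions are hypothetical, and the two "models" are keyword checks standing in for a large and a small model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

Stage = Callable[[str], str]


@dataclass
class Pipeline:
    """Runs named stages in order; any stage can be replaced independently."""
    stages: Dict[str, Stage] = field(default_factory=dict)

    def add(self, name: str, stage: Stage) -> "Pipeline":
        self.stages[name] = stage
        return self

    def replace(self, name: str, stage: Stage) -> None:
        if name not in self.stages:
            raise KeyError(f"unknown stage: {name}")
        # Assigning to an existing key keeps its position in the dict.
        self.stages[name] = stage

    def run(self, text: str) -> str:
        for stage in self.stages.values():
            text = stage(text)
        return text


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase the input."""
    return " ".join(text.split()).lower()


def large_model_classify(text: str) -> str:
    """Hypothetical stand-in for an expensive, general-purpose model."""
    return f"{text} -> address" if "street" in text else f"{text} -> other"


def small_model_classify(text: str) -> str:
    """Hypothetical stand-in for a cheaper, task-specific model."""
    return f"{text} -> address" if text.split()[-2:-1] == ["street"] or "street" in text else f"{text} -> other"


pipeline = Pipeline().add("normalize", normalize).add("classify", large_model_classify)
before = pipeline.run("  Main   Street 1 ")
pipeline.replace("classify", small_model_classify)  # swap only the costly stage
after = pipeline.run("  Main   Street 1 ")
```

Because each stage sits behind the same simple interface, the classification step can be replaced without touching the rest of the pipeline, and the outputs before and after the swap can be compared directly.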
Experimenting with positive outcomes
The collaboration between TomTom and BentoML has yielded fruitful results in the following aspects:
- Performance improvements: In one experiment, we realized a significant reduction in both latency and cost, averaging a ~50% decrease in each, with on-par quality.
- Advanced inference capabilities: Fast, efficient inference with cutting-edge open-source models for enhancing TomTom’s AI-driven services.
- Rapid experimentation and scaling: We’ve been able to swiftly test new AI services based on the latest models, with the option to scale them up rapidly.
- Infrastructure and expertise: The partnership has been a major help with infrastructure setup, glue code writing and model selection. At a time when tech moves so quickly, the value of such a relationship cannot be overstated.