With so many generative AI vendors out there, it can be daunting to pick the right one for your business.
In today’s rapidly evolving technology landscape, companies are under pressure to integrate generative AI into their products. Keeping pace requires building the right teams, streamlining product strategies, and forming strategic LLM vendor partnerships.
With so many LLM vendors and GPU providers available, selecting the right partner can be challenging. Rather than focusing only on finding the most performant model (LLM performance is likely to converge across most common tasks), focus on finding the AI vendor that best suits your organization’s needs.
To make the right choice, establish your own evaluation dimensions, covering factors such as cost, available capacity, reliability, technical support, availability of both proprietary and open-source models, and lifecycle support. Independent analyses are useful, but they are not exhaustive and may not be tailored to your specific business needs.
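A simple way to turn these dimensions into a decision is a weighted scoring matrix. The Python sketch below is only an illustration: the dimension names, weights, 1 to 5 ratings, and vendor names are hypothetical placeholders that you would replace with your own assessments.

```python
from dataclasses import dataclass

# Hypothetical weights per evaluation dimension (should sum to 1.0).
WEIGHTS: dict[str, float] = {
    "cost": 0.25,
    "capacity": 0.20,
    "reliability": 0.20,
    "support": 0.15,
    "model_choice": 0.10,
    "lifecycle": 0.10,
}


@dataclass
class VendorScores:
    name: str
    scores: dict[str, float]  # rating from 1 (poor) to 5 (excellent) per dimension


def weighted_score(vendor: VendorScores, weights: dict[str, float]) -> float:
    """Weighted sum of a vendor's ratings; raises if any dimension is unrated."""
    missing = set(weights) - set(vendor.scores)
    if missing:
        raise ValueError(f"{vendor.name} has no score for: {sorted(missing)}")
    return sum(weight * vendor.scores[dim] for dim, weight in weights.items())


def rank_vendors(vendors: list[VendorScores], weights: dict[str, float]) -> list[VendorScores]:
    """Return vendors ordered from highest to lowest weighted score."""
    return sorted(vendors, key=lambda v: weighted_score(v, weights), reverse=True)


if __name__ == "__main__":
    candidates = [
        VendorScores("Vendor A", {"cost": 3, "capacity": 4, "reliability": 5,
                                  "support": 4, "model_choice": 3, "lifecycle": 4}),
        VendorScores("Vendor B", {"cost": 5, "capacity": 3, "reliability": 3,
                                  "support": 3, "model_choice": 5, "lifecycle": 2}),
    ]
    for vendor in rank_vendors(candidates, WEIGHTS):
        print(f"{vendor.name}: {weighted_score(vendor, WEIGHTS):.2f}")
```

Keeping the weights explicit makes trade-offs visible: a team that values reliability over price can change one number and re-rank, rather than re-arguing the whole comparison.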
AI ecosystem offerings
Scaling AI applications from proof of concept (POC) to general availability (GA) is nearly impossible without the right tooling. Depending on the type of application, you may require guardrails, prompt management, output validation/processing, response grounding, hallucination checks, complex chains/orchestration, code execution support, and more.
Traditionally, Amazon SageMaker and Google Cloud’s Vertex AI were neck and neck in providing machine learning (ML) platform services, with others lagging behind. However, in recent months, AWS has made significant strides with Amazon Bedrock, adding a studio for rapid prototyping, knowledge bases for building retrieval-augmented generation (RAG) applications, and agent tooling.
It’s crucial to choose a vendor that offers not only highly performant LLMs, but also a comprehensive suite of tools and services that integrate easily into your workflow. Without a well-integrated ecosystem, development time and costs are likely to increase.
Data compliance and retention policies
In a data-driven world, customer data is among a company’s most valuable assets, and mishandling it can lead to lawsuits and reputational damage. Data retention policies and contracts should therefore be a pivotal consideration when evaluating LLMs and vendors. Look for vendors whose practices align with your customers’ data expectations, ideally ones with zero data retention policies, which ensure that prompts and outputs are neither retained by the vendor nor used to train its models.
Understanding data compliance also extends to assessing whether a vendor’s data security measures align with industry standards and regulations. This includes evaluating their encryption methods, access controls, and audit trails.
In regulated industries such as healthcare and finance, compliance is essential, so choose vendors that meet the regulations your business is subject to. For companies serving U.S. federal agencies, FedRAMP authorization of the cloud services involved is generally mandatory.
Regional availability and data sovereignty
Regional availability of models is crucial for companies operating globally, as it ensures applications meet local data sovereignty laws and optimizes performance for users in those areas. Typically, vendors launch models in U.S. regions first, gradually expanding to other regions.
At the time of writing, OpenAI does not serve its models from data centers outside the U.S., which can be problematic in regions where data sovereignty laws require local data storage. In contrast, Microsoft Azure and Amazon Bedrock let you run models in multiple regions worldwide. Carefully matching model availability to the regions you serve is key to avoiding compliance issues and expanding your product internationally.
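One way to manage this alignment is to record which regions each model is deployed in and which regions each market’s data may be processed in, then check the intersection. The sketch below uses hypothetical model names, region identifiers, and residency rules; real availability changes often and should come from your vendor’s documentation and your legal team.

```python
from typing import Optional

# Hypothetical catalog of where each model is deployed. Real availability
# changes often, so source it from your vendor's documentation.
MODEL_REGIONS: dict[str, set[str]] = {
    "model-x": {"us-east", "us-west"},
    "model-y": {"us-east", "eu-central", "ap-southeast"},
    "model-z": {"us-east", "eu-central"},
}

# Hypothetical residency rules: market -> regions where its data may be processed.
RESIDENCY_RULES: dict[str, set[str]] = {
    "US": {"us-east", "us-west"},
    "DE": {"eu-central"},
    "SG": {"ap-southeast"},
}


def compliant_regions(model: str, market: str) -> set[str]:
    """Regions where `model` is deployed AND `market`'s data may be processed."""
    return MODEL_REGIONS.get(model, set()) & RESIDENCY_RULES.get(market, set())


def eligible_models(markets: list[str]) -> list[str]:
    """Models that can serve every target market from at least one compliant region."""
    return sorted(m for m in MODEL_REGIONS if all(compliant_regions(m, mk) for mk in markets))


def pick_region(model: str, market: str) -> Optional[str]:
    """Pick a compliant region for a request, or None if the model cannot serve that market."""
    regions = compliant_regions(model, market)
    return min(regions) if regions else None  # min() only makes the choice deterministic
```

With this hypothetical data, `eligible_models(["US", "DE"])` returns `["model-y", "model-z"]`, and adding `"SG"` narrows the list to `["model-y"]`, which is exactly the kind of constraint that surfaces when a product expands into a new market.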
Choosing vendors for effective multilingual support
What languages does your product support? Supporting a language in an AI application takes significant effort beyond basic translation and localization. The same model can perform quite differently on the same task from one language to another, and models differ in which languages they handle well, so choose models and vendors that support your product’s languages or allow for the necessary customizations.
Currently, most vendors focus on improving model performance and reducing costs, treating support for languages beyond English as a secondary goal. Given the current state of offerings, you may need to fine-tune or retrain models to support multiple languages in your product.
Choose vendors that offer pre-trained models that perform well in multiple languages and provide tools for fine-tuning these models to suit your needs. Prioritize vendors committed to improving multilingual capabilities: this commitment reflects an understanding of global market dynamics and makes them valuable long-term partners.
Model versioning and support
New generative AI models are constantly being released. As new versions emerge, older ones are often retired, posing risks to your product’s success if you’ve bought into a now-retired system. But switching models or versions isn’t always straightforward – it can necessitate experimentation, rerunning benchmarks, testing, and sometimes rewriting pre/post-processing logic and integrations.
To avoid disruptions, assess your vendor’s policy on model updates and how long deprecated models remain supported. Some vendors publish model deprecation and retirement schedules, which provide essential lead time to plan a smooth transition.
Good relationships with vendors can also help you extend support when needed. Some vendors let you keep using outdated models if you commit to buying dedicated capacity; this secures continued access, but at an added cost.
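If your vendor publishes retirement dates, a small scheduled check can surface upcoming deadlines before they become emergencies. This sketch assumes you maintain the dates yourself in a dictionary; the model names and dates are hypothetical, and the 120-day lead time is an arbitrary default.

```python
from datetime import date, timedelta

# Hypothetical retirement dates transcribed from a vendor's published schedule.
RETIREMENT_DATES: dict[str, date] = {
    "model-a-v1": date(2025, 3, 1),
    "model-a-v2": date(2026, 1, 15),
}


def models_needing_migration(
    in_use: list[str], today: date, lead_time: timedelta = timedelta(days=120)
) -> list[tuple[str, int]]:
    """Return (model, days_until_retirement) for in-use models retiring within
    `lead_time`, soonest first. Negative days mean the date has already passed."""
    due = []
    for model in in_use:
        retires = RETIREMENT_DATES.get(model)
        if retires is None:
            continue  # no retirement date announced for this model
        days_left = (retires - today).days
        if days_left <= lead_time.days:
            due.append((model, days_left))
    return sorted(due, key=lambda item: item[1])
```

Running a check like this in CI or a daily job gives the team time to rerun benchmarks and update pre/post-processing logic well before a model disappears.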
LLM token throughput constraints
Generative AI inference is a matter of tokens in, tokens out: the model processes input tokens and generates output tokens. Each vendor sets its own limits on how many tokens you can use per model (often expressed as tokens and requests per minute), and these limits constrain how many user requests your application can handle, which affects both scalability and the end-user experience. Some vendors offer more capacity on demand, while others require you to buy a fixed amount of capacity in advance. Choose vendors and models that can process a large number of tokens at a reasonable cost without sacrificing performance.
Also consider how a vendor handles traffic during peak and off-peak times. With some vendors, latency increases noticeably during peak hours, so choose one that handles these variations smoothly and delivers consistent performance even during high-traffic periods. Monitoring tools that provide real-time insight into token usage are highly beneficial: they show how your usage evolves and help you plan future capacity, which in turn helps prevent unexpected outages.
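Vendors enforce their limits on their side, but a client-side limiter can keep your application from repeatedly hitting them. The sketch below is a token bucket sized to a hypothetical tokens-per-minute (TPM) budget; it assumes you can estimate each request’s token count (input plus expected output) before sending it, and the class name is illustrative rather than any vendor’s API.

```python
import time
from typing import Callable


class TokenRateLimiter:
    """Client-side token bucket that keeps estimated usage under a TPM budget.

    Illustrative only: vendors enforce their own limits server-side; this just
    helps avoid sending requests that would be rejected.
    """

    def __init__(self, tokens_per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.capacity = float(tokens_per_minute)  # bucket size = one minute of budget
        self.refill_per_sec = tokens_per_minute / 60.0
        self.available = self.capacity
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last call, capped at capacity."""
        now = self.clock()
        self.available = min(self.capacity, self.available + (now - self.last) * self.refill_per_sec)
        self.last = now

    def try_acquire(self, tokens: int) -> bool:
        """Reserve `tokens` for a request if the budget allows; otherwise return False."""
        self._refill()
        if tokens > self.available:
            return False
        self.available -= tokens
        return True

    def wait_time(self, tokens: int) -> float:
        """Seconds until `tokens` would fit (inf if larger than the whole budget)."""
        self._refill()
        if tokens > self.capacity:
            return float("inf")
        return max(0.0, (tokens - self.available) / self.refill_per_sec)
```

Requests that fail `try_acquire` can be queued and retried after `wait_time` seconds instead of being sent and rejected by the vendor.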
Cost structure
Cost is a critical factor for any company. LLM inference is expensive, and without proper monitoring and alerting, costs can quickly escalate.
To optimize costs, understand key factors before launching any product: throughput, token requirements, traffic patterns and peaks, and whether you need to buy provisioned capacity. An accurate cost assessment will help you maintain profitability while scaling. Collaborate with your go-to-market (GTM) teams on strategies that help offset AI inference costs, for example limiting usage according to each customer’s agreement.
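A back-of-the-envelope estimate helps frame the on-demand versus provisioned decision. The sketch below assumes flat per-million-token prices for input and output, a 30-day month, and average token counts per request; all figures, including the provisioned quote, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    requests_per_day: int
    avg_input_tokens: int
    avg_output_tokens: int


@dataclass
class Pricing:
    # Hypothetical on-demand prices in USD per 1M tokens.
    input_per_million: float
    output_per_million: float


def monthly_on_demand_cost(w: Workload, p: Pricing, days: int = 30) -> float:
    """Estimated monthly on-demand spend based on average tokens per request."""
    input_tokens = w.requests_per_day * w.avg_input_tokens * days
    output_tokens = w.requests_per_day * w.avg_output_tokens * days
    return (input_tokens * p.input_per_million + output_tokens * p.output_per_million) / 1_000_000


def cheaper_option(w: Workload, p: Pricing, provisioned_monthly_usd: float, days: int = 30) -> tuple[str, float]:
    """Compare on-demand spend with a flat provisioned-capacity quote (volume only, not peaks)."""
    on_demand = monthly_on_demand_cost(w, p, days)
    if provisioned_monthly_usd < on_demand:
        return ("provisioned", provisioned_monthly_usd)
    return ("on-demand", on_demand)


if __name__ == "__main__":
    workload = Workload(requests_per_day=10_000, avg_input_tokens=1_500, avg_output_tokens=500)
    pricing = Pricing(input_per_million=3.0, output_per_million=15.0)
    print(f"On-demand: ${monthly_on_demand_cost(workload, pricing):,.2f}/month")
    print(cheaper_option(workload, pricing, provisioned_monthly_usd=5_000.0))
```

For example, 10,000 requests a day averaging 1,500 input and 500 output tokens at $3 and $15 per million tokens works out to $1,350 + $2,250 = $3,600 a month. Averages hide peaks, though, so any provisioned option must also be sized for peak throughput, not just total monthly volume.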
When evaluating the cost structure offered by different vendors, consider more than just the price per token or the provisioned capacity cost. Factor in additional expenses like support fees, data storage fees, and overage charges. Some vendors offer bundled services or discounts for long-term commitments, which can be more cost-effective in the long run.
Building AI cost monitoring and management tooling can help optimize expenses, and it is wise to negotiate flexible payment options and transparent billing with vendors. By monitoring cost-driving factors, such as requests with a high number of tokens, and adjusting service usage with rate limiting, you can achieve significant savings over time.
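As a starting point for such tooling, the sketch below tracks spend per customer, flags unusually large requests for review, and reports when a customer approaches or reaches a monthly cap (which could mirror the usage limits in their agreement). The blended price per token, the 20,000-token threshold, and the 80% warning ratio are hypothetical assumptions.

```python
from collections import defaultdict


class SpendTracker:
    """Tracks per-customer LLM spend and flags unusually large requests.

    Assumes a single blended price per 1M tokens for simplicity; the
    large-request threshold and warning ratio are hypothetical defaults.
    """

    def __init__(self, usd_per_million_tokens: float, large_request_tokens: int = 20_000):
        self.price = usd_per_million_tokens
        self.large_request_tokens = large_request_tokens
        self.spend: dict[str, float] = defaultdict(float)  # customer_id -> USD this period
        self.large_requests: list[tuple[str, int]] = []    # kept for later review

    def record(self, customer_id: str, tokens: int) -> float:
        """Record one request's token usage and return its estimated cost."""
        cost = tokens * self.price / 1_000_000
        self.spend[customer_id] += cost
        if tokens >= self.large_request_tokens:
            self.large_requests.append((customer_id, tokens))
        return cost

    def status(self, customer_id: str, monthly_cap_usd: float, warn_ratio: float = 0.8) -> str:
        """'block' once the cap is reached, 'warn' past warn_ratio of it, else 'ok'."""
        spent = self.spend.get(customer_id, 0.0)
        if spent >= monthly_cap_usd:
            return "block"
        if spent >= warn_ratio * monthly_cap_usd:
            return "warn"
        return "ok"
```

A `"warn"` result could trigger an alert to the account team, while `"block"` could switch the customer to a stricter rate limit rather than cutting them off outright; that policy choice belongs with your GTM teams.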
Consider running pilot programs to get a better grasp of the cost implications before committing fully. This not only helps in budget planning but also allows you to thoroughly evaluate the vendor’s offerings and their alignment with your business needs.
SLA guarantees
It’s crucial to establish service level agreements (SLAs) with your vendors for metrics such as availability and time to resolve incidents. While LLM vendors rarely offer latency SLAs, availability (uptime) SLAs are vital for your application’s success. Established cloud providers like Google Cloud, AWS, and Microsoft Azure offer robust availability SLAs. These agreements provide a safety net, typically in the form of service credits when uptime falls below the committed level, although they compensate for outages rather than prevent them.
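To see what an uptime percentage means in practice, convert it into allowed downtime. The sketch below assumes a 30-day month; the composite calculation further assumes that your application needs every listed service at once and that their failures are independent, which is a simplification.

```python
from math import prod

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200


def allowed_downtime_minutes(sla_percent: float, minutes_in_period: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Downtime an availability SLA permits over the period without being breached."""
    return (1 - sla_percent / 100) * minutes_in_period


def composite_availability(sla_percents: list[float]) -> float:
    """Availability (in %) of a request path that needs every service, assuming independent failures."""
    return prod(p / 100 for p in sla_percents) * 100


if __name__ == "__main__":
    for sla in (99.9, 99.95, 99.99):
        print(f"{sla}% -> {allowed_downtime_minutes(sla):.1f} min/month")
    print(f"LLM 99.9% + dependency 99.95% -> {composite_availability([99.9, 99.95]):.3f}%")
```

Under those assumptions, a 99.9% SLA allows about 43 minutes of downtime a month, and chaining a 99.9% LLM endpoint with a 99.95% dependency yields roughly 99.85% availability, lower than either service alone.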
Don’t forget to review vendors’ historical performance reports to see their true uptime percentages and incident response times. This can offer a more realistic picture of their reliability. Moreover, consider vendors that offer tiered SLA options, allowing you to choose the level of service that aligns best with your operational needs and cost considerations.
Cloud maturity and tech support
One of the most important things to look for is a vendor that is equally invested in your success. That investment carries over into areas such as technical support. A vendor with mature cloud infrastructure and strong technical support can make a significant difference in how smoothly your operations run, and ultimately in your product’s success.
Seek out vendors that have comprehensive documentation, community forums, and technical guides, so that troubleshooting and optimizing the vendor’s offerings is simple. Additionally, check for the availability of premium support services, which can be crucial for business-critical applications.
Final thoughts
As AI technology continues to evolve and become an integral part of business operations, forming the right strategic partnerships will be crucial. Choosing the right generative AI vendor is a nuanced decision that has far-reaching implications for your company’s operational efficiency, cost structure, and overall success in the AI landscape. By carefully evaluating these factors, businesses can identify a vendor that aligns with their strategic goals.
Ultimately, a thoughtfully chosen generative AI vendor can be a game-changer, providing the tools and support needed to propel your organization to new heights in the generative AI era.