The hard truths of putting Generative AI to work

Published 2025-10-03 12:02:03


The business world is in the midst of an artificial intelligence gold rush. Spurred by the rapid spread of generative AI, a palpable sense of urgency has gripped boardrooms. Leaders are scrambling to integrate large language models (LLMs) into their operations while still trying to understand exactly what these tools do and where they’re headed.

The promise is alluring: off-the-shelf intelligence, plugged in like a utility, ready to revolutionise productivity. Yet, as we at Sansan have discovered in applying AI, this seductive promise often collides with the hard realities of specific, high-stakes business problems.

Our experience building a bespoke AI model from the ground up reveals a crucial lesson for this era: true, defensible value in enterprise AI comes not from blindly adopting the latest trend, but from a painstaking, first-principles approach that bridges deep technical expertise with a profound understanding of business reality.

As a company built on the highly precise extraction of information, we’ve focused on evolving our use of AI to speed up daily operations and workflows across our solutions, including business cards and contacts (Sansan), invoicing (Bill One), and contracts (our Japanese solution, Contract One).

The task is exacting. An error is not a conversational quirk; it is a business liability. We quickly found that while general-purpose AI is a marvel of common-sense reasoning, it is a poor substitute for domain-specific expertise.

It could not, for instance, reliably distinguish between a personal and a corporate email address on a business card without the kind of explicit, nuanced instruction that its architecture was not designed to handle.

More fundamentally, these models failed a critical technical test. Most are built on foundations, such as OpenAI’s CLIP, that were pre-trained on relatively low-resolution images.

This is adequate for identifying a cat in a photograph, but it certainly can’t decipher the tiny, dense print on a complex invoice.
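A rough back-of-the-envelope calculation makes the resolution problem concrete. The sketch below is illustrative only: the A4 page height, the 8-point font, and the list of image sizes are assumptions chosen for the example (224 pixels is the input size commonly cited for CLIP-style encoders), not figures from our own systems.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def glyph_height_px(font_pt: float, page_height_mm: float, image_height_px: int) -> float:
    """Approximate height in pixels of a font's em box after the whole page
    is resized so that its height equals image_height_px."""
    font_mm = font_pt / POINTS_PER_INCH * MM_PER_INCH
    return font_mm / page_height_mm * image_height_px


# Assumed example: 8 pt print on an A4 page (297 mm tall).
for side in (224, 336, 1024, 2048):
    print(f"{side:>4} px page height: {glyph_height_px(8.0, 297.0, side):.1f} px per character")
```

Under these assumptions, a character on a page squeezed into 224 pixels is only about two pixels tall, which is too little to tell most letters apart. Only at page heights in the thousands of pixels does small print approach a legible size.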

The available tools’ very architecture was misaligned with the fine-grained nature of our problem. The “good enough” of the consumer world was nowhere near good enough for our company’s enterprise needs.

This left us at a strategic crossroads. We could continue to wrestle with inadequate tools, or we could take the great risk of building our own vision-language model from scratch. We chose the latter. This was not a decision born of academic curiosity, but of calculated business strategy.

We determined that the market risk was low — we knew a more accurate system would create immense value. The technical risk, however, was immense. We were venturing into uncharted territory with no guarantee of success.

Our journey was one of trial, error, and adaptation. An initial attempt to build upon the prevailing model architectures proved inefficient for the high-resolution data our task demanded.

The process was unstable, and the GPU memory requirements were staggering, forcing us to use smaller data batches that crippled the learning process.
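Why high resolution squeezes the batch size can be sketched with simple arithmetic. The example below assumes a ViT-style encoder that cuts each image into 16-pixel patches and a fixed per-step token budget. The patch size, the budget, and the function names are all hypothetical choices for illustration and do not describe the architecture we actually tried.

```python
def patch_tokens(height_px: int, width_px: int, patch_px: int = 16) -> int:
    """Number of patch tokens a ViT-style encoder produces for one image."""
    return (height_px // patch_px) * (width_px // patch_px)


def max_batch(token_budget: int, height_px: int, width_px: int, patch_px: int = 16) -> int:
    """Largest batch that fits if memory allows token_budget tokens per step."""
    return token_budget // patch_tokens(height_px, width_px, patch_px)


TOKEN_BUDGET = 200_000  # hypothetical stand-in for a GPU memory limit

for side in (224, 448, 896, 1792):
    tokens = patch_tokens(side, side)
    print(f"{side:>4} px: {tokens:>6} tokens per image, "
          f"batch of {max_batch(TOKEN_BUDGET, side, side)}")
```

In this toy model, each doubling of the image side quadruples the tokens per image and cuts the feasible batch to roughly a quarter. Real memory use depends on many other factors, and naive self-attention grows faster still, with the square of the token count.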

We had to pivot, moving away from the popular approach to a simpler, more elegant architecture that was better suited to our specific goal of text generation from images. This willingness to question the hype and return to first principles was what ultimately allowed us to achieve a technical breakthrough.

Yet, this is the point where most AI stories in the business press end.

In reality, our work had just begun. We had built a model that matched the accuracy of our existing optical character recognition (OCR) system, which is specialized for business cards. But we quickly ran into a second, more intractable obstacle: the wall of practical application.

Technical success, our engineers learned, is only half the battle. Our new AI was powerful but, because it ran on expensive GPUs, its operating cost was significantly higher than that of the incumbent system. To justify its existence, it had to deliver a return on investment that far exceeded its costs.

Also, simply matching the performance of a deeply embedded legacy system is rarely a compelling reason to undertake the costly and risky process of replacement. “As good as” is a technical milestone, not a business case.

Our developers’ initial attempts to deploy the technology stalled. They approached business units passively, asking, “What can we do for you?” The question was met with a mixture of confusion and mismatched expectations. The breakthrough came only when we flipped our strategy.

Instead of asking what could be done, our technical teams started showing. They took each division’s own data, ran it through our model, and presented that division with concrete reports: qualitative and quantitative analyses of the results.

Suddenly, the conversation changed. Abstract potential became tangible value. By demonstrating, not just describing, our model’s capabilities on each division’s specific problems, our teams built the trust and understanding needed to secure resources and move toward deployment.

This journey from a technical proof-of-concept to a growing, multi-domain business application holds universal lessons. The real AI dividend will not be claimed by the companies that are quickest to adopt generic tools.

It will go to those with the discipline to identify their core business challenges and the courage to invest in bespoke solutions, even if it means building from the ground up. It requires fostering a culture where technical and business teams collaborate so closely that they speak a shared language.

The future of enterprise AI belongs not merely to the users, but to the builders — those who understand that the last mile of innovation is always the hardest, and that true competitive advantage is earned in the painstaking work of closing the gap between a powerful technology’s promise and the complex reality of the problems worth solving.

#EnterpriseAI #BespokeAI #VisionLanguageModel #AIAdoption #AIForBusiness