AI building blocks for developers
How will AI / LLMs change the system architecture of applications?
Introduction
I have been building software and digital products for over 25 years. During these years, I have designed and built just about anything, from embedded systems to complex enterprise ERPs and eCommerce. I have learned to master and use the building blocks of modern software products: databases, queues, algorithms and patterns, backend code, cloud systems, communication channels, web frameworks, frontend frameworks, and dozens of others. I have a default tech stack that allows me to build anything digital really efficiently.
Then came the 30th of November 2022, and since that day many of my beliefs and much of my understanding have been shaken. ChatGPT and OpenAI have rocked my world. Many of my existing building blocks will become obsolete or change their shape into something new. They will be replaced with something more intelligent and human.
And no, it is not only about content generation…
OpenAI Enablers
But before going to these building blocks, allow me to explain a bit about the enablers behind this. We all know ChatGPT, a product from OpenAI that seems to be able to answer any question with fully plausible answers. But OpenAI is much more than ChatGPT: it is actually a collection of AI-enabled APIs and technologies, which can be used to build various AI applications.
GPT-3 is a natural language API that supports several NLP use cases. ChatGPT builds on the same model family: it is a specially trained GPT model, closely related to InstructGPT, for conversational use cases. GPT-3 has APIs for generating and editing content using prompts, which are natural language instructions.
Codex is similar to GPT-3, but instead of creating text content, it turns natural language prompts into code snippets or database queries in basically any programming language.
Embeddings allow you to analyze and categorize textual content. The API returns a vector, which allows you to find similar content and associations. It is based on NLP, so it can understand human categorizations and can be used for simple natural language search, recommendations, and many other uses. You still need a vector database to store the embeddings and perform a search.
Fine-tuning is a way to build your own custom AI model on top of pre-trained models. It is done by feeding in a set of prompts and answers, from which the AI learns to answer users’ questions more accurately. While this can be used for training the model with domain-specific data, embeddings often provide a faster and more cost-efficient way of doing “fine-tuning”.
Building blocks
The following image shows the high-level architecture and components for building an OpenAI-based NLP app. It consists of four main sets of components: Natural Language UI, Natural Language Analysis, Domain Model, and Domain Language Model. In this article, I will briefly explain each of them and, in later articles, dive deeper into details and examples.
Natural Language UI
The core of any NLP app is the ability to talk with users in their own language. There are a lot of ready-made components you can build on, but generally, it is not too complex to build your own (the web is full of chat examples in any programming language).
Chat UI is the very basis of any conversational app and is really easy to connect to the OpenAI APIs. If you don’t want to build your own website, it is also possible to integrate with almost any messenger as a bot. Many chats also include visual aspects like images and allow you to implement some simple clickable controls, like select boxes.
Voice UI is a bit more complex with today’s technology, especially if you want it to work in real time and support languages other than English. Google and Amazon have their transcription APIs, which are quite simple to integrate with. A more advanced option would be to go with OpenAI’s Whisper engine.
2-way conversation: this is not yet something I have explored, but as a concept I find it cool if AI can join a multiparty conversation. It could take input from the user/client and learn from the human sales/customer service people on the other side. A bit like what Microsoft Teams video calls are doing with AI.
Results UI: in my opinion, conversation (in text or speech) is quite limited for building rich interactions with the user. Being able to visualize and summarize the results in real time is really a killer use case for many purposes like sales and customer service.
Conversation history: To be more than just a chatbot / ChatGPT copy, it’s important that you remember the discussion you have had with a customer. Chat isn’t just about answering single questions, but rather a long conversation where you have learned the needs and context from previous discussions.
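As a minimal sketch of the conversation history idea, the snippet below keeps the recent turns of one conversation and turns them into a single prompt. The class name Conversation, the max_turns limit, and the prompt layout are all illustrative assumptions, not part of any OpenAI API; a real app would also persist the turns per customer.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Keeps the turns of one customer conversation (hypothetical helper)."""
    system_instructions: str
    max_turns: int = 20  # assumption: only the most recent turns fit in a prompt
    turns: list = field(default_factory=list)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))
        # Drop the oldest turns so the prompt does not grow without bound.
        self.turns = self.turns[-self.max_turns:]

    def to_prompt(self) -> str:
        # Instructions first, then the remembered turns, then the cue for the reply.
        lines = [self.system_instructions, ""]
        lines += [f"{speaker}: {text}" for speaker, text in self.turns]
        lines.append("Assistant:")
        return "\n".join(lines)


conversation = Conversation("You are a helpful car sales assistant.", max_turns=4)
conversation.add("User", "I need a family car.")
conversation.add("Assistant", "How many seats do you need?")
conversation.add("User", "Seven, and it should be electric.")
prompt = conversation.to_prompt()
print(prompt)
```

The resulting prompt string is what you would send to the language model, so each reply is generated with the earlier discussion in view.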
Natural language analysis
Personally, I don’t find ChatGPT interesting from a developer's perspective. It is a really cool productivity tool for anybody, but repeating its functionality just does not make sense. There is no added long-term value in that.
What is more interesting is its capability to understand users’ needs and intentions. Natural language analysis provides ways of translating human needs into a computer-understandable form, which we can use to control and personalize the user experience and collect valuable data.
Summarize: LLMs are really good at finding the key topics and summarising long texts and conversations. They are already used for creating meeting notes and summarising slide sets from long source data. This is a really powerful feature when you need to present a user’s needs to a human counterpart.
Analyze: even more important than summarising, analyzing the conversation is the key to connecting the conversation with the app’s or company’s own data and services. You can ask the AI model to collect many data points and make interpretations about the conversation. So in the end, I believe that we can replace today’s web forms with a conversational UI. We can leave formatting and validation to AI, making it all simpler for the user. It is also good for categorizing and profiling the data; there is a lot AI can predict, e.g. based on users’ names.
Reply: getting AI to reply to a conversation is really trivial. Making it reply correctly and guide the user is much more complex. AI prompts allow you to feed in quite a lot of context, which makes the replies more relevant. You can provide data and other instructions that help AI respond to users’ questions. In order to do this, connecting the results from the analysis to the company’s own data model is crucial.
Actions/analytics: I believe that conversations are only one part of a good user experience. In the real world, they connect with more traditional user flows. When we do this, we can collect more relevant analytics of users’ needs and behavior. And it can be even more powerful! Besides knowing what a user has been looking for, we can collect their opinions and feedback on it. E.g. in a webstore, we can have the user comment on a certain product: “too expensive”, “wrong color”, which allows us to make even more relevant recommendations for the user.
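To illustrate the Analyze step above, here is a minimal sketch of replacing a web form with extraction from a conversation. The field names, the prompt wording, and the fake_reply string are assumptions made for this example; the actual model call is left out so the sketch runs offline.

```python
import json

# Hypothetical data points we want to collect instead of using a web form.
FIELDS = ["budget_eur", "seats", "fuel_type"]


def build_analysis_prompt(conversation_text: str) -> str:
    """Ask the model to return the data points as JSON."""
    return (
        "Extract the following fields from the conversation and reply with "
        "JSON only. Use null for anything the user has not mentioned.\n"
        f"Fields: {', '.join(FIELDS)}\n\n"
        f"Conversation:\n{conversation_text}\n\nJSON:"
    )


def parse_analysis(model_reply: str) -> dict:
    """Parse the reply defensively: models do not always return valid JSON."""
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Keep only the fields we asked for; anything missing becomes None.
    return {name: data.get(name) for name in FIELDS}


# Stand-in for a real model response.
fake_reply = '{"budget_eur": 30000, "seats": 7, "color": "red"}'
analysis = parse_analysis(fake_reply)
print(analysis)
```

Validating the reply in code, rather than trusting it, is a design choice: the model does the interpretation, but the app still decides which fields it accepts.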
Domain Model / Data import
This topic comes third in my article, but on the priority list, it is the most important! However smart the LLM is, it is not enough on its own to provide added value for a specific service or business.
If you look at existing articles, most of them talk about fine-tuning the LLM with your own data. My view is that for most use cases that is expensive overkill and not even the most efficient approach. Instead, I’d build the domain model from a document database/index and well-designed data embeddings (vectors).
Building the domain model
The biggest part of any data design is to understand the data models and available data for those. Usually, you need to fetch data from multiple sources and do some data transformations and aggregations based on those. The data is best stored in some efficient document database (I use Elasticsearch).
After gathering the data in a shared database, you can start categorising it into one or multiple embeddings. As explained in the OpenAI Enablers section, embeddings are a way to categorize content into vectors that allow detecting similar content. The easiest way is to e.g. add a separate vector for each chapter of a document, so you can match that content to e.g. a user’s question. To make more complex categorizations, you can use GPT to generate specific content from the original data (e.g. full descriptions based on a few data points).
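A minimal sketch of the "one vector per chapter" idea could look like this. It assumes chapters are separated by blank lines, and fake_embed is only a placeholder so the example runs offline; in a real app the embed callable would call an embeddings API, and the records would be stored in your document database.

```python
from typing import Callable, List


def split_into_sections(document: str) -> List[str]:
    """Split on blank lines; assumes chapters are separated that way."""
    return [part.strip() for part in document.split("\n\n") if part.strip()]


def build_index(doc_id: str, document: str,
                embed: Callable[[str], List[float]]) -> List[dict]:
    """Create one record (text plus vector) per section of the document."""
    records = []
    for position, section in enumerate(split_into_sections(document)):
        records.append({
            "doc_id": doc_id,
            "section": position,
            "text": section,
            "vector": embed(section),
        })
    return records


def fake_embed(text: str) -> List[float]:
    # Placeholder only: a real embedding comes from an embeddings API.
    return [float(len(text)), float(text.count(" "))]


manual = "First chapter text goes here.\n\nSecond chapter text goes here."
records = build_index("manual-1", manual, fake_embed)
print(len(records), records[0]["text"])
```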
With the data indexed in a searchable database and good embeddings, you can construct a semantic search that efficiently finds relevant data and answers simple questions without any AI model.
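The search step can be sketched with plain cosine similarity. The three-dimensional vectors below are made up for illustration; real embeddings have hundreds or thousands of dimensions, and a vector database or search index would normally do this ranking for you.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [(cosine_similarity(query_vector, vector), text)
              for text, vector in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy index: (text, vector) pairs with invented vectors.
index = [
    ("Electric SUV with seven seats", [0.9, 0.1, 0.3]),
    ("Two-seat petrol sports car", [0.1, 0.9, 0.2]),
    ("Charging network coverage", [0.8, 0.0, 0.6]),
]
query_vector = [1.0, 0.0, 0.4]  # pretend embedding of the user's question
results = search(query_vector, index)
for score, text in results:
    print(f"{score:.2f} {text}")
```

With these toy vectors the two electric-related entries rank above the sports car, which is the behavior a semantic search relies on: closeness in vector space stands in for closeness in meaning.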
Using the domain model
After you have a well-designed domain data model and natural language analysis, you need to connect these two. In order to do that, you need to define the business logic and processes you want to implement. Whether it is the sales process, customer service process, or some other use case, you need to control the user flow. Otherwise, you might end up in open-ended customer flows.
There are various ways of doing this, but usually it consists of “reactive” process flows that act based on the results from the user and the domain model. The flow checks whether it has the data it needs and reacts to the given input. You can also use the Codex API to dynamically generate code or database queries from the inputs.
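A reactive flow of this kind can be sketched as a function that looks at what has been collected so far and decides the next step. The field names, questions, and action names here are hypothetical and only serve the example.

```python
# Hypothetical data points the sales flow needs before it can search.
REQUIRED_FIELDS = ["budget_eur", "seats", "fuel_type"]

QUESTIONS = {
    "budget_eur": "What is your budget?",
    "seats": "How many seats do you need?",
    "fuel_type": "Do you prefer electric, hybrid, or gasoline?",
}


def next_step(collected: dict) -> dict:
    """Ask for the first missing field, or search once everything is known."""
    for name in REQUIRED_FIELDS:
        if collected.get(name) is None:
            return {"action": "ask", "field": name, "question": QUESTIONS[name]}
    filters = {name: collected[name] for name in REQUIRED_FIELDS}
    return {"action": "search", "filters": filters}


step_one = next_step({"budget_eur": 30000, "seats": None})
step_two = next_step({"budget_eur": 30000, "seats": 7, "fuel_type": "electric"})
print(step_one["question"])
print(step_two["action"])
```

Keeping this logic in ordinary code, outside the language model, is what prevents the open-ended customer flows mentioned above.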
The basis of communicating with the domain model is to construct a schema that you can use in NLP queries. So you must define the tables/indexes and the attributes they contain. The Codex APIs already show a pretty impressive understanding of data models and can help in translating the queries to actual code. It will also be interesting to see whether this results in a “common language” for all apps, finally allowing communication with other systems without mappings and transformations.
The technologies used include e.g. behavioral process flows, state machines, semantic search, and various transformations that allow you to draw conclusions and statistics from the data. This is such a broad topic that I won’t dive into it in this article.
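As a rough sketch of what such a schema could look like in a prompt, the snippet below describes two tables in plain text and asks for a query. The table names, columns, and prompt wording are assumptions for illustration; the generated query would still need to be reviewed or validated before it is run.

```python
# Hypothetical schema: table name -> {column name: column type}.
SCHEMA = {
    "cars": {"make": "text", "model": "text",
             "price_eur": "integer", "fuel_type": "text"},
    "dealers": {"name": "text", "city": "text"},
}


def describe_schema(schema: dict) -> str:
    """Render each table on one line, e.g. 'Table cars(make text, ...)'."""
    lines = []
    for table, columns in schema.items():
        column_list = ", ".join(f"{name} {kind}" for name, kind in columns.items())
        lines.append(f"Table {table}({column_list})")
    return "\n".join(lines)


def build_query_prompt(schema: dict, question: str) -> str:
    """Combine the schema description and the user's question into a prompt."""
    return (
        "Given the following tables, write one SQL query that answers "
        "the question.\n\n"
        f"{describe_schema(schema)}\n\n"
        f"Question: {question}\nSQL:"
    )


query_prompt = build_query_prompt(SCHEMA, "Which electric cars cost under 30000 euros?")
print(query_prompt)
```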
Domain language model
I wanted to separate the language model from the domain model. As explained, the domain model with embeddings is often a more efficient way of providing the service than training your own language model.
However, there are a few use cases where a custom language model can provide better results, especially in teaching AI real domain expertise. For example, in the car sales use case I have been implementing, there is some specific domain understanding that is difficult for existing language models to grasp: e.g. how different car models differ in their lifespan, need for repairs, and other data that cannot be read from technical specifications. Why would you pick Tesla over BMW? Why choose an electric car instead of a gasoline one? What are the real yearly costs of each car?
These things can be taught to an LLM using fine-tuning, by coming up with prompts and answers for them. The best way of teaching would be to let domain specialists do the training based on users’ questions. You can also use GPT to come up with suitable questions and answers based on them.
Anyway, it is good to understand that with current OpenAI API prices, the cost of using a custom language model is more than six times higher. This limits the possible use cases quite a lot.
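The training data for this is essentially a list of prompts and answers. The sketch below writes such pairs as JSON Lines with prompt and completion keys, which is the layout OpenAI’s GPT-3 fine-tuning documentation described at the time of writing; treat the exact format as an assumption and check the current documentation. The answers are placeholders, to be written by a domain specialist.

```python
import json

# Questions from the article; the answers are placeholders on purpose.
examples = [
    ("Why choose an electric car instead of a gasoline one?",
     "<answer written by a domain specialist>"),
    ("What are the real yearly costs of each car?",
     "<answer written by a domain specialist>"),
]


def to_jsonl(pairs) -> str:
    """Serialize (prompt, completion) pairs, one JSON object per line."""
    lines = []
    for prompt_text, completion_text in pairs:
        record = {"prompt": prompt_text.strip(),
                  "completion": completion_text.strip()}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


training_data = to_jsonl(examples)
print(training_data)
```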
Summary
OpenAI has opened a whole new world of possibilities for us software developers. Even today, it can be used for creating totally new experiences and apps that provide huge value compared to existing systems. In the future these tools will improve even more, and as competition arrives, quality will rise and prices will go down. This is a really exciting time for developers and creates huge opportunities!
This was just a short intro to the topic. For more, subscribe to follow me here on Substack, Twitter or LinkedIn.
Who am I: a passionate software architect/developer with over 25 years of experience in multiple fields. Having seen the impact AI is creating in our industry, I decided to quit my job and go all-in on building a new world with AI. Just bootstrapping and designing multiple ideas to create a big impact!