Davyd Maiboroda is one of the brightest talents in artificial intelligence. An AI solutions architect, software engineer, and researcher, he brings more than 15 years of wide-ranging experience to the field. He is passionate about building AI technologies that scale effectively in resource-constrained environments, particularly on NVIDIA Jetson hardware.
His AI Cast pipelines, developed in collaboration with orbitalcom and deployed with their first pilot client VerTel, power conversational AI systems for customer engagement and booking automation. His primary research interest is adapting large language models (LLMs) for edge devices, which are constrained by limited computing power and memory. Among his biggest wins are dramatic reductions in memory consumption and inference time, achieved by quantizing model parameters from 32-bit floating point values to lower-precision formats (e.g., INT8) while maintaining accuracy.
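The quantization idea described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not Maiboroda's actual pipeline: the toy model and layer sizes are assumptions, and dynamic quantization stands in for whatever calibration scheme a production deployment would use.

```python
import io
import torch
import torch.nn as nn

# Hypothetical toy model standing in for an LLM layer stack
# (the article names no specific architecture).
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Dynamic quantization: Linear weights are stored as INT8,
# and activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Serialize the state dict to a buffer and measure its size."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"fp32: {size_mb(model):.2f} MB, int8: {size_mb(quantized):.2f} MB")
```

Because INT8 weights take a quarter of the storage of FP32, the quantized model's footprint shrinks to roughly a quarter of the original for the Linear layers, which is the kind of saving that makes Jetson-class memory budgets workable.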
Innovations in Conversational AI
At the forefront of Maiboroda's contributions is his development of conversational AI built on Edge Impulse tooling and Jetson hardware. He argues that robots equipped with natural language processing and built-in decision-making capabilities will be more intuitive for users to understand and engage with.
He follows an extraordinarily careful process of preprocessing and structuring the data. This scrupulous approach yields exceptionally low latency, a necessity for real-time applications such as gaming. As Maiboroda explains, the complexity of deploying AI to edge devices drives engineers to develop smarter, more streamlined solutions.
“It’s not about squeezing a huge model into a small space, but about reshaping the space and the model so they work together seamlessly.” – Davyd Maiboroda
This philosophy underlies his new approach for optimizing models specifically for the Jetson architecture. Through an emphasis on adaptation and precision, he’s mastered the creation of lightweight models, capable of real-time video analysis.
Optimization Techniques and Model Reduction
Maiboroda's optimization techniques go much further than simply reducing the number of parameters. He combines TensorRT optimization with his own methods, eliminating excess parameters and rewriting model layers, among other things, to improve performance. As a result, even the most resource-constrained devices can run large, sophisticated language models with ease.
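One concrete form of "eliminating excess parameters" is magnitude-based pruning, which zeroes out the weights that contribute least. The sketch below uses PyTorch's pruning utilities on a toy layer; the layer shape and the 30% pruning ratio are illustrative assumptions, not figures from Maiboroda's work.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy linear layer; in practice this would be applied across a
# transformer's projection layers (shapes here are illustrative only).
layer = nn.Linear(256, 256)

# Unstructured L1 pruning: zero out the 30% of weights with the
# smallest absolute magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Fold the pruning mask into the weights permanently, so the layer
# carries plain (sparse) tensors rather than a mask plus originals.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.0%}")
```

On its own, unstructured sparsity mostly saves memory after compression; engines like TensorRT can additionally exploit structured sparsity patterns for actual speedups on supported hardware.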
He has demonstrated that large models can be reduced to formats suitable for Jetson devices without compromising their functionality. This reduction, combined with ChatGPT-style natural language capabilities, enables more powerful applications across industries, from robotics to customer service.
His dedication to continual refinement means that the models he rolls out stay responsive even across varying conditions. Maiboroda evaluates these models against several quantitative and qualitative performance metrics to verify their reliability and efficiency.
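The article doesn't specify which metrics are used, but latency percentiles are a standard quantitative check for real-time edge inference. A minimal benchmarking sketch, assuming a placeholder model in place of the real deployment:

```python
import time
import statistics
import torch
import torch.nn as nn

# Placeholder model; a real benchmark would load the deployed network.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1))
model.eval()
x = torch.randn(1, 128)

with torch.no_grad():
    # Warm-up passes so one-time initialization doesn't skew timings.
    for _ in range(10):
        model(x)

    # Time individual forward passes in milliseconds.
    latencies = []
    for _ in range(100):
        start = time.perf_counter()
        model(x)
        latencies.append((time.perf_counter() - start) * 1000)

p50 = statistics.median(latencies)
p95 = statistics.quantiles(latencies, n=20)[-1]  # 95th percentile
print(f"p50: {p50:.3f} ms, p95: {p95:.3f} ms")
```

Tracking the tail (p95/p99) rather than only the average is what reveals whether a model stays responsive "across varying conditions" such as thermal throttling or competing workloads on a Jetson board.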
Challenges and Future Directions
The complexity of deploying AI solutions to edge devices creates a myriad of challenges. For Maiboroda, these challenges represent opportunities for creativity. He notes that engineers must keep adapting to the technology's demands even, and especially, after systems are deployed in the field.
One of the most important lessons he has learned from deploying LLMs is the need for flexibility in design. As his work shows, successful deployment requires more than a high-level understanding of hardware capabilities and software requirements; it demands deep familiarity with both.
With each breakthrough, Maiboroda believes the perceived limitations of AI on edge devices are becoming a thing of the past. His ongoing research aims to further enhance the synergy between AI models and their hosting environments, making advanced AI capabilities accessible even in settings with limited resources.
