OneFormer Semantic Segmentation in Action

Topic

Exploring Verona Through the Lens of Semantic Segmentation

In this video, you can see the OneFormer semantic segmentation model in action among the streets and squares of Verona. This is an ideal scenario for benchmarking semantic segmentation algorithms, as the scenes are cluttered with agents belonging to many different classes: tourists strolling around the ancient buildings; cars, bikes, and buses traveling along streets full of billboards, signs, and bustling activity. Furthermore, the lighting of outdoor scenes depends heavily on the current weather conditions, making the task even harder to solve with satisfactory precision. However, as you will see in this video, OneFormer tackles all of these challenges, reaching a very satisfactory level of precision in every scenario.

What is Semantic Segmentation? 

Recent advances in artificial intelligence algorithms for computer vision allow more and more tasks to be tackled with increasing precision. For example, the latest semantic segmentation methods based on transformer models are able both to detect many different classes of objects and to locate them with remarkable detail. These families of algorithms can be seen as the natural evolution of object detection methods: while detecting an object simply means finding the rectangle that contains it, segmenting it requires identifying every single pixel belonging to it. As a result, the spatial information obtained from state-of-the-art segmentation algorithms is much more fine-grained and can therefore be used to tackle more complex tasks.
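To make the distinction concrete, here is a minimal NumPy sketch (with made-up values) that derives a detection-style bounding box from a semantic map. The box necessarily covers background pixels that the per-pixel mask correctly excludes:

    import numpy as np

    # Toy semantic map: each pixel holds a class id (0 = background, 1 = person).
    seg_map = np.zeros((6, 8), dtype=np.int64)
    seg_map[1:5, 2:4] = 1   # torso and legs
    seg_map[2, 5] = 1       # an outstretched hand

    # A detector summarizes the person with a single rectangle...
    ys, xs = np.nonzero(seg_map == 1)
    box = (xs.min(), ys.min(), xs.max(), ys.max())  # (x0, y0, x1, y1)

    # ...but the rectangle also contains pixels the mask correctly excludes.
    box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    mask_area = int((seg_map == 1).sum())
    print(f"box covers {box_area} px, mask covers {mask_area} px")  # 16 vs 9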

What is Semantic Segmentation Used for? 

The fine-grained localization obtained with state-of-the-art segmentation algorithms benefits a wide variety of domains. In autonomous vehicles, it precisely identifies lanes, obstacles, and traffic signs at the pixel level, enhancing safety and decision-making. In medical imaging, it meticulously classifies pixels to delineate organs and abnormalities, improving diagnostic accuracy and treatment planning. Furthermore, many computer vision methods can benefit from leveraging fine-grained spatial information. For example, tracking algorithms usually need to describe each object in the most accurate and informative way possible, as they may have to distinguish between many similar objects. Including even a small number of pixels belonging to the wrong object (e.g., the background) in that description can degrade the overall tracking performance.

What techniques were used for the video? 

To compute the semantic segmentation masks for this video, we employed the OneFormer model because of its remarkable precision. This precision is primarily the result of careful design choices built on top of an already powerful transformer encoder-decoder backbone.

Firstly, OneFormer uses a task token (Qtask) to initialize the queries that enter the transformer backbone. This conditioning ensures that the model’s focus is inherently tuned towards semantic segmentation. Additionally, a learnable text context (Qctx) enables the model to adaptively adjust its feature extraction and processing, which is essential for interpreting complex images. For instance, in a cityscape, Qctx can help the model understand the interplay between urban elements like buildings, roads, and parks, providing a cohesive understanding of the urban environment.

The dual aspect of Qtask and Qctx initialization is instrumental in enhancing the model’s ability to differentiate between various semantic classes with a higher degree of precision.
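For readers who want to reproduce this kind of result, the sketch below shows one way to run OneFormer for semantic segmentation with the Hugging Face transformers library; note how the task_inputs argument supplies the task conditioning described above. The checkpoint name and the input file are assumptions for illustration, not the exact setup used for the video:

    import torch
    from PIL import Image
    from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

    checkpoint = "shi-labs/oneformer_ade20k_swin_tiny"  # assumed checkpoint
    processor = OneFormerProcessor.from_pretrained(checkpoint)
    model = OneFormerForUniversalSegmentation.from_pretrained(checkpoint).eval()

    image = Image.open("verona_street.jpg").convert("RGB")  # hypothetical frame
    inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # One class id per pixel, resized back to the original resolution.
    semantic_map = processor.post_process_semantic_segmentation(
        outputs, target_sizes=[image.size[::-1]]
    )[0]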


Making AI Explainable: HOLMES

Topic

At EVS, we actively support and incentivize academic research to continuously push the boundaries of artificial intelligence, especially in computer vision. We recently supported our employee Francesco Dibitonto in his work on HOLMES (HOLonym-MEronym based Semantic Inspection), a new explainable AI technique presented at the xAI World Conference in Lisbon.

What is Explainable AI?

While deep learning has enabled incredible breakthroughs in computer vision, the knowledge acquired by models during training is fully sub-symbolic and difficult to interpret. Explainable AI (XAI) aims to shed light on these black box models by providing explanations for their predictions. By understanding why models make certain decisions, we can identify potential biases, debug errors, and increase user trust in AI systems. Explainability is especially crucial for safety-critical applications like self-driving cars.

XAI Taxonomy

Explainable AI (XAI) can be categorized along several dimensions:
  • Transparency – The degree to which an AI system’s internal workings are observable and understandable. A transparent system allows users to inspect its algorithms, data, and models.
  • Interpretability – The extent to which an AI system’s logic and predictions can be understood by humans. An interpretable system provides insights into why certain outputs were produced.
  • Explainability – The ability of an AI system to provide explanations for its functioning in human-understandable terms. Explainable systems can clarify their reasoning and justify their decisions.

These concepts are interconnected and contribute to overall model explainability. Transparency enables interpretability, which facilitates explainability. But tradeoffs exist between performance and explainability, so finding the right balance is key.

Previous XAI Approaches

Numerous techniques have emerged for explaining AI model predictions. A widely used one is Grad-CAM, which visualizes the importance of different input regions for a model’s output using gradient information. For image classification, it generates heatmap overlays showing which areas most informed the predicted class label. This provides local explanations for individual samples.
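As a rough illustration of the idea (a minimal sketch, not any particular library’s implementation), the following PyTorch snippet computes a Grad-CAM heatmap for a ResNet-50; random noise stands in for a preprocessed input image:

    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet50, ResNet50_Weights

    model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
    feats, grads = {}, {}
    # Hook the last convolutional stage to capture activations and gradients.
    model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))
    model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

    x = torch.randn(1, 3, 224, 224)        # stand-in for a preprocessed image
    scores = model(x)
    scores[0, scores.argmax()].backward()  # gradient of the top class score

    # Weight each feature map by its average gradient, sum, and rectify.
    w = grads["a"].mean(dim=(2, 3), keepdim=True)        # (1, C, 1, 1)
    cam = F.relu((w * feats["a"]).sum(dim=1))            # (1, h, w)
    cam = F.interpolate(cam[None], size=x.shape[2:], mode="bilinear")[0, 0]
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # heatmap in [0, 1]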

Other popular methods include producing counterfactual examples showing how inputs can be modified to change outputs, training inherently interpretable models like decision trees, and learning proxy models to mimic complex black boxes. Each approach has pros and cons regarding model fidelity, generalizability, and human understandability.

Generating Explanations with HOLMES

HOLMES leverages meronym (part) detectors to pinpoint fine-grained salient regions and quantify their individual influence on the holonym (whole-object) model’s prediction. For example, to explain a jaguar classification, HOLMES would detect the head, legs, tail, fur, and so on; it would highlight the fur region in a heatmap and show that occluding it significantly lowers the model’s jaguar confidence score. The full paper dives deeper into these techniques for prying open the AI black box.
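The full HOLMES pipeline trains dedicated meronym detectors, but its core occlusion test can be sketched in a few lines. The helper below is a hypothetical illustration, not code from the paper: it measures how much a classifier’s confidence in a class drops when a given part region is masked out:

    import torch

    def occlusion_drop(model, image, part_mask, class_idx, fill=0.0):
        """Confidence drop for class_idx when a part region is occluded.

        image:     (3, H, W) preprocessed tensor
        part_mask: (H, W) boolean tensor, True on the part (e.g., the fur)
        """
        model.eval()
        with torch.no_grad():
            base = torch.softmax(model(image[None]), dim=1)[0, class_idx]
            occluded = image.clone()
            occluded[:, part_mask] = fill  # blank out the part
            occ = torch.softmax(model(occluded[None]), dim=1)[0, class_idx]
        return (base - occ).item()  # a large drop means the part mattered

A large positive return value indicates that the occluded part contributed substantially to the prediction.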

The Way Forward

HOLMES demonstrates the feasibility of generating multi-level explanations without relying on full-part annotations. As AI proliferates, ensuring model transparency and fairness is crucial. At EVS, we are committed to advancing explainable computer vision through pioneering research. HOLMES represents another step toward interpretable systems that users can trust.

We welcome collaborations as we continue demystifying deep learning.

Check out the full HOLMES paper here to learn more!

Computer Science Ph.D. Opportunity

Topic

Unlock Your Potential: Ph.D. Opportunity in Computer Science Research with EVS Embedded Vision Systems and the prestigious University of Verona

Are you passionate about computer science and eager to make a significant impact in the field? In collaboration with the University of Verona, EVS is thrilled to sponsor a comprehensive Ph.D. program in Computer Science that offers a unique blend of education and research experience. As a Ph.D. student, you will gain profound knowledge and advanced research skills to excel in both pure and applied research in industry.

Program Overview

Our three-year Ph.D. program (2023-2026) provides a remarkable opportunity for aspiring researchers in computer science. Through this program, you will gain profound knowledge and advanced skills in both pure and applied research. Explore cutting-edge research methodologies and delve into the fascinating field of personal psychophysical monitoring on IoT devices, utilizing multimodal data analysis and AI.

Immersive Work Environment

Experience the benefits of our outstanding work environment at EVS Embedded Vision Systems. As a leading company revolutionizing industries through embedded vision systems, we foster collaboration and innovation. By joining our team, you will have the opportunity to work alongside experts and contribute to groundbreaking projects that shape the future of computer science.

Program Structure and Supervision

Our program features a well-rounded structure that includes 18 months of hands-on work within the company, providing invaluable practical experience. Additionally, you will have the opportunity to broaden your perspectives with 6 months abroad. Throughout your research journey, you will receive mentorship and scientific supervision from Professor Murino, a renowned expert in the field.

Deadline for the Ph.D.

The deadline for applications is the 7th of July 2023.
Start your journey as a researcher in computer science today!


Pose Estimation in Sports

Topic

How Human Pose Estimation is Revolutionizing Sports Performance Analysis

In recent years, computer vision and machine learning techniques have revolutionized the way we analyze sports performance. One of the most exciting applications of these technologies is human pose estimation, a field that has seen tremendous growth and development. At Embedded Vision Systems, we specialize in developing state-of-the-art computer vision algorithms for a variety of applications, including human pose estimation. Our 2D human pose estimation algorithm is among the best in the industry, and we are proud to offer it to our clients in the sports sector.

What is Human Pose Estimation?

Human pose estimation uses computer vision algorithms to analyze and track the movement of people in images or video footage. This technology has numerous applications in the sports industry, where it can be used to analyze and improve athletes’ movements in real time. By applying deep learning algorithms to track those movements, coaches and trainers gain valuable insights into athletes’ technique and performance, helping them optimize training and improve overall results.

The Benefits of Human Pose Estimation for Sports Performance Analysis

Human pose estimation, powered by deep learning and computer vision techniques such as our advanced 2D human pose estimation algorithm, offers a range of significant benefits for sports performance analysis. By accurately tracking and analyzing athletes’ movements in real time, coaches and trainers gain access to invaluable insights that can enhance overall performance and training outcomes.

First and foremost, human pose estimation enables coaches to precisely identify areas where athletes can improve their technique. By tracking the movements of individual body parts, such as limbs, joints, and the spine, coaches can pinpoint subtle deviations from optimal form. These insights allow them to provide targeted feedback and guidance, helping athletes make precise adjustments and refine their technique. By optimizing movement mechanics, athletes can enhance efficiency, minimize wasted energy, and ultimately achieve better performance outcomes.

Furthermore, human pose estimation facilitates real-time performance monitoring. Coaches and trainers can receive immediate visual feedback on athletes’ movements, enabling them to assess technique and make instant adjustments during training sessions. This real-time feedback loop allows for quick identification and correction of movement errors or compensations, ultimately leading to faster skill acquisition and improved performance outcomes.

Another significant benefit of human pose estimation lies in its ability to track an athlete’s progress over time. By collecting and analyzing data on an individual’s movements across various training sessions or competitions, coaches and trainers can identify patterns and trends that indicate progress or areas that require further attention. This historical perspective provides a comprehensive view of an athlete’s development, enabling targeted training programs and individualized coaching approaches to be devised.

Applications of Human Pose Estimation in Sports Performance Analysis

Beyond its direct impact on sports performance analysis, human pose estimation finds application in several other areas within the sports industry. One such area is injury prevention and rehabilitation. By accurately tracking an athlete’s movements and identifying potential biomechanical risks, human pose estimation enables the development of targeted injury prevention strategies. Coaches and trainers can proactively address movement patterns that may predispose athletes to injuries, leading to reduced injury rates and improved athlete well-being.

Human pose estimation also plays a crucial role in biomechanical analysis. By precisely tracking and measuring an athlete’s joint angles, body posture, and movement trajectories, this technology allows for detailed biomechanical analysis. Coaches, sports scientists, and researchers can gain insights into the kinetics and kinematics of specific movements, providing a deeper understanding of how the body functions during athletic performance. This information can inform training methodologies, equipment design, and performance optimization strategies.
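As a concrete example of the quantities involved, the sketch below computes a 2D joint angle from three keypoints of the kind a pose estimator outputs; the coordinates here are made up for illustration:

    import numpy as np

    def joint_angle(a, b, c):
        """Angle at joint b (in degrees) formed by 2D keypoints a-b-c."""
        ba = np.asarray(a, float) - np.asarray(b, float)
        bc = np.asarray(c, float) - np.asarray(b, float)
        cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-8)
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    # Hypothetical (x, y) keypoints: shoulder, elbow, wrist.
    shoulder, elbow, wrist = (120, 80), (150, 140), (200, 150)
    print(f"elbow flexion: {joint_angle(shoulder, elbow, wrist):.1f} deg")

Tracking such angles frame by frame yields the joint trajectories used in kinematic analysis.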

Additionally, human pose estimation has the potential to enhance sports broadcasting. By integrating this technology into live broadcasts or post-event analysis, broadcasters can offer viewers enhanced insights and visualizations of athletes’ movements. Whether it’s illustrating key moments in a match or providing data-driven analysis of an athlete’s technique, human pose estimation adds a new dimension to sports coverage, enhancing viewer engagement and understanding.

The Future of Human Pose Estimation in Sports Performance Analysis

Looking ahead, the future of human pose estimation in sports performance analysis is promising. As the field continues to advance, we can anticipate exciting developments that will further enhance its applications in the sports industry.

One area of anticipated progress is the ability to analyze and track multiple athletes simultaneously. Advancements in machine learning and computer vision will enable systems to process and interpret data from multiple sources, allowing for comprehensive analysis and comparison of athletes’ movements. This will facilitate better understanding of individual performance within the context of team dynamics and enable coaches to optimize team strategies and tactics.

Moreover, we can expect to see more sophisticated applications of human pose estimation in virtual reality and robotics. By integrating human pose estimation algorithms with virtual reality training environments, athletes will have the opportunity to practice and refine their movements in realistic, simulated scenarios.

Furthermore, in the field of robotics, human pose estimation can contribute to the development of robotic trainers or assistive devices that can mimic and adapt to human movement, facilitating rehabilitation and training processes.

In conclusion, human pose estimation is revolutionizing sports performance analysis by providing coaches, trainers, and athletes with valuable insights into movement mechanics, real-time feedback, progress tracking, injury prevention, biomechanical analysis, and enhanced sports broadcasting. As this field continues to evolve, we can anticipate even greater advancements and novel applications that will reshape the way athletes train, perform, and excel in their respective sports.


Top Skills for Embedded Vision Engineers

Topic

Embedded vision allows machines to “see” and “interpret” the world around them, opening up endless possibilities for innovation and creativity in a wide range of applications.

What skills are necessary to thrive in this field?

First and foremost, a strong understanding of image processing and computer vision is essential. The former concerns performing operations on images to enhance or extract information from them. The latter studies techniques and algorithms to enable computers to recognize and understand images and scenes.

Secondly, you will also need to have a basic grasp of machine and deep learning concepts. The latter plays a crucial role in vision technologies today and is used, for example, to train algorithms enabling machines to recognize and classify objects.

Thirdly, as an engineer working with embedded vision, you will need a solid understanding of system architectures and an optimization-oriented mindset. You will focus on using the system’s hardware resources in the best possible way for your task (even programming at a low level if necessary), always seeking the most efficient path: the best compromise among accuracy, performance, and computational resources.

At EVS, we specialize in designing FPGA/ASIC solutions to accelerate computer vision and machine/deep learning algorithms in embedded applications. And if you’re interested in pursuing a career in embedded vision, we’ve got some tips to help you get started.

A few tips for expanding your embedded vision skills

As an embedded vision engineer, you will need multiple skills to operate at the intersection of artificial intelligence, computer vision, digital signal processing, and embedded systems.

A degree in Electronic Engineering, Computer Science, or a related field is certainly a good starting point, but that is just the beginning: continuous learning and hands-on experience are essential to refine your skills and stay up to date with the latest technologies. Here are some top skills you should focus on developing:

  • Object-Oriented programming using modern C++ and Python
  • Experience with embedded development/debugging and software optimization
  • Embedded Linux and Real-time Operating Systems (RTOS)
  • Device drivers, including Linux (user space/kernel space drivers)
  • Hardware description languages (VHDL, Verilog)
  • Solid understanding of Image Processing, Computer Vision, Machine and Deep Learning concepts
  • Understanding of design patterns and UML
  • Knowledge of computer and camera architectures
  • Software/Hardware partitioning. Hardware acceleration in FPGA, GPU, and DSP

The key driver is passion

Building these skills takes a lot of time and practice, but it is a rewarding path that can lead to significant professional achievements.

To expand your skillset, consider taking online courses, reading books, attending webinars, and, most importantly, seeking out hands-on experience in areas you’re passionate about. Do not underestimate the power of mentorship or internships.

The guidance of someone who has mastered the skill you’re trying to develop can be incredibly valuable and accelerate your understanding of the field.


EVS is now a Select Certified AMD Adaptive Computing Partner

Topic

We are proud to announce that we are now a Select Certified AMD Adaptive Computing Partner

At EVS, we are proud to announce that we are now a Select Certified AMD Adaptive Computing Partner. We will continue working closely with AMD and our other partners to expand our products’ capabilities and optimize customer performance even further.

AMD is a semiconductor company whose Adaptive Computing portfolio supplies programmable logic devices, particularly FPGA and SoC solutions, to the embedded computing market. EVS has been designing computer vision IP blocks and software modules for embedded architectures for more than 15 years, so we know what customers need when it comes to hardware/software integration, bringing a product from design through production.
Through the AMD ACP Program, AMD has created a global ecosystem of qualified companies to assist mutual customers in developing their products faster and with confidence on Targeted Design Platforms. FPGA IP providers, EDA vendors, embedded software providers, system integrators, and hardware suppliers are among the program’s members.
EVS has been an AMD Adaptive Computing Partner since 2009, and its membership has now been upgraded to the Select certified tier thanks to the expansion of an engineering team with recognized expertise in AMD technologies and the distinctive ability to implement them efficiently on low-power devices with limited computing resources, as used on board vehicles, robots, and in edge computing in general.

“With more than 15 years of experience with FPGAs on our side we are well-equipped to serve as a partner for AMD’s FPGAs and SoCs,” says Marco Monguzzi, CTO at EVS. “We are now able to offer our customers turnkey solutions for their board-level electronics projects.”
“AMD is a worldwide leader in programmable logic and SoCs,” added Roberto Marzotto, CEO of EVS. “We are excited about this recognition and look forward to continuing our long-standing relationship with AMD as well as helping our customers leverage their products for their unique needs.”

EVS creates FPGA IP blocks for computer vision and software modules for embedded architectures. These hardware and software modules are the result of a continuous improvement process aimed at achieving higher performance with fewer resources and at adding value to our customers’ embedded solutions by reducing time-to-market with customizable, simple-to-integrate components. EVS has worked tirelessly toward the goal of designing effective vision solutions, particularly on computation-limited devices, and has amassed decades of combined experience in both Computer Vision and Machine Learning, together with solid engineering methodologies. EVS is a high-tech company focused on the research and development of innovative solutions for automotive, machine vision, video surveillance, and medical imaging, and its expertise is recognized and valued by customers all over the world.


EVS believes in research: discover the Master Degree in Artificial Intelligence

Topic

EVS was founded in 2005 as a spin-off of the University of Verona: precisely because of this, it is also its partner in the Master Degree in Artificial Intelligence

EVS was founded in 2005 as a spin-off of the University of Verona: precisely because of this, it is also its partner in the Master Degree in Artificial Intelligence, a two-year course aimed at science or engineering students who want to contribute actively to the dissemination and application of AI, of which computer vision is a part.


EVS: Industry 4.0 meets machine vision and opens to Digital Transformation

Topic

Making AI and machine learning part of a virtuous and effective mechanism through the realization of innovative machine vision solutions: this is our mission

A story of research and expertise that brings together scientific and software engineering skills, as told by Il Sole 24 Ore and Publimedia Group’s Aziende24.

Computer vision is a booming field of AI that holds a pivotal place in Digital Transformation and Industry 4.0.
One of Italy’s leading players in this sector is certainly eVS embedded Vision Systems, founded in 2005 as the first spin-off of the University of Verona.
eVS is a high-tech company strongly oriented towards the research and development of innovative solutions, with the distinctive ability to implement them efficiently on low-power devices with limited computing resources, as used on board vehicles, robots, and in edge computing in general.

In particular, eVS specializes in the optimization of vision algorithms on embedded systems and in the design of FPGA/ASIC modules for the hardware acceleration of machine learning methods capable of recognizing objects, actions and behaviors.

It is mainly active in the automotive market with driver assistance systems, such as pedestrian detection and driver monitoring. It also operates in the nautical sector, aiding the maneuvering of boats, as well as in the aerospace, biomedical, and industrial sectors.
Today, the experience of eVS is recognized and appreciated by customers and partners all over the world as a unique mix of solid scientific foundations and software engineering skills in the field of artificial vision.

Read the original article