Our present research investigates new modalities of interaction with electronic systems to improve human machine interaction. We have developed new calibration, target prediction and multimodal fusion algorithms to improve the accuracy and response times of gaze controlled interfaces. We are investigating automotive and military aviation environments, aiming to improve human machine interaction for secondary mission control tasks. In parallel, we are working on detecting users' cognitive load by analysing their ocular parameters using low-cost, off-the-shelf eye gaze trackers. We have also worked with users with severe speech and motor impairment and developed various applications to help them engage with society. Various MDes projects developed mechatronics and cyber physical systems and investigated different application areas such as smart manufacturing, automotive and drone related systems.
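As an illustration of the ocular approach, the sketch below shows one simple way cognitive load can be indexed from pupil diameter recorded by a low-cost eye tracker, as a baseline-relative dilation measure. It is a minimal sketch with synthetic numbers, not our deployed analysis pipeline.

```python
# Minimal sketch (not the deployed pipeline): a simple cognitive load index
# from pupil diameter samples, as baseline-relative percentage dilation.
import numpy as np

def pupil_dilation_index(baseline_mm: np.ndarray, task_mm: np.ndarray) -> float:
    """Percentage change of mean pupil diameter during a task vs. a rest baseline."""
    baseline = np.nanmean(baseline_mm)   # rest-period pupil diameter (mm)
    task = np.nanmean(task_mm)           # task-period pupil diameter (mm)
    return 100.0 * (task - baseline) / baseline

# Hypothetical samples at 60 Hz: 10 s rest followed by 10 s task
rng = np.random.default_rng(0)
rest = rng.normal(3.2, 0.05, 600)    # ~3.2 mm at rest
task = rng.normal(3.5, 0.08, 600)    # dilated under load
print(f"Pupil dilation index: {pupil_dilation_index(rest, task):.1f}%")
```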
Human Space Flight missions often require interaction with touchscreen displays. This paper presents a study investigating human machine interaction with touchscreens using both finger and stylus on the International Space Station (ISS). The study also reports the cognitive state of astronauts, measured with a spatial 2-back test, and their mental well-being, measured through self-reported scales. We presented a series of results comparing pointing and selection performance among ISS crews, ground crews and university students, finger-based versus stylus-based touching in microgravity, and mental well-being scores. Based on an analysis of 420 pointing tasks performed on the ISS by 2 astronauts, we reported that finger-based pointing is statistically significantly faster than stylus-based pointing in microgravity. We did not find any significant difference in pointing performance or mental state between astronauts and students on the ground. Results from the study can be used to predict pointing and selection times from the dimensions and positions of GUI (Graphical User Interface) elements for spacecraft cockpits.
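The kind of model implied by the last sentence is a Fitts'-law style relation between target distance, target width and movement time. The sketch below is illustrative only; the coefficients are placeholders, not values fitted to the ISS data.

```python
# Illustrative sketch: a Fitts'-law style predictor of pointing time from
# GUI element distance and width. Coefficients a and b are placeholders.
import math

def predicted_pointing_time(distance_px: float, width_px: float,
                            a: float = 0.2, b: float = 0.15) -> float:
    """Movement time (s) = a + b * log2(distance/width + 1) (Shannon formulation)."""
    index_of_difficulty = math.log2(distance_px / width_px + 1)
    return a + b * index_of_difficulty

# Example: a 40 px wide button 600 px away from the current touch position
print(f"{predicted_pointing_time(600, 40):.2f} s")
```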
Our research on military aviation, in particular for fast jets, explored gaze controlled displays and estimation of cognitive load. We have a modular VR flight simulator in our lab, which is integrated with a multimodal gaze controlled HUD and HMDS. We invented new target prediction technologies for eye gaze tracking systems to reduce pointing and selection times in the HUD and HMDS. The HUD can also be integrated with Brain Computer Interface and voice controlled systems. We have also integrated a gaze controlled system with a high-end flight simulator at the National Aerospace Laboratory and collected data in combat aircraft for cognitive load estimation of pilots.
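As a rough illustration of target prediction for pointing (not the invented HUD algorithm itself), the sketch below extrapolates recent pointer or gaze samples with a polynomial fit and snaps the predicted endpoint to the nearest selectable element.

```python
# Minimal sketch of trajectory-based target prediction: fit x(t) and y(t),
# extrapolate a short lookahead, and pick the nearest target centre.
import numpy as np

def predict_target(samples_xy: np.ndarray, targets_xy: np.ndarray,
                   lookahead: float = 0.3, dt: float = 1 / 60) -> int:
    """Return the index of the most likely target given recent (x, y) samples."""
    t = np.arange(len(samples_xy)) * dt
    fx = np.polyfit(t, samples_xy[:, 0], 2)          # 2nd-order fit of x(t)
    fy = np.polyfit(t, samples_xy[:, 1], 2)          # 2nd-order fit of y(t)
    t_future = t[-1] + lookahead
    predicted = np.array([np.polyval(fx, t_future), np.polyval(fy, t_future)])
    return int(np.argmin(np.linalg.norm(targets_xy - predicted, axis=1)))

# Hypothetical pointer moving right towards one of three on-screen targets
samples = np.column_stack([np.linspace(100, 300, 12), np.full(12, 200.0)])
targets = np.array([[500.0, 200.0], [300.0, 600.0], [100.0, 50.0]])
print("Predicted target index:", predict_target(samples, targets))
```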
With funding from Microsoft Research and a DST SERB early career fellowship, we are now developing gaze controlled assistive systems for students with severe speech and motor impairment. Our initial study found a significant difference in visual search patterns for operating a graphical user interface between users with cerebral palsy and their able-bodied counterparts. Presently we are developing and evaluating gaze controlled intelligent user interfaces for augmentative and alternative communication aids and edutainment systems. Recent research is also developing multilingual virtual keyboards for Indian languages.
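A common building block of such gaze controlled interfaces is dwell-time selection, where a key is activated once gaze rests on it for a set duration. The following is a simplified sketch of that idea, not our deployed AAC interface.

```python
# Simplified dwell-time selection: trigger a key after continuous gaze
# within its bounds for dwell_s seconds, given gaze samples at 1/dt Hz.
from dataclasses import dataclass

@dataclass
class Key:
    label: str
    x: float
    y: float
    w: float
    h: float

def dwell_select(gaze_stream, keys, dwell_s=1.0, dt=1 / 30):
    """Yield a key label whenever gaze dwells on it for dwell_s seconds."""
    current, elapsed = None, 0.0
    for gx, gy in gaze_stream:
        hit = next((k for k in keys
                    if k.x <= gx <= k.x + k.w and k.y <= gy <= k.y + k.h), None)
        if hit is current and hit is not None:
            elapsed += dt
            if elapsed >= dwell_s:
                yield hit.label
                elapsed = 0.0
        else:
            current, elapsed = hit, 0.0

keys = [Key("YES", 0, 0, 100, 100), Key("NO", 120, 0, 100, 100)]
gaze = [(50, 50)] * 40                      # gaze held on "YES" for ~1.3 s at 30 Hz
print(list(dwell_select(gaze, keys)))       # -> ['YES']
```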
For various projects, we are developing interactive AR and VR applications. We drew on research in AI and multimodal interaction to integrate non-traditional modalities like eye gaze and gesture recognition, and developed various products for automotive, manufacturing and aerospace clients. Several of these systems also proved useful for people with mobility impairment. Users with severe speech and motor impairment could undertake representative pick-and-drop tasks using our eye gaze controlled robotic arm. Our recent work compared different computer graphics and machine learning algorithms and integrated a webcam based eye gaze tracker with a robotic manipulator through a video see-through display.
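One way such an integration can work, assuming a roughly planar workspace, is to map 2D gaze coordinates from the webcam based tracker to robot table coordinates through a homography estimated from a few calibration points. The sketch below illustrates this assumption; it is not the exact calibration used in our system.

```python
# Sketch: map screen-space gaze (px) to robot table coordinates (mm) using a
# homography estimated from four calibration correspondences (DLT method).
import numpy as np

def fit_homography(src, dst):
    """Estimate a 3x3 homography H with dst ~ H @ src from >= 4 point pairs."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    return vt[-1].reshape(3, 3)              # null-space vector, up to scale

def gaze_to_robot(H, gaze_xy):
    p = H @ np.array([gaze_xy[0], gaze_xy[1], 1.0])
    return p[:2] / p[2]

# Hypothetical calibration: screen corners (px) -> robot table corners (mm)
screen = [(0, 0), (1920, 0), (1920, 1080), (0, 1080)]
table = [(0, 0), (400, 0), (400, 300), (0, 300)]
H = fit_homography(screen, table)
print(gaze_to_robot(H, (960, 540)))          # roughly the table centre (200, 150)
```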
With seed funding from Bosch, we initiated research on smart manufacturing. Presently we are developing IoT modules for environment and posture tracking. We have developed a visualization module that displays both spatial and temporal data: by holding a semi-transparent screen in front of an electronic screen and a VR sensor dashboard, a user can augment spatial information with temporal trends, or vice versa. The system also supports interaction through an interactive laser pointer and an AR based Android application, and is integrated with an early warning system that automatically sends an email through Gmail if it senses any deviation in sensor readings in the context of environment tracking. Presently we are working with British Telecom to develop a digital twin of their office space with interactive sensor dashboards. The research was a finalist (top 5) for the UK AI for Innovation Award 2024.
Object detection in the wild is a well-known problem in the domain of computer vision (CV). From a computer vision problem, it turns into an intelligent user interface (IUI) problem for augmented and mixed reality interfaces. The International Organization for Standardization (ISO/IEC 5927:2024) defines a mixed reality system as a system that uses a mixture of representations of physical world data and virtual world data as its presentation medium. The accuracy and latency of detecting and registering real life objects dictate the accuracy, reliability and, ultimately, the usability of mixed reality applications. Our research addresses the important problem of developing mixed reality applications for domains where the object detection problem is too complex for classical image processing techniques and hard for standard deep learning-based computer vision models due to scarcity of datasets. In particular, we highlighted two applications in the context of Industry 4.0 and aircraft maintenance, where standard labelled data were not available and the exact deployment scenario could not be assumed a priori, requiring a robust object detection model. While there is a plethora of work in the computer vision domain on developing in-the-wild object detection models for autonomous vehicles and human body parts, we took a different approach: training a standard computer vision model with synthetic data and testing MR applications in diverse real life situations. The novelty of the work lies in:
(1) Developing a diffusion model-based image mixing pipeline for generating a customized dataset
(2) Training a standard computer vision model with synthetic data and reporting object detection accuracy through MR applications
The proposed technique uses segmentation methods to automatically annotate relevant areas of interest in a set of images, then mixes the images using the proposed pipeline to generate realistic synthetic data. Results indicate that the use of the proposed model led to a substantial improvement in detection performance.
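A simplified flavour of the synthetic data step (copy-paste composition with automatically generated labels, rather than the full diffusion-based mixing pipeline) is sketched below; the file names are placeholders.

```python
# Sketch: paste a segmented RGBA object cutout onto a background image at a
# random position and write a YOLO-style bounding box label for it.
# Assumes the cutout is smaller than the background.
import random
from PIL import Image

def composite_with_label(background_path, cutout_path, out_image, out_label):
    bg = Image.open(background_path).convert("RGB")
    fg = Image.open(cutout_path).convert("RGBA")
    max_x, max_y = bg.width - fg.width, bg.height - fg.height
    x, y = random.randint(0, max_x), random.randint(0, max_y)
    bg.paste(fg, (x, y), fg)                  # alpha channel gives clean edges
    bg.save(out_image)
    # YOLO label format: class x_center y_center width height (all normalised)
    xc, yc = (x + fg.width / 2) / bg.width, (y + fg.height / 2) / bg.height
    w, h = fg.width / bg.width, fg.height / bg.height
    with open(out_label, "w") as f:
        f.write(f"0 {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n")

# composite_with_label("factory_floor.jpg", "valve_cutout.png",
#                      "synthetic_0001.jpg", "synthetic_0001.txt")
```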
The research on object detection was further explored for autonomous navigation of ground moving vehicles and manned and unmanned aerial platforms. We worked on both automated ground vehicles and drones. We aim to integrate information about on-board passengers into the existing situational awareness process to enhance the safety and comfort of autonomous vehicles. Our MDes students developed a modular drone payload with multiple onboard cameras, sensors and transmitters that can send information to a base station without requiring any electronic integration with the drone. PhD students explored human drone interaction, in particular the use of eye gaze and voice modalities for undertaking secondary tasks while flying drones. We are also comparing and developing CNN models for detecting irregular traffic participants in Indian road conditions. This work later led to a new object detection model, I-RoD, and an IEEE Transactions paper on a novel lane detection algorithm. The I-RoD model was evaluated on the Indian Driving Dataset (IDD), featuring unstructured road environments. Comparative pixel-wise analysis showed that the proposed model outperformed four other state-of-the-art segmentation models by 12.91% on IDD. The proposed lane detection model achieved 95.19% accuracy on the TuSimple dataset and an IoU score of 0.31 when tested on 6149 labeled images from IDD. The lane detection algorithm also found use for automatic taxiing of an aircraft.
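For reference, the pixel-wise IoU metric reported above can be computed as follows for binary segmentation masks; this is a generic illustration, not the exact evaluation script.

```python
# Pixel-wise Intersection-over-Union between two binary masks of equal shape.
import numpy as np

def pixel_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / union if union else 1.0

# Toy example: two partially overlapping lane masks
pred = np.zeros((4, 8), dtype=np.uint8); pred[:, 2:6] = 1
gt = np.zeros((4, 8), dtype=np.uint8); gt[:, 3:7] = 1
print(f"IoU = {pixel_iou(pred, gt):.2f}")    # 12/20 overlap -> 0.60
```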
The recent pandemic and the associated hybrid work culture have reignited the importance of extended reality (XR) technologies for remote collaboration across the world. As different ranges of automation are introduced in industry with an increasing focus on digitalization, safety and productivity, it is important to understand the context and physiological metrics of existing human operators. This paper describes the implementation of virtual reality (VR) and mixed reality (MR) interfaces for a welding process and compares operators' performance between VR and MR. Initially, MR and VR were compared with respect to a 3D equivalent of the ISO pointing task, followed by a welding task involving trajectory definition and actual robotic arm movement. A wide range of parameters involving ocular data, EEG, hand movement, subjective opinion and quantitative measures were recorded and analyzed. The analysis of physiological parameters such as the EEG based Task Load Index and Task Engagement Index, and ocular parameters such as fixation rate and average fixation duration, indicated that interaction in the synthetic VR environment involved higher engagement, lower mental processing load and distinct visual processing mechanisms in the visual cortex compared to the MR interaction. Similar comparison trends observed in these parameters across both tasks confirmed the reproducibility of the experiment methodology and results. Results from the study can be used to select the rendering medium for other immersive applications in domains such as manufacturing, robotics, healthcare, and education. The results were used to develop a multi-modal VR interface with a novel collision-based weld path definition method suitable for industrial deployment.
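The EEG indices mentioned above are commonly defined as band-power ratios (Task Load Index as theta/alpha, Task Engagement Index as beta/(alpha+theta)). The sketch below shows one such computation using Welch's method on a synthetic segment; it is not the study's exact analysis pipeline.

```python
# Sketch of EEG band-ratio indices from a single-channel segment.
import numpy as np
from scipy.signal import welch

def band_power(signal, fs, low, high):
    """Approximate band power as the sum of Welch PSD bins in [low, high) Hz."""
    freqs, psd = welch(signal, fs=fs, nperseg=fs * 2)
    band = (freqs >= low) & (freqs < high)
    return psd[band].sum()

def eeg_indices(signal, fs=256):
    theta = band_power(signal, fs, 4, 8)
    alpha = band_power(signal, fs, 8, 13)
    beta = band_power(signal, fs, 13, 30)
    return {"task_load_index": theta / alpha,
            "task_engagement_index": beta / (alpha + theta)}

# Hypothetical 10 s single-channel EEG segment sampled at 256 Hz
rng = np.random.default_rng(1)
segment = rng.normal(0, 1, 256 * 10)
print(eeg_indices(segment))
```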
Target prediction or intent recognition has been studied for many applications ranging from aviation and social computing to human machine interaction. In its simplest form, target prediction may be linear extrapolation; in more complex forms it becomes intent recognition or behaviour forecasting in the domain of social computing. Our present research investigates multimodal target prediction for human robot handover requests. Following Newell's time scale of human actions, the prediction works at the cognitive band, in the range of one-tenth of a second to tens of seconds. This talk will explore different technologies and applications for human movement prediction, ranging from finger movement to walking trajectories. The talk will start with an application of artificial neural network-based target prediction for different input modalities to reduce pointing time in a GUI (Graphical User Interface). Next, it will move on to using and comparing different Imitation Learning based algorithms like Inverse Reinforcement Learning and Behavior Cloning to predict the target for a human robot handover task, combining hand trajectory with eye gaze fixation. Finally, a particle filter based approach will be discussed for walking trajectory prediction in a VR digital twin of an office space.
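As a compact illustration of the particle filter approach mentioned for walking trajectory prediction, the sketch below uses a constant-velocity motion model with Gaussian observation noise; the parameters are illustrative, not those of the digital twin system.

```python
# Sketch of a particle filter for 2D walking-trajectory prediction.
import numpy as np

rng = np.random.default_rng(42)
N = 500                                      # number of particles
particles = np.zeros((N, 4))                 # state: [x, y, vx, vy]
weights = np.full(N, 1.0 / N)

def step(particles, weights, observation, dt=0.1, q=0.05, r=0.2):
    # Predict: constant-velocity motion with additive process noise
    particles[:, :2] += particles[:, 2:] * dt
    particles += rng.normal(0, q, particles.shape)
    # Update: weight particles by likelihood of the observed (x, y) position
    d = np.linalg.norm(particles[:, :2] - observation, axis=1)
    weights = weights * np.exp(-0.5 * (d / r) ** 2) + 1e-300
    weights /= weights.sum()
    # Resample (plain multinomial resampling, kept simple for brevity)
    idx = rng.choice(N, size=N, p=weights)
    return particles[idx], np.full(N, 1.0 / N)

# Feed a straight-line walk and predict the position one step ahead
for t in range(30):
    obs = np.array([0.5 * t * 0.1, 0.2 * t * 0.1])   # person walking diagonally
    particles, weights = step(particles, weights, obs)
predicted_next = (particles[:, :2] + particles[:, 2:] * 0.1).mean(axis=0)
print("Predicted next position:", predicted_next)
```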
We have worked with a local start-up company to develop technology for helping cricket players and coaches. We validated a bespoke wireless IMU sensor developed by StanceBeam and attached to a cricket bat, using an Optitrack Flex 13 camera system. We compared both the orientation and the speed of a cricket bat while executing shots, as measured by the StanceBeam IMU sensor and the Optitrack system. Later we tried to simultaneously track the eye gaze of a batsman and integrate the bat and eye gaze measurements together.
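The validation idea can be illustrated as follows, assuming time-aligned samples: derive bat speed from Optitrack marker positions and compare it with the IMU-reported speed using correlation and RMSE. The data below are synthetic.

```python
# Sketch: compare IMU-reported bat speed against speed derived from
# Optitrack marker positions (Pearson correlation and RMSE).
import numpy as np

def optitrack_speed(positions_m: np.ndarray, fs: float) -> np.ndarray:
    """Speed (m/s) from consecutive 3D marker positions sampled at fs Hz."""
    return np.linalg.norm(np.diff(positions_m, axis=0), axis=1) * fs

def agreement(imu_speed: np.ndarray, opti_speed: np.ndarray):
    r = np.corrcoef(imu_speed, opti_speed)[0, 1]
    rmse = np.sqrt(np.mean((imu_speed - opti_speed) ** 2))
    return r, rmse

fs = 120.0                                   # Optitrack Flex 13 frame rate
t = np.linspace(0, 1, int(fs) + 1)
positions = np.column_stack([np.sin(4 * t), np.zeros_like(t), np.zeros_like(t)])
opti = optitrack_speed(positions, fs)
imu = opti + np.random.default_rng(3).normal(0, 0.05, opti.shape)  # noisy IMU estimate
print("Pearson r = %.3f, RMSE = %.3f m/s" % agreement(imu, opti))
```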
In response to the Covid-19 pandemic, we developed a website that automatically divides the duration of the spread of the disease into stages based on the rate of increase in new cases, and shows a set of three graphs which are easier to interpret and extrapolate than a single exponential graph. The shapes of the graphs (such as linear, parabolic or exponential) can be compared across different stages and countries with respect to the average number of new cases and deaths. The system also generates a set of comparative graphs to compare different countries and states. The video on the right hand side shows an application of the website with automatic speech recognition and text to speech systems.
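A simplified version of the stage-splitting idea (the website's exact rules are not reproduced here) is to start a new stage whenever the average growth rate over a window changes beyond a tolerance, so each stage can be plotted and extrapolated separately.

```python
# Sketch: split a daily new-case series into stages by changes in mean growth rate.
import numpy as np

def split_into_stages(new_cases, window=7, tol=0.05):
    """Return indices where a new stage starts, based on change in mean growth rate."""
    cases = np.asarray(new_cases, dtype=float)
    growth = np.diff(cases) / np.maximum(cases[:-1], 1.0)   # daily relative increase
    stage_starts = [0]
    prev_rate = growth[:window].mean()
    for i in range(window, len(growth), window):
        rate = growth[i:i + window].mean()
        if abs(rate - prev_rate) > tol:
            stage_starts.append(i + 1)
            prev_rate = rate
    return stage_starts

# Toy series: slow growth, then rapid exponential growth, then a plateau
series = [10] * 14 + [int(10 * 1.2 ** k) for k in range(14)] + [120] * 14
print(split_into_stages(series))
```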