PR: Peer-reviewed Journal, Conference and Workshop Papers, TR: Technical Reports
@inproceedings{schowitz2024response, author={Schowitz, Philip and Sinha, Soham and Gujarati, Arpan}, booktitle={2024 IEEE 45th Real-Time Systems Symposium (RTSS)}, title={{Response-Time Analysis of a Soft Real-time NVIDIA Holoscan Application}}, year={2024}}
NVIDIA Holoscan SDK is a novel edge and embedded software development framework
designed for NVIDIA System-on-Chips (SoCs), primarily targeting medical device
applications. This SDK facilitates complex data processing workflows using Directed
Acyclic Graphs (DAGs) composed of functional units termed operators. These operators,
running in separate threads, are usually interconnected with intricate execution
dependencies influenced by both upstream and downstream conditions on communication
data buffers. Current methods to measure the response time of a complex Holoscan
application rely on empirical benchmarking, which can be costly, time-consuming, and
unreliable. These limitations are particularly critical in sectors where safety and
certification concerns are paramount.
This paper introduces a novel static analysis methodology to determine worst-case end-to-end response times in
NVIDIA Holoscan applications. Our approach overcomes the drawbacks of existing empirical tools by providing a
response-time analysis capable of handling complex operator interactions and communication buffering mechanisms
inherent in Holoscan's architecture. Through rigorous theoretical analysis and empirical validation, our method not
only ensures predictability in system behavior but also aids developers in identifying performance bottlenecks and
optimizing system design. Evaluation using real-world NVIDIA HoloHub applications demonstrates the efficiency and
accuracy of our analysis, achieving theoretical response times as close as 0.3% to the empirically measured values on
NVIDIA hardware, with less than 1 ms of computation time.
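For intuition only, a minimal sketch of the flavor of such an analysis (not the paper's method, which additionally models operator scheduling and communication-buffer conditions): it bounds end-to-end latency by the longest source-to-sink path of worst-case execution times through the operator DAG. The operator names and WCET values below are hypothetical.

```python
# Minimal illustrative sketch (not the paper's analysis): bound end-to-end
# latency of an operator DAG by its longest source-to-sink path of WCETs,
# ignoring scheduling interference and communication-buffer back-pressure.
from collections import defaultdict

def longest_path_bound(wcet, edges):
    """wcet: {operator: WCET in ms}; edges: list of (upstream, downstream)."""
    succ, indeg = defaultdict(list), defaultdict(int)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    # Kahn's algorithm for a topological order of the operators.
    order = []
    ready = [op for op in wcet if indeg[op] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    # Longest accumulated WCET along any path ending at each operator.
    dist = {op: wcet[op] for op in wcet}
    for u in order:
        for v in succ[u]:
            dist[v] = max(dist[v], dist[u] + wcet[v])
    return max(dist.values())

# Hypothetical three-operator pipeline: source -> inference -> visualization.
print(longest_path_bound({"source": 2.0, "infer": 8.5, "viz": 4.0},
                         [("source", "infer"), ("infer", "viz")]))  # 14.5 ms
```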
coming soon
@inproceedings{sinha2024towards, author={Sinha, Soham and Dwivedi, Shekhar and Azizian, Mahdi}, booktitle={2024 ACM/IEEE 15th International Conference on Cyber-Physical Systems (ICCPS)}, title={{Towards Deterministic End-to-end Latency for Medical AI Systems in NVIDIA Holoscan}}, year={2024}, pages={235-246}, doi={10.1109/ICCPS61052.2024.00028}}
The introduction of AI and ML technologies into medical devices has revolutionized healthcare diagnostics and treatments. Medical device manufacturers are keen to maximize the advantages afforded by AI and ML by consolidating multiple applications onto a single platform. However, concurrent execution of several AI applications, each with its own visualization components, leads to unpredictable end-to-end latency, primarily due to GPU resource contentions. To mitigate this, manufacturers typically deploy separate workstations for distinct AI applications, thereby increasing financial, energy, and maintenance costs. This paper addresses these challenges within the context of NVIDIA's Holoscan platform, a real-time AI system for streaming sensor data and images. We propose a system design optimized for heterogeneous GPU workloads, encompassing both compute and graphics tasks. Our design leverages CUDA MPS for spatial partitioning of compute workloads and isolates compute and graphics processing onto separate GPUs. We demonstrate significant performance improvements across various end-to-end latency determinism metrics through empirical evaluation with real-world Holoscan medical device applications. For instance, the proposed design reduces maximum latency by 21-30% and improves latency distribution flatness by 17-25% for up to five concurrent endoscopy tool tracking AI applications, compared to a single-GPU baseline. Against a default multi-GPU setup, our optimizations decrease maximum latency by 35% for up to six concurrent applications by improving GPU utilization by 42%. This paper provides clear design insights for AI applications in the edge-computing domain including medical systems, where performance predictability of concurrent and heterogeneous GPU workloads is a critical requirement.
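As a rough sketch of the kind of deployment evaluated here (not the paper's exact configuration), the snippet below launches several compute clients under CUDA MPS with a per-client SM quota via the standard CUDA_MPS_ACTIVE_THREAD_PERCENTAGE variable, and keeps the visualization process on a second GPU via CUDA_VISIBLE_DEVICES. The application binaries and the equal-share quota are placeholders, and an MPS control daemon is assumed to already be running.

```python
# Illustrative sketch only: spatially partition compute clients under CUDA MPS
# and isolate graphics onto a separate GPU. Binary names are placeholders; the
# nvidia-cuda-mps-control daemon is assumed to already be running on GPU 0.
import os
import subprocess

N_APPS = 5
procs = []
for i in range(N_APPS):
    env = dict(os.environ,
               CUDA_VISIBLE_DEVICES="0",                              # compute GPU
               CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=str(100 // N_APPS))  # SM quota per client
    procs.append(subprocess.Popen(["./tool_tracking_app"], env=env))

viz = subprocess.Popen(["./visualization_frontend"],
                       env=dict(os.environ, CUDA_VISIBLE_DEVICES="1"))  # graphics GPU

for p in procs + [viz]:
    p.wait()
```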
https://ieeexplore.ieee.org/document/10571631/
coming soon
Pre-recorded presentation video:
@inproceedings{sinha2022modelmap, title={{ModelMap: A Model-based Multi-domain Application Framework for Centralized Automotive Systems}}, author={Sinha, Soham and Farrukh, Anam and West, Richard}, booktitle={41st IEEE/ACM International Conference on Computer-Aided Design (ICCAD)}, year={2022} }
This paper presents ModelMap, a model-based multi-domain application development framework for DriveOS, our in-house centralized vehicle management software system. DriveOS runs on multicore x86 machines and uses hardware virtualization to host isolated RTOS and Linux guest OS sandboxes. In this work, we design Simulink interfaces for model-based vehicle control function development across multiple sandboxed domains in DriveOS. ModelMap provides abstractions to: (1) automatically generate periodic tasks bound to threads in different OS domains, (2) establish cross-domain synchronous and asynchronous communication interfaces, and (3) handle USB-based CAN I/O in Simulink. We introduce the concept of a nested binary, for the deployment of ELF binary executable code in different sandboxed domains. We demonstrate ModelMap using a combination of synthetic benchmarks and experiments with Simulink models of a CAN Gateway and HVAC service running on an electric car. ModelMap eases the development of applications, which are shown to achieve industry-target performance using a multicore hardware platform in DriveOS.
https://doi.org/10.1145/3508352.3549463
Google Slides Presentation, PDF
Pre-recorded presentation video for virtual audience:
@article{sinha2022solution, type = {Solution}, ids = {sinha2022Solution}, title = {End-to-end Scheduling of Real-time Task Pipelines on Multiprocessors}, author = {Sinha, Soham and West, Richard}, year = {2022}, month = aug, journal = {Journal of Systems Research}, volume = {2}, number = {1}, doi = {10.5070/SR32158647}, url = {https://escholarship.org/uc/item/2h11n6xj}, urldate = {2022-08-29}, area = {Real-time and Cyber-physical Systems}, artifacts_url = {https://github.com/sohamm17/pipe_schedule}, langid = {english}, review_url = {https://openreview.net/forum?id=icP8jy6ayy8} }
Task pipelines are common in today’s embedded systems, as data moves from source to sink in sensing-processing-actuation task chains. A real-time task pipeline is constructed by connecting a series of periodic tasks with data buffers. In a time-critical system, end-to-end timing and data-transfer properties of a task pipeline must be guaranteed. A guarantee could be mathematically expressed by assigning constraints to the tasks of a pipeline. However, deriving task scheduling parameters to meet end-to-end guarantees is an NP-hard constraint optimization problem. Hence, a traditional constraint solver is not a suitable runtime solution.
In this paper, we present a heuristic constraint solver algorithm, CoPi, to derive the execution times and periods of pipelined tasks that meet the end-to-end constraints and schedulability requirements. We consider two upper-bound constraints on a task pipeline: end-to-end delay and loss-rate. After satisfying these constraints, CoPi schedules a pipeline as a set of asynchronous and data-independent periodic tasks, under the rate-monotonic scheduling algorithm. Simulations show that CoPi has a comparable pipeline acceptance ratio and significantly better runtime than open-source MINLP solvers. Furthermore, we use CoPi to map multiple task pipelines to a multiprocessor system. We demonstrate that a partitioned multiprocessor scheduling algorithm coupled with CoPi accommodates dynamically appearing pipelines, while attempting to minimize task migrations.
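For a flavor of the constraints involved (not CoPi itself), the sketch below applies the Liu and Layland rate-monotonic utilization test to a candidate pipeline and pairs it with the classic sum-of-two-periods latency bound for asynchronous, register-based task chains; the task parameters and the particular delay bound are stand-ins for the paper's constraint model.

```python
# Illustrative sketch, not CoPi: a sufficient rate-monotonic schedulability
# test plus a simple end-to-end delay bound for an asynchronous task chain.

def rm_schedulable(tasks):
    """tasks: list of (C_i, T_i); Liu & Layland utilization bound (sufficient test)."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    return utilization <= n * (2 ** (1.0 / n) - 1)

def end_to_end_delay_bound(tasks):
    """Classic 2*T_i-per-stage bound for asynchronous, register-based chains."""
    return sum(2 * t for _, t in tasks)

pipeline = [(1, 10), (2, 20), (4, 40)]    # (execution time, period) in ms
print(rm_schedulable(pipeline))           # True: U = 0.3 <= 3*(2^(1/3)-1) ~= 0.78
print(end_to_end_delay_bound(pipeline))   # 140 ms upper bound on pipeline latency
```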
https://escholarship.org/uc/item/2h11n6xj
Artifacts are officially evaluated and available at https://github.com/sohamm17/pipe_schedule
RTSS '21
@article{sinha2021towards, author = {Sinha, Soham and West, Richard}, title = {{Towards an Integrated Vehicle Management System in DriveOS}}, year = {2021}, issue_date = {October 2021}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, volume = {20}, number = {5s}, issn = {1539-9087}, url = {https://doi.org/10.1145/3477013}, doi = {10.1145/3477013}, journal = {ACM Transactions on Embedded Computing Systems}, month = {sep}, articleno = {82}, numpages = {24} }
Modern automotive systems feature dozens of electronic control units
(ECUs) for chassis, body and powertrain functions. These systems are
costly and inflexible to upgrade, requiring ever-increasing numbers of
ECUs to support new features such as advanced driver assistance systems
(ADAS), autonomous technologies, and infotainment. To counter these
challenges, we propose DriveOS, a safe, secure, extensible, and
timing-predictable system for modern vehicle management in
a centralized platform. DriveOS is based on a separation kernel,
where timing and safety-critical ECU functions are implemented in a
real-time OS (RTOS) alongside non-critical software in Linux or
Android. The system enforces the separation, or partitioning, of both
software and hardware among different OSes.
DriveOS runs on a relatively low-cost embedded PC-class platform,
supporting multiple cores and hardware virtualization
capabilities. Instrument cluster, in-vehicle infotainment and advanced
driver assistance system services are implemented in a Yocto Linux
guest, which communicates with critical real-time services via secure
shared memory. The RTOS manages a real-time controller area network
(CAN) interface that is inaccessible to Linux services except via
well-defined and legitimate communication channels. In this work, we
integrate three Qt-based services written for Yocto Linux, running in
parallel with a real-time longitudinal controller task and multiple
CAN bus concentrators, for vehicular sensor data processing and
actuation. We demonstrate the benefits and performance of DriveOS with a
hardware-in-the-loop CARLA simulation using a real car dataset.
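As a loose illustration of the style of cross-domain channel described above (not the DriveOS implementation, which operates at the separation-kernel level with hardware-enforced isolation), the sketch below builds a single-producer/single-consumer ring of fixed-size CAN-style records over a shared-memory segment. All names, sizes, and the simple indexing scheme are assumptions; a real channel would also need memory barriers and access control.

```python
# Illustrative sketch only: an SPSC ring buffer of CAN-style records in shared
# memory, loosely mirroring the secure shared-memory channels described above.
from multiprocessing import shared_memory
import struct

RECORD = struct.Struct("<IB8s")   # CAN ID, payload length, 8 payload bytes
IDX = struct.Struct("<I")         # one 32-bit ring index
SLOTS = 64
HEAD_OFF, TAIL_OFF, DATA_OFF = 0, IDX.size, 2 * IDX.size

class CanChannel:
    """Producer (RTOS side) calls send(); consumer (Linux side) calls recv()."""

    def __init__(self, name="drive_chan", create=False):
        size = DATA_OFF + SLOTS * RECORD.size
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=size)
        if create:
            IDX.pack_into(self.shm.buf, HEAD_OFF, 0)
            IDX.pack_into(self.shm.buf, TAIL_OFF, 0)

    def send(self, can_id, payload):
        assert len(payload) <= 8
        head, = IDX.unpack_from(self.shm.buf, HEAD_OFF)
        tail, = IDX.unpack_from(self.shm.buf, TAIL_OFF)
        if (head + 1) % SLOTS == tail:
            return False                                    # ring full
        off = DATA_OFF + head * RECORD.size
        RECORD.pack_into(self.shm.buf, off, can_id, len(payload), payload)
        IDX.pack_into(self.shm.buf, HEAD_OFF, (head + 1) % SLOTS)   # publish
        return True

    def recv(self):
        head, = IDX.unpack_from(self.shm.buf, HEAD_OFF)
        tail, = IDX.unpack_from(self.shm.buf, TAIL_OFF)
        if head == tail:
            return None                                     # ring empty
        off = DATA_OFF + tail * RECORD.size
        can_id, length, payload = RECORD.unpack_from(self.shm.buf, off)
        IDX.pack_into(self.shm.buf, TAIL_OFF, (tail + 1) % SLOTS)   # consume
        return can_id, payload[:length]
```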
https://dl.acm.org/doi/10.1145/3477013
Teaser Video: Google Drive Link
Live Presentation: Google Drive Link
SIGBED Blog on RTAS'22 Trip Report (Rich's Keynote)
MobiSys '21
@inproceedings{sinha2020pastime, author = {Soham Sinha and Richard West and Ahmad Golchin}, title = {{PAStime: Progress-Aware Scheduling for Time-Critical Computing}}, booktitle = {32nd Euromicro Conference on Real-Time Systems (ECRTS 2020)}, pages = {3:1--3:24}, series = {Leibniz International Proceedings in Informatics (LIPIcs)}, ISBN = {978-3-95977-152-8}, year = {2020}, volume = {165}, publisher = {Schloss Dagstuhl--Leibniz-Zentrum f{\"u}r Informatik}, URL = {https://drops.dagstuhl.de/opus/volltexte/2020/12366}, doi = {10.4230/LIPIcs.ECRTS.2020.3}, annote = {Keywords: progress-aware scheduling, code instrumentation, timing annotation} }
Over-estimation of worst-case execution times (WCETs) of real-time tasks leads to poor resource utilization. In a mixed-criticality system (MCS), the over-provisioning of CPU time to accommodate the WCETs of highly critical tasks may lead to degraded service for less critical tasks. In this paper we present PAStime, a novel approach to monitor and adapt the runtime progress of highly time-critical applications, to allow for improved service to lower criticality tasks. In PAStime, CPU time is allocated to time-critical tasks according to the delays they experience as they progress through their control flow graphs. This ensures that as much time as possible is made available to improve the Quality-of-Service of less critical tasks, while high-criticality tasks are compensated after their delays. This paper describes the integration of PAStime with Adaptive Mixed-criticality (AMC) scheduling. The LO-mode budget of a high-criticality task is adjusted according to the delay observed at execution checkpoints. This is the first implementation of AMC in the scheduling framework of LITMUS^RT, which is extended with our PAStime runtime policy and tested with real-time Linux applications such as object classification and detection. We observe in our experimental evaluation that AMC-PAStime significantly improves the utilization of the low-criticality tasks while guaranteeing service to high-criticality tasks.
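For intuition only (the paper's AMC-PAStime policy is implemented inside the LITMUS^RT scheduler), a toy sketch of the budget-adjustment idea: the LO-mode budget is extended by the delay observed at a progress checkpoint, capped at the HI-mode budget. The additive policy and the cap are assumptions for illustration.

```python
# Toy sketch of checkpoint-driven budget adjustment, not the PAStime code.

def adjusted_lo_budget(c_lo, c_hi, expected_checkpoint_time, observed_checkpoint_time):
    """Extend the LO-mode budget by the observed checkpoint delay, capped at C(HI)."""
    delay = max(0.0, observed_checkpoint_time - expected_checkpoint_time)
    return min(c_lo + delay, c_hi)

# A task with C(LO) = 10 ms and C(HI) = 16 ms reaches a checkpoint 3 ms late:
print(adjusted_lo_budget(10.0, 16.0, 5.0, 8.0))   # 13.0 ms granted in LO mode
```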
https://drops.dagstuhl.de/opus/volltexte/2020/12366/
Guidelines for downloading and running PAStime
RTSS '19, RTAS '20
@inproceedings{golchin2020boomerang, title={Boomerang: Real-Time I/O Meets Legacy Systems}, author={Golchin, Ahmad and Sinha, Soham and West, Richard}, booktitle={Proceedings of the 26th IEEE Real-Time and Embedded Technology and Applications Symposium}, pages={390--402}, year={2020}, doi = {10.1109/RTAS48715.2020.00013} }
This paper presents Boomerang, an I/O system that
integrates a legacy non-real-time OS with one that is customized
for timing-sensitive tasks. A relatively small RTOS benefits from
the pre-existing libraries, drivers and services of the legacy
system. Additionally, timing-critical tasks are isolated from less
critical tasks by securely partitioning machine resources among
the separate OSes. Boomerang guarantees end-to-end processing
delays on input data that requires outputs to be generated within
specific time bounds.
We show how to construct composable task pipelines in
Boomerang that combine functionality spanning a custom RTOS
and a legacy Linux system. By dedicating time-critical I/O to
the RTOS, we ensure that complementary services provided by
Linux are sufficiently predictable to meet end-to-end service
guarantees. While Boomerang benefits from spatial isolation,
it also outperforms a standalone Linux system using
deadline-based CPU reservations for pipeline tasks. We also show how
Boomerang outperforms a virtualized system called ACRN,
designed for automotive systems.
https://ieeexplore.ieee.org/document/9113119
SOSP '19
@inproceedings{sinha2020paravirtualized, title={A Paravirtualized Android for Next Generation Interactive Automotive Systems}, author={Sinha, Soham and Golchin, Ahmad and Einstein, Craig and West, Richard}, booktitle={Proceedings of the 21st International Workshop on Mobile Computing Systems and Applications (HotMobile 2020), Austin, Texas, USA}, doi = {10.1145/3376897.3377861}, year={2020} }
Android's APIs, Bluetooth support and smartphone integration provide capabilities for user interaction with In-Vehicle Infotainment (IVI) and vehicle control services. However, Android is not developed to interface with automotive subsystems accessed via CAN bus networks. This work proposes a new automotive system based on our Quest-V partitioning hypervisor, which allows Android to communicate and interact with timing and safety-critical services managed by the Quest real-time OS (RTOS). Quest is used to filter and receive messages from Android applications and to interface with a car's internal CAN bus in a timing-predictable manner. Android is then used to host IVI applications and provide a user interface to real-time vehicle services. This system design allows Android to leverage the timing guarantees of Quest, while securely isolating critical hardware components and memory regions.
Quest-V hosts a paravirtualized Android 8.1 (Oreo) guest, which required modification of 126 lines of kernel code. Secure shared memory communication mechanisms between Android and a separate Quest guest provide real-time I/O to CAN bus networks.
https://dl.acm.org/doi/10.1145/3376897.3377861
@article{sinha2019pastime, title={PAStime: Progress-aware Scheduling for Time-critical Computing}, author={Sinha, Soham and West, Richard}, journal={arXiv preprint arXiv:1908.06211}, year={2019} }
Over-estimation of worst-case execution times (WCETs) of real-time tasks leads to poor resource utilization. In a mixed-criticality system (MCS), the over-provisioning of CPU time to accommodate the WCETs of highly critical tasks can lead to degraded service for less critical tasks. In this paper, we present PAStime, a novel approach to monitor and adapt the runtime progress of highly time-critical applications, to allow for improved service to lower criticality tasks. In PAStime, CPU time is allocated to time-critical tasks according to the delays they experience as they progress through their control flow graphs. This ensures that as much time as possible is made available to improve the Quality-of-Service of less critical tasks, while high-criticality tasks are compensated after their delays. In this paper, we integrate PAStime with Adaptive Mixed-criticality (AMC) scheduling. The LO-mode budget of a high-criticality task is adjusted according to the delay observed at execution checkpoints. Using LITMUS-RT to implement both AMC and AMC-PAStime, we observe that AMC-PAStime significantly improves the utilization of low-criticality tasks while guaranteeing service to high-criticality tasks.
https://arxiv.org/abs/1908.06211
@article{golchin2019boomerangtr, title={Boomerang: Real-Time I/O Meets Legacy Systems}, author={Golchin, Ahmad and Sinha, Soham and West, Richard}, journal={arXiv preprint arXiv:1908.06807}, year={2019} }
This paper presents Boomerang, a system that integrates a legacy non-real-time OS with one that is customized for timing-sensitive tasks. A relatively small RTOS benefits from the pre-existing libraries, drivers and services of the legacy system. Additionally, timing-critical tasks are isolated from less critical tasks by securely partitioning machine resources among the separate OSes. Boomerang guarantees end-to-end processing delays on input data that requires outputs to be generated within specific time bounds. We show how to construct composable task pipelines in Boomerang that combine functionality spanning a custom RTOS and a legacy Linux system. By assigning time-critical I/O to the RTOS, we ensure that complementary services provided by Linux are sufficiently predictable to meet end-to-end service guarantees. While Boomerang benefits from spatial isolation, it also outperforms a standalone Linux system using deadline-based CPU reservations for pipeline tasks.
https://arxiv.org/abs/1908.06807
@article{scheduling_sinha_2018, title = {Scheduling Policies and System Software Architectures for Mixed-criticality Computing}, author = {Sinha, Soham}, year = {2018}, URL = {https://open.bu.edu/handle/2144/40211}, publisher = {OpenBU} }
The mixed-criticality model of computation is being increasingly adopted in timing-sensitive systems. The model not only ensures that the most critical tasks in a system never fail, but also aims for better system resource utilization under normal conditions. In this report, we describe the widely used mixed-criticality task model and fixed-priority scheduling algorithms for the model on uniprocessors. Because the mixed-criticality task model and its scheduling policies demand it, isolation among tasks, both temporal and spatial, is one of the main requirements from a system-design point of view. Different virtualization techniques have been used to design system software architectures with the goal of isolation. We discuss a few such system software architectures that are being used, or could be used, for mixed-criticality computing.
https://hdl.handle.net/2144/40211
@article{ye2018vlibos, title={vLibOS: Babysitting OS Evolution with a Virtualized Library OS}, author={Ye, Ying and Cheng, Zhuoqun and Sinha, Soham and West, Richard}, journal={arXiv preprint arXiv:1801.07880}, year={2018} }
Many applications have service requirements that are not easily met by existing operating systems. Real-time and security-critical tasks, for example, often require custom OSes to meet their needs. However, development of special purpose OSes is a time-consuming and difficult exercise. Drivers, libraries and applications have to be written from scratch or ported from existing sources. Many researchers have tackled this problem by developing ways to extend existing systems with application-specific services. However, it is often difficult to ensure an adequate degree of separation between legacy and new services, especially when security and timing requirements are at stake. Virtualization, for example, supports logical isolation of separate guest services, but suffers from inadequate temporal isolation of time-critical code required for real-time systems. This paper presents vLibOS, a master-slave paradigm for new systems, whose services are built on legacy code that is temporally and spatially isolated in separate VM domains. Existing OSes are treated as sandboxed libraries, providing legacy services that are requested by inter-VM calls, which execute with the time budget of the caller. We evaluate a real-time implementation of vLibOS. Empirical results show that vLibOS achieves as much as a 50% reduction in performance slowdown for real-time threads, when competing for a shared memory bus with a Linux VM.
https://arxiv.org/abs/1801.07880
@article{data_sinha_2016, title = {Data Transfer Nodes for Cloud-Storage Providers}, author = {Sinha, Soham}, year = {2016}, month = {jun}, URL = {https://doi.org/10.7939/R3NK36D0M}, publisher = {University of Alberta}, doi = {10.7939/R3NK36D0M} }
We provide a case study of current inefficiencies in how traffic to well-known cloud-storage providers (e.g., Dropbox, Google Drive, Microsoft OneDrive) can vary significantly in throughput (e.g., a factor of 5 or more) depending on the location of the source and sink of the data. Our case study supplements previous work on resilient overlay networks (RON) and other related ideas. These inefficiencies exist in the presence of vendor-specific points-of-presence (POPs), which try to provide better network performance to clients. In fact, the existence of special-purpose networks (e.g., national research networks, PlanetLab) and complicated peering relationships between networks means that performance problems might exist in many wide-area networks (WANs). Our main contribution is to continue the cataloging of network inefficiencies so that practitioners and experimenters are aware of them. But we also show how simple routing detours can improve throughput by factors of over 3x for client-to-cloud-storage transfers. Routing detours are implemented by adding intermediate nodes in the routing path. These special-purpose intermediate nodes are called data transfer nodes (DTNs). We have also implemented an optimization in these DTNs in the form of cut-through routing. Although the specific inefficiencies in this paper might be transitory (and we agree with that characterization), WAN bottlenecks due to routing, sub-optimal middlebox configuration, and congestion persist as real problems to be cataloged, discussed, and addressed through the use of detours, data transfer nodes (DTNs), or RONs. Additionally, we provide a brief overview of the beneficial routing detours among 20 PlanetLab nodes in North America.
https://doi.org/10.7939/R3NK36D0M
https://github.com/sohamm17/cloud-storage, https://github.com/sohamm17/cloud-storage-scripts
@inproceedings{sinha2016mitigating, title={Mitigating routing inefficiencies to cloud-storage providers: A case study}, author={Sinha, Soham and Niu, Di and Wang, Zhi and Lu, Paul}, booktitle={2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (DPDNS)}, pages={1298--1306}, year={2016}, organization={IEEE}, doi = {10.1109/IPDPSW.2016.177} }
We provide a case study of current inefficiencies in how traffic to well-known cloud-storage providers (e.g., Dropbox, Google Drive, Microsoft OneDrive) can vary significantly in throughput (e.g., a factor of 5 or more) depending on the location of the source and sink of the data. Our case study supplements previous work on resilient overlay networks (RON) and other ideas. These inefficiencies exist in the presence of vendor-specific points-of-presence (POPs), which try to provide better network performance to clients. In fact, the existence of special-purpose networks (e.g., national research networks, PlanetLab) and complicated peering relationships between networks means that performance problems might exist in many wide-area networks (WANs). Our main contribution is to continue the cataloging of network inefficiencies so that practitioners and experimenters are aware of them. But we also show how simple routing detours can improve throughput by factors of over 3x for client-to-cloud-storage transfers. Although the specific inefficiencies in this paper might be transitory (and we agree with that characterization), WAN bottlenecks due to routing, suboptimal middlebox configuration, and congestion persist as real problems to be cataloged, discussed, and addressed through the use of detours, data transfer nodes (DTNs), or RONs.
https://ieeexplore.ieee.org/document/7530016
https://github.com/sohamm17/cloud-storage, https://github.com/sohamm17/cloud-storage-scripts
@inproceedings{10.1145/3376897.3379163, author = {Sinha, Soham and Golchin, Ahmad and Einstein, Craig and West, Richard}, title = {A Paravirtualized Android for Next Generation Interactive Automotive Systems}, year = {2020}, isbn = {9781450371162}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3376897.3379163}, doi = {10.1145/3376897.3379163}, booktitle = {Proceedings of the 21st International Workshop on Mobile Computing Systems and Applications}, pages = {100}, numpages = {1}, keywords = {real-time, machine virtualization, automotive systems, android}, location = {Austin, TX, USA}, series = {HotMobile ’20} }
https://dl.acm.org/doi/abs/10.1145/3376897.3379163