Journal article

The Case for HW/SW Harmony in Real-Time Systems: Tightening Memory Latency of Streaming Applications

Abstract

Modern critical cyber-physical systems, such as autonomous vehicles, drones, and real-time medical monitoring, demand not only intensive data processing but also stringent adherence to real-time performance constraints. These applications often involve continuous or sequential data streams (e.g., images, videos, and sensor readings), which require frequent memory accesses. Despite advancements in processing power, Dynamic Random Access Memory (DRAM) accesses still incur large and highly variable interference delays. Achieving a tight bound on memory latency therefore remains a significant challenge, yet it is essential for ensuring the safe and predictable execution of these critical tasks. To address this bottleneck, we propose InterStellarRT, a novel hardware/software harmony methodology that provides data-aware optimizations across the entire memory hierarchy. By leveraging a software layer that communicates data access patterns to the memory controller, InterStellarRT significantly reduces memory access times while keeping them tightly bounded and predictable. We perform a theoretical analysis of the memory latency bound and prove that InterStellarRT provides remarkably tighter bounds, for both in-isolation and interference latencies, than state-of-the-art real-time systems based on Commercial-Off-The-Shelf (COTS) Double Data Rate 4 (DDR4) memory devices; the approach is also applicable to DDR5. We evaluate InterStellarRT on a RISC-V-based quad-core system simulated in GEM5, with DDR4 memory modeled in Ramulator. Across benchmarks from the Polybench, LAPACK, Phoenix, and HPCG suites, InterStellarRT achieves tighter average bounds of 3.8× for in-isolation memory latency and 13.5× for interference latency under affine workloads; for mixed-affinity workloads, the corresponding improvements are 2.15× and 4×. Moreover, InterStellarRT achieves an average 1.72× end-to-end speedup, a 1.9× bandwidth improvement, and a 14% reduction in DRAM energy compared with the baseline.

Authors

Abotaleb AM; Hassan M

Journal

ACM Transactions on Embedded Computing Systems, Vol. 24, No. 5s, pp. 1–27

Publisher

Association for Computing Machinery (ACM)

Publication Date

November 30, 2025

DOI

10.1145/3762647

ISSN

1539-9087
