Stream Processing Support for FasTensor
Hi, I’m Aditya Narayan!
I’m a frequent visitor to the town square of theoretical CS, operations (Ops), and robust high-performance systems. Sometimes I indulge in musings on computing and biology, and other times I enjoy accounts of minefield experiences from the systems world. Luckily, this summer, OSRE offered an opportunity at the perfect intersection of my interests.
This summer, I will be working on FasTensor, a scientific computing library built around a parallel computing structure called the Stencil. Stencils are widely used in scientific computing, for example to solve PDEs in physical simulations and to apply convolutions to signals. I am excited to introduce my mentors, Dr. Bin Dong and Dr. John Wu of the Scientific Data Management Group at Lawrence Berkeley National Laboratory (LBNL), who bring invaluable expertise to the project.
They recognized the need for a tensor processing library with dedicated support for big datasets that exhibit inherent structural locality, as is common in scientific computing; such support is lacking in popular open-source MapReduce and key-value frameworks.

More often than not, the operations performed on these datasets involve computations over neighboring elements, as in the sketch below. This locality motivated the development of the FasTensor library.
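To make "computations involving neighboring elements" concrete, here is a minimal, self-contained C++ sketch of a classic three-point stencil (a 1D moving average). This is plain C++ for illustration only, not FasTensor's API; FasTensor's job is to run exactly this kind of pattern in parallel over datasets far too large for a single in-memory vector.

```cpp
#include <iostream>
#include <vector>

// A classic 1D three-point stencil: each output element is the
// average of an input element and its two immediate neighbors.
std::vector<double> moving_average(const std::vector<double>& in) {
    std::vector<double> out(in.size(), 0.0);
    for (size_t i = 1; i + 1 < in.size(); ++i) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    }
    return out;  // boundary cells are left at 0 for simplicity
}

int main() {
    std::vector<double> data = {1, 2, 3, 4, 5};
    for (double v : moving_average(data)) std::cout << v << ' ';
    std::cout << '\n';  // prints: 0 2 3 4 0
}
```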
I will be working on a stream processing interface that enables online processing of large-scale datasets as they arrive from data producers. The project focuses on offering rich interfaces for managing and composing streams, supporting common scientific data formats such as HDF5, and integrating fault tolerance and reliability mechanisms.
I am thrilled to work on the FasTensor project because I believe it has the potential to make a significant impact by enabling researchers to implement a rich set of computations on their big datasets in an easy and intuitive manner.
After all, FasTensor has just one simple paradigm: `A -> Transform(F(x), B)`,
and it handles all the behind-the-scenes grunt work of managing big datasets so you can focus on your research.
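To give a flavor of what that one-line paradigm looks like in practice, here is a minimal sketch in the style of FasTensor's published examples: a user-defined function over a Stencil, applied to an HDF5-backed array via Transform. Treat the details here (the `ft.h` header, the `EP_HDF5` endpoint string, the chunk and overlap parameters, the file and dataset names) as assumptions drawn from the library's documentation rather than a definitive reference.

```cpp
#include "ft.h"  // FasTensor's main header (name assumed from the docs)

using namespace FT;

// User-defined function: a 2x2 moving average over neighboring cells.
// iStencil(i, j) reads the element at a relative offset from the
// current cell; FasTensor slides this window over the whole array.
inline Stencil<float> udf_average(const Stencil<float> &iStencil)
{
    Stencil<float> oStencil;
    oStencil = (iStencil(0, 0) + iStencil(0, 1) +
                iStencil(1, 0) + iStencil(1, 1)) / 4.0;
    return oStencil;
}

int main(int argc, char *argv[])
{
    FT_Init(argc, argv);  // set up MPI and the FasTensor runtime

    // A: input array stored in HDF5 (hypothetical file/dataset names),
    // processed in 4x4 chunks with a 1-cell overlap so the UDF can
    // see neighbors across chunk boundaries.
    Array<float> *A =
        new Array<float>("EP_HDF5:input.h5:/data", {4, 4}, {1, 1});
    // B: output array, also backed by HDF5.
    Array<float> *B = new Array<float>("EP_HDF5:output.h5:/data");

    // The whole paradigm in one line: A -> Transform(F, B).
    A->Transform(udf_average, B);

    delete A;
    delete B;
    FT_Finalize();
    return 0;
}
```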
Stay tuned for updates and feel free to collaborate!