Shadi A. Noghabi

Ph.D. Candidate
Department of Computer Science
University of Illinois at Urbana-Champaign

I am currently a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. I work jointly with Prof. Roy H. Campbell in the Systems Research Group (SRG) and Prof. Indranil Gupta in the Distributed Protocols Research Group (DPRG).

My research focuses on Distributed Systems, Cloud Computing, and Big Data. I am currently working on geo-distributed large-scale object stores and stream processing systems. You can find my CV and research statement.

Contact Information

Address: 3111 Siebel Center, 201 N. Goodwin Ave., Urbana, IL 61801.
Email: abdolla2 [at] illinois [dot] edu


Projects


Samza

May'15 - present

Samza: Stateful Scalable Stream Processing at LinkedIn, Collaboration between LinkedIn and University of Illinois at Urbana-Champaign, Under Prof. Indy Gupta and Prof. Roy Campbell.

Distributed stream processing systems need to support stateful processing, recover quickly from failures to resume such processing, and reprocess an entire data stream quickly. We present Samza, a distributed system for stateful and fault-tolerant stream processing. Samza is currently in use at LinkedIn by hundreds of production applications with more than 10,000 containers. Samza is an open-source Apache project adopted by more than 15 companies (Uber, Netflix, TripAdvisor, etc.).
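
As a rough illustration of what stateful processing looks like in Samza's low-level API, the sketch below counts page views per key in a local key-value store. It assumes a changelog-backed store named "page-view-counts" is declared in the job configuration; the class and store names are only examples.

    import org.apache.samza.config.Config;
    import org.apache.samza.storage.kv.KeyValueStore;
    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.InitableTask;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskContext;
    import org.apache.samza.task.TaskCoordinator;

    // Example task: keeps a running count per page in a local store whose
    // changelog lets Samza restore the state after a container failure.
    public class PageViewCounterTask implements StreamTask, InitableTask {
      private KeyValueStore<String, Integer> counts;

      @Override
      @SuppressWarnings("unchecked")
      public void init(Config config, TaskContext context) {
        // "page-view-counts" must be configured as a store for this job.
        counts = (KeyValueStore<String, Integer>) context.getStore("page-view-counts");
      }

      @Override
      public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
                          TaskCoordinator coordinator) {
        String page = (String) envelope.getMessage();
        Integer current = counts.get(page);
        counts.put(page, current == null ? 1 : current + 1);
      }
    }

After a failure, the store is rebuilt by replaying its changelog, which is what makes this kind of stateful task resumable.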


FreeFlow

Jun'16 - Sep'16

FreeFlow: High Performance Container Networking, Collaboration between Microsoft Research, Carnegie Mellon University, and University of Illinois at Urbana-Champaign.

Current container networking solutions have either poor performance or poor portability, which undermines the advantages of containerization. We propose FreeFlow, a container networking solution that achieves both high performance and good portability. FreeFlow leverages two insights: first, in most container deployments a central entity (i.e., the orchestrator) exists that is fully aware of the location of each container; second, strict isolation is unnecessary among containers belonging to the same application.
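
A minimal, purely illustrative Java sketch (not FreeFlow's code; class and method names are hypothetical) of how placement facts pushed by the orchestrator can drive path selection between containers:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: a per-host helper records where the orchestrator
    // placed each container and uses that knowledge to pick a faster path
    // between containers of the same application.
    public class PlacementAwareRouter {
      private final Map<String, String> containerToHost = new ConcurrentHashMap<>();

      /** Called when the orchestrator reports where a container was scheduled. */
      public void onPlacement(String containerId, String hostAddress) {
        containerToHost.put(containerId, hostAddress);
      }

      /** Containers of the same application on the same host need no overlay in between. */
      public String pathFor(String srcContainer, String dstContainer) {
        String srcHost = containerToHost.get(srcContainer);
        String dstHost = containerToHost.get(dstContainer);
        if (srcHost != null && srcHost.equals(dstHost)) {
          return "shared-memory";         // same host: bypass the virtual overlay
        }
        return "host-network:" + dstHost; // different hosts: send via the host NIC
      }

      public static void main(String[] args) {
        PlacementAwareRouter router = new PlacementAwareRouter();
        router.onPlacement("web-1", "10.0.0.5");
        router.onPlacement("cache-1", "10.0.0.5");
        System.out.println(router.pathFor("web-1", "cache-1")); // prints "shared-memory"
      }
    }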


Ambry

May'14 - May'16

Ambry: LinkedIn's Scalable Geo-Distributed Object Store, Collaboration between LinkedIn and University of Illinois at Urbana-Champaign, Under Prof. Roy Campbell and Prof. Indy Gupta.

The infrastructure beneath a worldwide social network has to continually serve billions of variable-sized media objects such as photos, videos, and audio. These objects must be stored and served with low latency and high throughput by a system that is geo-distributed, highly scalable, and load-balanced. To meet these goals we developed Ambry, a production-quality system for storing large immutable data. Ambry has been running in LinkedIn's production environment for the past two years, serving up to 10K requests per second across more than 400 million users.
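
To give an intuition for the design (an illustrative sketch, not Ambry's implementation; names are hypothetical): immutable blobs are appended to a partition's log, and the returned id embeds the partition, so any frontend can route a later read without consulting a central directory.

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.UUID;

    // Illustrative sketch: one partition as an append-only log of immutable blobs,
    // with a small index from blob id to the blob's position in the log.
    public class PartitionSketch {
      private final int partitionId;
      private final List<ByteBuffer> log = new ArrayList<>();      // stand-in for the on-disk append-only log
      private final Map<String, Integer> index = new HashMap<>();  // blob id -> position in the log

      public PartitionSketch(int partitionId) { this.partitionId = partitionId; }

      /** Append an immutable blob; "<partition>:<uuid>" is all a client needs to read it back. */
      public String put(byte[] blob) {
        String blobId = partitionId + ":" + UUID.randomUUID();
        index.put(blobId, log.size());
        log.add(ByteBuffer.wrap(blob.clone()));                    // never modified after the append
        return blobId;
      }

      public ByteBuffer get(String blobId) {
        Integer position = index.get(blobId);
        return position == null ? null : log.get(position).asReadOnlyBuffer();
      }

      public static void main(String[] args) {
        PartitionSketch partition = new PartitionSketch(42);
        String id = partition.put("profile-photo-bytes".getBytes());
        System.out.println("stored blob " + id + ", " + partition.get(id).remaining() + " bytes");
      }
    }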


Toward Fabric (ToF)

Jan'15 - present

Towards Fabric: A Middleware Implementing High-level Description Languages on a Fabric-like Network, Systems Research Group, University of Illinois at Urbana-Champaign, Under Prof. Roy Campbell.

Current SDN technologies provide powerful and flexible APIs, but they can be unreasonably complex and error-prone for implementing nontrivial network control logic. We are developing a middleware layer that implements policies and behaviors from high-level network descriptions on top of a more structured network (i.e., a fabric network). Our results show near-linear scalability with respect to the number of addresses routed over the network, while introducing minimal performance overhead and requiring no changes to packet structure.
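
One way to picture the scalability intuition (a generic fabric-style sketch, not the ToF middleware itself; class and method names are hypothetical): per-address state lives only at the edges, while the core forwards among a small, fixed set of edges.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch only: the ingress edge resolves a destination host to
    // its egress edge, so per-host state grows linearly and stays at the edges,
    // and the core never needs per-host forwarding entries.
    public class EdgeDirectory {
      private final Map<String, String> hostToEgressEdge = new HashMap<>();

      /** Record which edge switch a host sits behind (derived from the network description). */
      public void attach(String hostAddress, String edgeId) {
        hostToEgressEdge.put(hostAddress, edgeId);
      }

      /** Resolve the egress edge for a destination host. */
      public String egressEdgeFor(String dstHost) {
        String edge = hostToEgressEdge.get(dstHost);
        if (edge == null) {
          throw new IllegalArgumentException("unknown host: " + dstHost);
        }
        return edge;
      }

      public static void main(String[] args) {
        EdgeDirectory directory = new EdgeDirectory();
        directory.attach("10.1.0.7", "edge-3");
        System.out.println("forward toward " + directory.egressEdgeFor("10.1.0.7"));
      }
    }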


Adaptive Storm

Jan'14 - May'15

Real Time Adaptive Profiling in Storm Topologies, Systems Research Group, University of Illinois at Urbana-Champaign, Under Prof. Roy Campbell and Prof. Indy Gupta.

The layout of a stream processing job and its parallelism are statically defined before execution and are not tuned by the Storm runtime. The bursty workloads of real-time streams, combined with the unstructured nature of big data, present unique challenges. We are working on a dynamic profiling engine that runs within Storm and generates improved topologies, optimizing for throughput and latency on a given set of resources.
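
As a toy illustration of the profiling idea (not the engine's actual heuristics; the names and headroom factor are made up), a suggested parallelism for an operator can be derived from its observed input rate and the measured throughput of one instance:

    // Hypothetical sketch: pick a parallelism that keeps an operator from
    // becoming a bottleneck, with some slack for bursty input.
    public class ParallelismAdvisor {
      /**
       * @param inputTuplesPerSec       observed arrival rate at the operator
       * @param perInstanceTuplesPerSec measured throughput of one operator instance
       * @param headroom                slack factor for bursts (e.g. 1.2 = 20% spare capacity)
       */
      public static int suggestParallelism(double inputTuplesPerSec,
                                           double perInstanceTuplesPerSec,
                                           double headroom) {
        double needed = (inputTuplesPerSec * headroom) / perInstanceTuplesPerSec;
        return Math.max(1, (int) Math.ceil(needed));
      }

      public static void main(String[] args) {
        // An operator receiving 12,000 tuples/s, where one instance handles ~2,500 tuples/s:
        int suggested = suggestParallelism(12_000, 2_500, 1.2);
        System.out.println("suggested parallelism = " + suggested); // prints 6
      }
    }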


Context Passing in Cloud

Dec'13 - May'14

Using Context to Improve Performance of Cloud Stacks, Systems Research Group, University of Illinois at Urbana-Champaign, Under Prof. Roy Campbell and Prof. Indy Gupta.

General-purpose cluster management substrates such as YARN, Mesos, and HDFS make it easier to run arbitrary systems atop them. Unfortunately, the generalized APIs supported by these substrates suffer from performance limitations and inefficiencies, primarily because these APIs do not support passing contextual information with requests and responses. As evidence of this gap, we have found several JIRA issues filed against these substrates. We propose associating annotations with requests and responses in several ways to fill this gap.
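
A small illustrative sketch of what "passing context" could look like (hypothetical names, not tied to the actual YARN, Mesos, or HDFS APIs): requests carry free-form annotations that a substrate may inspect and act on.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: a request wrapper that carries annotations such as
    // priority, deadline, or the caller's job id alongside the payload.
    public class AnnotatedRequest<T> {
      private final T payload;
      private final Map<String, String> annotations = new HashMap<>();

      public AnnotatedRequest(T payload) { this.payload = payload; }

      public AnnotatedRequest<T> annotate(String key, String value) {
        annotations.put(key, value);
        return this;
      }

      public T payload() { return payload; }

      public Map<String, String> annotations() {
        return Collections.unmodifiableMap(annotations);
      }

      public static void main(String[] args) {
        AnnotatedRequest<String> read = new AnnotatedRequest<>("read:/data/part-00042")
            .annotate("priority", "latency-sensitive")
            .annotate("job-id", "job_1401_0007");
        System.out.println(read.payload() + " " + read.annotations());
      }
    }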


Mimesis

Nov'13 - May'14

Mimesis Namespace Generator, Systems Research Group, University of Illinois at Urbana-Champaign, Under Prof. Roy Campbell.

This project is a namespace generator that can create large and realistic hierarchical namespaces. The tool preserves many distributions of the hierarchy, including: directories at each depth, subdirectories per directory, files at each depth, files per directory, file sizes, and file creation timestamps. Additionally, it includes configurations based on a large Hadoop (HDFS) cluster, as well as configurations based on statistics collected at several HPC deployments.
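
The generation loop, reduced to a toy sketch (the real tool samples from distributions fit to measured HDFS and HPC statistics; the uniform sampling and names below are placeholders only):

    import java.util.Random;

    // Illustrative sketch: each directory samples how many files and
    // subdirectories it contains, then recurses into the subdirectories.
    public class NamespaceSketch {
      private static final Random RNG = new Random(7);

      static void generate(String path, int depth, int maxDepth) {
        if (depth > maxDepth) return;
        int files = RNG.nextInt(4);    // placeholder for the files-per-directory distribution
        int subdirs = RNG.nextInt(3);  // placeholder for the subdirectories-per-directory distribution
        for (int f = 0; f < files; f++) {
          System.out.println(path + "/file-" + f);
        }
        for (int d = 0; d < subdirs; d++) {
          String child = path + "/dir-" + d;
          System.out.println(child + "/");
          generate(child, depth + 1, maxDepth);
        }
      }

      public static void main(String[] args) {
        generate("/root", 1, 3);
      }
    }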


Scheduled Caching

Aug'13 - Jan'14

Scheduled Caching – Memory Locality with the Help of the Scheduler, Systems Research Group, University of Illinois at Urbana-Champaign, Under Prof. Roy Campbell.

In data-intensive processing systems like MapReduce, job input data is explicitly communicated to the job scheduler so that it can properly schedule for locality. We present the design of a Scheduled Caching technique that leverages this information, already available to the job scheduler, to improve caching. The job scheduler provides hints about the access patterns of files to the storage layer, which can then prefetch and cache input data right before it needs to be processed.
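
A minimal sketch of the hint-then-prefetch interaction (hypothetical names; the real design plugs into the MapReduce job scheduler and the storage layer rather than a standalone class):

    import java.util.ArrayDeque;
    import java.util.Queue;

    // Illustrative sketch: the scheduler hands the storage layer an ordered hint
    // of the inputs a job will read, and the storage layer warms the cache for
    // the next input just before it is needed.
    public class HintedPrefetcher {
      private final Queue<String> upcomingInputs = new ArrayDeque<>();

      /** Hint from the job scheduler: the ordered inputs a task is about to read. */
      public void hint(String... inputs) {
        for (String input : inputs) {
          upcomingInputs.add(input);
        }
      }

      /** Called as the current input finishes: prefetch the next one into memory. */
      public void prefetchNext() {
        String next = upcomingInputs.poll();
        if (next != null) {
          System.out.println("prefetching " + next + " into the in-memory cache");
        }
      }

      public static void main(String[] args) {
        HintedPrefetcher prefetcher = new HintedPrefetcher();
        prefetcher.hint("/logs/2014/01/part-0", "/logs/2014/01/part-1");
        prefetcher.prefetchNext(); // warms part-0 before the map task asks for it
        prefetcher.prefetchNext(); // then part-1
      }
    }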