SHARE YOUR KNOWLEDGE 1/2 SCHEDULE

Share your knowledge 1/2

10.30 - 12.00

Graduate students of the Computer Science Department will present their research work. The purpose of this session is to share our knowledge and get to know each other's research interests. Presentations will last 10 minutes, followed by 5 minutes of questions. Presenters should submit both their presentation document and poster in PDF format by October 1st, 2019, by sending an e-mail to gsacsd@csd.uoc.gr with the following subject: [GSC19] Presentation and Poster submission - < Author's Full Name >. In their submission e-mail, presenters may request that their poster be printed by the organizing committee of the conference. The poster session will take place between 14:00 and 15:00, and posters should be A0 size [84.1 x 118.9 cm].

In-Memory Data Analytics Computations at Scale? Look Beyond Physical DRAM by Iacovos G. Kolokasis

10.30 - 10.45

The amount of data generated daily by social media and large organizations is increasing at a high rate. This flood of data enables JVM-based data analytics frameworks such as Apache Spark and Apache Flink to perform intensive analysis to discover and predict trends. As computations intensify and data sizes grow, the memory capacity of a server must scale so that big data analytics engines can move and process large amounts of data closer to the processor. However, DRAM capacity scaling cannot match the requirements of in-memory big data analytics frameworks. There are two orthogonal approaches to overcoming the limits of DRAM capacity scaling: (i) adding more memory to each server via persistent memory (PM) technologies, and (ii) adding more servers.
Our goal is to explore how persistent memory can be used to enlarge the memory of each server in Apache Spark. We are exploring the use of on-heap, off-heap, and storage address spaces in Apache Spark, and we are considering optimizations related to each approach. In addition, we are interested in understanding which aspects of PM (byte addressability, performance, etc.) have the largest impact on applications of big data analytics frameworks, and what new challenges (e.g., garbage collection time) arise for JVM-based data analytics engines working with TB-scale heap sizes. We would also like to contrast these approaches with mmap-based systems, where block-device address space can appear as an extension of DRAM (and via DRAM).
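As a rough illustration of the mmap-based idea mentioned above, the following Python sketch maps a file into the process address space and then reads and writes it with ordinary byte indexing. It is not taken from the talk: the temporary file merely stands in for a block device, and the names `SIZE`, `write_and_read_through_mapping`, and `demo` are hypothetical.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE   # page size reported by the operating system
SIZE = 4 * PAGE        # hypothetical size of the mapped region


def write_and_read_through_mapping(path: str) -> bytes:
    """Map a file into memory, store bytes via slicing, and read them back."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), SIZE) as region:
            payload = b"spark-block"
            offset = 2 * PAGE  # touch only the third page of the mapping
            # The mapping behaves like a mutable byte array backed by the file.
            region[offset:offset + len(payload)] = payload
            region.flush()     # write dirty pages back to the backing file
            return bytes(region[offset:offset + len(payload)])


def demo() -> bytes:
    # Assumption: a temporary file plays the role of the block device.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * SIZE)  # give the file the size we want to map
        os.close(fd)
        return write_and_read_through_mapping(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The operating system pages data in and out of DRAM on demand, which is the sense in which the device address space is reached "via DRAM".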


Slides in PDF.

Support for different service levels through transparent migration of pages in distributed memory systems by Emmanouil Skordalakis

10.45 - 11.00

With the constant evolution of high performance applications, their memory requirements are rapidly increasing. As a result, the demand for more memory on the computer nodes of large clusters running those applications continuously rises. However, an individual computer node has limits in terms of memory capacity. Typically, running several processes with different computational and memory requirements on a cluster creates fluctuating workloads among the computer nodes. Hence, several nodes use most of their memory while others are left with unused memory, which could potentially be exploited by nodes with a heavy memory workload.
Consequently, remote memory management has become a subject of research for many organizations, which have implemented varying techniques for reading and writing data on remote memory. Although using remote memory practically increases the total available memory of a computer node, accessing data remotely can severely degrade performance, because the data travels through the network interconnect of the cluster. Furthermore, the software APIs that give processes access to remote memory can be complex. In addition, the responsibility for remote memory allocation and fair remote memory sharing is assigned to the processes themselves, which can be quite complicated, especially when many processes are running simultaneously on the same computer node.
In our thesis, we present the Page Migration System (PMS), which monitors the main memory usage of the computer nodes on a cluster and moves infrequently accessed data of a process from the memory of a computer node with a heavy memory workload to the unused memory of a remote computer node of the same cluster with a lighter memory workload. The key feature of the PMS is that it transparently moves least recently used (LRU) pages of processes to remote memory, while using a fairness algorithm when choosing memory pages among many processes running on the same computer node. What's more, remote memory is mapped on the local node, allowing the OS to cache remote data. To be precise, a read and/or write on remote memory happens only when we get a cache miss. Cacheability offers better performance when there are fewer misses, by reducing network transfers. Finally, the system is able to return memory pages to the local node if the overall node memory usage drops, or if the access frequency of those memory pages increases.
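The abstract does not say which fairness algorithm the PMS uses. Purely as an illustration of the idea, the sketch below assumes a simple round-robin policy: it takes the least recently used page from each process in turn, so that no single process loses all of its pages first. The function name, the input layout, and the policy itself are assumptions, not the PMS implementation.

```python
from collections import deque


def pick_pages_to_migrate(lru_by_process, count):
    """Pick up to `count` pages, one LRU page per process per round.

    `lru_by_process` maps a process id to its pages, ordered from least
    to most recently used (hypothetical input layout for this sketch).
    """
    # Copy the per-process lists so the caller's data is left untouched.
    queues = {pid: deque(pages) for pid, pages in lru_by_process.items()}
    chosen = []
    while len(chosen) < count and any(queues.values()):
        for pid in sorted(queues):          # fixed order keeps rounds predictable
            if len(chosen) == count:
                break
            if queues[pid]:
                # The leftmost entry is the least recently used page.
                chosen.append((pid, queues[pid].popleft()))
    return chosen


if __name__ == "__main__":
    pages = {1: [10, 11, 12], 2: [20], 3: [30, 31]}
    print(pick_pages_to_migrate(pages, 4))
```

With the sample input above, the first round takes one page from each of the three processes, and the fourth page comes from process 1 again.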
We evaluate the PMS using several benchmarks that stress the CPU in terms of memory access. We use benchmarks that perform raw serial access on arrays of around a gigabyte in size and thus cause frequent cache eviction, essentially moving more data through the network. That way we can measure the performance drop of a process due to memory access in the worst-case scenario. We also run cache-blocking benchmarks that exploit temporal locality, and we show that they achieve better performance by reducing operations on remote memory. Finally, we observe the behaviour and performance of real HPC applications using the PMS.
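For readers unfamiliar with cache blocking (loop tiling), the sketch below contrasts a plain matrix multiplication with a blocked version that works on small tiles, so that each tile is reused while it is still cached. It is a generic textbook example rather than one of the benchmarks used in the thesis; the block size is an arbitrary assumption, and pure Python only shows the loop structure, not the performance effect.

```python
def matmul_naive(a, b):
    """Plain triple loop: column accesses to b stride across many rows."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same result, computed tile by tile to improve temporal locality."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # The three outer loops walk over tiles of at most block x block elements.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # The three inner loops stay inside the current tiles.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, p)):
                            c[i][j] += aik * b[k][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_blocked(a, b) == matmul_naive(a, b))
```

When the cache in question holds pages fetched from a remote node, fewer misses translate directly into fewer network transfers.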


Slides in PDF.

VAT: Asymptotic Cost Analysis for Multi-Level Key-Value Stores by Nikos Batsaras

11.00 - 11.15

Over the past years, there has been an increasing number of key-value (KV) store designs, each optimizing for a different set of requirements. Furthermore, with advances in storage technology, the design space of KV stores has become even more complex. More recent KV-store designs target fast storage devices, such as SSDs and NVM. Most of these designs aim to reduce amplification during data re-organization by taking advantage of device characteristics. However, to date most analysis of KV-store designs is experimental and limited to specific design points. This makes it difficult to compare tradeoffs across different designs, find optimal configurations, and guide future KV-store design.
In this presentation, we introduce the Variable Amplification–Throughput analysis (VAT) to calculate insert-path amplification and its impact on multi-level KV-store performance. We use VAT to express the behavior of several existing design points and to explore tradeoffs that are not possible or easy to measure experimentally. VAT indicates that by inserting randomness in the insert path, KV stores can reduce amplification by more than 10x for fast storage devices. Techniques such as key-value separation and tiering compaction reduce amplification by 10x and 5x, respectively. Additionally, VAT predicts that advances in device technology towards NVM reduce the benefits of both key-value separation and tiering.


Slides in PDF.

A Rack-scale Key-value Store for Flash Storage and RDMA by Michalis Vardoulakis

11.15 - 11.30

Technical presentation on the system I'm working on at CARV for my Master's thesis.
Abstract: Scale-out persistent key-value stores are at the heart of modern data processing systems. However, they exhibit high CPU and I/O overhead because they use TCP/IP for their communication across servers and target HDDs as their storage devices. With the advent of flash storage and fast networks in datacenters, there is a lot of room for improvement in terms of CPU efficiency. In this paper, we design a scale-out version of Kreon, an efficient key-value store tailored for flash storage, that uses RDMA for its communication. RDMA’s lower protocol overhead and μs latency reduce the impact imposed by replication as well as the latency experienced by the client.


Slides in PDF.

Towards Network Aware Recommendations by Savvas Kastanakis

11.30 - 11.45

Mobile Internet traffic grows exponentially, and multimedia streaming is a dominant contributor to this increase. Multimedia delivery services, such as online video (e.g., YouTube, Netflix, Hulu), audio (e.g., Spotify, Deezer), live streaming for gaming (e.g., Twitch.tv), content over social media (e.g., Facebook), etc., use recommendation systems (RSs) to best satisfy their users and/or maximize their engagement in the service (retention rate). In this work, we propose that jointly designing communication networks and recommendation systems enables content and network providers to keep up with the increasing data demand, while maintaining acceptable Quality of Experience (QoE). Our goal is to investigate the benefits of this concept from both the content provider's and the end user's side. To this end, we conduct a simulation-driven evaluation and an evaluation with real users to quantify the advantages of this idea. We apply our approach to the YouTube service and conduct measurements on YouTube video recommendations. Our analysis suggests that network-aware recommendations could be beneficial for both users (better experience) and content providers (higher retention rates, offloaded core network). We envision the experimental results presented in this work as a first step towards integrating recommendations into networks, to jointly benefit users, content providers, and network systems.


Slides in PDF.

The CAPrice Initiative and the CAP-A project: A bottom-up solution to digital privacy by Ioannis Chrysakis

11.45 - 12.00

CAPrice is an initiative led by FORTH for developing socio-technical solutions that can make an impact on privacy protection in the digital world. CAP-A is an NGI-Trust-funded project that proposes a bottom-up solution to digital privacy by implementing a suite of tools and engaging users to participate in crowdsourcing activities that improve citizen awareness of privacy issues and leverage this awareness to motivate the market to adopt more privacy-friendly practices.


Slides in PDF.