In modern data processing architectures, developing a pipeline is fairly simple thanks to feature-packed Spark and its APIs. But the fun and challenging part is optimising the system to achieve the best performance and higher processing throughput. Optimising a distributed system is tricky because improvements are not always a linear function of adjusting the configuration. The most important thing before working on optimisation is to know what the system you are building is primarily intended to do. Optimisation ain’t no fairytale, as it comes at the cost of settling for the…


Introduction

Elasticsearch is an Apache Lucene-based distributed query processing system for building search and analytical systems. It is a Java-based implementation that reads a JSON request and responds with a JSON payload. It provides APIs for invoking the query interface and is widely used in industry for fast analytical search on huge volumes of data. The ELK stack provides a wide range of capabilities and exposes a complete ecosystem for storing, processing, querying and transforming huge volumes of transactional logs, and it provides excellent monitoring and visualisation capabilities on the dataset. The data management part in…

Shishir Chandra

Distributed computing enthusiast, data engineer, system architect, cloud computing
