
What is serverless?

Serverless architectures are applications that depend significantly on third-party services (known as Backend as a Service, or “BaaS”) or on custom code that runs in ephemeral containers (Function as a Service, or “FaaS”).

Why serverless?

Serverless ETL (extract-transform-load) is becoming the future for teams that want to stay focused on building their core features and machine learning (ML) engine rather than maintaining a large infrastructure to power data pipeline transformation. At Bsquare, we deal with two types of data: (1) big data for building, training, and testing machine learning models and (2) real-time streaming data from IoT devices. The streaming data needs to be visualized in near real time so that customers can gain insight into their data. Our product, DataV, is capable of managing a large number of IoT devices and processing the telemetry data collected from them.

In the past, we processed device telemetry on huge servers that handled micro-batches using Apache Spark Streaming. For a few use cases, we used Apache Storm to take advantage of its spouts and bolts. Though these Apache frameworks are very powerful solutions for big data and for telemetry received in linear time, we had to handle auto-scaling and maintain massive servers ourselves, and we paid for those servers even when no data was received. Serverless architecture solves many of these problems.

Zero administration

Deploy your codebase in any language without provisioning or managing anything. There is no concept of a fleet, an instance, or even an operating system. No more bothering the DevOps department.

Auto-scaling

Let the platform scale your resources automatically, with no effort on your part. There is no need to fire alerts or write scripts to scale up and down. Handle quick bursts of traffic and spend your weekend with peace of mind.

Pay-per-use

FaaS compute and managed service charges are based on usage rather than pre-provisioned capacity. You get complete resource utilization without paying a cent for idle time. You can save approximately 90% of the cost of a cloud VM, with the added satisfaction of knowing that you never pay for resources you don’t use.
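To make the pay-per-use idea concrete, here is a minimal sketch that compares an always-on VM bill with a usage-based FaaS bill. The function names, the billing model (a per-GB-second compute charge plus a per-request charge), and every number in the example are assumptions for illustration only; they are not real prices or Bsquare figures, and actual savings depend heavily on how idle the workload is.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_vm_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Cost of a VM that is billed for every hour, busy or idle."""
    return hourly_rate * hours


def monthly_faas_cost(invocations, avg_seconds, memory_gb,
                      price_per_gb_second, price_per_million_requests):
    """Cost of a function billed only for the time it actually runs."""
    compute = invocations * avg_seconds * memory_gb * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests


def savings_fraction(vm_cost, faas_cost):
    """Share of the VM bill avoided by moving to pay-per-use."""
    return 1 - faas_cost / vm_cost


if __name__ == "__main__":
    # All values below are placeholders, not real prices or workloads.
    vm = monthly_vm_cost(hourly_rate=0.10)
    faas = monthly_faas_cost(
        invocations=2_000_000,
        avg_seconds=0.2,
        memory_gb=0.5,
        price_per_gb_second=0.00002,
        price_per_million_requests=0.20,
    )
    print(f"VM: {vm:.2f}  FaaS: {faas:.2f}  saved: {savings_fraction(vm, faas):.0%}")
```

Plugging in your own provider's rates and your real invocation counts shows quickly whether a workload is bursty enough for usage-based billing to pay off.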

Increased velocity

Give more power to your glowing idea bulb by shortening the loop between having an idea and deploying it to production. Because you do less provisioning up front and less management after deployment, smaller teams can ship more features. It’s easier than ever to make your idea live.

How serverless?

AWS Lambda is one of the best FaaS options. It is an event-driven, serverless computing platform that runs code in response to events and automatically manages the computing resources required by that code. The purpose of Lambda, as compared to AWS EC2, is to simplify building smaller, on-demand applications that are responsive to events and new information. AWS targets starting a Lambda instance within milliseconds of an event for Node.js, Python, and Go, while JVM-based functions can take seconds to spin up.

At Bsquare, we receive massive amounts of data from thousands of IoT devices. Some of it arrives as MQTT messages at varying intervals. This data needs to be transformed into human-readable tables and visualized for customers as graphs and analytical listings with little latency. We route the telemetry data into a Kinesis data stream, and as soon as a message arrives in a Kinesis shard, a Lambda function is triggered. The triggered Lambda function spins up, performs the transformation, and pushes the data into another Kinesis stream for the next level of analysis or for machine learning services. The Java code that lives inside Lambda can perform 40 types of transformations and can be called many times, with the scaling taken care of by AWS.
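The flow described above (Kinesis record in, transformation, Kinesis record out) can be sketched roughly as follows. The post says the production code is Java; this sketch uses Python only for brevity and is not Bsquare's implementation. The output stream name, the message fields (`deviceId`, `ts`, `metrics`), the `transform` function, and the optional `client` argument are all hypothetical. What is standard is that Lambda delivers Kinesis payloads base64-encoded under `Records[].kinesis.data`, and that boto3's Kinesis client exposes `put_records`.

```python
import base64
import json
import os

# Hypothetical downstream stream name; not taken from the original post.
OUTPUT_STREAM = os.environ.get("OUTPUT_STREAM", "telemetry-transformed")


def transform(message):
    """Flatten one telemetry message into a table-like row (illustrative only)."""
    row = {"device_id": message.get("deviceId"), "timestamp": message.get("ts")}
    # Promote each nested metric to its own column.
    for name, value in message.get("metrics", {}).items():
        row[f"metric_{name}"] = value
    return row


def decode_records(event):
    """Yield (partition_key, payload) for each Kinesis record in a Lambda event."""
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        # Lambda hands over the Kinesis payload as a base64-encoded string.
        payload = json.loads(base64.b64decode(kinesis["data"]))
        yield kinesis["partitionKey"], payload


def handler(event, context, client=None):
    """Lambda entry point: transform a batch and forward it downstream."""
    entries = [
        {"Data": json.dumps(transform(payload)).encode("utf-8"), "PartitionKey": key}
        for key, payload in decode_records(event)
    ]
    if not entries:
        return {"forwarded": 0}
    if client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS

        client = boto3.client("kinesis")
    # put_records accepts at most 500 entries per call; a larger batch
    # would need to be split into chunks, which this sketch does not do.
    response = client.put_records(StreamName=OUTPUT_STREAM, Records=entries)
    return {"forwarded": len(entries), "failed": response.get("FailedRecordCount", 0)}
```

Keeping `transform` a pure function, separate from the handler, means each of the many transformation types could be unit-tested without touching AWS, while the handler stays a thin layer that only decodes, calls the transformation, and forwards the result.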

Final thoughts

Without reinventing the wheel, we can easily solve the pipeline transformation problem for our IoT Solutions, Data Science Consulting, and DataV customers. By shifting from humongous servers to highly scalable AWS Lambda, we save our customers thousands of dollars as well as time. Shifting to cloud-based services doesn’t eliminate engineering effort, but it redirects that effort toward better things. With more time available, we can help your data make more sense. Find out how the Bsquare team can help you harness the full potential of your data to optimize your ROI.
