I'll be honest: it took me way too long to realize that Flink is just the German word for "nimble" or "quick".
And I'm a native speaker. Go figure.
The name fits though. Flink is fast. But what surprised me even more is how much you can do with just SQL. I'm talking about full streaming pipelines, with joins, filters, fan-outs, even windowed aggregations—without writing a single line of Java or Python.
Once you get past the initial friction, Flink turns out to be one of the most flexible and surprisingly lightweight tools in the real-time space.

DataStream, Table API, SQL
When working with Flink, you’ll come across three main approaches:
- The DataStream API gives you full control, perfect for complex event handling, custom serialization, or exotic use cases.
- The Table API feels like writing SQL but in code. It’s expressive and easier to follow than the raw stream logic.
- Then there’s pure SQL, which is what I’ve ended up using the most.
With SQL, you connect to Kafka, Kinesis, JDBC, or Iceberg. You define your sources and sinks as tables. Then you write queries that run continuously on the incoming data.
This is not one-off querying. It’s declarative streaming pipelines, entirely in SQL.
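As a rough sketch, such a pipeline boils down to three statements: a source table, a sink table, and a continuous INSERT. The topic, broker, database, and column names below are made up for illustration; the connector options are the standard ones from Flink's Kafka and JDBC connectors.

CREATE TABLE events (
  event_id STRING,
  event_type STRING,
  user_id STRING,
  amount DOUBLE,
  event_time TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',                              -- illustrative topic name
  'properties.bootstrap.servers' = 'broker:9092',  -- illustrative broker
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);

CREATE TABLE purchases (
  event_id STRING,
  user_id STRING,
  amount DOUBLE
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:postgresql://db:5432/analytics',   -- illustrative database
  'table-name' = 'purchases'
);

-- This INSERT never "finishes": it keeps running and writes every
-- matching event as it arrives.
INSERT INTO purchases
SELECT event_id, user_id, amount
FROM events
WHERE event_type = 'purchase';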
The Flink SQL client
Once your Flink cluster is running, you can start an interactive shell that lets you explore and run SQL statements live:
./bin/sql-client.sh embedded
This shell becomes your control center. You can describe tables, inspect schemas, and run ad hoc queries to get a feel for the data flowing through your streams.
Especially when dealing with something like Kinesis or Kafka, this is a huge help. You can instantly validate what's coming in, try filters, and prototype logic before deploying it.
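For example, a quick exploration session might look like this (the events table is just a placeholder for whatever you have registered):

Flink SQL> SHOW TABLES;
Flink SQL> DESCRIBE events;
Flink SQL> SELECT event_type, COUNT(*) AS cnt FROM events GROUP BY event_type;

The SELECT runs as a continuous query: the result keeps updating in the client as new events arrive, which makes it a convenient way to sanity-check a filter or a join before wiring it into a real sink.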

A real-world case: one collector instead of ten microservices
At one client, I stepped into a situation where they had multiple APIs, all doing the same thing in slightly different ways.
Different teams, different formats, different behaviors—but at the end of the day, they all just wanted to collect events and get them into the pipeline.
Instead of running and maintaining ten separate services, I built a single Go-based collector. It accepts everything, normalizes the data into a shared schema, batches the events, and pushes them to Kinesis.
The collector acts as a buffer and a contract enforcer. It also protects the system from misuse, rate spikes, or malformed requests.
The Flink job: as simple as it gets
Once the collector is in place, the Flink job becomes almost boring:
- One consumer reads from Kinesis
- Three INSERT INTO statements push to different Iceberg tables
- A few WHERE clauses split the data based on event type
That's it. No custom operators. No orchestration. Just SQL.
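Sketched out, the job looks roughly like this. The stream, table, and event-type names are invented, and the Iceberg sink tables are assumed to already exist in a registered catalog. Wrapping the inserts in a statement set lets Flink plan all three as a single job that reads the source once.

-- Source: the Kinesis stream the collector writes to (names illustrative)
CREATE TABLE raw_events (
  event_id STRING,
  event_type STRING,
  payload STRING,
  event_time TIMESTAMP(3)
) WITH (
  'connector' = 'kinesis',
  'stream' = 'collector-events',
  'aws.region' = 'eu-central-1',
  'format' = 'json'
);

-- Fan-out: three filtered inserts into Iceberg, grouped into one job
EXECUTE STATEMENT SET
BEGIN
  INSERT INTO iceberg_catalog.analytics.pageviews
    SELECT event_id, payload, event_time FROM raw_events WHERE event_type = 'pageview';
  INSERT INTO iceberg_catalog.analytics.purchases
    SELECT event_id, payload, event_time FROM raw_events WHERE event_type = 'purchase';
  INSERT INTO iceberg_catalog.analytics.errors
    SELECT event_id, payload, event_time FROM raw_events WHERE event_type = 'error';
END;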
This is what I mean when I say Flink lets you move logic to the edge of the system. The heavy lifting is already done upstream. Flink just connects the dots and handles the scale.
A note on Iceberg
Even though we're writing into Iceberg tables, I wouldn't always recommend using them as the first destination in your stream.
Flink's current Iceberg connector is still evolving. For example, features like hidden partitioning or some of Iceberg's more advanced tuning options are not yet fully supported.
In setups like this, I treat Iceberg more as the final consumer layer—where long-term querying, BI, or analytics happen.
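For reference, pointing Flink at an Iceberg catalog is itself just SQL. The catalog type and warehouse location below are placeholders that depend entirely on your setup:

CREATE CATALOG iceberg_catalog WITH (
  'type' = 'iceberg',
  'catalog-type' = 'hadoop',                 -- or 'hive', depending on your setup
  'warehouse' = 's3://my-bucket/warehouse'   -- placeholder location
);

USE CATALOG iceberg_catalog;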
If you're curious how to work around some of those limitations, especially when dealing with late data or optimized query patterns, I wrote a deeper breakdown of the topic here. That article walks through some of the practical design choices I made to bridge the gap between Flink and Iceberg's capabilities.
What about late data and time windows?
This is another area where Flink shines.
You can define time-based windows on event time instead of processing time. That way, data that arrives late still lands in the window it actually belongs to, as long as it shows up within the lateness you allow.
This is incredibly helpful when tracking revenue, sessions, or engagement metrics—because let's face it, events don't always arrive in order. Especially not from mobile apps or cross-region systems.
Flink lets you set a watermark strategy and configure how much lateness you're willing to tolerate. The result is more accurate data, less duplication, and smoother aggregation logic.
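As a minimal sketch, the watermark lives in the source table definition and the window in the query. The 30-second delay, the 5-minute window, and the table name below are placeholders you would tune to your own data.

CREATE TABLE purchase_events (
  user_id STRING,
  amount DOUBLE,
  event_time TIMESTAMP(3),
  -- Tolerate events arriving up to 30 seconds out of order
  WATERMARK FOR event_time AS event_time - INTERVAL '30' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'purchases',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json'
);

-- Revenue per 5-minute window, based on when the event happened,
-- not when it reached Flink
SELECT window_start, window_end, SUM(amount) AS revenue
FROM TABLE(
  TUMBLE(TABLE purchase_events, DESCRIPTOR(event_time), INTERVAL '5' MINUTES)
)
GROUP BY window_start, window_end;

Events that arrive later than the watermark allows are dropped from the aggregate by default, which is exactly the trade-off the delay setting lets you control.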
Final thoughts
Flink might look complex at first, especially if you come from batch jobs or traditional SQL warehouses.
But once you get used to it, and especially once you realize how far you can go with SQL alone, it becomes a very practical and maintainable piece of your data stack.
In my case, I went from dealing with ten brittle services to a clean collector and one SQL-based fan-out job. No more microservice sprawl. No more gluing together random scripts.
Just one fast engine. One source of truth. And one clear pipeline.
If you're exploring streaming or want help cleaning up your event pipeline, feel free to reach out. I'd be happy to share templates or help you get your own Flink setup running.