Data Lineage – bigdata-pilot.com

If Your AI Use Case Needs Perfect Data, It’s Not a Use Case—It’s a Wishlist

Let’s get something out of the way: Your data isn’t perfect. It never was. It never will be. It’s late. It’s missing. It’s mislabeled. The schema changed without warning. A key field is suddenly NULL for 3,000 rows. And the lookup table you depend on? It got overwritten at 2 a.m. by someone testing a new […]

Kafka Isn’t Just a Queue. And Flink Isn’t Just a Buzzword.

Architecture, Data Streaming, Stream Pipelines, Uncategorized / Dominique Ronde

Why real-time systems aren’t luxury infrastructure: they’re how smart businesses stay ahead. Let’s get one thing out of the way: Batch is fine for laundry. Not for decisions. Most companies still move data the same way they moved it in 2005: extract, load, wait, analyze, repeat. It’s comfortable. It’s familiar. But it’s also a few hours, or days, behind what’s

Implementing Real-Time Data Products with Apache Kafka and Apache Flink (Part 3)

Architecture / Dominique Ronde

As we explored in the previous parts of this series, high-quality, real-time data is essential for AI and ML applications. Now, let’s take a closer look at how to implement real-time data products effectively using Apache Kafka and Apache Flink. This part focuses on two crucial features of Flink that enable reliable and

Challenges in Building and Maintaining Data Products for AI and ML (Part 2)

Processing / Dominique Ronde

Building and maintaining data products for AI and ML is not just about collecting data—it is about ensuring data quality, scalability, and accessibility. Without addressing these challenges, AI models will produce unreliable results, and organizations will struggle to use data effectively. Two of the biggest challenges in this area are data quality and scalability. Ensuring

What Are Data Products and Why Do They Matter for AI and ML? (Part 1)

Architecture / Dominique Ronde

I still remember the early 2010s, when the term "big data" was widely discussed in the tech industry. Companies were encouraged to collect as much data as possible, seeing it as a key resource (or "the new oil," as we called it in those days) for the digital economy. However, focusing only on

Trade Monitoring and Pattern Matching with Flink and Kafka

Processing, Stream Pipelines / Dominique Ronde

Financial markets generate one of the densest streams of real-time data we can observe today. Price ticks, order submissions, cancellations, executions, and settlement instructions all occur at millisecond scale. Within that torrent of activity, regulators and trading firms need to detect suspicious behavior: wash trades, spoofing, layering, or coordinated account activity. The traditional approach—batch analysis
