The Future of kdb+?

It’s been two years since I worked full-time in kdb+, but people always seem to want to talk to me about kdb+ and where I think it’s going. To save rehashing the same debates, I’m going to put my views here and refer people to this post in future. Please leave a comment if you want and I will reply.

Let’s first look at the use cases for kdb+, consider the alternatives, and then say which I think will win for each use case and why.

Use Cases

A. Historical market data storage and analysis – e.g. MS Horizon, Citi CloudKDB, UBS Krypton (all three of which I worked on).
B. Local quant analysis – e.g. Liquidity analysis, PnL analysis, profitability per client.
C. Real-time Streaming Calculation Engines – e.g. Streaming VWAP, Streaming TCA…
D. Distributed Computing – e.g. Margin calculations for stock portfolios or risk analysis. Spread data out, perform costly calcs, recombine.
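Use case D is essentially a scatter/gather pattern. Below is a minimal Python sketch of its three steps, under stated assumptions:

- The margin model is a hypothetical flat rate on gross notional.
- The positions are made up.
- `POSITIONS`, `MARGIN_RATE`, `chunk`, `margin_for_chunk` and `total_margin` are illustrative names, not any real system's API.

Threads stand in for the separate kdb+ processes or machines a real deployment would spread the work across. Because of Python's GIL, they won't speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical positions: (symbol, quantity, price). Made-up data.
POSITIONS = [
    ("AAPL", 100, 190.0),
    ("MSFT", -50, 410.0),
    ("VOD", 2000, 0.7),
    ("BP", 300, 4.8),
]

# Hypothetical flat margin rate; real margin models are far more involved.
MARGIN_RATE = 0.25


def chunk(items, n):
    """'Spread data out': split items into n interleaved slices."""
    return [items[i::n] for i in range(n)]


def margin_for_chunk(positions):
    """'Perform costly calcs': margin on the gross notional of each position."""
    return sum(abs(qty) * px * MARGIN_RATE for _, qty, px in positions)


def total_margin(positions, workers=2):
    """Scatter chunks to workers, compute in parallel, then 'recombine' by summing."""
    parts = chunk(positions, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(margin_for_chunk, parts))
    return sum(partials)


if __name__ == "__main__":
    print(total_margin(POSITIONS, workers=2))
```

Because the recombine step here is a plain sum, the result does not depend on how the positions were split. A margin model that nets positions against each other would need a smarter partitioning, or a second aggregation step.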

Alternatives

Historical Market Data – kdb+ Alternatives

A large number of users want to query big data to get minute bars, perform asof joins, or do more advanced time-series analysis.

  • New Database Technologies – ClickHouse, QuestDB.
  • Cloud Vendors – BigQuery / Redshift
  • Market Data as a Service
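For readers who haven't built one: a minute bar is an open/high/low/close/volume summary of the trades in each one-minute bucket. q expresses this compactly with `xbar`. Here is a minimal plain-Python sketch of the same idea, assuming the ticks arrive in time order. `TRADES` and `minute_bars` are hypothetical names, and the data is made up.

```python
from datetime import datetime

# Hypothetical trade ticks: (timestamp, price, size), already in time order.
TRADES = [
    (datetime(2024, 8, 1, 9, 30, 5), 100.0, 200),
    (datetime(2024, 8, 1, 9, 30, 40), 100.5, 100),
    (datetime(2024, 8, 1, 9, 30, 59), 99.8, 300),
    (datetime(2024, 8, 1, 9, 31, 10), 100.2, 150),
]


def minute_bars(trades):
    """Bucket time-ordered (timestamp, price, size) ticks into one-minute OHLCV bars."""
    bars = {}  # dicts keep insertion order, so bars come out in time order
    for ts, px, sz in trades:
        minute = ts.replace(second=0, microsecond=0)  # floor the timestamp to its minute
        bar = bars.get(minute)
        if bar is None:
            # The first trade in this minute opens the bar.
            bars[minute] = {"open": px, "high": px, "low": px, "close": px, "volume": sz}
        else:
            bar["high"] = max(bar["high"], px)
            bar["low"] = min(bar["low"], px)
            bar["close"] = px  # the latest trade so far closes the bar
            bar["volume"] += sz
    return bars


for minute, bar in minute_bars(TRADES).items():
    print(minute.strftime("%H:%M"), bar)
```

The time-ordering assumption matters, because open and close are simply the first and last prices seen in each bucket.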

Let me tell you three secrets:

  1. Most users don’t need the “speed” of kdb+.
  2. Most internal bank platforms don’t fully unleash the speed of kdb+.
  3. The competitors are now fast enough, and you don’t have to take anyone’s word for it: ClickBench is totally transparent about its benchmarking.

Likely Outcome: kdb+ can hold on to its existing clients, but it hasn’t won and won’t win the second-tier firms, as they want either cloud-native or something else. The previous major customers for this use case had to invest heavily to build their own platforms. From what I’m hearing, the kdb cloud platform still needs work.

Local Quant Analysis – Alternatives

  • Python – with DuckDB
  • Python – with Polars
  • Python – with PyKX
  • Python – with dataframe/Modin/…

Now, I’m exaggerating slightly, but the local quant analysis game is over and everyone has realised that Python has won. The only question is who will provide the speedy add-on. In one corner we have hugely popular free community tools that are fast, well funded and know how to generate interest at huge scale. In the other we have a niche company that never spread outside finance, wants to charge $300K to get started, and has an exotic syntax.

Likely Outcome: DuckDB or Polars. Why? They’re free. People at university will start with them and not change. Any sensible quant currently at a firm will want to use a free tool, so that they are guaranteed to be able to use similar analytics at their next firm. Without that ability, they can only move to places that have kdb+, or else lose a large percentage of their skillset.

Real-time Streaming / Distributed Computing

These were always the less popular use cases for kdb+ and never the ones that “won” the contract. The ironic thing is that combining streaming with historical data in one model is kdb+’s greatest strength. However, the few times I’ve seen it done, it has either taken someone very experienced and skillful, or it has become a mess. These messes have been so bad that they’ve put other parts of the firm off adopting kdb+ for other use cases.
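The pattern that makes this combination work is that one handler processes both replayed history and live ticks. This is roughly how a kdb+ real-time database recovers: it replays the tickerplant log and then carries on with new updates.

Below is a minimal Python sketch of that idea for a streaming VWAP. `StreamingVwap`, `replay`, `HISTORY_LOG` and `LIVE_TICKS` are hypothetical names, and the trades are made up.

```python
class StreamingVwap:
    """Per-symbol VWAP maintained from running sums, so each trade costs O(1)."""

    def __init__(self):
        self.notional = {}  # symbol -> running sum of price * size
        self.volume = {}    # symbol -> running sum of size

    def on_trade(self, sym, price, size):
        # One handler for both replayed history and live ticks.
        self.notional[sym] = self.notional.get(sym, 0.0) + price * size
        self.volume[sym] = self.volume.get(sym, 0) + size

    def vwap(self, sym):
        vol = self.volume.get(sym, 0)
        return self.notional[sym] / vol if vol else None


def replay(log, engine):
    """Rebuild state by re-feeding an append-only log of past trades, in order."""
    for sym, price, size in log:
        engine.on_trade(sym, price, size)


# Hypothetical data: a log of earlier trades, then new live ticks.
HISTORY_LOG = [("ABC", 10.0, 100), ("ABC", 11.0, 300)]
LIVE_TICKS = [("ABC", 12.0, 100)]

engine = StreamingVwap()
replay(HISTORY_LOG, engine)            # recover from history first...
for sym, price, size in LIVE_TICKS:
    engine.on_trade(sym, price, size)  # ...then keep going on live data
print(engine.vwap("ABC"))
```

Keeping running notional and volume sums, rather than recomputing over every trade, makes each update constant-time. Because history and live data share `on_trade`, there is only one code path to get right.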

Likely Outcome: Unsure which will win, but not kdb+. Kafka has won mindshare and is deployed at scale, but Flink, RisingWave etc. are the upcoming stars.

Summary

Kdb+ is an absolutely amazing technology, but it’s about as amazing today as it was 15 years ago when I started. In that time the world has moved on. The best open-source companies have stolen the best kdb+ ideas:

  • Parquet/Iceberg is basically the kdb+ on-disk format: optimized column storage.
  • Apache Arrow’s in-memory format is basically the kdb+ in-memory column format.
  • Even the Kafka log/replay/ksql concept could be viewed, from a certain angle, as similar to a kdb+ tickerplant log (tplog).
  • QuestDB, DuckDB and ClickHouse all have asof joins.
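What these formats and engines share is a column-oriented layout plus time-aware operations like the asof join, which q exposes as `aj`. Below is a minimal sketch in plain Python (no Parquet, Arrow or kdb+ involved) of an asof join built from a per-symbol binary search. It assumes tables are held as a dict of equal-length column lists and that the right-hand table is sorted by time. The names `trades`, `quotes` and `asof_join`, and all the data, are hypothetical.

```python
from bisect import bisect_right

# Column-oriented tables: one list per column, similar in spirit to a kdb+ table
# or an Arrow record batch. Hypothetical data; each table is sorted by time.
trades = {"time": [1, 5, 9], "sym": ["A", "A", "A"], "price": [100.0, 101.0, 102.0]}
quotes = {"time": [0, 4, 6, 10], "sym": ["A", "A", "A", "A"], "bid": [99.5, 100.5, 100.8, 101.9]}


def asof_join(left, right, on_time="time", by="sym"):
    """Attach to each left row the right row for the same key with the latest
    time <= the left row's time (None if there is no such row).

    Assumes right is sorted by time and that, apart from the time and key
    columns, the two tables have no column names in common.
    """
    # Per key, collect the right table's times and row positions in time order.
    index = {}
    for i, (t, k) in enumerate(zip(right[on_time], right[by])):
        times, rows = index.setdefault(k, ([], []))
        times.append(t)
        rows.append(i)

    out = {name: list(col) for name, col in left.items()}
    extra = [c for c in right if c not in (on_time, by)]
    for c in extra:
        out[c] = []

    for t, k in zip(left[on_time], left[by]):
        times, rows = index.get(k, ([], []))
        pos = bisect_right(times, t) - 1  # last right row at or before t
        for c in extra:
            out[c].append(right[c][rows[pos]] if pos >= 0 else None)
    return out


print(asof_join(trades, quotes)["bid"])
```

Each trade picks up the prevailing bid at its timestamp. Trades with no earlier quote for their symbol get None. Storing each column contiguously is what lets these engines scan only the columns a query needs.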

Not only have the competitors learnt from and taken the best parts of kdb+, they have standardised on them. For example, Snowflake, Dremio, Confluent and Databricks are all going to support Apache Iceberg/Parquet, and QuestDB, DuckDB and Python are all going to support Parquet natively. This means that in comparisons it’s no longer KX against one competitor; it’s KX against many competitors at once. If your data is in Parquet, you can run any of them against it.

As many at KX would agree, I’ve talked to them for years about these issues, and to be fair they have changed, but they are not changing quickly enough.
They need to do four things:

  1. Get a free version out there that can be used for many things, and offer a simple, reasonably priced license for customers with less money.
  2. Focus on making the core product great. For years it was Delta this and Delta that, and now it’s KDB.AI. In the meantime, MongoDB and InfluxDB won huge contracts with a good database alone.
  3. Reduce the steep learning curve. Make kdb+ easier to learn, even by changing the language and technology if need be.
  4. Become more popular, or face a slow death.

This focuses on the core tech product.
If you look more widely at their financials and at other huge costs and initiatives, such as AI and massive marketing spending, wider changes at the firm should also be considered.

2024-08-03: This post got 10K+ views on the front page of Hacker News. To see the follow-up discussion, go here.

Author: Ryan Hamilton

 

5 Responses to “The Future of kdb+?”


  1. NP

    Interesting article Ryan. I agree building streaming analytics combining real time and historical data is a very powerful use case of kdb+. There are good principles for building these systems. If you follow these principles, you’ll have super elegant and efficient applications (like real time TCA). If you don’t, you’ll have an awful mess. So KX should be teaching everyone these principles for free.

  2. Ryan Hamilton

    @NP – I wish they had recorded your lessons way back then. Thanks for teaching me anyway.

  3. Neil Kanungo

    Full disclosure: I work for KX. In fact, my job is to connect with developers to learn about their experience with KX, so I can help to make it better. I am always open to feedback about what we can improve, and while no product is perfect, I think there’s a lot in this blog that’s worth addressing.

    For benchmarks, I would check out STAC M3… kdb+ holds 17 world records there and that is something we’re proud of. The Clickbench benchmarks cited in the article, however, aren’t designed for time series databases and kdb+ isn’t included (probably for that reason). I don’t think it’s relevant here. We also think that speed – and performance in general – is still important to our customers, as they continue to affirm.

    As far as accessibility is concerned, I’d like to address in multiple parts:

    1) We are invested in creating cloud-native features that are more appealing for smaller firms

    2) q is the best language out there (in our opinion), but we also offer a path for Python (including Polars) and SQL developers, which is essential to expanding the kdb+ userbase to the maximum extent. Our Fusion interfaces were built to enable more interoperability. We also don’t mandate language lock-in… there is nothing preventing other languages from being used with kdb+.

    3) Pricing: this comes up a lot. We already offer a free edition of kdb+ for non-commercial use that is very popular. We recognize there’s more we can do in this area (an opinion expressed by KX leadership too), so new pricing models are actively being evaluated.

    4) Our latest release of kdb+ 4.1 included a renewed focus on ease of installation and use, and a new documentation hub is being launched this year to further enhance the developer experience.

    5) Our Community is growing rapidly – with now over 6000 members and 10 courses available in KX Academy. We have more and more developers networking to help others learn kdb+ every day with a month-over-month net new increase of members for the past 30 months. We’ve recently launched a Slack channel and developer advocacy program too.

    There’s a lot of criticism about kdb+ (and KX) in this article, but a lot of the things devs love the most about kdb+ have been left out. This includes efficiency/compactness, expressiveness of q, vertical integration, and speedy development workflow. Sure, if you want to combine 3-5 tools to do what kdb+ does you can go that route, but we feel we offer a vastly superior experience with performance at scale. A quality that extends to ALL our products, including Delta & KDB.AI, since they are all built on kdb+.

    Note: I reached out to the author to discuss, but he declined to talk to us. Since he mentions that comments will receive replies, I thought this would be a better way to get a more open conversation going.

  4. Andrei

    KDB is an absolute nightmare, a barbaric piece of tech that should have never existed.

    Here is a link on how you do queries: https://code.kx.com/q/basics/funsql/

    TL;DR;

    This is a select:
    q)t:([] c1:`a`b`a`c`a`b`c; c2:10*1+til 7; c3:1.1*1+til 7)

    And this is another select:
    q)?[t; ((>;`c2;35);(in;`c1;enlist[`b`c])); 0b; ()]

    Mind that these are the basic queries :)))))

    The future of kdb+ is in the toilet.

  5. Ryan Hamilton

    @Neil – Thanks for replying and it’s good to hear that KX are listening.

    >>Note: I reached out to the author to discuss, but he declined to talk to us.
    I would push back a little on this point. I declined the specific meeting as I was already scheduled to be in your Belfast office talking to Victoria on the 9th August, the day you posted here. I’ve made 3 such trips + other informal chats this year and no followup action or changes were ever communicated to me.

    I would also suggest that the community as a whole, including myself, has had a laid-back attitude of superiority and eliteness. As an example, many (again including myself) have previously taken great pleasure in code golfing short, incomprehensible code. Your reply reflects some of that approach. To take two quotes:
    “Sure, if you want to combine 3-5 tools to do what kdb+ does you can go that route, but we feel we offer a vastly superior experience”
    “q is the best language out there”

    To quote two others:
    Andy Grove – CEO Intel – “Only the Paranoid Survive”
    Steve Jobs on Picasso – “Good artists copy, great artists steal”

    We need to look outside our myopic bubble. Companies need to be paranoid that their competition has something great that will wipe them out. Most firms already have those 2-5 other tools and in some cases they have great ideas. We need to embrace and reuse their ideas.

    Let me present a challenge:
    1. What are some areas where kdb+ is not good?
    2. Which competitor does some piece of core tech better than KX, such that we should take inspiration from it?

    To be fair, I will do the same for Pulse:
    1. Pulse is not as friendly as Tableau for beginner users and may never be. My belief is that real-time visualisation with large data requires skilled users to build.
    2. Grafana is very user-friendly, and their move to the Arrow data format for websocket communication is good future-proofing. I wish I had chosen it for Pulse.