Postgres, ClickHouse and NATS for everything

pdimitar · on Dec 7, 2024

You use NATS as opposed to Kafka, I presume?

Also, what's ClickHouse for? Logs, observability data?

osigurdson · on Dec 7, 2024

NATS is great because it has pub/sub, streaming (like Kafka) and a key-value store (like Redis). It is possible to do some of these things in Postgres, but really, why?

We use ClickHouse for time series data. Postgres was OK up to the low billions of points. We also tried TimescaleDB for this purpose, but it did not fit our use case.

pdimitar · on Dec 7, 2024

Very valuable, thank you. As time series data (and especially observability data) tends to explode in volume very quickly, I believe it's worth planning for tens of billions, if not trillions, of records from the start, and that it is not over-engineering.

If you don't mind one final question: can you ACK messages in NATS individually, i.e. without the ACK being bound to an offset, so that _not_ ACK-ing one message doesn't ruin the ACKs of the messages around it?

To clarify: I often found myself in situations where I was fetching batches of stuff from Kafka, say 50 at a time, and then handing them off to 50 parallel agents to process. However, f.ex. messages 17, 31 and 47 failed processing, and I could not leave them un-ACKed, because that would also hold back the ACKs of the messages after them that had succeeded. So I ended up pushing them to another Kafka topic that deals specifically with retries. That's IMO a hack, as most apps out there surely don't need the monstrous speed that Kafka can provide. I am OK with something (not much) slower where I have the freedom to ACK or not-ACK any particular event/message regardless of its position.
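To make the Kafka constraint concrete, here is a toy Python model. It is not the Kafka client API, and the function names are hypothetical. It assumes one committed offset per partition, where committing offset N marks everything below N as processed, and it treats the failing messages 17, 31 and 47 as offsets in a batch covering offsets 0 to 49.

```python
# Toy model of Kafka-style consumer progress: a single committed offset per
# partition, and every offset below it counts as processed.
# Hypothetical helper names; this is not the Kafka client API.


def committable_offset(results: list[bool], start: int = 0) -> int:
    """Offset to commit (the next offset to read) after processing a batch.

    results[i] is True if the message at offset start + i succeeded.
    Because the commit is cumulative, it can only advance up to the
    first failure; everything after it stays uncommitted.
    """
    for i, ok in enumerate(results):
        if not ok:
            return start + i
    return start + len(results)


def commit_with_retry_topic(results: list[bool], start: int = 0) -> tuple[int, list[int]]:
    """The workaround: copy failed offsets to a (hypothetical) retry topic,
    then commit past the whole batch."""
    retry_topic = [start + i for i, ok in enumerate(results) if not ok]
    return start + len(results), retry_topic


# A batch of 50 messages (offsets 0..49) where offsets 17, 31 and 47 fail.
batch = [i not in {17, 31, 47} for i in range(50)]
print(committable_offset(batch))       # 17: offsets 17..49 stay uncommitted
print(commit_with_retry_topic(batch))  # (50, [17, 31, 47])
```

The first function shows why a single failure stalls progress for the whole rest of the batch; the second is the retry-topic hack, which only works because the failed messages are copied somewhere else before the offset moves past them.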

Does NATS allow for it?

osigurdson · on Dec 7, 2024

Perhaps I misunderstand, but if you have 50 parallel agents, why not just have each one pull messages, process them and ACK each when complete? The part I don't understand is the pre-fetch. Note, however, that NATS is much more flexible than Kafka, so it is likely to fit more use cases (even just for streaming).
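A minimal sketch of that pull-process-ACK pattern, in Python with threads and an in-memory queue standing in for a pull consumer. `run_workers` and the in-memory queue are hypothetical stand-ins, not the NATS API; the only point is that each worker records the outcome of its own message, so one failure has no effect on any other message.

```python
import queue
import threading

# Hypothetical stand-in for a pull consumer: workers share one queue and each
# pulls a single message, processes it, and records its own ACK/NACK.
# This is an illustration only, not the NATS client API.


def run_workers(messages, process, n_workers=4):
    pending = queue.Queue()
    for m in messages:
        pending.put(m)

    acked, nacked = set(), set()
    lock = threading.Lock()  # protects the two result sets

    def worker():
        while True:
            try:
                msg = pending.get_nowait()  # pull one message at a time
            except queue.Empty:
                return  # nothing left to pull
            ok = process(msg)
            with lock:
                # Each message's outcome is independent of every other one.
                (acked if ok else nacked).add(msg)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return acked, nacked


# 50 messages, 50 workers, messages 17, 31 and 47 fail.
acked, nacked = run_workers(range(1, 51), lambda m: m not in {17, 31, 47}, n_workers=50)
print(sorted(nacked))  # [17, 31, 47]
```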

For the first question, I'd definitely recommend using ClickHouse for 10B - 1T points.

pdimitar · on Dec 7, 2024

I mean the following (let me simplify it). You pull stuff from Kafka in batches of 4, parse each message sequentially in the consuming thread, and then hand it off to a parallel worker. These are the results of the parsing:

1. ok

2. error

3. ok

4. ok

I cannot not-ACK message #2 on its own: ACK-ing #3 or #4 would implicitly ACK #2 as well, and leaving #2 un-ACKed means #3 and #4 can't be ACK-ed either.

Does NATS solve this? F.ex. can I get a reference to each message in my parallel workers for them to also say "I am not ACK-ing this because I failed processing it, let the next batch include it again"?

rcombatwombat · on Dec 12, 2024

Yes, this is one of the many differences (advantages) of NATS JetStream over Kafka: with NATS you can explicitly ack each message individually. Even better, if you set your stream to 'work-queue' mode, it will also automatically (and atomically) delete the acked message from the stream, i.e. like a 'proper' queue. That is another difference from Kafka, where you can't delete individual messages in the middle of a stream (you can only trim it from the oldest end).

You can also 'negative ack' (nack) a message when you temporarily can't process it, optionally specifying a back-off period before it is re-delivered (NATS automatically re-delivers messages that are un-acked or nacked). You can 'term' a message (don't try to re-deliver it, e.g. because the payload is bad), or even ask for more time before you need to ack it (if you are temporarily too slow at processing it).
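To illustrate these per-message dispositions, here is a toy Python model of a work-queue stream, using the 4-message example from earlier in the thread plus one bad payload. It is not the NATS API (in the Go client the corresponding message methods are `Ack`, `Nak`, `NakWithDelay`, `Term` and `InProgress`); the class name, the 30-second `ack_wait` value and the explicit `now` timestamps are assumptions made to keep the example deterministic. Whether the real server also deletes a termed message from a work-queue stream is not modelled here.

```python
# Toy model of per-message acknowledgement on a work-queue stream.
# Hypothetical class; it only mimics the behaviours described above.
# Times are plain numbers (seconds) supplied by the caller.


class ToyWorkQueue:
    def __init__(self, ack_wait: float = 30.0):
        self.ack_wait = ack_wait                # redeliver if not acked within this
        self.messages: dict[int, str] = {}      # seq -> payload still in the stream
        self.deliver_at: dict[int, float] = {}  # seq -> earliest (re)delivery time
        self.terminated: set[int] = set()
        self._next_seq = 1

    def publish(self, payload: str, now: float = 0.0) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self.messages[seq] = payload
        self.deliver_at[seq] = now
        return seq

    def fetch(self, now: float) -> list[int]:
        """Deliver every due message; each delivery starts a new ack timer."""
        due = sorted(s for s, t in self.deliver_at.items() if t <= now)
        for seq in due:
            self.deliver_at[seq] = now + self.ack_wait
        return due

    def ack(self, seq: int) -> None:
        # Work-queue mode: an acked message is removed from the stream.
        self.messages.pop(seq, None)
        self.deliver_at.pop(seq, None)

    def nak(self, seq: int, now: float, delay: float = 0.0) -> None:
        # Negative ack: redeliver, optionally only after a back-off delay.
        if seq in self.deliver_at:
            self.deliver_at[seq] = now + delay

    def term(self, seq: int) -> None:
        # Never redeliver this message (e.g. the payload is bad).
        self.deliver_at.pop(seq, None)
        self.terminated.add(seq)

    def in_progress(self, seq: int, now: float) -> None:
        # "I need more time": restart the ack timer.
        if seq in self.deliver_at:
            self.deliver_at[seq] = now + self.ack_wait


q = ToyWorkQueue(ack_wait=30.0)
for payload in ["m1", "m2", "m3", "m4", "bad"]:
    q.publish(payload)

print(q.fetch(now=0.0))       # [1, 2, 3, 4, 5]
for seq in (1, 3, 4):
    q.ack(seq)                # acked individually, removed from the stream
q.nak(2, now=0.0, delay=5.0)  # message 2 failed: retry after a 5 s back-off
q.term(5)                     # bad payload: never redeliver

print(q.fetch(now=1.0))       # [] (message 2 is still backing off)
print(q.fetch(now=5.0))       # [2]
q.in_progress(2, now=20.0)    # slow worker asks for more time
print(q.fetch(now=40.0))      # [] (timer was reset to 20 + 30 = 50)
print(q.fetch(now=50.0))      # [2] (still not acked, so redelivered)
```

Unlike the cumulative-offset model, each sequence number carries its own state here, so message 2 can be retried while 1, 3 and 4 are already gone.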

pdimitar · on Dec 12, 2024

Ohhh, this is an awesomely informative and concrete message! Extremely useful, thank you!

I like everything about this: the ability to NACK individual messages, the specifying of a backoff period, _and_ to just discard a message f.ex. if you really cannot do anything about it. Super nice. I am grateful.

osigurdson · on Dec 7, 2024

Yes. This will work fine. Each message is ACK-ed individually.

pdimitar · on Dec 7, 2024

Thanks. I'm asking because in Kafka if you ACK a message at offset 15 then all messages from 1 to 14 are ACK-ed as well. You can't just say "ACK all from 1 to 15 except 9".

But if NATS supports that use case then great, I'll migrate to it for that reason alone.