This quote from Charity Majors is probably the best summary of the current state of observability in the tech industry: total, mass confusion. Everyone is confused. What is a trace? What is a span? Is a log line a span? Do I need traces if I have logs? Why do I need traces if I have great metrics? The list of questions like these goes on. Charity - together with other great folks from the observability system called
Thank you so much for writing this. You are so spot on!
- Logs capture information with the intent to diagnose issues. Logs should not capture success cases. They should focus on failure cases.
- Metrics capture aggregates like counters. Metrics should be derived from Events.
- Events are data points in a series + metadata. The metadata enables slicing and dicing data points in innumerable ways.
Events are the unlock. When you start with events, metrics are easy. When you start with metrics, events are impossible. And logs ... blech. What a mess. Logs are the junk drawer of observability.
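The "metrics are easy once you have events" point can be sketched in a few lines. This is a toy in-memory example with hypothetical event fields, not any particular system's API:

```python
from collections import Counter

# Hypothetical wide events: one record per request, with metadata
# ("service", "region", ...) that enables slicing and dicing after the fact.
events = [
    {"service": "checkout", "region": "eu", "status": 200, "duration_ms": 35},
    {"service": "checkout", "region": "us", "status": 500, "duration_ms": 120},
    {"service": "checkout", "region": "eu", "status": 200, "duration_ms": 41},
]

# Metrics fall out of events: aggregate by any dimension, chosen at query time.
errors_by_region = Counter(e["region"] for e in events if e["status"] >= 500)
print(errors_by_region)  # Counter({'us': 1})
```

Going the other way is impossible: a pre-aggregated counter of errors per region cannot be re-split by a dimension that was never recorded.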
Love this. It is very difficult to make sense of the OpenTelemetry concepts.
Great article, thanks!
Here is a link to the original Scuba paper I guess: https://research.facebook.com/publications/scuba-diving-into-data-at-facebook/
Maybe you know other public sources with its implementation details?
Scuba actually reminds me of Graylog's GELF logs: you can have a message with key/value pairs and then aggregate/filter/group them in the Graylog UI.
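For readers unfamiliar with GELF, a payload is a flat JSON document whose custom key/value pairs are underscore-prefixed (per the GELF 1.1 spec); a minimal sketch, with made-up host and field values:

```python
import json
import time

# Minimal GELF-style payload (sketch). "version", "host", "short_message",
# "timestamp" and "level" are standard GELF 1.1 fields; underscore-prefixed
# keys are the custom fields Graylog lets you filter and group on.
gelf = {
    "version": "1.1",
    "host": "checkout-7",            # hypothetical host name
    "short_message": "payment declined",
    "timestamp": time.time(),
    "level": 3,                      # syslog severity: error
    "_order_id": "o-123",            # custom, queryable key/value pairs
    "_region": "eu",
}
print(json.dumps(gelf))
```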
A colleague of mine recently sent me a link to your post. I couldn't agree with you more which is one reason I have been working on https://github.com/eventrelay/eventrelay.
I would say that Wide Events sound a lot like Structured Logging.
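The resemblance is easy to show: a wide event is essentially one structured log record per unit of work, with every field kept as a queryable key/value pair rather than interpolated into a message string. A minimal sketch; the helper and field names are hypothetical:

```python
import json

def request_event(**fields):
    # One wide, structured record per unit of work, rendered as a JSON line.
    return json.dumps(fields, sort_keys=True)

line = request_event(service="checkout", status=200, duration_ms=35)
print(line)  # {"duration_ms": 35, "service": "checkout", "status": 200}
```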
Horses for courses. I would argue this post covers only one use case for observability. There are more. E.g. when you want to track a specific distributed transaction, you must have a full trace available and be able to look at it as a tree. It's very powerful. But you cannot just sample random events in a trace and expect it to still work. Additionally, it imposes very different requirements on collection (the context needs to be passed). Also, all queries that look at traces (rather than individual spans/events) must group the spans into traces.
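The grouping requirement described above can be sketched as follows. The span records with trace_id/span_id/parent_id are illustrative, not any particular tracer's schema:

```python
from collections import defaultdict

# Spans carry a trace id plus a parent pointer; trace-level queries must
# first group spans by trace_id, then link children to parents into a tree.
spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "name": "api"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "name": "db"},
    {"trace_id": "t1", "span_id": "c", "parent_id": "a", "name": "cache"},
]

traces = defaultdict(list)
for s in spans:
    traces[s["trace_id"]].append(s)

def children(trace, span_id):
    # Dropping a span at random would orphan its children here -- which is
    # why per-span sampling breaks trace reconstruction.
    return [s for s in trace if s["parent_id"] == span_id]

root = next(s for s in traces["t1"] if s["parent_id"] is None)
print(root["name"], [c["name"] for c in children(traces["t1"], root["span_id"])])
# api ['db', 'cache']
```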
The signals are not just about the different semantics associated with them (even when they eventually mean the same thing). It's also about the relationships between records.
Did Meta ever invest in a tool that could automatically tell you which dimension(s) had an abnormality, or how much each contributed to a regression? E.g., new OS versions, app versions, countries, carriers, etc. commonly point to the issue. Doing this automatically, maybe on demand, would be computationally expensive, but it would just give you the answer and prevent manual hunting and pecking.
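As a rough illustration of what such a tool would compute, one naive approach is to compare failure rates per value of each candidate dimension and flag the value that stands out. Toy data with hypothetical fields, not a description of any real Meta tooling:

```python
from collections import defaultdict

# Toy wide events; "ok" marks whether the request succeeded.
events = [
    {"os": "ios17", "country": "US", "ok": True},
    {"os": "ios18", "country": "US", "ok": False},
    {"os": "ios18", "country": "DE", "ok": False},
    {"os": "ios17", "country": "DE", "ok": True},
]

def failure_rate_by(dim):
    # Failure rate for each value of one dimension.
    total, failed = defaultdict(int), defaultdict(int)
    for e in events:
        total[e[dim]] += 1
        failed[e[dim]] += not e["ok"]
    return {v: failed[v] / total[v] for v in total}

print(failure_rate_by("os"))       # {'ios17': 0.0, 'ios18': 1.0} -> ios18 stands out
print(failure_rate_by("country"))  # both 0.5 -> no separation, not the culprit
```

A production version would need to handle correlated dimensions and statistical significance, which is where the computational cost comes in.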
How do you deal with schemas and versioning of wide events?
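One common answer (offered here only as a sketch, not as what the post describes) is to stamp each event with an explicit schema version and upgrade old shapes on read; all field names below are hypothetical:

```python
def upgrade(event):
    # Upgrade a v1 event to v2 on read; old data never has to be rewritten.
    event = dict(event)  # don't mutate the caller's record
    if event.get("schema_version", 1) == 1:
        # hypothetical change: v1's single "latency" field became "handler_ms"
        event["handler_ms"] = event.pop("latency", None)
        event["schema_version"] = 2
    return event

old = {"schema_version": 1, "latency": 40}
print(upgrade(old))  # {'schema_version': 2, 'handler_ms': 40}
```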
Nicely written!
Adding more fields to log.Errorf() is a pain, although Kibana does give nice visualizations on logs.
We log JSON/proto events to BQ, but I'm not sure about the UI.
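The pain of threading fields through log.Errorf-style format strings is what structured emitters avoid: fields stay as key/value pairs instead of being baked into the message. A minimal Python sketch; the helper is hypothetical, not a specific library's API:

```python
import json
import sys

def log_error(msg, **fields):
    # Emit the error as one structured record so every field stays queryable,
    # rather than interpolating values into a printf-style message.
    record = {"level": "error", "msg": msg, **fields}
    print(json.dumps(record, sort_keys=True), file=sys.stderr)
    return record  # returned here only for illustration

rec = log_error("payment failed", order_id="o-123", retries=3)
```

Adding a new field is then one keyword argument, with no format string to edit.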