Thank you so much for writing this. You are so spot on!
- Logs capture information with the intent to diagnose issues. Logs should not capture success cases. They should focus on failure cases.
- Metrics capture aggregates like counters. Metrics should be derived from Events.
- Events are data points in a series + metadata. The metadata enables slicing and dicing data points in innumerable ways.
Events are the unlock. When you start with events, metrics are easy. When you start with metrics, events are impossible. And logs ... blech. What a mess. Logs are the junk drawer of observability.
> Events are the unlock. When you start with events, metrics are easy. When you start with metrics, events are impossible. And logs ... blech. What a mess. Logs are the junk drawer of observability.
Love this.
However, reading the HN discussion ( it looks like some folks are offended by calling logs names, so I will avoid this in the future. Logs are observability history :)
Great article, Ivan. Should we expect more? ;)
Thank you! Yes, I do plan to write more, I just need to battle procrastination :D
Thanks for writing this up! I'd love to hear more about Scuba and what a paradise it was!
Thank you so much for writing this. That's great!
Love this. It is very difficult to make sense of the OpenTelemetry concepts.
I would say that Wide Events sound a lot like Structured Logging.
Yes. I like the term "wide events" more because the "wide" part highlights two things:
- the desire to attach to these events as much information as possible
- the design of the solution that would handle those events. While one can store structured logs in a row-oriented database with a few indexes, it's not wise to store wide events this way - columnar storage is a must.
Reading some replies in the HN discussion about structured logs vs wide events confirmed my feeling that structured logs are not necessarily associated with "a lot" of information. But it's just a personal preference in the end.
Great article, thanks!
Here is a link to the original Scuba paper I guess:
Maybe you know other public sources with its implementation details?
Yep, that's the one. There is nothing more in public, I think. It's weird how little it's promoted, given that the paper was published in 2013! It's cool that the paper shows UI screenshots as well and doesn't focus on storage only. Storage-wise I think systems like ClickHouse can do the job well enough.
UI is a different story though. It's simple yet powerful, but in the open source world it's hard to find anything that comes close.
Did you check Rill?
Literally just saw a tweet about their integration with ClickHouse, and it made me wonder :) Do you recommend it?
I'm biased because I work for them :) You can take a quick look at a GIF demo here: which visualises pretty much the experience that you've described in the post (comparing dimensions), I believe.
Looks nice! Would be really awesome if there was a playground, like the one Honeycomb has: Really useful to get a quick feeling
Maybe this video on Scuba can be helpful:
Hi Ivan, as you said, the problem is easily solved with Metrics. Just move "timestamp" out of the labels, and put "samples count" into the metric value, so the query would be `sum by(OsVersion) (AdImpressions{IsTest="false"})` -- even more readable than SQL. I would say that whatever is presented as a time-series graph is a metric, and is convenient to use as a metric.
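To make the reshaping concrete, here is a minimal Python sketch of what that query computes when you start from raw events: drop the timestamp, filter on one label, and count per value of another. The `sum_by` helper and the sample events are made up for illustration; only the field names `OsVersion` and `IsTest` come from the query above.

```python
from collections import Counter

def sum_by(events, label, **filters):
    """Count events per value of `label`, keeping only events that match `filters`."""
    counts = Counter()
    for event in events:
        if all(event.get(key) == value for key, value in filters.items()):
            counts[event.get(label)] += 1
    return dict(counts)

# Hypothetical ad impression events; the timestamp is ignored by the aggregation.
events = [
    {"timestamp": 1, "OsVersion": "17.1", "IsTest": False},
    {"timestamp": 2, "OsVersion": "17.1", "IsTest": False},
    {"timestamp": 3, "OsVersion": "16.4", "IsTest": False},
    {"timestamp": 4, "OsVersion": "16.4", "IsTest": True},
]

print(sum_by(events, "OsVersion", IsTest=False))  # {'17.1': 2, '16.4': 1}
```

The same events can be re-aggregated by any other field later, which is the part a pre-computed counter cannot give back.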
Note that not every metric storage restricts you to "take a snapshot of the system once in a while" as Prometheus does -- for example VictoriaMetrics supports push-based ingestion, so you can push the wide events whenever you like, even historical, no need to depend on periodic scraping.
As you said, metrics storages often talk about the cardinality problem, because they are usually optimized for very fast filtering by labels (e.g. OsVersion in your example), so they typically index them in RAM/Bloom filters for fast access. There's an easy solution of course -- not indexing them! But then you need to scan the data in order to find the rows you need, and that is slower, no matter what we call it (logs/metrics/events).
But I doubt you're going to hit the cardinality limits unless you store particular userId/deviceId in the wide event. E.g. VictoriaMetrics is able to hold billions of metrics (aka "active timeseries") in one VictoriaMetrics cluster.
But if you really want to store userId/deviceId in the wide event, then yes, I agree, it would push the cardinality to trillions, and metric storage indexes would fail (or be too slow). It would probably have some privacy implications as well (being able to filter down to one particular user/device).
> But I doubt you're going to hit the cardinality limits unless you store particular userId/deviceId in the wide event. E.g. VictoriaMetrics is able to hold billions of metrics (aka "active timeseries") in one VictoriaMetrics cluster.
I'm not sure it's true. Assume you have 20 different attributes (labels). Even without explicit userId/deviceId the combination of values in these attributes can result in pretty high cardinality.
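A small Python sketch of the arithmetic, with made-up numbers: the product of distinct values per label is only an upper bound on the number of series, reached when the values are fully independent, while the number of combinations that actually occur can be counted directly from the events. The helper names and the sample labels are assumptions for illustration.

```python
from math import prod

def cardinality_upper_bound(distinct_values_per_label):
    """Worst case: every combination of label values occurs."""
    return prod(distinct_values_per_label.values())

def observed_cardinality(events, labels):
    """Number of distinct label-value combinations actually present in the events."""
    return len({tuple(event.get(label) for label in labels) for event in events})

# 20 hypothetical labels with 10 distinct values each.
print(cardinality_upper_bound({f"label{i}": 10 for i in range(20)}))  # 100000000000000000000

# Correlated labels: each Os only ever appears with its own OsVersion.
events = [
    {"Os": "iOS", "OsVersion": "17.1"},
    {"Os": "iOS", "OsVersion": "17.1"},
    {"Os": "Android", "OsVersion": "14"},
]
print(cardinality_upper_bound({"Os": 2, "OsVersion": 2}))  # 4
print(observed_cardinality(events, ["Os", "OsVersion"]))   # 2
```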
That said, VictoriaMetrics is an impressive piece of software, but it would still suffer from high-cardinality problem correct?
I think any system trying to pre-aggregate something based on combinations of labels would suffer from it, so the only approach that would not is scanning raw events, as you mentioned.
It's indeed slower, but one important thing about the observability use case is that unless queries are too slow (tens of seconds) it still yields a decent user experience. It doesn't matter much if the chart gets refreshed in a second or in 100ms. This observation opens up some approaches that are both cheap and fast enough - "serverless disaggregated storage architecture", as the Axiom folks call it (
> Assume you have 20 different attributes ... values in these attributes can result in pretty high cardinality
Theoretically yes, but I've never seen the values be independent. Even when people put 50+ labels, the cardinality is typically still within millions. And some companies do have billions of active time series in VictoriaMetrics, but it's rare.
I usually look at the question from the use-case side. If the main use-case is to put the data on a chart (aka time-series) then it's a metric. Optionally with alerts on a _numerical_ condition.
We can start from different types of input, but if in the end it's a time-series, then it's a metric.
And if there's the high-cardinality problem -- it means that the time-series are very short, like 1 point per chart. There's no use in putting those points on a chart, so there's no chart use-case, so it's not a metric. It can be an event for OLAP analysis/backups/manual reading etc, just not a metric. But since I saw _charts_ in your post I thought of metrics.
> it would still suffer from high-cardinality problem correct?
Yes, it indexes the time-series, so will suffer from that.
Just to clarify, VictoriaMetrics does not pre-aggregate anything by default (unless you turn on streaming aggregation or historical downsampling). You probably meant indexing, and yes, you're right, it's always slower with indexing.
> It doesn't matter much if the chart gets refreshed in a second or in 100ms.
Well, users pretty often put 100s of charts on one dashboard with auto-reload. E.g. some put it on a big office screen with auto-refresh.
Some users put 10,000+ alerts. And queries for both alerts and dashboards can be arbitrarily complex, like pulling historical data or forecasting.
So I would say it does matter whether the data is indexed or not; it may result in millions of dollars in infrastructure savings, depending on data volumes.
When I actually started using OpenTelemetry I found a surprising benefit. For example, if I receive an invalid HTTP payload and would like to find which client is sending us bad data, I used to have to pass "informational parameters" deep into the call site where the error is detected, so we could log all the details.
With Tracing, I can actually add the error as a tag/attribute of a span without knowing any "context", because my OTel collector will eventually assemble all the spans into a trace, and I can find the correlation between the error and the client ID.
However, in line with the author's argument, the fact that OpenTelemetry can keep your function parameters succinct is not advertised at all in how Trace/Span is explained.
With wide events, how does one decide what should be used as a span vs not? If I think of wide events as a sea of wide, structured logs, coming from arbitrary sources, how do we decide which of those are "special" and can be rendered as traces?
Also, I'm curious about whether Scuba had a Traces UI view.
I mean, if we implement an observability system, we can decide to have some special field, let's call it eventType, so that eventType = 'span' means it's a span. An additional traceId field would help us render a tracing view.
I've never used Trace UI at Meta tbh but recently learned from this video ( that there is one.
Scuba actually reminds me of Graylog GELF logs: you can have a message with key/value pairs and then aggregate/filter/group them using the Graylog UI.
Never used it, but from a quick look it sounds like a similar thing, yes.
However, "native sampling" is a really important Scuba feature - not sure if the Graylog UI takes it into account.
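As a rough illustration of that idea, the Python sketch below picks the span events out of a mixed stream of wide events, groups them by traceId, and renders each trace as an indented tree. Only eventType and traceId come from the comment above; spanId, parentId, name and startMs are hypothetical fields a real system would need in some form to reconstruct the tree.

```python
from collections import defaultdict

def group_spans(events):
    """Keep only span events and bucket them by traceId."""
    traces = defaultdict(list)
    for event in events:
        if event.get("eventType") == "span":
            traces[event["traceId"]].append(event)
    return dict(traces)

def render_trace(spans):
    """Return the spans of one trace as indented lines, parents before children."""
    children = defaultdict(list)
    for span in spans:
        children[span.get("parentId")].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in sorted(children[parent_id], key=lambda s: s["startMs"]):
            lines.append("  " * depth + span["name"])
            walk(span["spanId"], depth + 1)

    walk(None, 0)  # root spans have no parentId
    return lines

events = [
    {"eventType": "span", "traceId": "t1", "spanId": "a", "parentId": None, "name": "GET /ad", "startMs": 0},
    {"eventType": "span", "traceId": "t1", "spanId": "b", "parentId": "a", "name": "auction", "startMs": 5},
    {"eventType": "span", "traceId": "t1", "spanId": "c", "parentId": "a", "name": "render", "startMs": 20},
    {"eventType": "ad_impression", "OsVersion": "17.1"},  # not a span, ignored by the trace view
]

for line in render_trace(group_spans(events)["t1"]):
    print(line)
```

Everything that is not a span stays queryable as an ordinary wide event; the tracing view is just one more way to read the same table.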
A colleague of mine recently sent me a link to your post. I couldn't agree with you more which is one reason I have been working on
Thank you for sharing!
Horses for courses. I would argue this post covers only one use case for observability. There are more. E.g. when you want to track a specific distributed transaction, you must have a full trace available and be able to look at it as a tree. It's very powerful. But you cannot just sample random events in a trace and expect it to work. Additionally, it poses very different requirements on the collection (the context needs to be passed). Also, all queries that look at traces (rather than individual spans/events) must group the spans into traces.
The signals are not just about the different semantics associated with them (but eventually meaning the same thing). It's also about the relationships between records.
Nobody is denying the importance of smart sampling. When we need certain events to always appear together, we should sample based on some common field in the event (traceId).
The Ad Impression example from the post has the same challenge. Once an Ad Impression is served, there can be some events associated with this impression: clicks, shares, whatever. And it would be good to be able to see them all together. It's absolutely the same need for smart sampling.
And fwiw I don't say that tracing is a bad concept. I'm arguing that it gets presented / defined like a completely different thing, while there are more similarities than differences between wide events / structured logs and traces / spans.
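One common way to get this behaviour is to derive the keep/drop decision from a hash of the shared field instead of a random number, so every event carrying the same value gets the same decision. The Python sketch below assumes a traceId field and a hypothetical `keep_event` helper; it is an illustration of the idea, not how Scuba or any particular system implements sampling.

```python
import hashlib

def keep_event(event, sample_rate, key="traceId"):
    """Deterministically keep roughly `sample_rate` of all distinct `key` values."""
    digest = hashlib.sha256(str(event[key]).encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate

span = {"traceId": "abc", "eventType": "span"}
click = {"traceId": "abc", "eventType": "click"}

# Both events share a traceId, so they are either both kept or both dropped.
print(keep_event(span, 0.1) == keep_event(click, 0.1))  # True
```

For the Ad Impression case the same helper would work with `key` set to whatever identifier ties an impression to its clicks and shares.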
Did Meta ever invest in a tool that could automatically tell you which dimension(s) had an abnormality, or how much they contributed to a regression? E.g., new OS versions, app versions, countries, carriers etc commonly point to the issue. Seems like doing this automatically, maybe on demand, while computationally expensive, would just give you the answer and prevent manual hunting and pecking.
While I was at Meta there wasn't such a thing; maybe it was developed after that. Among the tools available outside of Meta, I know about the BubbleUp feature in Honeycomb, which does exactly this:
Indeed AFAIK this feature is uber-popular among Honeycomb customers
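For a feel of what such a comparison could look like, here is a naive Python sketch that ranks (dimension, value) pairs by how much their share differs between a baseline set of events and a selected (e.g. failing) set. This is not Honeycomb's algorithm, just a toy version of the general idea; the helper name and the sample data are made up.

```python
from collections import Counter

def dimension_shifts(baseline, selection, dimensions):
    """Rank (dimension, value, share difference) by absolute difference, largest first."""
    shifts = []
    for dim in dimensions:
        base_counts = Counter(event.get(dim) for event in baseline)
        sel_counts = Counter(event.get(dim) for event in selection)
        for value in set(base_counts) | set(sel_counts):
            base_share = base_counts[value] / len(baseline) if baseline else 0.0
            sel_share = sel_counts[value] / len(selection) if selection else 0.0
            shifts.append((dim, value, sel_share - base_share))
    return sorted(shifts, key=lambda shift: abs(shift[2]), reverse=True)

baseline = [
    {"OsVersion": "17.1", "Country": "US"},
    {"OsVersion": "17.1", "Country": "DE"},
    {"OsVersion": "16.4", "Country": "US"},
    {"OsVersion": "16.4", "Country": "DE"},
]
# Hypothetical failing requests: all on one OS version, countries unchanged.
selection = [
    {"OsVersion": "17.1", "Country": "US"},
    {"OsVersion": "17.1", "Country": "DE"},
    {"OsVersion": "17.1", "Country": "US"},
    {"OsVersion": "17.1", "Country": "DE"},
]

for dim, value, diff in dimension_shifts(baseline, selection, ["OsVersion", "Country"]):
    print(dim, value, diff)
```

In this toy data the OsVersion entries come out on top while Country shows no shift, which is the kind of hint that saves the manual hunting.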
How do you deal with schemas and versioning of wide events?
You can always have a schema describing the type of each field (may be useful for efficient storage). It just needs to be backward compatible version over version (e.g. normally only add fields, with some special process for deprecation).
Some implementations infer schemas from the events themselves, see
Nicely written!!
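A minimal Python sketch of the "only add fields" rule, assuming a schema is represented as a plain mapping from field name to type name (a made-up representation for illustration): a new version is accepted if every existing field is still present with the same type.

```python
def is_backward_compatible(old_schema, new_schema):
    """True if every field of old_schema exists in new_schema with an unchanged type."""
    return all(
        new_schema.get(field) == type_name
        for field, type_name in old_schema.items()
    )

v1 = {"timestamp": "int64", "OsVersion": "string"}
v2 = {"timestamp": "int64", "OsVersion": "string", "Country": "string"}  # adds a field
v3 = {"timestamp": "int64", "OsVersion": "int64"}                        # changes a type
v4 = {"timestamp": "int64"}                                              # drops a field

print(is_backward_compatible(v1, v2))  # True
print(is_backward_compatible(v1, v3))  # False
print(is_backward_compatible(v1, v4))  # False
```

Deprecation would then be the explicit, reviewed exception to this check rather than something that happens silently.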
Adding more fields to log.Errorf() is a pain. Although Kibana does give nice visualizations on logs.
We do logging for JSON/proto events in BQ but not sure about the UI.
> Adding more fields to log.Errorf() is a pain
Depends actually on the language / framework. E.g. in the JVM world passing various context data to logs is relatively easy (you add data to some context object or whatever, and it gets passed to the logs - something like this).
> Although Kibana does give nice visualizations on logs.
Right. The UX of Kibana is the opposite of "intuitive" IMO. It's not a tool for exploration. One can find the answer to a well-defined question there, but exploring unknown stuff?.. I would be surprised if many people can do it naturally.
> We do logging for JSON/proto events in BQ but not sure about the UI.
Yep exactly. Logging somewhere is one thing, but it doesn't give you easy-to-use exploration capabilities. Queries should be easy to make and fast.
I believe more in writing data to ClickHouse for this purpose, because it's faster than BQ. UI is still a question - not sure if there is something available.