24 Comments
Feb 28·edited Feb 28Liked by Ivan Burmistrov

Thank you so much for writing this. You are so spot on!

- Logs capture information with the intent to diagnose issues. Logs should not capture success cases. They should focus on failure cases.

- Metrics capture aggregates like counters. Metrics should be derived from Events.

- Events are data points in a series + metadata. The metadata enables slicing and dicing data points in innumerable ways.

Events are the unlock. When you start with events, metrics are easy. When you start with metrics, events are impossible. And logs ... blech. What a mess. Logs are the junk drawer of observability.

Expand full comment
Mar 10Liked by Ivan Burmistrov

Love this. It is very difficult to make sense of the open telemetry concepts.

Expand full comment
Feb 15·edited Feb 15

Great article, thanks!

Here is a link to the original Scuba paper I guess: https://research.facebook.com/publications/scuba-diving-into-data-at-facebook/

Maybe you know other public sources with its implementation details?

Expand full comment

scuba actually reminds me of graylog gelf logs: you can have a message with key/value pairs and then aggregate/filter/group them using graylog UI.

Expand full comment

A colleague of mine recently sent me a link to your post. I couldn't agree with you more which is one reason I have been working on https://github.com/eventrelay/eventrelay.

Expand full comment

I would say that Wide Events sound a lot like Structured Logging.

Expand full comment
Feb 28·edited Feb 28

Horses for courses. I would argue this post covers only one use case for observability. There are more. E.g. when you want to track a specific distributed transaction, you must have a full trace available and be able to look at it as a tree. It's very powerful. But you cannot just sample a random events in a trace though to make it work. Additionally, it poses very different requirements on the collection (the context needs to be passed). Also, all queries that look at traces (rather than individual spans/events) must group the spans into traces

The signals are not just about different semantics associated to them (but eventually meaning the same thing). It's also about relationship of records

Expand full comment

Did Meta ever invest in a tool that could automatically tell you which dimension(s) had abnormality, or how much they lent to a regression? Eg, new OS versions, app versions, countries, carriers etc commonly point to the issue. Seems like doing this automatically, maybe on demand, while computationally expensive, would just give you the answer and prevent manual hunting and pecking.

Expand full comment

How do you deal with schemas and versioning of wide events?

Expand full comment

Nicely written !!

Adding more fields to log.Errorf() is a pain. Although Kibana does give nice visualizations on logs.

We do logging for JSON/proto events in BQ but not sure about the UI.

Expand full comment