I subscribe to the notion that, besides trying to make your code bug-free (impossible), you as a developer also need to make it as easy to debug as possible.
Sooner or later, there will be a runtime bug in production. And there’s a huge variety of ways your program can react.
1. It can just crash or get unresponsive.
2. It can keep on trucking only to misbehave later or do (1) afterwards.
3. It can log to the device’s console (invisible to the developer) but otherwise do (1) or (2).
4. It can show an error to the user, such as StateError('this should not happen').
5. It can show a slightly better error to the user.
…
…
etc.
I would like to hear what methods and tools people are using to make debugging production / runtime issues easier. Things like:
gathering log messages (and sending them with error reports)
Sentry/Firebase/etc.
rethrow techniques
allowing users to send screenshots and app states with error reports
rules of thumb when to fail fast versus when to keep on trucking
etc.
I think this is one of those questions that’s just too “soft” for something like Stack Overflow, and possibly too open-ended even for this forum, but it’s also critical to have good answers to, or at least good options. Inevitably, different teams will have different approaches, and the approaches will also depend on the size of the app, the size of the team, the targeted platforms, the complexity of the app, etc.
I can share my own preferences and experiences but I’d love to learn others’ approaches first.
Awesome topic. I don’t have too much to offer here, except that the tool https://detective.dev/ can sometimes be really useful, as it lets you inspect the objects currently alive inside your running app.
My two biggest problems when debugging are:
Asynchronous gaps in stack traces that end in nirvana, so I have no idea where the code was called from.
An exception during a rebuild, because the stack trace does not provide information about the widget classes; it only shows the element tree. I wish the Flutter team would add something to make it easier to map such build stack traces to your actual widget classes.
What we like to do is create a trackingId on each user action, such as a button click (which triggers some Bloc most of the time), and then pass it everywhere, up to the API requests and back. You attach this trackingId to literally every log (we do 'blabla'.tracked(trackingId)), to every request, etc. Your server can also accept this trackingId and add it to its logs, so everything is even more connected.
With this, you can trace things much better, especially if you use some kind of continuous logging service that retains the logs for a while. But it is also useful with Crashlytics alone: from those errors you can at least give the ID to your backend (BE) team so they can investigate further.
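The idea can be sketched roughly as follows. This is only a minimal illustration, not the poster's actual code: `newTrackingId`, `trackedHeaders`, `onSubmitPressed`, the `X-Tracking-Id` header name, and the `[id] message` log format are all assumptions made up for this example. Only the `'...'.tracked(trackingId)` call shape comes from the post.

```dart
import 'dart:math';

final Random _random = Random();

/// Hypothetical helper: creates a fresh ID for one user interaction
/// (8 random bytes, hex-encoded, so 16 characters).
String newTrackingId() => List.generate(
      8,
      (_) => _random.nextInt(256).toRadixString(16).padLeft(2, '0'),
    ).join();

/// Mirrors the `'blabla'.tracked(trackingId)` call from the post.
/// The "[id] message" format is an assumption.
extension TrackedMessage on String {
  String tracked(String trackingId) => '[$trackingId] $this';
}

/// Hypothetical header name; use whatever your backend agrees to read.
const String trackingHeader = 'X-Tracking-Id';

/// Returns a copy of [base] with the tracking ID added as a header.
Map<String, String> trackedHeaders(
  String trackingId, [
  Map<String, String> base = const {},
]) =>
    {...base, trackingHeader: trackingId};

/// Example handler: one ID per user action, reused for every log line
/// and every request that the action causes.
void onSubmitPressed(void Function(String message) log) {
  final trackingId = newTrackingId();
  log('Submit pressed'.tracked(trackingId));

  final headers = trackedHeaders(trackingId, {'Accept': 'application/json'});
  // A real app would send the request here; this sketch only logs it.
  log('Sending request with headers $headers'.tracked(trackingId));
}
```

Calling `onSubmitPressed(print)` would print two lines that share the same ID prefix. A backend that reads the same header could add the ID to its own logs.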
We also have an “AppLogger” mixin that serves as a logging utility, and we log to different services depending on the environment (Crashlytics, Bugfender, …). We had an issue with Crashlytics, which groups logs/errors/crashes by stack trace: the most recent entry in our stack traces always came from that logging mixin. I discovered the stack_trace package (maintained by the Dart team), which lets you omit stack trace entries, among other things. So now we just call Trace.current(1) to drop that misleading entry, which should stop Crashlytics from lumping everything into one issue. We still have to publish an app version with this change, but I expect it to work well.
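A rough sketch of what such a mixin might look like, assuming the stack_trace package is a dependency. The `report` method and the `InMemoryLogger` class are hypothetical stand-ins for whatever actually forwards errors to Crashlytics or Bugfender; only `Trace.current(1)` is taken from the post.

```dart
import 'package:stack_trace/stack_trace.dart';

/// Hypothetical stand-in for the "AppLogger" mixin described above.
mixin AppLogger {
  /// Hypothetical sink; a real app would forward this to its
  /// crash-reporting or logging service.
  void report(String message, StackTrace stackTrace);

  void logError(String message) {
    // Trace.current(1) starts the trace one frame above this method,
    // so the first frame is the caller of logError, not the mixin.
    report(message, Trace.current(1));
  }
}

/// Hypothetical logger that only stores what it receives, so the
/// captured traces can be inspected.
class InMemoryLogger with AppLogger {
  final List<StackTrace> traces = [];

  @override
  void report(String message, StackTrace stackTrace) {
    traces.add(stackTrace);
  }
}
```

With plain `Trace.current()` every report would begin with the `logError` frame, which is the situation described above where the grouping key is the same for every error.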
That’s really interesting! So, if I understand correctly, the ID is for the user interaction and is then attached to all the calls (including server calls) that the user interaction caused?
So you also have some kind of way of seeing the user interactions that immediately preceded the one you’re looking at?
Also, what tool, if any, do you use to put all these logs together and filter on the interaction tracking id?
We currently use mainly two tools – Crashlytics and Bugfender.
Crashlytics, as you might know, is more for bug/crash reports, but you get some logs from around the time of the crash (some before and some after; I’m not really sure about the specifics, but usually it is enough), so at least you can tell, e.g., which requests or logs belong to which interaction. Imagine that for some reason the same button handler is being processed twice at the same time: this way you can separate those logs. It’s also useful if your BE consumes these trackingIds and uses them in its logs as well, so you can pair FE and BE logs.
For continuous logging we use Bugfender. Currently just for development and testing, mostly because enabling it in production would cost an unreasonable amount of money. But we will probably use some tool for continuous (centralized) logging in the future. While testing/developing, though, we see ALL the logs from the device (for about a week, which is a paid feature). When a tester reports a bug, they can export the logs and we can see exactly how all the logs are connected, which can be really helpful, since you just filter for the event (e.g. a button click) you care about and see everything related to it. Even when there is no crash, you can see everything, and all logs can be split, filtered, or searched using this ID.
(TL;DR: the point is to distinguish “flows” from random logs.)
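To illustrate what separating “flows” could look like on exported logs, here is a small sketch. It assumes each log line starts with the tracking ID in square brackets (`[id] message`), which is an invented format for this example; `groupByTrackingId` is likewise a hypothetical helper, not part of Bugfender or Crashlytics.

```dart
/// Assumed line format: "[<trackingId>] <message>".
final RegExp _trackedLine = RegExp(r'^\[([^\]]+)\]\s*(.*)$');

/// Groups log lines by tracking ID. Lines without an ID prefix are
/// collected under the key 'untracked'.
Map<String, List<String>> groupByTrackingId(Iterable<String> lines) {
  final flows = <String, List<String>>{};
  for (final line in lines) {
    final match = _trackedLine.firstMatch(line);
    final id = match?.group(1) ?? 'untracked';
    final message = match?.group(2) ?? line;
    flows.putIfAbsent(id, () => []).add(message);
  }
  return flows;
}
```

Two interleaved runs of the same button handler would then end up in two separate lists, one per ID, even though their lines were mixed in the raw log.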
Note we haven’t squeezed everything out of it yet, but I guess anything is better than nothing.