What is our primary use case?
My main use case for Comet is experiment tracking and model lifecycle management. Comet has been a very helpful tool in our machine learning workflows, improving reproducibility, collaboration, and visibility across all the AI projects that we manage.
Initially, we needed a centralized platform that could track experiments and improve collaboration between our ML engineers and data scientists. Comet allowed us to consolidate experiment management, model evaluation, and visualization into a single platform, which made our ML workflows much more organized and reproducible.
What is most valuable?
The most valuable feature of Comet is its centralized experiment tracking. Every training run, metric, hyperparameter configuration, and model output is logged automatically, which makes it much easier to compare experiments and identify what is improving model performance.
Reproducibility is another key feature. Previously, we had to reproduce old experiments manually, which was difficult because configurations, metrics, and everything else were scattered across notebooks and local systems. Since we introduced Comet, all our experiments are stored in a single place, which greatly simplifies debugging and retraining workflows. Visualization is another strength, with clear dashboards for experiment tracking and resource utilization.
Visibility is another major benefit: Comet has let us create dashboards for tracking multiple models across various domains. Training curves, validation metrics, and resource utilization are all visible at different levels, which makes it easier to see where models are overfitting or where we are hitting bottlenecks. Collaboration has also improved, since engineers can share findings within a single environment instead of relying on spreadsheets and multiple disconnected notebooks.
Comet integrates strongly with popular ML frameworks. Some customized pipelines still need manual configuration, but apart from that, Comet is a very capable platform for ML lifecycle management.
What needs improvement?
Comet is a very powerful tool for experiment tracking and MLOps workflows, but the platform is somewhat complex for teams that are not already familiar with structured MLOps practices. Understanding experiment organization, integrations, and tracking workflows requires some onboarding.
Pricing is one of the major challenges with Comet. As our organization has grown, and with it our number of users and our experiment tracking requirements, the platform cost can increase very quickly. The platform delivers very strong value at extensive scale, but the cost climbs just as quickly as the ML workload grows. Smaller teams running a limited number of ML experiments may not be able to make full use of its capabilities.
As for integrations, customized pipelines still require considerable manual configuration, which is one area Comet could improve. The learning curve for teams unfamiliar with structured MLOps practices could also be eased.
Pricing initially seemed very high compared to open-source experiment tracking tools. However, once we integrated the platform into our workflows, the productivity improvements justified the investment.
For how long have I used the solution?
I have been using Comet for around nine months.
What do I think about the stability of the solution?
Comet is very stable and scales easily. In our experience, experiment logging, dashboard visualization, and model tracking workflows perform reliably even during large training workloads, and we have not experienced any reliability issues affecting our ML operations. The platform's performance holds up well as the number of experiments and users increases.
What do I think about the scalability of the solution?
Scalability is a very strong point of Comet. As we have scaled up our experimentation, the number of models has increased two- to threefold, and Comet has continued to organize runs efficiently and maintain visibility across projects, which is very important for a growing AI team.
How are customer service and support?
Our overall experience with customer support has been mostly positive. The documentation is quite detailed, and integration with PyTorch and TensorFlow is generally very straightforward. For advanced configurations, our support interactions were very responsive and technically helpful. I would rate customer support nine out of ten.
Which solution did I use previously and why did I switch?
Before switching to Comet, we managed all our experiments manually using Jupyter notebooks, spreadsheets, TensorBoard, and some internally managed tracking scripts. Managing experiments across these separate tools had become very inefficient and made reproducibility very difficult. Comet provided a centralized and much more scalable alternative for experimentation.
How was the initial setup?
The setup process was very straightforward, especially for teams already using modern ML frameworks, and even integration with our existing training pipelines was very smooth.
What was our ROI?
The biggest return on investment from Comet comes from improved reproducibility. Experimentation has become much faster than before, and collaboration between teams has improved. This has allowed us to reduce redundant staff who were mainly doing manual documentation work, which Comet now handles. Development lifecycles have become about 1.5 times faster. We spend less time debugging, tracking model performance, and documenting experiments, and that time has shifted to actual model development and overall metric improvements. This has been our main return on investment.
Which other solutions did I evaluate?
Before choosing Comet, we evaluated MLflow, Weights & Biases, Neptune.ai, and TensorBoard. Most of these solutions handled parts of experiment tracking, but Comet stood out because it combined visualization with centralized experiment management, which served as a base for strong collaboration. Its clear dashboards and strong visualization capabilities are what led us to choose Comet.
What other advice do I have?
My advice for others looking into Comet is to evaluate the scale at which their organization operates. If a team runs occasional ML experiments with a small number of researchers, lightweight tracking tools may be sufficient. However, for organizations managing multiple models and datasets, Comet provides substantial benefits. The platform is most valuable when reproducibility, centralized visibility, and experiment comparison become important priorities. For AI-focused organizations or ML teams starting to scale, I would definitely recommend Comet.
Comet is a very valuable platform when it comes to reproducibility, collaboration, experiment tracking, and visibility. Even though there is a slight initial learning curve, once you are familiar with it and your workflows and integrations are sorted, Comet becomes a very powerful platform for managing all your ML experimentation. I hope this review helps others understand whether Comet suits their team. I give Comet an overall rating of eight out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)