What is our primary use case?
We host a document store of approximately seven million documents that must be retained in perpetuity. The documents are used for discovery in legal cases. The store grows continuously, and we have periodically had to rehouse it because of scalability issues. We selected MinIO because it promises to be more scalable and performant than our past choices. We are users of MinIO, and I'm the VP of Information Services as well as the company CTO.
How has it helped my organization?
We're currently running MinIO in parallel with an existing object store as we migrate to it. Object retrieval with MinIO is much improved over the prior solution, updates are simple to apply, and we anticipate that future capacity requirements will be easier to meet. Our customers use this solution to retrieve tens of thousands of objects (documents and files) per hour, and MinIO seems able to keep up. Although we have migrated all 7 million objects, we are performing a byte-by-byte validation before removing them from the old system - a process that is not expected to be completed before the end of the year. From a management perspective, once the object store is validated, we'll be able to set up seamless synchronization with our offsite data store in Amazon S3 using the built-in bucket mirroring functionality, which will be a huge improvement over the existing application.
What is most valuable?
The retrieval performance of objects with MinIO is significantly higher than a lot of the alternatives we investigated. We're able to use Amazon's SDK to interact with it, and the SDK support is excellent. The included MC tool is not only good at administering MinIO, it is also good for administering our S3 buckets, as it's actually more capable than Amazon's own tool. Updates are also very simple and easy. We've done numerous updates since the deployment and the platform stays online throughout, which is an unexpected bonus. I think in terms of other useful functionality, there are some very good things about it.
What needs improvement?
Improvements could be made in several areas. When we implemented a distributed solution, there were no good white papers or tutorials on implementing clusters, so we had to kind of wing it, and it took us a while to get the clustered implementation working. The clustering documentation is oriented towards containers and Kubernetes, and we're running Linux VMs instead of containers, partly because we run on top of VMware and it's a little easier to manage a VM than a container on VMware's platform. We're also using vMotion, and we have a cluster of VMware hosts that approximates the functionality of containers without the complexity, plus we have a SAN on the backend. Containers actually create a degree of complexity that's unnecessary for our application because we already have a great deal of hardware redundancy in our system.
There's very little documentation on performance tuning for MinIO or on running it on Linux, which has been problematic because as the object store has grown, we've run into various performance issues. We've done a lot of our own research and some of our own performance tuning on a trial-and-error basis. We've had intermittent latency on object retrieval, with no way to determine the underlying cause. There have also been some technical issues. When we added more than 100,000 objects to a single bucket, the web browser interface for viewing buckets became unusable, which means we have no graphical way to search or browse our buckets and have to rely on programmatic means.
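As an illustration of what "programmatic means" can look like, here is a minimal sketch of listing keys by prefix through the S3-compatible API. It is written in Python with boto3 purely for illustration and is not the tooling we actually run; the endpoint, credentials, bucket name, and prefix are hypothetical placeholders.

```python
from typing import Iterator


def make_client(endpoint_url: str, access_key: str, secret_key: str):
    """Build an S3 client pointed at a MinIO endpoint (requires boto3)."""
    import boto3  # imported lazily so iter_keys can be exercised without boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,  # hypothetical, e.g. the proxy address
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def iter_keys(client, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every key under `prefix`, following pagination page by page."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that holds no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Because the listing is paginated and filtered by prefix on the server side, a lookup like this never has to materialize the whole bucket, which is where the browser interface falls over.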
We were initially using Prometheus, which extracts performance and usage data into Grafana for monitoring. That was useful until we exceeded a million objects, and then it stopped working correctly. We were unable to get accurate statistics out of the system and had to come up with a workaround by creating bucket notifications that can be forwarded to a database. We're using a MySQL cluster for that, aggregating bucket notifications into MySQL and then parsing the JSON data out of MySQL to do our own dashboarding to keep track of performance and utilization issues. It seems to be working, but none of the native interfaces for MinIO work when you exceed a certain bucket size.
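A rough sketch of the aggregation step in that workaround follows. It assumes the notifications use the S3-style event layout (a `Records` array whose entries carry `eventName` and `s3.object.size`); the sample payloads, bucket name, and keys are made up, and reading the rows out of MySQL is left out.

```python
import json
from collections import defaultdict


def summarize_events(payloads):
    """Aggregate object counts and byte totals per event name.

    `payloads` is an iterable of JSON strings, one per stored notification.
    """
    totals = defaultdict(lambda: {"count": 0, "bytes": 0})
    for raw in payloads:
        event = json.loads(raw)
        for record in event.get("Records", []):
            name = record.get("eventName", "unknown")
            # Delete events typically carry no size, so default to 0.
            size = record.get("s3", {}).get("object", {}).get("size", 0)
            totals[name]["count"] += 1
            totals[name]["bytes"] += size
    return dict(totals)


def _event(name, key, size=None):
    """Build an illustrative notification payload (hypothetical data)."""
    obj = {"key": key}
    if size is not None:
        obj["size"] = size
    return json.dumps(
        {"Records": [{"eventName": name,
                      "s3": {"bucket": {"name": "documents"}, "object": obj}}]}
    )


sample = [
    _event("s3:ObjectCreated:Put", "case-1/a.pdf", 1048576),
    _event("s3:ObjectCreated:Put", "case-1/b.pdf", 2048),
    _event("s3:ObjectRemoved:Delete", "case-1/old.pdf"),
]
summary = summarize_events(sample)
```

The resulting per-event counts and byte totals are the kind of figures a dashboard can chart over time in place of the native statistics.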
We then ran into two other issues. First, there's a supposedly optional heal function, but in practice that's not exactly the case, and it's extraordinarily slow. We started a heal run about two weeks ago, and it has only processed about 7% of the documents and is still running. Second, the SSL implementation was more complicated than it should be. We wanted to secure access to the documents, because we use a suite of web services that we developed ourselves with the Amazon SDK to provide CRUD operations on objects. We needed to secure it with SSL, but we ultimately found it was a lot simpler to front MinIO with NGINX as a proxy, with keepalived providing automatic failover, than to use the solution they had suggested. We came up with our own SSL solution, but it was not easy.
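To put the heal rate in perspective, here is a back-of-the-envelope estimate in Python. It assumes the heal proceeds at a constant rate, which is only an assumption; the 14 days and 7% figures are the approximate numbers quoted above.

```python
def estimate_remaining_days(elapsed_days: float, fraction_done: float) -> float:
    """Linearly extrapolate the time left from progress made so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_days = elapsed_days / fraction_done
    return total_days - elapsed_days


# Roughly two weeks elapsed, about 7% of the documents healed.
remaining = estimate_remaining_days(elapsed_days=14, fraction_done=0.07)
print(f"Estimated days remaining: {remaining:.0f}")
```

At that pace the full run would take around 200 days, so about six more months, which is why a heal that could run in parallel across the cluster would matter so much to us.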
It would be nice if there were a graphical tool for searching buckets that didn't attempt to display the whole bucket. We use a product called Couchbase, which is based on CouchDB, as a key-value database for one of our web applications. It has a very nice function for searching buckets by key - something comparable in MinIO would be great. I also think that improving the logging functionality to enable more selective statistics logging, the way that bucket notifications work, would be very valuable. One step further - outputting statistics data in other formats would also enable better monitoring. Fixing the heal function - or perhaps allowing it to run across the cluster in parallel instead of only on a single node - would be valuable.
For how long have I used the solution?
What do I think about the stability of the solution?
The solution seems to perform fairly well under load. There have been occasional issues with unexpected latency developing and we've been trying to pin that down, but we don't have any tuning documentation to allow us to do that. It's basically been a hit or miss proposition to figure out what's causing the intermittent performance issues. MinIO doesn't seem to require a lot of maintenance. We have a network operation center and they've been able to keep a pretty good eye on it using a workaround we devised. We haven't had any outages so I'd classify it as stable.
What do I think about the scalability of the solution?
The biggest problems are in the web browser interface and in the Prometheus monitoring, neither of which scales correctly with large buckets. But I don't imagine people are typically putting seven million objects in a bucket like we are. That's a lot of objects, and the average size of a binary is at least a megabyte. In some cases it's 10 megabytes, so these are fairly large binaries to be shuffling around. Those are the major problems, but the heal function is also inordinately slow. Other than that, the application performs fairly well.
Internally, it's the backend for our document management system and our content management system. The document management system is used by our entire staff of approximately 40 people. In its role as a content management system, it is used externally by our customer base, which is approximately 10,000 people a day. We have a number of other potential use cases, and once we've completed the migration from our old object store, we're going to explore those. Presuming they can resolve some of the issues in upcoming releases, it'll have a lot more utility in the organization. Unstructured data has always been a challenge to deal with in content management systems, and we have a lot of unstructured data in our business. There's a range of different applications where that's valuable.
How are customer service and technical support?
We haven't taken advantage of the technical support, but we have used their support forums and received some responses. I believe it's a fairly small team at MinIO, and I don't think they have a lot of people involved in support yet.
Which solution did I use previously and why did I switch?
Our data was all previously stored in a SQL Server AG cluster, but the performance of moving binary objects in and out of SQL Server is fairly poor, and backing up multi-terabyte databases within maintenance windows is very challenging. MinIO is an S3-compatible object store that mirrors seamlessly with Amazon and other S3 object stores, and it doesn't have the management problems that go with a large SQL Server installation. There are a lot of complexities in managing a large document store in SQL Server because it's not a great solution for storing binary large objects - blobs, as they're known. Programmatically, the .NET support for moving data in and out of SQL Server is quite good, but the performance is not. It's slow, and backing up such a large database exceeds our backup window. Effectively, we were never able to back up that database. We had to create a process to continuously extract new data out of it and place it offsite in Amazon, but the backup and restore processes were slow as a result. It created a lot of headaches for our DBA. It was a 10 terabyte database, and SQL Server doesn't deal well with such large databases.
How was the initial setup?
The initial setup involved two people - a network engineer to set up the backend VMware infrastructure, and me for the Linux VM deployment. Although I'm the CTO, I'm fairly hands-on because we're a smaller company and I have deep Linux expertise that goes back to the 90s, so I implemented the various nodes and set up all of the replication and configuration. My network engineer set up the VMware hosts and tuned the storage backend and a number of other things. It was a collaborative effort.
If we'd had better resources, we probably could have done it in about two weeks. But we had a few false starts and had to basically make it up as we went, so it took about six weeks to get the initial deployment working.
What about the implementation team?
We did not use outside vendors. For mission-critical applications we always develop in-house expertise.
What was our ROI?
Since the product is not technically in production, an ROI can't be computed yet.
What's my experience with pricing, setup cost, and licensing?
We are using the open source version, so we didn't pay anything for it. We are planning to obtain a support contract when it's fully in production - which is to say once all of our documents have been validated, and it's stable in our environment. Looking at the prices, they seem to be fairly comparable to what we've paid for similar products in the past so there's no issue paying for support and product, but we're still at a step beyond the proof of concept stage. That won't be complete until we validate that all the binaries are correct.
In terms of additional costs, a lot of hardware went into this. We're running it on a cluster of VMware servers. There were licensing costs for the VMware host and in addition to that, we're running it on a storage area network on the backend and the storage area networks alone are very expensive. It's a fairly large storage area network too, so we're essentially running it on dedicated hardware. Our investment to date exceeds $200,000.
Which other solutions did I evaluate?
We compared MinIO with a number of other object stores, and it offered some performance advantages and a degree of simplicity that was very appealing. We implemented a four-server failover cluster and migrated all of the documents into MinIO, which took us about three months. We're currently running a validation process before we remove the old object store, doing a byte-by-byte comparison of the objects in the old store and the new store. Because it's such a time-consuming process, it's only about 25% done, and we anticipate it'll take another three or four months to complete.
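For readers wondering what a byte-by-byte comparison involves, here is a minimal Python sketch of the core check. It is an illustration rather than our actual validation code, and it assumes each store can hand back an object as a readable binary stream; fetching the streams is out of scope, and the function names are hypothetical.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # compare 1 MiB at a time to keep memory use flat


def _read_full(stream: BinaryIO, size: int) -> bytes:
    """Read exactly `size` bytes unless the stream ends first.

    Network-backed streams may return short reads, so loop until satisfied.
    """
    parts = []
    remaining = size
    while remaining > 0:
        data = stream.read(remaining)
        if not data:
            break
        parts.append(data)
        remaining -= len(data)
    return b"".join(parts)


def streams_identical(old: BinaryIO, new: BinaryIO,
                      chunk_size: int = CHUNK_SIZE) -> bool:
    """Return True only if both streams yield exactly the same bytes."""
    while True:
        a = _read_full(old, chunk_size)
        b = _read_full(new, chunk_size)
        if a != b:
            return False  # differing content or differing length
        if not a:
            return True  # both streams ended at the same point


# Stand-ins for an object read from the old store and from MinIO.
same = streams_identical(io.BytesIO(b"contract text"), io.BytesIO(b"contract text"))
```

Every object has to be read in full from both stores, which is why a check like this is slow across seven million binaries averaging a megabyte or more each.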
What other advice do I have?
As the solution evolves, it will become a better product, but right now it has a lot of rough edges. I would not recommend implementing MinIO unless you have sufficient technical expertise. I'm very familiar with non-relational as well as relational databases, and with Linux, which helped during implementation, but not every shop has that skill set available. I was able to extrapolate what it should be doing based on the type of product it is and comparisons with similar technologies that I've dealt with in the past. Without that experience I think it would be difficult. If you're doing a single server implementation, it's very simple but it gets complex very quickly if you're building a clustered server, which is what we were doing.
I rate the solution 7 out of 10. I would rate it higher if the management tools were usable on large buckets, and the quality of the documentation was higher.
Which deployment model are you using for this solution?
On-premises