Our primary use case is building data pipelines and data analytics.
Performance is one of its most important features. It also has an API for processing data in a functional manner.
I would like the ability to process data without the overhead, and to use the same API whether I'm processing terabytes of data or a single gigabyte.
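To illustrate what I mean by using the same functional API at both scales, here is a minimal sketch of a Spark SQL job in Scala. The input path, column names, and output path are hypothetical and only for illustration; the point is that the code stays the same whether the data is one gigabyte or many terabytes, and only the cluster size changes.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SalesSummary {
  def main(args: Array[String]): Unit = {
    // The same SparkSession-based code runs on 1 GB or multiple TB;
    // scaling is handled by the cluster, not by rewriting the job.
    val spark = SparkSession.builder()
      .appName("sales-summary")
      .getOrCreate()

    // Hypothetical input location and schema.
    val sales = spark.read.parquet("s3://my-bucket/sales/")

    // Functional-style transformations: filter, group, aggregate.
    val summary = sales
      .filter(col("amount") > 0)
      .groupBy(col("region"))
      .agg(sum("amount").alias("total_amount"))

    // Hypothetical output location.
    summary.write.mode("overwrite").parquet("s3://my-bucket/sales_summary/")

    spark.stop()
  }
}
```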
I have been using Spark SQL for around four years.
It is scalable. I use it on and off, but mostly daily.
From an infrastructure perspective, it was easy for us to set up because we used cloud services; an on-premises deployment requires more setup. There is a learning curve, especially if you're not a programmer, and the more complex steps take additional effort to learn.
I deployed it by myself. Because we use the cloud, we are able to do it ourselves.
The number of people required for deployment depends on the environment. One person is enough on AWS, but other environments need more.
If you know how to do it, the deployment can be done in minutes.
I would rate Spark SQL a nine out of ten.
My advice would be to read the Databricks books about Spark; they are a good source of knowledge.
In the next update, we'd like to see better performance on small data sets. Processing small data is possible, but there are other tools that are faster and cheaper for that.