We use this solution for information gathering and processing.
I use it myself when I am developing on my laptop.
I am currently using an on-premises deployment model. However, in a few weeks, I will be using the EMR version in the cloud.
The most valuable feature of this solution is its capacity for processing large amounts of data.
This solution makes it easy to do a lot of things. It's easy to read data, process it, save it, etc.
When you first start using this solution, it is common to run into out-of-memory errors when dealing with large amounts of data. Once you are experienced, it becomes easier and more stable.
When you are trying to do something outside of the normal requirements in a typical project, it is difficult to find somebody with experience.
This solution is difficult for beginners, who often experience out-of-memory errors when dealing with large amounts of data.
I have not been in contact with technical support. I find all of the answers that I need in the forums.
The work that we are doing with this solution is quite common and is very easy to do.
My advice for anybody who is implementing this solution is to look at their needs and then look at the community. Normally, there are a lot of people who have already done what you need. So, even without experience, it is quite simple to do a lot of things.
I would rate this solution a nine out of ten.
We use the solution for analytics.
I'm not sure how it has improved my organization but I believe that it's a good product.
The fast performance is the most valuable aspect of the solution.
The search could be improved. Currently, we use other tools to search for specific items and then use Spark to get the details; built-in support for searching fine-grained things would be better.
It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster.
In the next release, if they can add more built-in analytics, that would be useful, along with better logging for scripts.
I found the solution stable. We haven't had any problems with it.
Usually, we can fix any issues. If we have problems, we google a little bit to find the issue.
I was using some other systems and we moved to Spark later. We faced performance and other issues with the other solution.
The initial setup was easy. We keep on getting data from different sources so we will keep on porting in little bits. It's not done in a single sitting, so I can't really say how long it takes.
I would recommend the solution. I would rate it an eight or nine out of 10.
For some areas I would give it a ten, but I cannot use certain parts. If you are going to use it for a single customer, I would recommend it and you should go ahead. It doesn't work as well for me because I have different clients and different engagements.
We primarily use the solution for security analytics.
The scalability has been the most valuable aspect of the solution.
The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive.
The 2.3 version is quite stable. All of our customers use it; there are around 100,000 users, and it runs 24/7.
The scalability is very good.
You actually buy Cloudera along with it. You don't really get any support unless you pay for it.
In previous companies, we used MySQL platform and solutions like ArcSight and Splunk. We switched for scalability. MySQL wasn't going to scale, and we don't use Splunk at this company.
The initial setup was complex. It is a complex tool, and a lot depends on how you will use it. There is a lot to set up, including nearly 60 scripts. Setting it up in the cloud takes about a day; setting it up on-premises, on hardware, takes about a week.
I would rate this solution eight out of 10.
Streaming telematics data.
It's a better MapReduce: it supports streaming and micro-batch, and it supports Spark ML and Spark SQL.
It supports streaming and micro-batch.
Better data lineage support.
Ingesting billions of rows of data all day.
Spark on AWS is not that cost-effective, as memory is expensive and you cannot customize hardware in AWS. If you want more memory, you have to pay for more CPUs as well.
Powerful language.
Writing efficient programs requires complicated coding; it is like going back to the '80s.
Used for building big data platforms for processing huge volumes of data. Additionally, streaming data is critical.
It provides a scalable machine learning library so that we can train and predict user behavior for promotion purposes.
Machine learning, real time streaming, and data processing are fantastic, as well as the resilient or fault tolerant feature.
I would suggest support for more programming languages, and also an internal scheduler to schedule Spark jobs, with monitoring capability.
Organisations can now harness richer data sets and benefit from use cases, which add value to their business functions.
Distributed in-memory processing. Some of the algorithms are resource-heavy, and executing them requires a lot of RAM and CPU. With Hadoop-related technologies, we can distribute the workload across multiple commodity machines.
Include more machine learning algorithms and the ability to handle true streaming of data, rather than micro-batch processing.
At times, when users do not know how to use Spark and request too many resources, the underlying JVMs can crash, which is a big worry.
No issues.
DataFrame: Spark SQL lets you create applications more easily and with less coding effort.
We developed a tool for data ingestion from HDFS through the Raw and L1 layers, with data quality checks, loading data into Elasticsearch, and performing CDC (change data capture).
Dynamic DataFrame options are not yet available.
One and a half years.
No.
No.
Spark gives the flexibility for developing custom applications.
