What are the options for deploying models in production with R?

0 votes

As far as I can tell, there don't seem to be many options for deploying predictive models in production.

I know that PMML can be used to export models as XML for in-database prediction. But to make this work, it seems you need the PMML plugin by Zementis, which means the PMML route is not truly open source.

Is there an easier way to map PMML to SQL for prediction?

Another option I have looked at is using JSON instead of XML to output model predictions.

But in this case, where would the R model sit? I'm assuming it would always need to be mapped to SQL.

Any other options out there?

Apr 12, 2018 in Data Analytics by nirvana
• 3,060 points

1 answer to this question.

0 votes

The answer depends entirely on which production environment you are using.

If you are working with big data on Hadoop, you can try the open source PMML "scoring engine" called Pattern.

Otherwise, your only choice is to run R on the server. Save your fitted models in .RData files, then load them on the server and call the corresponding predict method.

This is likely to be slow, but you can always throw more hardware at it.
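A minimal sketch of the save-then-load approach, assuming a simple lm() model. The mtcars data, the file name fit.RData and the score() helper are placeholders for illustration, not part of any particular deployment setup:

```r
# Training side: fit a model and save it to an .RData file.
# mtcars and this formula are stand-ins for your own data and model.
fit <- lm(mpg ~ wt + hp, data = mtcars)
model_path <- file.path(tempdir(), "fit.RData")  # hypothetical location
save(fit, file = model_path)

# Server side: load the file into a fresh environment so the restored
# object does not overwrite anything in the global workspace, then
# call predict() on the incoming rows.
score <- function(new_data, path = model_path) {
  env <- new.env()
  load(path, envir = env)
  predict(env$fit, newdata = new_data)
}

new_rows <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))
preds <- score(new_rows)
```

Loading the model once per process and reusing it for many rows, rather than once per row, is usually the first thing to check if scoring turns out to be slow.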

How you do that depends on the platform. Usually there is a way to add a custom function written in R as a user-defined function.

On Hadoop, you can add such functions to Pig, or you can use RHadoop to write simple map-reduce code that loads the model and calls predict in R.

If your data is in Hive, you can use Hive TRANSFORM to call an external R script.
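As a rough sketch of what such an external script could look like: Hive TRANSFORM passes rows to the script as tab-separated text on standard input and reads tab-separated rows back from standard output. The script name score.R, the table cars, the column names and the score_stream() helper are all assumptions made for illustration, and Rscript would need to be available on the worker nodes:

```r
# score.R: a hypothetical script for use with Hive TRANSFORM.
#
# Assumed Hive usage (table and column names are placeholders):
#   ADD FILE score.R;
#   ADD FILE fit.RData;
#   SELECT TRANSFORM (id, wt, hp)
#          USING 'Rscript score.R'
#          AS (id, prediction)
#   FROM cars;

# Read tab-separated rows from `input`, score them with `model`,
# and write tab-separated (id, prediction) rows to `output`.
score_stream <- function(input, output, model) {
  rows <- read.delim(input, header = FALSE,
                     col.names = c("id", "wt", "hp"),
                     stringsAsFactors = FALSE)
  rows$prediction <- predict(model, newdata = rows)
  write.table(rows[, c("id", "prediction")], file = output, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# In the real script the model would come from load("fit.RData") and the
# streams would be file("stdin") and stdout(). Here a small model, a text
# connection and a temporary file stand in so the sketch runs on its own.
fit <- lm(mpg ~ wt + hp, data = mtcars)
demo_in <- textConnection("a\t2.5\t110\nb\t3.2\t150")
demo_out <- tempfile()
score_stream(demo_in, demo_out, fit)
close(demo_in)
out_lines <- readLines(demo_out)
```

Each line in out_lines holds an id and its prediction separated by a tab, which is the row shape Hive expects back for the AS (id, prediction) clause.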

There are also vendor-specific ways to add functions written in R to various SQL databases.

answered Apr 12, 2018 by DataKing99
• 8,130 points

You can even build a dashboard and publish it.
Hey @Irina, can you please expand your answer and explain a little more?
