What is the best way to integrate SAS with Hadoop without losing the parallel processing capacity of Hadoop

Question

I am trying to understand the integration between SAS and Hadoop. From what I understand, SAS procedures like PROC SQL can only work against a SAS data set; I cannot run PROC SQL against a text file on a Hadoop node. Is that correct?

If so, I need to use ETL jobs to first take the data out of HDFS and convert it to SAS tables. But if I do that, I will lose the parallel processing capabilities of Hadoop.

So what is the ideal way of integrating SAS and Hadoop and still use the parallel processing power of Hadoop?

I understand you can call a MapReduce job from inside SAS, but can a MapReduce job be written in SAS? I think not.

Frankie · Answer

One of the major pushes at SAS Global Forum 2015 was actually the new options for connections to Hadoop and Teradata. FEDSQL and DS2, new in SAS 9.4, exist in part specifically to enable SAS to better work with Hadoop. You can execute code directly in your Hadoop node, as well as do a lot more efficient processing in SAS directly.

Assuming you have the most recent release of SAS (9.4 TS1M3), you can look at the SAS Release Notes (Current as of 9/3/2015; in the future this will point to later versions). That includes information like the following:

In the second maintenance release for SAS 9.4, the SAS In-Database Code Accelerator for Hadoop runs the DS2 data program as well as the thread program inside the database. Several new functions have been added. The HTTP package enables you to construct an HTTP client to access web services and a new logger enables logging of HTTP traffic. A connection string parameter is available when instantiating an SQLSTMT package.

SAS FedSQL is a SAS proprietary implementation of the ANSI SQL:1999 core standard. It provides support for new data types and other ANSI 1999 core compliance features and proprietary extensions. FedSQL provides data access technology that brings a scalable, threaded, high-performance way to access, manage, and share relational data in multiple data sources. FedSQL is a vendor-neutral SQL dialect that accesses data from various data sources without submitting queries in the SQL dialect that is specific to the data source. In addition, a single FedSQL query can target data in several data sources and return a single result table. The FEDSQL procedure enables you to submit FedSQL language statements from a Base SAS session. The first maintenance release for SAS 9.4 adds support for Memory Data Store (MDS), SAP HANA, and SASHDAT data sources.

In the second maintenance release for SAS 9.4, SAS FedSQL supports Hive, HDMD, and PostgreSQL data sources. Data types can be converted to another data type. You can add DBMS-specific clauses to the end of the CREATE INDEX statement, and you can write a SASHDAT file in compressed format.

In the third maintenance release of SAS 9.4, FedSQL has added support for HAWQ and Impala distributions of Hadoop, enhanced support for Impala, new data types, and more.

Hadoop Support

The first maintenance release for SAS 9.4 enables you to use the SPD Engine to read, write, and update data in a Hadoop cluster through the HDFS. In addition, you can now use the HADOOP procedure to submit configuration properties to the Hadoop server.

In the second maintenance release for SAS 9.4, performance has been improved for the SPD Engine access to Hadoop. The SAS Hadoop Configuration Guide for Base SAS and SAS/ACCESS is available from the support.sas.com third-party site for Hadoop.

In the third maintenance release of SAS 9.4, access to data stored in HDFS is enhanced with a new distributed lock manager and therefore easier access to Hadoop clusters using Hadoop configuration files.

Beyond this, there is extensive documentation and papers written on the subject; documentation for the SAS Connector for Hadoop, for example.
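To make the FedSQL and DS2 options above more concrete, here is a minimal sketch of what this could look like. It assumes SAS/ACCESS Interface to Hadoop is licensed and configured, and (for the DS2 step) that the SAS In-Database Code Accelerator for Hadoop is installed on the cluster. The host name, credentials, schema, table names (sales, sales_scored) and column names (region, price, quantity) are all hypothetical placeholders, and whether a given query is actually pushed down to Hive depends on your release and configuration.

```sas
/* Hypothetical connection details: replace server, user, and password. */
libname hdp hadoop server="hive.example.com" port=10000
        schema=default user=myuser password="XXXXXXXX";

/* FedSQL query against a Hive table through the libref.              */
/* The table "sales" and its columns are assumed for illustration.    */
proc fedsql;
  create table work.region_summary as
  select region, count(*) as n
  from hdp.sales
  group by region;
quit;

/* DS2 thread program; with DS2ACCEL=YES and the Code Accelerator     */
/* installed, SAS can run the thread program inside the cluster       */
/* rather than pulling the rows back to the SAS session.              */
proc ds2 ds2accel=yes;
  thread work.score_th / overwrite=yes;
    dcl double revenue;
    method run();
      set hdp.sales;
      revenue = price * quantity;   /* assumed numeric columns */
    end;
  endthread;

  data hdp.sales_scored (overwrite=yes);
    dcl thread work.score_th t;
    method run();
      set from t;
    end;
  enddata;
run;
quit;
```

In this sketch the data never has to be converted into SAS data sets first: the libref points at Hive, and both the input and the output of the DS2 step live in Hadoop. Check the SAS log to confirm where the code actually ran, since SAS falls back to processing in the SAS session when in-database execution is not available.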


