In our previous blog post, we learnt how to get started with Pig programming and scripting. There, we covered the steps to write a Pig script in HDFS mode. In this second part of the series, we will review the steps to write a Pig script in local mode.
Pig Script in Local Mode
Step 1: Write the Pig Script
- Open an editor (e.g. gedit) in your Cloudera Demo VM environment.
- Run the following command to create the ‘sample.pig’ file inside the home directory of the cloudera user:
Command: gedit sample.pig
Let’s write a few Pig commands in the sample script!
Let’s say our task is to read data from a data file and to display the required contents on the console as output.
The sample data file contains the following data:
Save the text file with the name ‘information.txt’.
The file contains five tab-separated columns: FirstName, LastName, MobileNo, City, and Profession. Our task is to read the content of this file and display the first name, mobile number, and profession of each contact.
Because we are working in Pig's local mode, this file must be present in the local file system rather than in HDFS.
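If you would rather create a test file from the terminal than type it in an editor, something like the following should work. The three rows are made-up placeholders, not the original sample data; only the five-column, tab-separated layout comes from the description above. Run it from the cloudera user's home directory so that the file ends up at /home/cloudera/information.txt.

```shell
# Create a small tab-separated data file in the current directory.
# Assumption: the rows below are hypothetical placeholder contacts.
# printf reuses the format string for every group of five arguments,
# so each group becomes one line with five tab-separated fields.
printf '%s\t%s\t%s\t%s\t%s\n' \
    Asha Rao 5550101 Pune Engineer \
    Ben Cole 5550102 Austin Teacher \
    Chen Li 5550103 Boston Doctor \
    > information.txt

# Sanity check: every line must have exactly five tab-separated fields.
awk -F'\t' 'NF != 5 { bad = 1 } END { exit bad + 0 }' information.txt \
    && echo "information.txt has five columns on every line"
```

Writing the tabs with printf avoids a common problem with editors that silently convert tabs to spaces, which would make Pig read each line as a single field.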
Edit the Pig script (sample.pig) to include the following commands:
Here, the data file ‘information.txt’ is in the cloudera user's home directory, so we have specified the file path ‘/home/cloudera/information.txt’.
Save and close the file.
- The first command loads the file information.txt into variable A with the schema (FName, LName, MobileNo, City, Profession).
- The second command selects the required fields (FName, MobileNo, and Profession) from variable A and stores them in variable B.
- The third command displays the content of variable B on the terminal/console.
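The three commands described above can be sketched as follows. This is a reconstruction from the description, not necessarily the exact script: the field names are left untyped, and the explicit PigStorage('\t') loader is an assumption (tab is PigStorage's default delimiter, so it could be omitted). The heredoc is just one way to write the file without opening an editor.

```shell
# Write a three-command Pig script to sample.pig in the current directory.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside,
# so the quotes and the \t reach the file unchanged.
cat > sample.pig <<'EOF'
-- Load the tab-separated file from the local file system.
-- Fields declared without a type default to bytearray.
A = LOAD '/home/cloudera/information.txt' USING PigStorage('\t') AS (FName, LName, MobileNo, City, Profession);
-- Keep only the three fields we want to display.
B = FOREACH A GENERATE FName, MobileNo, Profession;
-- Print the contents of B to the console.
DUMP B;
EOF
```

If the data file lives somewhere else, change the path inside the LOAD statement; in local mode it is resolved against the local file system.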
Step 2: Execute the Pig Script
To execute the Pig script in local mode, run the following command:
Command: pig -x local sample.pig
Review the result.
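If the command fails, it helps to rule out the two most common causes first: the script is not in the current directory, or the pig launcher is not on the PATH. A small wrapper along these lines does both checks before running. The function name run_pig_local is hypothetical and not part of Pig; only the pig -x local invocation comes from the step above.

```shell
# run_pig_local is a hypothetical helper name, not a Pig command.
# It runs the given script in Pig's local mode after two basic checks.
run_pig_local() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "script not found: $script" >&2
        return 1
    fi
    if ! command -v pig >/dev/null 2>&1; then
        echo "pig launcher not found on PATH" >&2
        return 127
    fi
    # Use a plain ASCII hyphen in -x; a typographic dash copied from a
    # web page is a different character and is not the same option.
    pig -x local "$script"
}
```

Command: run_pig_local sample.pig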
Congratulations on successfully executing the Pig script and getting a step ahead in Pig programming!
Got a question for us? Mention it in the comments section and we will get back to you.