Query regarding Temporary table s metadata in HIVE

Question

Hi,

Suppose I have created a temporary table in HIVE and used that table for further processing. If i create a temporary table where does the schema (or) Metadata gets stored, does it created in an External RDBMS table or in HDFS?

If in HDFS where does that reside?

score 0 · Answer 1 · May 22, 2019

Registered tables are not cached in memory. The registerTempTable createOrReplaceTempView method will just create or replace a view of the given DataFrame with a given query plan. It will convert the query plan to canonicalized SQL string, and store it as view text in metastore, if we need to create a permanent view.

You'll need to cache your DataFrame explicitly. e.g :

df.createOrReplaceTempView("my_table") # df.registerTempTable("my_table") for spark <2.+

spark.cacheTable("my_table")

Let's illustrate this with an example :

Using cacheTable :

scala> val df = Seq(("1",2),("b",3)).toDF

// df: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> sc.getPersistentRDDs

// res0: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()


scala> df.createOrReplaceTempView("my_table")


scala> sc.getPersistentRDDs

// res2: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()

scala> spark.cacheTable("my_table")

scala> sc.getPersistentRDDs

// res4: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(2 -> In-memory table my_table MapPartitionsRDD[2] at cacheTable at <console>:26)

Now the same example using cache.registerTempTable cache.createOrReplaceTempView :

scala> sc.getPersistentRDDs

// res2: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()


scala> val df = Seq(("1",2),("b",3)).toDF

// df: org.apache.spark.sql.DataFrame = [_1: string, _2: int]


scala> df.createOrReplaceTempView("my_table")


scala> sc.getPersistentRDDs

// res4: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map()


scala> df.cache.registerTempTable("my_table")


scala> sc.getPersistentRDDs

// res6: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] =

// Map(2 -> ConvertToUnsafe

// +- LocalTableScan [_1#0,_2#1], [[1,2],[b,3]]

// MapPartitionsRDD[2] at cache at <console>:28)