Why is feed_dict constructed when running an epoch in the PTB tutorial on TensorFlow?

0 votes

Q1: I am following this tutorial on Recurrent Neural Networks, and I am wondering why you need to create feed_dict in the following part of the code:

def run_epoch(session, model, eval_op=None, verbose=False):

  state = session.run(model.initial_state)

  fetches = {
      "cost": model.cost,
      "final_state": model.final_state,
  }
  if eval_op is not None:
    fetches["eval_op"] = eval_op

  for step in range(model.input.epoch_size):
    feed_dict = {}
    for i, (c, h) in enumerate(model.initial_state):
      feed_dict[c] = state[i].c
      feed_dict[h] = state[i].h

    vals = session.run(fetches, feed_dict)

I tested it, and it seems that the code also runs if you remove this part:

def run_epoch(session, model, eval_op=None, verbose=False):

  fetches = {
      "cost": model.cost,
      "final_state": model.final_state,
  }
  if eval_op is not None:
    fetches["eval_op"] = eval_op

  for step in range(model.input.epoch_size):
    vals = session.run(fetches)

So my question is: why do you need to reset the initial state to zeros after you feed a new batch of data?

Q2: Also, from what I understand, using feed_dict is considered to be slow. That is why it is recommended to feed data using the tf.data APIs. Is using feed_dict also an issue in this case? If so, how is it possible to avoid using feed_dict in this example?

UPD: Thank you @jdehesa for your detailed response, it helps a lot! Just before I close this question and accept your answer, could you clarify one point you mentioned in your answer to Q1?

I see now the purpose of feed_dict. However, I am not sure that it is actually implemented in the tutorial. From what you say:

At the beginning of each epoch, the code first takes the default "zero state" and then goes on to a loop where the current state is given as initial, the model is run and the output state is set as new current state for the next iteration.

I just looked again into the source code of the tutorial, and I do not see where the output state is set as the new current state for the next iteration. Is it done somewhere implicitly, or am I missing something?

I may also be missing something on the theoretical side. Just to make sure that I understand it correctly, here is a quick example. Assume the input data is an array that stores integer values from 0 to 120. We set the batch size to 5, so the number of data points in one batch row is 24, and the number of time steps in the unrolled RNN is 10. In this case, you only use the data points at time positions 0 to 20, and you process the data in two steps (model.input.epoch_size = 2). When you iterate over model.input.epoch_size:

state = session.run(model.initial_state)
# ...
for step in range(model.input.epoch_size):
  feed_dict = {}
  for i, (c, h) in enumerate(model.initial_state):
    feed_dict[c] = state[i].c
    feed_dict[h] = state[i].h

  vals = session.run(fetches, feed_dict)
Oct 8, 2018 in Python by eatcodesleeprepeat
• 4,670 points

you feed a batch of data like this:

> Iteration (step) 1:
x:
 [[  0   1   2   3   4   5   6   7   8   9]
 [ 24  25  26  27  28  29  30  31  32  33]
 [ 48  49  50  51  52  53  54  55  56  57]
 [ 72  73  74  75  76  77  78  79  80  81]
 [ 96  97  98  99 100 101 102 103 104 105]]
y:
 [[  1   2   3   4   5   6   7   8   9  10]
 [ 25  26  27  28  29  30  31  32  33  34]
 [ 49  50  51  52  53  54  55  56  57  58]
 [ 73  74  75  76  77  78  79  80  81  82]
 [ 97  98  99 100 101 102 103 104 105 106]]

> Iteration (step) 2:
x:
 [[ 10  11  12  13  14  15  16  17  18  19]
 [ 34  35  36  37  38  39  40  41  42  43]
 [ 58  59  60  61  62  63  64  65  66  67]
 [ 82  83  84  85  86  87  88  89  90  91]
 [106 107 108 109 110 111 112 113 114 115]]
y:
 [[ 11  12  13  14  15  16  17  18  19  20]
 [ 35  36  37  38  39  40  41  42  43  44]
 [ 59  60  61  62  63  64  65  66  67  68]
 [ 83  84  85  86  87  88  89  90  91  92]
 [107 108 109 110 111 112 113 114 115 116]]

At each iteration, you construct a new feed_dict with the initial state of the recurrent units at zero. So you assume at each step that you start processing the sequence from scratch. Is that correct?
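For reference, the batch layout above can be reproduced with a short NumPy sketch. This is a simplified, hypothetical reconstruction of the PTB reader's reshaping scheme, not the tutorial's actual ptb_producer:

```python
import numpy as np

# Reconstruct the batching described above (simplified sketch):
# 120 integers, batch_size = 5, num_steps = 10.
data = np.arange(120)
batch_size, num_steps = 5, 10

batch_len = len(data) // batch_size                 # 24 points per row
rows = data[:batch_size * batch_len].reshape(batch_size, batch_len)
epoch_size = (batch_len - 1) // num_steps           # 2 steps per epoch

for step in range(epoch_size):
    # x is a window of num_steps columns; y is the same window shifted by one.
    x = rows[:, step * num_steps:(step + 1) * num_steps]
    y = rows[:, step * num_steps + 1:(step + 1) * num_steps + 1]
    print("Iteration (step)", step + 1)
    print("x:\n", x)
    print("y:\n", y)
```

The first row of x is 0..9 in step 1 and 10..19 in step 2, matching the arrays above; y is always x shifted forward by one position.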

1 answer to this question.

0 votes
  • Q1. feed_dict is used in this case to set the initial state of the recurrent units. By default, on each call to run, recurrent units process data with an initial "zero" state. However, if your sequences are long, you may need to split them into several steps. It is important that, after each step, you save the final state of the recurrent units and feed it in as the initial state of the next step; otherwise, it would be as if the next step were the beginning of the sequence again (in particular, if your output is only the final output of the network after processing the whole sequence, it would be like discarding all the data prior to the last step). At the beginning of each epoch, the code first takes the default "zero state" and then goes on to a loop where the current state is given as initial, the model is run and the output state is set as new current state for the next iteration.

  • Q2. The claim that "feed_dict is slow" can be somewhat misleading if taken as a general truism (I am not blaming you for saying it, I have seen it many times too). The problem with feed_dict is that its job is to bring non-TensorFlow data (typically NumPy data) into the TensorFlow world. It is not that it is terrible at that; it is just that it takes some extra time to move the data around, which is especially noticeable when a lot of data is involved. For example, if you want to input a batch of images through feed_dict, you need to load them from disk, decode them, convert them into a big NumPy array and pass it to feed_dict, and then TensorFlow copies all the data into the session (GPU memory or wherever); so you would have two copies of the data in memory plus additional memory exchanges. tf.data helps because it does everything within TensorFlow (which also reduces the number of Python/C trips and is sometimes more convenient in general). In your case, what is being fed through feed_dict is the initial states of the recurrent units. Unless you have several quite big recurrent layers, I'd say the performance impact is probably rather small. It is possible, though, to avoid feed_dict in this case too: you would need a set of TensorFlow variables holding the current state, you would set up the recurrent units to use them as the initial state (with the initial_state parameter of tf.nn.dynamic_rnn) and use their final state to update the variable values; then, on each new batch, you would have to reinitialize the variables to the "zero" state again. However, I would make sure this is going to have a significant benefit before going down that route (e.g. measure the runtime with and without feed_dict, even though the results will be wrong without it).
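The point in Q1, that dropping the saved state is equivalent to restarting the sequence, can be illustrated with a toy recurrent update. This is a made-up scalar example, not the tutorial's LSTM:

```python
import numpy as np

# Toy "recurrent unit": state <- tanh(0.5 * state + 0.5 * x).
def run_chunk(xs, state):
    for x in xs:
        state = np.tanh(0.5 * state + 0.5 * x)
    return state

seq = np.linspace(-1.0, 1.0, 10)

# Whole sequence in one go, starting from the zero state.
full = run_chunk(seq, 0.0)

# Two chunks, carrying the final state over (what run_epoch does).
carried = run_chunk(seq[5:], run_chunk(seq[:5], 0.0))

# Two chunks, resetting the state to zero in between
# (what happens if the feed_dict part is removed).
reset = run_chunk(seq[5:], 0.0)

print(abs(full - carried))  # 0.0: carrying the state preserves the history
print(abs(full - reset))    # noticeably > 0: resetting forgets the first chunk
```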

EDIT:

As a clarification for the update, I copied here the relevant lines of the code:

state = session.run(model.initial_state)

fetches = {
    "cost": model.cost,
    "final_state": model.final_state,
}
if eval_op is not None:
  fetches["eval_op"] = eval_op

for step in range(model.input.epoch_size):
  feed_dict = {}
  for i, (c, h) in enumerate(model.initial_state):
    feed_dict[c] = state[i].c
    feed_dict[h] = state[i].h

  vals = session.run(fetches, feed_dict)
  cost = vals["cost"]
  state = vals["final_state"]

  costs += cost
  iters += model.input.num_steps

At the beginning of an epoch, state takes the value of model.initial_state which, unless a feed_dict replacing its values is given, will be the default "zero" initial state. fetches is a dictionary that is later passed to session.run, so that run returns another dictionary where (among other things) the key "final_state" holds the final state value. Then, on every step, a feed_dict is created that replaces the initial_state tensor values with the data in state, and run is called with that feed_dict to retrieve the values of the tensors in fetches; vals then holds the outputs of the run call. The line state = vals["final_state"] replaces the contents of state, which was our current state value, with the output state of the last run; so on the next iteration feed_dict will hold the values of the previous final state, and the network will continue "as if" the whole sequence had been given in one go. In the next call to run_epoch, state will be initialized again to the default value of model.initial_state and the process will start from "zero" again.
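Stripped of TensorFlow, the state threading described above has this shape. This is a toy mock with a plain Python function standing in for session.run, not the tutorial's code:

```python
# Toy mock of run_epoch's state threading: each "run" consumes the current
# state and returns the final state, which becomes the initial state of the
# next step (stand-in function, not TensorFlow).
def run_model(batch, state):
    # Stand-in for session.run(fetches, feed_dict).
    final_state = state + sum(batch)
    return {"cost": abs(final_state), "final_state": final_state}

batches = [[1, 2, 3], [4, 5], [6]]

state = 0.0                          # state = session.run(model.initial_state)
for batch in batches:
    vals = run_model(batch, state)   # vals = session.run(fetches, feed_dict)
    state = vals["final_state"]      # the line that carries the state forward

print(state)  # 21.0: as if the whole sequence [1, 2, 3, 4, 5, 6] ran in one go
```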

answered Oct 8, 2018 by Priyaj
• 56,520 points
