Multidimensional softmax

Creating a Softmax Output Layer

When state_below is a 2D Tensor, U is a 2D weights matrix, b is a class_size-length vector:

logits = tf.matmul(state_below, U) + b
return tf.nn.softmax(logits)

When state_below is a 3D tensor, U, b as before:

def softmax_fn(current_input):
    logits = tf.matmul(current_input, U) + b
    return tf.nn.softmax(logits)

raw_preds = tf.map_fn(softmax_fn, state_below)

Computing Costs on a Softmax Output Layer

Use tf.nn.sparse_softmax_cross_entropy_with_logits, but beware that it can’t accept the output of tf.nn.softmax. Instead, calculate the unscaled activations, and then the cost:

logits = tf.matmul(state_below, U) + b
cost = tf.nn.sparse_softmax_cross_entropy_with_logits(logits, labels)

In this case: state_below and U should be 2D matrices, b should be a vector of a size equal to the number of classes, and labels should be a 2D matrix of int32 or int64. This function also supports activation tensors with more than two dimensions.