If the existing Keras layers don’t meet your requirements, you can
create a custom layer. For simple, stateless custom operations, you are
probably better off using layer_lambda(). But for
any custom operation that has trainable weights, you should implement
your own layer.
The main data structure you’ll work with is the Layer. A layer
encapsulates both a state (the layer’s “weights”) and a transformation
from inputs to outputs (a “call”, the layer’s forward pass).
The example below illustrates the skeleton of a Keras custom layer: a
densely connected layer with weights w and b.
library(tensorflow)
library(keras)
layer_linear <- Layer(
  classname = "Linear", 
  initialize = function(units, input_dim) {
    super()$`__init__`()
    w_init <- tf$random_normal_initializer()
    self$w <- tf$Variable(
      initial_value = w_init(shape = shape(input_dim, units),
                             dtype = tf$float32)
      )
    b_init <- tf$zeros_initializer()
    self$b <- tf$Variable(
      initial_value = b_init(shape = shape(units),
                             dtype = tf$float32)
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  }
)
x <- tf$ones(shape = shape(2, 2))
layer <- layer_linear(units = 4, input_dim = 2)
y <- layer(x)
y
Note that the weights w and b are automatically tracked by the layer upon being set as layer attributes.
You also have access to a quicker shortcut for adding weights to
a layer, the add_weight method:
layer_linear <- Layer(
  classname = "Linear", 
  initialize = function(units, input_dim) {
    super()$`__init__`()
    self$w <- self$add_weight(
      shape = shape(input_dim, units),
      initializer = "random_normal",
      trainable = TRUE
    )
    self$b <- self$add_weight(
      shape = shape(units),
      initializer = "zeros",
      trainable = TRUE
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  }
)
It’s important to call
super()$`__init__`() in the
initialize method.
Note that tensor operations are executed using the Keras
backend(). See the Keras Backend article for details on the
various functions available from Keras backends.
Besides trainable weights, you can add non-trainable weights to a layer as well. Such weights are not taken into account during backpropagation when you train the layer.
Here’s how to add and use a non-trainable weight:
layer_compute_sum <- Layer(
  classname = "ComputeSum",
  initialize = function(input_dim) {
    super()$`__init__`()
    self$total <- tf$Variable(
      initial_value = tf$zeros(shape(input_dim)),
      trainable = FALSE
    )
  },
  call = function(inputs, ...) {
    self$total$assign_add(tf$reduce_sum(inputs, axis = 0L))
    self$total
  }
)
x <- tf$ones(shape(2,2))
mysum <- layer_compute_sum(input_dim = 2)
print(mysum(x))
print(mysum(x))
The total variable is part of layer$weights, but it gets categorized as a
non-trainable weight.
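To see this categorization, you can count the entries in the layer's weight lists. The snippet below is a minimal sketch: it repeats the ComputeSum definition so that it runs on its own, and it assumes the weights, trainable_weights and non_trainable_weights properties of the underlying Keras layer are reachable with `$`, as elsewhere in this article.

```r
library(tensorflow)
library(keras)

# Same ComputeSum layer as above, repeated so the snippet is self-contained.
layer_compute_sum <- Layer(
  classname = "ComputeSum",
  initialize = function(input_dim) {
    super()$`__init__`()
    self$total <- tf$Variable(
      initial_value = tf$zeros(shape(input_dim)),
      trainable = FALSE
    )
  },
  call = function(inputs, ...) {
    self$total$assign_add(tf$reduce_sum(inputs, axis = 0L))
    self$total
  }
)

mysum <- layer_compute_sum(input_dim = 2)

# Count all weights, then split them by trainability.
n_weights       <- length(mysum$weights)
n_non_trainable <- length(mysum$non_trainable_weights)
n_trainable     <- length(mysum$trainable_weights)

cat("weights:", n_weights, "\n")
cat("non-trainable weights:", n_non_trainable, "\n")
cat("trainable weights:", n_trainable, "\n")
```

The single variable should show up in the first two counts and leave the list of trainable weights empty.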
In the Linear example above, our Linear layer took an input_dim argument
that was used to compute the shape of the weights w and b in
initialize:
layer_linear <- Layer(
  classname = "Linear", 
  initialize = function(units, input_dim) {
    super()$`__init__`()
    self$w <- self$add_weight(
      shape = shape(input_dim, units),
      initializer = "random_normal",
      trainable = TRUE
    )
    self$b <- self$add_weight(
      shape = shape(units),
      initializer = "zeros",
      trainable = TRUE
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  }
)
In many cases, you may not know in advance the size of your inputs, and you would like to lazily create weights when that value becomes known, some time after instantiating the layer.
In the Keras API, we recommend creating layer weights in the
build(input_shape) method of your layer. Like this:
layer_linear <- Layer(
  classname = "Linear", 
  initialize = function(units) {
    super()$`__init__`()
    self$units <- units
  },
  build = function(input_shape) {
    self$w <- self$add_weight(
      shape = shape(input_shape[2], self$units),
      initializer = "random_normal",
      trainable = TRUE
    )
    self$b <- self$add_weight(
      shape = shape(self$units),
      initializer = "zeros",
      trainable = TRUE
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  }
)
Calling the layer will automatically run
build the first time it is called. You now have a layer that’s lazy and
thus easier to use.
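A minimal usage sketch of the lazy layer follows. It repeats the build-based Linear definition so that it runs on its own. The choice of 32 units and a 2 x 2 input of ones is arbitrary, and inspecting layer$weights before and after the first call is just one way to observe when the weights get created.

```r
library(tensorflow)
library(keras)

# Same lazily built Linear layer as above.
layer_linear <- Layer(
  classname = "Linear",
  initialize = function(units) {
    super()$`__init__`()
    self$units <- units
  },
  build = function(input_shape) {
    self$w <- self$add_weight(
      shape = shape(input_shape[2], self$units),
      initializer = "random_normal",
      trainable = TRUE
    )
    self$b <- self$add_weight(
      shape = shape(self$units),
      initializer = "zeros",
      trainable = TRUE
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  }
)

# At instantiation, the input size is not known and no weights exist yet.
layer <- layer_linear(units = 32)
n_before <- length(layer$weights)

# The first call runs build() with the input shape, creating w and b.
y <- layer(tf$ones(shape(2, 2)))
n_after <- length(layer$weights)

# Output shape as plain integers: batch size, then units.
y_shape <- as.integer(unlist(y$shape$as_list()))
```

Because the weight shapes are derived from the first input, the same layer definition can be reused for inputs of any feature size.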
If you assign a Layer instance as an attribute of another Layer, the outer layer will start tracking the weights of the inner layer.
We recommend creating such sublayers in the initialize
method (since the sublayers will typically have a build method, they
will be built when the outer layer gets built).
# Let's assume we are reusing the Linear class
# with a `build` method that we defined above.
layer_mlp_block <- Layer(
  classname = "MLPBlock",
  initialize = function() {
    super()$`__init__`()
    self$linear_1 <- layer_linear(units = 32)
    self$linear_2 <- layer_linear(units = 32)
    self$linear_3 <- layer_linear(units = 1)
  },
  call = function(inputs, ...) {
    inputs %>% 
      self$linear_1() %>% 
      tf$nn$relu() %>% 
      self$linear_2() %>% 
      tf$nn$relu() %>% 
      self$linear_3()
  }
)
mlp <- layer_mlp_block()
y <- mlp(tf$ones(shape(3, 64)))  # The first call to the `mlp` will create the weights
length(mlp$weights)
length(mlp$trainable_weights)
When writing the call method of a layer, you can create
loss tensors that you will want to use later, when writing your training
loop. This is doable by calling self$add_loss(value):
# A layer that creates an activity regularization loss
layer_activity_reg <- Layer(
  classname = "ActivityRegularizationLayer",
  initialize = function(rate = 1e-2) {
    super()$`__init__`()
    self$rate <- rate
  },
  call = function(inputs) {
    self$add_loss(self$rate * tf$reduce_sum(inputs))
    inputs
  }
)
These losses (including those created by any inner layer) can be
retrieved via layer$losses. This property is reset at the
start of every call to the top-level layer, so that
layer$losses always contains the loss values created during
the last forward pass.
layer_outer <- Layer(
  classname = "OuterLayer",
  initialize = function() {
    super()$`__init__`()
    self$dense <- layer_dense(
      units = 32, 
      kernel_regularizer = regularizer_l2(1e-3)
    )
  },
  call = function(inputs) {
    self$dense(inputs)
  }
)
layer <- layer_outer()
x <- layer(tf$zeros(shape(1,1)))
# This is `1e-3 * tf$reduce_sum(layer$dense$kernel^2)`,
# created by the `kernel_regularizer` above.
layer$losses
If you need your custom layers to be serializable as part of a
Functional model, you can optionally implement a get_config
method.
Note that the initialize method of the base Layer class
takes some keyword arguments, in particular a name and a
dtype. It’s good practice to pass these arguments to the
parent class in initialize and to include them in the layer
config:
layer_linear <- Layer(
  classname = "Linear", 
  initialize = function(units, ...) {
    super()$`__init__`(...)
    self$units <- units
  },
  build = function(input_shape) {
    self$w <- self$add_weight(
      shape = shape(input_shape[2], self$units),
      initializer = "random_normal",
      trainable = TRUE
    )
    self$b <- self$add_weight(
      shape = shape(self$units),
      initializer = "zeros",
      trainable = TRUE
    )
  },
  call = function(inputs, ...) {
    tf$matmul(inputs, self$w) + self$b
  },
  get_config = function() {
    list(
      units = self$units
    )
  }
)
layer <- layer_linear(units = 64)
config <- get_config(layer)
new_layer <- from_config(config)
If you need more flexibility when deserializing the layer from its
config, you can also override the from_config class method.
The base implementation of from_config simply calls the layer
constructor with the config entries as arguments.
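In the Python Keras API, the base from_config amounts to `cls(**config)`, that is, calling the class constructor with the config entries as keyword arguments. The plain R sketch below illustrates that idea only. make_linear and from_config_sketch are hypothetical stand-ins invented for this example, not functions of the keras package, and no real layer is created.

```r
# Hypothetical stand-in for a layer constructor: it only records its arguments.
make_linear <- function(units, name = NULL) {
  list(units = units, name = name)
}

# Sketch of the base behaviour: call the constructor with the config entries
# as named arguments (the R counterpart of Python's `cls(**config)`).
from_config_sketch <- function(constructor, config) {
  do.call(constructor, config)
}

config <- list(units = 64, name = "linear_1")
restored <- from_config_sketch(make_linear, config)
str(restored)
```

Overriding from_config becomes useful when a config entry cannot be passed to the constructor as-is, for example when it first has to be turned back into an object.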
Some layers, in particular layer_batch_normalization()
and layer_dropout(), behave differently during
training and inference. For such layers, it is standard practice to
expose a training (boolean) argument in the call method.
By exposing this argument in call, you enable the built-in training and evaluation loops (e.g. fit) to correctly use the layer in training and inference.
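As an illustration, here is a minimal sketch of a dropout-style layer with a training argument. The name layer_custom_dropout and the rate of 0.5 are choices made for this example. The sketch assumes eager execution, where training arrives in call as a plain R logical (or NULL when it is not supplied).

```r
library(tensorflow)
library(keras)

layer_custom_dropout <- Layer(
  classname = "CustomDropout",
  initialize = function(rate, ...) {
    super()$`__init__`(...)
    self$rate <- rate
  },
  call = function(inputs, training = NULL) {
    # Apply dropout only when the layer is explicitly run in training mode.
    if (isTRUE(training)) {
      tf$nn$dropout(inputs, rate = self$rate)
    } else {
      inputs
    }
  }
)

layer <- layer_custom_dropout(rate = 0.5)
x <- tf$ones(shape(2, 4))

# Inference: the input passes through unchanged.
y_infer <- layer(x, training = FALSE)

# Training: each entry is either dropped (0) or scaled by 1 / (1 - rate).
y_train <- layer(x, training = TRUE)
```

With a rate of 0.5, the surviving entries of y_train are scaled to 2, so the expected sum of the activations stays the same between training and inference.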