Spatial Transformer Networks
Contact me
- Blog -> https://cugtyt.github.io/blog/index
- Email -> cugtyt@qq.com
- GitHub -> Cugtyt@GitHub
The homepage for this blog series and related material can be found here
Abstract
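CNNs have only a limited ability to be spatially invariant to their input. This paper introduces a Spatial Transformer module, which explicitly manipulates data spatially inside the network. The module is differentiable and can be inserted into any existing CNN without extra training supervision or changes to the optimisation process. The paper reports state-of-the-art performance on several benchmarks.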
1 Introduction
Spatial transformers can be helpful for a variety of tasks:
- classification. The module crops out and scale-normalizes the appropriate region, which simplifies the subsequent classification task.
- co-localisation.
- spatial attention.
3 Spatial Transformers
The spatial transformer mechanism is split into three parts:
- localisation network. It takes the input feature map and outputs the parameters of the spatial transformation that should be applied to the feature map.
- grid generator. The parameters predicted by the localisation network are used to create a sampling grid, which is the set of points at which the input map should be sampled in the next stage.
- sampler. It takes the input feature map and the sampling grid, and produces the output map by sampling the input at the grid points.
3.1 Localisation Network
- input: $U \in \mathbb{R}^{H \times W \times C}$
- output: $\theta$
The size of $\theta$ depends on the transformation type; for an affine transformation, $\theta$ is 6-dimensional. The parameters are produced by the localisation network: $\theta = f_{loc}(U)$
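To make $\theta = f_{loc}(U)$ concrete, here is a minimal sketch in plain Python. It assumes a single linear layer as the localisation network, with zero weights and a bias equal to the identity affine transform. The names `flatten`, `f_loc`, `weights` and `bias` are illustrative only; a real localisation network would usually be a small CNN or fully connected network with learned weights.

```python
def flatten(U):
    """Flatten an H x W x C nested list into a flat list of floats."""
    return [v for row in U for pixel in row for v in pixel]


def f_loc(U, weights, bias):
    """Hypothetical one-layer localisation net: theta = W @ flatten(U) + b."""
    x = flatten(U)
    return [sum(w * v for w, v in zip(w_row, x)) + b
            for w_row, b in zip(weights, bias)]


# Toy input feature map U with H = 2, W = 2, C = 1.
H, W, C = 2, 2, 1
U = [[[0.1], [0.2]], [[0.3], [0.4]]]

# Assumption: zero weights plus an identity bias, so the module
# initially leaves its input untransformed.
weights = [[0.0] * (H * W * C) for _ in range(6)]
bias = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

theta = f_loc(U, weights, bias)  # six affine parameters
```

With this initialisation `theta` is the identity transform whatever `U` contains; training would then move the weights away from zero.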
3.2 Parameterised Sampling Grid
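For an affine transformation, only 6 parameters are required to define $A_\theta$, the $2 \times 3$ matrix that maps each target grid coordinate $(x_i^t, y_i^t)$ to a source coordinate $(x_i^s, y_i^s)$.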
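The grid generator can be sketched as follows. For every output pixel it computes $(x_i^s, y_i^s)^T = A_\theta (x_i^t, y_i^t, 1)^T$. The sketch assumes target coordinates normalised to $[-1, 1]$ and $\theta$ stored row by row as a flat list of six numbers; the function name `affine_grid` is made up for this example.

```python
def affine_grid(theta, out_h, out_w):
    """Return an out_h x out_w grid of source coordinates (x_s, y_s)."""
    t11, t12, t13, t21, t22, t23 = theta
    grid = []
    for i in range(out_h):
        # Normalised target y in [-1, 1]; a single row maps to 0.
        y_t = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x_t = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            # Apply the 2 x 3 affine matrix to (x_t, y_t, 1).
            x_s = t11 * x_t + t12 * y_t + t13
            y_s = t21 * x_t + t22 * y_t + t23
            row.append((x_s, y_s))
        grid.append(row)
    return grid


identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
zoom_in = [0.5, 0.0, 0.0, 0.0, 0.5, 0.0]  # samples only the central region

grid_id = affine_grid(identity, 3, 3)
grid_zoom = affine_grid(zoom_in, 3, 3)
```

The identity parameters reproduce the regular grid, while scaling by 0.5 shrinks the sampled region to the centre of the input, which is the cropping behaviour described for classification.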
3.3 Differentiable Image Sampling
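With an integer sampling kernel, each output value $V_i^c$ is simply the input value at the pixel nearest to $(x_i^s, y_i^s)$, and the general sampling formula reduces to:

$$V_i^c = \sum_n^H \sum_m^W U_{nm}^c \, \delta(\lfloor x_i^s + 0.5 \rfloor - m) \, \delta(\lfloor y_i^s + 0.5 \rfloor - n)$$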
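A rough Python sketch of integer (nearest-neighbour) sampling, where the kernel $\delta(\lfloor x_i^s + 0.5 \rfloor - m)\,\delta(\lfloor y_i^s + 0.5 \rfloor - n)$ picks out a single input pixel. It assumes a single-channel map and source coordinates already expressed in pixel units; `sample_nearest` is a hypothetical helper name.

```python
import math


def sample_nearest(U, coords):
    """Sample a single-channel H x W map U at (x_s, y_s) pixel coordinates."""
    H, W = len(U), len(U[0])
    out = []
    for x_s, y_s in coords:
        m = math.floor(x_s + 0.5)  # nearest column index
        n = math.floor(y_s + 0.5)  # nearest row index
        # Outside the map no (n, m) term of the double sum matches,
        # so the sampled value is 0.
        out.append(U[n][m] if 0 <= n < H and 0 <= m < W else 0.0)
    return out


U = [[1.0, 2.0],
     [3.0, 4.0]]
V = sample_nearest(U, [(0.2, 0.0), (0.6, 0.9), (5.0, 0.0)])
```

Because of the rounding, the output does not change under small shifts of the sampling coordinates, so this kernel gives no useful gradient with respect to them.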