Emergence of Visual Center-Periphery Spatial Organization in Deep Convolutional Neural Networks

Published in Scientific Reports, 2020

Motivation

Center/Periphery bias hummingbird example
Motivation Figure: Some high level visual brain regions, such as fusiform and IT, maximally respond when objects are centered. Conversely, scene-selective regions, such as PHC, maximally respond when the background remains on the periphery. A) The hummingbird is centered with the background along the periphery therefore satisfying the topographical preferences of fusiform, IT, and PHC. B) The hummingbird is on the periphery while the background occupies the center of the image. This arrangement leads to sub-optimal responses in fusiform, IT, and PHC.

Our team started from two prior findings:


  1. Responses in certain layers in Deep Convolutional Neural Networks (DCNNs) show striking similarities to responses in certain visual brain regions (Yamins et al., NeurIPS, 2013; Yamins et al., PNAS, 2014)
  2. These visual brain regions exhibit topographical preferences (Levy et al., Nature Neuroscience, 2001)


Do layers in DCNNs also show topographical preferences?

Methods

Methods Figure: Creating topographical correlation maps. We extract the 3D activation patterns from the network's convolutional layers. The first two dimensions (width and height) map spatially onto the image. At each (x, y) position in the feature maps, we extract a pattern vector whose length equals the layer depth and construct an RDM from these activity patterns. Comparing each of these RDMs with a brain ROI RDM yields a 2D correlation map, which we up-sample to the image size (topographical map). The pictures used in this figure are not from the stimulus set due to copyright.

We measured responses from human brains and a DCNN while each "viewed" a stimulus set of 156 natural images:


  • Brain Data: Subjects (n=15) viewed the 156 images many times in an fMRI scanner. We generated a Representational Dissimilarity Matrix (RDM) for EVC, Fusiform, IT, and PHC by computing the pairwise distance between the brain activations from each pair of images.
  • Model Data: Hybrid-CNN (Zhou et al., NeurIPS, 2014) is an AlexNet with 5 convolutional layers and 3 fully connected layers, trained on both object and scene recognition tasks. We generated an RDM for each unit (each unit corresponds to a spatial location in the image) within each of the 5 convolutional layers by computing the pairwise distance between the model activations from each pair of the 156 images (these 156 images were not used in model training).
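The per-location model RDMs described above can be sketched as follows. This is a minimal illustration, not the authors' code. The function name `location_rdms`, the `(n_images, depth, height, width)` array layout, and the choice of 1 minus Pearson correlation as the distance are all assumptions; the text above only says "pairwise distance".

```python
import numpy as np

def location_rdms(activations):
    """Build one RDM per spatial location of a convolutional layer.

    activations: array of shape (n_images, depth, height, width)
                 (hypothetical layout, chosen for this sketch).
    Returns an array of shape (height, width, n_images, n_images).
    Distance is 1 - Pearson correlation, which is an assumption here.
    """
    n_images, depth, height, width = activations.shape
    rdms = np.empty((height, width, n_images, n_images))
    for y in range(height):
        for x in range(width):
            # One pattern vector of length `depth` per image at this location.
            patterns = activations[:, :, y, x]
            # np.corrcoef correlates rows, giving an image-by-image matrix.
            # A location with constant activation (e.g. all zeros after
            # ReLU) yields NaN entries and would need separate handling.
            rdms[y, x] = 1.0 - np.corrcoef(patterns)
    return rdms

# Usage with random stand-in data: 6 images, 16 channels, 4 x 4 feature map.
rng = np.random.default_rng(0)
demo_activations = rng.standard_normal((6, 16, 4, 4))
demo_rdms = location_rdms(demo_activations)
print(demo_rdms.shape)
```

With the real stimulus set, `n_images` would be 156 and each location's RDM would be 156 x 156.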


We then correlated (Spearman) the brain-region RDMs with the model-unit RDMs using RSA (Kriegeskorte et al., Frontiers in Systems Neuroscience, 2008) to produce a topographical correlation map. This allowed us to quantify and visualize the topographical correspondence between brain regions and DCNNs.
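The comparison step can be sketched like this, assuming the model RDMs for one layer are already stored in an array of shape `(height, width, n_images, n_images)`. The function names, the use of only the upper triangle of each RDM, and the nearest-neighbor up-sampling are assumptions made for illustration; the up-sampling method is not specified above.

```python
import numpy as np
from scipy.stats import spearmanr

def upper_triangle(rdm):
    """Return the off-diagonal upper-triangle entries of a square RDM."""
    rows, cols = np.triu_indices(rdm.shape[0], k=1)
    return rdm[rows, cols]

def topographical_map(brain_rdm, model_rdms):
    """Spearman-correlate one brain ROI RDM with every location's model RDM.

    brain_rdm:  (n_images, n_images)
    model_rdms: (height, width, n_images, n_images)
    Returns a (height, width) map of Spearman correlations.
    """
    height, width = model_rdms.shape[:2]
    brain_vec = upper_triangle(brain_rdm)
    corr_map = np.empty((height, width))
    for y in range(height):
        for x in range(width):
            rho, _ = spearmanr(brain_vec, upper_triangle(model_rdms[y, x]))
            corr_map[y, x] = rho
    return corr_map

def upsample_nearest(corr_map, factor):
    """Nearest-neighbor up-sampling by an integer factor (an assumption)."""
    return np.kron(corr_map, np.ones((factor, factor)))

# Usage with random stand-in RDMs on a 2 x 3 grid of locations.
rng = np.random.default_rng(0)

def random_rdm(n):
    a = rng.random((n, n))
    a = (a + a.T) / 2.0
    np.fill_diagonal(a, 0.0)
    return a

brain = random_rdm(8)
model = np.stack([random_rdm(8) for _ in range(6)]).reshape(2, 3, 8, 8)
corr = topographical_map(brain, model)
print(upsample_nearest(corr, 4).shape)
```

Using only the upper triangle avoids counting each image pair twice and excludes the all-zero diagonal, which would otherwise inflate the correlation.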

Results

Results Figure: Topographical correspondence between convolutional layers of DCNNs and human ventral visual regions. For each brain-model mapping (EVC, Fusiform, IT, PHC), the first five maps show the correlational topographical maps between each convolutional layer and the brain ROI; the second five maps show the corresponding significance maps (two-sided sign permutation tests, cluster defining threshold P < 0.01, and corrected significance level P < 0.05).
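For readers unfamiliar with sign permutation tests, the sketch below shows the basic idea for a single layer and ROI. It assumes one correlation map per subject, which the caption implies but does not state. It computes only uncorrected per-location p-values and deliberately omits the cluster-based correction mentioned in the caption. The function name, the number of permutations, and the seed are illustrative choices.

```python
import numpy as np

def sign_permutation_pvalues(subject_maps, n_permutations=1000, seed=0):
    """Two-sided sign permutation test at each map location.

    subject_maps: (n_subjects, height, width) correlation maps
                  (assumed layout, one map per subject).
    Returns uncorrected p-values of shape (height, width).
    Cluster-size correction is NOT implemented in this sketch.
    """
    rng = np.random.default_rng(seed)
    n_subjects = subject_maps.shape[0]
    observed = np.abs(subject_maps.mean(axis=0))
    # Start the count at 1 so the observed data counts as one permutation.
    exceed = np.ones_like(observed)
    for _ in range(n_permutations):
        # Under the null of zero mean, each subject's sign is arbitrary,
        # so flip each subject's whole map by a random sign.
        signs = rng.choice([-1.0, 1.0], size=(n_subjects, 1, 1))
        permuted = np.abs((signs * subject_maps).mean(axis=0))
        exceed += permuted >= observed
    return exceed / (n_permutations + 1)

# Usage with stand-in data: 15 subjects, 2 x 2 map, one strong location.
rng = np.random.default_rng(1)
demo_maps = rng.normal(0.0, 0.1, size=(15, 2, 2))
demo_maps[:, 0, 0] = 0.5 + rng.normal(0.0, 0.01, size=15)
print(sign_permutation_pvalues(demo_maps).shape)
```

Flipping the sign of a subject's entire map, rather than each location independently, preserves the spatial structure within a subject. A cluster-based correction relies on that structure.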

Results are presented in two steps:


  1. We replicated many previous works showing a correspondence between convolutional network layers and brain regions (Motivation point #1). Specifically in our data,
    • EVC: Significant in layers 1 and 2
    • Fusiform: Significant in layers 2, 3, 4, and 5
    • IT: Significant in layers 4 and 5
    • PHC: Significant in layers 2, 3, 4, and 5
  2. We analyzed whether the network layers that significantly corresponded to each brain region showed the same center/periphery bias as that region (see Results Figure).
    • EVC/Layers 1-2: Shows randomly distributed topographical preferences
    • Fusiform/Layers 2-5: Shows strong center bias
    • IT/Layers 4-5: Shows strong center bias
    • PHC/Layers 2-5: Shows clear periphery bias in layer 2 with a transition to a distributed pattern

Conclusion

Our results revealed that foveally biased fusiform and IT correlated highly with unit activations of the network with a center-selective visual field, and that peripherally biased PHC correlated strongly with unit activations of the network with periphery-selective receptive fields. We demonstrated for the first time a topographical correspondence (center/periphery biases) between ventral brain regions and unit activations of the Hybrid-CNN.


These findings support two main hypotheses in vision neuroscience:
  • The human visual pathway optimizes the cost function of visual recognition
  • The characteristics of neural tuning, internal neural representations, and brain area functions along this pathway are most likely the result of this cost function optimization (Marblestone et al., Frontiers in Computational Neuroscience, 2016)


Future directions:
  • Explore whether the topographical biases found in the brain and model are a result of the statistics of the image or learned weights.
  • Use DCNNs to investigate the underlying computational principles involved in the brain's hierarchical and topographical organization.