Interesting paper. It's all about scaling ConvNets — not exactly a new architecture, but kind of a new architecture per se. The authors provide a detailed analysis, backed up by data, showing that accuracy improves by scaling any of:

Width - the number of channels (neurons) in a layer.

Depth - the number of layers in the network.

Resolution - the input image size, W × H.

**Observation 1:** As they say, scaling up any single dimension of the network — width, depth, or resolution — improves accuracy, but the accuracy gain diminishes for bigger models.

Unsurprisingly, the number of FLOPs also increases as we increase any of these parameters (duh!), but there is definitely a sweet spot.

**Observation 2:** In order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling.

I'll just add these here -

The **compound scaling** method uses a compound coefficient φ to uniformly scale network width, depth, and resolution in a principled way:

depth: d = α^φ

width: w = β^φ

resolution: r = γ^φ

s.t. α · β^2 · γ^2 ≈ 2, α ≥ 1, β ≥ 1, γ ≥ 1

where α, β, γ are constants that can be determined by a small grid search. Intuitively, φ is a user-specified coefficient that controls how many more resources are available for model scaling, while α, β, γ specify how to assign these extra resources to network width, depth, and resolution respectively. Notably, the FLOPS of a regular convolution op is proportional to d, w^2, r^2, i.e., doubling network depth will double FLOPS, but doubling network width or resolution will increase FLOPS by four times. Since **convolution ops usually dominate the computation cost in ConvNets**, scaling a ConvNet with Equation 3 will approximately increase total FLOPS by (α · β^2 · γ^2)^φ. In this paper, we constrain α · β^2 · γ^2 ≈ 2 such that for any new φ, the total FLOPS will approximately increase by 2^φ.

So basically, I can estimate the maximum FLOPs required using φ (FLOPs grow by roughly 2^φ) while hoping for an increase in accuracy — φ gives me control over how much of the available resource budget to use.
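The scaling rule above is easy to sanity-check numerically. A minimal sketch (the α, β, γ defaults below are the EfficientNet-B0 values quoted later in these notes; the function names are mine, not from the paper):

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for a given phi."""
    d = alpha ** phi   # depth multiplier
    w = beta ** phi    # width multiplier
    r = gamma ** phi   # resolution multiplier
    return d, w, r

def flops_multiplier(phi, alpha=1.2, beta=1.1, gamma=1.15):
    # FLOPs scale as d * w^2 * r^2 = (alpha * beta^2 * gamma^2)^phi, which is
    # approximately 2^phi under the constraint alpha * beta^2 * gamma^2 ≈ 2.
    return (alpha * beta**2 * gamma**2) ** phi

print(compound_scale(1))    # → (1.2, 1.1, 1.15)
print(flops_multiplier(1))  # ≈ 1.92, close to the intended 2
```

Note that the constraint only holds approximately (1.2 · 1.1² · 1.15² ≈ 1.92), so the 2^φ FLOPs estimate is a ballpark, not an exact bound.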

**Architecture** -

Similar to MnasNet: the building block is the mobile inverted bottleneck **MBConv**, with the **squeeze-and-excitation** optimization.
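To make the squeeze-and-excitation part concrete, here is a minimal NumPy sketch of just the SE step (global average pool, a small reduce/expand bottleneck, sigmoid gate, channel-wise rescale). The shapes and the reduction ratio are illustrative assumptions, not the paper's exact configuration, and the paper uses swish rather than the ReLU used here:

```python
import numpy as np

def squeeze_excite(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on a feature map x of shape (C, H, W)."""
    s = x.mean(axis=(1, 2))                      # squeeze: global avg pool -> (C,)
    z = np.maximum(0.0, w1 @ s + b1)             # excite, reduce step (+ ReLU)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ z + b2)))  # expand step + sigmoid -> (C,)
    return x * gate[:, None, None]               # reweight each channel

# Hypothetical sizes: C=8 channels, reduction ratio 4
C, r = 8, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((C, 6, 6))
w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
y = squeeze_excite(x, w1, b1, w2, b2)
print(y.shape)  # → (8, 6, 6)
```

The gate is in (0, 1) per channel, so SE can only attenuate channels relative to the input; in the full MBConv block this sits between the depthwise convolution and the final pointwise projection.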

The method to fix the parameters is:

keep φ = 1, assuming twice more resources available, and do a small grid search of α, β, γ based on Equations 2 and 3. In particular, we find the best values for EfficientNet-B0 are α = 1.2, β = 1.1, γ = 1.15, under the constraint α · β^2 · γ^2 ≈ 2.

Then

we fix α, β, γ as constants and scale up the baseline network with different φ using Equation 3, to obtain EfficientNet-B1 to B7.
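That second step amounts to multiplying a baseline configuration through by the compound coefficients and rounding. A sketch, using the paper's α, β, γ — but note the baseline layer/channel/resolution numbers below are made-up placeholders, not the real B0 architecture:

```python
# EfficientNet-B0 coefficients from the paper's grid search
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def scale_config(phi, base_layers=16, base_channels=32, base_resolution=224):
    """Scale a (hypothetical) baseline config for a given phi (Equation 3)."""
    return {
        "layers": round(base_layers * ALPHA ** phi),
        "channels": round(base_channels * BETA ** phi),
        "resolution": round(base_resolution * GAMMA ** phi),
    }

for phi in range(4):  # phi = 0 recovers the baseline; larger phi gives B1, B2, ...
    print(phi, scale_config(phi))
```

Each increment of φ roughly doubles FLOPs while spreading the extra budget across all three dimensions, instead of dumping it all into one.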

Experiments: basically, scaling anything with this compound method increases accuracy. There are lots of experiments — best to look at the paper. The models also transfer-learn well.
