Python-ConvNeXt: Code release for ConvNeXt model

A ConvNet for the 2020s

Official PyTorch implementation of ConvNeXt, from the following paper:

A ConvNet for the 2020s. arXiv 2022.
Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell and Saining Xie
Facebook AI Research, UC Berkeley

We propose ConvNeXt, a pure ConvNet model constructed entirely from standard ConvNet modules. ConvNeXt is accurate, efficient, scalable and very simple in design.


  • ImageNet-1K Training Code
  • ImageNet-22K Pre-training Code
  • ImageNet-1K Fine-tuning Code
  • Downstream Transfer (Detection, Segmentation) Code

Results and Pre-trained Models

ImageNet-1K trained models

name resolution acc@1 #params FLOPs model
ConvNeXt-T 224x224 82.1 28M 4.5G model
ConvNeXt-S 224x224 83.1 50M 8.7G model
ConvNeXt-B 224x224 83.8 89M 15.4G model
ConvNeXt-B 384x384 85.1 89M 45.0G model
ConvNeXt-L 224x224 84.3 198M 34.4G model
ConvNeXt-L 384x384 85.5 198M 101.0G model

ImageNet-22K trained models

name resolution acc@1 #params FLOPs 22k model 1k model
ConvNeXt-B 224x224 85.8 89M 15.4G model model
ConvNeXt-B 384x384 86.8 89M 47.0G - model
ConvNeXt-L 224x224 86.6 198M 34.4G model model
ConvNeXt-L 384x384 87.5 198M 101.0G - model
ConvNeXt-XL 224x224 87.0 350M 60.9G model model
ConvNeXt-XL 384x384 87.8 350M 179.0G - model

ImageNet-1K trained models (isotropic)

name resolution acc@1 #params FLOPs model
ConvNeXt-S 224x224 78.7 22M 4.3G model
ConvNeXt-B 224x224 82.0 87M 16.9G model
ConvNeXt-L 224x224 82.6 306M 59.7G model


Please check for installation instructions.


We give an example evaluation command for an ImageNet-22K pre-trained, then ImageNet-1K fine-tuned ConvNeXt-B:


python --model convnext_base --eval true \
--resume \
--input_size 224 --drop_path 0.2 \
--data_path /path/to/imagenet-1k


python -m torch.distributed.launch --nproc_per_node=8 \
--model convnext_base --eval true \
--resume \
--input_size 224 --drop_path 0.2 \
--data_path /path/to/imagenet-1k

This should give

* acc@1 85.820 acc@5 97.868 loss 0.563
  • For evaluating other model variants, change --model, --resume, and --input_size accordingly. You can get the URLs of the pre-trained models from the tables above.
  • Setting model-specific --drop_path is not strictly required in evaluation, as the DropPath module in timm behaves the same during evaluation; but it is required in training. See or our paper for the values used for different models.
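To illustrate the note about DropPath: below is a minimal stochastic-depth module in the spirit of timm's DropPath (a simplified sketch, not the library's exact code). In eval mode it is the identity, which is why --drop_path does not change evaluation results.

```python
import torch
import torch.nn as nn

# Simplified stochastic-depth ("DropPath") sketch; the real timm module
# differs in details, but shares the key property: identity in eval mode.
class DropPath(nn.Module):
    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        if self.drop_prob == 0.0 or not self.training:
            return x  # no-op during evaluation
        keep_prob = 1.0 - self.drop_prob
        # one Bernoulli draw per sample, broadcast over the remaining dims
        mask = x.new_empty((x.shape[0],) + (1,) * (x.ndim - 1)).bernoulli_(keep_prob)
        return x * mask / keep_prob

x = torch.randn(2, 3, 4, 4)
assert torch.equal(DropPath(0.2).eval()(x), x)  # identity at eval time
```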


See for training and fine-tuning instructions.


This repository is built using the timm library, DeiT and BEiT repositories.


This project is released under the MIT license. Please see the LICENSE file for more information.


If you find this repository helpful, please consider citing:

@article{liu2022convnet,
  author  = {Zhuang Liu and Hanzi Mao and Chao-Yuan Wu and Christoph Feichtenhofer and Trevor Darrell and Saining Xie},
  title   = {A ConvNet for the 2020s},
  journal = {arXiv preprint arXiv:2201.03545},
  year    = {2022},
}


  • [Feature Request] quantization code for ConvNeXt

    Jan 17, 2022

    Actually, I tried to use torch.fx to quantize ConvNeXt to see how it performs after quantization, but I get this error: TypeError: dequantize() takes no arguments (1 given). Could you please help?

    import torch
    from torch.quantization import quantize_fx
    torch.backends.cudnn.enabled = False
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    device_cpu = torch.device("cpu")
    save_ckpt_path = ''
    from models.convnext import convnext_tiny
    model = convnext_tiny(False)
    model.eval() # essential
    inp_size = (224,224)
    res = model(torch.randn((1,3,inp_size[0],inp_size[1])).to(device_cpu))
    graph_module = torch.fx.symbolic_trace(model)
    qconfig_dict = {'': torch.quantization.get_default_qat_qconfig('qnnpack')}
    # qconfig_dict = {'': torch.quantization.get_default_qat_qconfig('fbgemm')}
    mp = quantize_fx.prepare_fx(graph_module, qconfig_dict)
    def eval_fn(model, device=device_cpu):
        with torch.no_grad():
            for i in range(20):
                ims = torch.rand((1,3,224,224)).to(device)
                output = model(ims)
    eval_fn(mp, device=device_cpu)
    mq = quantize_fx.convert_fx(mp)
    dummy_input = torch.rand(1, 3, inp_size[0], inp_size[1])
    torchscript_model = torch.jit.trace(mq, dummy_input)
    from torch.utils.mobile_optimizer import optimize_for_mobile
    torchscript_model = optimize_for_mobile(torchscript_model)
    torch.jit.save(torchscript_model, save_ckpt_path)
    torchscript_model._save_for_lite_interpreter(save_ckpt_path.replace('.pt', '.ptl'))
  • Update

    Jan 21, 2022

    The minor changes are to allow the model to work with stages other than 4.

  • How to increase reproducibility?

    Jan 22, 2022

    I run the training command, but the accuracy is random across runs and the spread is above 0.5%. Could you tell me how to improve reproducibility and reduce this accuracy spread?

    I know there is already some code to improve reproducibility:

        # fix the seed for reproducibility
        seed = args.seed + utils.get_rank()
        torch.manual_seed(seed)
        np.random.seed(seed)
        cudnn.benchmark = True
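For what it's worth, the repo's snippet leaves cudnn.benchmark = True, which lets cuDNN pick convolution algorithms non-deterministically. A common (hedged) determinism recipe also disables benchmarking and requests deterministic kernels; exact flags depend on the PyTorch version, and full determinism is still not guaranteed with multi-GPU training:

```python
import random

import numpy as np
import torch

# Sketch of common determinism settings (names and structure are a
# general recipe, not this repo's exact code).
def seed_everything(seed: int):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)            # no-op without CUDA
    torch.backends.cudnn.benchmark = False      # no non-deterministic autotuning
    torch.backends.cudnn.deterministic = True   # deterministic conv kernels

seed_everything(0)
a = torch.randn(4)
seed_everything(0)
b = torch.randn(4)
assert torch.equal(a, b)  # same seed -> same random stream
```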

  • Add Detectron2 support for ConvNeXt

    Jan 22, 2022

    Hello, I have ported the existing ConvNeXt model, which runs on mmdetection, to work with the Detectron2 framework.

    All the required files are provided under object_detection/detectron2, including the main model and the configuration files.

    Kindly accept the PR if it helps. :)


  • convnext is slower than swin?

    Jan 24, 2022

    I use convnext_tiny as a pretrained model to fine-tune on my own dataset, but I found that convnext_tiny is slower than swin_tiny (I use 4 NVIDIA 1080 Ti GPUs): convnext_tiny takes about three times as long as swin_tiny. Yet the FLOPs of the two models are similar, so I don't understand why ConvNeXt is slower.
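One plausible explanation (an assumption, not a confirmed diagnosis): FLOPs don't capture arithmetic intensity. ConvNeXt's depthwise 7x7 convolutions perform very few multiply-adds per byte moved, so they tend to be memory-bound, especially on older GPUs such as the 1080 Ti. A rough counting sketch, with illustrative layer sizes:

```python
# Illustration only: FLOPs alone don't predict wall-clock speed.
def conv_flops(h, w, cin, cout, k, groups=1):
    """Multiply-accumulate count for a k x k convolution on an h x w map."""
    return h * w * cout * (cin // groups) * k * k

h = w = 56
c = 96  # ConvNeXt-T stage-1 width
dw = conv_flops(h, w, c, c, 7, groups=c)   # depthwise 7x7 conv
pw = conv_flops(h, w, c, 4 * c, 1)         # pointwise 1x1 expansion

# The depthwise conv has ~8x fewer FLOPs than the 1x1 expansion,
# yet it touches the whole feature map: low FLOPs per byte moved.
print(dw, pw, dw / pw)
```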

  • KeyError: 'DistOptimizerHook is not in the hook registry'

    Jan 24, 2022

    Hello, when I was running the object detection code, the following problem occurred, which seems to be caused by files missing from mmdet/models/detectors. Could you release the complete code?

    Traceback (most recent call last):
      File "tools/", line 189, in <module>
        main()
      File "tools/", line 185, in main
        meta=meta)
      File "/root/data/UniverseNet-master/mmdet/apis/", line 166, in train_detector
        cfg.get('momentum_config', None))
      File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/", line 540, in register_training_hooks
        self.register_optimizer_hook(optimizer_config)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/", line 448, in register_optimizer_hook
        hook = mmcv.build_from_cfg(optimizer_config, HOOKS)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/utils/", line 45, in build_from_cfg
        f'{obj_type} is not in the {} registry')
    KeyError: 'DistOptimizerHook is not in the hook registry'
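For context, mmcv builds hooks from config dicts through a string-keyed registry; the KeyError means no class named DistOptimizerHook was ever registered. A simplified stand-in registry (illustrative only, not mmcv's actual API) shows the mechanism:

```python
# Simplified sketch of a string -> class registry, mimicking how mmcv
# builds hooks from config; names here are illustrative, not mmcv's API.
class Registry:
    def __init__(self):
        self._modules = {}

    def register_module(self, cls):
        self._modules[cls.__name__] = cls
        return cls

    def build(self, cfg):
        obj_type = cfg.pop("type")
        if obj_type not in self._modules:
            raise KeyError(f"{obj_type} is not in the registry")
        return self._modules[obj_type](**cfg)

HOOKS = Registry()

@HOOKS.register_module
class OptimizerHook:
    def __init__(self, grad_clip=None):
        self.grad_clip = grad_clip

# 'DistOptimizerHook' was never registered, so building it raises a
# KeyError -- the same failure mode as the traceback above.
try:
    HOOKS.build({"type": "DistOptimizerHook"})
except KeyError as e:
    print(e)
```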

  • Changed the dict params while loading weights

    Jan 13, 2022

    There was a typo in the code:

    def convnext_xlarge(pretrained=False, in_22k=False, **kwargs):
        model = ConvNeXt(depths=[3, 3, 27, 3], dims=[256, 512, 1024, 2048], **kwargs)
        if pretrained:
            url = model_urls['convnext_xlarge_22k'] if in_22k else model_urls['convnext_xlarge_1k']
            checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", check_hash=True)
            model.load_state_dict(checkpoint["model"])  # apply the downloaded weights
        return model

    the url contained the line

        url = model_urls['convnext_xlarge_22k'] if in_22k else model_urls['convnext_xlarge_1k']

    Updated "convnext_large_1k" to "convnext_xlarge_1k" in model_urls.

  • Question: 1x1 conv vs Linear

    Jan 17, 2022

    Congratulations on your work and thanks for sharing! I'd like to naively ask, what is the reason behind implementing 1x1 convs with fully connected layers? I know they are equivalent but I had been thinking the latter is less efficient.

    Thanks in advance!
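To make the equivalence mentioned in the question concrete, here is a small sketch (layer sizes are arbitrary) showing that a 1x1 conv and an nn.Linear over the channel dimension compute the same function once the weights are shared, so the choice is about memory layout and speed rather than expressiveness:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(8, 16, kernel_size=1)
lin = nn.Linear(8, 16)

# Share parameters: a 1x1 conv weight (16, 8, 1, 1) is a Linear weight (16, 8).
with torch.no_grad():
    lin.weight.copy_(conv.weight.squeeze(-1).squeeze(-1))
    lin.bias.copy_(conv.bias)

x = torch.randn(2, 8, 5, 5)
y_conv = conv(x)
# nn.Linear expects channels last: permute to (N, H, W, C) and back.
y_lin = lin(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

assert torch.allclose(y_conv, y_lin, atol=1e-6)  # identical outputs
```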

  • The URL was also wrong

    Jan 13, 2022

    Updated the url

    Remarks: there is some issue with the model at "convnext_xlarge_1k": "". I updated the URL and the dict key of model_urls earlier because the 1K model of XLarge wasn't loading; I fixed that path and the variable defined above, but while executing I got a 403 error.

    To further check whether everything works correctly or whether it is only a URL issue, I tried downloading the model weights and using them directly, and got the following error: PytorchStreamReader failed reading zip archive: failed finding central directory.

    It seems that there is some issue with torch hub, or the model still doesn't exist on torch hub.

  • Weird learning rate occurs

    Jan 13, 2022


    I trained a ConvNeXt model in my project, but I found a weird learning rate.

    I set lr = 0.002, but in the training loop the lr parameter seems to be only 0.00002.

    Does someone know why?
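One plausible cause (an assumption; the step counts below are made up): the training recipe uses learning-rate warmup, so early in training the logged lr is far below the configured base lr. A toy linear-warmup calculation reproduces the reported magnitude:

```python
# Hypothetical linear-warmup schedule; the repo's actual schedule
# (cosine with warmup) differs in detail, but the early-training
# behavior is the same: lr ramps up from ~0 to the base value.
def warmup_lr(base_lr, step, warmup_steps, start_lr=0.0):
    """Linearly ramp from start_lr to base_lr over warmup_steps."""
    if step >= warmup_steps:
        return base_lr
    return start_lr + (base_lr - start_lr) * step / warmup_steps

base_lr = 0.002
# 10 steps into a 1000-step warmup: lr is 1% of base_lr, i.e. 0.00002,
# exactly the magnitude reported in this issue.
early = warmup_lr(base_lr, 10, 1000)
assert abs(early - 2e-05) < 1e-12
assert warmup_lr(base_lr, 1000, 1000) == base_lr  # full lr after warmup
```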


  • Custom Data

    Jan 16, 2022

    Will you release anything about training the model on custom data?

  • Image Classification Web Demo

    Jan 17, 2022

    Web demo on Hugging Face Spaces using Gradio for image classification.

  • A question about processing the ImageNet-22K dataset

    Jan 18, 2022

    Thanks for sharing your fantastic work! I want to know how to process the ImageNet-22K dataset after downloading it; where can I get the data labels?

  • The batch sizes of the single-machine commands are not adjusted

    Jan 21, 2022

    On the training doc, I believe we need to adjust the batch size (or the LR) in the single-machine commands to keep the total batch size the same.

    For example, currently the ConvNeXt-S reports:

    • Multi-node: --nodes 4 --ngpus 8 --batch_size 128 --lr 4e-3
    • Single-machine: --nproc_per_node=8 --batch_size 128 --lr 4e-3 <- I believe --batch_size should be 512 here

    Same applies for the other variants.
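The arithmetic behind this report can be sketched quickly (flag values follow the commands quoted above):

```python
# Effective (global) batch size = nodes x GPUs per node x per-GPU batch.
def effective_batch_size(nodes, gpus_per_node, per_gpu_batch):
    return nodes * gpus_per_node * per_gpu_batch

multi_node = effective_batch_size(4, 8, 128)      # documented multi-node recipe
single_machine = effective_batch_size(1, 8, 128)  # single-machine as documented
corrected = effective_batch_size(1, 8, 512)       # the proposed fix

assert multi_node == 4096
assert single_machine == 1024   # 4x smaller than intended
assert corrected == multi_node  # matches the multi-node recipe again
```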

  • hello, what is this problem? thanks~

    Jan 20, 2022

    Traceback (most recent call last):
      File "", line 122, in <module>
        main()
      File "", line 113, in main
        args.dist_url = get_init_file().as_uri()
      File "", line 41, in get_init_file
        os.makedirs(str(get_shared_folder()), exist_ok=True)
      File "", line 37, in get_shared_folder
        raise RuntimeError("No shared folder available")
    RuntimeError: No shared folder available

  • Add Weights and Biases Integration

    Jan 18, 2022

    This PR adds support for Weights and Biases metric logging and model checkpointing.


    I have tested the implementation by training ConvNeXt-Tiny on the CIFAR-100 dataset for 10 epochs. To enable logging metrics with W&B, use --enable_wandb true. To save model checkpoints as versioned artifacts, use --wandb_ckpt true along with --enable_wandb.

    python --epochs 10 --model convnext_tiny --data_set CIFAR --data_path datasets --num_workers 8 --warmup_epochs 0  --save_ckpt true --output_dir model_ckpt --finetune path/to/model.pth --cutmix 0 --mixup 0 --lr 4e-4 --enable_wandb true --wandb_ckpt true

    You can also set the name of the W&B project where all the runs will be logged using the --project argument (default='convnext').
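The flag handling described above could be wired roughly as follows; this is a hypothetical sketch (the parser structure and str2bool helper are assumptions, not the PR's actual code), kept dependency-free so nothing here requires wandb to be installed:

```python
import argparse

def str2bool(s):
    """Parse 'true'/'false'-style CLI values, as in '--enable_wandb true'."""
    return str(s).lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser("W&B options sketch")
parser.add_argument("--enable_wandb", type=str2bool, default=False,
                    help="log metrics to Weights & Biases")
parser.add_argument("--wandb_ckpt", type=str2bool, default=False,
                    help="save checkpoints as versioned W&B artifacts")
parser.add_argument("--project", default="convnext",
                    help="W&B project that runs are logged to")

args = parser.parse_args(["--enable_wandb", "true", "--wandb_ckpt", "true"])
assert args.enable_wandb and args.wandb_ckpt and args.project == "convnext"
# Only when the flag is set would the code import and initialize wandb,
# so omitting --enable_wandb adds no dependency on W&B.
```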

    Result (Feature Addition)

    • Able to easily share the experiments. Here's an example W&B run showing train and test metrics from fine-tuning ConvNeXt-Tiny on CIFAR-100 for 10 epochs.
    • Ability to check out the exact configuration used to train the model. Configs can thus be easily compared.
    • CPU/GPU metrics - memory allocated, memory utilized, etc.
    • Versioned model checkpoints which are easy to share.

    Note: I achieved ~88% top-1 test accuracy, which is really impressive. Thanks for working on this.

    The screen recording below summarizes the features added with just a few lines of code.


    • I have tested the implementation on a single machine with one GPU and with 2 GPUs, and it works as expected. You can find the W&B run associated with training on 2 K80s.
    • If --enable_wandb true is not passed, the code works as before, so the addition does not force any dependency on W&B.