Vox-adv-cpk.pth.tar Jun 2026
cpk: Short for "checkpoint". It indicates that the file contains a model checkpoint. In deep learning, checkpoints are saved at intervals during training so that training can be resumed from a specific point or the saved weights can be used for inference.
The checkpoint is frequently used in Hugging Face Spaces and Google Colab notebooks, which let users run the model from a web browser with no local setup. In these environments, the load_checkpoints() function loads the file from a cloud storage path.
The file supplies the pretrained weights for deepfake and puppet-animation frameworks such as Avatarify and the First Order Motion Model (FOMM).
Technical Specifications and Architecture
In many GitHub repositories that work with the First Order Motion Model, two main checkpoint files are offered: vox-cpk.pth.tar and vox-adv-cpk.pth.tar.
It is a .pth.tar file, a naming convention common in PyTorch projects for checkpoints that store model weights and training state. The architecture itself is described by the accompanying YAML configuration file.
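Because the .tar suffix is only a naming convention, it can help to inspect how a downloaded checkpoint is actually packaged before trying to open it. The sketch below uses only the Python standard library; the function name describe_container is hypothetical, and the three labels it returns are an illustrative simplification rather than anything defined by PyTorch.

```python
import os
import tarfile
import zipfile


def describe_container(path):
    """Report how a checkpoint file is packaged on disk.

    Hypothetical helper: the ".tar" suffix is only a naming convention,
    so this inspects the file contents instead of trusting the extension.
    """
    path = os.fspath(path)
    if zipfile.is_zipfile(path):
        # torch.save has written zip-based files by default since PyTorch 1.6.
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    # Anything else, including older pickle-based torch.save output.
    return "other"
```

A result of "other" does not mean the file is corrupt; it only means the file is neither a zip nor a tar container.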
The versatility of vox-adv-cpk.pth.tar is demonstrated by its integration into a wide range of projects beyond the core research repository. It has become a standard artifact for facial motion transfer.
The adversarial training framework adds a discriminator that tries to distinguish between real and generated frames. The generator learns to produce increasingly realistic animations to fool the discriminator, resulting in higher-quality outputs with fewer artifacts. The adversarial version also uses its own configuration file (vox-adv-256.yaml) rather than the standard one (vox-256.yaml).
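The generator-versus-discriminator dynamic can be illustrated with a small numeric sketch. It assumes a least-squares GAN objective, chosen here for illustration because the text does not state which adversarial loss is used; the function names are hypothetical, and plain lists of scores stand in for the discriminator's outputs.

```python
def discriminator_loss(real_scores, fake_scores):
    """Least-squares loss (assumed): push real scores toward 1 and fake scores toward 0."""
    real_term = sum((1.0 - r) ** 2 for r in real_scores) / len(real_scores)
    fake_term = sum(f ** 2 for f in fake_scores) / len(fake_scores)
    return real_term + fake_term


def generator_adversarial_loss(fake_scores):
    """The generator's loss shrinks as the discriminator scores its frames closer to 1."""
    return sum((1.0 - f) ** 2 for f in fake_scores) / len(fake_scores)


# Frames the discriminator easily rejects cost the generator more
# than frames it mistakes for real ones.
rejected = generator_adversarial_loss([0.1, 0.2])
accepted = generator_adversarial_loss([0.9, 1.0])
assert rejected > accepted
```

In a real training loop these scores would come from a discriminator network, and the adversarial term would be added to the model's other losses.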
While vox-adv-cpk.pth.tar remains a landmark file for open-source AI, the landscape of image animation has evolved. Newer frameworks have built on this foundation to address its limitations.
The release of vox-adv-cpk.pth.tar marked a democratization of deepfake-style technology. Before this, high-quality facial animation required a massive dataset and lengthy training for every specific identity. With the pretrained checkpoint, a single source image can be animated without retraining.
The "Vox" in the filename refers to the VoxCeleb dataset, a large-scale audio-visual collection of human speakers. The "adv" suffix denotes adversarial training, indicating that the model was refined using a Generative Adversarial Network (GAN) framework to produce more realistic, high-fidelity results. The .pth and .tar extensions follow a PyTorch naming convention for saved checkpoints; the suffix alone does not guarantee that the file is a tar archive.
Core Functionality
The generator fills in any empty pixels (areas that were not visible in the original photo but are needed for the motion, such as the side of a face revealed by a head turn) using what it has learned about faces during training.
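The filling-in step can be pictured as a per-pixel blend between the warped source image and newly generated content. The sketch below is a deliberate simplification with hypothetical names: it assumes an occlusion mask with values between 0 and 1, and it treats the inpainted values as given, whereas in the actual model the filling happens inside the network.

```python
def composite(warped, inpainted, occlusion):
    """Blend per pixel: occlusion 1.0 keeps the warped source, 0.0 uses the inpainted value.

    All three arguments are 2D lists of floats with the same shape (an
    assumption of this sketch, standing in for image tensors).
    """
    return [
        [o * w + (1.0 - o) * p for w, p, o in zip(w_row, p_row, o_row)]
        for w_row, p_row, o_row in zip(warped, inpainted, occlusion)
    ]


# First pixel is fully visible in the source; second must be filled in.
frame = composite(
    warped=[[0.8, 0.0]],
    inpainted=[[0.3, 0.5]],
    occlusion=[[1.0, 0.0]],
)
assert frame == [[0.8, 0.5]]
```

Intermediate mask values mix the two sources, which avoids hard seams where visible and filled-in regions meet.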