Qwen 2 VL 72B Instruct

This guide offers step-by-step instructions for installing and running the Qwen2-VL-72B-Instruct model on your personal computer. This advanced model combines visual processing with natural language understanding in a powerful and versatile package.

Understanding Qwen 2 VL 72B Instruct

Qwen2-VL-72B-Instruct is Alibaba’s 72-billion-parameter vision-language transformer, engineered for advanced performance on tasks that combine visual processing with natural language understanding. It strikes a balance between computational demands and task performance, making it an excellent choice for a wide range of multimodal applications, from complex image analysis to sophisticated text generation. Be aware of the hardware requirements before you begin: at 16-bit precision the weights alone occupy roughly 144 GB (72 × 10⁹ parameters × 2 bytes), so local inference calls for serious multi-GPU hardware or a very large amount of system memory.

Installation Guide for Qwen 2 VL 72B Instruct

Step 1: Preparing Your System

Set Up Python for Windows

  • Download Python from python.org.
  • During installation, ensure “Add Python to PATH” is selected.
  • Confirm installation: open Command Prompt and enter python --version

Python Setup for macOS

  • Launch Terminal and execute:
macOS Python Setup
brew install python
  • For Homebrew installation, visit brew.sh if needed.
  • Verify installation: Type python3 --version in Terminal

Python Installation on Linux

  • For Ubuntu and similar distributions, use:
Linux Python Setup
sudo apt-get install python3
  • Check installation: Enter python3 --version in Terminal
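  • Note: On Debian and Ubuntu, pip and the venv module ship as separate packages; if the environment setup in Step 2 fails, install them as well:
Linux pip and venv Setup
sudo apt-get install python3-pip python3-venv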

Git Configuration

  • Windows: Download the installer from Git for Windows (gitforwindows.org).
  • macOS: Type git --version in Terminal; follow the prompts if it is not installed.
  • Linux: Install via Terminal:
Linux Git Setup
sudo apt-get install git

Step 2: Setting Up Your Project Environment

Establish Project Folder

  • Access Command Prompt (Windows) or Terminal (macOS/Linux).
  • Create and enter your project directory:
Project Directory Setup
mkdir qwen2_vl_72b_workspace
cd qwen2_vl_72b_workspace

Initialize Virtual Environment

  • Create a dedicated environment for the project:
Virtual Environment Creation
python -m venv qwen2_vl_72b_env
  • Activate your new environment:

Windows:

Windows Environment Activation
qwen2_vl_72b_env\Scripts\activate

macOS/Linux:

macOS/Linux Environment Activation
source qwen2_vl_72b_env/bin/activate
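  • To confirm the environment is active, check which interpreter is in use; the printed path should point inside qwen2_vl_72b_env:
Environment Check
python -c "import sys; print(sys.executable)"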

Upgrade Package Manager

  • Ensure pip is up-to-date:
Pip Upgrade
pip install --upgrade pip

Step 3: Installing Required Dependencies

PyTorch Installation

  • Install PyTorch with CUDA capabilities:
PyTorch Installation
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  • Note: Adjust cu118 to match your system’s CUDA version if different.
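  • If you are unsure which CUDA version to target, you can check before and after installing; nvidia-smi reports the highest CUDA version your driver supports, and PyTorch reports the version it was built against:
Check CUDA Version
nvidia-smi
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"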

Transformers Library Setup

  • Install the latest Transformers library:
Transformers Installation
pip install git+https://github.com/huggingface/transformers.git
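  • You can confirm the development build imported correctly with a quick version check:
Verify Transformers Installation
python -c "import transformers; print(transformers.__version__)"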

Additional Packages

  • Install other necessary components:
Additional Package Installation
pip install accelerate safetensors sentencepiece qwen-vl-utils

Step 4: Acquiring the Qwen2-VL-72B Model

Model Download Script

  • Create a file named download_qwen2_vl.py
  • Insert the following code:
Model Download Script

from transformers import AutoTokenizer, AutoProcessor, Qwen2VLForConditionalGeneration

model_name = "Qwen/Qwen2-VL-72B-Instruct"

# Download and save the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("./qwen2_vl_72b_instruct")

# Download and save the model. Qwen2-VL is natively supported in recent
# transformers builds via its own conditional-generation class;
# AutoModelForCausalLM does not cover this architecture.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)
model.save_pretrained("./qwen2_vl_72b_instruct")

# Download and save the processor (handles image and text preprocessing)
processor = AutoProcessor.from_pretrained(model_name)
processor.save_pretrained("./qwen2_vl_72b_instruct")

print("Download complete. Model, tokenizer, and processor saved in './qwen2_vl_72b_instruct'")

Initiate Model Download

  • Execute the script to acquire the model:
Execute Download Script
python download_qwen2_vl.py
  • Note: This process may take several hours; the weights total roughly 145 GB at 16-bit precision, so ensure you have sufficient disk space and a stable connection.
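  • Alternatively, the Hugging Face CLI can fetch the weights directly and resume interrupted downloads. A minimal sketch, assuming you want the files in the same local directory the script uses:
Alternative Download via Hugging Face CLI
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen2-VL-72B-Instruct --local-dir ./qwen2_vl_72b_instruct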

Step 5: Verifying the Qwen2-VL-72B Model

Prepare the Verification Script

  • In your project directory, create a new file named test_qwen2_vl.py
  • Open the file in your preferred text editor and insert the following code:
Qwen2-VL-72B Verification Script

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info
from PIL import Image
import requests

# Load the model and processor from the local directory
# (the processor bundles the tokenizer and the image preprocessor)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "./qwen2_vl_72b_instruct",
    device_map="auto",
    torch_dtype="auto",
)
processor = AutoProcessor.from_pretrained("./qwen2_vl_72b_instruct")

# Fetch a sample image
image_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
image = Image.open(requests.get(image_url, stream=True).raw)

# Prepare the chat-style input
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

# Process the input
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

# Generate a response and strip the prompt tokens from the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)

print("Model output:")
print(output_text[0])

Run the Verification Script

  • Open your command prompt or terminal
  • Navigate to your project directory if you’re not already there
  • Execute the script with the following command:
Execute Verification Script
python test_qwen2_vl.py
  • Wait for the script to process. This may take several minutes due to the model’s size.
  • If successful, you’ll see a detailed analysis of the sample image printed in your console.

Interpret the Results

  • Review the output text carefully. It should provide a comprehensive description of the image contents.
  • Look for details such as:
    • Main subjects or objects in the image
    • Colors, textures, and spatial relationships
    • Any text or recognizable symbols
    • Overall scene or context interpretation
  • If the output seems coherent and relevant to the image, your Qwen2-VL-72B model is functioning correctly.

Troubleshoot Common Issues

  • If you encounter a “CUDA out of memory” error:
    • Try reducing max_new_tokens in the generate() call
    • Close other GPU-intensive applications
    • Consider a CPU-only run by loading the model with device_map="cpu" and keeping the inputs on the CPU, as shown in the sketch after this list; note that this is very slow and needs enough system RAM to hold the full weights
  • For “module not found” errors, ensure all required packages are installed:
    Install Missing Packages
    pip install transformers torch Pillow requests accelerate qwen-vl-utils
  • If the model files aren’t found, double-check that the path in from_pretrained() matches your directory structure
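If you want to attempt a CPU-only run, here is a minimal sketch of the two changes to test_qwen2_vl.py; it assumes the same local model directory and enough free system RAM to hold the full weights:

CPU-Only Loading Sketch
from transformers import Qwen2VLForConditionalGeneration

# Replace the from_pretrained call in test_qwen2_vl.py with a CPU-only load.
# Expect very high RAM usage (roughly 145 GB) and slow generation.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "./qwen2_vl_72b_instruct",
    device_map="cpu",
    torch_dtype="auto",
)
# Then keep the processed inputs on the CPU instead of moving them to the GPU:
# inputs = inputs.to("cuda")  becomes  inputs = inputs.to("cpu")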

Exploring Qwen2-VL-72B’s Capabilities

Advanced Visual-Language Processing

  • Optimized for high-performance inference on various hardware configurations
  • Capable of processing and analyzing complex images alongside sophisticated text queries
  • Suitable for demanding applications requiring deep understanding of visual and textual content

Large-Scale Yet Efficient

  • Achieves a balance between extensive model size and practical performance
  • Demonstrates strong capabilities in detailed image understanding and nuanced description
  • Ideal for deployments where advanced AI capabilities are crucial

Versatile Application Scope

  • Applicable in various domains including advanced e-commerce, content analysis, and AI-assisted research
  • Can be fine-tuned for specific, complex visual-language tasks
  • Supports integration into high-performance computing environments

Enhanced Multilingual Capabilities

  • Capable of processing and generating text in multiple languages with high proficiency
  • Enables sophisticated cross-lingual visual question answering and image captioning
  • Facilitates development of advanced multilingual AI applications (see the sketch below for a quick test)
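As a quick test of the multilingual side, you could swap the English prompt in test_qwen2_vl.py for a non-English one; this is a minimal sketch that reuses the messages structure from the verification script:

Multilingual Prompt Sketch
# In test_qwen2_vl.py, replace the messages list with a non-English instruction,
# e.g. asking for the image description in Chinese ("Describe this image in detail."):
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "请详细描述这张图片。"},
        ],
    }
]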
Congratulations on successfully setting up and verifying the Qwen2-VL-72B-Instruct model! You now have a state-of-the-art tool at your disposal for a wide range of complex visual-language tasks. Experiment with different images and sophisticated queries to fully explore the model’s extensive capabilities and push the boundaries of AI-powered visual and textual analysis.