Qwen 2 VL 7B Instruct

This guide will walk you through the process of setting up and testing the Qwen2-VL-7B-Instruct model on your computer. This powerful tool combines image processing and language understanding capabilities.

What is Qwen 2 VL 7B Instruct?

Qwen2-VL-7B-Instruct is a 7-billion-parameter vision-language transformer model created by Alibaba, designed for advanced natural language understanding, text generation, and multimodal tasks such as visual question answering. It excels at providing accurate responses to instructional prompts, making it ideal for interactive AI applications.

How to Download and Install Qwen 2 VL 7B Instruct?

Step 1: Preparing Your Computer – Windows

Install Python for Windows

  • Download Python here.
  • Double-click the downloaded file to start the installation.
  • Important: Check the box that says “Add Python to PATH” before clicking Install Now.
  • Complete the installation by following the prompts.

Verify Python Installation on Windows

  • Open Command Prompt.
  • Type python --version and press Enter.
  • You should see something like Python 3.12.5 (or your installed version).

Step 1: Preparing Your Computer – Mac

Install Python for Mac

  • Open the Terminal.
  • Install Python by typing:
macOS Command
brew install python

Verify Python Installation on Mac

  • In the Terminal, type python3 --version and press Enter.
  • You should see the installed Python version.

Step 1: Preparing Your Computer – Linux

Install Python for Linux

  • Open the Terminal.
  • For Ubuntu or Debian-based distributions, type:
Linux Command
sudo apt-get install python3
  • For other distributions, use the appropriate package manager.

Verify Python Installation on Linux

  • In the Terminal, type python3 --version and press Enter.
  • You should see the installed Python version.
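
On any of the three platforms, you can also confirm the interpreter is recent enough from within Python itself. A minimal sketch (the 3.8 floor is an assumption; recent Qwen2-VL tooling generally targets current Python 3 releases):

```python
import sys

# Print the running interpreter's version
print(f"Python {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}")

# Fail early if the interpreter is too old (the 3.8 floor is an assumption)
assert sys.version_info >= (3, 8), "Please upgrade to a recent Python 3 release"
```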

Step 1: Preparing Your Computer – Git Installation

Install Git for Windows

  • Download Git here.
  • Run the installer and follow the default settings.

Verify Git Installation on Windows

  • Open Command Prompt.
  • Type git --version and press Enter.
  • You should see the installed Git version.

Install Git for Mac

  • Open Terminal and type git --version.
  • If Git is not installed, you will be prompted to install it.
  • Follow the installation prompts to complete the process.

Install Git for Linux

  • Open Terminal and run:
Git Installation Command
sudo apt-get install git

Verify Git Installation on Mac and Linux

  • In the Terminal, type git --version and press Enter.
  • You should see the installed Git version.

Step 2: Setting Up the Project Environment

Open Command Prompt or Terminal

  • Windows: Press Windows Key + R, type cmd, and press Enter.
  • Mac/Linux: Open the Terminal application.

Create a Project Directory

  • Create the Folder:
Create Project Directory
mkdir qwen2_vl_project
  • Navigate into the Folder:
Navigate to Project Directory
cd qwen2_vl_project
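
If you prefer, the same two steps can be done from Python with pathlib; this sketch uses the same folder name as the commands above:

```python
import os
from pathlib import Path

# Create the project folder (no error if it already exists)
project = Path("qwen2_vl_project")
project.mkdir(exist_ok=True)

# Change the working directory into it, mirroring `cd`
os.chdir(project)
print(Path.cwd())
```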

Set Up a Virtual Environment

  • Create the Virtual Environment:
Create Virtual Environment
python -m venv qwen2_vl_env

Activate the Virtual Environment

Windows:

Activate Virtual Environment (Windows)
qwen2_vl_env\Scripts\activate
Mac/Linux:
Activate Virtual Environment (Mac/Linux)
source qwen2_vl_env/bin/activate
Note: You’ll see (qwen2_vl_env) at the beginning of your command line now.
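
You can also confirm activation from within Python: inside a virtual environment, sys.prefix points at the environment rather than the base installation. A quick check:

```python
import sys

# Inside a virtual environment, sys.prefix differs from sys.base_prefix
in_venv = sys.prefix != sys.base_prefix
print("virtual environment active:", in_venv)
print("interpreter prefix:", sys.prefix)
```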

Upgrade pip (Python Package Installer)

Upgrade pip
pip install --upgrade pip

Step 3: Installing Required Libraries

Install PyTorch with CUDA Support

Install PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Note: cu118 corresponds to CUDA version 11.8. If your system runs a different CUDA version, replace cu118 with the matching wheel tag (e.g., cu121 for CUDA 12.1).

Check Your CUDA Version

  • Windows: Open Command Prompt, type nvidia-smi and press Enter.
  • Linux: Open Terminal, type nvidia-smi.
  • Mac: Macs typically don’t support CUDA.

Look for “CUDA Version” in the output.
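
If you would rather read that value programmatically than eyeball the output, a small parser over nvidia-smi's banner line works. The sample line below is illustrative, not from your machine; feed it the real output of nvidia-smi:

```python
import re

def parse_cuda_version(nvidia_smi_output: str):
    """Extract the 'CUDA Version' value from nvidia-smi's banner."""
    match = re.search(r"CUDA Version:\s*([\d.]+)", nvidia_smi_output)
    return match.group(1) if match else None

# Illustrative banner line only; pass the real `nvidia-smi` output instead
sample = "| NVIDIA-SMI 535.104.05  Driver Version: 535.104.05  CUDA Version: 12.2 |"
print(parse_cuda_version(sample))  # 12.2
```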

Install the Transformers Library

Install Transformers Library
pip install git+https://github.com/huggingface/transformers.git

Install Additional Required Packages

Install Additional Packages
pip install accelerate safetensors sentencepiece

Install Qwen VL Utilities

Install Qwen VL Utilities
pip install qwen-vl-utils

Step 4: Downloading the Qwen2-VL Model

Create the Download Script

Create a new file named download_qwen2_vl.py in your project folder.

Add Code to the Download Script

Copy and paste the following code into download_qwen2_vl.py:

Download Qwen2-VL Script

from transformers import AutoTokenizer, AutoProcessor, Qwen2VLForConditionalGeneration

model_name = "Qwen/Qwen2-VL-7B-Instruct"

# Download and save the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("./qwen2_vl_7b_instruct")

# Download and save the model
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
model.save_pretrained("./qwen2_vl_7b_instruct")

# Download and save the processor
processor = AutoProcessor.from_pretrained(model_name)
processor.save_pretrained("./qwen2_vl_7b_instruct")

print("Download complete. Model, tokenizer, and processor saved in './qwen2_vl_7b_instruct'")

Run the Download Script

In your Command Prompt or Terminal, make sure you’re in the qwen2_vl_project folder, then type:

Run Download Script
python download_qwen2_vl.py
This script downloads the model files and saves them in a folder named qwen2_vl_7b_instruct.
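
Before moving on, you can sanity-check that the expected files landed in the output folder. The helper below looks only for a few filenames that Hugging Face downloads typically include; the exact file list is an assumption and may vary between releases:

```python
from pathlib import Path

def missing_files(model_dir, expected=("config.json", "tokenizer_config.json", "preprocessor_config.json")):
    """Return the expected filenames that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]

gaps = missing_files("./qwen2_vl_7b_instruct")
if gaps:
    print("Missing files:", gaps)
else:
    print("All expected files present.")
```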

Step 5: Testing the Qwen2-VL Model

Create the Test Script

Create a new file named test_qwen2_vl.py in your project folder.

Add Code to the Test Script

Copy and paste the following code into test_qwen2_vl.py:

Qwen2-VL Test Script

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

# Load the model and processor
model = Qwen2VLForConditionalGeneration.from_pretrained("./qwen2_vl_7b_instruct", torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained("./qwen2_vl_7b_instruct")

# Prepare a test image and prompt
image_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
query = "Describe this image in detail."
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_url},
            {"type": "text", "text": query},
        ],
    }
]

# Process the input
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt").to(model.device)

# Generate the output, keeping only the newly generated tokens
output_ids = model.generate(**inputs, max_new_tokens=100)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
result = processor.batch_decode(trimmed, skip_special_tokens=True)[0]
print("Model output:")
print(result)

Run the Test Script

In your Command Prompt or Terminal, type:

Run Test Script
python test_qwen2_vl.py
If everything is set up correctly, you’ll see a detailed description of the test image printed out.

Troubleshooting: KeyError: ‘qwen2_vl’

If you encounter this error, your installed Transformers version is too old to include Qwen2-VL support. To fix it, reinstall from the development branch:

Re-install Transformers
pip install git+https://github.com/huggingface/transformers.git
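
To confirm the upgrade took effect, you can check whether the installed build exposes the Qwen2-VL model class. The check below is guarded, so it also runs cleanly when transformers is absent:

```python
import importlib.util

spec = importlib.util.find_spec("transformers")
if spec is None:
    print("transformers is not installed")
else:
    import transformers
    print("transformers version:", transformers.__version__)
    # Qwen2-VL shipped natively in recent releases; older builds lack this class
    print("qwen2_vl supported:", hasattr(transformers, "Qwen2VLForConditionalGeneration"))
```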

Key Features of Qwen2-VL 7B

Multimodal Capabilities

  • Processes text, images, and videos
  • Suitable for visual question answering, image captioning, and video summarization
  • Handles images with varying resolutions
  • Can process videos up to 20 minutes long

Advanced Visual Understanding

  • Achieves high scores on visual benchmarks like MathVista, DocVQA, and MTVQA
  • Excels in tasks involving complex visual inputs
  • State-of-the-art performance in image understanding

Multilingual Support

  • Processes and understands text in multiple languages
  • Supports most European languages, Japanese, Korean, Arabic, and others
  • Ideal for global applications and multilingual tasks

Device Integration

  • Can be integrated with robots, mobile phones, and other devices
  • Enables automated operations based on visual and text inputs
  • Suitable for automation and smart device control applications

Troubleshooting Qwen2-VL Installation

CUDA Errors or GPU Not Recognized

  • Ensure your NVIDIA drivers are up to date.
  • Check that your CUDA version matches the one used in the PyTorch installation (cu118).

Out of Memory Errors

  • Close other programs that might be using the GPU.
  • Reduce the max_new_tokens parameter in the test script (e.g., change max_new_tokens=100 to max_new_tokens=50).
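
As a rough sizing rule of thumb, the weights alone of a 7-billion-parameter model take about 2 bytes per parameter in 16-bit precision, before counting activations and the KV cache. Back-of-the-envelope:

```python
params = 7_000_000_000     # 7B parameters
bytes_per_param_fp16 = 2   # 16-bit weights
weights_gb = params * bytes_per_param_fp16 / 1e9
print(f"~{weights_gb:.0f} GB just for the weights in fp16")  # prints "~14 GB just for the weights in fp16"
```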

Qwen2-VL Model Limitations

  • Audio Processing: No audio support; cannot process audio within videos.
  • Data Timeframe: Training data extends to June 2023; may not recognize information after that date.
  • Recognition Capabilities: May not identify every famous person or brand.
  • Task Complexity: Might struggle with multi-step or very complex instructions.
  • Counting Accuracy: May miscount objects in complex scenes.
  • Spatial Understanding: Limited understanding of 3D positions and relationships.

Congratulations! You’ve successfully installed and tested the Qwen2-VL-7B-Instruct model on your computer. You can now use this powerful tool for various tasks involving images and language.