Qwen 2 VL 7B Instruct

This guide will walk you through the process of setting up and testing the Qwen2-VL-7B-Instruct model on your computer. This powerful tool combines image processing and language understanding capabilities.

What is Qwen 2 VL 7B Instruct?

Qwen2-VL-7B-Instruct is a 7-billion-parameter vision-language transformer model created by Alibaba, designed for advanced natural language understanding, text generation, and multimodal tasks such as visual question answering. It excels at providing accurate responses to instructional prompts, making it ideal for interactive AI applications.

How to Download and Install Qwen 2 VL 7B Instruct?

Step 1: Preparing Your Computer – Windows

Install Python for Windows

  • Download Python here.
  • Double-click the downloaded file to start the installation.
  • Important: Check the box that says “Add Python to PATH” before clicking Install Now.
  • Complete the installation by following the prompts.

Verify Python Installation on Windows

  • Open Command Prompt.
  • Type python --version and press Enter.
  • You should see something like Python 3.12.5 (or your installed version).

Step 1: Preparing Your Computer – Mac

Install Python for Mac

  • Open the Terminal.
  • Install Python by typing:
macOS Command
brew install python

Verify Python Installation on Mac

  • In the Terminal, type python3 --version and press Enter.
  • You should see the installed Python version.

Step 1: Preparing Your Computer – Linux

Install Python for Linux

  • Open the Terminal.
  • For Ubuntu or Debian-based distributions, type:
Linux Command
sudo apt-get install python3
  • For other distributions, use the appropriate package manager.

Verify Python Installation on Linux

  • In the Terminal, type python3 --version and press Enter.
  • You should see the installed Python version.
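
On any of the three platforms, you can also confirm the interpreter is recent enough from within Python itself. A minimal sketch (the 3.8 floor is an assumption; recent Qwen2-VL tooling generally targets current Python 3 releases):

```python
import sys

# Print the running interpreter's version
print(f"Python {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}")

# Fail early if the interpreter is too old (the 3.8 floor is an assumption)
assert sys.version_info >= (3, 8), "Please upgrade to a recent Python 3 release"
```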

Step 1: Preparing Your Computer – Git Installation

Install Git for Windows

  • Download Git here.
  • Run the installer and follow the default settings.

Verify Git Installation on Windows

  • Open Command Prompt.
  • Type git --version and press Enter.
  • You should see the installed Git version.

Install Git for Mac

  • Open Terminal and type git --version.
  • If Git is not installed, you will be prompted to install it.
  • Follow the installation prompts to complete the process.

Install Git for Linux

  • Open Terminal and run:
Git Installation Command
sudo apt-get install git

Verify Git Installation on Mac and Linux

  • In the Terminal, type git --version and press Enter.
  • You should see the installed Git version.

Step 2: Setting Up the Project Environment

Open Command Prompt or Terminal

  • Windows: Press Windows Key + R, type cmd, and press Enter.
  • Mac/Linux: Open the Terminal application.

Create a Project Directory

  • Create the Folder:
Create Project Directory
mkdir qwen2_vl_project
  • Navigate into the Folder:
Navigate to Project Directory
cd qwen2_vl_project
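
If you prefer, the same two steps can be done from Python with pathlib; this sketch uses the same folder name as the commands above:

```python
import os
from pathlib import Path

# Create the project folder (no error if it already exists)
project = Path("qwen2_vl_project")
project.mkdir(exist_ok=True)

# Change the working directory into it, mirroring `cd`
os.chdir(project)
print(Path.cwd())
```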

Set Up a Virtual Environment

  • Create the Virtual Environment:
Create Virtual Environment
python -m venv qwen2_vl_env

Activate the Virtual Environment

Windows:

Activate Virtual Environment (Windows)
qwen2_vl_env\Scripts\activate
Mac/Linux:
Activate Virtual Environment (Mac/Linux)
source qwen2_vl_env/bin/activate
Note: You’ll see (qwen2_vl_env) at the beginning of your command line now.
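
You can also confirm activation from within Python: inside a virtual environment, sys.prefix points at the environment rather than the base installation. A quick check:

```python
import sys

# Inside a virtual environment, sys.prefix differs from sys.base_prefix
in_venv = sys.prefix != sys.base_prefix
print("virtual environment active:", in_venv)
print("interpreter prefix:", sys.prefix)
```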

Upgrade pip (Python Package Installer)

Upgrade pip
pip install --upgrade pip

Step 3: Installing Required Libraries

Install PyTorch with CUDA Support

Install PyTorch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Note: cu118 corresponds to CUDA version 11.8. If your system runs a different CUDA version, replace cu118 with the matching wheel tag (e.g., cu121 for CUDA 12.1).

Check Your CUDA Version

  • Windows: Open Command Prompt, type nvidia-smi and press Enter.
  • Linux: Open Terminal, type nvidia-smi.
  • Mac: Macs typically don’t support CUDA.

Look for “CUDA Version” in the output.
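
If you would rather read that value programmatically than eyeball the output, a small parser over nvidia-smi's banner line works. The sample line below is illustrative, not from your machine; feed it the real output of nvidia-smi:

```python
import re

def parse_cuda_version(nvidia_smi_output: str):
    """Extract the 'CUDA Version' value from nvidia-smi's banner."""
    match = re.search(r"CUDA Version:\s*([\d.]+)", nvidia_smi_output)
    return match.group(1) if match else None

# Illustrative banner line only; pass the real `nvidia-smi` output instead
sample = "| NVIDIA-SMI 535.104.05  Driver Version: 535.104.05  CUDA Version: 12.2 |"
print(parse_cuda_version(sample))  # 12.2
```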

Install the Transformers Library

Install Transformers Library
pip install git+https://github.com/huggingface/transformers.git

Install Additional Required Packages

Install Additional Packages
pip install accelerate safetensors sentencepiece

Install Qwen VL Utilities

Install Qwen VL Utilities
pip install qwen-vl-utils

Step 4: Downloading the Qwen2-VL Model

Create the Download Script

Create a new file named download_qwen2_vl.py in your project folder.

Add Code to the Download Script

Copy and paste the following code into download_qwen2_vl.py:

Download Qwen2-VL Script

from transformers import AutoTokenizer, AutoProcessor, Qwen2VLForConditionalGeneration

model_name = "Qwen/Qwen2-VL-7B-Instruct"

# Download and save the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("./qwen2_vl_7b_instruct")

# Download and save the model
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
model.save_pretrained("./qwen2_vl_7b_instruct")

# Download and save the processor
processor = AutoProcessor.from_pretrained(model_name)
processor.save_pretrained("./qwen2_vl_7b_instruct")

print("Download complete. Model, tokenizer, and processor saved in './qwen2_vl_7b_instruct'")

Run the Download Script

In your Command Prompt or Terminal, make sure you’re in the qwen2_vl_project folder, then type:

Run Download Script
python download_qwen2_vl.py
This script downloads the model files and saves them in a folder named qwen2_vl_7b_instruct.
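
Before moving on, you can sanity-check that the expected files landed in the output folder. The helper below looks only for a few filenames that Hugging Face downloads typically include; the exact file list is an assumption and may vary between releases:

```python
from pathlib import Path

def missing_files(model_dir, expected=("config.json", "tokenizer_config.json", "preprocessor_config.json")):
    """Return the expected filenames that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected if not (root / name).is_file()]

gaps = missing_files("./qwen2_vl_7b_instruct")
if gaps:
    print("Missing files:", gaps)
else:
    print("All expected files present.")
```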

Step 5: Testing the Qwen2-VL Model

Create the Test Script

Create a new file named test_qwen2_vl.py in your project folder.

Add Code to the Test Script

Copy and paste the following code into test_qwen2_vl.py:

Qwen2-VL Test Script

from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

# Load the model and processor
model = Qwen2VLForConditionalGeneration.from_pretrained("./qwen2_vl_7b_instruct", torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained("./qwen2_vl_7b_instruct")

# Prepare a test image and prompt
image_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
query = "Describe this image in detail."
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_url},
            {"type": "text", "text": query},
        ],
    }
]

# Process the input
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt").to(model.device)

# Generate the output, keeping only the newly generated tokens
output_ids = model.generate(**inputs, max_new_tokens=100)
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
result = processor.batch_decode(trimmed, skip_special_tokens=True)[0]
print("Model output:")
print(result)

Run the Test Script

In your Command Prompt or Terminal, type:

Run Test Script
python test_qwen2_vl.py
If everything is set up correctly, you’ll see a detailed description of the test image printed out.

Troubleshooting: KeyError: ‘qwen2_vl’

If you encounter this error, your installed Transformers version is too old to include Qwen2-VL support. To fix it, reinstall from the development branch:

Re-install Transformers
pip install git+https://github.com/huggingface/transformers.git
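
To confirm the upgrade took effect, you can check whether the installed build exposes the Qwen2-VL model class. The check below is guarded, so it also runs cleanly when transformers is absent:

```python
import importlib.util

spec = importlib.util.find_spec("transformers")
if spec is None:
    print("transformers is not installed")
else:
    import transformers
    print("transformers version:", transformers.__version__)
    # Qwen2-VL shipped natively in recent releases; older builds lack this class
    print("qwen2_vl supported:", hasattr(transformers, "Qwen2VLForConditionalGeneration"))
```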

Key Features of Qwen2-VL 7B

Multimodal Capabilities

  • Processes text, images, and videos
  • Suitable for visual question answering, image captioning, and video summarization
  • Handles images with varying resolutions
  • Can process videos up to 20 minutes long

Advanced Visual Understanding

  • Achieves high scores on visual benchmarks like MathVista, DocVQA, and MTVQA
  • Excels in tasks involving complex visual inputs
  • State-of-the-art performance in image understanding

Multilingual Support

  • Processes and understands text in multiple languages
  • Supports most European languages, Japanese, Korean, Arabic, and others
  • Ideal for global applications and multilingual tasks

Device Integration

  • Can be integrated with robots, mobile phones, and other devices
  • Enables automated operations based on visual and text inputs
  • Suitable for automation and smart device control applications

Troubleshooting Qwen2-VL Installation

CUDA Errors or GPU Not Recognized

  • Ensure your NVIDIA drivers are up to date.
  • Check that your CUDA version matches the one used in the PyTorch installation (cu118).

Out of Memory Errors

  • Close other programs that might be using the GPU.
  • Reduce the max_new_tokens parameter in the test script (e.g., change max_new_tokens=100 to max_new_tokens=50).
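
As a rough sizing rule of thumb, the weights alone of a 7-billion-parameter model take about 2 bytes per parameter in 16-bit precision, before counting activations and the KV cache. Back-of-the-envelope:

```python
params = 7_000_000_000     # 7B parameters
bytes_per_param_fp16 = 2   # 16-bit weights
weights_gb = params * bytes_per_param_fp16 / 1e9
print(f"~{weights_gb:.0f} GB just for the weights in fp16")  # prints "~14 GB just for the weights in fp16"
```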

Qwen2-VL Model Limitations

  • Audio Processing: No audio support; cannot process audio within videos.
  • Data Timeframe: Training data extends to June 2023; may not recognize information after that date.
  • Recognition Capabilities: May not identify every famous person or brand.
  • Task Complexity: Might struggle with multi-step or very complex instructions.
  • Counting Accuracy: May miscount objects in complex scenes.
  • Spatial Understanding: Limited understanding of 3D positions and relationships.

Congratulations! You’ve successfully installed and tested the Qwen2-VL-7B-Instruct model on your computer. You can now use this powerful tool for various tasks involving images and language.