Qwen 千问

Mac

Mac configuration: M2 chip (ARM architecture).

Create the environment:

conda create -n qwen python=3.11 -y
conda activate qwen
python -m pip install -U pip setuptools wheel
pip install mlx mlx-lm huggingface_hub
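
Before going further, it can help to confirm that the new environment really runs a native arm64 Python and that the three packages are importable. The following is a minimal sketch, not part of the original setup: the helper names is_apple_silicon and missing_packages are made up for illustration, and the check assumes that a Python running under Rosetta reports x86_64 instead of arm64.

```python
import importlib.util
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True for macOS running on an arm64 (Apple Silicon) CPU.

    The arguments default to the values reported by the current interpreter;
    they are parameters only so the check can be exercised on other machines.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Apple Silicon:", is_apple_silicon())
    # Module names as imported in Python, not the pip distribution names.
    print("Missing:", missing_packages(["mlx", "mlx_lm", "huggingface_hub"]))
```

If the first line prints False on an M-series Mac, the conda environment was probably created with an x86_64 Python and should be recreated.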

Download (the first run of mlx_lm.chat fetches the model, then opens an interactive chat):

python -m mlx_lm.chat --model Qwen/Qwen3.5-0.8B
python -m mlx_lm.chat --model Qwen/Qwen3.5-4B
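
The commands above download the weights as a side effect of starting a chat. If you would rather fetch them up front, the huggingface_hub package installed earlier can do that. This is a rough sketch under two assumptions: that mlx_lm reads models from the standard Hugging Face cache, and that the model IDs are the ones used in this post. The prefetch helper and its download parameter are hypothetical names introduced here.

```python
def prefetch(model_ids, download=None):
    """Download each model repo and return a {model_id: local_path} mapping.

    `download` is any callable that takes a repo id and returns a local path.
    It defaults to huggingface_hub.snapshot_download (an assumption about how
    you want to fetch; swap in your own function for mirrors or tests).
    """
    if download is None:
        # Imported lazily so this file can be loaded without the package.
        from huggingface_hub import snapshot_download

        download = snapshot_download
    return {model_id: download(model_id) for model_id in model_ids}


# Example (needs network access and huggingface_hub installed):
# paths = prefetch(["Qwen/Qwen3.5-0.8B", "Qwen/Qwen3.5-4B"])
# for model_id, path in paths.items():
#     print(model_id, "->", path)
```

Passing the downloader in as an argument is only a convenience for trying the helper without touching the network.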

Test:

mlx_lm.generate --model Qwen/Qwen3.5-0.8B --prompt "你好"
mlx_lm.generate --model Qwen/Qwen3.5-4B --prompt "你好"

Calling from Python:

With thinking enabled, the Qwen3.5-0.8B model tends to get stuck in an endless loop, so it is best to leave thinking off.

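from mlx_lm import load, generate

MODEL_ID = "Qwen/Qwen3.5-0.8B"

# Load the weights and tokenizer (downloaded on first use).
model, tokenizer = load(MODEL_ID)

messages = [
    # Prompt: "Hello, please introduce yourself in one sentence."
    {"role": "user", "content": "你好,请用一句话介绍你自己。"}
]

# Render the chat template to a prompt string, with thinking turned off.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)

text = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=256,
    verbose=False,
)

print(text)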
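
If you do experiment with thinking on the small model, one crude way to notice the endless loop mentioned above is to check whether the tail of the generated text keeps repeating. The sketch below is only a heuristic and is not something mlx_lm provides: the function name looks_like_loop and its thresholds are arbitrary choices made for illustration.

```python
def looks_like_loop(text, min_period=8, max_period=200, min_repeats=4):
    """Heuristic: True if `text` ends with the same chunk repeated back to back.

    For each chunk length between `min_period` and `max_period`, take the last
    chunk of that length and check whether the text ends with `min_repeats`
    consecutive copies of it. Short periods are skipped so that ordinary
    endings such as "...." are not flagged.
    """
    for period in range(min_period, max_period + 1):
        if len(text) < period * min_repeats:
            break  # longer periods cannot fit either
        unit = text[-period:]
        if text.endswith(unit * min_repeats):
            return True
    return False


# Example: inspect the string returned by generate().
# if looks_like_loop(text):
#     print("Output appears to be repeating; consider enable_thinking=False.")
```

A check like this only sees exact repetition at the very end of the output, so paraphrased loops will slip through.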

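The Qwen3.5-0.8B model has thinking enabled by default. Because the thinking process can easily run long, max_tokens must be set generously; otherwise generation stops partway through the output: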

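from mlx_lm import load, generate

MODEL_ID = "Qwen/Qwen3.5-4B"

# Load the weights and tokenizer (downloaded on first use).
model, tokenizer = load(MODEL_ID)

messages = [
    # Prompt: "Hello, please introduce yourself in one sentence."
    {"role": "user", "content": "你好,请用一句话介绍你自己。"}
]

# enable_thinking is not passed, so the template's default (thinking on) applies.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# A large token budget leaves room for both the thinking and the answer.
text = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=16384,
    verbose=False,
)

print(text)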

If you only need the final answer that follows the thinking:

from mlx_lm import load, generate

MODEL_ID = "Qwen/Qwen3.5-4B"

# Load the weights and tokenizer (downloaded on first use).
model, tokenizer = load(MODEL_ID)

messages = [
    # Prompt: "Hello, please introduce yourself in one sentence."
    {"role": "user", "content": "你好,请用一句话介绍你自己。"}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

text = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=16384,
    verbose=False,
)

# Everything before "</think>" is the thinking; everything after is the answer.
if "</think>" in text:
    thinking, answer = text.split("</think>", 1)
else:
    thinking, answer = "", text

print(answer.strip())
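
The split above treats the whole output as the answer when no closing tag is found. With thinking enabled, a missing closing tag more likely means max_tokens ran out in the middle of the thinking. The sketch below is one possible variant that reports that case separately. The names Reply and split_reply are invented for this example, and the tag strings are assumed to be the ones used in the code above.

```python
from typing import NamedTuple


class Reply(NamedTuple):
    thinking: str   # text inside the thinking block, tags removed
    answer: str     # text after the closing tag
    finished: bool  # False when the thinking block was never closed


def split_reply(text, thinking_enabled=True,
                open_tag="<think>", close_tag="</think>"):
    """Separate the thinking from the final answer in generated text."""
    if close_tag in text:
        thinking, answer = text.split(close_tag, 1)
        # The opening tag may or may not be part of the generated text.
        thinking = thinking.strip().removeprefix(open_tag).strip()
        return Reply(thinking, answer.strip(), True)
    if thinking_enabled:
        # No closing tag: assume generation stopped while still thinking.
        thinking = text.strip().removeprefix(open_tag).strip()
        return Reply(thinking, "", False)
    # Thinking was off, so the whole output is the answer.
    return Reply("", text.strip(), True)


# Example with the string returned by generate():
# reply = split_reply(text)
# print(reply.answer if reply.finished else "Truncated: raise max_tokens.")
```

Unlike the original snippet, this version returns an empty answer for truncated output, which makes it harder to mistake unfinished thinking for a reply.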