In Python

2025-04-24 10:23

There are several ways to count the words in an English sentence in Python. The most common approaches are described below.

I. Using the `split()` method

This is the simplest approach: split the string on whitespace and take the length of the resulting list.

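```python
# Read an English sentence from the user
sentence = input("Enter an English sentence: ")

# split() with no arguments splits on any run of whitespace
words = sentence.split()

# The number of words is the length of the resulting list
word_count = len(words)
print(f"Word count: {word_count}")
```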

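Notes:

1. Called without arguments, `split()` treats any run of whitespace (spaces, tabs, newlines) as a single separator and ignores leading and trailing whitespace, so extra spaces between words do not inflate the count;

2. `split()` does not strip punctuation. A mark attached to a word stays part of that token (`world!` is counted once), but a free-standing mark such as a spaced dash is counted as a word of its own.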
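A quick way to check how `split()` treats whitespace and punctuation is to run it on a few edge cases. The sketch below is illustrative only; the helper name `count_words_split` and the sample sentences are assumptions, not part of the original article.

```python
def count_words_split(sentence):
    """Count whitespace-separated tokens (hypothetical helper name)."""
    return len(sentence.split())


samples = [
    "The  quick   brown fox",   # runs of spaces between words
    "Hello, world!",            # punctuation attached to words
    "Wait - what?",             # a free-standing dash
    "tab\tand\nnewline",        # tabs and newlines are whitespace too
    "",                         # empty input
]

for s in samples:
    # Show the tokens next to the count so the behavior is visible
    print(repr(s), "->", s.split(), count_words_split(s))
```

The free-standing dash in the third sample is counted as a token, which is why a sentence containing loose punctuation can report more words than it really has.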

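II. Extracting words with a regular expression

Matching words with a regular expression, instead of splitting on whitespace, keeps punctuation and other non-word characters out of the result.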

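```python
import re
from collections import Counter

def count_words(text):
    # Lowercase so that "The" and "the" count as the same word
    text = text.lower()
    # \w+ matches runs of letters, digits, and underscores
    words = re.findall(r'\b\w+\b', text)
    # Count how often each word occurs
    return Counter(words)

# Read the input text
text = input("Enter an English sentence: ")
counts = count_words(text)

# Sum the frequencies for the total; len(counts) would give only the distinct words
print(f"Total words: {sum(counts.values())}")
print("Word frequencies:")
for word, freq in counts.items():
    print(f"{word}: {freq}")
```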

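Advantages and limits:

Punctuation such as commas and periods is ignored automatically;

Hyphens and apostrophes are not word characters, so with `\b\w+\b` a hyphenated word or contraction is split into separate tokens (`don't` becomes `don` and `t`) unless the pattern is adjusted.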
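To see exactly what the pattern `\b\w+\b` matches, it can help to print the tokens for a sentence that contains a hyphen and a contraction. This is a minimal sketch; the names `WORD_PATTERN` and `tokenize` and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter

# Same pattern as used in this section, compiled once for reuse
WORD_PATTERN = re.compile(r"\b\w+\b")


def tokenize(text):
    """Lowercase the text and return every \\w+ run (hypothetical helper)."""
    return WORD_PATTERN.findall(text.lower())


tokens = tokenize("Well-known facts: don't panic, don't run!")
counts = Counter(tokens)

print(tokens)                     # hyphen and apostrophe act as separators
print("total:", len(tokens))      # every occurrence
print("distinct:", len(counts))   # unique words only
```

The total and the distinct count differ whenever a word repeats, so the two numbers should be reported under different labels.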

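III. Streamlining the count with `collections.Counter`

`Counter`, from the standard library's `collections` module, is a dictionary subclass built for tallying. It replaces a hand-written counting loop, which shortens the code, and it counts an iterable in a single pass.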

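```python
import re
from collections import Counter

def count_words(filepath):
    """Return a Counter of word frequencies for a file, or None on error."""
    try:
        with open(filepath, 'r', encoding='utf-8') as f:
            # Read the whole file and lowercase it
            text = f.read().lower()
        words = re.findall(r'\b\w+\b', text)
        return Counter(words)
    except FileNotFoundError:
        print(f"File not found: {filepath}")
        return None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

# Example: count the words in a file
filepath = 'example.txt'
counts = count_words(filepath)

# Compare against None so that an empty file still reports zero words
if counts is not None:
    # Sum the frequencies for the total; len(counts) is the number of distinct words
    print(f"Total words: {sum(counts.values())}")
    print("Word frequencies:")
    for word, freq in counts.items():
        print(f"{word}: {freq}")
```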

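When to use it:

Counting the words of a whole file rather than a single sentence (note that `f.read()` loads the entire file into memory, so very large files are better processed line by line);

When the word frequencies are needed for later analysis, since the `Counter` already holds them.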
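For files too large to read in one call, one option is to feed the `Counter` one line at a time. The following is a sketch under that assumption, not the article's own method; `count_words_streaming` and the demo lines are hypothetical names and data.

```python
import re
from collections import Counter

WORD_PATTERN = re.compile(r"\b\w+\b")


def count_words_streaming(lines):
    """Accumulate word frequencies from any iterable of text lines."""
    counts = Counter()
    for line in lines:
        # update() adds this line's tallies to the running totals
        counts.update(WORD_PATTERN.findall(line.lower()))
    return counts


# An open text file is an iterable of lines, so it can be passed directly:
#     with open("example.txt", encoding="utf-8") as f:
#         counts = count_words_streaming(f)
demo_lines = ["The cat sat.\n", "The cat ran!\n"]
counts = count_words_streaming(demo_lines)

print("total:", sum(counts.values()))
print("most common:", counts.most_common(2))
```

Because only one line is held in memory at a time, memory use depends on the number of distinct words rather than on the file size. Splitting at line ends does not lose any words here, since a newline is not a word character.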

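IV. Handling special cases

Punctuation: the regular expression used in the methods above already filters out punctuation. To keep certain marks inside words, such as the apostrophe in a contraction, adjust the pattern;

Case sensitivity: `text.lower()` converts the text to lowercase so that the same word is not counted separately because of differences in capitalization.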
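One possible adjustment that keeps contractions and hyphenated words whole is sketched below. The pattern is an assumption chosen for illustration: it accepts ASCII letters only, so digits, accented letters, and the typographic apostrophe (’) are not matched. The names `ENGLISH_WORD` and `count_english_words` are hypothetical.

```python
import re
from collections import Counter

# Assumed pattern: runs of ASCII letters, optionally joined by a single
# apostrophe or hyphen (so "don't" and "well-known" stay in one piece)
ENGLISH_WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def count_english_words(text, case_sensitive=False):
    """Return word frequencies; lowercases first unless case_sensitive is True."""
    if not case_sensitive:
        text = text.lower()
    return Counter(ENGLISH_WORD.findall(text))


sentence = "Well-known facts: don't panic. Don't run!"
folded = count_english_words(sentence)
exact = count_english_words(sentence, case_sensitive=True)

print(folded)   # "don't" and "Don't" are merged into one entry
print(exact)    # the two spellings are counted separately
```

The `case_sensitive` switch makes the effect of lowercasing visible: with case folding the contraction is tallied once under a single key, without it the two capitalizations are separate entries.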

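Choose among these methods according to your needs: `split()` is enough for simple cases, while combining a regular expression with `Counter` suits cases that need punctuation handling or frequency statistics.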

Article URL: http://www.hahawenanjuzi.cn/fendoujuzi/309471.html
