利用通义千问生成数字字母汉字和特殊字符的排序

最近有这么一个排序的需求：

按特殊字符、数字、字母、汉字顺序排序，第一位相同，第二位也是按这个顺序排，第三位依次类推，如果遇到两个比较项都是数字，就不按位比较了，需要向后取出所有连续的数字，按数字大小比较

网上查找了一番，发现都没有可以直接用的代码，要不就是按首位排序，要不就是比较项比较简单

正好可以试一下通义千问

下面的链接就是与通义千问所有的对话，有兴趣可以点开看看，不过就是不知道有没有有效期

# 我

首先我先提出我的需求：

使用 js 排序
规则如下：
按 特殊字符、数字、字母、汉字 顺序排序，第一位相同，第二位也是按这个顺序排，第三位依次类推，如果遇到两个比较项都是数字，就不按位比较了，按数字大小比较

1
2
3

# 通义千问

在 JavaScript 中，根据您给出的排序规则（特殊字符、数字、字母、汉字的顺序，并且数字间直接比较大小），我们可以定义一个自定义的排序函数。这个函数会用到 localeCompare 方法来处理字母和汉字的排序，并通过正则表达式来区分数字和其他字符类型，对于数字，则直接比较它们的数值大小。以下是一个可能的实现方式：

function customSort(a, b) {
  const specialCharsRegex = /^[^\w\s]/
  const digitRegex = /^\d+$/

  // 检查是否都是数字，如果是则直接比较数值
  if (digitRegex.test(a) && digitRegex.test(b)) {
    return a - b
  }

  // 分别检查 a 和 b 是否为数字，数字应该在非数字之前
  const aIsDigit = digitRegex.test(a)
  const bIsDigit = digitRegex.test(b)
  if (aIsDigit !== bIsDigit) {
    return aIsDigit ? -1 : 1
  }

  // 使用 localeCompare 比较非数字的情况，注意设置 sensitivity 为 'base' 忽略大小写
  // 并且用'zh'作为 locale 确保汉字的正确排序
  let comparisonResult = a.localeCompare(b, 'zh', { sensitivity: 'base' })

  // 如果 localeCompare 的结果是 0，意味着在字母和汉字层面它们相等，
  // 但因为我们需要保持原始顺序（稳定排序），所以不改变它们的位置
  return comparisonResult !== 0 ? comparisonResult : 0
}

// 示例数组
const items = ['A', '3', '中', '!', '1', 'b', '2', '国', '_', 'a', '5', '四']

// 使用自定义排序函数
const sortedItems = items.sort(customSort)

console.log(sortedItems)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32

这段代码首先定义了一个 customSort 函数，它根据输入项的类型（特殊字符、数字、字母、汉字）来决定比较逻辑。如果两个项都是数字，则直接比较它们的数值；如果一个是数字另一个不是，则数字排在前面；对于非数字项，使用 localeCompare 确保正确的语言敏感性排序，特别是对于汉字。请注意，为了确保稳定性（即相等的元素维持原有的顺序），在 localeCompare 比较结果为 0 时，直接返回 0，不交换两元素的位置。

请根据实际应用场景调整上述代码，比如调整 locale 以适应不同的地区设置或添加更多的特殊情况处理。

# 我

比较项不是只有一位，有可能有多位，比如 items = ['a3g 分诶', '342', 'fesfi', '@$']

# 通义千问

理解您的需求，如果比较项包含多位字符，且需要按照您指定的规则进行逐位排序，那么我们需要对每个字符串进行逐字符处理并应用相同的排序逻辑。这里的关键在于将字符串分解为字符数组，然后对每个字符应用之前的排序逻辑。考虑到这一点，我们可以修改上面的函数来实现这一需求：

function characterWiseCustomSort(strA, strB) {
  const maxLength = Math.max(strA.length, strB.length)
  for (let i = 0; i < maxLength; i++) {
    const charA = strA[i] || ''
    const charB = strB[i] || ''

    const specialCharsRegex = /^[^\w\s]/
    const digitRegex = /^\d+$/

    // 比较当前字符
    if (digitRegex.test(charA) && digitRegex.test(charB)) {
      // 如果都是数字，直接比较数值
      if (parseInt(charA, 10) !== parseInt(charB, 10)) {
        return parseInt(charA, 10) - parseInt(charB, 10)
      }
    } else if (charA !== charB) {
      // 非数字字符，使用localeCompare比较
      const comparisonResult = charA.localeCompare(charB, undefined, { sensitivity: 'base' })
      if (comparisonResult !== 0) {
        return comparisonResult
      }
    }

    // 如果当前位相同，继续比较下一位
  }

  // 如果所有对应位置的字符都相同，则长度短的字符串应当排在前面（可选，根据具体需求调整）
  return strA.length - strB.length
}

// 使用此函数作为 sort 方法的比较函数
const items = ['a3g 分诶', '342', 'fesfi', '@$']
const sortedItems = items.sort((a, b) => characterWiseCustomSort(a, b))

console.log(sortedItems)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35

这个修改后的函数 characterWiseCustomSort 会逐个字符地比较两个字符串 strA 和 strB，先比较第一个字符，如果相同则比较下一个，直到找到不同或遍历完所有字符。对于数字字符，直接比较它们的数值；对于非数字字符，使用 localeCompare 进行比较，确保符合汉字、字母等的正确排序规则。如果所有对应位置的字符都相同，则可以通过字符串长度来决定排序（可选行为，根据实际需求调整）。

# 我

这个排序有点问题，比如 21 应该排在 100 的前面，但现在是按一位一位比较的，21 排在 100 后面了

# 通义千问

您是对的，之前的实现确实没有完全解决数字整体大小的比较问题。为了解决这个问题，当遇到数字时，我们不应该一位一位地比较，而应该先识别出完整的数字串，再进行整体的数值比较。下面是修正后的代码，它会在遇到数字时，先提取完整的数字串进行比较，确保数字按其完整值的大小排序：

function extractNumber(str, startIndex) {
  let endIndex = startIndex
  while (endIndex < str.length && /\d/.test(str[endIndex])) {
    endIndex++
  }
  return [parseInt(str.substring(startIndex, endIndex), 10), endIndex]
}

function characterWiseCustomSort(strA, strB) {
  let indexA = 0,
    indexB = 0

  while (indexA < strA.length && indexB < strB.length) {
    const charA = strA[indexA]
    const charB = strB[indexB]

    const isDigitA = /\d/.test(charA)
    const isDigitB = /\d/.test(charB)

    if (isDigitA && isDigitB) {
      // 两个字符都是数字，提取完整的数字串进行比较
      const [numA, nextIndexA] = extractNumber(strA, indexA)
      const [numB, nextIndexB] = extractNumber(strB, indexB)

      if (numA !== numB) {
        return numA - numB
      }
      // 更新索引到数字串之后
      indexA = nextIndexA
      indexB = nextIndexB
    } else if (isDigitA) {
      return -1 // 数字在非数字之前
    } else if (isDigitB) {
      return 1 // 非数字在数字之后
    } else {
      // 都是非数字字符，使用localeCompare比较
      const comparisonResult = charA.localeCompare(charB, undefined, { sensitivity: 'base' })
      if (comparisonResult !== 0) {
        return comparisonResult
      }
      // 继续比较下一个字符
      indexA++
      indexB++
    }
  }

  // 较短的字符串排在前面
  return strA.length - strB.length
}

const items = ['21', '100', 'a3g 分诶', '342', 'fesfi', '@$']
const sortedItems = items.sort((a, b) => characterWiseCustomSort(a, b))

console.log(sortedItems)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54

这个版本的 characterWiseCustomSort 函数通过引入 extractNumber 辅助函数来识别并比较完整的数字串，从而解决了数字整体排序的问题。现在，"21"将会正确地排在"100"前面。

# 我

最后排序结果不是先特殊字符、数字、字母、汉字

# 通义千问

我明白了，之前的实现主要集中在处理数字串的连续比较上，而忽视了最初规定的整体排序逻辑，即先特殊字符、接着数字、字母、最后汉字的全局顺序。让我们修正这个问题，确保整体遵循规定的排序规则，并同时正确处理数字串：

function extractNextToken(str, startIndex) {
  if (startIndex >= str.length) return null
  const char = str[startIndex]
  if (/\d/.test(char)) {
    // 提取完整的数字
    let endIndex = startIndex
    while (endIndex < str.length && /\d/.test(str[endIndex])) {
      endIndex++
    }
    return { type: 'number', value: parseInt(str.substring(startIndex, endIndex), 10), endIndex }
  } else if (/[a-zA-Z]/.test(char)) {
    // 提取单个字母
    return { type: 'letter', value: char, endIndex: startIndex + 1 }
  } else if (/[\u4e00-\u9fff]/.test(char)) {
    // 提取单个汉字
    return { type: 'chinese', value: char, endIndex: startIndex + 1 }
  } else {
    // 特殊字符
    return { type: 'special', value: char, endIndex: startIndex + 1 }
  }
}

function tokenWiseCustomSort(strA, strB) {
  let indexA = 0,
    indexB = 0

  while (true) {
    const tokenA = extractNextToken(strA, indexA)
    const tokenB = extractNextToken(strB, indexB)

    if (!tokenA && !tokenB) break // 都处理完了

    if (!tokenA) return 1 // A 已经处理完，B 还有剩余，A 应排在 B 后面
    if (!tokenB) return -1 // B 已经处理完，A 还有剩余，B 应排在 A 后面

    // 比较类型
    if (tokenA.type !== tokenB.type) {
      const typeOrder =
        ['special', 'number', 'letter', 'chinese'].indexOf(tokenA.type) -
        ['special', 'number', 'letter', 'chinese'].indexOf(tokenB.type)
      if (typeOrder !== 0) return typeOrder
    }

    // 类型相同，比较值
    if (tokenA.value !== tokenB.value) {
      if (typeof tokenA.value === 'number' && typeof tokenB.value === 'number') {
        return tokenA.value - tokenB.value
      } else {
        return tokenA.value.localeCompare(tokenB.value)
      }
    }

    // 类型和值都相同，移动到下一个字符
    indexA = tokenA.endIndex
    indexB = tokenB.endIndex
  }

  return 0 // 完全相等
}

const items = ['@$', '34', 'ab', '你好', '21 世界', 'a3g 分诶', '100', 'fesfi']
const sortedItems = items.sort((a, b) => tokenWiseCustomSort(a, b))

console.log(sortedItems)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64

这个版本的 tokenWiseCustomSort 函数首先定义了 extractNextToken 函数来逐个提取字符串中的特殊字符、数字、字母、汉字，并为每种类型分配一个顺序。随后，函数逐个比较两个字符串中的标记（token），首先比较类型，然后（如果类型相同）比较值。对于数字和非数字字符，采取了适当的比较方法，确保整个排序过程严格遵循您指定的规则。

# 我

extractNextToken 这个函数考虑一下小数的情况，不需要考虑负数

# 通义千问

了解，为了使 extractNextToken 函数能够正确处理小数（包括正数和小数点），而不考虑负数，我们可以调整正则表达式来匹配小数点后的数字序列。下面是更新后的 extractNextToken 函数：

function extractNextToken(str, startIndex) {
  if (startIndex >= str.length) return null
  const char = str[startIndex]
  if (/\d+(\.\d+)?/.test(char + str.slice(startIndex + 1).charAt(0))) {
    // 提取完整的数字，包括小数
    let endIndex = startIndex
    while (endIndex < str.length && /\d|\./.test(str[endIndex])) {
      endIndex++
    }
    return { type: 'number', value: parseFloat(str.substring(startIndex, endIndex)), endIndex }
  } else if (/[a-zA-Z]/.test(char)) {
    // 提取单个字母
    return { type: 'letter', value: char, endIndex: startIndex + 1 }
  } else if (/[\u4e00-\u9fff]/.test(char)) {
    // 提取单个汉字
    return { type: 'chinese', value: char, endIndex: startIndex + 1 }
  } else {
    // 特殊字符
    return { type: 'special', value: char, endIndex: startIndex + 1 }
  }
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

这个调整后的 extractNextToken 函数现在能够识别并正确提取包括小数在内的数字串。请注意，这里我们仍然不直接处理负数，假设输入数据中不含负号。

# 我

/\d+(.\d+)?/.test(char + str.slice(startIndex + 1).charAt(0))

这里对吗

# 通义千问

对不起，我的解释有误，直接在正则表达式的测试中尝试匹配后续字符可能导致逻辑不准确。正确的做法是在判断是否开始提取数字时，仅检查当前字符是否为数字或小数点，一旦确定开始提取数字，则使用循环来查找数字串的结束位置，允许包括小数点和小数部分。这里是修正后的 extractNextToken 函数实现：

function extractNextToken(str, startIndex) {
  if (startIndex >= str.length) return null
  const char = str[startIndex]

  if (/\d/.test(char)) {
    // 可能是整数或小数，开始提取
    let endIndex = startIndex
    while (endIndex < str.length && (/\d/.test(str[endIndex]) || str[endIndex] === '.')) {
      endIndex++
    }
    // 确保提取的是有效的数字或小数
    if (endIndex > startIndex && (str[endIndex - 1] !== '.' || endIndex - startIndex > 1)) {
      return { type: 'number', value: parseFloat(str.substring(startIndex, endIndex)), endIndex }
    }
  } else if (/[a-zA-Z]/.test(char)) {
    return { type: 'letter', value: char, endIndex: startIndex + 1 }
  } else if (/[\u4e00-\u9fff]/.test(char)) {
    return { type: 'chinese', value: char, endIndex: startIndex + 1 }
  } else {
    return { type: 'special', value: char, endIndex: startIndex + 1 }
  }

  return null // 默认情况，理论上不会到达这里
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

这个修正后的版本在遇到数字字符时，会检查后续字符直到遇到非数字或小数点后的非数字为止，这样可以正确处理包括整数和小数在内的数字串。注意，这里还加入了一个条件以排除仅有一个小数点的情况，确保提取的是有效数字或小数。

# 我

改成这种是否可以？

if (/\d/.test(char)) {
  const reg = /\d+(.\d+)?/
  const result = str.slice(startIndex).match(reg)
  return { type: 'number', value: parseFloat(result), endIndex: startIndex + result.length }
}

1
2
3
4
5

# 通义千问

您的修改思路是正确的，使用正则表达式直接匹配数字（包括整数和小数）是一个简洁的方法。不过，直接使用 match 方法时，需要注意几个细节来确保逻辑的准确性。以下是根据您的想法调整后的代码，同时修正了一些细节：

function extractNextToken(str, startIndex) {
  if (startIndex >= str.length) return null
  const char = str[startIndex]

  if (/\d/.test(char)) {
    // 使用正则表达式匹配数字（整数或小数）
    const reg = /\d+(\.\d+)?/
    const matchResult = str.slice(startIndex).match(reg)
    if (matchResult) {
      // 确保有匹配到内容
      const matchedStr = matchResult[0]
      return {
        type: 'number',
        value: parseFloat(matchedStr),
        endIndex: startIndex + matchedStr.length
      }
    }
  } else if (/[a-zA-Z]/.test(char)) {
    return { type: 'letter', value: char, endIndex: startIndex + 1 }
  } else if (/[\u4e00-\u9fff]/.test(char)) {
    return { type: 'chinese', value: char, endIndex: startIndex + 1 }
  } else {
    return { type: 'special', value: char, endIndex: startIndex + 1 }
  }

  return null // 理论上不会到达这里
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

在这个版本中，我们使用正则表达式\d+(.\d+)?来匹配数字，包括整数和小数。通过 str.slice(startIndex).match(reg) 获取匹配结果，并确保匹配成功后再构造返回对象。这里直接使用了 matchResult[0]来获取匹配到的字符串，然后计算新的 endIndex。这样修改后，函数应该能够正确地处理数字，包括含有小数点的情况。

# 总结

到这里，我们的需求就已经解决了

可以看到 AI 给出的方案还是很好的，而且比我们自己写代码可快多了

编辑

上次更新: 5/14/2024, 6:48:53 AM

← version 1.5 检测网络是否能够访问 Disqus→