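SnakeViz analysis results

The table below lists ten key functions from the SnakeViz profile. The first four rows are in descending order of tottime (time spent in the function itself); the remaining rows are included mainly for their cumulative time.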

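Rank	Location	Function	Calls	Total time (s)	Time per call (s)	Cumulative time (s)	Notes
1	`cpu.py:93`	`one_qubit_base`	605,616	797.2	0.001316	802	Base routine for single-qubit gates; called very frequently and dominated by matrix arithmetic
2	`terms.py:291`	`__call__`	81,840	301.4	0.003683	381.1	Term-by-term expectation value of the Hamiltonian; called frequently
3	`numpy.py:788`	`calculate_expectation_state`	2,046	13.46	0.00658	478.9	Computes the expectation value for a quantum state; heavy linear algebra
4	`hamiltonians.py:103`	`expectation`	2,046	5.425	0.002651	1349	Hamiltonian expectation value; may contain several sub-terms
5	`hamiltonians.py:692`	`__matmul__`	2,046	0.5154	0.0002519	1316	Multiplication of the Hamiltonian with the quantum state
6	`cpu.py:162`	`apply_gate`	605,616	7.039	1.162e-05	945.8	Applies a quantum gate; called extremely often
7	`numpy.py:76`	`cast`	689,622	1.775	2.574e-06	330.4	Data type conversion; occurs frequently during matrix operations
8	`numpy.py:413`	`execute_circuit`	2,046	3.884	0.001898	800.6	Executes the whole circuit, which contains many gate operations
9	`circuit.py:1099`	`__call__`	2,046	0.01439	7.032e-06	805.8	Circuit call entry point; wraps the execution logic
10	`circuit.py:1062`	`execute`	2,046	0.02156	1.054e-05	805.1	Main circuit execution logic; calls `execute_circuit`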

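📌 Main causes of the run time

Frequently called low-level operations:
- apply_gate and one_qubit_base are the lowest-level gate functions and are each called more than 600,000 times. one_qubit_base spends about 1.3 ms of its own time per call, which adds up to 797.2 s, the largest tottime in the profile. apply_gate itself adds only about 12 µs per call (7.039 s in total).
Matrix multiplication and expectation values:
- __matmul__, expectation and calculate_expectation_state have small tottime values (0.5 s to 13.5 s) but large cumulative times (478.9 s to 1349 s), so the cost lies in the linear algebra they trigger in their callees, in particular large matrix-vector products.
Repeated circuit execution:
- execute_circuit, __call__ and execute are each called 2,046 times, once per optimization iteration, and every call runs the complete circuit, so their cumulative time is large (about 800 s).
Data type conversion overhead:
- cast is called 689,622 times. Its own time per call is tiny (about 2.6 µs), but its cumulative time reaches 330.4 s, which is likely dominated by numpy.ndarray.astype (254.5 s tottime over 696,786 calls).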

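🛠️ Optimization suggestions

Reduce redundant calls: check whether circuits or gate operations are executed repeatedly and whether partial results can be cached.
Batch processing: where possible, merge several expectation value calculations into one batched matrix operation.
Use more efficient linear algebra: for example NumPy's einsum, or GPU acceleration with CuPy.
Optimize the circuit structure: remove unnecessary gates and fuse consecutive gates.
Parallelize: consider multithreading or multiprocessing for independent expectation values or circuit executions.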

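Optimization strategies and functions

🧠 Functions with the highest optimization priority

Location	Function	Calls	Total time (s)	Time per call (s)	Suggested optimization
`cpu.py:93`	`one_qubit_base`	605,616	797.2	0.001316	Reduce repeated computation; cache matrices
`terms.py:291`	`__call__`	81,840	301.4	0.003683	Merge expectation value calculations; vectorize
`numpy.py:788`	`calculate_expectation_state`	2,046	13.46	0.00658	Use an efficient linear algebra library such as CuPy or JAX
`hamiltonians.py:103`	`expectation`	2,046	5.425	0.002651	Avoid rebuilding the Hamiltonian; consider sparse matrices
`cpu.py:162`	`apply_gate`	605,616	7.039	1.162e-05	Fuse consecutive gates; reduce intermediate state copies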

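🔧 Optimization methods in detail

Matrix caching and gate fusion

For one_qubit_base and apply_gate, rebuilding and applying the gate matrix on every call is inefficient.
Strategies:
- Cache the matrix representation of common gates (such as H, X, Y, Z, RX, RZ) in advance. Parametrized gates such as RX and RZ have to be cached per parameter value.
- When several gates act consecutively on the same qubit, fuse them into a single composite gate beforehand.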
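The two strategies above can be sketched in a few lines of NumPy. This is a minimal illustration, not qibo or qibojit code: `GATE_CACHE` and `fuse_single_qubit_gates` are hypothetical names, the circuit is assumed to be a plain list of `(gate name, qubit)` pairs, and only gates that are adjacent in that list are fused.

```python
import numpy as np

# Fixed 2x2 matrices are built once and reused (the "cache").
GATE_CACHE = {
    "H": np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2),
    "X": np.array([[0, 1], [1, 0]], dtype=np.complex128),
    "Z": np.array([[1, 0], [0, -1]], dtype=np.complex128),
}


def fuse_single_qubit_gates(gates):
    """Fuse runs of consecutive gates that act on the same qubit.

    `gates` is a list of (name, qubit) pairs in application order.
    Returns a list of (matrix, qubit) pairs in which every run of
    adjacent same-qubit gates has been multiplied into one matrix.
    """
    fused = []
    for name, qubit in gates:
        matrix = GATE_CACHE[name]
        if fused and fused[-1][1] == qubit:
            # A later gate acts after the earlier ones,
            # so it multiplies the running product from the left.
            fused[-1] = (matrix @ fused[-1][0], qubit)
        else:
            fused.append((matrix, qubit))
    return fused


circuit = [("H", 0), ("Z", 0), ("H", 0), ("X", 1)]
fused = fuse_single_qubit_gates(circuit)
print(len(fused))  # 2: one fused gate on qubit 0, one gate on qubit 1
```

Here the three gates on qubit 0 collapse into a single matrix (H·Z·H equals X), so the state vector would be traversed once instead of three times.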

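Vectorizing the expectation value calculation

The expectation value calculation in terms.py:291 (__call__) and calculate_expectation_state proceeds term by term.
Strategies:
- Combine the Hamiltonian terms into a single sparse matrix.
- Use NumPy's einsum or SciPy sparse matrix multiplication to evaluate all terms in one pass.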
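As a rough sketch of the idea, the Hamiltonian can be assembled once and then applied with a single matrix-vector product per state. The example uses dense NumPy arrays to stay dependency-free; at realistic sizes a `scipy.sparse` matrix would take their place. The helper names (`term_matrix`, `build_hamiltonian`, `expectation`) and the `(coefficient, Pauli string)` term format are assumptions made for illustration.

```python
from functools import reduce

import numpy as np

PAULI = {
    "I": np.eye(2, dtype=np.complex128),
    "X": np.array([[0, 1], [1, 0]], dtype=np.complex128),
    "Z": np.array([[1, 0], [0, -1]], dtype=np.complex128),
}


def term_matrix(pauli_string):
    """Kronecker product of the single-qubit Paulis in `pauli_string`."""
    return reduce(np.kron, (PAULI[p] for p in pauli_string))


def build_hamiltonian(terms):
    """Sum all (coefficient, pauli_string) terms into one matrix, once."""
    return sum(coeff * term_matrix(p) for coeff, p in terms)


def expectation(hamiltonian, state):
    """<state|H|state>; np.vdot conjugates its first argument."""
    return np.vdot(state, hamiltonian @ state).real


terms = [(1.0, "ZZ"), (0.5, "XI"), (0.5, "IX")]
H = build_hamiltonian(terms)  # built once, reused for every state

state = np.zeros(4, dtype=np.complex128)
state[0] = 1.0  # |00>
print(expectation(H, state))  # 1.0
```

Building `H` is the one-off cost; every later evaluation is a single product instead of one pass per term.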

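Reducing intermediate state copies

apply_gate and execute_circuit frequently create new copies of the quantum state.
Strategies:
- Update the quantum state in place.
- Avoid unnecessary .copy() and astype() calls.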
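The difference between copying and in-place updates can be seen with plain NumPy. The snippet below is a small illustration on a toy state vector, not code from the profiled library; it relies only on documented NumPy behavior: `astype` copies by default, while `np.asarray` and `astype(copy=False)` return the input array when no conversion is needed.

```python
import numpy as np

state = np.ones(8, dtype=np.complex128)
phase = np.exp(1j * np.pi / 4)

# Out-of-place: allocates a new array for the result.
new_state = state * phase

# In-place: writes the result into the existing buffer.
np.multiply(state, phase, out=state)

# astype copies by default, even if the dtype already matches.
copied = state.astype(np.complex128)

# These two return the very same object when no conversion is needed.
same_asarray = np.asarray(state, dtype=np.complex128)
same_astype = state.astype(np.complex128, copy=False)

print(copied is state, same_asarray is state, same_astype is state)
# False True True
```

A conversion helper that is called hundreds of thousands of times benefits from the no-copy variants whenever the input already has the target dtype.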

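Parallelization and batching

Repeated circuit executions and expectation value calculations are independent of one another, so they can run in parallel.
Strategies:
- Use multiprocessing or joblib to process several circuits or Hamiltonian terms in parallel.
- On a GPU, consider packing several circuits into a batch and executing them in one go.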
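A minimal sketch of the parallel variant follows. It uses the standard library's `concurrent.futures` rather than multiprocessing or joblib, and a thread pool so that it runs anywhere without pickling; `energy` and `parallel_energies` are hypothetical helpers, and the Hamiltonian is a toy 2x2 matrix. Whether threads or processes pay off depends on how much time the real workload spends in NumPy code that releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def energy(state, hamiltonian):
    """Expectation value <state|H|state> for one state."""
    return float(np.vdot(state, hamiltonian @ state).real)


def parallel_energies(states, hamiltonian, executor):
    """Evaluate independent expectation values on any Executor."""
    futures = [executor.submit(energy, s, hamiltonian) for s in states]
    # Collect in submission order so results line up with `states`.
    return [f.result() for f in futures]


hamiltonian = np.diag([1.0, -1.0]).astype(np.complex128)  # Pauli Z
states = [
    np.array([1, 0], dtype=np.complex128),  # |0>
    np.array([0, 1], dtype=np.complex128),  # |1>
]

with ThreadPoolExecutor(max_workers=2) as pool:
    energies = parallel_energies(states, hamiltonian, pool)

print(energies)  # [1.0, -1.0]
```

Because the function accepts any executor, a `ProcessPoolExecutor` can be passed in instead; in that case the states and the Hamiltonian are pickled for every task, which can outweigh the gain for small problems.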

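🧭 Summary of the basic ideas

"Reduce repetition, merge operations, vectorize computation, execute in parallel."

Trade space for time: cache frequently used matrices to avoid rebuilding them.
Structural optimization: fuse consecutive gates and represent the Hamiltonian as a sparse matrix.
Computational optimization: use efficient libraries (such as NumPy, SciPy, CuPy, JAX).
Workflow optimization: avoid intermediate state copies and reduce Python-level overhead.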

Call stack analysis

Each table lists the number of calls (ncalls), the total time in the function itself (tottime), tottime per call (percall), the cumulative time including callees (cumtime), cumtime per call (percall) and the function location (filename:lineno(function)). All times are in seconds.

🧠 Call path 1: call stack of `cpu.py:93(one_qubit_base)`

ncalls	tottime	percall	cumtime	percall	filename:lineno(function)
605616	797.2	0.001316	802	0.001324	cpu.py:93(one_qubit_base)
605616	7.039	1.162e-05	945.8	0.001562	cpu.py:162(apply_gate)
2046	0.02156	1.054e-05	805.1	0.3935	circuit.py:1062(execute)
2046	0.01439	7.032e-06	805.8	0.3938	circuit.py:1099(__call__)
2046	3.884	0.001898	800.6	0.3913	numpy.py:413(execute_circuit)
482856	1.332	2.759e-06	794.1	0.001645	abstract.py:577(apply)

circuit.py:1099(__call__) [ncalls=2046, cumtime=805.8]
└── circuit.py:1062(execute) [ncalls=2046, cumtime=805.1]
    └── numpy.py:413(execute_circuit) [ncalls=2046, cumtime=800.6]
        └── cpu.py:162(apply_gate) [ncalls=605616, cumtime=945.8]
            └── cpu.py:93(one_qubit_base) [ncalls=605616, cumtime=802]

🧮 Call path 2: call stack of `terms.py:291(__call__)`

ncalls	tottime	percall	cumtime	percall	filename:lineno(function)
81840	301.4	0.003683	381.1	0.004656	terms.py:291(__call__)
689622	1.775	2.574e-06	330.4	0.0004791	numpy.py:76(cast)
696786	254.5	0.0003653	254.8	0.0003657	~:0(<method 'astype' of 'numpy.ndarray' objects>)
122760	0.5241	4.269e-06	154.8	0.001261	terms.py:118(__call__)

terms.py:118(__call__) [ncalls=122760, cumtime=154.8]
└── terms.py:291(__call__) [ncalls=81840, cumtime=381.1]
    ├── numpy.py:76(cast) [ncalls=689622, cumtime=330.4]
    └── ~:0(astype) [ncalls=696786, cumtime=254.8]

⚛️ Call path 3: call stack of the Hamiltonian expectation value (`hamiltonians.py:103(expectation)`)

ncalls	tottime	percall	cumtime	percall	filename:lineno(function)
2046	5.425	0.002651	1349	0.6592	hamiltonians.py:103(expectation)
2046	0.007122	3.481e-06	1349	0.6592	hamiltonians.py:516(expectation)
2046	0.5154	0.0002519	1316	0.643	hamiltonians.py:692(__matmul__)
2046	17.04	0.008328	42.15	0.0206	hamiltonians.py:673(apply_gates)
2046	13.46	0.00658	478.9	0.2341	numpy.py:788(calculate_expectation_state)

hamiltonians.py:103(expectation) [ncalls=2046, cumtime=1349]
└── hamiltonians.py:516(expectation) [ncalls=2046, cumtime=1349]
    └── hamiltonians.py:692(__matmul__) [ncalls=2046, cumtime=1316]
        └── numpy.py:788(calculate_expectation_state) [ncalls=2046, cumtime=478.9]
            └── hamiltonians.py:673(apply_gates) [ncalls=2046, cumtime=42.15]

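🔍 Known call path

apply_gate is confirmed to be called from numpy.py:413(execute_circuit). The call counts are:

execute_circuit: 2,046 calls
apply_gate: 605,616 calls

If every apply_gate call came from execute_circuit, each execute_circuit call would trigger 605,616 / 2,046 = 296 apply_gate calls, which is plausible for a quantum circuit with many gates.

⚠️ Anomaly: the cumulative time of `apply_gate` is 945.8 s, while that of `execute_circuit` is only 800.6 s

A function's cumtime is summed over all of its callers, so a callee can exceed a caller's cumtime only if part of its time is accumulated elsewhere. This suggests:

apply_gate is not called only from execute_circuit; it has other callers, and 296 is then an upper bound on the calls per execute_circuit.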

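🔎 Other possible callers (based on the SnakeViz table)

Judging by their cumulative times and call counts, the following functions may also call apply_gate:

`abstract.py:577(apply)`

Calls: 482,856
Cumulative time: 794.1 s
This may be an abstract gate application interface that calls apply_gate indirectly.

`hamiltonians.py:673(apply_gates)`

Calls: 2,046
Cumulative time: 42.15 s
This function applies gate operations while computing the expectation value and may also call apply_gate.

`numpy.py:788(calculate_expectation_state)`

Calls: 2,046
Cumulative time: 478.9 s
This function may call apply_gate while building the quantum state, especially during the expectation value calculation.

`hamiltonians.py:692(__matmul__)`

Calls: 2,046
Cumulative time: 1316 s
If __matmul__ implements the product of gates and state, it may also call apply_gate indirectly.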

🧠 Summary of the inference

Possible callers of apply_gate:

Possible caller	Location	Calls	Cumulative time	Assessment
`execute_circuit`	numpy.py:413	2,046	800.6 s	✅ Confirmed
`apply`	abstract.py:577	482,856	794.1 s	✅ Highly likely
`calculate_expectation_state`	numpy.py:788	2,046	478.9 s	✅ Plausible
`apply_gates`	hamiltonians.py:673	2,046	42.15 s	✅ Plausible
`__matmul__`	hamiltonians.py:692	2,046	1316 s	⚠️ Possibly indirect

To verify these paths, use the print_callers() method of pstats.Stats or the SnakeViz icicle chart to list all parent functions of apply_gate.
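The caller lookup can be tried on a toy workload first. The sketch below profiles three small stand-in functions (`leaf`, `caller_a`, `caller_b`, all hypothetical) with `cProfile` and asks `pstats` who called `leaf`; the argument to `print_callers` is a regular expression matched against the `filename:lineno(function)` string.

```python
import cProfile
import io
import pstats


def leaf():
    return sum(range(1000))


def caller_a():
    return leaf()


def caller_b():
    return leaf() + leaf()


def workload():
    caller_a()
    caller_b()


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Send the report to a string buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.print_callers("leaf")  # restrict the report to functions matching "leaf"

report = buffer.getvalue()
print(report)
```

For a saved profile the same call would look like `pstats.Stats("profile.prof").print_callers("apply_gate")`, where `profile.prof` is a placeholder for the file that was loaded into SnakeViz.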


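Functions to optimize

🔧 Prioritized function list

Location	Function	Cumulative time (s)	Calls	Cumulative time per call (s)	Suggested optimization
`cpu.py:93`	`one_qubit_base`	802	605,616	0.001324	✅ Core bottleneck; consider optimizing the matrix arithmetic or using a sparse representation
`cpu.py:162`	`apply_gate`	945.8	605,616	0.001562	✅ Called through several paths; consider caching gate matrices or batching
`hamiltonians.py:692`	`__matmul__`	1316	2,046	0.643	✅ Large matrix multiplication; try parallelization or sparse optimization
`terms.py:291`	`__call__`	381.1	81,840	0.004656	✅ Frequent type conversion; avoid repeated `astype`
`numpy.py:788`	`calculate_expectation_state`	478.9	2,046	0.2341	✅ Heavy state computation; consider tensor decomposition or caching
`hamiltonians.py:103`	`expectation`	1349	2,046	0.6592	✅ Top-level expectation value; reduce repeatedly computed terms
`abstract.py:577`	`apply`	794.1	482,856	0.001645	⚠️ Dispatcher function; optimize how it dispatches to sub-functions
`terms.py:118`	`__call__`	154.8	122,760	0.001261	✅ Possibly repeated computation; consider merging the logic
`numpy.py:76`	`cast`	330.4	689,622	0.0004791	⚠️ Frequent type conversion; reduce unnecessary conversions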

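🧠 Suggested optimization strategies

Matrix multiplication:
- Accelerate with numba or einsum
- Use sparse matrix representations to reduce the number of multiplications
Gate batching:
- Fuse several gates into a single operation
- Execute in parallel on a GPU or with SIMD
Caching expectation values:
- Cache results for repeated states or Hamiltonian structures
- Use a hash index to avoid recomputation
Fewer type conversions:
- Avoid calling astype several times inside __call__
- Convert data types once at a single entry point
Dispatcher logic:
- Remove unnecessary backend checks
- Replace if-else chains with function references or a lookup table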
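The hash-based caching idea can be sketched with `functools.lru_cache` from the standard library. The function below is a stand-in, not part of the profiled code: it returns the Z expectation value after an RX(theta) rotation of |0>, which equals cos(theta), in place of an expensive circuit run. The name `expectation_z` is hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=1024)
def expectation_z(theta: float) -> float:
    """<Z> after RX(theta) on |0>, standing in for a costly evaluation."""
    amp0 = math.cos(theta / 2)  # magnitude of the |0> amplitude
    amp1 = math.sin(theta / 2)  # magnitude of the |1> amplitude
    return amp0 ** 2 - amp1 ** 2


first = expectation_z(0.3)   # computed
second = expectation_z(0.3)  # served from the cache

info = expectation_z.cache_info()
print(info.hits, info.misses)  # 1 1
```

`lru_cache` needs hashable arguments, so a NumPy parameter vector would have to be converted first, for example with `tuple(params)`. Caching only helps when an optimizer actually revisits identical parameters.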

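Optimizing the one_qubit_base function

The goal is to streamline how the kernel is looked up and called, removing unnecessary computation and memory access. The original implementation: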

    def one_qubit_base(self, state, nqubits, target, kernel, gate, qubits):
        ncontrols = len(qubits) - 1 if qubits is not None else 0
        m = nqubits - target - 1
        nstates = 1 << (nqubits - ncontrols - 1)
        if ncontrols:
            kernel = getattr(self.gates, "multicontrol_{}_kernel".format(kernel))
            return kernel(state, gate, qubits, nstates, m)
        kernel = getattr(self.gates, "{}_kernel".format(kernel))
        return kernel(state, gate, nstates, m)

The optimized version calls the looked-up function directly, without rebinding the intermediate variable (kernel):

    def one_qubit_base(self, state, nqubits, target, kernel, gate, qubits):
        ncontrols = len(qubits) - 1 if qubits is not None else 0
        m = nqubits - target - 1
        nstates = 1 << (nqubits - ncontrols - 1)
        if ncontrols:
            return getattr(self.gates, "multicontrol_{}_kernel".format(kernel))(state, gate, qubits, nstates, m)
        return getattr(self.gates, "{}_kernel".format(kernel))(state, gate, nstates, m)

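The change looks minor, but one_qubit_base is called about 600,000 times, so small per-call savings accumulate. The test results show a speedup of 1% to 10%. Note that simple_optimization in the benchmark also replaces str.format with an f-string, so the measured difference combines both changes.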
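Differences of a few percent are close to timing noise, so it helps to repeat each measurement and keep the best run. The following sketch compares the two lookup styles in isolation with `timeit.repeat`; the `Gates` class and its no-op `apply_x_kernel` are stand-ins invented for this example, so the numbers only reflect the Python-level lookup cost, not the real kernels.

```python
import timeit


class Gates:
    """Stand-in namespace with a no-op kernel (hypothetical)."""

    def apply_x_kernel(self, *args):
        return None


gates = Gates()
kernel = "apply_x"


def with_format():
    # Original style: str.format plus an intermediate variable.
    func = getattr(gates, "{}_kernel".format(kernel))
    return func()


def with_fstring():
    # Variant used by simple_optimization: f-string, direct call.
    return getattr(gates, f"{kernel}_kernel")()


def best_time(func, number=20_000, repeat=5):
    # The minimum over several repeats is the least noise-affected figure.
    return min(timeit.repeat(func, number=number, repeat=repeat))


t_format = best_time(with_format)
t_fstring = best_time(with_fstring)
print(f"format: {t_format:.4f} s, f-string: {t_fstring:.4f} s")
```

Running both variants several times in alternating order gives a better sense of whether a small speedup is reproducible.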

import timeit
import numpy as np
import sys
import os

# Set the path directly instead of relying on __file__
# The qibojit path is resolved relative to the current working directory
current_dir = os.getcwd()  # current working directory
qibojit_path = os.path.join(current_dir, 'qibojit-repo', 'src')
if os.path.exists(qibojit_path):
    sys.path.append(qibojit_path)
    print(f"Added qibojit path: {qibojit_path}")
else:
    print(f"Path does not exist: {qibojit_path}")

class MockGates:
    """Mock gates object used for testing."""
    def __init__(self):
        # Register mock kernel functions under the expected attribute names
        for name in ['apply_x', 'apply_y', 'apply_z', 'apply_gate']:
            setattr(self, f"{name}_kernel", self._mock_kernel)
            setattr(self, f"multicontrol_{name}_kernel", self._mock_kernel)
    
    def _mock_kernel(self, *args):
        """Mock kernel execution."""
        # Emulate some of the cost of a real kernel
        if len(args) > 0 and hasattr(args[0], '__len__'):
            # Stand-in computation: squared norm of the first argument
            return np.sum(np.abs(args[0])**2)
        return 1.0

class PerformanceTest:
    def __init__(self):
        self.gates = MockGates()
        # Use a small state vector to keep the test fast
        self.state = np.random.random(2**8) + 1j * np.random.random(2**8)  # 8-qubit state
        self.gate = np.array([[0, 1], [1, 0]], dtype=complex)  # X gate
        self.qubits = np.array([0, 1, 2], dtype=int)  # three entries, so ncontrols = 2
        
        # Test parameters
        self.nqubits = 8
        self.target = 5
        self.kernel_name = 'apply_x'
        
        # Precomputed kernel name mapping (used by kernel_names_mapping)
        self._kernel_names = {
            'single': {
                'apply_x': 'apply_x_kernel',
                'apply_y': 'apply_y_kernel',
                'apply_z': 'apply_z_kernel',
                'apply_gate': 'apply_gate_kernel',
            },
            'multi': {
                'apply_x': 'multicontrol_apply_x_kernel',
                'apply_y': 'multicontrol_apply_y_kernel',
                'apply_z': 'multicontrol_apply_z_kernel',
                'apply_gate': 'multicontrol_apply_gate_kernel',
            }
        }
        
        # Pre-populate the kernel cache (used by the hybrid strategy)
        self._kernel_cache = self._initialize_common_kernels()
    
    def _initialize_common_kernels(self):
        """Pre-populate the cache with commonly used kernels."""
        common_kernels = ['apply_x', 'apply_y', 'apply_z', 'apply_gate']
        
        cache = {'single': {}, 'multi': {}}
        
        for kernel_name in common_kernels:
            full_name = f"{kernel_name}_kernel"
            if hasattr(self.gates, full_name):
                cache['single'][kernel_name] = getattr(self.gates, full_name)
        
        # Key the multicontrol cache by the base name, which is what
        # _get_kernel looks up; prefixed keys would never be hit.
        for kernel_name in common_kernels:
            full_name = f"multicontrol_{kernel_name}_kernel"
            if hasattr(self.gates, full_name):
                cache['multi'][kernel_name] = getattr(self.gates, full_name)
        
        return cache
    
    def _get_kernel(self, kernel_name, is_multicontrol=False):
        """Return the kernel, preferring the cache; on a miss, look it up and cache it."""
        cache_type = 'multi' if is_multicontrol else 'single'
        cache = self._kernel_cache[cache_type]
        
        if kernel_name not in cache:
            full_name = f"{kernel_name}_kernel"
            if is_multicontrol:
                full_name = f"multicontrol_{full_name}"
            
            kernel_func = getattr(self.gates, full_name)
            cache[kernel_name] = kernel_func
        
        return cache[kernel_name]
    
    def original_method(self):
        """Original implementation."""
        ncontrols = len(self.qubits) - 1 if self.qubits is not None else 0
        m = self.nqubits - self.target - 1
        nstates = 1 << (self.nqubits - ncontrols - 1)
        
        if ncontrols:
            kernel = getattr(self.gates, "multicontrol_{}_kernel".format(self.kernel_name))
            return kernel(self.state, self.gate, self.qubits, nstates, m)
        kernel = getattr(self.gates, "{}_kernel".format(self.kernel_name))
        return kernel(self.state, self.gate, nstates, m)
    
    def hybrid_strategy(self):
        """Hybrid strategy: cached kernel lookup."""
        ncontrols = len(self.qubits) - 1 if self.qubits is not None else 0
        m = self.nqubits - self.target - 1
        nstates = 1 << (self.nqubits - ncontrols - 1)
        
        kernel_func = self._get_kernel(self.kernel_name, is_multicontrol=bool(ncontrols))
        
        if ncontrols:
            return kernel_func(self.state, self.gate, self.qubits, nstates, m)
        else:
            return kernel_func(self.state, self.gate, nstates, m)
    
    def simple_optimization(self):
        """Simple optimization: direct call without an intermediate variable."""
        ncontrols = len(self.qubits) - 1 if self.qubits is not None else 0
        m = self.nqubits - self.target - 1
        nstates = 1 << (self.nqubits - ncontrols - 1)
        
        if ncontrols:
            return getattr(self.gates, f"multicontrol_{self.kernel_name}_kernel")(self.state, self.gate, self.qubits, nstates, m)
        else:
            return getattr(self.gates, f"{self.kernel_name}_kernel")(self.state, self.gate, nstates, m)
    
    def kernel_names_mapping(self):
        """Kernel name mapping variant."""
        ncontrols = len(self.qubits) - 1 if self.qubits is not None else 0
        m = self.nqubits - self.target - 1
        nstates = 1 << (self.nqubits - ncontrols - 1)
        
        if ncontrols:
            kernel_name = self._kernel_names['multi'].get(self.kernel_name, f"multicontrol_{self.kernel_name}_kernel")
            kernel_func = getattr(self.gates, kernel_name)
            return kernel_func(self.state, self.gate, self.qubits, nstates, m)
        else:
            kernel_name = self._kernel_names['single'].get(self.kernel_name, f"{self.kernel_name}_kernel")
            kernel_func = getattr(self.gates, kernel_name)
            return kernel_func(self.state, self.gate, nstates, m)
    
    def run_performance_test(self, num_iterations=50000):
        """Run the performance test."""
        print(f"Running performance test, iterations: {num_iterations}")
        print("=" * 60)
        
        # Check that all methods produce the same result
        original_result = self.original_method()
        hybrid_result = self.hybrid_strategy()
        simple_result = self.simple_optimization()
        mapping_result = self.kernel_names_mapping()
        
        print("Result check:")
        print(f"Original method result: {original_result}")
        print(f"Hybrid strategy result: {hybrid_result}")
        print(f"Simple optimization result: {simple_result}")
        print(f"Name mapping result: {mapping_result}")
        
        # Compare every result against the first one with np.allclose()
        results_array = np.array([original_result, hybrid_result, simple_result, mapping_result])
        print(f"Results consistent: {np.allclose(results_array, results_array[0])}")
        print()
        
        # Performance test
        methods = {
            'original_method': self.original_method,
            'mixed_strategy': self.hybrid_strategy,
            'simple_optimization': self.simple_optimization,
            'name_mapping': self.kernel_names_mapping,
        }
        
        results = {}
        
        for name, method in methods.items():
            # Warm-up
            for _ in range(100):
                method()
            
            # Timed run
            time_taken = timeit.timeit(method, number=num_iterations)
            results[name] = time_taken
            print(f"{name}: {time_taken:.6f} s")
        
        print()
        print("Performance comparison:")
        baseline = results['original_method']
        for name, time_taken in results.items():
            speedup = baseline / time_taken
            print(f"{name}: {time_taken:.6f} s (speedup: {speedup:.3f}x)")
        
        return results

def run_test_with_real_gates():
    """Try to run the test with the real gates object."""
    try:
        from qibojit.custom_operators import gates
        print("Testing with the real gates object")
        
        # First list the available kernel functions
        print("Available kernel functions:")
        for attr in dir(gates):
            if 'kernel' in attr:
                print(f"  {attr}")
        
        class RealGatesTest:
            def __init__(self):
                self.gates = gates
                # Use an even smaller state vector to avoid memory issues
                self.state = np.random.random(2**6) + 1j * np.random.random(2**6)  # 6-qubit state
                # Use the identity matrix as the gate to avoid argument mismatches
                self.gate = np.array([[1, 0], [0, 1]], dtype=complex)  # identity gate
                self.qubits = np.array([0], dtype=int)  # one entry, so ncontrols = 0 (single-qubit path)
                self.nqubits = 6
                self.target = 2
                self.kernel_name = 'apply_x'
            
            def original_method(self):
                """Original implementation."""
                ncontrols = len(self.qubits) - 1 if self.qubits is not None else 0
                m = self.nqubits - self.target - 1
                nstates = 1 << (self.nqubits - ncontrols - 1)
                
                if ncontrols:
                    kernel = getattr(self.gates, "multicontrol_{}_kernel".format(self.kernel_name))
                    # Pass a copy so every call sees the same input state
                    return kernel(self.state.copy(), self.gate, self.qubits, nstates, m)
                kernel = getattr(self.gates, "{}_kernel".format(self.kernel_name))
                return kernel(self.state.copy(), self.gate, nstates, m)
            
            def simple_optimization(self):
                """Simple optimization: direct call without an intermediate variable."""
                ncontrols = len(self.qubits) - 1 if self.qubits is not None else 0
                m = self.nqubits - self.target - 1
                nstates = 1 << (self.nqubits - ncontrols - 1)
                
                if ncontrols:
                    return getattr(self.gates, f"multicontrol_{self.kernel_name}_kernel")(self.state.copy(), self.gate, self.qubits, nstates, m)
                else:
                    return getattr(self.gates, f"{self.kernel_name}_kernel")(self.state.copy(), self.gate, nstates, m)
        
        real_test = RealGatesTest()
        
        # Check that both methods run
        try:
            print("Testing the simple optimization...")
            result2 = real_test.simple_optimization()
            print(f"Simple optimization result: {result2}")
            
            print("Testing the original method...")
            result1 = real_test.original_method()
            print(f"Original method result: {result1}")
            
            print("Real gates test methods work")
            
            # Run the performance test
            methods = {
                'original_method': real_test.original_method,
                'simple_optimization': real_test.simple_optimization,
            }
            
            results = {}
            # Sweep the iteration count from 50,000 to 190,000 in steps of 10,000
            for number in range(50000,200000,10000):
                for name, method in methods.items():
                    # Each timed call includes the state.copy() made inside the method
                    time_taken = timeit.timeit(method, number=number)
                    results[name] = time_taken
                    print(f"{name}{number}: {time_taken:.6f} s")
                print(f"{number}: {results['original_method'] / results['simple_optimization']:.3f}x")
            
            # Only the timings of the last iteration count are returned
            return results
            
        except Exception as e:
            print(f"Real gates test failed: {e}")
            import traceback
            traceback.print_exc()
            return None
            
    except ImportError as e:
        print(f"Could not import the real gates: {e}")
        return None

# Run the tests
print("=== Performance test with mock gates ===")
test = PerformanceTest()
results = test.run_performance_test(num_iterations=50000)

print("\n=== Performance test with real gates ===")
real_results = run_test_with_real_gates()

# Simple visualization of the mock results
try:
    import matplotlib.pyplot as plt
    
    methods = list(results.keys())
    times = list(results.values())
    
    plt.figure(figsize=(10, 6))
    bars = plt.bar(methods, times, color=['blue', 'green', 'red', 'orange'])
    plt.ylabel('time (s)')
    plt.title('Kernel Performance Comparison')
    plt.xticks(rotation=45)
    
    # Add a value label above each bar
    for bar, elapsed in zip(bars, times):
        plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + max(times)*0.01,
                f'{elapsed:.6f}s', ha='center', va='bottom')
    
    plt.tight_layout()
    plt.show()
except ImportError:
    print("matplotlib is not installed, skipping the plot")

Added qibojit path: e:\quantum computing\Open-Source Projects for Quantum Computing Simulator\qibojit\qibojit-repo\src
=== Performance test with mock gates ===
Running performance test, iterations: 50000
============================================================
Result check:
Original method result: 177.0538840335347
Hybrid strategy result: 177.0538840335347
Simple optimization result: 177.0538840335347
Name mapping result: 177.0538840335347
Results consistent: True

original_method: 0.687339 s
mixed_strategy: 0.661780 s
simple_optimization: 0.525799 s
name_mapping: 0.510526 s

Performance comparison:
original_method: 0.687339 s (speedup: 1.000x)
mixed_strategy: 0.661780 s (speedup: 1.039x)
simple_optimization: 0.525799 s (speedup: 1.307x)
name_mapping: 0.510526 s (speedup: 1.346x)

=== Performance test with real gates ===
Testing with the real gates object
Available kernel functions:
  apply_five_qubit_gate_kernel
  apply_four_qubit_gate_kernel
  apply_fsim_kernel
  apply_gate_kernel
  apply_multi_qubit_gate_kernel
  apply_swap_kernel
  apply_three_qubit_gate_kernel
  apply_two_qubit_gate_kernel
  apply_x_kernel
  apply_y_kernel
  apply_z_kernel
  apply_z_pow_kernel
  multicontrol_apply_fsim_kernel
  multicontrol_apply_gate_kernel
  multicontrol_apply_swap_kernel
  multicontrol_apply_two_qubit_gate_kernel
  multicontrol_apply_x_kernel
  multicontrol_apply_y_kernel
  multicontrol_apply_z_kernel
  multicontrol_apply_z_pow_kernel
Testing the simple optimization method...
Simple optimization result: [0.49232478+0.4266018j  0.65352059+0.55626587j 0.92522344+0.1975287j
 0.57204344+0.51056033j 0.1607084 +0.49399837j 0.52865292+0.0456277j
 0.54601613+0.08493896j 0.71973855+0.51248814j 0.05435725+0.58581207j
 0.12366696+0.96942588j 0.00322969+0.88112864j 0.01036404+0.34883267j
 0.63271603+0.65510642j 0.87780938+0.86687329j 0.50955663+0.26400664j
 0.58861407+0.68435953j 0.66715502+0.11770161j 0.59454392+0.82455297j
 0.42256418+0.28679692j 0.06406048+0.20072472j 0.36340476+0.20459191j
 0.47677679+0.22918879j 0.76620207+0.53035512j 0.25044655+0.4689252j
 0.587563  +0.5286728j  0.08669136+0.07663705j 0.55702681+0.17591081j
 0.53670984+0.56426809j 0.32304943+0.03090943j 0.53908394+0.55298196j
 0.47355356+0.4459447j  0.8943081 +0.06265075j 0.26187606+0.04075815j
 0.36198835+0.11387272j 0.52167474+0.18809906j 0.98787046+0.97410734j
 0.9133646 +0.58314612j 0.07452548+0.45856729j 0.81622202+0.33858417j
 0.78959845+0.46885304j 0.43119006+0.74367877j 0.68793617+0.24613793j
 0.23688684+0.61408906j 0.3100413 +0.1480535j  0.34091141+0.63261345j
 0.41594191+0.18955635j 0.69456664+0.55302466j 0.26523346+0.25188066j
 0.10349014+0.11633114j 0.3433109 +0.25983223j 0.91698768+0.82810975j
 0.85284796+0.24805716j 0.57548862+0.8572856j  0.68216398+0.39035824j
 0.58408737+0.30058502j 0.89343809+0.44169146j 0.19339614+0.78277841j
 0.03762146+0.52426445j 0.81215509+0.17150426j 0.53671351+0.97955016j
 0.38802433+0.00308232j 0.57643139+0.22977221j 0.05288281+0.51203863j
 0.71193823+0.62294094j]
Testing the original method...
Original method result: [0.49232478+0.4266018j  0.65352059+0.55626587j 0.92522344+0.1975287j
 0.57204344+0.51056033j 0.1607084 +0.49399837j 0.52865292+0.0456277j
 0.54601613+0.08493896j 0.71973855+0.51248814j 0.05435725+0.58581207j
 0.12366696+0.96942588j 0.00322969+0.88112864j 0.01036404+0.34883267j
 0.63271603+0.65510642j 0.87780938+0.86687329j 0.50955663+0.26400664j
 0.58861407+0.68435953j 0.66715502+0.11770161j 0.59454392+0.82455297j
 0.42256418+0.28679692j 0.06406048+0.20072472j 0.36340476+0.20459191j
 0.47677679+0.22918879j 0.76620207+0.53035512j 0.25044655+0.4689252j
 0.587563  +0.5286728j  0.08669136+0.07663705j 0.55702681+0.17591081j
 0.53670984+0.56426809j 0.32304943+0.03090943j 0.53908394+0.55298196j
 0.47355356+0.4459447j  0.8943081 +0.06265075j 0.26187606+0.04075815j
 0.36198835+0.11387272j 0.52167474+0.18809906j 0.98787046+0.97410734j
 0.9133646 +0.58314612j 0.07452548+0.45856729j 0.81622202+0.33858417j
 0.78959845+0.46885304j 0.43119006+0.74367877j 0.68793617+0.24613793j
 0.23688684+0.61408906j 0.3100413 +0.1480535j  0.34091141+0.63261345j
 0.41594191+0.18955635j 0.69456664+0.55302466j 0.26523346+0.25188066j
 0.10349014+0.11633114j 0.3433109 +0.25983223j 0.91698768+0.82810975j
 0.85284796+0.24805716j 0.57548862+0.8572856j  0.68216398+0.39035824j
 0.58408737+0.30058502j 0.89343809+0.44169146j 0.19339614+0.78277841j
 0.03762146+0.52426445j 0.81215509+0.17150426j 0.53671351+0.97955016j
 0.38802433+0.00308232j 0.57643139+0.22977221j 0.05288281+0.51203863j
 0.71193823+0.62294094j]
Real gates test methods are available
original_method50000: 0.520845s
simple_optimization50000: 0.507761s
50000: 1.026x
original_method60000: 0.659021s
simple_optimization60000: 0.609389s
60000: 1.081x
original_method70000: 0.763404s
simple_optimization70000: 0.709089s
70000: 1.077x
original_method80000: 0.846560s
simple_optimization80000: 0.795685s
80000: 1.064x
original_method90000: 0.949721s
simple_optimization90000: 0.903281s
90000: 1.051x
original_method100000: 1.087429s
simple_optimization100000: 1.075576s
100000: 1.011x
original_method110000: 1.118144s
simple_optimization110000: 1.100071s
110000: 1.016x
original_method120000: 1.342429s
simple_optimization120000: 1.203490s
120000: 1.115x
original_method130000: 1.414257s
simple_optimization130000: 1.297099s
130000: 1.090x
original_method140000: 1.497074s
simple_optimization140000: 1.404276s
140000: 1.066x
original_method150000: 1.614698s
simple_optimization150000: 1.475749s
150000: 1.094x
original_method160000: 1.680039s
simple_optimization160000: 1.574654s
160000: 1.067x
original_method170000: 1.854070s
simple_optimization170000: 1.631000s
170000: 1.137x
original_method180000: 2.034253s
simple_optimization180000: 1.823956s
180000: 1.115x
original_method190000: 2.013327s
simple_optimization190000: 1.977299s
190000: 1.018x

[Figure: bar chart "Kernel Performance Comparison" showing the run time in seconds of each method in the simulated-environment test]
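The real-environment ratios printed above fluctuate from one call count to the next, so a single number does not describe them well. The following is a minimal sketch that summarizes them; the ratios are copied by hand from the output above, and the helper name `summarize` is an illustrative assumption, not part of the original notebook.

```python
from statistics import mean, median

# Speedup ratios (original_method / simple_optimization) copied from the
# real-environment output, for number = 50000, 60000, ..., 190000.
speedups = [1.026, 1.081, 1.077, 1.064, 1.051, 1.011, 1.016, 1.115,
            1.090, 1.066, 1.094, 1.067, 1.137, 1.115, 1.018]


def summarize(ratios):
    """Return the min, median, mean, and max of a list of speedup ratios."""
    return {
        "min": min(ratios),
        "median": median(ratios),
        "mean": mean(ratios),
        "max": max(ratios),
    }


summary = summarize(speedups)
for key, value in summary.items():
    print(f"{key}: {value:.3f}x")
```

Reporting the spread alongside the central value makes it easier to judge whether a gain of a few percent is larger than the run-to-run noise.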

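In the real-environment test, the simple optimization speeds up the one_qubit_base function by about 1% to 14% (speedups of 1.011x to 1.137x over 50,000 to 190,000 calls), compared with 1.307x in the simulated environment.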
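Because the measured gain is only a few percent, it is comparable in size to ordinary timing noise. One common way to reduce that noise is to repeat each measurement and keep the fastest run, which is what the `timeit` documentation suggests. The sketch below assumes this approach; `best_time`, `speedup`, `baseline`, and `candidate` are hypothetical names, and the two workload functions are stand-ins for `real_test.original_method` and `real_test.simple_optimization`.

```python
import timeit


def best_time(func, number=10_000, repeat=5):
    """Return the fastest total time over `repeat` runs of `number` calls each."""
    return min(timeit.repeat(func, number=number, repeat=repeat))


def speedup(baseline_func, candidate_func, number=10_000, repeat=5):
    """Ratio of best baseline time to best candidate time (> 1 means faster)."""
    return (best_time(baseline_func, number, repeat)
            / best_time(candidate_func, number, repeat))


# Hypothetical stand-in workloads; replace them with the real callables.
def baseline():
    return sum(i * i for i in range(50))


def candidate():
    return sum([i * i for i in range(50)])


ratio = speedup(baseline, candidate)
print(f"speedup: {ratio:.3f}x")
```

Taking the minimum rather than the mean discards runs that were slowed down by other processes, so the ratio reflects the code itself more than the state of the machine.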
