Commit 00cc4554 by jiangdongchen

rename and extract key information

parent ae59d2a2
.vscode/
others/
Papers/
\ No newline at end of file
Papers/
psrc/__pycache__/
json/
\ No newline at end of file
@@ -4,19 +4,30 @@
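- logLevel
    - 10 means DEBUG level
    - 20 means INFO level
- tableNum: number of worksheets to process
- maxItem: maximum number of entries per worksheet
- Python 3.12.10
- Install any library that fails to import with `pip install`, one at a time
    - `openai`, `pypdf`
    - `python-Levenshtein`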
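The numeric `logLevel` values are the standard `logging` constants (`logging.DEBUG == 10`, `logging.INFO == 20`), so the config value can be passed straight to `logging.basicConfig(level=...)`. Below is a minimal sketch of a sanity check for the three settings above. `describe_settings` is a hypothetical helper and is not part of the project. It accepts only the two documented log levels.

```python
import logging


def describe_settings(config: dict) -> str:
    """Hypothetical check of logLevel/tableNum/maxItem from config.json."""
    level = config["logLevel"]
    if level not in (logging.DEBUG, logging.INFO):  # 10 or 20, the documented values
        raise ValueError(f"unsupported logLevel: {level}")
    for key in ("tableNum", "maxItem"):
        if not isinstance(config[key], int) or config[key] < 1:
            raise ValueError(f"{key} must be a positive integer")
    # getLevelName maps the numeric level back to its name ("DEBUG", "INFO")
    return (f"level={logging.getLevelName(level)}, "
            f"sheets={config['tableNum']}, entries per sheet={config['maxItem']}")


print(describe_settings({"logLevel": 20, "tableNum": 1, "maxItem": 64}))
# → level=INFO, sheets=1, entries per sheet=64
```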
# Usage
- Multi-model cross-validation
- A sample log from a successful run is in the logs folder
# Requirements and Solutions
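1. Download the paper PDFs
    1. Download from commonly used sites via an agent
    2. Output the entries that could not be downloaded
2. Automated renaming
    1. Read the paper titles and indices from the Excel sheet
    2. Loop: read the paper title from each PDF
        1. Fuzzy-match it against the paper titles in the Excel sheet
        2. On a successful match
\ No newline at end of file
2. Automated information extraction and formatting
    1. Load the configuration object from config.json
    2. Iterate over the worksheets in the Excel file
        1. Read the paper titles and indices from the Excel sheet
        2. Loop:
            1. Read the paper title and key information from each PDF and store them under the json folder
            2. Fuzzy-match the title against the paper titles in the Excel sheet
            3. On a successful match:
                1. Rename the PDF file, and the paper title in the Excel sheet, to a standardized form built from the index and the title taken from the PDF
                2. Write the key information from the PDF into the Excel sheet, including author names, institutions and countries
            4. On a failed match, output the entries that could not be matched
                - Log unmatched entries with a warning so they can be handled later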
\ No newline at end of file
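The fuzzy-matching step can be sketched as follows. `normalize` mirrors the preprocessing in `citationProcess`: it drops `.pdf`, spaces, `_`, `:` and `-`, then lowercases. The threshold of 85 is the one used there. `partial_similarity` is a standard-library stand-in for `fuzzywuzzy`'s `fuzz.partial_ratio`, so its scores approximate the library's but need not equal them. `find_match` is a hypothetical name.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Mirror citationProcess: drop '.pdf', remove ' ', '_', ':', '-', lowercase."""
    title = title.split(".pdf")[0]
    for ch in (" ", "_", ":", "-"):
        title = title.replace(ch, "")
    return title.lower()


def partial_similarity(a: str, b: str) -> int:
    """Stdlib stand-in for fuzz.partial_ratio (approximate, not identical):
    best SequenceMatcher ratio of the shorter string against every
    equal-length window of the longer one, scaled to 0-100."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0
    best = 0.0
    for start in range(len(long_) - len(short) + 1):
        window = long_[start:start + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
    return round(best * 100)


def find_match(pdf_title: str, excel_rows, threshold: int = 85):
    """Return the first (idx, excel_name) scoring >= threshold, else None."""
    clean_pdf = normalize(pdf_title)
    for idx, excel_name in excel_rows:
        if partial_similarity(clean_pdf, normalize(excel_name)) >= threshold:
            return idx, excel_name
    return None


rows = [(16, "Cloud-backed mobile cognition")]
print(find_match("Cloud-backed mobile cognition: Power-efficient deep learning "
                 "in the autonomous vehicle era", rows))
# → (16, 'Cloud-backed mobile cognition')
```

The shorter string is scored against windows of the longer one. This lets a truncated Excel title such as `Cloud-backed mobile cognition` still match the full PDF title, as in the sample log.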
@@ -3,7 +3,9 @@
"base_url": "https://api.siliconflow.cn/v1",
"model": "Pro/deepseek-ai/DeepSeek-V3",
"pdf_dir": "./Papers",
"result_path": "./result.json",
"excel_path": "./others/reference.xlsx",
"logLevel": 20
"result_dir": "./json",
"excel_path": "./others/论文被引用情况-陈老师-2025.05.01.xlsx",
"logLevel": 20,
"tableNum": 1,
"maxItem": 64
}
\ No newline at end of file
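Because `result_path` has been replaced by `result_dir`, the output is now a directory. Below is a minimal sketch that loads this config and derives the per-sheet folders that `citationProcess` builds (`<pdf_dir>/<sheet name>` and `<result_dir>/<sheet name>`). `load_config` and `sheet_dirs` are hypothetical helpers. Reading the file explicitly as UTF-8 is an added assumption: `excel_path` contains non-ASCII characters, and this avoids depending on the platform's default encoding.

```python
import json
from pathlib import Path


def load_config(config_path: Path) -> dict:
    """Hypothetical loader: read config.json as UTF-8 and resolve its
    relative paths against the current working directory."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    cwd = Path.cwd()
    for key in ("pdf_dir", "result_dir", "excel_path"):
        config[key] = (cwd / config[key]).resolve()
    return config


def sheet_dirs(config: dict, sheet_name: str) -> tuple[Path, Path]:
    """PDF input folder and JSON output folder for one worksheet."""
    return config["pdf_dir"] / sheet_name, config["result_dir"] / sheet_name
```

Example: `pdf_dir, rst_dir = sheet_dirs(load_config(Path("config.json")), "j24-DianNao family")`, followed by `rst_dir.mkdir(parents=True, exist_ok=True)` before writing results.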
2025-05-07 22:39:27,684 - INFO - Program started; log file saved to: C:\Project\ReferenceTools\papertools\logs\citation_process.log
2025-05-07 22:39:28,032 - INFO - Processing sheet: j24-DianNao family
2025-05-07 22:39:28,034 - INFO - Processing 2-A Configurable Cloud-Scale DNN Processor for Real-Time AI.pdf
2025-05-07 22:39:46,762 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:39:46,768 - INFO - Renamed: 2-A Configurable Cloud-Scale DNN Processor for Real-Time AI.pdf -> 2-A Configurable Cloud-Scale DNN Processor for Real-Time AI.pdf
2025-05-07 22:39:46,769 - INFO - Matched: 2-A Configurable Cloud-Scale DNN Processor for Real-Time AI.pdf -> idx: 2, excel_name: A configurable cloud-scale DNN processor for real-time AI
2025-05-07 22:39:46,770 - INFO - Change: 2-A Configurable Cloud-Scale DNN Processor for Real-Time AI.pdf -> 2-A Configurable Cloud-Scale DNN Processor for Real-Time AI.pdf
2025-05-07 22:39:46,770 - INFO - Processing 26-Adversarial Deep Learning and Security with a Hardware Perspective.pdf
2025-05-07 22:39:52,751 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:39:52,756 - INFO - Renamed: 26-Adversarial Deep Learning and Security with a Hardware Perspective.pdf -> 26-Adversarial Deep Learning and Security with a Hardware Perspective.pdf
2025-05-07 22:39:52,757 - INFO - Matched: 26-Adversarial Deep Learning and Security with a Hardware Perspective.pdf -> idx: 26, excel_name: Adversarial Deep Learning and Security with a Hardware Perspective
2025-05-07 22:39:52,758 - INFO - Change: 26-Adversarial Deep Learning and Security with a Hardware Perspective.pdf -> 26-Adversarial Deep Learning and Security with a Hardware Perspective.pdf
2025-05-07 22:39:52,758 - INFO - Processing 27-Analysis and Optimization of Direct Convolution Execution on Multi-Core Processors.pdf
2025-05-07 22:40:03,722 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:40:03,728 - INFO - Renamed: 27-Analysis and Optimization of Direct Convolution Execution on Multi-Core Processors.pdf -> 27-Analysis and Optimization of Direct Convolution Execution on Multi-Core Processors.pdf
2025-05-07 22:40:03,729 - INFO - Matched: 27-Analysis and Optimization of Direct Convolution Execution on Multi-Core Processors.pdf -> idx: 27, excel_name: Analysis and Optimization of Direct Convolution Execution on Multi-Core Processors
2025-05-07 22:40:03,730 - INFO - Change: 27-Analysis and Optimization of Direct Convolution Execution on Multi-Core Processors.pdf -> 27-Analysis and Optimization of Direct Convolution Execution on Multi-Core Processors.pdf
2025-05-07 22:40:03,730 - INFO - Processing 3-A Domain-Specific Architecture for Deep Neural Networks.pdf
2025-05-07 22:40:11,616 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:40:11,619 - INFO - Renamed: 3-A Domain-Specific Architecture for Deep Neural Networks.pdf -> 3-A Domain-Specific Architecture for Deep Neural Networks.pdf
2025-05-07 22:40:11,621 - INFO - Matched: 3-A Domain-Specific Architecture for Deep Neural Networks.pdf -> idx: 3, excel_name: A domain-specific architecture for deep neural networks
2025-05-07 22:40:11,621 - INFO - Change: 3-A Domain-Specific Architecture for Deep Neural Networks.pdf -> 3-A Domain-Specific Architecture for Deep Neural Networks.pdf
2025-05-07 22:40:11,621 - INFO - Processing 53-An Analog Nearest Class with Multiple Centroids Classifier Implementation, for Depth of Anesthesia Monitoring.pdf
2025-05-07 22:40:24,488 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:40:24,498 - INFO - Renamed: 53-An Analog Nearest Class with Multiple Centroids Classifier Implementation, for Depth of Anesthesia Monitoring.pdf -> 53-An Analog Nearest Class with Multiple Centroids Classifier Implementation, for Depth of Anesthesia Monitoring.pdf
2025-05-07 22:40:24,499 - INFO - Matched: 53-An Analog Nearest Class with Multiple Centroids Classifier Implementation, for Depth of Anesthesia Monitoring.pdf -> idx: 53, excel_name: An Analog Nearest Class with Multiple Centroids Classifier Implementation, for Depth of Anesthesia Monitoring
2025-05-07 22:40:24,499 - INFO - Change: 53-An Analog Nearest Class with Multiple Centroids Classifier Implementation, for Depth of Anesthesia Monitoring.pdf -> 53-An Analog Nearest Class with Multiple Centroids Classifier Implementation, for Depth of Anesthesia Monitoring.pdf
2025-05-07 22:40:24,499 - INFO - Processing 54-Floating Gate Transistor-Based Accurate Digital In-Memory Computing for Deep Neural Networks.pdf
2025-05-07 22:40:40,867 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:40:40,875 - INFO - Renamed: 54-Floating Gate Transistor-Based Accurate Digital In-Memory Computing for Deep Neural Networks.pdf -> 54-Floating Gate Transistor-Based Accurate Digital In-Memory Computing for Deep Neural Networks.pdf
2025-05-07 22:40:40,877 - INFO - Matched: 54-Floating Gate Transistor-Based Accurate Digital In-Memory Computing for Deep Neural Networks.pdf -> idx: 54, excel_name: Floating Gate Transistor‐Based Accurate Digital In‐Memory Computing for Deep Neural Networks
2025-05-07 22:40:40,877 - INFO - Change: 54-Floating Gate Transistor-Based Accurate Digital In-Memory Computing for Deep Neural Networks.pdf -> 54-Floating Gate Transistor-Based Accurate Digital In-Memory Computing for Deep Neural Networks.pdf
2025-05-07 22:40:40,878 - INFO - Processing 61-A carbon-nanotube-based tensor processing unit.pdf
2025-05-07 22:41:06,610 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:41:06,618 - INFO - Renamed: 61-A carbon-nanotube-based tensor processing unit.pdf -> 61-A carbon-nanotube-based tensor processing unit.pdf
2025-05-07 22:41:06,619 - INFO - Matched: 61-A carbon-nanotube-based tensor processing unit.pdf -> idx: 61, excel_name: A carbon-nanotube-based tensor processing unit
2025-05-07 22:41:06,619 - INFO - Change: 61-A carbon-nanotube-based tensor processing unit.pdf -> 61-A carbon-nanotube-based tensor processing unit.pdf
2025-05-07 22:41:06,619 - INFO - Processing 62-Advancements in Accelerating Deep Neural Network Inference on AIoT Devices- A Survey.pdf
2025-05-07 22:41:22,719 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:41:22,726 - INFO - Renamed: 62-Advancements in Accelerating Deep Neural Network Inference on AIoT Devices- A Survey.pdf -> 62-Advancements in Accelerating Deep Neural Network Inference on AIoT Devices- A Survey.pdf
2025-05-07 22:41:22,727 - INFO - Matched: 62-Advancements in Accelerating Deep Neural Network Inference on AIoT Devices- A Survey.pdf -> idx: 62, excel_name: Advancements in accelerating deep neural network inference on aiot devices: A survey
2025-05-07 22:41:22,727 - INFO - Change: 62-Advancements in Accelerating Deep Neural Network Inference on AIoT Devices- A Survey.pdf -> 62-Advancements in Accelerating Deep Neural Network Inference on AIoT Devices- A Survey.pdf
2025-05-07 22:41:22,728 - INFO - Processing Arax_ A Runtime Framework for Decoupling.pdf
2025-05-07 22:41:36,150 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:41:36,159 - INFO - Renamed: Arax_ A Runtime Framework for Decoupling.pdf -> 57-Arax- A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators.pdf
2025-05-07 22:41:36,160 - INFO - Matched: Arax_ A Runtime Framework for Decoupling.pdf -> idx: 57, excel_name: Arax: a runtime framework for decoupling applications from heterogeneous accelerators
2025-05-07 22:41:36,160 - INFO - Change: Arax_ A Runtime Framework for Decoupling.pdf -> 57-Arax- A Runtime Framework for Decoupling Applications from Heterogeneous Accelerators.pdf
2025-05-07 22:41:36,161 - INFO - Processing ASRPU_A Programmable Accelerator for.pdf
2025-05-07 22:41:45,877 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:41:45,882 - INFO - Renamed: ASRPU_A Programmable Accelerator for.pdf -> 17-ASRPU- A Programmable Accelerator for Low-Power Automatic Speech Recognition.pdf
2025-05-07 22:41:45,884 - INFO - Matched: ASRPU_A Programmable Accelerator for.pdf -> idx: 17, excel_name: ASRPU: A Programmable Accelerator for Low-Power Automatic Speech Recognition
2025-05-07 22:41:45,884 - INFO - Change: ASRPU_A Programmable Accelerator for.pdf -> 17-ASRPU- A Programmable Accelerator for Low-Power Automatic Speech Recognition.pdf
2025-05-07 22:41:45,884 - INFO - Processing ASurvey on DeepLearning Hardware Accelerators for.pdf
2025-05-07 22:42:18,728 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:42:18,735 - INFO - Renamed: ASurvey on DeepLearning Hardware Accelerators for.pdf -> 31-A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms.pdf
2025-05-07 22:42:18,736 - INFO - Matched: ASurvey on DeepLearning Hardware Accelerators for.pdf -> idx: 31, excel_name: A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms
2025-05-07 22:42:18,737 - INFO - Change: ASurvey on DeepLearning Hardware Accelerators for.pdf -> 31-A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms.pdf
2025-05-07 22:42:18,737 - INFO - Processing A_Systolic_Array_with_Activation_Stationary_Dataflow_for_Deep_Fully-Connected_Networks.pdf
2025-05-07 22:42:29,508 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:42:29,515 - INFO - Renamed: A_Systolic_Array_with_Activation_Stationary_Dataflow_for_Deep_Fully-Connected_Networks.pdf -> 32-A Systolic Array with Activation Stationary Dataflow for Deep Fully-Connected Networks.pdf
2025-05-07 22:42:29,516 - INFO - Matched: A_Systolic_Array_with_Activation_Stationary_Dataflow_for_Deep_Fully-Connected_Networks.pdf -> idx: 32, excel_name: A Systolic Array with Activation Stationary Dataflow for Deep Fully-Connected Networks
2025-05-07 22:42:29,516 - INFO - Change: A_Systolic_Array_with_Activation_Stationary_Dataflow_for_Deep_Fully-Connected_Networks.pdf -> 32-A Systolic Array with Activation Stationary Dataflow for Deep Fully-Connected Networks.pdf
2025-05-07 22:42:29,516 - INFO - Processing Being-ahead_ Benchmarking and Exploring.pdf
2025-05-07 22:42:38,407 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:42:38,408 - INFO - Renamed: Being-ahead_ Benchmarking and Exploring.pdf -> 15-Being-ahead- Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment.pdf
2025-05-07 22:42:38,409 - INFO - Matched: Being-ahead_ Benchmarking and Exploring.pdf -> idx: 15, excel_name: Being-ahead: Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment
2025-05-07 22:42:38,409 - INFO - Change: Being-ahead_ Benchmarking and Exploring.pdf -> 15-Being-ahead- Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment.pdf
2025-05-07 22:42:38,410 - INFO - Processing Brain Inspired Computing A Systematic Survey and Future Trends.pdf
2025-05-07 22:42:51,950 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:42:51,954 - INFO - Renamed: Brain Inspired Computing A Systematic Survey and Future Trends.pdf -> 47-Brain Inspired Computing- A Systematic Survey and Future Trends.pdf
2025-05-07 22:42:51,956 - INFO - Matched: Brain Inspired Computing A Systematic Survey and Future Trends.pdf -> idx: 47, excel_name: Brain Inspired Computing: A Systematic Survey and Future Trends
2025-05-07 22:42:51,957 - INFO - Change: Brain Inspired Computing A Systematic Survey and Future Trends.pdf -> 47-Brain Inspired Computing- A Systematic Survey and Future Trends.pdf
2025-05-07 22:42:51,957 - INFO - Processing Brain-Inspired_Computing_A_Systematic_Survey_and_Future_Trends.pdf
2025-05-07 22:43:17,038 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:43:17,044 - INFO - Renamed: Brain-Inspired_Computing_A_Systematic_Survey_and_Future_Trends.pdf -> 47-Brain-Inspired Computing- A Systematic Survey and Future Trends.pdf
2025-05-07 22:43:17,044 - INFO - Matched: Brain-Inspired_Computing_A_Systematic_Survey_and_Future_Trends.pdf -> idx: 47, excel_name: Brain Inspired Computing: A Systematic Survey and Future Trends
2025-05-07 22:43:17,044 - INFO - Change: Brain-Inspired_Computing_A_Systematic_Survey_and_Future_Trends.pdf -> 47-Brain-Inspired Computing- A Systematic Survey and Future Trends.pdf
2025-05-07 22:43:17,045 - INFO - Processing CASH_Compiler Assisted Hardware Design for Improving.pdf
2025-05-07 22:43:30,434 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:43:30,439 - INFO - Renamed: CASH_Compiler Assisted Hardware Design for Improving.pdf -> 12-CASH- Compiler Assisted Hardware Design for Improving DRAM Energy Efficiency in CNN Inference.pdf
2025-05-07 22:43:30,441 - INFO - Matched: CASH_Compiler Assisted Hardware Design for Improving.pdf -> idx: 12, excel_name: CASH: Compiler Assisted Hardware Design for Improving DRAM Energy Efficiency in CNN Inference
2025-05-07 22:43:30,442 - INFO - Change: CASH_Compiler Assisted Hardware Design for Improving.pdf -> 12-CASH- Compiler Assisted Hardware Design for Improving DRAM Energy Efficiency in CNN Inference.pdf
2025-05-07 22:43:30,442 - INFO - Processing Cloud-backed mobile cognition.pdf
2025-05-07 22:43:40,754 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:43:40,757 - INFO - Renamed: Cloud-backed mobile cognition.pdf -> 16-Cloud-backed mobile cognition Power-efficient deep learning in the autonomous vehicle era.pdf
2025-05-07 22:43:40,757 - INFO - Matched: Cloud-backed mobile cognition.pdf -> idx: 16, excel_name: Cloud-backed mobile cognition
2025-05-07 22:43:40,757 - INFO - Change: Cloud-backed mobile cognition.pdf -> 16-Cloud-backed mobile cognition Power-efficient deep learning in the autonomous vehicle era.pdf
2025-05-07 22:43:40,757 - INFO - Processing Design Methodologies and Tools for.pdf
2025-05-07 22:43:49,421 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:43:49,426 - INFO - Renamed: Design Methodologies and Tools for.pdf -> 56-Design Methodologies and Tools for Energy-aware IoT-based Applications.pdf
2025-05-07 22:43:49,428 - INFO - Matched: Design Methodologies and Tools for.pdf -> idx: 56, excel_name: Design Methodologies and Tools for Energy-aware IoT-based Applications
2025-05-07 22:43:49,428 - INFO - Change: Design Methodologies and Tools for.pdf -> 56-Design Methodologies and Tools for Energy-aware IoT-based Applications.pdf
2025-05-07 22:43:49,428 - INFO - Processing Effectively Scheduling Computational Graphs of.pdf
2025-05-07 22:44:03,656 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:44:03,664 - INFO - Renamed: Effectively Scheduling Computational Graphs of.pdf -> 43-Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain-Specific Accelerators.pdf
2025-05-07 22:44:03,666 - INFO - Matched: Effectively Scheduling Computational Graphs of.pdf -> idx: 43, excel_name: Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their {Domain-Specific} Accelerators
2025-05-07 22:44:03,666 - INFO - Change: Effectively Scheduling Computational Graphs of.pdf -> 43-Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain-Specific Accelerators.pdf
2025-05-07 22:44:03,666 - INFO - Processing Efficient_Acceleration_of_Deep_Learning_Inference_on_Resource-Constrained_Edge_Devices_A_Review.pdf
2025-05-07 22:44:20,627 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:44:20,635 - INFO - Renamed: Efficient_Acceleration_of_Deep_Learning_Inference_on_Resource-Constrained_Edge_Devices_A_Review.pdf -> 49-Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices- A Review.pdf
2025-05-07 22:44:20,636 - INFO - Matched: Efficient_Acceleration_of_Deep_Learning_Inference_on_Resource-Constrained_Edge_Devices_A_Review.pdf -> idx: 49, excel_name: Efficient acceleration of deep learning inference on resource-constrained edge devices: A review
2025-05-07 22:44:20,637 - INFO - Change: Efficient_Acceleration_of_Deep_Learning_Inference_on_Resource-Constrained_Edge_Devices_A_Review.pdf -> 49-Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices- A Review.pdf
2025-05-07 22:44:20,637 - INFO - Processing Enabling_Design_Methodologies_and_Future_Trends_for_Edge_AI_Specialization_and_Codesign.pdf
2025-05-07 22:44:37,248 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:44:37,253 - INFO - Renamed: Enabling_Design_Methodologies_and_Future_Trends_for_Edge_AI_Specialization_and_Codesign.pdf -> 23-Enabling Design Methodologies and Future Trends for Edge AI- Specialization and Codesign.pdf
2025-05-07 22:44:37,257 - INFO - Matched: Enabling_Design_Methodologies_and_Future_Trends_for_Edge_AI_Specialization_and_Codesign.pdf -> idx: 23, excel_name: Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Codesign
2025-05-07 22:44:37,258 - INFO - Change: Enabling_Design_Methodologies_and_Future_Trends_for_Edge_AI_Specialization_and_Codesign.pdf -> 23-Enabling Design Methodologies and Future Trends for Edge AI- Specialization and Codesign.pdf
2025-05-07 22:44:37,258 - INFO - Processing Energy and Performance Improvements for Convolutional.pdf
2025-05-07 22:44:47,294 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:44:47,300 - INFO - Renamed: Energy and Performance Improvements for Convolutional.pdf -> 38-Energy and Performance Improvements for Convolutional Accelerators Using Lightweight Address Translation Support.pdf
2025-05-07 22:44:47,302 - INFO - Matched: Energy and Performance Improvements for Convolutional.pdf -> idx: 38, excel_name: Energy and Performance Improvements for Convolutional Accelerators Using Lightweight Address Translation Support
2025-05-07 22:44:47,302 - INFO - Change: Energy and Performance Improvements for Convolutional.pdf -> 38-Energy and Performance Improvements for Convolutional Accelerators Using Lightweight Address Translation Support.pdf
2025-05-07 22:44:47,303 - INFO - Processing Energy-efficient application programming for green cloud computing.pdf
2025-05-07 22:45:09,199 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:45:09,204 - INFO - Renamed: Energy-efficient application programming for green cloud computing.pdf -> 42-Energy-efficient application programming for green cloud computing.pdf
2025-05-07 22:45:09,205 - INFO - Matched: Energy-efficient application programming for green cloud computing.pdf -> idx: 42, excel_name: Energy-Efficient Application Programming for Green Cloud Computing
2025-05-07 22:45:09,207 - INFO - Change: Energy-efficient application programming for green cloud computing.pdf -> 42-Energy-efficient application programming for green cloud computing.pdf
2025-05-07 22:45:09,207 - INFO - Processing Energy-Efficient_Machine_Learning_on_the_Edges.pdf
2025-05-07 22:45:18,439 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:45:18,444 - INFO - Renamed: Energy-Efficient_Machine_Learning_on_the_Edges.pdf -> 20-Energy-Efficient Machine Learning on the Edges.pdf
2025-05-07 22:45:18,472 - INFO - Matched: Energy-Efficient_Machine_Learning_on_the_Edges.pdf -> idx: 20, excel_name: Energy-Efficient Machine Learning on the Edges
2025-05-07 22:45:18,472 - INFO - Change: Energy-Efficient_Machine_Learning_on_the_Edges.pdf -> 20-Energy-Efficient Machine Learning on the Edges.pdf
2025-05-07 22:45:18,472 - INFO - Processing From_Cloud_Down_to_Things_An_Overview_of_Machine_Learning_in_Internet_of_Things.pdf
2025-05-07 22:45:31,652 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:45:31,657 - INFO - Renamed: From_Cloud_Down_to_Things_An_Overview_of_Machine_Learning_in_Internet_of_Things.pdf -> 11-From Cloud Down to Things- An Overview of Machine Learning in Internet of Things.pdf
2025-05-07 22:45:31,659 - INFO - Matched: From_Cloud_Down_to_Things_An_Overview_of_Machine_Learning_in_Internet_of_Things.pdf -> idx: 11, excel_name: From Cloud Down to Things: An Overview of Machine Learning in Internet of Things
2025-05-07 22:45:31,660 - INFO - Change: From_Cloud_Down_to_Things_An_Overview_of_Machine_Learning_in_Internet_of_Things.pdf -> 11-From Cloud Down to Things- An Overview of Machine Learning in Internet of Things.pdf
2025-05-07 22:45:31,660 - INFO - Processing FUSION OF AI WITH IOT (AI2OT) PARADIGM, CURRENT TRENDS,.pdf
2025-05-07 22:45:49,110 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:45:49,116 - INFO - Renamed: FUSION OF AI WITH IOT (AI2OT) PARADIGM, CURRENT TRENDS,.pdf -> 52-FUSION OF AI WITH IOT (AI2OT)- PARADIGM, CURRENT TRENDS, FUTURE DIRECTIONS.pdf
2025-05-07 22:45:49,117 - INFO - Matched: FUSION OF AI WITH IOT (AI2OT) PARADIGM, CURRENT TRENDS,.pdf -> idx: 52, excel_name: FUSION OF AI WITH IOT (AI2OT): PARADIGM, CURRENT TRENDS, FUTURE DIRECTIONS
2025-05-07 22:45:49,118 - INFO - Change: FUSION OF AI WITH IOT (AI2OT) PARADIGM, CURRENT TRENDS,.pdf -> 52-FUSION OF AI WITH IOT (AI2OT)- PARADIGM, CURRENT TRENDS, FUTURE DIRECTIONS.pdf
2025-05-07 22:45:49,118 - INFO - Processing Generator-Based Design of Custom Systems-on-Chip.pdf
2025-05-07 22:45:56,209 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:45:56,215 - INFO - Renamed: Generator-Based Design of Custom Systems-on-Chip.pdf -> 44-Generator-Based Design of Custom Systems-on-Chip for Numerical Data Analysis.pdf
2025-05-07 22:45:56,216 - INFO - Matched: Generator-Based Design of Custom Systems-on-Chip.pdf -> idx: 44, excel_name: Generator-Based Design of Custom Systems-on-Chip for Numerical Data Analysis
2025-05-07 22:45:56,216 - INFO - Change: Generator-Based Design of Custom Systems-on-Chip.pdf -> 44-Generator-Based Design of Custom Systems-on-Chip for Numerical Data Analysis.pdf
2025-05-07 22:45:56,216 - INFO - Processing Heron_Automatically Constrained High-Performance Library.pdf
2025-05-07 22:46:14,131 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:46:14,138 - INFO - Renamed: Heron_Automatically Constrained High-Performance Library.pdf -> 34-Heron- Automatically Constrained High-Performance Library Generation for Deep Learning Accelerators.pdf
2025-05-07 22:46:14,139 - INFO - Matched: Heron_Automatically Constrained High-Performance Library.pdf -> idx: 34, excel_name: Heron: Automatically Constrained High-Performance Library Generation for Deep Learning Accelerators
2025-05-07 22:46:14,139 - INFO - Change: Heron_Automatically Constrained High-Performance Library.pdf -> 34-Heron- Automatically Constrained High-Performance Library Generation for Deep Learning Accelerators.pdf
2025-05-07 22:46:14,139 - INFO - Processing In-datacenter_performance_analysis_of_a_tensor_processing_unit.pdf
2025-05-07 22:47:06,802 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:47:06,802 - INFO - Renamed: In-datacenter_performance_analysis_of_a_tensor_processing_unit.pdf -> 1-In-Datacenter Performance Analysis of a Tensor Processing Unit.pdf
2025-05-07 22:47:06,804 - INFO - Matched: In-datacenter_performance_analysis_of_a_tensor_processing_unit.pdf -> idx: 1, excel_name: In-datacenter performance analysis of a tensor processing unit
2025-05-07 22:47:06,804 - INFO - Change: In-datacenter_performance_analysis_of_a_tensor_processing_unit.pdf -> 1-In-Datacenter Performance Analysis of a Tensor Processing Unit.pdf
2025-05-07 22:47:06,804 - INFO - Processing Laius_Towards Latency Awareness and Improved Utilization of.pdf
2025-05-07 22:47:19,095 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:47:19,099 - INFO - Renamed: Laius_Towards Latency Awareness and Improved Utilization of.pdf -> 13-Laius- Towards Latency Awareness and Improved Utilization of Spatial Multitasking Accelerators in Datacenters.pdf
2025-05-07 22:47:19,101 - INFO - Matched: Laius_Towards Latency Awareness and Improved Utilization of.pdf -> idx: 13, excel_name: Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters
2025-05-07 22:47:19,101 - INFO - Change: Laius_Towards Latency Awareness and Improved Utilization of.pdf -> 13-Laius- Towards Latency Awareness and Improved Utilization of Spatial Multitasking Accelerators in Datacenters.pdf
2025-05-07 22:47:19,102 - INFO - Processing MG3MConv_ Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm.pdf
2025-05-07 22:47:26,381 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
2025-05-07 22:47:26,386 - INFO - Renamed: MG3MConv_ Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm.pdf -> 33-MG3MConv- Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor.pdf
2025-05-07 22:47:26,389 - INFO - Matched: MG3MConv_ Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm.pdf -> idx: 33, excel_name: MG3MConv: Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor
2025-05-07 22:47:26,389 - INFO - Change: MG3MConv_ Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm.pdf -> 33-MG3MConv- Multi-Grained Matrix-Multiplication-Mapping Convolution Algorithm toward the SW26010 Processor.pdf
2025-05-07 22:47:26,389 - INFO - Processing ML_Processors_Are_Going_Multi-Core_A_performance_dream_or_a_scheduling_nightmare.pdf
2025-05-07 22:47:37,222 - INFO - HTTP Request: POST https://api.siliconflow.cn/v1/chat/completions "HTTP/1.1 200 OK"
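Every record follows `%(asctime)s - %(levelname)s - %(message)s`, so the log can be checked mechanically. In the sample above, two different PDFs (`Brain Inspired Computing A Systematic Survey and Future Trends.pdf` and `Brain-Inspired_Computing_A_Systematic_Survey_and_Future_Trends.pdf`) both match idx 47. The sketch below lists indices that were matched more than once. The regex assumes the exact `Matched: ... -> idx: N, excel_name: ...` message shown in the log, and `duplicate_matches` is a hypothetical name.

```python
import re
from collections import defaultdict

# One "Matched" record in the format "%(asctime)s - %(levelname)s - %(message)s"
MATCHED = re.compile(
    r"^(?P<time>\S+ \S+) - (?P<level>[A-Z]+) - Matched: (?P<pdf>.+?)"
    r" -> idx: (?P<idx>\d+), excel_name: (?P<excel>.*)$"
)


def duplicate_matches(lines) -> dict[int, list[str]]:
    """Map each Excel idx to the PDF names matched to it, keeping only idx values seen more than once."""
    by_idx = defaultdict(list)
    for line in lines:
        m = MATCHED.match(line.rstrip("\n"))
        if m:
            by_idx[int(m["idx"])].append(m["pdf"])
    return {idx: pdfs for idx, pdfs in by_idx.items() if len(pdfs) > 1}
```

Usage: `with open("logs/citation_process.log", encoding="utf-8") as f: print(duplicate_matches(f))`.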
import json
import logging
import psrc.rename_extractInfo as RE
from openai import OpenAI
import psrc.citationProcess as CP
from pathlib import Path
if __name__ == "__main__":
# Get the directory containing the current script
# current_py_dir = os.path.dirname(os.path.abspath(__file__))
# Get the current working directory (CWD)
cwd_dir = Path.cwd()
# Build the full path to config.json
@@ -19,12 +14,15 @@ if __name__ == "__main__":
config = json.load(f)
# The / operator on Path objects joins path components
pdf_dir = (cwd_dir / config["pdf_dir"]).resolve()
rst_dir = (cwd_dir / config["result_path"]).resolve()
excel_path = (cwd_dir / config["excel_path"]).resolve()
# print(excel_path)
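# Create the log directory
log_dir = cwd_dir / "logs"
log_dir.mkdir(exist_ok=True)
# Configure the logging system
log_file = log_dir / "citation_process.log"
logLevel = config["logLevel"]
# logging.basicConfig(...) is a function in the Python standard library's logging module for quickly configuring basic logging settings.
# level sets the minimum logging level (here config["logLevel"], e.g. 20 = INFO); only records at or above it (e.g. INFO, WARNING, ERROR, CRITICAL) are handled.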
@@ -34,11 +32,12 @@ if __name__ == "__main__":
# %(asctime)s: timestamp of the log record (default format: YYYY-MM-DD HH:MM:SS,mmm).
# %(levelname)s: name of the log level (e.g. INFO, WARNING).
# %(message)s: the log message itself.
level=logLevel, format="%(asctime)s - %(levelname)s - %(message)s"
level=logLevel, format="%(asctime)s - %(levelname)s - %(message)s",
handlers=[
logging.FileHandler(log_file, encoding='utf-8'),
logging.StreamHandler()
]
)
client = OpenAI(api_key=config["api_key"],
base_url=config["base_url"])
# RE.main( pdf_dir, rst_dir, config["model"], client)
RE.read_rough_nameIndex_from_excel(excel_path)
logging.info(f"Program started; log file saved to: {log_file}")
CP.citationProcess(config)
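The new `handlers=[...]` argument sends each record to both the UTF-8 log file and the console. Note that `logging.basicConfig` does nothing if the root logger already has handlers, for example when something has logged before it runs, unless `force=True` is passed (Python 3.8+). Below is a sketch of the same setup wrapped in a function. `setup_logging` and the `force=True` flag are additions and are not part of `main.py`.

```python
import logging
from pathlib import Path


def setup_logging(log_dir: Path, level: int) -> Path:
    """Hypothetical wrapper: log to a UTF-8 file and to the console."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / "citation_process.log"
    logging.basicConfig(
        level=level,  # e.g. config["logLevel"]: 10 = DEBUG, 20 = INFO
        format="%(asctime)s - %(levelname)s - %(message)s",
        handlers=[
            logging.FileHandler(log_file, encoding="utf-8"),  # keeps non-ASCII names intact
            logging.StreamHandler(),  # mirrors every record to stderr
        ],
        force=True,  # assumption: replace any handlers installed earlier
    )
    return log_file
```

In `main.py` this would be called as `log_file = setup_logging(cwd_dir / "logs", config["logLevel"])`.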
from pathlib import Path
import logging
from openai import OpenAI
import pypdf
import pandas as pd
import openpyxl
from fuzzywuzzy import fuzz
import json
def get_authors( content, configModel, client):
system_prompt = """
@@ -15,10 +18,12 @@ def get_authors( content, configModel, client):
- Identify all listed authors. Maintain the order presented in the text if possible.
- For each author:
- Extract their full name as accurately as possible. Use "" if a name cannot be clearly identified for an entry.
- Extract all associated institutions/affiliations mentioned for that specific author.
- If an author has no listed institution, use an empty list `[]`.
- If there are multiple authors but only one affiliation, all of these authors belong to that affiliation; otherwise, match each author to an affiliation using the indicators (e.g. superscript markers).
- **Handling Missing Data:** If no authors can be identified in the text, the "authors" field in the JSON should be an empty list `[]`.
- **Institutions:**
- Extract all institutions associated with the authors.
- **Countries:**
- Extract all countries associated with the authors.
- **Handling Missing Data:** If no data for a field can be identified in the text, that field in the JSON should be an empty list `[]`.
- Capitalize the first letter of each key.
"""
response = client.chat.completions.create(
@@ -44,7 +49,6 @@ def get_authors( content, configModel, client):
# # Print the returned content incrementally
# print(chunk.choices[0].delta.content, end="", flush=True) # print without a newline and flush: streaming output
# Extracts text content from the first page of a PDF.
def extract_first_page_text(pdf_path):
@@ -70,38 +74,102 @@ def extract_first_page_text(pdf_path):
return None
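The renaming step in `citationProcess` replaces only `:` in the PDF title. Windows, where the sample log was produced, also rejects `<`, `>`, `"`, `/`, `\`, `|`, `?` and `*` in file names, so a title containing any of them would make `file.rename` fail with an `OSError`. Below is a sketch of a stricter name builder that keeps the existing `<idx>-<title>.pdf` pattern. `standardized_pdf_name` is hypothetical.

```python
import re

# Characters Windows rejects in file names (citationProcess handles only ':')
INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def standardized_pdf_name(idx, title: str) -> str:
    """Hypothetical builder for '<idx>-<title>.pdf' that is safe on Windows."""
    safe = INVALID_CHARS.sub("-", title)      # same '-' substitute used for ':' today
    safe = re.sub(r"\s+", " ", safe).strip()  # fold newlines/runs of spaces from PDF text
    return f"{idx}-{safe}.pdf"


print(standardized_pdf_name(57, "Arax: A Runtime Framework"))
# → 57-Arax- A Runtime Framework.pdf
```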
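# Read indices and paper titles starting from row 4 of the Excel sheet
def read_rough_nameIndex_from_excel(excel_path: Path):
# Read the worksheets of the Excel file
# When reading multiple worksheets, pandas.read_excel(sheet_name=None) returns a dict in which:
# the keys are the worksheet names (sheet_name);
# the values are the corresponding DataFrame for each worksheet.
# items() gives access to both in a single loop
# Get the worksheet data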
excel_data = pd.read_excel(excel_path, sheet_name=None)
for sname, data in excel_data.items():
df = data.iloc[2:]
for index, row in df.iterrows():
print(row.iloc[0])
print(row.iloc[1])
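The replacement `read_rough_nameIndex_from_excel(sheet, maxItem)` reads openpyxl rows from row 4 onward. It takes column A as the index and column C as the title. Its counter `idx` counts every row scanned, including rows skipped for a missing index or title, so `maxItem` caps rows scanned rather than entries collected. The sketch below applies the same logic to plain tuples so it can be tried without a workbook. `read_index_and_titles` is a hypothetical name.

```python
def read_index_and_titles(rows, max_item: int):
    """Same filtering as the new read_rough_nameIndex_from_excel, over plain row tuples.

    `rows` stands in for sheet.iter_rows(min_row=4, values_only=True):
    column A (row[0]) holds the index, column C (row[2]) the paper title.
    """
    index_list, title_list = [], []
    for scanned, row in enumerate(rows):
        if scanned >= max_item:  # maxItem caps rows scanned, not rows kept
            break
        if row[0] and row[2]:  # skip rows missing an index or a title
            index_list.append(row[0])
            title_list.append(row[2])
    return index_list, title_list


rows = [(1, "x", "Title A"), (None, None, None), (3, "y", "Title C")]
print(read_index_and_titles(rows, 2))
# → ([1], ['Title A'])
```

With openpyxl, pass `sheet.iter_rows(min_row=4, values_only=True)` as `rows`.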
def read_rough_nameIndex_from_excel(sheet, maxItem):
index_list = []
paperName_list = []
# Iterate starting from row 4
for idx, row in enumerate(sheet.iter_rows(min_row=4, values_only=True)):
if idx >= maxItem: # limit the number of rows read
break
if row[0] and row[2]: # make sure both the index and the paper title are present
index_list.append(row[0])
paperName_list.append(row[2])
def main(pdf_directory: Path, result_path: Path, configModel: str, client):
return index_list, paperName_list
with open(result_path, "w", encoding="utf-8") as f:
pdf_files = pdf_directory.rglob("*.pdf") # recursive search (recursive glob)
def citationProcess(config: dict):
for file in pdf_files:
logging.info(f"Extract {file.name}'s authors")
client = OpenAI(api_key=config["api_key"],
base_url=config["base_url"])
first_page_text = extract_first_page_text(file)
logging.debug(first_page_text)
excel_path = Path(config["excel_path"])
# Load the Excel workbook
wb = openpyxl.load_workbook(excel_path)
# Iterate over the worksheets, stopping after config["tableNum"] of them
for idx, sheet_name in enumerate(wb.sheetnames):
if idx == config["tableNum"]:
break
sheet = wb[sheet_name]
logging.info(f"Processing sheet: {sheet_name}")
index_list, paperName_list = read_rough_nameIndex_from_excel(sheet, config["maxItem"])
if first_page_text is not None:
result = get_authors(first_page_text, configModel, client)
rst_dir = Path.cwd() / config["result_dir"] / sheet_name
rst_dir.mkdir(parents=True, exist_ok=True) # make sure the result directory exists
if result:
f.write(result + "\n")
exit()
pdf_directory = Path.cwd() / config["pdf_dir"] / sheet_name
pdf_files = pdf_directory.rglob("*.pdf") # recursive search; yields the path of every PDF file
# Iterate over all PDF files that belong to the current worksheet
for file in pdf_files:
logging.info(f"Processing {file.name}")
first_page_text = extract_first_page_text(file)
exit()
\ No newline at end of file
if first_page_text is None:
logging.error(f"Failed to extract text from first page of {file.name}")
continue # skip this file and move on to the next one
configModel = config["model"]
# Extract the key information
result = get_authors(first_page_text, configModel, client)
if result is not None:
# Parse the JSON result and extract the paper title
result_dict = json.loads(result)
pdf_title = result_dict["Title"]
# Fuzzy-match the title against each Excel entry
for idx, excel_name in zip(index_list, paperName_list):
# Normalize the strings
# split('.pdf') splits around '.pdf'; index 0 keeps the part before it
clean_excel_name = excel_name.split('.pdf')[0].replace(" ", "").replace("_", "").replace(":", "").replace("-", "")
clean_pdf_title = pdf_title.replace(" ", "").replace("_", "").replace(":", "").replace("-", "")
similarity = fuzz.partial_ratio(clean_pdf_title.lower(), clean_excel_name.lower())
if similarity >= 85:
# Rename the PDF file
new_pdf_name = f"{idx}-{pdf_title.replace(':', '-')}.pdf" # replace colons with hyphens
new_pdf_path = file.parent / new_pdf_name
try:
file.rename(new_pdf_path)
logging.info(f"Renamed: {file.name} -> {new_pdf_name}")
except FileExistsError:
logging.warning(f"Rename failed: filename {new_pdf_name} already exists with idx {idx}.")
break
# Store the key information in a JSON file
rst_path = rst_dir / (f"{idx}" + ".json")
rst_path.write_text(result + "\n", encoding='utf-8') # explicitly specify UTF-8 encoding
# Update the entry in Excel
sheet.cell(row=idx+4, column=3, value=pdf_title) # column 3 holds the paper title
logging.info(f"Matched: {file.name} -> idx: {idx}, excel_name: {excel_name}")
logging.info(f"Change: {file.name} -> {new_pdf_name}")
break
# Save the modified Excel file
wb.save(excel_path)
\ No newline at end of file
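The matching loop in `citationProcess` (normalize both titles, score them with `fuzz.partial_ratio`, accept at 85 or above, then build `<idx>-<title>.pdf`) can be sketched in isolation. This is a minimal sketch, not the commit's code: `normalize_title`, `partial_ratio`, `safe_pdf_name` and `match_title` are hypothetical helpers. `partial_ratio` here is a `difflib`-based approximation of `fuzz.partial_ratio`, so its scores may differ somewhat from the library's. The sketch also applies the same normalization to both titles and, assuming Windows file-name rules, replaces every forbidden character instead of only colons.

```python
import difflib
import re

# Characters the original loop removes before comparing titles.
_STRIP_CHARS = str.maketrans("", "", " _:-")

# Characters Windows forbids in file names (the commit only replaces ':').
_INVALID_FILENAME_CHARS = re.compile(r'[<>:"/\\|?*]')


def normalize_title(name: str) -> str:
    """Keep the text before '.pdf', drop spaces/underscores/colons/hyphens, lowercase."""
    return name.split(".pdf")[0].translate(_STRIP_CHARS).lower()


def partial_ratio(a: str, b: str) -> int:
    """difflib-based stand-in for fuzz.partial_ratio (an approximation, not the same algorithm).

    Slides the shorter string over each same-length window of the longer one
    and returns the best similarity as an integer from 0 to 100.
    """
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0
    best = 0.0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, difflib.SequenceMatcher(None, shorter, window).ratio())
    return round(best * 100)


def safe_pdf_name(idx, title: str) -> str:
    """Build '<idx>-<title>.pdf', replacing every Windows-forbidden character with '-'."""
    return f"{idx}-{_INVALID_FILENAME_CHARS.sub('-', title).strip()}.pdf"


def match_title(pdf_title: str, index_list, name_list, threshold: int = 85):
    """Return (idx, excel_name, new_pdf_name) for the first entry scoring >= threshold, else None."""
    clean_pdf = normalize_title(pdf_title)
    for idx, excel_name in zip(index_list, name_list):
        # Excel cells may hold non-string values, so coerce before normalizing.
        if partial_ratio(clean_pdf, normalize_title(str(excel_name))) >= threshold:
            return idx, excel_name, safe_pdf_name(idx, pdf_title)
    return None
```

For example, `match_title("A Configurable Cloud-Scale DNN Processor for Real-Time AI", [2], ["A configurable cloud-scale DNN processor for real-time AI"])` returns index `2` together with the new name `2-A Configurable Cloud-Scale DNN Processor for Real-Time AI.pdf`. Trying every window is cheap for title-length strings but would be slow on long text.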
{
"title": "A carbon-nanotube-based tensor processing unit",
"authors": [
{
"name": "Jia Si",
"affiliations": [
"Key Laboratory for the Physics and Chemistry of Nanodevices and Center for Carbon-based Electronics, School of Electronics, Peking University, Beijing, China"
]
},
{
"name": "Panpan Zhang",
"affiliations": [
"State Key Laboratory of Information Photonics and Optical Communications, Beijing University of Posts and Telecommunications, Beijing, China"
]
},
{
"name": "Chenyi Zhao",
"affiliations": [
"Key Laboratory for the Physics and Chemistry of Nanodevices and Center for Carbon-based Electronics, School of Electronics, Peking University, Beijing, China"
]
},
{
"name": "Dongyi Lin",
"affiliations": [
"Hunan Institute of Advanced Sensing and Information Technology, Xiangtan University, Xiangtan, China"
]
},
{
"name": "Lin Xu",
"affiliations": [
"Key Laboratory for the Physics and Chemistry of Nanodevices and Center for Carbon-based Electronics, School of Electronics, Peking University, Beijing, China"
]
},
{
"name": "Haitao Xu",
"affiliations": [
"Beijing Institute of Carbon-based Integrated Circuits, Beijing, China"
]
},
{
"name": "Lijun Liu",
"affiliations": [
"Key Laboratory for the Physics and Chemistry of Nanodevices and Center for Carbon-based Electronics, School of Electronics, Peking University, Beijing, China"
]
},
{
"name": "Jianhua Jiang",
"affiliations": [
"Key Laboratory for the Physics and Chemistry of Nanodevices and Center for Carbon-based Electronics, School of Electronics, Peking University, Beijing, China"
]
},
{
"name": "Lian-Mao Peng",
"affiliations": [
"Key Laboratory for the Physics and Chemistry of Nanodevices and Center for Carbon-based Electronics, School of Electronics, Peking University, Beijing, China",
"Beijing Institute of Carbon-based Integrated Circuits, Beijing, China"
]
},
{
"name": "Zhiyong Zhang",
"affiliations": [
"Key Laboratory for the Physics and Chemistry of Nanodevices and Center for Carbon-based Electronics, School of Electronics, Peking University, Beijing, China",
"Hunan Institute of Advanced Sensing and Information Technology, Xiangtan University, Xiangtan, China",
"Beijing Institute of Carbon-based Integrated Circuits, Beijing, China"
]
}
]
}
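The JSON result above uses lowercase keys (`title`, `authors`, `affiliations`), while the prompt asks for a capitalized first letter and `citationProcess` reads `result_dict["Title"]`; on output shaped like this, that lookup would raise `KeyError`. A more defensive parsing step could look like the following minimal sketch. `parse_llm_json` and `get_field` are hypothetical helpers, not part of the commit, and the sketch assumes the model may sometimes wrap its reply in a Markdown code fence.

```python
import json
import re

# Matches a reply wrapped in a Markdown code fence such as ```json ... ``` (assumed model behavior).
_FENCE = re.compile(r"^```(?:json)?\s*(.*?)\s*```$", re.DOTALL)


def parse_llm_json(text: str) -> dict:
    """Strip an optional Markdown code fence, then parse the reply as JSON."""
    text = text.strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    return json.loads(text)


def get_field(data: dict, key: str, default=None):
    """Look up a key case-insensitively, so 'Title' and 'title' both work."""
    wanted = key.lower()
    for k, v in data.items():
        if k.lower() == wanted:
            return v
    return default
```

In the loop, `get_field(parse_llm_json(result), "Title")` would take the place of `json.loads(result)["Title"]`, and a `None` title could be logged and skipped like a failed text extraction.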