Commit 4ac20bd3 by hanhusheng

Add citation crawl results and a strict CCF-A matching script

parent cdbf5337
abbr,fullname
PPoPP,ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming
FAST,USENIX Conference on File and Storage Technologies
DAC,Design Automation Conference
HPCA,IEEE International Symposium on High Performance Computer Architecture
MICRO,IEEE/ACM International Symposium on Microarchitecture
SC,"International Conference for High Performance Computing, Networking, Storage, and Analysis"
ASPLOS,International Conference on Architectural Support for Programming Languages and Operating Systems
ISCA,International Symposium on Computer Architecture
USENIX ATC,USENIX Annual Technical Conference
EuroSys,European Conference on Computer Systems
SIGCOMM,"ACM International Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication"
MobiCom,ACM International Conference on Mobile Computing and Networking
INFOCOM,IEEE International Conference on Computer Communications
NSDI,Symposium on Network System Design and Implementation
CCS,ACM Conference on Computer and Communications Security
EUROCRYPT,International Conference on the Theory and Applications of Cryptographic Techniques
S&P,IEEE Symposium on Security and Privacy
CRYPTO,International Cryptology Conference
USENIX Security,USENIX Security Symposium
NDSS,Network and Distributed System Security Symposium
PLDI,ACM SIGPLAN Conference on Programming Language Design and Implementation
POPL,ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
FSE,ACM International Conference on the Foundations of Software Engineering
SOSP,ACM Symposium on Operating Systems Principles
OOPSLA,"Conference on Object-Oriented Programming Systems, Languages,and Applications"
ASE,International Conference on Automated Software Engineering
ICSE,International Conference on Software Engineering
ISSTA,International Symposium on Software Testing and Analysis
OSDI,USENIX Symposium on Operating Systems Design and Implementations
FM,International Symposium on Formal Methods
SIGMOD,ACM SIGMOD Conference
SIGKDD,ACM SIGKDD Conference on Knowledge Discovery and Data Mining
ICDE,IEEE International Conference on Data Engineering
SIGIR,International ACM SIGIR Conference on Research and Development in Information Retrieval
VLDB,International Conference on Very Large Data Bases
STOC,ACM Symposium on Theory of Computing
SODA,ACM-SIAM Symposium on Discrete Algorithms
CAV,International Conference on Computer Aided Verification
FOCS,IEEE Annual Symposium on Foundations of Computer Science
LICS,ACM/IEEE Symposium on Logic in Computer Science
ACM MM,ACM International Conference on Multimedia
SIGGRAPH,ACM Special Interest Group on Computer Graphics
VR,IEEE Virtual Reality
IEEE VIS,IEEE Visualization Conference
AAAI,AAAI Conference on Artificial Intelligence
NeurIPS,Conference on Neural Information Processing Systems
ACL,Annual Meeting of the Association for Computational Linguistics
CVPR,IEEE/CVF Computer Vision and Pattern Recognition Conference
ICCV,International Conference on Computer Vision
ICML,International Conference on Machine Learning
IJCAI,International Joint Conference on Artificial Intelligence
CSCW,ACM Conference on Computer Supported Cooperative Work and Social Computing
CHI,ACM Conference on Human Factors in Computing Systems
UbiComp/IMWUT,"ACM international joint conference on Pervasive and Ubiquitous Computing/ Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"
UIST,ACM Symposium on User Interface Software and Technology
WWW,International World Wide Web Conference
RTSS,IEEE Real-Time Systems Symposium
WINE,Conference on Web and Internet Economics
TOCS,ACM Transactions on Computer Systems
TOS,ACM Transactions on Storage
TCAD,IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
TC,IEEE Transactions on Computers
TPDS,IEEE Transactions on Parallel and Distributed Systems
TACO,ACM Transactions on Architecture and Code Optimization
JSAC,IEEE Journal on Selected Areas in Communications
TMC,IEEE Transactions on Mobile Computing
TON,IEEE/ACM Transactions on Networking
TDSC,IEEE Transactions on Dependable and Secure Computing
TIFS,IEEE Transactions on Information Forensics and Security
,Journal of Cryptology
TOPLAS,ACM Transactions on Programming Languages and Systems
TOSEM,ACM Transactions on Software Engineering and Methodology
TSE,IEEE Transactions on Software Engineering
TSC,IEEE Transactions on Services Computing
TODS,ACM Transactions on Database Systems
TOIS,ACM Transactions on Information Systems
TKDE,IEEE Transactions on Knowledge and Data Engineering
VLDBJ,The VLDB Journal
TIT,IEEE Transactions on Information Theory
IANDC,Information and Computation
SICOMP,SIAM Journal on Computing
TOG,ACM Transactions on Graphics
TIP,IEEE Transactions on Image Processing
TVCG,IEEE Transactions on Visualization and Computer Graphics
AI,Artificial Intelligence
TPAMI,IEEE Transactions on Pattern Analysis and Machine Intelligence
IJCV,International Journal of Computer Vision
JMLR,Journal of Machine Learning Research
TOCHI,ACM Transactions on Computer-Human Interaction
IJHCS,International Journal of Human-Computer Studies
JACM,Journal of the ACM
Proc. IEEE,Proceedings of the IEEE
SCIS,Science China Information Sciences
import openpyxl
import csv
import sys
import re
import os

# ========== Configuration ==========
excel_path = '测试输入.xlsx'
ccf_a_csv = 'CCF_A_list.csv'
sheet_names = ["Dadiannao"]  # e.g. ["Dadiannao", "Diannao"]
# ===================================

# ========== Output file ==========
# 测试输入_CCF_A判断.xlsx
# =================================


def read_ccf_a_list(csv_path):
    """Load the CCF-A abbreviations and full names from the CSV list."""
    abbr_set = set()
    fullname_set = set()
    with open(csv_path, encoding='utf-8-sig', newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            abbr = row['abbr'].strip()
            fullname = row['fullname'].strip()
            # Some entries (e.g. Journal of Cryptology) have no abbreviation.
            if abbr:
                abbr_set.add(abbr)
            if fullname:
                fullname_set.add(fullname)
    return abbr_set, fullname_set


def extract_bracket_content(s):
    """
    Return the content of the first pair of parentheses
    (ASCII or full-width), or None if there is none.
    """
    match = re.search(r'[((](.*?)[))]', s)
    return match.group(1).strip() if match else None


def main():
    # 1. Load the CCF-A list
    abbr_set, fullname_set = read_ccf_a_list(ccf_a_csv)
    # 2. Open the Excel workbook
    wb = openpyxl.load_workbook(excel_path)
    for sheetname in sheet_names:
        if sheetname not in wb.sheetnames:
            print(f"Error: sheet '{sheetname}' does not exist in the workbook")
            continue
        ws = wb[sheetname]
        # 3. Skip the first three rows; row 4 is the header row
        header_row_idx = 4
        header = [cell.value if cell.value is not None else "" for cell in ws[header_row_idx]]
        # 4. Check the headers of columns 4 and 5
        if len(header) < 5:
            print(f"Error: the header row of sheet '{sheetname}' has fewer than 5 columns")
            sys.exit(1)
        col4 = header[3]
        col5 = header[4]
        if not (str(col4).strip() == "期刊/会议名称" and str(col5).strip() == "是否是CCF-A"):
            print(f"Error: in sheet '{sheetname}' the header must be in row 4! "
                  f"Column 4 is '{col4}' and column 5 is '{col5}', which does not match the required headers!")
            sys.exit(1)
        # 5. Process the data rows one by one
        for row in ws.iter_rows(min_row=header_row_idx + 1):
            name_cell = row[3]    # column 4
            result_cell = row[4]  # column 5
            name = name_cell.value
            if not name or not str(name).strip():
                result_cell.value = ""
                continue
            name_str = str(name).strip()
            # 6. Try the bracketed abbreviation first, then the full name
            bracket = extract_bracket_content(name_str)
            is_ccfa = False
            if bracket and bracket in abbr_set:
                is_ccfa = True
            elif name_str in fullname_set:
                is_ccfa = True
            result_cell.value = "是" if is_ccfa else "否"
    # 7. Save the result as a new file
    base, ext = os.path.splitext(excel_path)
    out_path = f"{base}_CCF_A判断{ext}"
    wb.save(out_path)
    print(f"Saved result file: {out_path}")


if __name__ == '__main__':
    main()
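# CCF-A Journal/Conference Classification Script
## Overview
This script determines whether each journal/conference name in an Excel file belongs to the CCF-A category and writes “是” (yes) or “否” (no) into a designated column.
- Processes the specified sheets in one batch.
- Compares each name against the CCF-A abbreviations (taken from the parentheses) or full names.
- Writes a new Excel file with the “是否是CCF-A” (is it CCF-A) column filled in.
# [!] This script uses exact matching, so it is more prone to false negatives (a venue that really is CCF-A is marked “否”) than to false positives (a venue that is not CCF-A is marked “是”).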
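## Input files
### 1. CCF_A_list.csv
- Two columns with a header row: `abbr,fullname`
- `abbr`: abbreviation of the CCF-A conference/journal (e.g. AAAI)
- `fullname`: full name of the CCF-A conference/journal (e.g. AAAI Conference on Artificial Intelligence)
Example: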
```
abbr,fullname
AAAI,AAAI Conference on Artificial Intelligence
SIGMOD,ACM SIGMOD Conference
```
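
As a rough illustration of how a file in this format can be read, the sketch below loads the two columns into sets with `csv.DictReader`. The helper name `load_ccf_a` and the in-memory sample are assumptions made for the example; rows with an empty `abbr` (the shipped list has one for Journal of Cryptology) contribute only a full name.

```python
import csv
import io


def load_ccf_a(stream):
    """Read `abbr,fullname` rows into two sets, skipping empty fields."""
    abbrs, fullnames = set(), set()
    for row in csv.DictReader(stream):
        abbr = (row.get("abbr") or "").strip()
        fullname = (row.get("fullname") or "").strip()
        if abbr:
            abbrs.add(abbr)
        if fullname:
            fullnames.add(fullname)
    return abbrs, fullnames


# Hypothetical in-memory sample standing in for CCF_A_list.csv.
sample = io.StringIO(
    "abbr,fullname\n"
    "AAAI,AAAI Conference on Artificial Intelligence\n"
    ",Journal of Cryptology\n"
)
abbrs, fullnames = load_ccf_a(sample)
print(sorted(abbrs))      # only the non-empty abbreviation is kept
print(sorted(fullnames))  # both full names are kept
```

For a real file, the stream would typically be opened with `encoding='utf-8-sig'` so that a leading byte-order mark does not end up in the first column name.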
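### 2. 测试输入.xlsx
- Contains at least one sheet to check; the sheet names are set in the script's `sheet_names` variable.
- In every sheet, **row 4 is the header row**, **column 4 is “期刊/会议名称”** (journal/conference name), and **column 5 is “是否是CCF-A”** (is it CCF-A).

Header example (row 4 should look like this):
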
| ... | 期刊/会议名称 | 是否是CCF-A | ... |
|-----|--------------|-------------|-----|
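
The header requirement can be expressed as a small check over the values of row 4. This is a minimal sketch, assuming the row has already been read into a plain list; the function name `header_is_valid` and the two constants are hypothetical and introduced only for illustration.

```python
EXPECTED_NAME_HEADER = "期刊/会议名称"
EXPECTED_FLAG_HEADER = "是否是CCF-A"


def header_is_valid(header_values):
    """Return True if columns 4 and 5 carry the expected titles."""
    if len(header_values) < 5:
        return False
    # Empty cells arrive as None; treat them as empty strings.
    col4 = str(header_values[3] or "").strip()
    col5 = str(header_values[4] or "").strip()
    return col4 == EXPECTED_NAME_HEADER and col5 == EXPECTED_FLAG_HEADER


good = ["No.", "Title", "Year", "期刊/会议名称", " 是否是CCF-A ", None]
bad = ["期刊/会议名称", "是否是CCF-A"]  # right titles, wrong columns
print(header_is_valid(good))
print(header_is_valid(bad))
```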
---
## Usage
1. Place `CCF_A_list.csv` and `测试输入.xlsx` in the same directory.
2. Edit `sheet_names` at the top of the script, for example:
```python
sheet_names = ["Dadiannao"]
```
3. Run the script:
```bash
python check_ccfa.py
```
4. The program writes a new file, `测试输入_CCF_A判断.xlsx`, in which the “是否是CCF-A” column of every record is filled with “是” (yes) or “否” (no).
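
The output name is the input name with `_CCF_A判断` inserted before the extension. A minimal sketch of that derivation follows; the helper name `output_path` and its `suffix` parameter are assumptions made for the example.

```python
import os


def output_path(excel_path, suffix="_CCF_A判断"):
    """Insert `suffix` between the base name and the extension."""
    base, ext = os.path.splitext(excel_path)
    return f"{base}{suffix}{ext}"


print(output_path("测试输入.xlsx"))  # 测试输入_CCF_A判断.xlsx
```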
---
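## Matching rules
- First, the content of the first pair of parentheses in “期刊/会议名称” (ASCII or full-width parentheses) is extracted and matched exactly against `abbr` in the CCF-A list.
- If that does not match, the entire “期刊/会议名称” value is matched exactly against `fullname` in the CCF-A list.
- A match yields “是”; anything else yields “否”.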
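
The rules above can be sketched as a pure function over two sets. `extract_bracket_content` mirrors the helper in the script; the wrapper name `is_ccf_a` and the sample sets are hypothetical. The last two calls show how exact matching produces false negatives.

```python
import re


def extract_bracket_content(s):
    """Content of the first (...) or (...) pair, or None."""
    match = re.search(r'[((](.*?)[))]', s)
    return match.group(1).strip() if match else None


def is_ccf_a(name, abbr_set, fullname_set):
    """Abbreviation in parentheses first, then the whole name; both exact."""
    name = name.strip()
    bracket = extract_bracket_content(name)
    if bracket and bracket in abbr_set:
        return True
    return name in fullname_set


# Hypothetical sample sets standing in for the loaded CCF-A list.
abbrs = {"AAAI", "ISCA"}
fullnames = {"International Symposium on Computer Architecture"}

# Matched through the abbreviation in parentheses.
print(is_ccf_a("International Symposium on Computer Architecture (ISCA)", abbrs, fullnames))
# Matched through the exact full name.
print(is_ccf_a("International Symposium on Computer Architecture", abbrs, fullnames))
# Not matched: no parentheses and not an exact full name.
print(is_ccf_a("Proc. AAAI 2020", abbrs, fullnames))
# Not matched: the comparison is case-sensitive.
print(is_ccf_a("Some Workshop(isca)", abbrs, fullnames))
```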
---
## Dependencies
- openpyxl

Install the dependency with pip:
```bash
pip install openpyxl
```