Commit d93eaf37 by ZhangXiaoyun

Merge branch 'master' of http://62.234.201.16/ZhangXiaoyun/prm

parents 0fd1bcf9 56b04132
Currently Loaded Modulefiles:
1) cluster-tools/v1.0 4) slurm-tools/v1.0 7) cuda-cudnn/11.8-8.8.1
2) git/2.31.1 5) cmake/3.21.7
3) python3/3.8.16 6) mpich/3.2.1
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
2025-02-27 14:48:50,096 INFO worker.py:1816 -- Started a local Ray instance.
[2025-02-27 14:48:52,314 C 974958 974958] shared_memory.cc:32: mmap failed
*** StackTrace Information ***
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(+0x114367a) [0x7ff05c96867a] ray::operator<<()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray6RayLogD1Ev+0x4d1) [0x7ff05c96ad01] ray::RayLog::~RayLog()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(+0x9b9828) [0x7ff05c1de828] plasma::ClientMmapTableEntry::ClientMmapTableEntry()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(+0x9b4810) [0x7ff05c1d9810] plasma::PlasmaClient::Impl::GetStoreFdAndMmap()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(_ZN6plasma12PlasmaClient4Impl17HandleCreateReplyERKN3ray8ObjectIDEbPKhPmPSt10shared_ptrINS2_6BufferEE+0x3bf) [0x7ff05c1da6af] plasma::PlasmaClient::Impl::HandleCreateReply()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(_ZN6plasma12PlasmaClient4Impl22CreateAndSpillIfNeededERKN3ray8ObjectIDERKNS2_3rpc7AddressEblPKhlPSt10shared_ptrINS2_6BufferEENS_7flatbuf12ObjectSourceEi+0x22b) [0x7ff05c1db12b] plasma::PlasmaClient::Impl::CreateAndSpillIfNeeded()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(_ZN6plasma12PlasmaClient22CreateAndSpillIfNeededERKN3ray8ObjectIDERKNS1_3rpc7AddressEblPKhlPSt10shared_ptrINS1_6BufferEENS_7flatbuf12ObjectSourceEi+0x2b) [0x7ff05c1db75b] plasma::PlasmaClient::CreateAndSpillIfNeeded()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core29CoreWorkerPlasmaStoreProvider6CreateERKSt10shared_ptrINS_6BufferEEmRKNS_8ObjectIDERKNS_3rpc7AddressEPS4_bb+0xe6) [0x7ff05c16ef36] ray::core::CoreWorkerPlasmaStoreProvider::Create()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core29CoreWorkerPlasmaStoreProvider11WarmupStoreEv+0x9b) [0x7ff05c1714cb] ray::core::CoreWorkerPlasmaStoreProvider::WarmupStore()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core29CoreWorkerPlasmaStoreProviderC2ERKSsSt10shared_ptrINS_6raylet12RayletClientEES4_INS0_16ReferenceCounterEESt8functionIFNS_6StatusEvEEbSA_IFSsvEE+0x36b) [0x7ff05c171bab] ray::core::CoreWorkerPlasmaStoreProvider::CoreWorkerPlasmaStoreProvider()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core10CoreWorkerC1ERKNS0_17CoreWorkerOptionsERKNS_8WorkerIDE+0x205b) [0x7ff05c0fc5eb] ray::core::CoreWorker::CoreWorker()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core21CoreWorkerProcessImplC2ERKNS0_17CoreWorkerOptionsE+0x566) [0x7ff05c10ed16] ray::core::CoreWorkerProcessImpl::CoreWorkerProcessImpl()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(_ZN3ray4core17CoreWorkerProcess10InitializeERKNS0_17CoreWorkerOptionsE+0x98) [0x7ff05c10fe68] ray::core::CoreWorkerProcess::Initialize()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(+0x74433f) [0x7ff05bf6933f] __pyx_pw_3ray_7_raylet_10CoreWorker_1__cinit__()
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/lib/python3.10/site-packages/ray/_raylet.so(+0x7455e9) [0x7ff05bf6a5e9] __pyx_tp_new_3ray_7_raylet_CoreWorker()
python(_PyObject_MakeTpCall+0x193) [0x4f64b3] _PyObject_MakeTpCall
python(_PyEval_EvalFrameDefault+0x53ee) [0x4f275e] _PyEval_EvalFrameDefault
python(_PyFunction_Vectorcall+0x6f) [0x4fcf3f] _PyFunction_Vectorcall
python(_PyEval_EvalFrameDefault+0x13b2) [0x4ee722] _PyEval_EvalFrameDefault
python(_PyFunction_Vectorcall+0x6f) [0x4fcf3f] _PyFunction_Vectorcall
python(PyObject_Call+0xb8) [0x508cd8] PyObject_Call
python(_PyEval_EvalFrameDefault+0x2de4) [0x4f0154] _PyEval_EvalFrameDefault
python(_PyFunction_Vectorcall+0x6f) [0x4fcf3f] _PyFunction_Vectorcall
python(_PyEval_EvalFrameDefault+0x13b2) [0x4ee722] _PyEval_EvalFrameDefault
python() [0x5924f2] _PyEval_Vector
python(PyEval_EvalCode+0x87) [0x592437] PyEval_EvalCode
python() [0x5c3237] run_eval_code_obj
python() [0x5be380] run_mod
python() [0x4598d6] pyrun_file.cold
python(_PyRun_SimpleFileObject+0x19f) [0x5b890f] _PyRun_SimpleFileObject
python(_PyRun_AnyFileObject+0x43) [0x5b8673] _PyRun_AnyFileObject
python(Py_RunMain+0x38d) [0x5b542d] Py_RunMain
python(Py_BytesMain+0x39) [0x585609] Py_BytesMain
/lib64/libc.so.6(__libc_start_main+0xe5) [0x7ff1bb02dd85] __libc_start_main
python() [0x5854be]
Job start at 2025-02-27 14:46:56
Job run at:
Static hostname: localhost.localdomain
Transient hostname: r8a100-d01
Icon name: computer-server
Chassis: server
Machine ID: af6fe29e1ea7413c9518073fffae5e4a
Boot ID: 41d3b695cf27447cb7da3a3bfb840cb5
Operating System: Rocky Linux 8.7 (Green Obsidian)
CPE OS Name: cpe:/o:rocky:rocky:8:GA
Kernel: Linux 4.18.0-425.10.1.el8_7.x86_64
Architecture: x86-64
Have already added /tools/cluster-modulefiles into $MODULEPATH
/usr/bin/gcc
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/bin/python
/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/bin/python3
############### /home : /home/S/zhangxiaoyun
Disk quotas for user zhangxiaoyun (uid 6191):
Filesystem space quota limit grace files quota limit grace
/home 5198M 16384M 20480M 115k 0 0
############### /workspace
Disk quotas for user zhangxiaoyun (uid 6191):
Filesystem space quota limit grace files quota limit grace
/workspace 78781M 400G 500G 719k 0 0
############### /nfs_global
Disk quotas for user zhangxiaoyun (uid 6191):
Filesystem space quota limit grace files quota limit grace
/nfs_global 1594G 5120G 7168G 39496 5000k 10000k
############### /lustre
Disk quotas for usr zhangxiaoyun (uid 6191):
Filesystem used quota limit grace files quota limit grace
/lustre 0k 8T 10T - 0 3000000 36000000 -
uid 6191 is using default block quota setting
uid 6191 is using default file quota setting
Thu Feb 27 14:46:57 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100 80GB PCIe On | 00000000:35:00.0 Off | 0 |
| N/A 36C P0 56W / 300W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100 80GB PCIe On | 00000000:36:00.0 Off | 0 |
| N/A 37C P0 56W / 300W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100 80GB PCIe On | 00000000:39:00.0 Off | 0 |
| N/A 37C P0 56W / 300W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100 80GB PCIe On | 00000000:3D:00.0 Off | 0 |
| N/A 34C P0 55W / 300W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A100 80GB PCIe On | 00000000:9C:00.0 Off | 0 |
| N/A 34C P0 55W / 300W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A100 80GB PCIe On | 00000000:9D:00.0 Off | 0 |
| N/A 37C P0 57W / 300W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A100 80GB PCIe On | 00000000:A0:00.0 Off | 0 |
| N/A 34C P0 54W / 300W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A100 80GB PCIe On | 00000000:A4:00.0 Off | 0 |
| N/A 35C P0 55W / 300W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Use GPU 0,1,2,3,4,5,6,7
PYTHON_EXECUTABLE=/workspace/S/zhangxiaoyun/miniconda3/envs/open_reasoner/bin/python3
Wait 5 seconds ...
Starting workers
Job end at 2025-02-27 14:48:52