Running out of memory due to large m in dmrg and iqdmrg for a one-dimensional spin system

Question

asked Jan 24, 2018 by Victor Chang (650 points)
edited Jan 24, 2018 by Victor Chang

Hi, Miles.

In order to reach an accuracy (cutoff) of 1E-10 for my 1-D spin-one system on , I need to set m to 4800 in iqdmrg. Here is my sweep schedule:
Sweeps:
1 Maxm=10, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-07
2 Maxm=20, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-08
3 Maxm=100, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-10
4 Maxm=200, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-15
5 Maxm=400, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-20
6 Maxm=800, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=1.0E-30
7 Maxm=1200, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=0.0E+00
8 Maxm=2400, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=0.0E+00
9 Maxm=4800, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=0.0E+00
10 Maxm=9600, Minm=1, Cutoff=1.0E-10, Niter=2, Noise=0.0E+00

vN Entropy at center bond b=31 = 1.238744298544
Eigs at center bond b=31: 0.5049 0.3166 0.1094 0.0282 0.0180 0.0094 0.0059 0.0026 0.0024 0.0017 
Largest m during sweep 1 was 10
Largest truncation error: 0.0258392
Energy after sweep 1 is 5.602725754556
Sweep 1 CPU time = 2.204s (Wall time = 2.205s)

vN Entropy at center bond b=31 = 1.612233679937
Eigs at center bond b=31: 0.3923 0.2871 0.1657 0.0503 0.0338 0.0229 0.0115 0.0107 0.0076 0.0071 
Largest m during sweep 2 was 20
Largest truncation error: 0.0028786
Energy after sweep 2 is 0.530196370527
Sweep 2 CPU time = 36.54s (Wall time = 7.408s)

vN Entropy at center bond b=31 = 1.925120785867
Eigs at center bond b=31: 0.3345 0.2587 0.1691 0.0615 0.0459 0.0318 0.0192 0.0188 0.0125 0.0120 
Largest m during sweep 3 was 100
Largest truncation error: 4.6379e-05
Energy after sweep 3 is -1.624350530351
Sweep 3 CPU time = 5m, 44.1s (Wall time = 22.07s)

vN Entropy at center bond b=31 = 2.274203834242
Eigs at center bond b=31: 0.2817 0.2263 0.1570 0.0639 0.0589 0.0345 0.0280 0.0265 0.0227 0.0190 
Largest m during sweep 4 was 200
Largest truncation error: 4.08592e-05
Energy after sweep 4 is -1.868673250230
Sweep 4 CPU time = 22m, 17s (Wall time = 1m, 25.4s)

vN Entropy at center bond b=31 = 2.575774377167
Eigs at center bond b=31: 0.2459 0.1961 0.1393 0.0676 0.0569 0.0356 0.0332 0.0332 0.0312 0.0268 
Largest m during sweep 5 was 400
Largest truncation error: 1.87256e-05
Energy after sweep 5 is -1.952988038907
Sweep 5 CPU time = 1h, 23m, 11s (Wall time = 5m, 19.3s)

vN Entropy at center bond b=31 = 2.959973291749
Eigs at center bond b=31: 0.1998 0.1549 0.1120 0.0760 0.0503 0.0458 0.0420 0.0412 0.0307 0.0254 
Largest m during sweep 6 was 800
Largest truncation error: 9.42967e-06
Energy after sweep 6 is -2.003774164313
Sweep 6 CPU time = 5h, 44m, 4s (Wall time = 22m, 6s)

vN Entropy at center bond b=31 = 3.273794426370
Eigs at center bond b=31: 0.1523 0.1090 0.0917 0.0892 0.0524 0.0513 0.0494 0.0318 0.0305 0.0293 
Largest m during sweep 7 was 1200
Largest truncation error: 6.29943e-06
Energy after sweep 7 is -2.024859515379
Sweep 7 CPU time = 11h, 34m, 3s (Wall time = 44m, 28s)

vN Entropy at center bond b=31 = 3.326094274331
Eigs at center bond b=31: 0.1442 0.0967 0.0943 0.0924 0.0517 0.0515 0.0513 0.0307 0.0306 0.0305 
Largest m during sweep 8 was 2400
Largest truncation error: 7.15778e-07
Energy after sweep 8 is -2.028678884827
Sweep 8 CPU time = 46h, 7m, 7s (Wall time = 2h, 59m, 20s)
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/var/spool/slurm/slurmd/job00768/slurm_script: line 11: 22084 Aborted                 ./iqdmrg input

As the log shows, the memory ran out in sweep 9 with m=4800.
I observed the following memory usage:

https://imgur.com/a/mKIUH

The memory usage during the final sweep increases linearly up to the limit of my node.
Is there any suggestion for resolving this?
Thank you.

Victor

commented Jan 26, 2018 by chengshu (680 points)

Hi Victor, have you tried "WriteM"? More details can be found at http://itensor.org/docs.cgi?page=classes/dmrg

commented Jan 28, 2018 by miles (70.2k points)

Thanks for posting the comment Chengshu! Feel free to also post an answer to a question such as this one if you are confident your answer could be helpful, especially since the forum supports multiple answers so I could still also post one too.

commented Jan 29, 2018 by Victor Chang (650 points)

Thank you all so much. That really helps.

2 Answers

miles · Answer 1 · 2018-01-28T13:56:11+0000

answered Jan 28, 2018 by miles (70.2k points)

Hi Victor,
Yes, please try setting the "WriteM" parameter as Chengshu suggests. That should help quite a bit with memory, although you may find you can only increase maxm by a few more thousand at most, since the memory usage scales quadratically with m.
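
To make the quadratic scaling concrete, here is a rough back-of-the-envelope sketch in Python. It is not ITensor code and not how ITensor accounts for memory: it assumes dense tensors (block-sparse IQTensors will typically need less), a chain of 62 sites (a guess from the center bond b=31 in the log), local dimension d=3 for spin one, and a hypothetical MPO bond dimension k=5.

```python
def dense_dmrg_bytes(n_sites, m, d=3, k=5, bytes_per_number=8):
    """Rough dense estimate of DMRG storage in bytes.

    Counts one MPS tensor (m x d x m) and one environment tensor
    (m x k x m) per site. All parameter values are illustrative
    assumptions, not numbers reported by ITensor.
    """
    mps_numbers = n_sites * d * m * m
    env_numbers = n_sites * k * m * m
    return (mps_numbers + env_numbers) * bytes_per_number


# Print the estimate for the last few bond dimensions in the sweep schedule.
for m in (1200, 2400, 4800, 9600):
    gib = dense_dmrg_bytes(62, m) / 2**30
    print(f"m={m:5d}: about {gib:.1f} GiB (dense estimate)")
```

Whatever the prefactor turns out to be, doubling m multiplies this estimate by four, which is why the steps from 2400 to 4800 to 9600 are so much more demanding than the earlier ones.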

But if you have more questions about it or are trying something specialized, please feel free to ask more or comment below.

Best regards,
Miles

commented Jan 29, 2018 by Victor Chang (650 points)

Hi, Miles.

Thanks for all your helpful support. I have one more question about WriteM.
I looked at the memory usage, and it showed that cache memory was being used. Is the cache memory taken from disk? What is the difference between cache and swap memory?

Best,
Victor

commented Jan 29, 2018 by miles (70.2k points)

Hi Victor,
Unfortunately I'm not sure I know the answer to this question. Here is a link I found which may perhaps give the right explanation: https://unix.stackexchange.com/questions/263764/what-is-difference-between-cached-memory-and-used-memory

Probably the most authoritative place to look is in the documentation for the system usage program you are using, to see how it defines the quantities it reports.

On some Unix systems, there is a simple command called "free" which you can use to see how much RAM is free. It gives a straightforward and simple report.
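
As general background on the cache versus swap question (standard Linux behavior, not anything specific to ITensor): the page cache is RAM that the kernel uses to hold recently read or written file data and can reclaim when programs need it, while swap is disk space that holds memory pages pushed out of RAM. If WriteM is writing tensors to disk, it seems plausible that those files show up as cached memory, though that is an assumption. The Python sketch below parses text in the format of /proc/meminfo on Linux; the sample numbers are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Convert the fields relevant to cache and swap into GiB."""
    kb_per_gib = 2**20
    swap_used = fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)
    return {
        "total_GiB": fields["MemTotal"] / kb_per_gib,
        "cached_GiB": fields.get("Cached", 0) / kb_per_gib,
        "available_GiB": fields.get("MemAvailable", 0) / kb_per_gib,
        "swap_used_GiB": swap_used / kb_per_gib,
    }


# Made-up sample in the /proc/meminfo format (not measured on any real node).
SAMPLE = """MemTotal:       16777216 kB
MemFree:         2097152 kB
MemAvailable:    8388608 kB
Cached:          4194304 kB
SwapTotal:       2097152 kB
SwapFree:        1048576 kB
"""

print(summarize(parse_meminfo(SAMPLE)))
```

On a Linux machine the same functions can be applied to the real file with `parse_meminfo(open("/proc/meminfo").read())`. A large "cached" value alongside a healthy "available" value usually means the RAM is holding file data that can be given back, whereas growing swap usage means the job no longer fits in RAM.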

Best,
Miles

commented Jan 31, 2018 by Victor Chang (650 points)

Yeah. You are probably right. I will try it.
Thank you.

JonSpalding · Answer 2 · 2018-01-30T10:53:12+0000

answered Jan 30, 2018 by JonSpalding (960 points)

I am studying a 1D Hubbard model and I have run into the same problem.

My answer for improving the memory problems is to use a supercomputer...

Also just a warning: sometimes when the memory runs out, your data might not be output to your datafiles. I have a number of datasets that have missing values as a result of this and I didn't catch the error until I tried analyzing the data. I believe ITensor usually outputs error messages, but sometimes ITensor won't catch the error.
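
One way to guard against this kind of silent data loss is to write each result to disk as soon as it is available and to check the file for gaps afterwards. The following is a minimal Python sketch of that idea, not something ITensor provides; the function names and the file name `energies.dat` are hypothetical, and the same pattern could be used in whatever language the driver program is written in.

```python
import os
import tempfile


def append_result(path, sweep, energy):
    """Append one 'sweep<TAB>energy' line and force it onto disk."""
    with open(path, "a") as f:
        f.write(f"{sweep}\t{energy:.12f}\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to write the data out now


def missing_sweeps(path, expected):
    """Return the expected sweep numbers that have no complete line in the file."""
    seen = set()
    with open(path) as f:
        for line in f:
            cols = line.split()
            if len(cols) == 2:  # ignore partially written lines
                seen.add(int(cols[0]))
    return [s for s in expected if s not in seen]


# Demo using the energies of sweeps 7 and 8 from the log in the question;
# sweep 9 is the one that crashed, so it never gets written.
path = os.path.join(tempfile.mkdtemp(), "energies.dat")
append_result(path, 7, -2.024859515379)
append_result(path, 8, -2.028678884827)
print(missing_sweeps(path, [7, 8, 9]))
```

Running the check before analysis makes a crashed job show up as an explicit list of missing sweeps rather than as holes discovered later in the data.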

commented Feb 1, 2018 by alesa (240 points)

I'm currently running a 1D chain on a supercomputer. For Maxm=3500 the program aborted without any warning or error message. I requested 15 GB of memory for this program, and according to the system report it used all 15 GB. I'm pretty sure that the program aborted because the memory ran out, but no error message was shown.

I also have a question about the memory. When I run the same program on my workstation, it is assigned about 3-4 threads (CPU usage 350%) and the memory usage is about 6 GB. When I run it on the supercomputer, I request 1 node with 16 processors, and 15 GB of memory is not enough. What determines the memory usage? Some extra information: the program I run is idmrg, and it has no parallel code apart from the built-in BLAS and LAPACK.

commented Feb 1, 2018 by JonSpalding (960 points)

The supercomputer I am using has an option to allocate additional memory to a job (from multiple cores) and also has the option to use large-memory nodes.
