For instance, the entanglement between two sites in the middle of the chain and the rest of the chain.

Hi Jon,

For a small number of sites (like two) in the middle of the chain, you can compute the reduced density matrix like this:

Make sure the gauge center is at one of the sites (call .position(j) where j is the first site, say).

Make the "bond tensor" B by doing

`auto B = psi.A(j) * psi.A(j+1);`

Make the reduced density matrix rho by doing

`auto rho = prime(B,Site) * dag(B);`

Use the diagHermitian function to diagonalize rho:

`auto spec = diagHermitian(rho,U,D);`

You'll need to declare U and D first (they can just be default initialized). For more info on diagHermitian see: http://itensor.org/docs.cgi?page=classes/decomp

The diagHermitian function returns a `Spectrum` object, which you can use to conveniently get the density matrix eigenvalues and compute the entropy, just as in this code formula:

http://itensor.org/docs.cgi?page=formulas/entanglement_mps
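For readers who want to see the underlying linear algebra outside of ITensor, here is a minimal plain-numpy sketch of the same steps (the chain length, site positions, and random state here are made up for illustration):

```python
import numpy as np

# Toy illustration (plain numpy, not ITensor): reduced density matrix of
# two neighboring sites of a random 6-site spin-1/2 chain, then the
# entanglement entropy from its eigenvalues.
rng = np.random.default_rng(0)
N, d, j = 6, 2, 2                      # chain length, site dim, first site of the pair
psi = rng.normal(size=d**N) + 1j * rng.normal(size=d**N)
psi /= np.linalg.norm(psi)

# Group indices as (left environment, two sites, right environment)
T = psi.reshape(d**j, d * d, d**(N - j - 2))
# rho_{st} = sum over environment indices (the analogue of prime(B,Site)*dag(B))
rho = np.einsum('asb,atb->st', T, T.conj())

evals = np.linalg.eigvalsh(rho)        # rho is Hermitian, as diagHermitian assumes
evals = evals[evals > 1e-14]           # drop numerical zeros before taking logs
S = -np.sum(evals * np.log(evals))     # von Neumann entanglement entropy
print(S)
```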

Hi Miles,

I may do it like this,

psi.position(j);

auto wf = psi.A(j-1)*psi.A(j)*psi.A(j+1)*psi.A(j+2);

auto U = psi.A(j)*psi.A(j+1);

ITensor S,V;

auto spectrum = svd(wf,U,S,V);

I calculated the entanglement entropy between two sites in the middle of a chain with 8 sites and the rest of the chain by using both this method and your method, and the results are the same. So is this method correct generally? Thanks.

Jin
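In plain numpy (illustrative only, not ITensor code), the equivalence of the two methods can be checked directly: the eigenvalues of the two-site reduced density matrix are the squared singular values of the wavefunction reshaped into a (two sites) x (rest of chain) matrix:

```python
import numpy as np

# Cross-check of the two methods on a small random chain (parameters illustrative).
rng = np.random.default_rng(1)
N, d, j = 6, 2, 2
psi = rng.normal(size=d**N)
psi /= np.linalg.norm(psi)

# Put the two central sites first, then flatten: rows = sites (j, j+1), cols = rest
T = psi.reshape(d**j, d * d, d**(N - j - 2))
M = np.transpose(T, (1, 0, 2)).reshape(d * d, -1)

# Method 1: diagonalize rho = M M^dagger (the diagHermitian route)
p_eig = np.sort(np.linalg.eigvalsh(M @ M.T))[::-1]
# Method 2: SVD of the bipartition; density-matrix eigenvalues are s^2 (the svd route)
p_svd = np.sort(np.linalg.svd(M, compute_uv=False)**2)[::-1]

assert np.allclose(p_eig, p_svd)       # the two spectra agree
```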

I may do it like this,

psi.position(j);

auto wf = psi.A(j-1)*psi.A(j)*psi.A(j+1)*psi.A(j+2);

auto U = psi.A(j)*psi.A(j+1);

ITensor S,V;

auto spectrum = svd(wf,U,S,V);

I calculated the entanglement entropy between two sites in the middle of a chain with 8 sites and the rest of the chain by using both this method and your method, and the results are the same. So is this method correct generally? Thanks.

Jin

Hi Jin and Miles,

I did not realize that the U matrix could be used this way -- that is, does svd factor U out of wf, above? Or is U just providing the dimensions for the left matrix? My understanding was that setting U to some matrix only provides the dimensions.

Also Jin, does your approach require the same amount of time to compute?

Jon

Jin: yes I think your way of doing it should work too, as long as the "position" (gauge center) is on one of the sites j-1, j, j+1, or j+2 with your code. If the gauge center is at j or j+1 then I think you don't even need to include the j-1 and j+2 site tensors.

Jon: in the diagHermitian method versus the svd method generally, there's no relationship necessarily between the U you would get from one or the other. It depends on which objects are being diagonalized or svd'd. In the specific context of my code and Jin's code, U does turn out to be the same (after each code has run). But in the case of the svd, the svd routine looks at U to decide which indices to treat as "row" versus "column" indices for the purposes of the matrix factorization. For diagHermitian it's different: U can be default initialized, and it is ignored by the routine and overwritten with the U matrix which diagonalizes the first argument. The docs on svd and diagHermitian discuss what each one expects and what it does to its arguments, so you may find that helpful.

Hi Miles, how can I do svd without the j-1 and j+2 site tensors?

Hi Jon, I guess it takes more time to compute the product of 4 tensors than the product of 2 tensors. This gets very time consuming when there are many more tensors.

So I have a new question. For instance, when calculating the entanglement entropy between the even sites and the odd sites, it seems I need to take the product of all the tensors and trace out the even (odd) site indices. This is extremely slow even for systems with 10+ sites. How can I get the results efficiently? Thanks.

Jin, could you do this? (pseudocode)

psi_1 = prime(psi, site index at odd sites only)

rho = dagger(psi_1)*psi automatically contracts out even sites

auto spec = diagHermitian(rho,U,D)
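Jon's pseudocode can be sketched in plain numpy for a small chain (all parameters here are illustrative), which also makes the exponential size of the odd-site density matrix explicit:

```python
import numpy as np

# Trace out the even sites of a small random chain to get the reduced
# density matrix of the odd sites, then the entropy. The odd-site density
# matrix has dimension d^(N/2) x d^(N/2), which is why this only works
# for small N.
rng = np.random.default_rng(2)
N, d = 6, 2
psi = rng.normal(size=(d,) * N)
psi /= np.linalg.norm(psi)

# Move odd sites (0-indexed: 1, 3, 5) to the front, even sites to the back
odd = list(range(1, N, 2))
even = list(range(0, N, 2))
M = np.transpose(psi, odd + even).reshape(d**len(odd), d**len(even))

rho_odd = M @ M.T                      # contracting out the even sites
p = np.linalg.eigvalsh(rho_odd)
p = p[p > 1e-14]
S = -np.sum(p * np.log(p))
print(S)
```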

Hi Jin,

You can do the SVD without the j-1 and j+2 tensors by changing U. The svd routine looks at the indices of U to determine which indices should end up on U, and the rest go onto V. So if you make wf equal to psi.A(j)*psi.A(j+1) then U needs to have just the site indices on sites j and j+1, and not the link indices coming out of psi.A(j) and psi.A(j+1). The documentation on the svd routine explains in more detail how it works. Also Jon's suggestion would be another good way to go, and probably easier to code.

About entanglement entropy between even and odd sites: most bipartitions of a system lead to very difficult entanglement calculations that MPS cannot immediately help with. The only efficient ways I know to do such calculations are to (1) use swap gates to change the order of the sites, which could work if the order is not too different from the original site order, and (2) use Monte Carlo sampling of the second Renyi entropy. This latter way (2) can be done by writing out the tensor network for the second Renyi entropy, then sampling over part of this sum in a Monte Carlo fashion and doing tensor contractions for the rest.

Miles

Hi Jon, are you asking why your approach of making the density matrix wouldn't work for getting the entanglement of even sites relative to odd sites? I may not have understood your question. But if so, the resulting density matrix would have N/2 indices so would be exponentially costly to compute and store even before calling diagHermitian on it.

Yes, that was my question. It seems then that using a small lattice size N would enable the computation of this even-odd site entropy; one could increase the size until the computer limits are reached?

Are there similar limitations for constructing density matrices from contiguous blocks of sites that do not include the lattice endpoints?

Is there any way of evaluating this entropy using only the SVD for the two ends of the block rather than the entire density matrix of the block? I can try to think about this on my own as well.

Hi Jon,

Basically yes to all your questions, in the sense that most of those things would work up to a point and then the costs would go through the roof. To evaluate the costs, write out the tensor diagrams involved and do the usual scaling analysis, where the cost is the product of the uncontracted and contracted index dimensions (contracted index pairs only count once).

For the idea of two ends of a block, you could work with the MPS link indices instead, but it would scale at least as m^4 which gets quite tough for m > 100 or so.
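A minimal sketch of this bookkeeping (assuming dense diagonalization of a k x k matrix costs roughly k^3 scalar operations, and counting contraction cost by the product rule just described):

```python
from math import prod

# The scaling rule from above: the cost of contracting two tensors is the
# product of all distinct index dimensions involved (each contracted pair
# counted once).
def contraction_cost(uncontracted, contracted):
    return prod(uncontracted) * prod(contracted)

m, d = 100, 2                          # illustrative bond and site dimensions
# Contracting two MPS tensors A(j) [m, d, m] and A(j+1) [m, d, m] over their
# shared link leaves indices (m, d, d, m) and contracts one index of size m:
print(contraction_cost([m, d, d, m], [m]))   # m^3 * d^2

# Diagonalizing a k x k density matrix costs ~ k^3:
print(m**3)        # single cut: m x m matrix -> m^3
print((m**2)**3)   # central-block complement: m^2 x m^2 matrix -> m^6
```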

Hi Miles,

I'd like to do some rough scaling analysis on computing center-block entropies, and was not quite sure where to start for evaluating the time it takes to compute this. The aim of this, for my research, is to compare the computational time needed for a simple two-system entropy (cutting the system in the middle) and a three-system entropy (a central block cut from the middle of the lattice).

It seems that fundamentally the difference between the two is in computing the diagonal form of a matrix; in the case of a single cut, the diagonal matrix is m x m, computed using the SVD function.

On the other hand, the center block (two cuts) requires exact diagonalization of a density matrix; this density matrix is roughly m^2 by m^2 in dimension, i.e., the square of the size of the single-cut matrix problem.

So, if it takes time T to compute a single-cut entropy, I can expect a two-cut entropy to take roughly T^2 ?

Sorry if this is an easy or redundant question.

Hi Jon, I think it's a bit different in terms of scaling from some of the things you said (if I'm understanding correctly).

What you said about diagonalizing the left/right cut density matrix is correct, assuming the wavefunction is already in the form of an MPS with bond dimension m. Then one can trace all of the sites in the right half, say, and use the MPS tensors to express the reduced density matrix for the left half completely in terms of the virtual indices connecting left to right, so indeed an m x m matrix which would have a cost of m^3 to diagonalize.

But the way I would think about the other case, that of some 3-site block in the center, is that its reduced density matrix would be a d^3 x d^3 matrix, where d is the local site dimension, since one would trace out all sites except the central 3. Then it would scale as (d^3)^3 = d^9 to diagonalize.

Oh, but now I see what you may be thinking, which is to get the density matrix for the *complement* of the 3-site region by just tracing over those three sites. In that case, yes, the density matrix computed that way from an MPS would have dimensions m^2 x m^2, so it would cost (m^2)^3 = m^6 to diagonalize.

So for three sites in the middle, it's much more likely that the cheapest way to do things is to do the first option of tracing all but the three sites, rather than the other way around. But for more sites eventually switching to doing things the other way would be cheaper. Either way it would be an expensive operation.
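A plain-numpy check of this on a small, illustrative chain: for a pure state, diagonalizing the block's density matrix and diagonalizing its complement's density matrix give the same entropy, so one is free to pick whichever matrix is smaller:

```python
import numpy as np

# For a pure state, S(block) = S(complement), so the entropy of a central
# 3-site block (d^3 x d^3 density matrix) can equally be computed from the
# complement (in an MPS that route would be m^2 x m^2).
rng = np.random.default_rng(3)
N, d = 7, 2
block = [2, 3, 4]                      # 3 sites in the middle
rest = [i for i in range(N) if i not in block]
psi = rng.normal(size=(d,) * N)
psi /= np.linalg.norm(psi)

def entropy_of(sub):
    other = [i for i in range(N) if i not in sub]
    M = np.transpose(psi, sub + other).reshape(d**len(sub), -1)
    p = np.linalg.eigvalsh(M @ M.T)    # reduced density matrix of `sub`
    p = p[p > 1e-14]
    return -np.sum(p * np.log(p))

S_block = entropy_of(block)            # diagonalize an 8 x 8 matrix
S_rest = entropy_of(rest)              # diagonalize a 16 x 16 matrix
assert np.isclose(S_block, S_rest)     # both cuts give the same entropy
```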

Ok, so basically, I can compute the entropy for a sub-block in the middle of the lattice that has N/4 sites, where N is the maximum number of sites I could completely solve by exact diagonalization.

For the Hubbard model, 12 sites is pretty big, so I would expect to max out with a 3-site block.

That's pretty depressing considering I need more than 5 sites to do anything useful. Looks like Monte Carlo sampling is the only possibility. Not sure if it is worth the effort.

Hi Jon, there's another idea that occurred to me recently that you could think about as an alternative to Monte Carlo sampling (which I agree is probably the only established method for computing such an entanglement efficiently, albeit very slowly).

The idea is to take some region "A" whose entanglement you want to compute. Say A consists of 16 sites in the center. Then trace all of the sites of the density matrix down to the first *two*. Diagonalize that two-site density matrix. Next trace all but the second two (so if the first two were sites n,n+1, now make the denmat for n+2,n+3) and diagonalize this density matrix. Now having done this for all site pairs in region A, use the truncated unitaries / isometries from the diagonalizations to coarse grain the MPS in this region, merging site pairs into single larger sites. Continue this process iteratively, building up a tree tensor network. Finally when the region has just shrunk to two super-sites, diagonalizing this last density matrix gives you the leading part of the eigenvalue spectrum to some controlled accuracy. You won't get the entire spectrum, but you may get most of the important eigenvalues this way.

This idea is inspired by my most recent paper, where I did something similar but in a machine learning context. Here is the link: https://arxiv.org/abs/1801.00315

Fig. 10 is the summary of the whole idea, but where in your case rho would start out as a reduced density matrix, rather than the full density matrix of the whole system.
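A single coarse-graining step of this procedure might look like the following in plain numpy (the truncation dimension chi and the random state are illustrative choices, not taken from the paper):

```python
import numpy as np

# One coarse-graining step of the tree idea: diagonalize the density matrix
# of a pair of sites, keep the chi eigenvectors with the largest eigenvalues
# as an isometry, and use it to merge the pair into one "super-site".
rng = np.random.default_rng(4)
N, d, chi = 6, 2, 3
psi = rng.normal(size=(d,) * N)
psi /= np.linalg.norm(psi)

# Reduced density matrix of the first pair of sites
M = psi.reshape(d * d, -1)
rho = M @ M.T                          # d^2 x d^2

w, U = np.linalg.eigh(rho)
iso = U[:, np.argsort(w)[::-1][:chi]]  # d^2 x chi isometry (top eigenvectors)

# Merge the pair: the coarse-grained state has a chi-dimensional first index
coarse = np.tensordot(iso, M, axes=(0, 0))   # chi x d^(N-2)
print(1 - np.linalg.norm(coarse)**2)   # truncation error (discarded weight)
```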

Ok that is an interesting idea. Would it be easier to just use a MERA from the very beginning?

I am considering using a disordered MERA for another project, so maybe I should just go ahead with it? I think MERAs are inherently better suited to computing entanglement for middle intervals, right?

Hi Jon, a MERA is strictly harder to use than a tree in every case, since a tree is a special case of a MERA. So I would not recommend trying a MERA until you try a tree first. Then I'd only try a MERA if the tree isn't giving you satisfactory results.

Miles

Ok thanks, I'll look into this.

How hard is a MERA to implement in ITensor?

When/if I start a MERA I'll probably start a new thread on the topic.

I think MERA is pretty straightforward to implement in ITensor. But it's a question of which MERA algorithm... MERA as a technique is pretty underdeveloped, I would say. I don't think there's a well-defined concept of implementing MERA per se. But if I were going to "grow" a MERA layer by layer, I think a good resource would be Glen Evenbly's recent paper about "implicitly disentangled MERA".

I was going to start with Glen's summary-style paper, "Algorithms for MERA", but since it is from 2009 I will definitely review the more recent one. Also I am planning to use his paper on disordered MERAs from October 2017.

Again, though. I would *strongly* suggest you don't try anything related to MERA until you have done the initial algorithm with a tree tensor network instead.

Cheers, Miles

Ok, maybe it is then time to create a new thread, something like "How to start using MERA":

1. Tree tensor network https://arxiv.org/abs/1801.00315

2. Evenbly paper 2017

Maybe I can write this up as a tutorial instead, "Introduction to MERA in ITensor"

I won't want to start this until I am definitely using it actively in my research, so it might be a while; my current project is taking its time.

Miles, do you think there is a publication waiting for someone that carefully compares these different methods of computing block entanglements? I'd really like to know which one to choose but I don't want to perform a full study unless I can publish it.

Hi Jon,

As far as I know there is not a publication on this subject. But the main way I would know would be by doing a careful search via Google Scholar on related papers and keywords, so a paper on the subject could have appeared since the last time I was thinking about it a few years ago without my noticing. But I think there is very little in the literature about efficient methods for getting the entanglement entropy of a "central" region, meaning a region "A" where the full system is BAB and B is the complement of A.

A few people I know told me that the sampling approach was clear to them, and I had gone as far as to write a draft about it, but never published it, as it got pretty tedious to do the sampling really well.

So that's why I think a non-sampling approach based on a tree tensor network would be nice assuming it works well.

- Miles

Hi Miles,

I re-read your machine learning article on a Tree Tensor Network application, and I just want to make sure I know how to implement the steps.

At the first step of the procedure, we produce a 2-site reduced density matrix.

Question: As usual, do we move the orthogonality center into one of these sites before making the reduced density matrix?

We then move to a new pair of sites, moving the orthogonality center before generating the isometries for each pair of sites.

Question: Is this correct?

After generating all the isometries, we contract them with the original MPS to produce a lower-rank MPS. Where is the orthogonality center during this stage -- is it moved to each MPS pair before contraction?

My confusion seems rooted in a misunderstanding of the DMRG procedure (and the role of the orthogonality center), so I'll review that; however, I probably still need your help to understand it completely. I noticed that your machine learning paper doesn't discuss the orthogonality center at all, so either the paper is too short to include all of the details, or it isn't necessary in that application -- correct?

Hi Jon, the answer to your question is that the MPSs I am using to compute reduced density matrices in the machine learning context are all product states (bond dimension 1). So every site is automatically an orthogonality center and there is no need in that case to shift it. But you're on the right track, in the sense that if one were coarse-graining an MPS (or a sum of MPSs, like in the machine learning paper) then one might want to place the orthogonality center on one of the two sites being optimized. The reason I say "might" is if one wants to "shortcut" the process of computing the full reduced density matrix. But this is an efficient thing to compute, so in principle one can just compute it and not worry about the orthogonality. Similarly, when coarse-graining with a tree, the orthogonality isn't an important consideration (it can be a different story for a MERA, though).

In a nutshell the main situation where the ortho center is important is if you are making some change to the MPS and need to do an SVD with a truncation. If the thing you're doing doesn't involve an SVD (such as the coarse graining step) then most likely you don't need to think about the ortho center.

But it's hard to give general "always" or "never" advice. Each situation is a bit different and in the end you just need to really understand the arguments or principles behind each step.

Ok, so for my application in which I want to compute entropy for a region of a wavefunction cut from a middle (two-boundary) portion of the 1-D MPS, I can move the ortho center to the first site inside the region of interest (as was suggested at the start of this thread for a two-site portion of the MPS), and then build up a set of isometries for pairs of sites. The isometries won't require that I move the ortho center, so long as the ortho center is left within the interval of interest?

Thanks again.

...