Absent Member.

DST Doesn't Migrate Files Correctly

I have a six-node cluster with 16 pools running amongst those nodes. 15 of the 16 have DST pools hanging off them. We keep DST migration policies shut off (inactive) all the time, only activating them to run twice a year (December and July). That way, we can back up the DST pools immediately afterward and not have to back them up again until the next time we run DST.

The problem is (and this is our first time running the policies as described above) that DST isn't picking up all of the candidate files. I've duplicated this on two different pools that are running on two different nodes in the cluster.

Say, for example, I want everything older than 6 months to go to the DST pool. I'll set the policy that way and then execute it. It will run and move a few files from primary storage to the DST pool. Sometimes it will move one, sometimes it will move hundreds. But the point is, it should move ALL of the candidates (files not modified for more than 6 months) in one fell swoop. I've run it at least two dozen times now on each of these pools (without changing anything about the policies), and each time it moves a few more files.

It's not that a few files at a time are suddenly reaching the 6-month mark: I can use 'find' and see that there are hundreds of candidate files. I can also run a volume inventory against the primary pool in NRM and see that there are hundreds of GB of files that should be moving to DST. (NRM shows about 450GB of files that are between 6 months and one year old.) All of those *should* move when I 'execute now' the policy on the server, but only a few at a time move. I do not have any other restrictions (max size, etc.) set in the policy that would cause some of the files to be skipped.
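For anyone who wants to script the same cross-check as the 'find' count, here is a minimal sketch (not part of DST or NRM). It walks a directory tree, treats a regular file as a candidate when its last-modified time (`st_mtime`) is older than a cutoff, and reports the count and total size. Approximating "6 months" as 182 days and the function name `mtime_candidates` are my own assumptions for illustration.

```python
import os
import stat
import time


def mtime_candidates(root, max_age_days=182, now=None):
    """Return (count, total_bytes) for regular files under `root` whose
    last-modified time is older than `max_age_days` days.

    Assumption: "6 months" is approximated as 182 days.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            # Count only regular files (skip symlinks, devices, etc.).
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_mtime < cutoff:
                count += 1
                total += st.st_size
    return count, total
```

Run against the primary volume's mount point (e.g. `mtime_candidates("/media/nss/VOL1")`, where the volume name is hypothetical), the count should roughly match what a full policy run ought to move. Note that `st_size` is the logical size, so with compression enabled it won't necessarily match space used on disk.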

Has anyone else come across this? I'm running OES2 SP1 and eDirectory 8.84 (if that matters... I don't see how it could, though). I do not have Salvage enabled. I do have compression enabled. There is currently no AV running on the servers. The backup product we're using (NetBackup) does not modify the files when it backs them up.

Much obliged,
Sam

Knowledge Partner

Ah, NetBackup. There's your issue, I believe. We also use NetBackup (it stinks).

I believe the issue is that NetBackup requires that you use this parameter for NSS and this is what messes up your DST (depending upon your setup):

/CtimeIsMetadataModTime

Caveats when using Symantec NetBackup 6.5.x with Novell OES2 SP1 Linux | Novell User Communities

Scroll down to like the 5th comment:

Adding the NSS flags in step #2 allows for the full protection of the NSS filesystem, including directory quotas, user quotas, extended attributes, etc. The only side-effect that I've seen is that the "/CtimeIsMetadataModTime" setting doesn't allow me to see the file create timestamp from the command-line anymore. This would change the behavior of Dynamic Storage Technology (DST) to look at ctime as essentially the last time the file was backed up (archive bit reset) instead of the create time of the file. If you want to use DST and desire the use of create time in your DST move job, you're going to have a conflict.
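To illustrate why a backup that rewrites ctime would change which files qualify, here is a small hypothetical model in plain Python (no NSS involved). Following the quoted comment, it assumes that with `/CtimeIsMetadataModTime` set, a backup that resets the archive bit bumps ctime but leaves mtime alone. A ctime-based rule then stops selecting files that an mtime-based rule still selects. `FileTimes`, `select`, and `simulate_backup` are illustrative names, not DST internals.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Set

DAY = 86400.0


@dataclass(frozen=True)
class FileTimes:
    name: str
    mtime: float  # last content modification
    ctime: float  # assumed: touched by the backup when the archive bit is reset


def select(files: Iterable[FileTimes], field: str, cutoff: float) -> Set[str]:
    """Names of files whose `field` ("mtime" or "ctime") is older than cutoff."""
    if field not in ("mtime", "ctime"):
        raise ValueError("field must be 'mtime' or 'ctime'")
    return {f.name for f in files if getattr(f, field) < cutoff}


def simulate_backup(files: Iterable[FileTimes], backup_time: float) -> List[FileTimes]:
    """Model a backup that clears the archive bit: ctime moves, mtime does not."""
    return [replace(f, ctime=backup_time) for f in files]
```

With an old file and a recent file, both rules pick the old file before the backup. After `simulate_backup`, only the mtime rule still picks it, which is the conflict the quoted comment warns about for policies keyed on create/ctime.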
Absent Member.

Hi kjhurni, thanks for replying.

Two things: 1) While I'm not defending it, NetBackup is the #1 backup software on the planet. It works great for 100% of the rest of our environment. It's a pain on OES Linux, however, as it doesn't honor TSA. This causes problems in that, in order for files to be backed up, they will all be decompressed. We came up with a workaround for this (flagging the entire volume +ic before backing it up, then removing that flag after the backup completes).

<aside>
It's really horrible that Novell pushed (forced) us all to another platform with no good way to back it up. I realize there are a couple of niche products out there that will work; however, a corporation shouldn't be expected to alter its entire backup strategy just for Novell. This was poor, poor planning on the part of Novell. The right thing to do would be to extend support for NetWare into 2020 (like Jack Messman promised at BrainShare) so that Bangalore has time to get the code right for OES-L. This backup problem is but one of a whole slew of problems on OES-L that did not exist on NetWare (e.g., we cannot load up an OES-L server with replicas because it shuts down our tree; eDirectory on Linux doesn't work right. The same number of replicas on NetWare works fine).
</aside>


2) I don't think I care about creation time when doing DST comparisons. I care about last modified time. That seems to work ok (see note 1) if I don't use the "Execute now" option in the DST policy. In other words, if I schedule it to run hourly, and let it kick off when it gets to a given time, DST works like it should. (see note 2)

Note 1: I have to use the "hourly" option to get this to work. If I use the "One time" option, it doesn't start, but if I change it to "hourly", it starts and appears to work OK.

except....

Note 2: There doesn't appear to be a way to stop a rogue DST job. For example, if a job kicks off and you realize that it's going to fill up the DST pool, there's no clean way to "kill" the job. The only way I've found is to kill httpstkd, then offline the pool and bring it back online. It would be great if there were a "stop" button so that I didn't have to impact a thousand users just to stop a process.

As a workaround for this (notice a common trend of required workarounds with OES-L?), before starting the DST job I make a temporary directory off the root of the NSS DST pool and populate it with about 10 GB of data. Then, if the pool fills up, I get the job to stop (if it isn't smart enough to stop on its own once the DST pool is out of space), and then I delete the 10 GB of temp data I had out there. I then have 10 GB of free space.
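Since the policy page has no safety threshold, one possible complement to the 10 GB placeholder directory is a pre-flight check before activating the policy: compare the DST pool's free space against the estimated size of the candidate set plus a reserve. This is only a sketch under stated assumptions: it uses Python's `shutil.disk_usage`, the 10 GB reserve mirrors the placeholder size above, the shadow volume mount path is hypothetical, and it only estimates; it cannot stop a DST job that is already running.

```python
import shutil

GB = 1024 ** 3


def fits_with_reserve(free_bytes: int, candidate_bytes: int, reserve_bytes: int = 10 * GB) -> bool:
    """True if moving `candidate_bytes` would still leave `reserve_bytes` free."""
    return free_bytes - candidate_bytes >= reserve_bytes


def check_dst_pool(mount_point: str, candidate_bytes: int, reserve_bytes: int = 10 * GB) -> bool:
    """Read free space on the mounted DST (shadow) volume and apply the reserve rule."""
    usage = shutil.disk_usage(mount_point)
    return fits_with_reserve(usage.free, candidate_bytes, reserve_bytes)
```

For example, `check_dst_pool("/media/nss/SHADOW1", candidate_bytes=450 * GB)` (volume name hypothetical) returns False if the roughly 450GB from the NRM inventory wouldn't fit with 10 GB to spare. With compression enabled, logical sizes may not match what the files actually consume, so treat the result as an estimate.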

Why is this important? Because of another issue I ran into: one of our departments put an executable program out on one of these pools. The .exe file got migrated to the DST pool. Then a DST job filled up the pool. Once the pool was full, the .exe would no longer run. The same thing happened on a different pool with a .vbs script.

Point is, letting the DST pool fill up isn't an option. Since there's not a 'safety threshold' setting in the DST policy page, this is the best way I found to work around it.

Whew... getting back to the original problem: DST not moving the complete set of files seems to be a problem only when I use the "Execute now" option. Since making my original post, I've found that if I set it to "hourly", it works correctly. The problem, though (among the others I've outlined here), is that if an hour passes and the job isn't finished, there's no way to manually stop it.

Sam

Knowledge Partner

Thanks for the info, Sam. Yes, I agree, NBU is #1 (Commvault a close 2nd, although Commvault actually supports OES far better than Veritas/Symantec).

I've tried to pressure Novell into getting Symantec (formerly Veritas) to fix this, but either Novell doesn't have the pull or Symantec doesn't care (I think the 2nd is more likely). The last I heard from Symantec is that they would "work" on this but had no road map or immediate plans to change anything, as it sounded like most of their customers are not using OES2.

Novell's position seems to be: use a different backup product.

It's kinda sad, but it's what we're stuck with until we either migrate to another OS (Microsoft) or magically find another $200,000 to either switch backup vendors or run TWO backup solutions, and then we'd have issues fighting with the tape robot about which product gets which tapes/drives.
Knowledge Partner

Well Sam, I just reproduced the same thing you saw. Doing an Execute Now randomly grabs files to move over to the shadow volume.

I've opened an SR with Novell and hopefully they can get this fixed.
Absent Member.

On Tue, 15 Jun 2010 14:26:03 +0000, kjhurni wrote:

> samthendsgod;1908579 Wrote:
>> Hi kjhurni, thanks for replying.



Two things:
(1) Doesn't help you right now, but a STOP button for DST policies is
being worked on.


(2)
What is the SR number you refer to for the Execute Now problem?


Thanks
Hans
Knowledge Partner

SR #10630303011

We'll know tonight if the scheduled job works (it should).

I don't normally do an "execute now" but we forgot to enable it last night, so I tried it today and it basically did the complete opposite of what I expected.

I have ONE EXCLUDE subdirectory.

So what did it do?

It moved stuff in that directory, plus a few other random files elsewhere, and that was it.
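For comparison, the expected behavior of a single exclude subdirectory can be written as a simple filter: a path is skipped when it lies at or below the excluded directory. The sketch below is an assumed model of that rule using `pathlib.PurePosixPath`, not DST's actual matching logic, and the `/VOL1/...` paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def is_excluded(path: str, excluded_dirs: Iterable[str]) -> bool:
    """True if `path` equals, or lies underneath, any excluded directory."""
    p = PurePosixPath(path)
    for d in excluded_dirs:
        ex = PurePosixPath(d)
        # Compare whole path components so "/VOL1/keeper" doesn't match "/VOL1/keep".
        if p == ex or ex in p.parents:
            return True
    return False


def filter_candidates(paths: Iterable[str], excluded_dirs: Iterable[str]) -> List[str]:
    """Drop every candidate path that falls inside an excluded directory."""
    excluded = list(excluded_dirs)
    return [p for p in paths if not is_excluded(p, excluded)]
```

Under this rule, `filter_candidates(["/VOL1/keep/a.doc", "/VOL1/projects/b.doc"], ["/VOL1/keep"])` keeps only `/VOL1/projects/b.doc`. That is the opposite of what the Execute Now run did here.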
Absent Member.

On Tue, 15 Jun 2010 19:46:02 +0000, kjhurni wrote:

> SR #10630303011
>


Thanks Kevin

I'll be monitoring the SR, as I'm curious about the results and findings.

Regards
Hans
Knowledge Partner

Me too.

I don't know if it's "fixed" in SP2, but the OP had the same problem and now I see it as well (granted, the OP was doing modified files, and I'm currently doing "last accessed").

If it DOES work that way, then it's a major bug, IMO, since it copies the complete opposite of what you tell it to, not to mention random other stuff as well.
Knowledge Partner

The scheduled job ran just fine. Execute Now does not. I've reproduced it on 2 servers so far.

Will have to test on SP2.
Absent Member.

On Wed, 16 Jun 2010 15:36:03 +0000, kjhurni wrote:

> The schedule job ran just fine. The Execute Now, does not. I've duped
> it on 2 servers so far.
>
> Will have to test on SP2


I know of an open bug about a DST policy not moving files correctly, but
this report seems to be slightly different from that one.

Thanks for the update
Hans
Knowledge Partner

It SEEMS to work on SP2 with "Execute Now".

However, on all 4 servers so far (3 of them are OES2 SP1, 1 is OES2 SP2 64-bit), even though I have specified files over 1 year old, it still seems to always grab the ~dfsinfo.xml or whatever from the volume root, even though it shows up as being like 1 day old or something.