Nexus Upgrade Adventures
We all want our services to be up to date: security patches applied, bugs fixed, new features available and all the other cool things. Well, life is not perfect and quite often you don’t have enough time for this. Other priorities pop up and you end up postponing the upgrade until it becomes a priority itself. That is exactly what happened with one of our Nexus services.
When developing software, you often depend on public libraries hosted in repositories such as Maven Central. During every build, libraries need to be downloaded from their repositories. You will also need a place to store your own artifacts so that other developers on your team can reach them. Instead of downloading libraries from various repositories, it would be nice to have a central point for all dependencies. This is where Sonatype’s Nexus repository manager comes in handy. It allows you to deploy your components, proxy repositories, store binaries and do some other pretty handy stuff.
Now, your build will query the Nexus repository for libraries. If Nexus doesn’t have a library locally, it will query the configured proxied repository, download the library, cache it and serve it back to your build. Next time the build needs that library, it will be served from Nexus’ cache. Once built, your app needs to be deployed somewhere and Nexus serves this purpose too. If you need to save some XML files as well, they can go to the same place.
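The proxy-and-cache behaviour described above can be illustrated with a small toy model. This is only a sketch of the idea, not Nexus code; the class name, the `fetch_upstream` callback and the `upstream_hits` counter are made up for the example.

```python
from typing import Callable, Dict


class ProxyRepository:
    """Toy model of a caching proxy repository (not how Nexus is implemented)."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        # fetch_upstream is a hypothetical function that downloads from the
        # proxied (remote) repository.
        self._fetch_upstream = fetch_upstream
        self._cache: Dict[str, bytes] = {}
        self.upstream_hits = 0  # how many times we had to go to the remote

    def get(self, path: str) -> bytes:
        # Cache miss: download from the remote once and keep a local copy.
        if path not in self._cache:
            self.upstream_hits += 1
            self._cache[path] = self._fetch_upstream(path)
        # Cache hit (or freshly cached): serve the local copy.
        return self._cache[path]
```

Requesting the same path twice contacts the remote only once; the second request is answered from the local cache.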
Deploying snapshots, caching libraries from other repositories and saving files on Nexus requires a lot of storage. Eventually, all these things will eat up your storage and you’ll need to clean it up.
One of the features Nexus provides is a cleanup policy. It allows you to define which components should be removed based on certain criteria, for example all Maven snapshots which have not been downloaded in the last 30 days. Creating a cleanup policy is pretty straightforward: set the name, format and cleanup criteria:
- Name: snapshots-cleanup
- Format: maven2
- Last downloaded before: 30 days
- Release type: Pre-Release / Snapshot Versions
Cleanup criteria can be component age, component usage, release type or asset name. At least one needs to be set, but all of them can be defined too. Our case was quite simple, so only one criterion is set.
This policy needs to be applied to the desired hosted Maven repositories, in our case the maven-snapshots repository. Applying the policy will not actually remove anything. Removal is done by cleanup tasks. By default, the server creates a task named “Cleanup service”, of type “Admin – Cleanup repositories using their associated policies”, which runs daily at 1 AM. This task executes the cleanup of all repositories which have a policy other than None set. These cleanups perform a soft delete, meaning nothing is really deleted, just marked for deletion. The files can’t be accessed any more, but the storage is still used. To actually reclaim space, the “Admin – Compact blob store” task needs to be created and executed.
We did exactly that: the snapshots-cleanup policy is applied to the maven-snapshots repository, the Cleanup service task is executed every night at 1 AM and the Compact blob store task one hour later, at 2 AM. The number of files is reduced and a certain amount of storage is reclaimed. However, it is not what we expected. The maven-snapshots repository is still taking up too much space.
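To make the difference between the two tasks concrete, here is a minimal sketch of soft delete followed by compaction. It is a toy in-memory model under our own assumptions; `Blob`, `BlobStore` and their methods are invented names and say nothing about how Nexus stores blobs internally.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Blob:
    size: int
    soft_deleted: bool = False


@dataclass
class BlobStore:
    blobs: Dict[str, Blob] = field(default_factory=dict)

    def used_bytes(self) -> int:
        # Soft-deleted blobs still occupy storage.
        return sum(b.size for b in self.blobs.values())

    def readable(self, name: str) -> bool:
        blob = self.blobs.get(name)
        return blob is not None and not blob.soft_deleted

    def cleanup(self, names: Iterable[str]) -> None:
        # Step 1 (cleanup task): only mark the blobs for deletion.
        for name in names:
            if name in self.blobs:
                self.blobs[name].soft_deleted = True

    def compact(self) -> int:
        # Step 2 (compact task): physically drop marked blobs, return freed bytes.
        doomed = [n for n, b in self.blobs.items() if b.soft_deleted]
        freed = sum(self.blobs[n].size for n in doomed)
        for n in doomed:
            del self.blobs[n]
        return freed
```

In this model, `used_bytes()` stays the same after `cleanup()` and only drops once `compact()` has run, which mirrors why both tasks have to be scheduled.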
Turns out, this is a known issue. The cleanup policy matches files based on the value of their Last downloaded attribute. In our case, if a file was last downloaded 31 days ago or more, it will be removed because 31 > 30. Makes sense. Yet, what if a file was never downloaded? Its Last downloaded attribute is “has not been downloaded” and, for Nexus, “has not been downloaded” is not greater than 30. We seem to have A LOT of those.
Sonatype fixed this in Nexus 3.20.0, so upgrading Nexus finally became a priority.
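The matching problem can be sketched in a few lines. This is our own reconstruction of the behaviour we observed, not the actual Nexus implementation; the function name and the use of `None` for “has not been downloaded” are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def matches_before_fix(last_downloaded: Optional[datetime],
                       now: datetime,
                       max_days: int = 30) -> bool:
    """Return True if the file would be picked up by the cleanup policy."""
    # A file that was never downloaded has no date to compare against,
    # so it never satisfies "last downloaded more than max_days ago".
    if last_downloaded is None:
        return False
    return now - last_downloaded > timedelta(days=max_days)
```

With this logic a snapshot downloaded 31 days ago is matched, while a snapshot that was uploaded years ago and never downloaded is kept forever.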
Time for an upgrade
Upgrading through multiple versions is always going to be a complex process. Ideally, we would go through release notes, check what’s changed and upgrade version by version. However, we don’t have that much time so we’ll try a direct upgrade.
Two rules to keep in mind:
- ALWAYS create a backup;
- NEVER test it in production.
The first step is always to back up everything! Seriously, Nexus upgrades modify the database and it can’t be downgraded.
Our Nexus instance is deployed in a Docker container, so upgrading would require a simple change of the Docker image version. Easy, right?
We also want to implement full Configuration as Code (CaC), which requires at least version 3.17.0. So, to test our CaC, we upgraded to 3.17 and it went surprisingly smoothly. Next step, a direct upgrade to 3.27.0.
This process can take a bit longer depending on the size of the database. After 10-15 minutes, the upgrade is completed. There are no errors in the logs and the web UI looks nicer, except for one thing. Most of our repositories are gone! This is where you can start panicking. Especially if you didn’t follow the two rules set out in the beginning.
What could’ve gone wrong? Not all repositories are gone. A couple of random repositories are still there, some hosted, some proxied, seemingly without any pattern. Checking the total storage used, it’s roughly the same as before the upgrade, so the data is not really lost. Going through the logs doesn’t show anything that could relate to missing repositories. Rebuilding the search index doesn’t make sense since it’s used only for proxied repositories and we’ve lost hosted repositories too. At this point, two options are available:
- Go through release notes for every release and hope to figure out what’s broken, then fix it.
- Start from the beginning and upgrade one version at a time. Once you face the same situation again, you can narrow your investigation scope to that one specific release.
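The second option boils down to a simple loop: upgrade, check, repeat, and stop at the first release where things look wrong. A minimal sketch follows; `upgrade_to` and `looks_healthy` are hypothetical callbacks you would have to supply yourself (for example, switching the image version and checking that the repositories are visible).

```python
from typing import Callable, Iterable, Optional


def first_broken_version(versions: Iterable[str],
                         upgrade_to: Callable[[str], None],
                         looks_healthy: Callable[[], bool]) -> Optional[str]:
    """Upgrade through versions in order; return the first one that looks broken."""
    for version in versions:
        upgrade_to(version)          # hypothetical: perform the upgrade
        if not looks_healthy():      # hypothetical: e.g. are repositories visible?
            return version           # investigate this release's notes
    return None                      # every step looked fine
```

The returned version tells you which single set of release notes is worth reading closely.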
Upgrade no. 2
Having a backup means we could go with option two. We went back to the beginning, restored our backup and started upgrading version by version. Luckily, the same thing happened at version 3.18.0. Checking its release notes, one thing came up:
“The table that drives the repository browse tree will be rebuilt after upgrading to version 3.18.0. This will make the UI appear empty for a short period of time.”
Makes sense: our repositories were not really gone, we just couldn’t see them in the browser. So the only thing we need to do is wait for a short period of time. After realising that 30 minutes is probably not considered “a short period of time”, we started checking our other options. Turns out, we can do this manually too. Nexus added a “Repair – Rebuild repository browse” task which you can trigger manually. Running this task brought back our repositories in a few minutes.
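We ran the task from the web UI, but an existing task can also be started through the Nexus REST API. The sketch below only builds the HTTP request for the tasks endpoint (`POST /service/rest/v1/tasks/{id}/run`) and does not send it. It assumes the rebuild task has already been created and that you know its id; the function name, host, task id and credentials are placeholders.

```python
import base64
import urllib.request


def build_run_task_request(base_url: str, task_id: str,
                           user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks Nexus to run a task."""
    # HTTP basic auth; the credentials here are placeholders.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/service/rest/v1/tasks/{task_id}/run",
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would actually trigger the task, so try it against a test instance first.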
Let’s try another direct upgrade to version 3.27.0. Same thing happens again, only this time we know the cause and the solution. Rebuilding repository browse solved our problem once more. Everything looks good and it’s time to try the cleanup policy. It took a bit longer than usual, but after cleaning everything, we recovered around 70GB of storage.
This “fixed” cleanup policy will now check the Last downloaded attribute and, if it’s “Has not been downloaded”, the task will check the Uploaded date property. If the date matches the criteria, the file gets removed.
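The fallback described above can be sketched as follows. Again, this is an illustration of the behaviour as we understand it, not Nexus source code; the function name and the use of `None` for “Has not been downloaded” are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional


def matches_after_fix(last_downloaded: Optional[datetime],
                      uploaded: datetime,
                      now: datetime,
                      max_days: int = 30) -> bool:
    """Return True if the file would be picked up by the cleanup policy."""
    # Never downloaded? Fall back to the upload date instead of skipping the file.
    reference = last_downloaded if last_downloaded is not None else uploaded
    return now - reference > timedelta(days=max_days)
```

A snapshot that was uploaded long ago and never downloaded is now matched, which is where most of our reclaimed storage came from.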
A direct Nexus upgrade worked for us, but as always, create a backup. It will save you from a lot of stress if something goes wrong. The repository browse improvements Nexus implemented in v3.18.0 will make your repositories appear empty. They are not, and this can be easily fixed using the “Repair – Rebuild repository browse” task. And finally, the cleanup fix added in v3.20.0 will save you gigabytes of storage.
DISCLAIMER: While a direct upgrade from 3.17.0 to 3.27.0 was safe in our case, it doesn’t necessarily mean it is safe in every case or for any other version.