RPM Packages Explained (fedoramagazine.org)
85 points by qndev on June 27, 2019 | 55 comments


This does not explain RPM packages; the title is clickbait. But I'll try my hand at it...

An RPM file is a compressed CPIO archive with some magic flags and an embedded key-value store.
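
You can see this structure for yourself with the standard tools (the package name here is hypothetical):

    # peek at the embedded key-value metadata
    rpm -qp --info foo-1.0-1.x86_64.rpm
    # unpack the CPIO payload into the current directory
    rpm2cpio foo-1.0-1.x86_64.rpm | cpio -idmv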

When installed or uninstalled, RPMs execute arbitrary scripts ("scriptlets") at various stages to manage changes to the operating system before, during, and after install or uninstall. So they introduce not only file changes (including changes to the system's RPM database and its index/lock), but also arbitrary system state changes. Fun!
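
Those stages are the %pre/%post/%preun/%postun scriptlets; you can inspect them without installing anything (hypothetical package name again):

    rpm -qp --scripts foo-1.0-1.x86_64.rpm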

RPM uses a global package database/index/lock to track what is installed and the dependencies. This can sometimes become corrupted, and then you may have to remove the lock files and rebuild it.
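
The classic recovery dance looks roughly like this (the __db.* paths apply to the traditional Berkeley DB backend; recent rpm releases use SQLite and differ):

    # remove the stale lock/index files, then rebuild the database
    rm -f /var/lib/rpm/__db.*
    rpm --rebuilddb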

(edit: this partly applies to yum) Dependency resolution is crap, because it isn't based on a Merkle tree or a transaction log. It's more of a lame recursive DAG that cares about "what does this system currently have and what can I find in the package repo", rather than "what was this thing actually built with/for, and does it make sense to install at all". Packages are pretty much never built with a "base operating system" kind of dependency resolution, so it's possible to install a package from a completely different distro/version that was based on the one you are using. If you're lucky you won't be able to install the wrong package because of the recursive dependencies, but not everyone is lucky, and not all packages are built properly. It's also possible to create circular dependencies so that an installed package cannot be uninstalled.

A .spec file defines what to build and how to build it. A .srpm contains the source code of the application and the .spec file (handy!). A .rpm file is just one of potentially several packages that can result from building a .spec file, and the source can also be spread across multiple files. It's common for patches to be included in the .srpm and applied at build time.
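
To give a feel for the format, a minimal spec file looks something like this (everything below is a made-up example):

    Name:     hello
    Version:  1.0
    Release:  1%{?dist}
    Summary:  Example package
    License:  MIT
    Source0:  hello-1.0.tar.gz
    Patch0:   hello-fix-build.patch

    %description
    A toy package to illustrate the format.

    %prep
    %autosetup

    %build
    make %{?_smp_mflags}

    %install
    make install DESTDIR=%{buildroot}

    %files
    /usr/bin/hello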

A package repository typically contains both compiled binary packages organized by architecture (.rpm) and source packages (.srpm). If you add or remove a package from a repository, you need to regenerate the metadata files the repo uses to communicate changes to tools like yum, or yum will have no idea what you added or removed.
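
Regenerating the metadata is a single command (createrepo_c on current Fedora/EL, plain createrepo on older systems; the path is illustrative):

    createrepo_c /srv/repo/el7/x86_64/
    # clients then need to refresh their cached copy
    yum clean metadata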

RPM is actually fairly portable. A single .srpm file can build packages for Solaris, Windows, HP-UX, Linux, FreeBSD, etc. It can be a very good complement to whatever the native packaging is, as long as you keep all packaged files in a unique file tree (like /opt/my_pkgs/).


Is it any better/different than deb packages? (I'm NOT trying to start a flame war or a Debian vs. Fedora troll thread, just asking out of pure curiosity.)


It's definitely different, but both formats can do basically the same things. Personally I think a .spec and a .srpm are more succinct.

Where they really differ is the ecosystem. Debian has better tools for managing packages, and their community actually builds them better. But I've also been through dependency hell with both; if you want a bleeding edge release, it's way easier to skip packaging and manually build+install updated software to a unique directory.


This article goes into more detail on RPM: https://xyrillian.de/thoughts/posts/argh-pm.html


Came for the clickbait title of the original article, stayed for the awesome comments :)


Should we change the URL to that above?


Don't. It's proof that the comment section is awesome!


cruel :)


It's pretty crazy considering that RPM has been around since at least 1997, when I first installed Linux from floppy.

No wonder rpm no longer gets much love. RPM is too old, but sadly not old enough. My hope is that in 20 years' time @Foone[1] will warm our hearts with a thread on how he used RPMs to install the GNU toolchain on a Fedora-based IoT toaster from the late 2000s.

[1] https://twitter.com/foone


The current binary Debian package format dates to 1995. RPM doesn't get any love because Red Hat never seriously committed to improving it. Maybe that's because it was a poorer design than Debian's. The Debian build system is more layered, which arguably makes it easier to extend and evolve. At some point RPM got embedded Lua, but it seems half-baked, and because the build phases are too closely coupled I'm not surprised methods and protocols for extending things using Lua never emerged in practice--there aren't enough degrees of freedom in RPM's build model to make extensions seem more than just a hack.[1] In any event the situation could be better, there's just not enough will to do it.

[1] For example, IIRC Lua extensions in RPM were merely a convoluted way to extend RPM's macro system. You're still limited to simply defining macro variables in RPM's rigid templating and translation stages. Most of the complexity of RPM is in the places you least need it, and all the pain points are in places RPM doesn't permit you much, if any, freedom to tweak or automate.


In the times I've engaged with the rpm.org project, I've heard of ideas being floated about making the spec format better and even making a better package archive format.

But there's nothing concrete proposed by anyone for the rpm developers (or indeed anyone else) to consider for implementation.

We're probably at the point now where it's a good idea to start looking at it, but what should be done?


Came here to say the same thing. ;)


This should be upvoted.


After spending years messing with RPM packages in the sysadmin world, I now breathe a sigh of relief with Arch/pacman. Building/maintaining/installing RPMs is painful compared to Arch or Debian's dpkg.


Having worked a lot with RPM, other package managers, containers and proprietary build systems... I would choose debs every time.


I have the exact opposite experience. Building/packaging/maintaining my own software with RPMs is pretty straightforward, whilst packaging the same software as .deb is a rollercoaster which often goes off track and down a deep hole.


RPM makes the simple easy and the difficult impossible. RPM's grace and sin is its simplistic, rigid, singular spec file.

The startup costs to Debian packaging are greater. In the most basic case, not that much greater--a single control file with intuitive fields, and a rules file (i.e. a Makefile) with intuitive targets. However, doing things according to best practice (i.e. official Debian policy) often requires additional tools and configuration files. So, yeah, it can become byzantine. But if you realize that it all comes back to generating that control file and rolling up a target directory, same as RPM, it becomes easier to make sense of everything.
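
For the curious, the basic case really is small. A sketch (all names and fields made up), using the debhelper-style rules file:

    # debian/control
    Source: hello
    Maintainer: Jane Dev <jane@example.com>
    Build-Depends: debhelper (>= 9)
    Standards-Version: 4.1.2

    Package: hello
    Architecture: any
    Depends: ${shlibs:Depends}, ${misc:Depends}
    Description: toy example package

    # debian/rules (an executable Makefile; the dh line needs a tab)
    #!/usr/bin/make -f
    %:
            dh $@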

The important distinction is that all the complexity, required and optional, in Debian packaging can be hidden within the confines of the packaging framework. That is, whatever your build rules and workarounds, anyone trying to build your package can just invoke a standard dpkg build without having to understand any of it. Working around deficiencies in RPM often requires adding and exposing additional layers of complexity, like generating spec files in ad hoc pipelines before a user can invoke rpmbuild.

For example, do you autogenerate your RPM package version from the source? How do you embed it within the spec file? And are you able to also generate a source tarball that can generate the same build? It's one thing to cobble together an RPM that explodes to some binaries; it's another to do so in an automated fashion that's also transparent--i.e. generates SRPMS that can be built within clean environments. I know, I know, CI/CD systems can obviate the need to build binary RPMs from SRPMs, but that's exactly my point. RPM alone is seriously deficient. There are lots of little problems like this that pop up when your needs become more complex; you end up building taller towers of complexity than what would be needed using Debian.


You can build with rpmbuild without making SRPMS (pass -bb instead of -ba), but in my experience, most CI/CD systems do not provide clean, arbitrarily reproducible environments.

That said, rpmbuild has grown features over the years to make it more compatible with that model.

My experience with RPMs, Debian packages, and various other formats has led me to prefer some of the rigidity and explicitness of rpmspec over the sprawl of automagic that is a debian/ tree, and other similar things. But if you want to do weird things to build packages, nothing stops you from doing that and using a trivial spec file as the intermediate to pass instructions to rpmbuild to wrap it up.

The format is not perfect, but I haven't seen anyone make any proposals recently about how to improve it on the rpm-ecosystem mailing list, the rpm-maint mailing list, the rpm project on GitHub, or at any of the developer conferences where rpm developers show up.

You seem to have a ton of knowledge and opinions, why not engage with the project?


Having said what I did above, though, can anything compare to SUSE's command to update? "zypper up"

Best command ever...


The initial (buggy) versions of zypper replacing YaST were what drove me away from SuSE for good. I never had much love for SuSE, but what really killed it for me was when I worked as a consultant (at a company that was a SuSE premium reseller) and was the poor sod trying to integrate their solutions at large organizations that didn't have IT/tech as their core business (hospitals, government, etc.). I still have flashbacks to their SLOX (SUSE Linux Openexchange Server) and SLES, which were the two buggiest distros ever shoved down my throat. I felt sorry for my customers. Once, the entire IT of a hospital stopped working because of bugs in their LDAP migration. I need therapy just from reflecting on this and on how they handled root causes back then.

From your comment it seems SuSE must have improved since 2005. I haven't followed them for a long time now. It's Arch or Debian for me these days.


I still have flashbacks from circa 2000-2002: being in "rpm hell" and trying to install a single package on my system, only to find out it needs libutemptr-3.0.1.rpm, whereas my Yahoo searches only found a download for libutemptr-3.0.2a.rpm. I'd try downloading that anyway, wait 20 minutes on my dial-up connection, only to find that libutemptr needs libgdkpixbuf-7-9.01.rpm. 50% of the time, one of these dependencies would clash with whatever was already on my system and refuse to install. 49% of the time, it wouldn't be the right version, so it wouldn't work anyway. And 1% of the time I could eventually make it work if I just extracted the contents and changed my LD_LIBRARY_PATH to satisfy the dependencies manually, defeating the very purpose of rpm.

rpm hell was the Dark Ages of FOSS, and I'm so glad that the community has resolved that problem. However, the bitter taste still lingers, and for this reason I will never take rpm files seriously again.


Why do you blame rpm as a packaging format, instead of the maintainers of those rpm packages?


Because a good format/protocol should be easy to use correctly, and difficult to use incorrectly.


I'm curious how you find rpms more painful to build than debs?

I don't have much experience with them, but I was under the impression that just writing a specfile and calling rpmbuild was a step up compared to Debian, which seems a mess to me with the makefiles and other stuff all over the place and all those dpkg commands which are wrappers for each other.


- Packages with mutually exclusive dependencies are troublesome in the RPM world

- yum/dnf take extraordinary amounts of time compared to apt

- debs are all in one source. Upstream sources are clean, predictable, and easy to troubleshoot.


I thought dnf was the new hotness in the package management world. Is apt still superior?


apt was never superior. The biggest thing in dnf for me is transaction support. Basically, every time you do a package operation, dnf records the changes to its logs, and you can then revert it, downgrading the upgraded packages, reinstalling the removed ones, and so on.

You can also revert multiple transactions at once, e.g. restore the package versions to what they were three months ago.

There's nothing remotely like this in apt -- if you install a lot of packages on your system, and then want to remove them, the common solution is something like "parse the (text) apt logs, learn manually what's changed, and revert it yourself".
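
Concretely (the transaction IDs are whatever `dnf history list` shows on your system):

    dnf history list          # all recorded transactions
    dnf history info 42       # what transaction 42 changed
    dnf history undo 42       # revert just that transaction
    dnf history rollback 42   # undo everything done after transaction 42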

Some other niceties:

* package groups: `dnf install @'c development'` instead of `apt install binutils gcc make bison yacc whatever-else`. Somewhat mitigated by metapackages, but those rarely cover a whole group of functionality, i.e. they are often more low-level.

* package streams: `dnf module install nodejs:8`, or `nodejs:10`, or one of the other supported streams;

* dnf downloads many packages in parallel, and the level of parallelism is configurable (see the config snippet after this list). This greatly helps with upgrades from slow mirrors. I believe it can also utilize multiple mirrors at once, though I haven't tried it.

* dnf supports plugins and you can install them with a simple command. For example, there's a plugin to automatically create btrfs snapshots on every package upgrade.

* COPR saved my skin a couple of times, though it's basically the same thing as Ubuntu's PPAs. The lack of an equivalent is more of a problem for Debian.
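
On the parallel-downloads point above, it's a one-line setting in /etc/dnf/dnf.conf (the value here is just an example):

    [main]
    max_parallel_downloads=10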


Interesting, thanks. How do you automatically revert a setup script though? Can you just not have setup scripts with dnf?


dnf is a generational improvement over yum. It still uses RPMs behind the scenes.


The reason I stayed with Red Hat/rpm so long ago was that rpm kept the native upstream source file untouched and pristine (most often a tarball). It was very easy to see exactly what was being patched, and the build process was driven by the specfile. This appealed to me as opposed to Debian, which often would just mash local patches together with the source (I hear debs do have the ability to keep the source "pristine", but for whatever reason DDs didn't usually do that).


This is patently false. Source packages with Debian aren't even a single file. They consist of a pristine upstream tarball, a tarball of a /debian/ directory, and a dsc file that describes the package, including checksums of the two tarballs.

The /debian/ directory contains all distro-specific patches, as well as build rules (a Makefile) and package scripts.

It's possible to build a 'debian native' package where the /debian/ directory is bundled up with the orig source, but those are few and far between -- primarily packages consisting of debian-specific scripts/tools.
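
As a concrete illustration, a Debian source package is these three files sitting next to each other (version numbers made up):

    hello_2.10-2.dsc             # description + checksums of the other two
    hello_2.10.orig.tar.gz       # pristine upstream tarball
    hello_2.10-2.debian.tar.xz   # the /debian/ directory: rules, patches, scripts
    # "dpkg-source -x hello_2.10-2.dsc" unpacks and patches it in one step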


All true. Debian and Debian-derived distributions invariably manage their Debian build rules separately from upstream. In that sense there's little difference between RPM and Debian.

However, if as a developer you want to include packaging support inside the project (for your own or your users' convenience), this is infinitely easier to accomplish with Debian than RPM packages. Various aspects of RPM--e.g. spec file template semantics, source and build directory locations--not only assume that packaging is done separately, but make violating those assumptions costly. Whereas Debian package build tools rarely make such assumptions.

If I want to add "deb" and "rpm" targets to my Makefile, the deb target is a one-liner that directly invokes a dpkg tool using standard options. The rpm target, by contrast, may require generating a specfile (because some specfile variables aren't trivial to evaluate dynamically inline, if at all) and a source tarball, or going through additional contrivances to avoid either of those two things.
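
To make the asymmetry concrete, here's roughly what those two targets tend to look like (package name, paths, and the spec-templating step are all illustrative; recipe lines need tabs):

    deb:
            dpkg-buildpackage -us -uc -b

    rpm:
            # rpmbuild wants a literal Version: in the spec and a tarball in SOURCES
            sed 's/@VERSION@/$(VERSION)/' mypkg.spec.in > mypkg.spec
            git archive --prefix=mypkg-$(VERSION)/ \
                -o $(HOME)/rpmbuild/SOURCES/mypkg-$(VERSION).tar.gz HEAD
            rpmbuild -bb mypkg.spec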


`rpmbuild` has had a `--build-in-place` feature for a number of years (since rpm 4.13.0). With that, you don't need a source, or a tarball, and the actions of %prep, %build, and %install are done from the current working directory.
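
Usage is straightforward (spec name hypothetical):

    cd ~/src/myproject
    rpmbuild --build-in-place -bb myproject.spec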

This is similar to what you have been able to do to make debs with dpkg-deb.


It doesn't explain anything!

This is an in-depth user guide, at most.


Thanks, I won't bother

I know an rpm is a cpio archive, with the files and some scripts, but I'm not sure of the capabilities, when the scripts are run, etc. (I don't encounter rpms much - I'm a deb person, and I'm happy with my "ar -x" to explode a deb and look at the dependencies, conffiles, install/remove scripts, etc., to install our custom packages on any rare CentOS servers we may need.)


It is a cpio archive with a custom header, so `rpm2cpio package.rpm | cpio ...` is needed to explode it.

To look at scripts/dependencies/etc. you don't need to explode it at all; rpm has switches for querying that.
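
The common query switches, for reference:

    rpm -qp --info     package.rpm   # header metadata
    rpm -qp --list     package.rpm   # file manifest
    rpm -qp --requires package.rpm   # dependencies
    rpm -qp --scripts  package.rpm   # install/uninstall scriptlets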


What's baffling to me is that there's support for signing, but no support for an encryption layer. When delivering software updates for embedded devices, I need to do both, so I have to roll my own container format. This seems to also be the case with package formats like ipkg/opkg, which are targeted at the embedded use case.


Why not have a password-protected repository per client? If you need public dependencies, build them once and copy them to all repositories.

HTTP supports inline usernames and passwords; use HTTPS to keep the password encrypted on the wire: https://username:pass@server.tld/...
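
In yum/dnf terms that's just a .repo file per client (hostname and credentials below are made up; don't ship real ones in the clear):

    [client-private]
    name=Per-client private repository
    baseurl=https://client1:s3cret@repo.example.com/el7/$basearch/
    enabled=1
    gpgcheck=1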


What's the use-case for an encryption layer?


Personally I don't feel like the "hiding from the listener" use case discussed in the other response is very critical. What I think _pmf_ is getting at is an "only authorized devices may install my software, or view the RPMs" requirement.

You could accomplish this by having a keypair on your field/embedded devices, and then having the RPM distribution system pull each device's public key from the keyserver, build the RPM with encryption specifically for that device, and push it out. Or maybe you choose to have a generic keypair for a class of devices.
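
A minimal sketch of that flow with plain GPG (key ID and filenames are made up):

    # build host: encrypt the package for one device's public key
    gpg --encrypt --recipient device42@example.com foo-1.0-1.x86_64.rpm
    # on the device: decrypt, then install as usual
    gpg --output foo-1.0-1.x86_64.rpm --decrypt foo-1.0-1.x86_64.rpm.gpg
    rpm -U foo-1.0-1.x86_64.rpm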

This could be used in cases where you have internal secrets in the RPMs you are building, or in the case of things like proprietary software and software licensing. I don't see how this applies to open source OS updates which is what I think the other sub-thread seems to be fixated on for some reason.

Whether this belongs in the RPM system itself or in a wrapper format, I'm not so sure of.


Hiding packages you have installed from your ISP/NSA/etc.

This discussion comes up time and time again (in rpm, apt, et al.). The consensus is: if you need that extra feature, manually download sensitive packages via SSL or something. Everyone else (with nothing to hide, heh) keeps benefiting from a global cache of unencrypted transport of (mostly) open source data.


Transport security & confidentiality make sense (though at first I was trying to work out how an encrypted yum package would work).

Yum with CentOS 6 and above does support SSL for mirror sites and a handful of global mirrors also support it (HEG being one).

I suppose there's a slight bootstrapping problem (e.g. how do I update the CA-Certificates bundle when I need the new CA-Certificates bundle to connect to the mirror site to download the update), however I tend to agree there should be some privacy by default.


As pwnna pointed out, package size gives you away.

The real way to protect against this, if it's genuinely part of your threat model, is to maintain a complete local mirror: you can't tell what is installed and at what versions if you simply download everything.

And if it's actually part of your threat model, then you likely have a large enough install base that you need a local mirror for performance/non-security reasons anyway. So it's really a non-issue.


You can cache things that are encrypted too, or do you think DRM-protected Netflix videos are all streamed from the origin? Yeah, it's a bit more complicated...


If by "origin" you mean "box Netflix has root on"... yes, I do think that?


Netflix runs a fleet of their own CDN boxes, that they put in ISP data centers.


The combination of IP addresses and package sizes is way too revealing. That's why APT supports Tor as a transport protocol.


Does that help? I thought the package size is quite revealing.


In some cases (although the server could presumably send some random-length data headers if that's a concern), but if you download multiple packages over a single connection, can it still be tracked?


The sizes of all packages are known information. So if someone is dedicated enough to track your downloaded packages, figuring out which ones were transferred within a single connection is a relatively simple integer programming task.

If you want to really hide what you are installing, make a local mirror of the entire repo and then pick and choose from that.


I thought _pmf_ was describing packages that he authored, and certainly if the contents of them are confidential, they would be in a private repository.

I don't think that the RPMs I have created in my internal repository and deploy to my field systems are 'known information' to anyone outside of my organization. If they are, I'm in serious trouble.

I think a more realistic use case for package-level encryption is deploying RPMs that have secrets in them (either keys/creds in configuration, or trade secrets in application logic). Ideally, of course, we should encapsulate these so that they aren't deployed to field/embedded devices, but in embedded there may well be use-cases and requirements that those of us used to working in data centers and cloud computing aren't immediately thinking of.


This is for embedded Linux devices with a set of proprietary applications (small-volume commercial/industrial control, no consumer devices). They're mostly non-networked, so we have to distribute application updates via USB (in the networked case, we use VPNs separated by customer group to deliver updates, so that layer provides the encryption without requiring the update archive itself to be encrypted).


Some time back, a sysadmin at my friend's workplace asked them to package Java applications as RPMs for deployment. I thought that was not a great idea and maybe the wrong tool for the job. But I'll take this opportunity to ask people here: was it an OK suggestion?


I found this very enlightening. Thank you!



