> It turns out you can create a UDP socket with a protocol flag, which allows you to send the ping rootless
This is wrong, despite the naming convention of the Rust library in question. You're not creating a UDP socket. You're creating an IP (AF_INET) datagram socket (SOCK_DGRAM) using protocol ICMP (IPPROTO_ICMP). The issue is that the Rust library apparently conflates datagram and UDP, when they're not the same thing.
You can do the same in C, by calling socket(2) with the above arguments. It hinges on Linux allowing rootless pings from the GIDs in
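For comparison, here is a minimal Rust sketch of that same socket(2) call, made without any crate. The constant values are the Linux ones written out by hand, and the function name `open_icmp_datagram_socket` is made up for illustration. Whether the call succeeds depends on how the system is configured, so the sketch just reports the error if the kernel refuses.

```rust
use std::io;
use std::os::fd::{FromRawFd, OwnedFd};
use std::os::raw::c_int;

// Linux values, written out by hand so the sketch needs no external crate.
const AF_INET: c_int = 2;
const SOCK_DGRAM: c_int = 2;
const IPPROTO_ICMP: c_int = 1;

unsafe extern "C" {
    // socket(2) from the platform C library that std already links against.
    fn socket(domain: c_int, ty: c_int, protocol: c_int) -> c_int;
}

// Hypothetical helper: an IP datagram socket speaking ICMP, not UDP.
fn open_icmp_datagram_socket() -> io::Result<OwnedFd> {
    // SAFETY: socket(2) takes three plain integers and touches no memory of ours.
    let fd = unsafe { socket(AF_INET, SOCK_DGRAM, IPPROTO_ICMP) };
    if fd < 0 {
        return Err(io::Error::last_os_error());
    }
    // SAFETY: fd is a freshly created descriptor that nothing else owns.
    Ok(unsafe { OwnedFd::from_raw_fd(fd) })
}

fn main() {
    match open_icmp_datagram_socket() {
        Ok(fd) => println!("got an ICMP datagram socket: {fd:?}"),
        Err(e) => println!("socket(2) refused: {e}"),
    }
}
```

Note that nothing in the three arguments says "UDP" anywhere.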
Basically the `socket2` crate lets you convert the fd it produces into a `UdpSocket`. It doesn't verify it really is a UDP socket first; that's up to you. If you do it blindly, you can get something with the wrong name, but it's probably harmless. (At the very least, it doesn't violate memory safety guarantees, which is what Rust code tends to be very strict about.)
It may be memory safe but it's not using the type system to represent the domain very well.
One could imagine a more type-friendly design in which we could write that first line as follows:
    let socket: Socket<IPv4, Datagram, IcmpV4> = Socket::new()?;
Now, the specifics of socket types will be statically checked.
Edit: I realized that the issue here is actually the conversion, and that UdpSocket on its own is actually a type-safe representation of a UDP socket, not a general datagram socket. But the fact that this dubiously-safe conversion is possible and even useful suggests that an improved design is possible. For example, a method like UdpSocket's `set_broadcast` can't work with a socket like the above, and from a type safety perspective, it shouldn't be possible to call it on such a socket.
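To make the idea concrete, here is a rough sketch of what such a design might look like, using marker types and `PhantomData`. Everything in it (the `Domain`, `Kind`, and `Protocol` traits, the `Udp` marker, `raw_triple`) is hypothetical and not taken from any existing crate. It also doesn't open a real descriptor, so `new()` is infallible here, unlike the line above. The constants are the Linux values.

```rust
use std::marker::PhantomData;

// Hypothetical marker traits; each marker carries the integer the OS expects.
trait Domain { const RAW: i32; }
trait Kind { const RAW: i32; }
trait Protocol { const RAW: i32; }

struct IPv4;
struct Datagram;
struct Udp;
struct IcmpV4;

impl Domain for IPv4 { const RAW: i32 = 2; }      // AF_INET
impl Kind for Datagram { const RAW: i32 = 2; }    // SOCK_DGRAM
impl Protocol for Udp { const RAW: i32 = 17; }    // IPPROTO_UDP
impl Protocol for IcmpV4 { const RAW: i32 = 1; }  // IPPROTO_ICMP

// Stand-in for a socket: it only records state, it does not hold a real fd.
struct Socket<D, K, P> {
    broadcast: bool,
    _marker: PhantomData<(D, K, P)>,
}

impl<D: Domain, K: Kind, P: Protocol> Socket<D, K, P> {
    fn new() -> Self {
        Socket { broadcast: false, _marker: PhantomData }
    }

    // The (domain, type, protocol) triple that would be passed to socket(2).
    fn raw_triple(&self) -> (i32, i32, i32) {
        (D::RAW, K::RAW, P::RAW)
    }
}

// Broadcast-related methods exist only on the UDP flavour.
impl Socket<IPv4, Datagram, Udp> {
    fn set_broadcast(&mut self, on: bool) { self.broadcast = on; }
    fn broadcast(&self) -> bool { self.broadcast }
}

fn main() {
    let icmp: Socket<IPv4, Datagram, IcmpV4> = Socket::new();
    println!("icmp triple: {:?}", icmp.raw_triple());

    let mut udp: Socket<IPv4, Datagram, Udp> = Socket::new();
    udp.set_broadcast(true);
    println!("udp broadcast: {}", udp.broadcast());

    // icmp.set_broadcast(true); // would not compile: no such method on this type
}
```

The commented-out last line is the point: the mistake is rejected by the compiler instead of surfacing as a runtime error.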
One could, but one probably doesn't want to have separate types for TCP-over-IPv4 vs TCP-over-IPv6 for example, even if they accept/produce different forms of addresses. That'd force a lot of code bloat with monomorphization.
So now one is making one's own enumeration which is different than the OS one and mapping between them, which can get into a mess of all the various protocols Linux and other OSs support, and I'm not sure it's solving a major problem. Opinions vary, but I prefer to use complex types sparingly.
I think there are likely a bunch of other cases where it's useful to choose these values more dynamically too. Networking gets weird!
It's precisely because networking gets weird that a good representation at the type level could be useful. But I agree that it'd need to be done carefully to avoid creating usability issues.
It's unfortunate that the OwnedFd conversions were not marked unsafe as well. The RFC focused on a single class of unsafety in fds and did not recognize that arbitrary fd conversions have other issues.
No, it's perfectly safe. Except if you expand the scope of "safe" by a lot.
OP turned the socket into an (almost) raw file descriptor, and created a UDP socket from it. Weird, yes, but since it's perfectly memory safe and invalid operations would correctly error, it's not "dubiously-safe". It's safe.
I mean, either your language has the ability to do raw (technically Owned in this case) file descriptors, or it doesn't.
Maybe you'd prefer Rust had a third mode? Safe, `unsafe {}`, and `are_you_sure_you_understand_this {}`, the last one also being 'safe', but just… odd.
It's dubiously safe because it allows invalid combinations, i.e. calling UDP-related methods on non-UDP sockets. I'm using "safe" in the general English sense here, "protected from or not exposed to danger or risk."
> invalid operations would correctly error
At runtime, yes. I'm pointing out that Rust makes it possible to do better, and catch such issues at compile time.
    // Open a plain file, lower it to an owned fd, then raise it back up as a "UDP socket".
    let f = std::fs::File::open("/dev/null").unwrap();
    let f: std::os::fd::OwnedFd = f.into();
    let socket: std::net::UdpSocket = f.into();
If you convert a high level object into a low level one, and then back up as another type, then what exactly do you expect the language to do about that?
> "protected from or not exposed to danger or risk."
A computer will do what you tell it to do, not what you intend it to do. Opening a file is way more dangerous than risking errors because "this syscall doesn't work on that fd".
There's also always risk that a syscall will fail at runtime, whether the type of fd is correct or not.
It sounds like you would prefer if UdpSocket From<OwnedFD> should run getsockname() or something to confirm it's of the expected type, but I would prefer not. Indeed, in the general case some perfectly coded `unsafe` code could `dup2()` over the fd, so any checking at UdpSocket creation time is moot; you still don't get the safety you are asking for.
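For reference, a creation-time check of that kind could look roughly like the sketch below, which asks the kernel for `SO_TYPE` and `SO_PROTOCOL` through getsockopt(2). The option numbers are the Linux values written out by hand (an assumption; other systems use different numbers), and `int_sockopt` is a made-up helper name. It costs extra syscalls per conversion, which is the overhead being objected to.

```rust
use std::io;
use std::mem::size_of;
use std::net::UdpSocket;
use std::os::fd::{AsFd, AsRawFd, BorrowedFd};
use std::os::raw::{c_int, c_void};

// Linux values, written out by hand (an assumption; other systems differ).
const SOL_SOCKET: c_int = 1;
const SO_TYPE: c_int = 3;
const SO_PROTOCOL: c_int = 38;

unsafe extern "C" {
    // getsockopt(2) from the platform C library; socklen_t is u32 on Linux.
    fn getsockopt(
        fd: c_int,
        level: c_int,
        name: c_int,
        value: *mut c_void,
        len: *mut u32,
    ) -> c_int;
}

// Hypothetical helper: read one integer-valued SOL_SOCKET option.
fn int_sockopt(fd: BorrowedFd<'_>, name: c_int) -> io::Result<c_int> {
    let mut value: c_int = 0;
    let mut len = size_of::<c_int>() as u32;
    // SAFETY: value and len are valid for writes, and len matches value's size.
    let rc = unsafe {
        getsockopt(
            fd.as_raw_fd(),
            SOL_SOCKET,
            name,
            &mut value as *mut c_int as *mut c_void,
            &mut len,
        )
    };
    if rc == 0 { Ok(value) } else { Err(io::Error::last_os_error()) }
}

fn main() -> io::Result<()> {
    let udp = UdpSocket::bind("127.0.0.1:0")?;
    let ty = int_sockopt(udp.as_fd(), SO_TYPE)?;
    let proto = int_sockopt(udp.as_fd(), SO_PROTOCOL)?;
    // A real UDP socket reports SOCK_DGRAM (2) and IPPROTO_UDP (17).
    println!("type = {ty}, protocol = {proto}");
    Ok(())
}
```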
I agree with everything you wrote except for this:
> Indeed, in the general case some perfectly coded `unsafe` code could `dup2()` over the fd, so any checking at UdpSocket creation time is moot; you still don't get the safety you are asking for.
If `unsafe` code breaks safe code's soundness guarantees (let's assume for a second an alternate world in which "fd is of the correct type" is a soundness guarantee Rust makes), the bug is in the `unsafe` code.
Sure, I would strongly recommend against doing something like that. But I would expect it to work in the obvious way, and not be undefined behavior.
E.g. if UdpSocket were to dup() internally its fd A into a fd B, and as_fd() returned B, but all actual recv/send is on fd A, then that would cause worse problems than this.
But say an OS has a sockopt that turns an IPv4 UDP socket into an IPv6 UDP socket. Would it be OK for me to call that on UdpSocket's underlying fd? I'd say yes.
Now if I closed the fd for a UdpSocket from underneath it, I would expect that to be basically UB, if not by spec, then in practice impossible to reason about.
As for unsound, sure. I could be convinced of calling the dup2 thing unsound. Not sure unsound is well enough defined, but basically: don't do it.
Is it unsound to create a UdpSocket from a non-UDP file descriptor? Not in a way that can trigger unsafe, no.
There are two issues here, and you're talking about a different one from the one I'm interested in. Your main issue seems to be this:
> If you convert a high level object into a low level one, and then back up as another type, then what exactly do you expect the language to do about that?
One answer to this would be "prevent it entirely". That's probably not practical for a language like Rust today, though, and I don't really care about that.
What I care about is that it's necessary to do this in the first place. The fact that doing this can be useful and necessary in a case like this suggests that it would be possible to design the types involved so that you don't need these low-level and runtime-unsafe conversions to get the job done.
> It sounds like you would prefer if UdpSocket From<OwnedFD> should run getsockname() or something to confirm it's of the expected type
No, I'm saying the types could be designed to prevent the need for doing this in the first place.
> What I care about is that it's necessary to do this in the first place.
I don't think it is. socket2::Socket has send_to() just as much as UdpSocket does.
(disclaimer: I only looked up the docs, I didn't try to modify the code to strip out needless UdpSocket)
> runtime-unsafe
That's not a thing. You always need to check for errors. seccomp could be blocking your syscalls. Hard drives break such that reads return error.
Getting an Err() from a function does not make it "unsafe", runtime or not.
> the types could be designed to prevent the need for doing this in the first place.
If your type system does not allow you to "bring your own fd (to be managed)", then it's not fit for purpose for the kind of problems Rust aims to solve.
A systems language needs to be able to receive a file descriptor from a C library, and work with it.
Yes, it is. See all the work on the subject of "make illegal states unrepresentable". Search for that phrase, it's originally from Yaron Minsky at Jane Street Capital, but it's mainly a pithy characterization of a common goal for languages with strong type systems, like Rust, Haskell, or the ML family. Another way this is expressed is as "static debugging" - the idea that you can debug a significant proportion of a program's bugs statically, using the type system.
That's what I'm referring to here. If it's unfamiliar to you, it will likely take some time to get used to, because it's a significantly different paradigm from the common approach of debugging by running programs, encountering runtime errors, and trying to resolve them. But, instead of getting indignant and objecting to what I'm saying, consider that there might be something for you to learn here.
> You always need to check for errors. seccomp could be blocking your syscalls. Hard drives break such that reads return error.
What do you believe the relevance of this is? You need to check for errors that can't be checked for at compile time. That doesn't mean we should abandon the idea of checking for errors at compile time, or improving the scope of issues that we can detect at compile time.
Errors that are prevented at compile time cannot occur at runtime in principle, and that's an extremely powerful invariant in software development, one that any serious software developer should be aware of, and be able to take advantage of.
Type systems allow you to prove properties of programs that otherwise would need to be debugged and tested at runtime, but to take advantage of that, the types need to be designed appropriately.
> If your type system does not allow you to "bring your own fd (to be managed)", then it's not fit for purpose for the kind of problems Rust aims to solve.
This is a failure of imagination, nothing more. An appropriate type schema for this domain will be able to handle the requirements of the domain.
If a language doesn't let you do safe things, then that's a very different language.
In this case, it would be a language that does not allow creating a UdpSocket object by bringing in your own file descriptor, or it verifies that it's the right type of socket when you do. Which has performance implications without adding any "safe" guarantees.
Say you add this feature, taking the performance hit. Now you need to adjust seccomp policies to allow that. Ok, no biggie. But then I invent UDPv2, and this check fails. The code becomes wrong because of an incorrect assumption about the future.
All without gain. It's not an invalid state, any more than naming a variable "x_squared" but containing x+1 is an invalid state.
You could also imagine stdout being a different type depending on whether it's line or character buffered, and continue in that direction toward a combinatorial explosion of types covering every state. Ok… that seems like it'd cause more problems than it'd solve.
> instead of getting indignant
Please don't assume my mental state. You got it wrong.
> This is a failure of imagination, nothing more. An appropriate type schema for this domain will be able to handle the requirements of the domain.
I'm all ears. Note that it also has to support "I got the file descriptor as a libc::c_int from a C library", or it's not fit for purpose.
I think OP either banged on this until it compiled, maybe blindly copying from other examples, or it's vibe coded, and shows why AI needs supervision from someone who can actually understand what the code does.
Could you please explain the difference to me? As UDP is the "User Datagram Protocol", when I read about datagrams I always think about UDP, and I thought it was just a different way of saying the same thing. Maybe "datagram" is supposed to be the packet itself, but you're still sending it via UDP, right?
There's actually a lot of combinations of (domain, type, protocol) that are available. It is not always the case that the protocol implies the type.
In IP land (domains AF_INET and AF_INET6), we have the well known UDP and TCP protocols, of course. UDP is always datagram (SOCK_DGRAM) and TCP is always stream (SOCK_STREAM). Besides datagram-only ICMP, there's also SCTP, which lets you choose between stream and sequenced-packet (SOCK_SEQPACKET) types. A sequenced-packet socket provides in-order delivery of packet-sized messages, so it sits somewhere between datagram and stream in terms of features.
In AF_UNIX land, there are no protocols (the protocol field is always 0), but all 3 of the aforementioned types are available. You just have to pick the same type on both sides.
Footnotes: SCTP is not widely usable because Windows doesn't natively support it and many routers will drop or reject it instead of forwarding it. Also, AF_UNIX is now supported on Windows, but only with SOCK_STREAM type.
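The combinations described in this comment can be written down as a small lookup, which makes it easy to see where the protocol pins down the type and where it doesn't. This is only a sketch of what the comment says: the names `Endpoint`, `SockType`, and `allowed_types` are invented here, and raw sockets and other socket types are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SockType {
    Stream,    // SOCK_STREAM
    Datagram,  // SOCK_DGRAM
    SeqPacket, // SOCK_SEQPACKET
}

// What you are talking to: an IP protocol, or an AF_UNIX socket (protocol 0).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endpoint {
    Tcp,
    Udp,
    Icmp,
    Sctp,
    Unix,
}

// Socket types each endpoint can be paired with, per the comment above.
fn allowed_types(endpoint: Endpoint) -> &'static [SockType] {
    use SockType::*;
    match endpoint {
        Endpoint::Tcp => &[Stream],
        Endpoint::Udp | Endpoint::Icmp => &[Datagram],
        Endpoint::Sctp => &[Stream, SeqPacket],
        Endpoint::Unix => &[Stream, Datagram, SeqPacket],
    }
}

// True when naming the protocol already determines the socket type.
fn protocol_implies_type(endpoint: Endpoint) -> bool {
    allowed_types(endpoint).len() == 1
}

fn main() {
    let all = [
        Endpoint::Tcp,
        Endpoint::Udp,
        Endpoint::Icmp,
        Endpoint::Sctp,
        Endpoint::Unix,
    ];
    for endpoint in all {
        println!(
            "{:?}: {:?} (type implied: {})",
            endpoint,
            allowed_types(endpoint),
            protocol_implies_type(endpoint)
        );
    }
}
```

UDP and ICMP land in the same row (both datagram), which is exactly why "datagram socket" and "UDP socket" are not interchangeable terms.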
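UDP, TCP, and ICMP all ride directly on top of IP, and each is identified by the protocol number in the IP header. (Strictly speaking, UDP and TCP are Layer 4 transport protocols, while ICMP is usually classed with Layer 3, but they fill the same bits within network packets, at the same level.) So sending an ICMP packet (protocol 1) is not the same as sending a UDP packet (protocol 17).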
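As a small illustration of those protocol numbers: in an IPv4 header the protocol is a single byte at offset 9, so telling ICMP from UDP is one byte comparison. The sketch below uses made-up names (`Carried`, `carried_protocol`) and a hand-built sample header, and it does no validation beyond the length and version.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Carried {
    Icmp,
    Tcp,
    Udp,
    Other(u8),
}

// Read the protocol field (byte 9) of an IPv4 header.
fn carried_protocol(ipv4_header: &[u8]) -> Option<Carried> {
    // The minimum IPv4 header is 20 bytes; the top nibble of byte 0 is the version.
    if ipv4_header.len() < 20 || ipv4_header[0] >> 4 != 4 {
        return None;
    }
    Some(match ipv4_header[9] {
        1 => Carried::Icmp,
        6 => Carried::Tcp,
        17 => Carried::Udp,
        n => Carried::Other(n),
    })
}

fn main() {
    // Hand-built sample header: version 4, IHL 5, TTL 64, 127.0.0.1 -> 127.0.0.1.
    let mut header = [
        0x45, 0, 0, 28, 0, 0, 0, 0, 64, 1, 0, 0, 127, 0, 0, 1, 127, 0, 0, 1,
    ];
    println!("{:?}", carried_protocol(&header));
    header[9] = 17; // same packet shape, different protocol number
    println!("{:?}", carried_protocol(&header));
}
```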
Internet Protocols (v6 and v4) send packets via Ethernet (or WiFi or Bluetooth or anything else) from an IP address to an IP address. For structure see https://en.wikipedia.org/wiki/IPv6_packet or if for some reason you still need the legacy version see https://en.wikipedia.org/wiki/IPv4#Packet_structure (as an aside, notice how much complexity was removed from the legacy version). Notably, IP does not have any mechanism for reliability. It is essentially writing your address and a destination address on a brick, tossing it over your fence into the neighbor’s yard, and asking them to pass it along. If your neighbor isn’t home your brick is not moving along.
TCP and UDP send streams and datagrams respectively and use the concept of application ports. A TCP stream is what it sounds like: a continuous stream of bytes with no length or predefined stopping point. TCP takes your stream and chunks it into IP packets, the size of which is limited by the smallest frame size of the data link (Ethernet or whatever) along the path. Typically this is 1500 bytes, but don’t forget to account for header sizes, so the useful payload is smaller. TCP is complex because it guarantees that your stream will eventually be delivered in the exact order in which it was sent. Eventually here could mean at t = infinity. UDP simply has source and destination port numbers in its header (which follows the IP header in the data frame), and guarantees nothing: not ordering, not delivery, etc. If an IP packet is a brick with two house addresses, a UDP datagram is a brick with two house addresses and an ATTN: application X added. An address represents a computer (this is very approximate in a world where any machine can have N addresses and run M VMs or containers which themselves can have O addresses), and a port represents a specific process on that computer.
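To put numbers on "useful payload size is smaller": with a 1500-byte MTU and minimum-size headers (no IP or TCP options, which is an assumption of this sketch), the arithmetic works out as below. The function names are made up for illustration.

```rust
const LINK_MTU: usize = 1500; // typical Ethernet MTU
const IPV4_HEADER: usize = 20; // minimum, no options
const IPV6_HEADER: usize = 40; // fixed header, no extension headers
const TCP_HEADER: usize = 20; // minimum, no options
const UDP_HEADER: usize = 8;

// Bytes left for application data in one packet.
fn max_payload(mtu: usize, ip_header: usize, transport_header: usize) -> usize {
    mtu.saturating_sub(ip_header + transport_header)
}

// How many packets a stream of `stream_len` bytes is chunked into.
fn packets_needed(stream_len: usize, payload_per_packet: usize) -> usize {
    assert!(payload_per_packet > 0, "payload per packet must be positive");
    (stream_len + payload_per_packet - 1) / payload_per_packet
}

fn main() {
    let tcp4 = max_payload(LINK_MTU, IPV4_HEADER, TCP_HEADER);
    let udp4 = max_payload(LINK_MTU, IPV4_HEADER, UDP_HEADER);
    let tcp6 = max_payload(LINK_MTU, IPV6_HEADER, TCP_HEADER);
    let udp6 = max_payload(LINK_MTU, IPV6_HEADER, UDP_HEADER);
    println!("TCP/IPv4: {tcp4}, UDP/IPv4: {udp4}, TCP/IPv6: {tcp6}, UDP/IPv6: {udp6}");
    println!("10000-byte stream over TCP/IPv4: {} packets", packets_needed(10_000, tcp4));
}
```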
ICMP does not use ports. ICMP is essentially meant for network and host telemetry, so you are still sending and receiving only at the IP address level. But it has a number of message types. You can see them here: https://en.wikipedia.org/wiki/ICMPv6. Note that echo request and echo reply (ping and pong) are just two of the types. Others are responsible for communicating what raw IP cannot. For example, the Packet Too Big type tells you that an IP packet you tried to send hit a link on its path whose MTU was too small for it to fit. It’s used for Path MTU Discovery (you keep sending different size packets to find the largest that will go through).
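The "keep sending different sizes" idea can be sketched as a binary search over packet sizes. This is a toy model under stated assumptions: the `fits` closure stands in for "send a probe of this size and see whether it gets through", no packets are actually sent, and real implementations can also use the MTU value reported back in the ICMP message rather than searching blindly. The function name is invented.

```rust
// Find the largest size in [lo, hi] for which `fits` returns true,
// assuming every size up to some limit fits and everything above it does not.
fn largest_passing_size(
    mut lo: usize,
    mut hi: usize,
    fits: impl Fn(usize) -> bool,
) -> Option<usize> {
    if lo > hi || !fits(lo) {
        return None; // even the smallest probe does not get through
    }
    // Invariant: a probe of size `lo` fits; nothing above `hi` is known to fit.
    while lo < hi {
        let mid = lo + (hi - lo + 1) / 2; // round up so the range always shrinks
        if fits(mid) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    Some(lo)
}

fn main() {
    // Hypothetical path whose narrowest link carries at most 1400 bytes.
    let bottleneck = 1400;
    let found = largest_passing_size(1280, 9000, |size| size <= bottleneck);
    println!("largest size that fits: {found:?}");
}
```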
There are other protocols that run directly on top of IP (6in4 for example, or SCTP). Most are way less popular than the three mentioned above. Some use datagrams (discrete “bricks” of data), some use streams (endless “tapes” of data), which is the difference in socket type: datagram vs. stream. You can also go a level deeper and just craft raw IP packets directly, but for that you typically must be the root user, since you can for example send a packet with the source port set to 22 even though you are not the ssh daemon.
Since ICMP has no concept of a port, when you send a ping to a remote host and it sends a reply back to you, how does your kernel know to hand the response to your process and not some other one? In the ICMP header there is an ICMP identifier (often the process PID) and when the reply comes back it has the same identifier (but with source and destination IPs swapped and type updated to echo reply). This is what the kernel uses to find the process to which it will deliver the ICMP packet.
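Here is a minimal sketch of that identifier matching for ICMPv4, building the 8-byte echo header by hand (type 8 for echo request, type 0 for echo reply) with the standard Internet checksum. The function names are made up, and the "reply" is fabricated locally by flipping the type byte rather than received from a network.

```rust
// Internet checksum: one's-complement sum of 16-bit big-endian words.
fn internet_checksum(data: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    for chunk in data.chunks(2) {
        // A trailing odd byte is padded with a zero byte.
        let low = if chunk.len() == 2 { chunk[1] } else { 0 };
        sum += u32::from(u16::from_be_bytes([chunk[0], low]));
    }
    // Fold the carries back into the low 16 bits.
    while sum >> 16 != 0 {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    !(sum as u16)
}

// ICMPv4 echo request: type 8, code 0, checksum, identifier, sequence, payload.
fn echo_request(identifier: u16, sequence: u16, payload: &[u8]) -> Vec<u8> {
    let mut packet = vec![8, 0, 0, 0];
    packet.extend_from_slice(&identifier.to_be_bytes());
    packet.extend_from_slice(&sequence.to_be_bytes());
    packet.extend_from_slice(payload);
    let checksum = internet_checksum(&packet);
    packet[2..4].copy_from_slice(&checksum.to_be_bytes());
    packet
}

// Does this look like an echo reply (type 0) carrying our identifier?
fn reply_matches(reply: &[u8], identifier: u16) -> bool {
    reply.len() >= 8
        && reply[0] == 0
        && u16::from_be_bytes([reply[4], reply[5]]) == identifier
}

fn main() {
    let request = echo_request(0x1234, 1, b"ping");
    // Fabricated reply: same identifier and sequence, type switched to 0.
    let mut reply = request.clone();
    reply[0] = 0;
    println!("for us: {}", reply_matches(&reply, 0x1234));
    println!("for someone else: {}", reply_matches(&reply, 0x9999));
}
```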