I don’t see why you’d write one of these yourself when the one in the standard l...

apjana · on April 10, 2020

Not necessarily. There are implementations which don't even take advantage of 4/8 byte copying. We wanted to have something uniform. But yes, you are right with glibc or macOS.

Also, from the strncpy man page:

   strlcpy()
       Some systems (the BSDs, Solaris, and others) provide the following function:

           size_t strlcpy(char *dest, const char *src, size_t size);

       This function is similar to strncpy(), but it copies at most size-1 bytes to dest, always adds a
       terminating null byte, and does not pad the target with (further) null bytes. This function
       fixes some of the problems of strcpy() and strncpy(), but the caller must still handle the
       possibility of data loss if size is too small. The return value of the function is the length of
       src, which allows truncation to be easily detected: if the return value is greater than or equal
       to size, truncation occurred. If loss of data matters, the caller must either check the arguments
       before the call, or test the function return value. strlcpy() is not present in glibc and is not
       standardized by POSIX, but is available on Linux via the libbsd library.
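
For readers who have not used it, the truncation check described in the excerpt can be sketched as follows. This is only an illustration under assumptions: `my_strlcpy` and `copy_fits` are hypothetical names, and the plain byte loop is a stand-in with the documented semantics, not the code of any real libc or of the project under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the semantics the man page describes.
 * Named my_strlcpy so it cannot clash with a system-provided strlcpy. */
size_t my_strlcpy(char *dest, const char *src, size_t size)
{
    assert(dest != NULL && src != NULL);

    size_t i = 0;

    /* Copy at most size-1 bytes, then always terminate (if size > 0). */
    if (size != 0) {
        while (i < size - 1 && src[i] != '\0') {
            dest[i] = src[i];
            i++;
        }
        dest[i] = '\0';
    }

    /* Keep counting so the return value is the full length of src. */
    while (src[i] != '\0')
        i++;

    return i;
}

/* Hypothetical wrapper: 1 if src fit completely, 0 if it was truncated. */
int copy_fits(char *dest, size_t size, const char *src)
{
    return my_strlcpy(dest, src, size) < size;
}
```

The return value is the length of `src`, so a caller only has to compare it against the buffer size to learn whether data was lost.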

saagarjha · on April 10, 2020

Why not call strncpy or memcpy rather than exhibiting undefined behavior?

apjana · on April 11, 2020

> Why not call strncpy

Read the excerpt.

> undefined

Nothing's _undefined_ there.

saagarjha · on April 11, 2020

I'm not sure which specific excerpt you're referring to, but I have a good idea of the many functions that libraries have come up with to sling characters from one buffer to another, plus I read your implementation and the man page snippet you linked above. I'm still not seeing why you can't replace the code between lines 881 and 902 with one of the appropriate copying routines; you quite literally have a source, destination, and length and you can fix up the last NUL byte right after the call. The standard library's function will be vectorized regardless of how your compiler was feeling that day, and it's probably smarter than yours (glibc, for example, does a "small copy" up to alignment before it launches into the vectorized stuff, rather than skipping it entirely if the buffers aren't aligned). And your function does have undefined behavior: you pun a char * to a ulong *.
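
A rough sketch of the kind of replacement being suggested here, assuming the goal is a bounded copy that always NUL-terminates. `copy_with_memcpy` is a hypothetical name and the return convention (bytes written, including the terminator) is an assumption for illustration; it is not the project's actual function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most n-1 bytes of src into dst and always
 * NUL-terminate when n > 0. Returns the number of bytes written,
 * including the terminator (0 if n is 0). */
size_t copy_with_memcpy(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    if (n == 0)
        return 0;

    /* Find the end of src, looking at no more than n-1 bytes. */
    const char *nul = memchr(src, '\0', n - 1);
    size_t len = nul ? (size_t)(nul - src) : n - 1;

    /* Let the library routine do the bulk copy, then fix up the NUL. */
    memcpy(dst, src, len);
    dst[len] = '\0';

    return len + 1;
}
```

How fast this is depends entirely on the libc's `memchr` and `memcpy`, which is the point under dispute in the thread.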

apjana · on April 11, 2020

> glibc, for example

> The standard library's function will be vectorized regardless of how your compiler was feeling that day

We talked glibc. I mentioned there are libraries which do not do _any_ optimization other than a byte by byte copy.

> I'm still not seeing why you can't replace the code between lines 881 and 902

Because you are considering only glibc.

And yes, we can do a lot of things. But the function and copying buffers are not the top priority for us ATM. I shared it as an example in the context of the current topic. Not all code is supposed to match your preferences.

> char * to a ulong *

Both the source and destination are guarded by length and alignment requirement checks.

saagarjha · on April 11, 2020

> Not all code is supposed to match your preferences.

Yikes, sorry if I came off as trying to force my opinion on your project. I'm just trying to understand the rationale behind the choices you made, since I've (clearly) never seen anything like it. (If I was genuinely interested in trying to modify your project to my desires, I hope you can believe I'd be kind enough to dig through the project to see if I could figure this out myself, then send a patch with rationale for you to decide whether you wanted it or not, rather than yell at you on Hacker News to fix it.) But to your points:

> We talked glibc. I mentioned there are libraries which do not do _any_ optimization other than a byte by byte copy.

I haven't actually seen one for quite a while; most of the libcs that I'm familiar with (glibc, macOS's libc, musl, libroot, the various BSD libcs, Bionic) have some sort of vectorized code. I'm curious if the project can run on some obscure system that I'm not considering ;)

> Both the source and destination are guarded by length and alignment requirement checks.

Perhaps we have a misunderstanding here: I'm saying it's undefined by the C standard, as the pointer cast is a strict aliasing violation regardless of the checks. It will generally compile correctly as char * can alias any type, so the compiler will probably be unable to find the undefined behavior, but it's technically illegal. (I would assume this is one of the many reasons most libcs implement their string routines in assembly.)
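
One common way to keep a word-at-a-time copy while avoiding the pointer cast is to move each word through `memcpy`, sketched below. `copy_words` is a hypothetical name, and this is not the project's code. Optimizing compilers often turn a fixed-size `memcpy` like this into a single load and store, but that is an expectation about typical compilers, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes in unsigned-long-sized chunks without
 * ever dereferencing a char * that was cast to unsigned long *. */
void copy_words(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    size_t i = 0;

    /* Whole words: load into a real unsigned long object, then store. */
    for (; i + sizeof(unsigned long) <= n; i += sizeof(unsigned long)) {
        unsigned long w;
        memcpy(&w, src + i, sizeof w);
        memcpy(dst + i, &w, sizeof w);
    }

    /* Remaining tail bytes, one at a time. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Because the word lives in an actual `unsigned long` object, no alignment check is needed and the aliasing question does not come up.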

apjana · on April 11, 2020

> send a patch with rationale for you to decide whether you wanted it or not

I would really appreciate that. And I do understand your intention is good.

The problem I see with geeky forums is there are just too many people trying to force their ideas on you at every step and expect you to implement those. So it's kind of a standard reply from my side.

saagarjha · on April 11, 2020

It’s a bit too late for me to be writing string manipulation code in C, but I’ll see if I can take a look at this tomorrow. It’ll probably just be a replacement of the copying part with memcpy, and a benchmark if I can find one.

apjana · on April 11, 2020

Sure thing! Thanks a lot!

apjana · on April 11, 2020

Also, here's the source from musl (somewhat similar to what we have)

https://github.com/ifduyue/musl/blob/79f653c6bc2881dd6855299...

saagarjha · on April 11, 2020

Yeah, musl just vectorizes mildly when certain GNU C extensions are available. Presumably Rich didn’t want to write out another version in assembly. (It really is a shame that strncpy returns dest.)