Hacker News | liotier's comments

InfoReseaux remarked that the licensing of this data is suspicious, to say the least: it is published under CC BY-NC 4.0 but contains ODbL-licensed data coming from OpenStreetMap, and also from Microsoft.

Thread on the OpenStreetMap forum: https://community.openstreetmap.org/t/is-globalbuildingatlas...

Christian Quest sent the following message to the authors:

"I’m writing to you because I’m surprised by the choice of data license you’ve set on the GlobalBuildingAtlas dataset.

As mentioned and explained in your paper, at least two data sources you’ve been using to create this dataset are under the Open Database License (ODbL): OpenStreetMap and Microsoft building datasets.

I’ve downloaded the data extract you provide to look at the final dataset, and it confirms that building polygons from OSM (and Microsoft) make up a substantial portion of the resulting dataset.

In such a case, your dataset must be published under the ODbL (see 4.2), because it is a derivative database (see section 1.0 of the ODbL for the definition).

A copy of this message has also been sent to the Legal Working Group of the OSM Foundation.

Thanks in advance for quickly fixing the license of the dataset you published. This will also allow OpenStreetMap contributors to use it to improve OpenStreetMap, which is not possible with the CC BY-NC license you chose."


Yes, this dataset is not CC BY-NC 4.0. Be careful.

"Brad Edwards" and "Bradley Edwards" might be the same individual.


Yes, the dataset also has three entries for Virginia Giuffre: "Virginia L. Giuffre", "Virginia Roberts Giuffre", and "Jane Doe Number 3 (Virginia Roberts)".


I read a recent observation that people subject to discovery often deliberately misspell key names so that their communications stay under the radar.


Everyone is potentially subject to discovery. Some people are just more aware of it.


Likewise for instances of "Larry" and "Lawrence" Summers... probably a lot of those.


I’m sure some developer/archivist is working on a name authority as we speak.


Great use case for AI: suggesting merges and cleanup.


LLMs are awful for this. I've got a project that's doing structured extraction and half the work is deduplication.

I didn't go down the LLM route for the cleanup, as you run into scale and context-window issues with larger datasets.

I went with semantic similarity networks for this use case. You can find similar pairs efficiently with Annoy's approximate nearest-neighbour search, set a similarity cutoff, and the resulting isolated subgraphs (connected components) are your merge candidates.

I wrapped up my code in a little library if you're into this sort of thing.

github.com/specialprocedures/semnet
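For anyone curious, the pipeline described above (pairwise similarity, a cutoff threshold, connected components as merge candidates) can be sketched in plain Python. This is a hypothetical illustration, not semnet's API: character-trigram vectors stand in for real semantic embeddings, a brute-force O(n²) comparison stands in for Annoy's approximate nearest-neighbour index, and the 0.5 threshold is an arbitrary assumption.

```python
import math
from collections import Counter
from itertools import combinations


def ngram_vector(name: str, n: int = 3) -> Counter:
    """Character n-gram counts: a crude stand-in for a semantic embedding (assumption)."""
    padded = f"  {name.lower()}  "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def merge_candidates(names: list[str], threshold: float = 0.5) -> list[set[str]]:
    """Link names whose similarity passes the cutoff; return components with 2+ members."""
    vectors = [ngram_vector(n) for n in names]
    parent = list(range(len(names)))  # union-find forest over name indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Brute-force pairwise comparison: O(n^2), fine for a demo.
    # Annoy would replace this loop with approximate nearest-neighbour queries.
    for i, j in combinations(range(len(names)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)  # add an edge, merging two components

    components: dict[int, set[str]] = {}
    for i, name in enumerate(names):
        components.setdefault(find(i), set()).add(name)
    # Singletons have no match above the cutoff, so they are not merge candidates.
    return [c for c in components.values() if len(c) > 1]


names = ["Brad Edwards", "Bradley Edwards", "Larry Summers", "Lawrence Summers", "Christian Quest"]
for group in merge_candidates(names):
    print(sorted(group))
# ['Brad Edwards', 'Bradley Edwards']
# ['Larry Summers', 'Lawrence Summers']
```

Character n-grams only catch spelling variants; aliases that share few characters need actual embeddings, and at scale the quadratic loop is exactly what an approximate index like Annoy is for.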


Nice looking library! Might try it for one of my own projects.


You mean verbing a noun?


Verbing weirds language.


The trees are really sneezing today.


> estadounidenses

And in French, the inhabitants of "les Etats-Unis" are "Etats-uniens". I've gotten into the habit of referring to them as USAians, which often gets negative reactions and remains rare, but I find it is the most accurate demonym and I'll keep pushing it.

I look forward to the world inventing demonyms for the citizens of the European Union, because that will at least mean that our emerging national body is getting mindshare!


Whenever I’ve heard the term américain it’s been used to refer to a US citizen, not a mexicain or citizen of some other American country.


Yes, "Américains" is much more common - and that is the windmill I'm tilting at.


> I look forward to the world inventing demonyms for the citizens of the European Union, because at least it will mean that our emerging national body is getting mindshare !

The USA is a country and the EU is not.


The European Union is an emerging country: it is my country. For now, many don't yet understand how common necessity binds us, and some remain under the illusion that they can make it alone against China and the USA, but ever closer union is real, and anyone who has been on an Erasmus student exchange knows we are one people. On my French passport, "Union Européenne" is written above "République Française": that is the hierarchy. A nation is a people with the will to live together, and the European Union is that... The rest is a couple of treaties and a few decades away!


[This post is here to preempt any later "yo mama" jokes]


> Contrast this to the “medias” like Threads, Bluesky, etc - moderation becomes impossible just because of the sheer scale of it all.

Wut? Moderation at Bluesky is fantastic: users build their own block lists and share them for others to subscribe to. Moderation à la carte... Power to the users!


More like hermetic narrative security at scale.

That would change for the better if Bluesky itself handled only legal prohibitions and made everything else an optional layer.

The hermetic narrative security would still be there to split people, but it would only split them along those optional layers.


I had two accounts banned from Bluesky. One was parodying Donald Trump, so fair enough if they don't want content like that; they told me it was banned for impersonating Donald Trump. For the other, I have no idea why: I don't think I ever posted anything very controversial, and the email was just a very generic "you violated terms of service". My third account was not banned, but I don't use Bluesky any more. It wasn't a ban-evasion ban either: all three accounts were logged in together in the same web browser, with the account-switching menu active, and yet the third account was not banned.

My point in sharing this is that Bluesky is not a user-driven moderation system. It arbitrarily and centrally bans accounts, just like Twitter.


You're right, Bluesky moderation is centralized. Unless content is served p2p, some moderation has to be centralized. At the end of the day, there's a server serving content and that server operator is legally obligated to remove illegal material.

Hopefully, atproto and its community will provide alternative moderation services. Work is being done on this; we'll see what we end up getting. I feel that a competitive ecosystem of moderation services is probably the best answer we can hope for to that inherently messy problem.


> think you need any of the PRO features

Pro features? Now I see: it is open core, with a $299 license. I'll pass.


Good for you!

I don't use anything from Pro, and I use Datastar at work. I do believe in making open source maintainable, though, so I bought the license.

The Pro stuff is mostly a collection of footguns you shouldn't use, and they are a support burden for the core team. In some niche corporate contexts they are useful.

You can also implement your own plugins with the same functionality if you want; it's just going to cost you time instead of money.

Devs complaining about paying for things never gets old. A one-off lifetime license? How scandalous! Sustainable open source? Disgusting. Oh, a proprietary AI model that is built on others' work without their consent and steals my data? Only $100 a month? Take my money!


It is $299 for a lifetime license. That is extremely cheap.


I wish everyone used LLDP everywhere: it is harmless and helps immensely in finding the right strand of spaghetti on the plate.
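To make the spaghetti-finding concrete: every LLDP frame a port emits describes the sender as a sequence of type-length-value (TLV) fields, each with a 7-bit type and a 9-bit length (IEEE 802.1AB). Below is a minimal Python sketch, assuming you already have the raw LLDPDU bytes (the Ethernet payload after EtherType 0x88CC; capturing them is out of scope), that decodes the handful of TLVs telling you which switch and port sit at the other end of a cable. The `tlv` helper and the sample switch and port names are hypothetical, for illustration only.

```python
import struct

# TLV types decoded by this sketch (IEEE 802.1AB numbering); all others are skipped.
TLV_NAMES = {1: "chassis_id", 2: "port_id", 3: "ttl", 4: "port_description", 5: "system_name"}


def tlv(tlv_type: int, value: bytes) -> bytes:
    """Hypothetical helper: encode one TLV (7-bit type, 9-bit length, then the value)."""
    return struct.pack("!H", (tlv_type << 9) | len(value)) + value


def parse_lldpdu(payload: bytes) -> dict:
    """Walk the TLVs of an LLDPDU and return the fields that identify the neighbour."""
    info = {}
    offset = 0
    while offset + 2 <= len(payload):
        (header,) = struct.unpack_from("!H", payload, offset)
        tlv_type, length = header >> 9, header & 0x1FF
        value = payload[offset + 2 : offset + 2 + length]
        offset += 2 + length
        if tlv_type == 0:  # End of LLDPDU
            break
        name = TLV_NAMES.get(tlv_type)
        if name is None:
            continue
        if tlv_type in (1, 2):
            # First byte is the ID subtype. Decoding the rest as text only makes
            # sense for name-like subtypes, not for MAC-address subtypes.
            info[name] = value[1:].decode("utf-8", "replace")
        elif tlv_type == 3:
            (info[name],) = struct.unpack("!H", value)  # TTL in seconds
        else:
            info[name] = value.decode("utf-8", "replace")
    return info


# Hypothetical sample: chassis ID subtype 7 (locally assigned), port ID subtype 5 (interface name).
frame = (
    tlv(1, b"\x07core-sw1")
    + tlv(2, b"\x05Gi0/24")
    + tlv(3, struct.pack("!H", 120))
    + tlv(5, b"core-sw1.example.net")
    + tlv(0, b"")
)
print(parse_lldpdu(frame))
# {'chassis_id': 'core-sw1', 'port_id': 'Gi0/24', 'ttl': 120, 'system_name': 'core-sw1.example.net'}
```

Note that these same fields are visible to every device on the link.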


All surface area carries risk, and LLDP is additional surface area when exposed on a public IX.


I imagine the author of the parent comment despairing of being upvoted.


> "MORSE SPARCstation 2 - The second SPARCstation 2 has a MORSE sticker on it"

MORSE was a big Sun reseller in Europe in the '90s. I haven't heard their name since the dot-com bubble popped!

