Hypersim, Photorealistic Synthetic Dataset for Indoor Scene Understanding (github.com/apple)
122 points by homarp on Dec 21, 2020 | 20 comments


Apple's first published ML paper was on using GANs to make synthetic datasets more realistic. They applied their approach to computer-generated eyeballs to add sensor-like noise. I find it interesting that the paper was published before Face ID was announced, since in retrospect it's clear that's what they were working on.

That paper also mentions something that has still not found its way into an Apple product: they posed 3D models of hands and then degraded them to appear as depth maps, with jagged edges, artifacts, and occlusions.

https://machinelearning.apple.com/research/gan


Might also be related to the cool (and somewhat creepy) FaceTime feature that makes it look like you're making eye contact, even when you're actually looking at the screen below the camera.


At $57,000 for 77,400 images, the cost is below $1 per image. However, there are only 461 scenes => about $123 per scene. They mention vCPU-years to perform the rendering. I wonder whether GPUs would be more cost-efficient and feasible for this rendering task.
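For reference, the back-of-the-envelope arithmetic in Python (same figures as above):

  total_cost_usd = 57_000
  num_images = 77_400
  num_scenes = 461

  print(total_cost_usd / num_images)  # ~0.74 USD per image
  print(total_cost_usd / num_scenes)  # ~123.6 USD per scene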


Hi! I lead the development of this dataset. I'm delighted that it found its way onto Hacker News :D Anyway, I'd love to do all of this rendering on the GPU in future. I'm sure that would be more cost-efficient. When I looked into this topic last year, I remember finding out that GPU rendering with V-Ray is not as flexible in terms of generating all of the different ground truth layers that we want. The ground truth layers that we ship in our public release are a subset of all the layers we generate. Have a look at our public source code for more details. I don't believe it's possible to generate all of those layers on the GPU. But I'd love to be wrong about this!
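For example, each released layer is a per-frame HDF5 file, so loading one is a few lines of h5py. A minimal sketch, where the exact filename below is illustrative rather than copied from the release (see the download script for the real directory layout):

  import h5py

  # illustrative path; the real scene/camera/frame names come from the download script
  depth_path = "ai_001_001/images/scene_cam_00_geometry_hdf5/frame.0000.depth_meters.hdf5"

  with h5py.File(depth_path, "r") as f:
      depth = f["dataset"][:]  # per-pixel depth values, stored under the "dataset" key

  print(depth.shape, depth.dtype)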


We are doing something similar. I could not get V-Ray to actually use the GPU, and the folks from Chaos Group were like "it works on our side". We ended up just using the CPU.


> I wonder whether GPUs would be more cost-efficient and feasible for this rendering task.

GPUs are the superior hardware for raytracing calculations. But...

CPUs have the software advantage: decades of engineering effort have been poured into CPU renderers. GPU rendering is only now starting to become a thing, and it will inevitably take decades of engineering to recreate all of those features on the GPU.

One important GPU-specific problem is the shortage of RAM: CPUs always have more RAM available (even if that CPU RAM is much, much slower). Teaming up the CPU and GPU to handle the RAM problem for large scenes is going to take significant effort. The general ideas are known from research papers, but they haven't really been ported to the actual tools yet. Disney's Moana island scene is known to take ~100GB of geometry data / textures / animations, and there's no GPU on the planet that can hold all that data. So you'd have to make a CPU+GPU team to take on that problem.
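Roughly, the mismatch looks like this (ballpark sizes, assumed rather than measured):

  scene_size_gb = 100  # Disney's Moana island scene: geometry + textures + animation
  vram_gb = 24         # a high-end consumer GPU
  host_ram_gb = 512    # a typical many-core render node

  print(scene_size_gb <= vram_gb)      # False: the scene alone overflows the GPU
  print(scene_size_gb <= host_ram_gb)  # True: it fits comfortably in host RAM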


Yes, it would be much cheaper if they used a GPU renderer such as OctaneRender https://home.otoy.com/render/octane-render/

However, it might lack some features or flexibility compared to V-Ray.


V-Ray is a GPU (and CPU) renderer. https://www.chaosgroup.com/vray-gpu


From my experience, rendering on the GPU is not worth it when you have a lot of images to generate. Generally you want to budget your CPU cores to render a bunch of images at the same time instead of one (or a few) images really fast. The more threads/buckets/blocks you have with V-Ray, the more overhead you are paying for. If you know what you are doing, you should be able to saturate your NAS and/or network bandwidth.
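To make that concrete, a rough sketch of the scheduling idea in Python; render_frame.sh and its flags are placeholders for whatever your renderer's command line actually is, not real V-Ray options:

  import subprocess
  from concurrent.futures import ThreadPoolExecutor

  FRAMES = range(100)
  JOBS = 16            # how many frames to render concurrently
  THREADS_PER_JOB = 4  # keep each render's thread count small to limit overhead

  def render(frame):
      # hypothetical wrapper script around the renderer's CLI
      subprocess.run(["./render_frame.sh", f"--frame={frame}",
                      f"--threads={THREADS_PER_JOB}"], check=True)

  with ThreadPoolExecutor(max_workers=JOBS) as pool:
      list(pool.map(render, FRAMES))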

The other thing to consider is that the Evermotion dataset might contain a lot of V-Ray materials and layers that might be CPU-only. At one point Chaos Group had a list of all the differences between CPU and GPU, but it appears they either solved all of them and/or took the page down.


Ah, this is all good info. Thank you. And great point about Evermotion + V-Ray materials that are CPU-only. It is certainly possible that some of those materials are CPU-only.

I think the Chaos Group documentation you're thinking of is here: https://docs.chaosgroup.com/display/VMAX/V-Ray+GPU+Supported...

Indeed, at least according to this documentation, some of the render elements we are relying on to generate the Hypersim dataset are unsupported on the GPU.


I believe GPU rendering would be more expensive; it would increase human labor too much.

The CPU is universal and has more RAM. Optimizing very sophisticated ray tracing for a GPU with a tiny amount of VRAM could be very tricky. It is easier to write quick-and-dirty code for the CPU and wait a couple of weeks.

Apple also had spare CPU cycles. This stuff could run overnight in the background.


Hi! I lead the development of this dataset. I'm delighted that it found its way onto Hacker News :D I think GPU rendering would be cheaper if it was possible. See my post above.


"To obtain our image dataset, you can run the following download script. On Windows, you'll need to modify the script so it doesn't depend on the curl and unzip command-line utilities.",

Windows 10 has "curl" support out of the box since update 1803.


> Windows 10 has "curl" support out of the box since update 1803.

Huh, TIL.

(At first I thought that maybe Microsoft had just done something like

  Set-Alias -Name curl -Value Invoke-WebRequest
but no, it’s the real deal[0].)

[0]: https://devblogs.microsoft.com/commandline/windows10v1803/


Wow awesome! If Windows also has unzip, I can simplify the instructions! Thank you for pointing this out :D


You're welcome! I should have filed that as an issue on GitHub in addition to commenting here.

Windows does not have unzip, but I think you can uncompress files from the Windows command line via a detour to PowerShell: powershell "Expand-Archive file.zip -DestinationPath <path/to/destination...>"

I just checked, this works fine here.


Can someone provide a couple of concrete use cases for this dataset? You could train a model to... ?


Hi! I lead the development of this dataset. I'm delighted that it found its way onto Hacker News :D Anyway, we think this type of data could eventually be used for a wide range of tasks that people care about, e.g., photo re-lighting, virtual home remodeling, and safe indoor robot navigation. Note that these are my personal views as a computer vision researcher who works extensively with synthetic data.


Almost certainly they are interested in using it for their augmented reality technology, ARKit, so that your phone/device can not just get a depth map of the room (which it does now) but also understand what is a chair, what is a table, what is a wall, etc.


...do image segmentation on indoor scenes?
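For instance, a minimal sketch of that idea with torchvision; the class count and the random tensors below are stand-ins for real Hypersim frames and semantic labels, which you would load from the dataset:

  import torch
  import torchvision

  NUM_CLASSES = 40  # placeholder; use whatever label set the dataset's semantic layer defines
  model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=NUM_CLASSES)
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

  images = torch.rand(2, 3, 192, 256)                    # stand-in for RGB frames
  labels = torch.randint(0, NUM_CLASSES, (2, 192, 256))  # stand-in for per-pixel semantic labels

  out = model(images)["out"]                             # (N, NUM_CLASSES, H, W) logits
  loss = torch.nn.functional.cross_entropy(out, labels)
  loss.backward()
  optimizer.step()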



