Hypersim, Photorealistic Synthetic Dataset for Indoor Scene Understanding (github.com/apple)
122 points by homarp on Dec 21, 2020 | 20 comments


Apple's first published ML paper was on using GANs to make synthetic datasets more realistic. They applied their approach to computer-generated eyeballs to add sensor-like noise. I find it interesting that the paper was published before Face ID was announced, since in retrospect it's clear that's what they were working on.

That paper also mentions something that has still not found its way into an Apple product: they posed 3D models of hands and then degraded them to appear as depth maps, with jagged edges, artifacts, and occlusions.

https://machinelearning.apple.com/research/gan


Might also be related to the cool (and somewhat creepy) FaceTime feature that makes it look like you're making eye contact, even when you're actually looking at the screen below the camera.


At $57,000 for 77,400 images, the cost is below $1 per image. However, there are only 461 scenes => about $123 per scene. They mention vCPU-years to perform the rendering. I wonder whether GPUs would be more cost-efficient and feasible for this rendering task.
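For reference, the back-of-the-envelope arithmetic in Python (same figures as above):

  total_cost_usd = 57_000
  num_images = 77_400
  num_scenes = 461

  print(total_cost_usd / num_images)  # ~0.74 USD per image
  print(total_cost_usd / num_scenes)  # ~123.6 USD per scene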


Hi! I lead the development of this dataset. I'm delighted that it found its way onto Hacker News :D Anyway, I'd love to do all of this rendering on the GPU in future. I'm sure that would be more cost-efficient. When I looked into this topic last year, I remember finding out that GPU rendering with V-Ray is not as flexible in terms of generating all of the different ground truth layers that we want. The ground truth layers that we ship in our public release are a subset of all the layers we generate. Have a look at our public source code for more details. I don't believe it's possible to generate all of those layers on the GPU. But I'd love to be wrong about this!
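For example, each released layer is a per-frame HDF5 file, so loading one is a few lines of h5py. A minimal sketch, where the exact filename below is illustrative rather than copied from the release (see the download script for the real directory layout):

  import h5py

  # illustrative path; the real scene/camera/frame names come from the download script
  depth_path = "ai_001_001/images/scene_cam_00_geometry_hdf5/frame.0000.depth_meters.hdf5"

  with h5py.File(depth_path, "r") as f:
      depth = f["dataset"][:]  # per-pixel depth values, stored under the "dataset" key

  print(depth.shape, depth.dtype)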


We are doing something similar. I could not get V-Ray to actually use the GPU, and the folks from Chaos Group were like "it works on our side". We ended up just using the CPU.


> I wonder whether GPUs would be more cost-efficient and feasible for this rendering task.

GPUs are the superior hardware for raytracing calculations. But...

CPUs have the software advantage: decades of engineering effort have been poured into CPU renderers. GPU rendering is only now starting to become a thing, and it will inevitably take decades of engineering to recreate all of those features on the GPU.

One important GPU-specific problem is the shortage of RAM: CPUs always have more RAM available (even if that CPU RAM is much, much slower). Teaming up the CPU and GPU to handle the RAM problem for large scenes is going to take significant effort. The general ideas are known from research papers, but they haven't really been ported to the actual tools yet. Disney's Moana island scene is known to take ~100GB of geometry data / textures / animations, and there's no GPU on the planet that can hold all that data. So you'd have to make a CPU+GPU team to take on that problem.
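Roughly, the mismatch looks like this (ballpark sizes, assumed rather than measured):

  scene_size_gb = 100  # Disney's Moana island scene: geometry + textures + animation
  vram_gb = 24         # a high-end consumer GPU
  host_ram_gb = 512    # a typical many-core render node

  print(scene_size_gb <= vram_gb)      # False: the scene alone overflows the GPU
  print(scene_size_gb <= host_ram_gb)  # True: it fits comfortably in host RAM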


Yes, it would be much cheaper if they used a GPU renderer such as OctaneRender https://home.otoy.com/render/octane-render/

However, it might lack some features or flexibility compared to V-Ray.


V-Ray is a GPU (and CPU) renderer. https://www.chaosgroup.com/vray-gpu


From my experience, rendering on the GPU is not worth it when you have a lot of images to generate. Generally you want to budget your CPU cores to render a bunch of images at the same time instead of one (or a few) images really fast. The more threads/buckets/blocks you have with V-Ray, the more overhead you are paying for. If you know what you are doing, you should be able to saturate your NAS and/or network bandwidth.
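To make that concrete, a rough sketch of the scheduling idea in Python; render_frame.sh and its flags are placeholders for whatever your renderer's command line actually is, not real V-Ray options:

  import subprocess
  from concurrent.futures import ThreadPoolExecutor

  FRAMES = range(100)
  JOBS = 16            # how many frames to render concurrently
  THREADS_PER_JOB = 4  # keep each render's thread count small to limit overhead

  def render(frame):
      # hypothetical wrapper script around the renderer's CLI
      subprocess.run(["./render_frame.sh", f"--frame={frame}",
                      f"--threads={THREADS_PER_JOB}"], check=True)

  with ThreadPoolExecutor(max_workers=JOBS) as pool:
      list(pool.map(render, FRAMES))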

The other thing to consider is that the Evermotion dataset might contain a lot of V-Ray materials and layers that might be CPU-only. At one point Chaos Group had a list of all the differences between CPU and GPU, but it appears they either solved all of them and/or took the page down.


Ah, this is all good info. Thank you. And great point about Evermotion + V-Ray materials that are CPU-only. It is certainly possible that some of those materials are CPU-only.

I think the Chaos Group documentation you're thinking of is here: https://docs.chaosgroup.com/display/VMAX/V-Ray+GPU+Supported...

Indeed, at least according to this documentation, some of the render elements we are relying on to generate the Hypersim dataset are unsupported on the GPU.


I believe GPU rendering would be more expensive; it would increase human labor too much.

The CPU is universal and has more RAM. Optimizing very sophisticated ray tracing for a GPU with a tiny amount of VRAM could be very tricky. It is easier to write quick-and-dirty code for the CPU and wait a couple of weeks.

Apple also had spare CPU cycles. This stuff could run overnight in the background.


Hi! I lead the development of this dataset. I'm delighted that it found its way onto Hacker News :D I think GPU rendering would be cheaper if it was possible. See my post above.


"To obtain our image dataset, you can run the following download script. On Windows, you'll need to modify the script so it doesn't depend on the curl and unzip command-line utilities.",

Windows 10 has "curl" support out of the box since update 1803.


> Windows 10 has "curl" support out of the box since update 1803.

Huh, TIL.

(At first I thought that maybe Microsoft had just done something like

  Set-Alias -Name curl -Value Invoke-WebRequest
but no, it’s the real deal[0].)

[0]: https://devblogs.microsoft.com/commandline/windows10v1803/


Wow awesome! If Windows also has unzip, I can simplify the instructions! Thank you for pointing this out :D


You're welcome! I should have filed that as an issue on GitHub in addition to commenting here.

Windows does not have unzip, but I think you can uncompress files from the Windows command line via a detour to PowerShell: powershell "Expand-Archive file.zip -DestinationPath <path/to/destination...>"

I just checked, this works fine here.


Can someone provide a couple of concrete use cases for this dataset? You could train a model to... ?


Hi! I lead the development of this dataset. I'm delighted that it found its way onto Hacker News :D Anyway, we think this type of data could eventually be used for a wide range of tasks that people care about, e.g., photo re-lighting, virtual home remodeling, and safe indoor robot navigation. Note that these are my personal views as a computer vision researcher who works extensively with synthetic data.


Almost certainly they are interested in using it for their augmented reality technology, ARKit, so that your phone/device can not just get a depth map of the room (which it does now) but also understand what is a chair, what is a table, what is a wall, etc.


...do image segmentation on indoor scenes?
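For instance, a minimal sketch of that idea with torchvision; the class count and the random tensors below are stand-ins for real Hypersim frames and semantic labels, which you would load from the dataset:

  import torch
  import torchvision

  NUM_CLASSES = 40  # placeholder; use whatever label set the dataset's semantic layer defines
  model = torchvision.models.segmentation.deeplabv3_resnet50(num_classes=NUM_CLASSES)
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

  images = torch.rand(2, 3, 192, 256)                    # stand-in for RGB frames
  labels = torch.randint(0, NUM_CLASSES, (2, 192, 256))  # stand-in for per-pixel semantic labels

  out = model(images)["out"]                             # (N, NUM_CLASSES, H, W) logits
  loss = torch.nn.functional.cross_entropy(out, labels)
  loss.backward()
  optimizer.step()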



