Your question makes sense. I’ve addressed this elsewhere but I’m currently just answering on my phone so I’ll summarize by saying that our focus is on complete, solid local data one municipality at a time (full context is really helpful when actually using the data) as opposed to any particular type of data.
That said, we still have work to do pointing people in this direction and helping them understand why. This whole thread is going to really affect our website and docs :)
I finally switched from Firefox to Safari on iOS. The latest redesign was the last straw for me. Very frustrating how the look of the app keeps changing without any useful updates. "telemetry says..."
Pretty sure they're doing this to entice people to vote in the Georgia runoffs so Dems will control the Senate. "Vote for us and we can make this law."
This right here. It's literally this right here. In other words, it's not pointless, it's a political tactic. A good tactic to entice unlikely voters to vote Democrat.
Multiple reasons, depending on your level of cynicism. Least cynical is that there were competing priorities in the general and the current President was (is) an overall distraction from any policy discussions. Most cynical is that Dems don't actually like winning because it gives them somebody to blame. Both are corporatist parties that don't generally work for the majority of people. But, like I said, that's the cynical take!
We implemented a 90-day retention policy for all Slack messages about a year ago. Everyone was resistant to it at first, but it's been great, honestly. It forces us to put things we'll want to remember in the handbook and encourages a more async culture. Personally I'd be happy with a 30-day retention policy!
Snowflake is the go-to data warehouse in my opinion. Redshift and BigQuery are fine, but Snowflake is head and shoulders above them. There's a good community and good tooling around it (dbt, which works on other warehouses too). They have the mindshare in the data warehouse market.
There's so much they can do from a user experience perspective to make it even better. The integration with Numeracy was a trainwreck, but the fundamentals of the DB are there.
Interesting to see they lose so much money, but I bet their margins have to be so thin running on the cloud. I wonder if they'll ever have to go bare metal to make it work.
Working with it was fraught with issues. Performance was mediocre at best, it was horribly expensive, and the Python and JS client libraries had recurring issues with disconnecting and reconnecting. The advice we were given around scaling concurrent connections was bizarre at best. Teammates hit numerous issues where it was clear corners had been cut in handling edge cases around certain Unicode characters. Their Snowpipe "streaming" implementation was...not good. The idea of having compute workers that "spun up and down" sounded good in theory, but in practice led to more bottlenecks and delays than anything else.
The AWS outage last year that prevented you from provisioning new instances essentially crippled our Snowflake DB.
I almost go out of my way to recommend people _not_ use it. I keep seeing it pop up, but mostly because they seem to be doing what MongoDB did in the early days: throwing marketing money at capturing mindshare rather than being an actually good product.
We changed to ClickHouse and the difference was literally night-and-day. The performance especially was far superior.
I can't believe that they will succeed in the long run as an independent player IN the cloud.
They are always going to be less integrated and less infrastructure-cost-efficient than the native options (Redshift and BigQuery), without the R&D budgets and with incremental friction (sales) and risk (data privacy and cybersecurity).
AWS really should get around to buying them, like they should have bought Looker or Tableau or Mode or Fivetran or dbt, etc.
Snowflake is wildly better than Redshift, no matter how you want to look at it -- integrations, cost, performance, etc.
Like, in a sane world I agree with you -- Redshift SHOULD have a crazy competitive advantage. But somehow they've been unable to execute on that goal for half a decade, and I don't see that changing quickly, given Snowflake's mindshare and growth.
Snowflake is better. Redshift has been really slow to execute. AWS is doing the world's worst job of articulating whatever vision they have for analytics. AWS's message is laser-focused on infrastructure folks and machine learning engineers (not analysts, not data scientists, not anyone else).
The higher you go up the stack, the slower and less meaningful AWS's solutions feel. There is a fantastic job opportunity out there for someone to reconcile AWS's data analytics offerings. They have so much upside.
I'm still not betting on Snowflake winning a direct competition with their primary supplier. For the enterprise and the highly regulated: Redshift is good enough, already there, and they don't NEED the efficiencies that Snowflake makes available.
Redshift started as an on-premises piece of software that AWS acquired and converted into a cloud platform. Snowflake was built from day 1 as a cloud platform, with awesome big data frameworks as its internal architecture. It's very hard for Redshift to rearchitect itself the way Snowflake was designed from the start, because it has to keep supporting existing instances; it would effectively mean creating an entirely new product.
You don't need to own the public cloud infrastructure to build a better product.
Example: you can play inside ball on storage infrastructure costs to get a 2x cost benefit at the expense of a lot of extra engineering. Better DBMS storage organization, which is available to any implementation, gets you 10x (or greater) improvement. Which would you rather have?
In fact, products like Redshift don't even really game the infrastructure prices. Costs to customers are comparable with Snowflake for equivalent resources as far as I can tell. They both charge what the market will bear.
Hi, what you are saying is cryptic to me but I would love to understand. Would you mind breaking it down for the financially literate but tech-handicapped person I am, please? Thanks much!!
Sure! Sorry to be so obscure, it was not a good explanation. To take the above example, let's say you have a database with 1TB of tabular data in Amazon.
1. You start out storing it on Amazon gp2 Elastic Block Store, which is fast block storage available on the network. It costs about $0.10 US per month per GB, so that's $102.40 per month.
2. Data (sadly) has a habit of getting destroyed in accidents so we normally replicate to at least one other location. Let's say we just replicate once. You are now up to $204.80 per month.
Now we have a couple of ways of reducing costs.
1. We could make the block storage itself cheaper thanks to inside knowledge of how it works plus clever financial engineering. However, the _most_ that can get us is about 5x savings, because prices for similar classes of storage are not that different. The real discount is more like 2x if we want to make money and be reasonably speedy. You likely have to do engineering work--like implementing blended storage--for this latter approach, so it's not free. So, we're back to $102.40 per month.
2. Or, we could build a better database.
2a.) Let's first build a database that can store data in S3 object storage instead of block storage. Now our storage costs about $0.02 per GB per month. Plus S3 is replicated, so we can maybe keep just a single copy. We're down to $20.48 per month, but we had to rewrite the database to get there, because S3 behaves very differently from block storage and we have to build clever caches to work on it.
2b.) But wait! There's more. We could also arrange tabular data in columns rather than rows, which allows us to apply very efficient compression. Let's say the compression reduces size by 90% overall. We're now down to just $2.05 per month. Again, we had to rewrite the database, but we got huge savings in return, roughly 100x.
The moral is that clever arrangement of data just about always beats financial shenanigans, usually by a wide margin. Amazon has done well in data services like Redshift and Aurora largely because they have been extremely smart about data services, not because of any inherent advantage as platform owners.
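If it helps, here is the same arithmetic as a tiny Python script. The per-GB prices are the rough figures from the example above, not official AWS list prices, and the 90% compression is just the illustrative assumption from step 2b.

    # Back-of-envelope version of the storage-cost example above.
    # Prices are the rough per-GB-per-month figures from this thread,
    # not official AWS list prices.
    DATA_GB = 1024            # 1 TB of tabular data
    EBS_GP2_PER_GB = 0.10     # fast block storage
    S3_PER_GB = 0.02          # object storage (replicated by AWS)
    COMPRESSION = 0.10        # columnar layout keeps ~10% of raw size

    # Baseline: block storage, replicated once for safety.
    ebs_replicated = DATA_GB * EBS_GP2_PER_GB * 2        # $204.80/mo

    # Option 1: financial engineering on the same block storage (~2x).
    discounted_block = ebs_replicated / 2                # $102.40/mo

    # Option 2a: rewrite the database to store data in S3, single copy.
    s3_single_copy = DATA_GB * S3_PER_GB                 # $20.48/mo

    # Option 2b: also store the data column-wise and compress it.
    s3_columnar = s3_single_copy * COMPRESSION           # $2.05/mo

    print(f"EBS, replicated:  ${ebs_replicated:7.2f}/mo")
    print(f"Discounted block: ${discounted_block:7.2f}/mo")
    print(f"S3, single copy:  ${s3_single_copy:7.2f}/mo")
    print(f"S3 + columnar:    ${s3_columnar:7.2f}/mo "
          f"(~{ebs_replicated / s3_columnar:.0f}x cheaper than baseline)")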
Snowflake is better than Redshift, but BigQuery has improved greatly in the last 2 years and closed a lot of the gaps. I find Snowflake is the best at dealing with semi-structured/JSON data and handling interactive results on smaller datasets, while BQ is great with serverless scaling and very large computations.
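To make the JSON point a bit more concrete, this is roughly what querying semi-structured data in Snowflake looks like from Python. The table and column names (events, payload) are made up for the example, and the connection parameters are placeholders.

    # Hypothetical example: querying a JSON (VARIANT) column in Snowflake
    # from Python. Table/column names are invented; credentials are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",      # placeholder
        user="my_user",
        password="...",
        warehouse="ANALYTICS_WH",  # placeholder
        database="MY_DB",
        schema="PUBLIC",
    )
    cur = conn.cursor()

    # Snowflake lets you drill into a VARIANT column with ':' paths and
    # cast leaf values with '::', with no upfront schema for the JSON.
    cur.execute("""
        SELECT
            payload:customer.id::string AS customer_id,
            payload:event_type::string  AS event_type,
            COUNT(*)                    AS n
        FROM events
        WHERE payload:event_type::string = 'page_view'
        GROUP BY 1, 2
    """)
    for row in cur.fetchall():
        print(row)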
"Our business benefits from powerful network effects. The Data Cloud will continue to grow as organizations move their siloed data from cloud-based repositories and on-premises data centers to the Data Cloud. The more customers adopt our platform, the more data can be exchanged with other Snowflake customers, partners, and data providers, enhancing the value of our platform for all users. We believe this network effect will help us drive our vision of the Data Cloud."
I fail to understand this network effect. Is there some conflation here? How does data sharing equate to a network effect? Something is fundamentally not adding up here. For this to be a network effect, sharing my data with 10 other customers should inherently enhance my own experience. How does that happen with Snowflake?
This is one hypothetical way they could capture this value:
1) Building a common platform where anyone can upload datasets, e.g. weather data, retail data, government data, other open data, or closed data (copyrighted etc). They gave the example of COVID cases in their S-1 doc.
2) Providing a mechanism for others to find data through a marketplace; some data is free, some available only via payment (with different monetisation models, e.g. per consumption, per month), and other customers can consume it as and when needed. Note that, based on their S-1 doc, data is never copied when shared with others, so the cost of sharing with a wide audience is limited (rough sketch of the mechanics below).
3) The more data on the platform, the more data is shareable in the 'marketplace' and the more data gets used by everyone. This increases the value of the whole platform through network effects.
4) It also opens up alternative revenue streams, e.g. more revenue through storage (more data on the platform from different people) and, possibly, revenue from shared data that gets consumed.
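As a very rough sketch of the sharing mechanics from the provider side, driven from Python (all database, share, and account names below are placeholders, not anything from the S-1):

    # Rough sketch of Snowflake secure data sharing. The provider grants
    # read access on existing objects to a "share"; consumers mount the
    # share as a read-only database and query the data in place, so
    # nothing gets copied. All names here are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="provider_acct",  # placeholder
        user="...",
        password="...",
    )
    cur = conn.cursor()

    cur.execute("CREATE SHARE weather_share")
    cur.execute("GRANT USAGE ON DATABASE weather_db TO SHARE weather_share")
    cur.execute("GRANT USAGE ON SCHEMA weather_db.public TO SHARE weather_share")
    cur.execute("GRANT SELECT ON TABLE weather_db.public.daily_obs TO SHARE weather_share")

    # Add a (hypothetical) consumer account to the share; on their side it
    # shows up as a database they can query but not modify.
    cur.execute("ALTER SHARE weather_share ADD ACCOUNTS = consumer_acct")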
I'm a little skeptical of this as well, but I think there is a path. At my previous company we would take in a lot of data from other companies and do analysis for them. If we had a really easy way to share the transformed and analyzed data after it's been modelled in the warehouse, that really would have been great. The question is, are you going to get companies to create a snowflake account just so they can access data in this way? Maybe if it's easy to export / do further analysis.
One of the barriers for Snowflake is that while it's better than what AWS offers, very few customers start out needing everything Snowflake does. They grow into that. So they stick with AWS, hoping that the features/capabilities there grow fast enough to keep up.
But it's also very expensive. You can do queries on a Spark cluster for tiny fractions of what they charge. But Snowflake makes things easy for the "decision makers" (who know SQL). So, all good.
Having run a medium size Spark cluster, I'm not sure I agree.
If you have 80-100% utilization for a month, perhaps, but the beauty of Snowflake is that you can spin up a 3XL warehouse for a few MINUTES to get answers fast, and then shut it down again and don't pay anything.
Saying "you could run it on self-managed Spark/Oracle/Hive/SQLite" is approximately the same argument as saying "I can run a web server cheaper myself than paying Amazon for an EC2 instance" -- there are cases where that is true, but there are many, many, cases where the "on demand capacity" is the bigger benefit.
> the beauty of Snowflake is that you can spin up a 3XL warehouse for a few MINUTES to get answers fast, and then shut it down again and don't pay anything
Is this why they're making a $350mn annual loss?
A million dollars a day loss would be a pretty big deal to me.
I recently saw a YouTube video on that called the cardboard box reform. It caught my eye because I hadn't heard anyone talking about it, but it really makes sense. I don't know how you sell less transparency to the public, though.
I've been thinking about this a lot lately and have come to a similar conclusion. I've been very focused on technical skills lately, but in doing so I've started to lose some connections in my org and network. Career growth depends more on relationships than I'd like to admit.