Your question makes sense. I’ve addressed this elsewhere but I’m currently just answering on my phone so I’ll summarize by saying that our focus is on complete, solid local data one municipality at a time (full context is really helpful when actually using the data) as opposed to any particular type of data.
That said, we still have work to do pointing people in this direction and helping them understand why. This whole thread is going to really affect our website and docs :)
I finally switched from Firefox to Safari on iOS. The latest redesign was the last straw for me. Very frustrating how the look of the app keeps changing without any useful updates. "telemetry says..."
Pretty sure they're doing this to entice people to vote in the Georgia runoffs so Dems will control the Senate. "Vote for us and we can make this law."
This right here. It's literally this right here. In other words, it's not pointless, it's a political tactic. A good tactic to entice unlikely voters to vote Democrat.
Multiple reasons, depending on your level of cynicism. Least cynical is that there were competing priorities in the general and the current President was (is) an overall distraction from any policy discussions. Most cynical is that Dems don't actually like winning because it gives them somebody to blame. Both are corporatist parties that don't generally work for the majority of people. But, like I said, that's the cynical take!
We implemented a 90-day retention policy for all Slack messages about a year ago. Everyone was resistant to it at first, but it's been great, honestly. It forces us to put things we'll want to remember in the handbook and encourages a more async culture. Personally I'd be happy with a 30-day retention policy!
Snowflake is the go-to data warehouse in my opinion. Redshift and BigQuery are fine, but Snowflake is head and shoulders above them. There's a good community and good tooling around it (dbt, which works on other warehouses too). They have the mindshare in the data warehouse market.
There's so much they can do from a user experience perspective to make it even better. The integration with Numeracy was a trainwreck, but the fundamentals of the DB are there.
Interesting to see they lose so much money, but I bet their margins have to be so thin running on the cloud. I wonder if they'll ever have to go bare metal to make it work.
Working with it was fraught with issues. Performance was mediocre at best, it was horribly expensive, and the Python and JS client libraries had recurring issues with disconnecting and reconnecting. The advice we were given around scaling concurrent connections was bizarre at best. Teammates hit numerous issues where it was clear corners had been cut in handling edge cases around certain Unicode characters. Their Snowpipe "streaming" implementation was...not good. The idea of having compute workers that "spun up and down" sounded good in theory, but in practice led to more bottlenecks and delays than anything else.
The AWS outage last year that prevented you from provisioning new instances essentially crippled our Snowflake DB.
I almost go out of my way to recommend people _not_ use it. I keep seeing it pop up, but mostly because they seem to be doing what MongoDB did in the early days: throwing marketing money at capturing mindshare rather than being an actually good product.
We changed to ClickHouse and the difference was literally night-and-day. The performance especially was far superior.
I can't believe that they will succeed in the long run as an independent player IN the cloud.
They are always going to be less integrated and less infrastructure-cost-efficient than the native options (Redshift and BigQuery), without the R&D budgets and with incremental friction (sales) and risk (data privacy and cybersecurity).
AWS really should get around to buying them, like they should have bought Looker or Tableau or Mode or Fivetran or dbt, etc.
Snowflake is wildly better than Redshift, no matter how you want to look at it -- integrations, cost, performance, etc.
Like, in a sane world I agree with you -- Redshift SHOULD have a crazy competitive advantage. But somehow they've been unable to execute on that goal for half a decade, and I don't see that changing quickly, given Snowflake's mindshare and growth.
Snowflake is better. Redshift has been really slow to execute. AWS is doing the world's worst job of articulating whatever vision they have for analytics. AWS's message is laser-focused on infrastructure folks and machine learning engineers (not analysts, not data scientists, not anyone else).
The higher you go up the stack, the slower and less meaningful AWS's solutions feel. There is a fantastic job opportunity out there for someone to reconcile AWS's data analytics offerings. They have so much upside.
I'm still not betting on Snowflake winning a direct competition with their primary supplier. For the enterprise and the highly regulated: Redshift is good enough, already there, and they don't NEED the efficiencies that Snowflake makes available.
Redshift started as an on-premises piece of software that AWS acquired and converted into a cloud platform. Snowflake was built from day 1 as a cloud platform, with awesome big data frameworks as its internal architecture. It's very hard for Redshift to rearchitect itself the way Snowflake was designed from the start, because it has to keep supporting existing instances; it would effectively mean creating an entirely new product.
You don't need to own the public cloud infrastructure to build a better product.
Example: you can play inside ball on storage infrastructure costs to get a 2x cost benefit at the expense of a lot of extra engineering. Better DBMS storage organization, which is available to any implementation, gets you 10x (or greater) improvement. Which would you rather have?
In fact, products like Redshift don't even really game the infrastructure prices. Costs to customers are comparable with Snowflake for equivalent resources as far as I can tell. They both charge what the market will bear.
Hi, what you are saying is cryptic to me but I would love to understand. Would you mind breaking it down for the financially literate but tech-handicapped person I am, please? Thanks much!!
Sure! Sorry to be so obscure, it was not a good explanation. To take the above example, let's say you have a database with 1TB of tabular data in Amazon.
1. You start out storing it on Amazon gp2 Elastic Block Store, which is fast block storage available on the network. It costs about $0.10 US per month per GB, so that's $102.40 per month.
2. Data (sadly) has a habit of getting destroyed in accidents so we normally replicate to at least one other location. Let's say we just replicate once. You are now up to $204.80 per month.
Now we have a couple of ways of reducing costs.
1. We could make the block storage itself cheaper thanks to inside knowledge of how it works plus clever financial engineering. However, the _most_ that can get us is about 5x savings, because prices for similar classes of storage are not that different. The real discount is more like 2x if we want to make money and be reasonably speedy. You likely have to do engineering work--like implementing blended storage--for this latter approach, so it's not free. So, we're back to $102.40 per month.
2. Or, we could build a better database.
2a.) Let's first build a database that can store data in S3 object storage instead of block storage. Now our storage costs about $0.02 per GB per month. Plus S3 is replicated, so we can maybe keep just a single copy. We're down to $20.48 per month, but we had to rewrite the database to get there, because S3 behaves very differently from block storage and we have to build clever caches to work on it.
2b.) But wait! There's more. We could also arrange tabular data in columns rather than rows, which allows us to apply very efficient compression. Let's say the compression reduces size by 90% overall. We're now down to just $2.05 per month. Again, we had to rewrite the database, but we got huge savings in return, roughly 100x.
The moral is that clever arrangement of data just about always beats financial shenanigans, usually by a wide margin. Amazon has done well in data services like Redshift and Aurora largely because they have been extremely smart about data services, not because of any inherent advantage as platform owners.
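If it helps, here is the same arithmetic as a tiny Python script. The per-GB prices are the rough figures from the example above, not official AWS list prices, and the 90% compression is just the illustrative assumption from step 2b.

    # Back-of-envelope version of the storage-cost example above.
    # Prices are the rough per-GB-per-month figures from this thread,
    # not official AWS list prices.
    DATA_GB = 1024            # 1 TB of tabular data
    EBS_GP2_PER_GB = 0.10     # fast block storage
    S3_PER_GB = 0.02          # object storage (replicated by AWS)
    COMPRESSION = 0.10        # columnar layout keeps ~10% of raw size

    # Baseline: block storage, replicated once for safety.
    ebs_replicated = DATA_GB * EBS_GP2_PER_GB * 2        # $204.80/mo

    # Option 1: financial engineering on the same block storage (~2x).
    discounted_block = ebs_replicated / 2                # $102.40/mo

    # Option 2a: rewrite the database to store data in S3, single copy.
    s3_single_copy = DATA_GB * S3_PER_GB                 # $20.48/mo

    # Option 2b: also store the data column-wise and compress it.
    s3_columnar = s3_single_copy * COMPRESSION           # $2.05/mo

    print(f"EBS, replicated:  ${ebs_replicated:7.2f}/mo")
    print(f"Discounted block: ${discounted_block:7.2f}/mo")
    print(f"S3, single copy:  ${s3_single_copy:7.2f}/mo")
    print(f"S3 + columnar:    ${s3_columnar:7.2f}/mo "
          f"(~{ebs_replicated / s3_columnar:.0f}x cheaper than baseline)")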
Snowflake is better than Redshift, but BigQuery has improved greatly in the last 2 years and closed a lot of the gaps. I find Snowflake is the best at dealing with semi-structured/JSON data and handling interactive results on smaller datasets, while BQ is great with serverless scaling and very large computations.
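To make the JSON point a bit more concrete, this is roughly what querying semi-structured data in Snowflake looks like from Python. The table and column names (events, payload) are made up for the example, and the connection parameters are placeholders.

    # Hypothetical example: querying a JSON (VARIANT) column in Snowflake
    # from Python. Table/column names are invented; credentials are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account",      # placeholder
        user="my_user",
        password="...",
        warehouse="ANALYTICS_WH",  # placeholder
        database="MY_DB",
        schema="PUBLIC",
    )
    cur = conn.cursor()

    # Snowflake lets you drill into a VARIANT column with ':' paths and
    # cast leaf values with '::', with no upfront schema for the JSON.
    cur.execute("""
        SELECT
            payload:customer.id::string AS customer_id,
            payload:event_type::string  AS event_type,
            COUNT(*)                    AS n
        FROM events
        WHERE payload:event_type::string = 'page_view'
        GROUP BY 1, 2
    """)
    for row in cur.fetchall():
        print(row)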
"Our business benefits from powerful network effects. The Data Cloud will continue to grow as organizations move their siloed data from cloud-based repositories and on-premises data centers to the Data Cloud. The more customers adopt our platform, the more data can be exchanged with other Snowflake customers, partners, and data providers, enhancing the value of our platform for all users. We believe this network effect will help us drive our vision of the Data Cloud."
I fail to understand this network effect. Is there some conflation here? How does data sharing equate to a network effect? Something is fundamentally not adding up here. For this to be a network effect, sharing my data with 10 other customers should inherently enhance my own experience. How does that happen with Snowflake?
This is one hypothetical way they could capture this value:
1) Building a common platform where anyone can upload datasets, e.g. weather data, retail data, government data, other open data, or closed data (copyrighted etc). They gave the example of COVID cases in their S-1 doc.
2) Providing a mechanism for others to find data through a marketplace; some data is free, some available only via payment (with different monetisation models, e.g. per consumption, per month), and other customers can consume it as and when needed. Note that, based on their S-1 doc, data is never copied when shared with others, so the cost of sharing with a wide audience is limited (rough sketch of the mechanics below).
3) The more data on the platform, the more data is shareable in the 'marketplace' and the more data gets used by everyone. This increases the value of the whole platform through network effects.
4) It also opens up alternative revenue streams, e.g. more revenue through storage (more data on the platform from different people) and, possibly, revenue from shared data that gets consumed.
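As a very rough sketch of the sharing mechanics from the provider side, driven from Python (all database, share, and account names below are placeholders, not anything from the S-1):

    # Rough sketch of Snowflake secure data sharing. The provider grants
    # read access on existing objects to a "share"; consumers mount the
    # share as a read-only database and query the data in place, so
    # nothing gets copied. All names here are placeholders.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="provider_acct",  # placeholder
        user="...",
        password="...",
    )
    cur = conn.cursor()

    cur.execute("CREATE SHARE weather_share")
    cur.execute("GRANT USAGE ON DATABASE weather_db TO SHARE weather_share")
    cur.execute("GRANT USAGE ON SCHEMA weather_db.public TO SHARE weather_share")
    cur.execute("GRANT SELECT ON TABLE weather_db.public.daily_obs TO SHARE weather_share")

    # Add a (hypothetical) consumer account to the share; on their side it
    # shows up as a database they can query but not modify.
    cur.execute("ALTER SHARE weather_share ADD ACCOUNTS = consumer_acct")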
I'm a little skeptical of this as well, but I think there is a path. At my previous company we would take in a lot of data from other companies and do analysis for them. If we had a really easy way to share the transformed and analyzed data after it's been modelled in the warehouse, that really would have been great. The question is, are you going to get companies to create a snowflake account just so they can access data in this way? Maybe if it's easy to export / do further analysis.
One of the barriers for Snowflake is that while it's better than what AWS offers, very few customers start out needing everything Snowflake does. They grow into that. So they stick with AWS, hoping that the features/capabilities there grow fast enough to keep up.
But it's also very expensive. You can do queries on a Spark cluster for tiny fractions of what they charge. But Snowflake makes things easy for the "decision makers" (who know SQL). So, all good.
Having run a medium size Spark cluster, I'm not sure I agree.
If you have 80-100% utilization for a month, perhaps, but the beauty of Snowflake is that you can spin up a 3XL warehouse for a few MINUTES to get answers fast, and then shut it down again and don't pay anything.
Saying "you could run it on self-managed Spark/Oracle/Hive/SQLite" is approximately the same argument as saying "I can run a web server cheaper myself than paying Amazon for an EC2 instance" -- there are cases where that is true, but there are many, many, cases where the "on demand capacity" is the bigger benefit.
> the beauty of Snowflake is that you can spin up a 3XL warehouse for a few MINUTES to get answers fast, and then shut it down again and don't pay anything
Is this why they're making a $350mn annual loss?
A million dollars a day loss would be a pretty big deal to me.
I recently saw a YouTube video on that called the cardboard box reform. It caught my eye because I hadn't heard anyone talking about it, but it really makes sense. I don't know how you sell less transparency to the public, though.
I've been thinking about this a lot lately and have come to a similar conclusion. I've been very focused on technical skills lately, but in doing so I've started to lose some connections in my org and network. Career growth depends more on relationships than I'd like to admit.