I'm glad people are realizing cgi-bin was the right model all along, but I'm sad it took ten years in the wilderness to get there.
All I want is user-isolated (suexec) shared web hosting on scalable web servers and SQL servers that someone else runs for me, where the extent of what I need to do is put PHP/Python/shell/whatever files in a directory on some network filesystem, and optionally wire up FastCGI if I want. It works well, and it's been doable for years because I used to run infrastructure like this. In the modern era you can even imagine extending suexec to create containers and not just switch user IDs if you want more isolation - as long as you have the container image locally, the individual steps of making a container (make some namespaces, make some cgroups, set a seccomp policy) shouldn't be noticeably more overhead than exec / process startup itself, and, again, FastCGI is an option.
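(To make that concrete: a very rough sketch of the "suexec plus containers" idea, assuming util-linux's unshare(1)/setpriv(1) and a cgroup v2 mount are available; the handler path, UID, and cgroup name are placeholders, and the seccomp step is omitted since it would need libseccomp bindings.)

    import os, subprocess

    def run_isolated(handler_path, uid, cgroup="/sys/fs/cgroup/web-jobs"):
        # "make some cgroups": create (or reuse) a cgroup for this handler
        os.makedirs(cgroup, exist_ok=True)
        # "make some namespaces" via unshare, then a suexec-style UID switch via setpriv
        proc = subprocess.Popen([
            "unshare", "--pid", "--mount", "--net", "--uts", "--ipc", "--fork",
            "setpriv", f"--reuid={uid}", f"--regid={uid}", "--init-groups",
            handler_path,
        ])
        # simplified: a real implementation would place the process in the cgroup
        # before it starts doing work, not after
        with open(os.path.join(cgroup, "cgroup.procs"), "w") as f:
            f.write(str(proc.pid))
        return proc.wait()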
That's "serverless" just as much as anything else is (i.e., the person running the website doesn't need to think about launching and operating servers, the platform takes care of it) and it's way simpler than using all these fancy tools on top.
I definitely feel vindicated in my long-standing feeling that every server-side software deployment story I've seen since that of my early Perl and PHP work has been way, way worse than those were. The best anything else has managed is "almost as simple to use and as reliable, but much more complex".
It still applies now. E.g. it's still trivial to deploy a Ruby/Rails app to a server (with rollback potential) using Capistrano. Capistrano itself is just a DSL over a shell script that runs scp and a few remote commands.
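(The whole flow is roughly the sketch below - a hand-rolled approximation, not Capistrano's actual code; the host, paths, and restart command are made up.)

    import subprocess, time

    HOST = "deploy@app.example.com"   # placeholder host
    ROOT = "/var/www/myapp"           # placeholder deploy root

    def deploy(tarball):
        release = f"{ROOT}/releases/{int(time.time())}"
        subprocess.run(["ssh", HOST, f"mkdir -p {release}"], check=True)
        subprocess.run(["scp", tarball, f"{HOST}:{release}/app.tar.gz"], check=True)
        # unpack, repoint the "current" symlink, restart the app
        subprocess.run(["ssh", HOST,
                        f"tar -xzf {release}/app.tar.gz -C {release} && "
                        f"ln -sfn {release} {ROOT}/current && "
                        f"sudo systemctl restart myapp"], check=True)

    def rollback(previous_release):
        # rollback is just pointing "current" back at an older release
        subprocess.run(["ssh", HOST,
                        f"ln -sfn {ROOT}/releases/{previous_release} {ROOT}/current && "
                        f"sudo systemctl restart myapp"], check=True)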
We're still doing that in the traditional web hosting industry. While many are using cPanel and just scaling out X customers per server, at my company we've been doing this since 2007 (and earlier) using Linux and load balancers, for both web and email.
Unfortunately, this mid-game between "traditional" and "scalable" makes it difficult to get people to move to a few things that would help, like always using S3 so we don't need a central NFS server (which costs a lot of performance, even when it's only used for a few folders like the main code directories). And the reality is that the majority of our customer base is on WordPress... and there are a lot of terrible WordPress developers, which leads to all sorts of fun stuff.
Also, PHP reloading the entire code base on every page load is stable but somewhat intensive compared to a Rails-style application, which matters for people who want really high hit-rate sites but don't want to make everything static - for example someone with a shopping cart. (We have had customers exactly like that, wanting to hit a badly designed WordPress site with thousands of people at once in a rush sale.)
So it is a thing... but there are challenges. And honestly, getting people to move the needle even a little from their basic cPanel setups is surprisingly almost harder than changing platforms entirely, because you have to educate people on how you want to be different from everyone else rather than selling a fun new solution.
I run an architecture that's not terribly different from this. Unfortunately, the current hotness -- php-fpm -- doesn't work with suexec. The PHP pools are easily configured to run under nonprivileged user accounts but suexec provided a few other nice features that are no longer available.
Scalable no-downtime MySQL is still a total bitch though.
The past few years of Node.js, microservices and cloud development have been a fast-forward replay of 90s (or even 60s) technologies and ideas.
I feel like everything that was cool in 2010 has reached the point where its functionality either converged with "old tech" or just died because there were too many problems with it.
Examples:
- JSON wanted to be simpler than XML. Now it has schemas, JsonPath, comments (but only supported by some parsers), and is still pretty verbose and has way too few data types (datetime anyone?)
- MongoDB wanted to be a document DB with shards and eventual consistency, now it also supports ACID-like operations
- JavaScript was cool because it was untyped, now TypeScript is the new cool kid on the block. It's only a matter of time until untyped programming is the new hype again
- SemVer got somehow rediscovered
- People made fun of Maven, nowadays everybody and their mother are pulling directly from npmjs.org
The detour into non-SQL databases I find particularly hilarious, because as a greybeard I was sitting on the sidelines the whole time muttering "it'll never work". Now they all seem to have SQL interfaces for some reason.
This is interesting, but having been down the "serverless" road a number of times in different languages (mostly node.js and ruby) I've found it to be more complex to manage all the services rather than just using RDS and a container orchestrator (these days I'm all in on K8s. I'm happy to explain why if there is interest, but I'm not advocating using it for everything so plz don't flame me) and all the framework features of my chosen platform (currently in love with Elixir/Phoenix).
It also wouldn't help much with the primary PHP challenge I've run across, that of a stateful Wordpress box (and to be fair, the article doesn't claim to help with that. It assumes a 12-factor app). Most PHP I've had to deal with is not stateless, thus making horizontal scaling a challenge. Luckily you can scale pretty far vertically with PHP as long as you haven't made Big O mistakes.
What do you mean by "Most PHP ... is not stateless"?
Usually, the state of a PHP app is in the DB.
Other "state data" are:
1) sessions. By default saved in files, can be moved to DB, making PHP stateless
2) storage (i.e. user's attachments). To decouple from PHP server, - need to switch to object storage service.
So, it's not hard.
But the most popular PHP application - WordPress - is failing to be stateless because of number 2: it uses local storage for plugins, media, and user data.
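(For point 2, the idea in a nutshell - a rough sketch in Python rather than PHP, with a made-up bucket name: write uploads to object storage instead of the web server's local disk, and keep only the URL/key in the DB, so any instance can serve them.)

    import boto3, uuid

    s3 = boto3.client("s3")
    BUCKET = "my-app-uploads"   # placeholder bucket name

    def save_attachment(file_bytes, filename, content_type):
        # store the upload in object storage instead of the local filesystem
        key = f"attachments/{uuid.uuid4()}/{filename}"
        s3.put_object(Bucket=BUCKET, Key=key, Body=file_bytes, ContentType=content_type)
        # persist this URL (or just the key) in the DB instead of a local path
        return f"https://{BUCKET}.s3.amazonaws.com/{key}"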
> To decouple it from the PHP server, you need to switch to an object storage service. ... So, it's not hard.
If you are the developer of the app, that's true. If you're the ops guy the app was lobbed over the wall to, it's not quite that simple without making code changes to the app. In the case of WordPress, too, it's been my experience that many of the "developers" are just marketing people who don't actually write code. These people are not equipped to remove state from the app either.
Aurora Serverless is an absolute nightmare to use from the perspective of Lambda. It’s easy to flood the database with Lambda connections and the database layer doesn’t scale automatically as described. So the result is that Aurora Serverless returns max connections errors.
The RDS Proxy isn’t GA yet and has huge disclaimers on using it in production.
The only alternative is to use the RDS Data API which requires sending raw SQL queries over HTTP.
If you’re using an ORM getting the actual bound SQL query from the database request can be a bear.
And to make matters worse the RDS Data API doesn’t return JSON results with the tables column names. It puts the column names in a separate key that requires developers to map to generic column names using the index of the value.
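(Roughly the mapping chore involved - a sketch using boto3's rds-data client with placeholder ARNs; the response gives you positional value lists plus a separate columnMetadata list, and you zip them back together yourself.)

    import boto3

    rds = boto3.client("rds-data")
    resp = rds.execute_statement(
        resourceArn="arn:aws:rds:...:cluster:my-cluster",        # placeholder ARN
        secretArn="arn:aws:secretsmanager:...:secret:my-creds",  # placeholder ARN
        database="app",
        sql="SELECT id, name FROM users",
        includeResultMetadata=True,   # needed to get columnMetadata at all
    )
    # rows come back positionally; reattach the column names by index
    columns = [c["name"] for c in resp["columnMetadata"]]
    rows = [
        {col: list(cell.values())[0] for col, cell in zip(columns, record)}
        for record in resp["records"]
    ]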
AWS should be embarrassed about Aurora Serverless. No question in my mind.
All "elastically scalable / autoscaling" cloud services (e.g. S3, ELB/ALB, Aurora Serverless, etc.) are ultimately composed of application code running on a finite number of machines thereby capable of serving a finite throughput of load. Excess load must be temporarily shed while additional machines are spun up. You can easily observe this behavior by running load tests on these services.
Separately, applications don't scale linearly with the volume of open connections. There's typically a sweet spot of open connections that provides maximum throughput, and exceeding it will actually reduce aggregate throughput. To the extent that serverless applications are stateless and are unable to pool connections as well as a stateful application can, you should expect the volume of connections to grow proportionally to load, potentially knocking over the backing DB (or in Aurora Serverless's case, returning max connections errors).
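(The usual partial mitigation on the Lambda side is to open the connection outside the handler so warm containers reuse it instead of reconnecting per invocation - a sketch below, assuming pymysql with placeholder credentials. Note it only caps connections at one per warm container; the aggregate across many concurrent containers can still blow past max_connections.)

    import pymysql

    # opened once per container, reused across invocations while the container stays warm
    conn = pymysql.connect(host="my-aurora-endpoint", user="app",
                           password="...", database="app")   # placeholder credentials

    def handler(event, context):
        conn.ping(reconnect=True)   # revive the connection if the container sat idle
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM orders")   # placeholder query
            (count,) = cur.fetchone()
        return {"orders": count}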
It is best called "walletscaling." To run anything at scale you are looking at reserved capacity and provisioned throughput. Add extra fees for DynamoDB Accelerator (query caching), Global Tables (cross region replication), backups charged per GB, etc.
Most of the required addons aren't in the "cloud model" where you pay for usage, but instead you pay to have them on regardless of usage.
DynamoDB costs you nothing when it's not in use. It only charges based on actual reads and writes performed. As opposed to a traditional relational database that always has to be on, has to be permanently scaled up vertically to the maximum anticipated load, and needs a read replica if you really want redundancy.
DynamoDB gives you redundancy out of the box (your tables are replicated across three availability zones in a region), and in on-demand mode the scale is available to you if you get sudden traffic, or you can set a limit if you wish to manage costs; your queries may receive throttling errors at some point if you approach those limits.
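(For example, on-demand/pay-per-request is just a billing-mode flag at table creation - a hedged boto3 sketch with a made-up table and key.)

    import boto3

    ddb = boto3.client("dynamodb", region_name="us-east-1")
    ddb.create_table(
        TableName="orders",   # placeholder table
        AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
        BillingMode="PAY_PER_REQUEST",   # on-demand: billed per read/write, nothing when idle
    )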
For OLTP workloads, DynamoDB (and a lot of other NoSQL-style, cluster based databases) cannot be beat for performance, capacity, scalability and costs. Which is exactly what you want on the front line of a workload that can receive large amounts of traffic.
For OLAP workloads with unknown query patterns across a variable set of data that can change over time, and with large table scans, a relational database is king, because the actual volume of traffic is low but the queries are usually a lot larger.
You don't need to use DynamoDB Accelerator though, and when you do use it, it provides value you won't automatically get for free from another database. If you don't use it, you will be managing your own redis/memcache instance with all the cache invalidation logic.
> Global Tables (cross region replication),
Again, you don't get this for free in any other DB. Setting up your own multi master cross regional database is not free.
DynamoDB works fine without either of the two features above.
> It is best called "walletscaling."
And which DB, out of curiosity, scales without any load on your wallet? What mythical DB can one run which needs neither horizontal nor vertical scaling, thus not impacting the wallet?
> And which DB out of curiosity scales without any load on your wallet?
BigQuery includes all these things "for free" in the base price. It is pretty straightforward, with a price for storage and a price for querying (which you can choose to be usage-based or fixed). DynamoDB started off the same way, and added nickels and dimes by the roll.
You can't be serious when comparing BigQuery to DynamoDB. BigQuery is an OLAP database; you can't use it for OLTP workloads. And BigQuery has no equivalents to DAX (very low read latencies) or Global Tables (millisecond access to data globally). If you are using BigQuery as an OLTP database with reads and writes going in simultaneously, more power to you.
Autoscaling is monstrously good on DynamoDB. You can turn on on-demand mode so that it will scale up to the sheer limits of DynamoDB itself, but you pay per usage. So if you do want to set some kind of limit based on affordability, you can set limits on how much the tables scale.
As to how that autoscaling performs, it's instant and always available, so you have nothing to worry about there. Better to spend your time focused on optimising queries than on managing the scalability of your datastore, which is as it should be.
It's a little odd -- Aurora Serverless seems to have received next to no updates for 2 years. It still only supports MySQL 5.6, for example.
I'm not sure why this product hasn't become their premier RDS offering, it looks like it has the foundation to offload a good deal of operational complexity.
The "Aurora Serverless" branding also applies to Aurora PG 10.7 edition, which was made publicly available in July last year, so I'm not quite sure what 'no update' you are talking about.
Have they figured out how to make API Gateway not cost an arm and a leg yet? I absolutely adore Lambda for side projects, but every time I wire up API Gateway I'm worried I'm going to saddle myself with a huge bill.
API Gateway came out with a cheaper variant (called HTTP APIs, admittedly a poor name) that is $1 per million requests for the first 300 million requests a month (roughly 114 requests per second sustained for a month): https://aws.amazon.com/api-gateway/pricing/
Looks like it costs $1 per 1 million HTTP calls. That does not sound very costly if you are talking about just side projects which may not generate close to a million requests?
The service is a bit pricey at $39/mo, but given that it appears to be geared toward enterprise/startups/teams and not individual/end users, I guess that makes sense.
I know this is not what's happening here, but I just love the idea of a MySQL function where it spins up a new instance for every connection and promptly throws away the data after executing.
You can somewhat accomplish this with SQLite stored in S3.
Zappa (python) has a deployment configuration that allows this. It's basically a Lambda that keeps itself alive all the time and for each request, fetches the SQLite DB from S3, does its transaction, and then puts the modified database back on S3.
The upside is it's basically free for low traffic read-only apps, the downside is the obvious problem of write conflicts if you have more than one write-capable user at any given time.
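(The cycle described above is roughly the sketch below - not Zappa's actual code, with a made-up bucket and key, and with no locking, so concurrent writers can still clobber each other exactly as noted.)

    import boto3, sqlite3, tempfile, os

    s3 = boto3.client("s3")
    BUCKET, KEY = "my-app-data", "app.sqlite3"   # placeholder bucket and key

    def with_database(fn):
        path = os.path.join(tempfile.mkdtemp(), "app.sqlite3")
        s3.download_file(BUCKET, KEY, path)      # pull the DB file from S3
        conn = sqlite3.connect(path)
        try:
            result = fn(conn)                    # run this request's transaction
            conn.commit()
        finally:
            conn.close()
        s3.upload_file(path, BUCKET, KEY)        # push the modified DB back to S3
        return result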
If you were to use the django test framework to generate a new SQLite DB on each request, you'd have what you're talking about.
I've been contemplating pushing SQLite data files to my Lambda function via a custom layer. Individual executions can update within the scope of their execution, but you'd only get an update by pushing a new layer.
One fewer network hop compared to DynamoDB, and for something that might get an update once a week or even once a month I get low latency without having to oversubscribe to another service.
From playing with zappa/django and SQLite on S3, the first page load latency is still a thing even if the one Lambda is always alive, and I don't know why but have my suspicions. I didn't bother to performance test it much other than observation, since I was just using it for development to build an SES email newsletter system.
Comparing a read of a database driven app (going to the URL and getting the admin login in Django via Lambda/SQLite/S3) to an async javascript submission (submitting a form on a cloudfront static site that POSTs to DynamoDB), the javascript/DynamoDB round trip is faster by about a full second.
I suspect it's because of the simple bulk of the Django deployment. Putting the whole bundle on a Lambda with all of its dependencies was about 45-46 megs of crap, whereas a simple Node DynamoDB insert is a couple dozen lines.
So ultimately, while spiffy to play with, I didn't bother to use it much due to performance. Although it has been a while, I heard that Amazon made some Lambda changes recently to address initial wake-up request performance.
The whole spiel of AWS (and GCP, Azure, etc.) is to hide the costs behind pricing tiers and service categories. You cannot pay a single price per service; you pay for bandwidth, requests, and disk space or computing time separately for each service.
Even AWS salespeople will not give you a price, because it heavily depends on your use case.
All cloud providers have calculator tools to give you a feeling of being informed, but you can theorize as much as you want; reality will be different. There are just too many variables to consider.
I think for a lot of use cases, a $5 droplet on DigitalOcean will be cheaper than paying for all these cloud APIs separately. (Although AWS does provide a free tier for many APIs in your first year.)
Now that I can afford more than $5 for side projects, I am looking for things like ease of deployment, reduced burden of maintenance etc. IMO serverless gives me this (once you have a proper setup of course). There are no machines to login to, no need to worry about patching hosts etc.
There was a time when my wordpress blog (RIP) used to get hacked every 6 months. A purely serverless model reduces the surface area for attacks.
I guess what I am trying to say is that for me personally, serverless is not about the immediate infra cost, but the TCO long term. And if/when your side projects take off, you can always go back to bare-metal servers as required.