2010
06.15

SQL Must Die

SQL is one of those things that baffles me as to why it even exists in the first place.  Well, yes I do know why it exists – it’s so that someone in the 70′s could sit in front of a database and query it directly through the CLI.  What baffles me is why we still use it.

You use a computer language to convert data structures into a human readable string, which then passes it to a database engine, which then parses it back to a true data structure and runs the command.  As a programmer I have to do more work, just so the database can do more work, all to cater for a pointless step.  For example say I have the following array:

$data = array (
‘name’ => ‘Kerberos’,
‘sex’ => ‘Male’,
‘location’ => ‘UK’
);

And I want to add this record to the database, I’d have to run a command something like this:

$sql = “INSERT INTO `table_people` VALUES (‘name’ => ‘”.$data['name'].”‘, ‘sex’ => ‘”.$data['sex'].”‘, ‘location’ => ‘”.$data['location'].”‘)”;
$result = mysql_query ($sql);

I have (as everyone tends to do though) created a database abstraction class so I can just reduce this down to:

SQL::insert (‘table_people’, $data);

Which makes infinitely more sense, but virtually all developers reinventing the wheel certainly doesn’t.

Now if anyone has any experience with SQL they have probably spotted the massive great big security blunder which is present in the long version.  For those of you who aren’t that experienced in SQL, here is the string that the above creates.

INSERT INTO `table_people` VALUES (‘name’ => ‘Kerberos’, ‘sex’ => ‘Male’, ‘location’ => ‘UK’)”

Now imagine someone gave their name as Kerberos’, `administrator` => ‘True and then look at the following SQL

INSERT INTO `table_people` VALUES (‘name’ => ‘Kerberos’, `administrator` => ‘True’, ‘sex’ => ‘Male’, ‘location’ => ‘UK’)”

That’s right, they would by created with administrator access.  You instead must remember to escape quotes with a command such as:

foreach ($data as $key=>&$value)
{
$value = mysql_real_escape_string ($value);
}

Which stops this happening.  Forget it once (or forget the ampersand) and you have a serious issue though, all down to the fact that you must mix data and commands (a cardinal sin) simply to cater for a 40 year old design flaw.

So I suppose the question is, when will we get a data access API that isn’t stuck in the 70′s and isn’t fundamentally broken?  And more importantly why has nobody else noticed this and tried to ditch SQL?

30 comments so far

Add Your Comment
  1. I don’t know. For your data input there’s definitely no need for it but what about pulling stuff out? I suppose you could create a function that iterates through the rows of some table or joined tables and passes each row to some function; that would take care of a lot of your select cases I’m sure but what about those super long convoluted cases? Disclaimer:I have never had to use any outside of class so I’m gambling that they exist in the field.

  2. With ADO.NET’s SqlCommand, SqlCommandBuilder and SqlDataAdapter all this work is done in a very efficient way. Strings are automatically escaped the right way and the queries are being build for you according to the table definitions. All you’ve got to do is to provide the data for the columns you’d want to process. But, that’s only for us lucky .NET developers :-) .

  3. @geeraija
    Yeah ADO.NET’s SQL stuff is awesome, but it is duck tape. SQL sucks.
    As for ameer’s question. I don’t know, what I do know is the programming frontend and the user frontend must be different. Why in the hell should a computer program have to write a human readable string to talk to another computer program which then has analyze that string?
    Incidentally, it’s kinda funny you posted this since I just managed to crash the comment section on a youtube video.

  4. This is why NoSQL solutions like MongoDB and CouchDB are so promising. They deal with data as structures, represented by JSON, rather than relationships, and the query language is Javascript and/or JSON. In case of CouchDB, this allows you to build entire applications directly on top of the database, rather than having to deal with intermediary “data access layers” or “business layers”. The validation is also far simpler and way more robust than a string-based query.

    This makes them faster, index better, and scale/shard pretty much infinitely with none of the effort that would go into doing the same thing with a traditional relational database.

  5. With ADO.NET’s SqlCommand, SqlCommandBuilder and SqlDataAdapter all this work is done in a very efficient way.

    While they may be a nice abstraction, they’re still just that, a layer on top of a crappy sub-system. DataAdapters are woefully inefficient in .NET which is why they have the DataReader. They’re convenient, but they solve none of the bare-metal problems with relational databases.

  6. @ameer, there are plenty of ORMs (object relational modelers) that maps your data to objects and allows you to do all your CRUD operations using functions on said objects.

    For example:

    person = Person.get(id=5)
    person.name = “Joe Blow”
    person.save()

    Django has an ORM like this, so does Rails, Hibernate for Java and NHibernate for .NET do this, Entity Framework does this for .NET but it doesn’t do some essential performance tuning like lazy-loading, etc.

  7. @Kerberos,

    I am sorry to say, but you are completely wrong. You confuse relational databases with linear storages. SQL is great when it comes to manipulating complex relational data structures, and its strengths in retrieving data from a complex and large database are unsurpassable.

    You have a big point with abstraction to data access, especially when it comes to converting data from SQL to local objects, and modifying data from local objects into the database. However, PHP is famous for having very poor facilities in this direction.

    Environments like VB (even 6), VC++, Delphi had quite potent abstraction layers for years – ADO, and a very good data-view model in Delphi on top of it or other data access APIs.

    NET 3.5 took this to a completely new level by introducing LINQ. There you can write:

    var q =
    from c in Customers
    where c.City == “London”
    select c

    and q will be a collection objects of class Customer, with all customers, where city is “London”.

    The beauty is that the excerpt above is pure C# code, which compiles and is checked for validity at compile time.

    You can also have:

    Customer cust = db.Customers.Single(c => c.CustomerID == “ALFKI”);
    cust.CompanyName = “Dr. Frogg’s Croakers”;

    And there you go, the db is updated!

    @TM Repository
    DataAdapters are woefully inefficient in .NET which is why they have the DataReader.

    TM Repository, you are wrong here. DataReader has a completely different purpose. It is aimed to provide you a sequential, linear access to data item-by-item, instead of taking it all in one piece, thus consuming huge memory at the client side. DataAdapters, on the other hand, are very handy to manipulate data, when it comes to CRUD operations.

    To summarize, please do blame the PHP’s very immature db access model on SQL. :-)

  8. “I am sorry to say, but you are completely wrong. You confuse relational databases with linear storages. SQL is great when it comes to manipulating complex relational data structures, and its strengths in retrieving data from a complex and large database are unsurpassable.”

    The way I see it, SQL stands for ‘Structured Query Language’ and describes the interaction method between an RDBMS and it’s user. As I pointed out though, SQL itself is fatally flawed in terms of security (not to mention annoying to work with). As soon as you ‘fix’ it so that it can be used programatically like any other API it stops being SQL. If you think that you cannot have the same power without the actual SQL syntax then you just lack imagination.

    The featureset and power of SQL is also quite honestly unused in 99% of cases where it is just used as a dumb datastore – personally the most complex thing I ever really do is left joins and I would be happy to trade off power for security and ease of use. But still it is used as the non-volatile storage method in virtually all cases (especially in webapps) when a simpler, more secure and straightforward solution would be much more appropriate.

  9. “As soon as you ‘fix’ it so that it can be used programatically like any other API it stops being SQL. If you think that you cannot have the same power without the actual SQL syntax then you just lack imagination.”

    Actually, LINQ is not taking you away from SQL. It gives you a compilable query syntax, which translates directly to db-native SQL.

    I gladly agree that something better than SQL can be devised. However, being a functional language, SQL itself does not need huge overhaul. The conflict is that nowadays we use a functional language like SQL from within imperative languages like C, Pascal, C#, PHP, etc. C#, to get LINQ, got many functional features in it.

    Majority of webapps indeed underuse SQL. But, then it is a matter of poor design by choosing a SQL-based DB, instead of some other storage. There are some to choose from already. It is not hammer’s fault if you try to screw a bolt with it. :-)

  10. @TM Repository
    The DataAdapter internally works with DataReaders, so I doubt your claim about inefficiency. Anyways, in the scenario I descriped the DataAdapter is only used to read the schema information of a table and not to directly query data from it. The schema information is passed to the SqlCommandBuilder, which then builds an SqlCommand-Object with predefined parameters and sql-command string.

    @The Phenom
    LINQ has nothing to do with querying/accessing database information, what you mean is LINQ-to-SQL, which is a LINQ-provider written for MS-SQL databases using ADO.NET.

  11. TM Repository, you are wrong here. DataReader has a completely different purpose. It is aimed to provide you a sequential, linear access to data item-by-item, instead of taking it all in one piece, thus consuming huge memory at the client side. DataAdapters, on the other hand, are very handy to manipulate data, when it comes to CRUD operations.

    Agreed, I just didn’t want to bother explaining into the differences. My point was they aren’t a blanket solution to the RDBMS bottlenecks and pitfalls, just an abstraction.

    Unless something has changed, DataAdapters do a poor job of lazy loading and they’re very easy to accidentally abuse.

  12. @TM Repository
    DataAdapters can’t perform Lazy Loading and how exactly are they very easy to accidentally abuse? Do you even know what a DataAdapter does? =).

  13. Just to get back to Kerberos’ original point: SQL sucks.

    Of course, C sucks and LISP sucks and Fortran sucks and God Knows CoBOL sucks and so on. And I’m not convinced by any descendant of SGML and frankly CSS is pretty sucky. But still, SQL sucks for several important reasons:

    (1) Nobody who is a *programmer* would choose to use it. Without data adapters and Linq and, yes, ORM, it’s a huge impedance mismatch.
    (2) The USP of SQL has always been that a non-programmer (ie a manager) can write queries in it. It’s less common now, but believe me, I’ve seen at least one major ($x0,000,000) project at Visa go down the drain because of this idiot “requirement.” And it’s no use saying, well, it’s not necessary. It ain’t necessary, but it’s available. That’s all there is to it, as far as management is concerned.
    (3) It’s actually pretty crap, even in its own terms. SQL injection didn’t just come out of nowhere, you know. The solution to SQL injection? Prepared statements, input scrubbing, etc. Frankly you might just as well have a proper programmatic interface.
    (4) Somebody up there suggested that it’s a Functional language, as opposed to an Imperative language. No it’s not. It’s a Query/Presentational language. Different thing altogether, and certainly not Functional. There’s nothing wrong with that, either, except that it’s a typical comment from SQL zealots.
    (5) Kerberos is right, I think, in terms of his 99% thing. Despite the fact that I quite like SQLite, for example, I tend to wonder why it’s used rather than, say, BDB. Well I don’t, really, because BDB is and always has been an utter pain. But it only took me half a year to write wrappers around it that almost made it look like the native file-system of Stratus VOS, which was pure ISAM … Ah, I miss those days. The days when you actually used an appropriate, indexed, database. The days when you didn’t have to ask the DBA first, because there was no such thing as a DBA. There was no need.

    Obviously, in Kerberos’ other 1% of cases, there is a need for a full-blown SQL RDBMS and there is a need for DBAs galore. But it’s not obviously a common need, unless you listen to management:

    “Why can’t I get a report by using an unconstrained SQL query?”

    Well, they’d use other and more inspirational terminology, preferably inter-composited of multi-functional synergistic enterprise-centric informational supply-chain deliverables, but it always comes down to the same thing.

    Oh, OK.

  14. Just to get back to Kerberos’ original point: SQL sucks.

    Of course, C sucks and LISP sucks and Fortran sucks and God Knows CoBOL sucks and so on. And I’m not convinced by any descendant of SGML and frankly CSS is pretty sucky. But still, SQL sucks for several important reasons:

    (1) Nobody who is a *programmer* would choose to use it. Without data adapters and Linq and, yes, ORM, it’s a huge impedance mismatch.
    (2) The USP of SQL has always been that a non-programmer (ie a manager) can write queries in it. It’s less common now, but believe me, I’ve seen at least one major ($x0,000,000) project at Visa go down the drain because of this idiot “requirement.” And it’s no use saying, well, it’s not necessary. It ain’t necessary, but it’s available. That’s all there is to it, as far as management is concerned.
    (3) It’s actually pretty crap, even in its own terms. SQL injection didn’t just come out of nowhere, you know. The solution to SQL injection? Prepared statements, input scrubbing, etc. Frankly you might just as well have a proper programmatic interface.
    (4) Somebody up there suggested that it’s a Functional language, as opposed to an Imperative language. No it’s not. It’s a Query/Presentational language. Different thing altogether, and certainly not Functional. There’s nothing wrong with that, either, except that it’s a typical comment from SQL zealots.
    (5) Kerberos is right, I think, in terms of his 99% thing. Despite the fact that I quite like SQLite, for example, I tend to wonder why it’s used rather than, say, BDB. Well I don’t, really, because BDB is and always has been an utter pain. But it only took me half a year to write wrappers around it that almost made it look like the native file-system of Stratus VOS, which was pure ISAM … Ah, I miss those days. The days when you actually used an appropriate, indexed, database. The days when you didn’t have to ask the DBA first, because there was no such thing as a DBA. There was no need.

    Obviously, in Kerberos’ other 1% of cases, there is a need for a full-blown SQL RDBMS and there is a need for DBAs galore. But it’s not obviously a common need, unless you listen to management:

    “Why can’t I get a report by using an unconstrained SQL query?”

    Well, they’d use other and more inspirational terminology, preferably inter-composited of multi-functional synergistic enterprise-centric informational supply-chain deliverables, but it always comes down to the same thing.

    Oh, OK. J

  15. “Duplicate comment deleted — it looks like you’ve already said that!”

    I checked, and I hadn’t. Then I checked again, and I had. But I was still permitted to make a duplicate comment.

    You know, this would never happen with a SQL database. Shame on you!

  16. DataAdapters can’t perform Lazy Loading and how exactly are they very easy to accidentally abuse? Do you even know what a DataAdapter does?

    Yes, I worked with them for several years when I used to do a lot of .NET development. They let you sync your datasets to your database. They’re easy to accidentally misuse because a lot of newcomers will load up huge amounts of data and then only use the first 10 records in a paging scenario.

    I’m not saying they’re bad, just saying it’s an abstraction on top of a relational database. These same things exist in just about every language, which leads one to wonder why the abstraction layer isn’t what’s talking directly to the database, rather than performing the intermediary querybuilding step.


  17. Yeah ADO.NET’s SQL stuff is awesome, but it is duck tape. SQL sucks.

    Meh, a lot of excellent abstractions are built with duct tape. Who gives a shit when you don’t have to work at the tape level. Go build a clean alternative to SQL if you want but it will be missing a key factor behind SQL which is inertia.

    SQL Server links nicely with .net and is what websites offer. Either that or they offer MySql which also links pretty well with .net thanks to some third party adapters. SQL Server and .net put so many layers between you and the 70′s that it seems like just another library.

    SQL is a classic example of “good enough tech” that will persist even when there are better alternatives.

    Besides, what would all those SQL experts do if we got rid of SQL? Won’t someone think of the SQL experts?

  18. “SQL is a classic example of “good enough tech” that will persist even when there are better alternatives. ”

    It’s more like X11 in that you have to use it because it’s used pretty much everywhere (in the unix world) even though everyone sees problems with it.

    “SQL Server and .net put so many layers between you and the 70′s that it seems like just another library.”

    Have you considered why there are so many layers? Look, I like SQL as a user, but as a programmer (who hates inefficient code), the idea that my program needs to create a human readable string to comunicate with another program is a complete joke, I dislike bash (and like powershell) for the same reason.

    “Besides, what would all those SQL experts do if we got rid of SQL? Won’t someone think of the SQL experts?”

    We don’t want to get rid of SQL for administrating databases, we want to get rid of SQL from a programming standpoint, it’s an unnecessary layer.

  19. Hmmm. Describing (human-readable) SQL as an “unnecessary layer” is possibly missing the point. I don’t like to admit it either, but it’s possibly missing the point.

    It’s irritating and counter-intuitive that it’s a simple(ish) basic string of human-readable stuff in the middle of your goddamn program, but there you go — that’s the interface. It’s certainly irritating that SQL standards are all over the place, but there you go — you’re constrained to a particular database provider. All very ’70s, as you say.

    There are three things that drive SQL, and will continue to drive SQL. One is inertia (particularly management inertia, because management confuse “human readable” with “simple.”) The second is ubiquity, which is, I think, an artefact of the first, and not necessary: in 99% of cases, there are simpler ways to provide a data store.

    The third, and this would exist even if people narrowed down RDBMSes for the purposes that actually make sense, is that SQL is, if you like, a “baked in” abstraction. This one’s the most important one.

    It’s not just baked in as the API; it’s baked in to the database engine itself. Performance, optimisation, all that stuff is practically _specified_ in terms of what SQL does. As a DB client programmer, you have _no_ control over this stuff. The engine knows how to optimise a SQL query; the engine knows how to make recommendations, and who to; but the client programmer has no clue. Yes, you can find all the relevant statistics yourself (via a completely orthogonal interface), and then you can, er, build up your very own snapshot of the state of the database engine … but what use is that?

    The problem I see is that RDBMSes aren’t really layered in any sensible way. (Oh, and Oracle allowing you to write Java as part of PL/SQL doesn’t count.) This is precisely _because_ they are a product of the ’70s. They’re not _meant_ to be layered. They’re meant to be a huge great monolith that you, as a programmer, don’t have to bother your pretty little head about.

    When they say “client-server,” or indeed “n-tier,” you have to force yourself into a ’70s mentality. It’s not a rich client metaphor. It’s not even a GUI client metaphor. It’s a thin client, and the reason you’ve got SQL is that SQL is all you’re supposed to be able to cope with, buddy.

    I wouldn’t mind so much, but sometimes SQL zealots sound just like Linux zealots. Maybe for similar reasons.


  20. Have you considered why there are so many layers? Look, I like SQL as a user, but as a programmer (who hates inefficient code), the idea that my program needs to create a human readable string to comunicate with another program is a complete joke

    Yea but you’re focusing on the silly handshake, not the plethora of code that is exists to make SQL database transactions highly efficient. There are all sorts of SQL related algorithms that determine where data should be stored, when it should be pre-loaded, when it should be compressed, security levels regarding access, etc. If you used a simple storage system you would lose a lot of efficient code. Of course for most websites this wouldn’t make a difference since they have basic needs and could get away with using something like native XML but it isn’t as if the SQL server would be a bottleneck in those cases either. The big sites make use of nosql databases since with that many hits it can make economic sense to hire a couple db engineers and have your own custom solution to eek out performance.

  21. The problem with relational database in general is the need for all the pre-processing and prediction. With schemaless databases like Mongo and Couch all relative data is stored in a single document, negating the need for joins, one of the major bottlenecks in RDBMS.

    Relational databases have their place, for sure, but they don’t scale/shard as easily, and as transparently, as the newer NoSQL solutions do. Not to mention using an actual debuggable turing-complete programming language to do map/reduce functions is far saner than the SQL language.

  22. I understand the benefits but nosql databases but it reminds me of itanium in that those benefits will only be appreciable by a tiny minority of websites. I think you can make the case that SQL is unneeded for a lot of websites but I don’t think the retrieval overhead is an issue.

    I can’t speak for MySql but I know that for SQL Server it’s extremely rare for a website to need more than a single dedicated DB server. At that level you’re talking major traffic and scaling won’t be an issue if you designed your website properly. Even if you didn’t at that level you can probably afford to have someone handle the scaling for you.

    I have to wonder if you guys are spending too much time with SQL at the command line level. Disco tech overload perhaps? I think some of these nosql alternatives are interesting but in the .net world everyone is fine with SQL Server.

  23. I understand the benefits but nosql databases but it reminds me of itanium in that those benefits will only be appreciable by a tiny minority of websites. I think you can make the case that SQL is unneeded for a lot of websites but I don’t think the retrieval overhead is an issue.

    Typically even unoptimized selects are going to be fairly fast. Writes are slow, but joins can often be slower by an exponential magnitude. It all depends on how complex the schema is really.

    While its true that many simpler websites will never really max out an SQL Server, even my TM Repository site, running on mySQL, takes several seconds to perform some of the more complex queries. Granted, it’s running on the cheapest slice that slicehost offers, but the same scenarios on the same slice in Mongo or Couch run hundreds of times faster because of the way the data is structured, negating the need for joins.

    In case anyone in the .NET world is interested, MongoDB has a C# driver. Couch has a few wrappers/drivers but I’m less familiar with them.

  24. @ .net jerkface:

    I take your points, but you’re drifting away from the core argument, I feel.

    SQL is there because it’s there because it’s there. Fine, but not the best defence.

    Your point about “SQL related algorithms” is well taken, but I did actually point out that these are there because SQL is “baked in.” As a programmer, I’d rather have access to them in a layered manner with APIs — not some stupid human-readable IBM garbage from the 70s. Even if you add Linq and ORMs and everything else, you’re still barred from programmatic access precisely because of the way an RDBMS is designed — simplistic client-server with only SQL as an interface between the two.

    Don’t get me wrong. I like using SQL Server, and I can put up with Oracle, and even when I use MySQL and SQLite and the rest, I’ll program via SQL because that’s the easy way to do it.

    But it has an opportunity cost, too.

  25. Even if you add Linq and ORMs and everything else, you’re still barred from programmatic access precisely because of the way an RDBMS is designed — simplistic client-server with only SQL as an interface between the two.

    Not every RDBMS uses a human-readable query language, but anything with “SQL” in the title is going to have that interface.

  26. Argh! Stop pissing me about!

    If you can think of a better collective noun than RDBMS for what I mean, then for Christ’s sake let me know.

    I’m not *talking* about a Management System Base On Unusual Relational Data. I don’t *care* that there are alternatives that don’t feature human-readable query languages.

    I’m talking about Informix, Sybase, Oracle, SQL Server, blah blah blah.

    I’m talking about a big lumpy system that has a thin client that is designed to talk via SQL or embedded SQL or some other abortionate almost-but-not-quite human-readable language equivalent to a MASSIVE GREAT MONOLITHIC ENGINE where everything important actually happens, but the programmer doesn’t have a clue what or when or why or how and can’t in the general case figure it out.

    All I’m trying to do is to support Kerberos’ original point and expand on it a bit. Is that too much to ask?

  27. PHP Sucks!

    That’s the real problem not SQL.

    SQL is for interrogating databases not for inserting and updating data, doing CRUD (create, read, update, delete) operations using pure SQL or worse SQL duct taped into another language is the worst idea possible.

    Switch to Ruby, or Python, or Linq or whatever and your pain will go away while still retaining the possibility to query your data however you want outside the application (which is often critical).


  28. While its true that many simpler websites will never really max out an SQL Server, even my TM Repository site, running on mySQL, takes several seconds to perform some of the more complex queries.

    Several seconds????? Even more confirmation that we are having entirely different experiences with SQL.

    You know every complaint I have seen about SQL performance involved a lamp stack. The vast majority involved a forum that used mySql and php on a shared host. Just saying.

  29. Several seconds????? Even more confirmation that we are having entirely different experiences with SQL.

    Some of the queries are very complex. Couple that with the fact that the ORM sits on top of this and it makes it even slower. The ORM makes it easier for me, the developer, but not easier for the database. Meanwhile, that a similar degree of ORM-abstraction can be had out of the box with NoSQL solutions.

    You know every complaint I have seen about SQL performance involved a lamp stack. The vast majority involved a forum that used mySql and php on a shared host. Just saying.

    Like I said, the slice I’m running on isn’t very fast. However, my tests using NoSql solutions to serve up the same data have been hundreds of times faster on the same hardware. Just saying.

    If all you’ve ever used is Oracle or MSSQL, I suggest giving the NoSQL options a shot. They’re not suitable for everything, and they make no claims that they are, but for much of the common database tasks that bottleneck these days, they’re amazingly good. Couple that with the fact that they fix all the problems that Kerberos was originally complaining about; a querying tool that talks directly to the database, rather than via a fragile intermediate text-based handshake.

  30. You sound like my database teacher.