?

Log in

No account? Create an account

The One True Relational Database Model

« previous entry | next entry »
Aug. 12th, 2004 | 07:47 pm
mood: amusedamused
music: Tom "T-Bone" Stankus - Existential Blues

Microsoft has a relational database on the front burner for a future version of Windows. Personally, I think they're barking up the wrong tree. If they spent more time building websites they'd know that hierarchical models with very tight scripting connections offer more performance and a higher level application model. Relational databases are good for factories and stores. Object databases map the model of the Web. Just change the slashes to dots and off you go. - scripting.com

Heirarchical databases and Object databases are arguably subsets of Relational databases. If you think your programming effort needs a 'higher level application model', you should go ahead and use one -- built on top of a relational database. If you have some data that you need to manage professionally, you need the structure and reinforcement that only the relational model can provide. If that means you need to pay me to build you one, then so be it.

If there's a better model than the Relational one out there, I want it caught and shot now to hear about it. Maybe what I mean isn't better model, it's better formalism. Sure, use a flat text file to manage your data. That's better for some projects, some purposes. But it will catch up with you (or better yet, someone else) as the project ages. (For your own sake, recognize when that starts to happen, and re-engineer the project.)

If you go through the excercise of designing a relational database for your data, the nicest thing is, your data will stay that way! almost self-organizing, and it's easy to discover new uses (new ways to use your old data)!

Heirarchical databases collapse under their own inflexibility.

Object databases look pretty but tempt you into omitting some relations that you later have to back-fill using "business rules".

Link | Leave a comment |

Comments {26}

TroyToy

(no subject)

from: troyworks
date: Aug. 12th, 2004 06:23 pm (UTC)
Link

i don't understand your problems with OODB's. what do you mean by back fill.

They operate as giant virtual memory managers, anything you do in code, be it list, tree, graph, pointers can be done in OODB's, and if that colocation of related information (say an object serialized to disk) is more useful than add hoc assembly of the attributes stored in tables it can add orders of magnitude in performance increases.

Reply | Thread

Triple Entendre

the pure thingness of the things

from: triple_entendre
date: Aug. 12th, 2004 06:59 pm (UTC)
Link

I should clarify that I'm talking about data-driven design. That's where most of what your project concerns is storage, retrieval, and analysis of details about... things. entities. (I happen to think that nearly all programming should be data-driven, but that's a polemic for some other LJ post. :-))

Sounds like an OODB should be considered an implementation layer rather than a database.

Serializing an object to disk is a noble cause, but one that has nothing to do with the formal process of (nor benefits of) "database design".

You take the data and normalize it into a relational database first. There's no better way to learn about your content, and saves weeks of heartache later, believe me. A normalized database is like a mathematical proof of what you're trying to do!

If you must, take that data and denormalize it (aka "write a query") and wrap it up/re-engineer it into an object. That's fine, even if that's it and the end product never uses a relational data store. But understand, to the extent that your "object" has any design to it, it's an expression of relations that should be able to map back into an RDBMS. Otherwise, watch it grow hairs and get all gnarly, all hints of the original design lost.

Modeling those trees and pointers, etc., can be neatly (and more concisely!) done in an RDBMS. Prove me wrong, I'd love to find a new toy that works better.

One nice thing about good relational design is there's almost no code. Just the pure thingness of the things and how they are related. If you're implementing a database and you find yourself writing tons of "stored procedures" -- think again, your design may be broken, and now you're spackling over the cracks.

You can use anything to do anything, but that don't make it right.

Reply | Parent | Thread

TroyToy

Re: the pure thingness of the things

from: troyworks
date: Aug. 12th, 2004 07:16 pm (UTC)
Link

OODB's offer the transactional semantics thus are like a database.

normalization is counter indicated when the data is only normalizable into rather small fields. E.g. my graph network data (also how I'm storing rdf, semantic and social relationships) could be stored in two primary tables of nodes and links, but this would be a nightmare reconstructing the data via a query, and thus any beauty gained by normalization is way way way overshadowed by the crappy performance of any relational approach.

I think it depends on how you plan to model the application, I'm sold on useing OOP, and 3 tier (or more) approaches. If your putting all the logic into SQL then your suggestion might be fine.

I can use an OODB, without any object to relational mapping save 30% of the code, 40% of the maintenance and depending on how bad the OO to relational gap is increase performance in ways that are physically impossible to do with relational databases due to how expensive join operations are. OODB's cluster frequently accessed data per disk page..you can't get any more efficent in that.

Reply | Parent | Thread

truth without proof

Re: the pure thingness of the things

from: chronicfreetime
date: Aug. 13th, 2004 08:39 am (UTC)
Link

OODB's offer the transactional semantics thus are like a database.

I've never found multi-statement transactions inside the database to be very helpful. But a relational database with no concurrency features would still be quite useful. So I don't see this as a defining feature.

normalization is counter indicated when the data is only normalizable into rather small fields.

On the contrary, if you can do that, it sounds like your data is an especially good fit for the relational model. Using a relational database for trees and graphs is somewhat clumsy, but I think this is mostly due to the expressive limitations of SQL.

increase performance in ways that are physically impossible to do with relational databases due to how expensive join operations are

There is nothing about the relational model which requires data to be stored on disk in a certain way. In the same way ReiserFS makes it possible to efficiently put 1e6 tiny files in one directory, I'm sure it is possible to build an RDBMS which would perform well on the types of queries your application makes.

Reply | Parent | Thread

TroyToy

Re: the pure thingness of the things

from: troyworks
date: Aug. 13th, 2004 09:28 am (UTC)
Link

Rollbacks are highly useful aspect of transactions, a virtual memory manager is also extrememly useful without being a full blown db.

Relational databases *could* store things in a physically collocated structure (typically done by select into), but they don't be default (scaling is typically done by caching where this does happen) because that would defeat the purpose for them in being flexible ad hoc queries.

You've not addressed the OO to relational mapping headaches this alone in enough for me. Also there are already relationships defined by the with OO modelling: Class Inheritance, Composition, etc. Why would one voluntarily continually try to keep them with those provided with RDMS/SQL tables, and rich extensible types of programming languages with a completely different type system? If you can perform queryies similar to collection iterators in one language in memory, why introduce another one or 2 (SQL, stored procedures) when memory is basically a memory mapped drive and the persistence is somewhat transparent. It all seems like lots of extra unnecessary work.

This complaint is particular to my application, which stores lots of large graph structures. From a programmers perspective your right it's pure and clean table, but from a useage standpoint it's evil:

If you're looking at storing trees and graphs in a database, you have to have some degree of closure -the visibilty from a given node (typically done with a tree grouping id). The larger the tree or graph is the more impractical this becomes as one isn't interested in the entire graph, just a small segment of it. To recreate this from any given node is done by iterative selects. e.g. give me node5's children, for all of those children give me the other children, etc.

This has larger applicability, if a user's myriad of contact information, and name, and is always stored on the same disk page as the name, and always being accessed in that method,it makes more sense to keep it there instead of having to go to multiple tables to reconstruct it.

Reply | Parent | Thread

Triple Entendre

Re: the pure thingness of the things

from: triple_entendre
date: Aug. 13th, 2004 10:44 am (UTC)
Link

If you're allowing yourself to engage in projects that don't include a design phase as the very first... then you and the client both get what you deserve. Might as well go home at 5pm and keep YOUR sanity, the ship dates will slip regardless....

Relational databases *could* store things in a physically collocated structure

They're called indexes, and they can be and often are physical. In MSSQL tables there's always at least one. An index can encompass one or many fields. If you happen to know the best clustered index for your purpose, you can tell it so, and that's how it will be stored physically on the disk. But a well-designed schema usually will be fast enough that you don't need to second-guess the database engine; it works hard so you don't have to. Modern RDBMSes do a very good job of optimising -- unless you try to use them as big, black, featureless boxes. Understand your tools.

(typically done by select into), but they don't be default (scaling is typically done by caching where this does happen) because that would defeat the purpose for them in being flexible ad hoc queries.

There's almost never a reason to 'select into' in a data-driven database application. You might be thinking of data warehousing, a monstrous denormalization process which has its own misguided flavors of wrongness, but a few specific appropriate uses. Reporting of historical corporate financial data, for example.

You've not addressed the OO to relational mapping headaches this alone in enough for me.

Those incipient headaches probably indicate things that don't need doing! If your specifications mandate the use of a particular OODB tool, or if for whatever reason you have to do things a certain way, then sure, that's a problem. Not one that would stop me, but that's because I've been mapping things across systems for a decade now, and I don't find it achy.

If I can tell you anything, it is this; you are almost certainly doing one of two things:

1) doing extra design work in order to solve problems that have already been solved by RDBMSes and relational theory
OR
2) not doing that design work (which would have been implicit and easy, perhaps even automated, if relational theory were applied) and doing extra work coding around the problems that result, probably late at night and weekends.

Part of the motivation for all my work is to save people from unnecessary effort.

Solved problems. These are solved problems, I swear it.

Rhetorical:
Why is it that every three to five years we invent some new technology that could be really great, except the designers don't carry over any of what was learned before? "But this one goes up to eleven?"

-Trip

p.s. I'm not editing these much, in the interest of being comprehensive and responsive. I hope that this is useful.

p.p.s. Despite what one might think from my tone, my experience is wholly practical, not academic. at all.

p.p.p.s. On quick rereading this sounds pretty ranty and harsh at you all personal. S'all love, I promise.

Reply | Parent | Thread

DeepTape

Re: the pure thingness of the things

from: deeptape
date: Aug. 13th, 2004 11:07 am (UTC)
Link

Sounds like you guys are talking apples and oranges.

Trip, where did you get the idea troyworks didn't do any design work up front?

Sounds like Troy is complaining about the evil of SQL, the poor match of data types in the object space to data types in the traditional relational space, and the effort required to hack them together.

Reply | Parent | Thread

Triple Entendre

the pure appleness of the oranges

from: triple_entendre
date: Aug. 13th, 2004 12:00 pm (UTC)
Link

That's just me not editing. When I'm talking like this and I say "you", I mostly mean "that straw man standing next to you".

I did say that it was a rant, which is asking forgiveness in advance, but even then I'm usually better at editing out my rhetorical abuses of fallacy. I apologize for any confusion to the readers and to troyworks.

Troyworks is very experienced and, I am sure, talented at application development using oranges. But he was dissin' on them apples!

Reply | Parent | Thread

TroyToy

Re: the pure thingness of the things

from: troyworks
date: Aug. 13th, 2004 12:24 pm (UTC)
Link

First this is my own project so I'm not driven by any requests other than to do what I think is the best: less code = faster development, less maintenance, oodb's = faster runtime performance.

1) doing extra design work in order to solve problems that have already been solved by RDBMSes and relational theory

nope less design work overall, I store things as objects, if needed I query things like objects collections, the database does the heavy lifting. If i want set like operations for ad hoc like queries (mimicking join operations) I use a persistent data structure that's appropriate for things like that. Normally in a correctly designed model, everything is done via navigation (think of pointer references on a massive virtual memory space).

2) not doing that design work (which would have been implicit and easy, perhaps even automated,

Nope again. I model objects that represent my problem domain, postprocess them (and automated process to make them persistence capable) and they are the same objects I retrieve from the database, with relations perserved. I model once, not twice. -the data aspects of my Class Heirarchy IS the database schema.

Normalization isn't a catch all, give me 50 users with 1 phone number (single table), and 15 with 10 (two tables with cost of join), it becomes hard to make a model with works for everyone in an optimal fashion.

RDMBS's are built ontop of code..just like middlewear, which all are built onto standard data structures, and these are accessilbe inside the database and not, so I'm not recreating the wheel.

I think we may be arguing tools for different purposes.

How many tiers are your application? HTML with SQL embedded?
How frequenty is the same data accessed together, e.g. customer, contact info.

If your saying RDBM's aren't faithful representations, and relational theory isn't availalbe in other forms of code, I guess I don't understand what you mean by the term. Got some links?

OODB's are NOT new just most people don't understand them very well. They are used for MASSIVE databases (stanford particle accelerator) with astounding transactions (stock market, telecommunications)..who could all use Oracle if they wanted to, but don't for a reason, it doesn't work. If we are talking about any other site (say live journal) it's possible to go either way with different tradeoffs.

ps1: just the way I like it, and indeed it's useful, even if I don't think we are seeing eye to eye.
ps2: likewise.
p23: likewise.

Reply | Parent | Thread

truth without proof

Re: the pure thingness of the things

from: chronicfreetime
date: Aug. 13th, 2004 11:10 am (UTC)
Link

You've not addressed the OO to relational mapping

Databases are for storing data, not behavior. If you can't easily pull some arbitrary tuple of primitive types into your program, I say the fault lies with your choice of language. And associating an in-memory object with each row of each table undermines the utility of a relational database. You can't run a sensible ad hoc query on something you use as backing store for your heap.

I think OO is mostly snake oil, and an unhelpful way of thinking about most programming tasks. But that's another argument. If all you want to do is serialize objects to disk, go ahead, but don't complain that a relational database is a poor tool for that. A relational database is meant to be application independent, because data is immortal.

This has larger applicability, if a user's myriad of contact information, and name, and is always stored on the same disk page as the name, and always being accessed in that method,it makes more sense to keep it there instead of having to go to multiple tables to reconstruct it.

Again, tables are a logical organization, not necessarily a physical one. Throwing away application independence and relational integrity is not the only way to improve performance.

Reply | Parent | Thread

Triple Entendre

"arbitrary tuple" rolls quite sweetly on the tongue

from: triple_entendre
date: Aug. 13th, 2004 11:30 am (UTC)
Link

Neatly put, and right on target. Also, far more informative than my ranting so far. Thank you for that clarity.

I should go take a nap. :)

Reply | Parent | Thread

TroyToy

Re: the pure thingness of the things

from: troyworks
date: Aug. 13th, 2004 12:38 pm (UTC)
Link

Databases are for storing data, not behavior.
A relational database is meant to be application independent, because data is immortal.

an OODB does exactly that, it doesn't store the method information of an object, so that is free to change. Many of the larger OODB's have interfaces with Java, C, .Net. so the data lives on...

Conceptually if you stripped away methods, objects=tables, pointers and composition = table joins. The need to associate an object with anything at all is foreign to me.

We clearly are programming on the opposite ends of the spectrum. OOP is critical to developing clean applications (especially clientside rich user interfaces), and compoenents across multiple providers.

I think a key to our cognitive dissonances ad-hoc query aspect. Ad-hoc queries imply that one has no idea what a user is going to do to with a system here I will grant you RDMBS and SQL is easier to use.

OO modeling typically is the opposite, it defines narrow paths of interaction that fit known use cases which is how applications are typically built. I build more of the former than the latter, while still am free to use Relational when I need to. It's generally 3x more complicated with I have to go that path.

Reply | Parent | Thread

Triple Entendre

Re: the pure thingness of the things

from: triple_entendre
date: Aug. 13th, 2004 01:12 pm (UTC)
Link

I think a key to our cognitive dissonances ad-hoc query aspect. Ad-hoc queries imply that one has no idea what a user is going to do to with a system here I will grant you RDMBS and SQL is easier to use.

I put it to you that "you" (the programmer/designer/architect/etc) are a user of the data and of the system. At the very least, this can/should be true while you are creating it. Experimentation on normalized data in a good IDE with a query design tool will let you discover knowledge about your problem domain, knowledge that emerges as a logical yet inscrutable consequence of gazing upon the data (and poking at it dynamically and rapidly) in as pure a form as you can make it. Things you'd never have thought of if it weren't so easy to offhandedly browse through. deep, obscure correlations. Patterns. Perspective.

Reply | Parent | Thread

Triple Entendre

Re: the pure thingness of the things

from: triple_entendre
date: Aug. 13th, 2004 01:35 pm (UTC)
Link

narrow paths of interaction that fit known use cases

I have trouble imagining that, unless you are rewriting an existing application as-is in-place. Or maybe a "Windows Hardware Installation Wizard".

Use cases are a rather awkward tool for discovering requirements. I suppose they are useful for documenting that you should get paid.

Doesn't OOP in general suggest the exact opposite of "narrow paths of interaction"? Build small, useful components (objects and combinations of objects) that work together as tools to facilitate the user's interaction with data (objects)?

It probably won't surprise you that the majority of my work has been in building "decision support systems" and "data analysis tools". That's my true calling; order from disorder, needles from haystacks. Neatly arranged haystacks, mind you.

Reply | Parent | Thread

TroyToy

Re: the pure thingness of the things

from: troyworks
date: Aug. 13th, 2004 01:56 pm (UTC)
Link

Narrow by meaning relative to a database table: interfaces and defined methods controlling access to private data to prohibit corrupting it. Unlike an SQL statement which can do pretty much anything anyway it wants (good for your case trying to find patterns out of the noise).

The methods on a given object reflect the anticipated useage of it. E.g. a person.getDogName or person.getHairDressor isn't likely to be used on a sales site for books so it's not included in the model. person.getName() is any object who knows to call getName can.

Funny we truly are onthe opposite ends of the development spectrum. I build sites for people around use cases, e.g. an slide presentation and editing system for schools, pretty cut and dry use cases are "view slide" "editing slides". And the last year in a report generating tool for movie analysis (which does use a an RDBMS for the backend, which is mostly abstracted away from the presentation logic (e.g. helping users build valid questions (which get translated into SQL), and lots of nifty features to traverse the data in different ways.

However I have done data analysis of various network like things without a relational database, neural networks, search engine results, and livejournal social networks.

e.g.
http://www.intrio.com/products/graphion/index.htm

Reply | Parent | Thread

truth without proof

Re: the pure thingness of the things

from: chronicfreetime
date: Aug. 13th, 2004 11:21 am (UTC)
Link

To recreate this from any given node is done by iterative selects. e.g. give me node5's children, for all of those children give me the other children, etc.

Yes, this is a chore. Some have tried to extend SQL to make this easier, eg Oracle's "CONNECT BY". The problem is not, however, that your data doesn't fit the relational model; G = (V, E) after all. The problem is that existing database tools don't present an especially faithful implementation of relational theory.

In the past, when I've had to do anything intense with graphs, I've done it in memory. Those were mostly personal projects, though. At work, I have to jam trees into an RDBMS, and it hasn't really been that difficult. I'm not sure what approach I would take if I had to work with graph data on the scale of eg Orbitz; definitely not an OODB, though.

Reply | Parent | Thread

TroyToy

Re: the pure thingness of the things

from: troyworks
date: Aug. 13th, 2004 12:40 pm (UTC)
Link

faithful implementation of relational theory.

I don't think i truly understand how your are using "relational theory'. My frustrations come from the limitations of using an RDBMs in past applications.

Is in memory a faithful implementation of relational theory?

Reply | Parent | Thread

Triple Entendre

(no subject)

from: triple_entendre
date: Aug. 12th, 2004 07:03 pm (UTC)
Link

what do you mean by back fill.

I'll expand on this later (remind me if not).

Reply | Parent | Thread

valkyriie

(no subject)

from: valkyriie
date: Aug. 12th, 2004 08:10 pm (UTC)
Link

I love it when you talk dirty.

Reply | Thread

NobodobodoN

(no subject)

from: nobodobodon
date: Aug. 13th, 2004 10:19 am (UTC)
Link

Object databases look pretty but tempt you into omitting some relations that you later have to back-fill using "business rules".

Wish you'da told people here that a few years ago.

I'm just a data guy working at an object shop. Sure, they want data from me, but they aren't willing to give me any without a fight. Grr.

Reply | Thread

Triple Entendre

My advice

from: triple_entendre
date: Aug. 13th, 2004 10:55 am (UTC)
Link

Just go ahead and secretly design an automated mapping into a normalized database around the data you can get, and use that. Eventually they'll wonder why you can answer questions no one else can.

Reply | Parent | Thread

TroyToy

(no subject)

from: troyworks
date: Aug. 13th, 2004 12:26 pm (UTC)
Link

which object database are you using? and what are your reservations?

Reply | Parent | Thread

NobodobodoN

(no subject)

from: nobodobodon
date: Aug. 13th, 2004 12:40 pm (UTC)
Link

None of your business.

Reply | Parent | Thread

TroyToy

(no subject)

from: troyworks
date: Aug. 13th, 2004 12:48 pm (UTC)
Link

Fine,

I'm interested in an your take if you actually have experience using OODB's and actually have problems with them, I and others I'm sure would be interested in knowing what they are after all the house is on your side. Most people I have this particular discussion with have not used OODB's.

If your saying you don't like OOP at all, then I can understand.

Reply | Parent | Thread

NobodobodoN

(no subject)

from: nobodobodon
date: Aug. 13th, 2004 01:39 pm (UTC)
Link

You should mind your own business, sonny.

I like OOP, and I like relational databases, and I especially like it when they work well together.

I don't really enjoy discussing them, aside from the occasional grousing. I like to do my work, take my check, and go home.

I'm not actually using a formal OODB, I'm using a relational DB with a customized persistence layer. The problem is that this project has no database design, no data modelling of any kind. There's object modelling, and some post hoc mapping in the persistence layer, but getting the requirements dorks to give even simple data specs like maximum field lengths, allowable characters, and object cardinality is like pulling teeth.

We'll have hundreds of gigs of data that come and go through automatic feeds, and only the tinyest trickle through the UI, but 90% of the effort on the project goes into the UI. All of the requirements are written from a UI standpoint, and the external systems stuff has been mostly seat-of-pants work.

So, it's not really an issue with OODB per se, but it's annoying in many of the same ways.

Dig?

boodly doodly noodly woodly

Reply | Parent | Thread

NobodobodoN

(no subject)

from: nobodobodon
date: Aug. 13th, 2004 01:40 pm (UTC)
Link

ps. I love XML.

Reply | Parent | Thread