July 11RubyismsTrackbacks
Trackback specific URI for this entry
No Trackbacks
Comments
Display comments as
(Linear | Threaded)
Nice roundup of the issues of using the highly-abstracted ActiveRecord.
Most of the worst problems you have pointed out ("SELECT *", updating an entire row every call to "save", unique constraints) can be solved within the framework, and it should be expected that developers will check their query logs, profile, identify pain points and fix them. ActiveRecord (and Rails) is nice to get things off the ground quickly, but shouldn't be an excuse for not knowing how your database is behaving behind the scenes. QUOTE: In fact, AR seems to love "SHOW FIELDS FROM" a lot. [...] I'd expect AR to cache this information AFAIK, AR caches this information in production mode. QUOTE: RoR also creates a default primary key named id for each and every table. In fact, many Rails developers get so used to this that they often forget to check for natual primary keys and unique constraints where appropriate. If you for example have a relation between users and classes, where each user can be a member of multiple classes, and each class is joined by multiple users, then you will have a memberships relation table which will have user_id and class_id foreign key columns. By default, RoR will also create an id column in memberships, which is useless, and not create a unique constraint or even primary key for ( user_id, class_id) which expresses the fact that each user can only be a member in each class once. If you're creating a HABTM join table, the documentation and every book I've read enjoins you to pass :id => false to suppress the creation of the `id` column. In fact, HABTM doesn't work properly with a join table that has an `id` column. However, Rails has no way to magically guess that you're creating a HABTM join. Rails now features has-many-through (HMT) relations, in which the join is itself a model and may have additional attributes of its own. In this case, the `id` column is merited. QUOTE: "add_index(:memberships, [:user_id, :class_id], :unique => true)" will create such a compound unique index. And that's a very good idea for HABTM join tables (although Rails convention would demand that table be named classes_users). I'd very much like to see a shorthand for the creation of join tables, like this: CODE: create_join_table :class, :user I've heard that there's been some recent work on Rails migrations to provide some shorthand for similar operations; maybe that will be included too. On the issue of SHOW FIELDS FROM, that query is issued when the associated class is first used. It should only occur once per table per Rails instance in production mode, and that's certainly been what I've observed. However, if you had ten Rails apps, each deployed as a three-Mongrel cluster, and each being restarted a few times a day, the SHOW FIELDS FROM queries will certainly add up. Rails doesn't use migrations to determine the schema of a table. The fact that it relies directly on the database and not on external configuration files is one of its core design features. You're absolutely right about SELECT *, although it's usually needed when you're going to be editing the model in question. All of my TEXT-heavy apps have custom finders for exactly that purpose, but I've heard references to a plugin that allows something like a :exclude_text => true argument to AR.find. QUOTE: Apparently Ruby uses InnoDB by default. Apparently Ruby uses transactions a lot. That means that the query cache is of very limited use, because queries inside a transaction cannot use the query cache. Careful with your language there. Rails (ActiveRecord, specifically) uses InnoDB by default in its migrations. Ruby doesn't care. In any case, this sounds a lot more like a MySQL shortcoming to me, unless there's some inherent reason that the query cache cannot be used within a transaction. Furthermore, I've never noticed Rails to use a transaction for SELECT, and isn't that where the benefit of the query cache comes from? QUOTE: If you are looking at the binary log, you will find a lot of queries of the form "UPDATE products SET WHERE id = ". This is the result of an AR product.save call. AR will needlessly save all attributes, even when they have not been changed. There was a discussion of this on rubyonrails-core. The issue is maintaining data integrity -- AR validations operate in the application context before saving the record. Updating all columns ensures that the result stored in the database is valid. If two or more AR instances were to save just the updated columns in quick succession, the result may be invalid. To give a brief example: My Customer model has a validation requiring the presence of either an email or a phone number. Two administrators interacting with two separate Rails instances simultaneously update the record, one removing the email address and the other removing the phone number. The result is a record with neither. That's a contrived example, but there are many more -- and some have more serious consequences. Other than that, I agree with most of your points. A lot of that would be obviated by learning to use AR properly, as comment #1 noted. The quoting issue you bring up is extremely important -- not only should AR compulsively quote identifiers, but it should provide more help for programmers to do so. (An example is the :order argument to AR.find -- passing :order => :name should generate 'ORDER BY `#{table_name}`.`name`'. not 'ORDER BY name'.) QUOTE: To give a brief example: My Customer model has a validation requiring the presence of either an email or a phone number. Two administrators interacting with two separate Rails instances simultaneously update the record, one removing the email address and the other removing the phone number. The result is a record with neither. I don't know about Rails, but in our own ActiveRecord implementation such a requirement could be enforced by overriding the save() method for the product class and marking the necessary fields as modified when save() is called - thus including these fields in the UPDATE statement even if they weren't modified. I'm surprised that Rails developers couldn't think of a similar solution. Not all decisions should be blindly based on edge cases. QUOTE: In any case, this sounds a lot more like a MySQL shortcoming to me, unless there's some inherent reason that the query cache cannot be used within a transaction. Furthermore, I've never noticed Rails to use a transaction for SELECT, and isn't that where the benefit of the query cache comes from? With Repeatable-Read, which is the InnoDB default, "BEGIN; SELECT ...; SELECT; COMMIT" will create a Snapshot of the database for your view of the database on BEGIN. Subsequent selects will see unchanging data. That is a) a transaction using SELECT and b) the reason why the query cache cannot be useful.
Rails only uses transactions if you ask it to. If you just do a People.find(:all), it does a select * from people, not a begin; select * from people; commit;
I strongly disagree that this is an edge case. Interdependent validations are a regular feature of many apps I've encountered (and written). Furthermore, it's a far better thing to build less efficient applications out of ignorance than it is to build subtly data-corrupting ones. Bad performance can be detected, researched, and fixed with far less hassle and liability than inconsistent validation.
I'll say this, however: whereas AR::B#update_attributes and AR::B#save should always save all attributes unless explicitly instructed otherwise, AR::B#update_attribute should only update the column in question. That method doesn't perform validation (and neither do toggle!, increment!, and decrement!, which use update_attribute), and thus has no good reason to clobber other columns.
Regarding UPDATE of all fields. Use locking. It was invented to solve precisely this problem. Blindly destroying data is no more correct than allowing invalid state from a user perspective, and it's simple differing opinion from a developer perspective anyway, with the Rails implementation in the minority by far.
QUOTE Djur: There was a discussion of this on rubyonrails-core. The issue is maintaining data integrity -- AR validations operate in the application context before saving the record. Updating all columns ensures that the result stored in the database is valid. If two or more AR instances were to save just the updated columns in quick succession, the result may be invalid. The data integrity issue can (in most cases) and should be solved at the data level through the use of constraints. This is the one area where I really disagree with the approach ActiveRecord takes. It is too bad that the ActiveRecord developers are forced to stick with a design that clearly could be improved upon by updating only changed columns because they feel that constraints should not be used. This is exactly what constraints are there for.
Foreign key constraints have two big disadvantages:
One is that they create a lot of locking. From the manual: QUOTE: If a FOREIGN KEY constraint is defined on a table, any insert, update, or delete that requires the constraint condition to be checked sets shared record-level locks on the records that it looks at to check the constraint. InnoDB also sets these locks in the case where the constraint fails. The other is that they put additional load on the only part of the system that cannot be scaled well - the database. If one enforced constraints instead inside the application, e.g. using Ruby, you'd put load on the frontends instead. Usually there are a lot of frontends, but few databases, and usually frontends are cheaper and less stressed re memory. That's why I usually opt for a library-based, client-side validation when it comes to performance consideration. Also, validation inside the application very often has more data and context and can use more useful assumptions to validate.
Sure, foreign key constraints affect performance and there are cases where they should be removed. However, I think it's always a good idea to start off with them until performance problems show up. This is especially true in cases where there are multiple applications all hitting the same database. I'm not arguing that all validation belongs in the database, either. But when data integrity is concerned, the closer to the data you can have your validation, the better, generally.
IMHO: Rails is being designed and primarily used by people with a dedicated DB. So the idea of constraining in the DB is not as important. As David H has said multiple times, if something bothers you, submit a patch.
If you are here and unable to code in ruby for a patch you would like, let me know (james dot deville with gmail), and I'll even see if I can do it for you!
So MySQL is able to validate the content of one field based on the content of others? I know that's possible with the CHECK constraint in SQL, so let's see...
http://dev.mysql.com/doc/refman/5.1/en/create-table.html QUOTE: The CHECK clause is parsed but ignored by all storage engines. All right then. If we can't do it there, we need to do it in the application.
Ouch, point taken. Most of my experience is with Oracle, where this kind of thing is possible. Thanks for pointing that out.
So applications have to either be inefficient (on all databases) or risk data corruption because MySQL doesn't support table constraints (CHECK)? Foreign key constraints are bad because MySQL handles them poorly?
Constraints are part of the relational model, which is the concept at the core of RDBMSes as a whole. They're the seatbelts of the database world. I certainly wouldn't buy a car without seatbelts just because it was the fastest car around, no matter how much the manufacturer tried to convince me that seatbelts were unnecessary because crashes are an "edge case".
The lack of support for unsigned integers combined with the lack of support for 64-bit integers (bigint, longint, whatever you want to call it) bugs the bajeezus out of me.
I wind up having to override the native_database_types method from the appropriate lib/active_record/connection_adapters in my environment.rb file to support big, unsigned primary keys, and support other big/unsigned numbers.
Well, constraints are in the Standard, yes, but in most RDBMS that have them, they can only access other fields in the same record. One cannot write a field constraint that checks data in another table, or checks data in other records in the table. This generally requires one to use a trigger instead.
I think for simple, relatively static constraints (i.e., GenderCode in ('M','F')), they can make sense, but so would DOMAINs, which essentially sets up UDTs for the database...
I like to separate text and blob columns into their own models and tables, storing and managing the record content separate from its meta data.
For example in the case of a blog I would have two models: BlogEntry and BlogContent the later having just an id and text column. This allows me to RUD the metadata separately from the content and use lazy loading of the content when needed. The only hitch is that you need to wrap the create operation in a transaction. This avoids many of the problems mentioned above and offers a great deal of flexibility in how the bulk of the data (the content) is managed on the back end.
Sorry, I should mention that the create transaction simply involves doing:
content = BlogContent.new(params[:content]) entry = BlogEntry.new(params[:entry].merge(:content => content)) entry.save! |
QuicksearchComments about phpvikinger.org: Things that have no name Sun, 11.05.2008 06:34 In reply to "stuff with no nam es":very informative and succi nct. I am retired and need to learn to build a website [...] about What is the difference between MySQL and Postgres? Wed, 30.04.2008 14:08 what is the difference between MySql and PostgreSql? about Fortune Cookie Wed, 09.04.2008 21:46 What a random fortune, who kno ws what it means. My favorite random fortune cookie note rea ds "you will make a good [...] about Configuring InnoDB - An InnoDB tutorial Thu, 27.03.2008 19:17 Hi Kristian, I noticed a fe w things that are wrong with y our above article. Overall it is good however, so don [...] about Rubyisms Wed, 19.03.2008 15:01 Sorry, I should mention that t he create transaction simply i nvolves doing: content = Bl ogContent.new(params[:co [...] about Rubyisms Wed, 19.03.2008 14:51 I like to separate text and bl ob columns into their own mode ls and tables, storing and man aging the record content [...] about Rubyisms Wed, 27.02.2008 04:47 IMHO: Rails is being designed and primarily used by people w ith a dedicated DB. So the ide a of constraining in the [...] about Rubyisms Wed, 27.02.2008 04:33 Rails only uses transactions i f you ask it to. If you just d o a People.find(:all), it does a select * from people, [...] Categories
Blog AdministrationDisclaimerAll examples, code-snippets, advice, trivia, etc. are supplied in the hope that they are useful, but no warranty of any kind is given. Use of trademarks is not intended as a challenge to them.
ImpressumCopyright 2005-2007 bei den jeweiligen Autorinnen der Beiträge. Die Infrastruktur von MySQL-dump wird betrieben von:
Azundris, Immanuelkirchstraße 18, 710405 Berlin, Germany
|