Pro Django
Second Edition

The Expert's Voice in Web Development

For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them.

Introduction

A line starting with three periods but containing no other text indicates that you should simply press Enter on a blank line in the interpreter. This completes any open code blocks, bringing you back to the >>> prompt. Any lines that don’t begin with either >>> or ... represent the output of the code or the result of the previous expression.

>>> import django
>>> django.get_version()
'1.5.1'

The first line of an interactive example will always begin with >>>; everything else is code that should be written in a file and executed as part of a running Django application. The surrounding text will indicate what file the code should be placed in and how it will execute.

Prerequisites

Pro Django is written for Django 1.5, which was released on February 26, 2013. That release or a more recent clone of the Django code repository is required for the code samples to work properly. Since Django in turn relies on Python, these examples also assume a working Python environment of version 2.7 or higher. Most of the code examples are written with Python 3.3 in mind, but there are compatibility notes available where older versions diverge from the examples shown.

Chapter 1

Understanding Django

Code alone isn’t enough. Sure, it’s what the computer runs, but code has to come from somewhere. A programmer has to sit down and decide what features to include, how they should be implemented, what other software to utilize, and how to provide hooks for future enhancements to be added.
It’s easy to skip straight to code, ignoring the cognitive process that produces it, but great programmers always have reasons for the decisions they make. With a framework like Django, many such decisions have already been made, and the tools provided are shaped by these decisions, and by the programmers who made them. By adopting these philosophies in your own code, not only will you be consistent with Django and other applications, but you may even be amazed at what you’re able to accomplish.

Beneath even the simplest code is the thought process that went into its creation. Decisions were made about what it should do and how it should do it. This thought process is a step often overlooked in books and manuals, leading to an army of technicians slaving away, writing code that manages to accomplish the task at hand but without a vision for its future.

While the rest of this book will explain in detail the many basic building blocks Django provides for even the most complicated of projects, this chapter will focus on even more fundamental aspects of the framework. For those readers coming from other backgrounds, the ideas presented in this chapter may seem considerably foreign, but that doesn’t make them any less important. All programmers working with Python and Django would do well to have a solid understanding of the reasons Django works the way it does, and how those principles can be applied to other projects.

You may want to read this chapter more than once, and perhaps refer to it often as you work with Django. Many of the topics are common knowledge in the Django community, so reading this chapter carefully is essential if you plan to interact with other programmers.

Philosophy

Django relies heavily on philosophy, both in how its code is written and how decisions are made about what goes into the framework. This isn’t unique in programming, but it’s something newcomers often have trouble with.
It is essential to maintain both consistency and quality, and having a set of common principles to refer to when making decisions helps maintain both. Since these concepts are also important to individual applications, and even collections of applications, a firm grasp on these philosophies will yield similar benefits.

Perhaps the best-known and most-quoted passage of Python philosophy comes from Tim Peters, a longtime Python guru who wrote down many of the principles that guide Python’s own development process. The 19 lines he came up with, called the Zen of Python, have been so influential to Python programmers over time that they are immortalized as Python Enhancement Proposal (PEP) 20[1] and in the Python distribution itself, as an “Easter egg” module called this.

>>> import this
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

While some of this is clearly intended for humor, it sums up common Python attitudes pretty well. The remainder of this chapter highlights some specific principles that are often cited within the Django community, but all professional Python programmers should keep this text in mind and reference it often.

[1] http://prodjango.com/pep-20/
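Two of the aphorisms above, “Errors should never pass silently” and “Unless explicitly silenced,” translate directly into everyday code. A minimal sketch (the config dictionary and default value here are invented for illustration):

```python
# Hypothetical configuration dictionary; 'timeout' may be absent.
config = {'host': 'example.com'}

try:
    timeout = int(config['timeout'])
except KeyError:
    # Explicitly silenced: a missing key is an expected case,
    # so fall back to a default instead of crashing.
    timeout = 30

# A ValueError from int() would still propagate loudly, because
# malformed data is a genuine rule violation, not an expected case.
print(timeout)
```

Catching only KeyError, rather than a bare except, is what keeps the silencing explicit: any other failure still surfaces.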
One important thing to keep in mind is that many of the lines in the Zen of Python are subjective. For example, “beautiful” may be better than “ugly,” but definitions of “beautiful” are plentiful and can vary as much as the people who provide them. Similarly, consider notions of simplicity and complexity, practicality and purity; each developer will have a different opinion on which side of the line a particular piece of code should be placed.

Django’s Interpretation of the MVC Pattern

One of the most common application architectures—adopted by hobbyists and corporations alike—is the Model-View-Controller (MVC) pattern, as it provides clean separation of tasks and responsibilities between the prominent aspects of an application. Django only loosely follows this approach. A proper discussion should kick off with a quick overview of its components.

• The model is generally responsible for managing data and core business logic.
• The view displays that data to the user.
• The controller accepts user input and performs logic specific to the application.

Although this pattern has proven very effective in many domains, Django’s authors weren’t looking to conform to any kind of pattern at the outset. They were simply interested in finding the most effective way to develop software for the Web. After all, Django was built for the daily needs of a working newspaper, where things have to happen very quickly if they’re to happen at all. Ultimately, the separation of tasks into discrete groups serves a few different purposes.

• Code that is designed for a specific set of tasks is much more maintainable because it doesn’t need to make assumptions about completely unrelated parts of the application. In general, this concept is called separation of concerns and is applicable throughout software development.
• Application development becomes more flexible, as multiple distinctly different view and controller layers may connect to a single model layer.
This enables a variety of applications to share the same business logic and data, presenting it and interacting with it in different ways, for different audiences.

Don’t Repeat Yourself (DRY)

Often abbreviated DRY, this term comes up quite often in conversations and can be used as

• A noun—“This code violates DRY.”
• An adjective—“I like that approach, it’s very DRY.”
• A verb—“Let’s try to DRY this up a bit.”

The basic idea is that you should only write something once. That reduces the risk of accidentally introducing inconsistency between two pieces of code that should match. It should also be as reusable as possible, and if other code needs to know something about what you’ve already written, you should be able to get the necessary information automatically using Python, without requiring the programmer to repeat any of that information.

To facilitate this, Python provides a wealth of resources for peeking inside your code, a process called introspection. Many of these resources, covered in Chapter 2, are incredibly useful when supporting DRY in your code.

A Focus on Readability

“Readability counts.” It’s mentioned specifically in the Zen of Python, as noted earlier, and is perhaps one of the most important features of Python. Indeed, many Python programmers take pride in the readability of both the language and the code they write. The idea is that code is read far more often than it’s written, especially in the world of open source.

To this end, Python provides a number of features designed to improve readability. For instance, its minimal use of punctuation and forced indentation allow the language itself to help maintain the readability of your code. When you’re working with code in the real world, however, there’s far more to consider. For real life, the Python community has developed a set of guidelines for writing code, intended to improve readability.
Set forth in PEP 8,[2] these guidelines are designed to maintain not only readability of an individual program, but also consistency across multiple programs. Once you get the feel for one well-written program, you’ll be able to understand others easily.

The exact details of PEP 8 are too numerous to list here, so be sure to read it thoroughly to get a good idea of how to write good code. Also, note that if you read Django’s own source code, some of the rules set forth in PEP 8 aren’t followed. Ironically, this is still in the interest of readability, as following every rule to the letter can sometimes cause other problems. After all, to quote the Zen of Python again, “Practicality beats purity.” The examples in this book will follow the style used by Django’s own source code.

Failing Loudly

“Errors should never pass silently. / Unless explicitly silenced.” This may seem like a simple sentiment, but at two lines, it comprises over 10 percent of the Zen of Python, and there’s something to be said for that. Dealing with exceptions is an important part of programming, and this is especially true in Python.

All programming languages can generate errors, and most have a way to handle them gracefully, but each language has its own best practices for dealing with them. One key to keep in mind is that, although the names of most Python exceptions end in Error, the base class is called Exception. To understand how they should be used and handled, it’s useful to start by learning why that particular word was used. Looking at some of the dictionary definitions for the word “exception,” it’s easy to see variations on a theme.
• Something excepted; an instance or case not conforming to the general rule
• One that is excepted, especially a case that does not conform to a rule or generalization
• An instance that does not conform to a rule or generalization

[2] http://prodjango.com/pep-8/

Rather than an error, which describes a situation where a problem occurred, an exception is simply when something unexpected occurred. This may seem like a subtle distinction, but some people treat exceptions as errors, reserving them solely for unrecoverable problems like corrupted files or network failure. This is reinforced by the fact that, in some languages, raising exceptions is extremely expensive, so to prevent performance problems, exceptions are avoided whenever possible.

In Python, however, exceptions are no more expensive than simple return values, allowing them to be more accurate to their dictionary definition. If we define an exception as a violation of a rule, it stands to reason that we must first define a rule.

Defining Rules

This is the most important aspect of understanding exceptions, so it’s necessary to be perfectly clear: there’s no Python syntax for defining rules. It’s simply not a feature of the language. Some other languages explicitly support design by contract,[3] and many can support it through framework-level code, but Python doesn’t support any form of it natively.

Instead, rules are defined by programmers in what they intend their code to do. That may seem like an over-simplification, but it’s really not. A piece of code does exactly what its author intends it to do, and nothing more. Anything outside the intentions of the programmer can—and should—be considered an exception. To illustrate this, here are some of the rules used by Python and Django:

• Accessing an item in a list using the bracket syntax (my_list[3]) returns the item at the specified position.
• A set’s discard() method makes sure that a specified item is no longer a member of the set.
• A QuerySet’s get() method returns exactly one object that matches the arguments provided.

Examples like these are important because even though these rules are simple, they accurately describe how the given features will behave in various situations. To further illustrate, consider the following scenarios and how the rule impacts behavior.

• If the index provided as a reference to a list item does exist, the appropriate value will be returned. If it doesn’t, an exception (IndexError) is raised. If the value used as an index isn’t an integer, a different exception (TypeError) is raised.
• If the item being removed from a set using discard() is already a member of the set, it’s simply removed. If it wasn’t a member of the set, discard() returns without raising an exception, because discard() only ensures that the item is not in the set.
• If the arguments passed to a QuerySet’s get() method match one record in the database, that record is returned as an instance of the appropriate model. If no records match, an exception (DoesNotExist) is raised, but if more than one record matches, a different exception (MultipleObjectsReturned) is raised. Finally, if the arguments can’t be used to query the database (due to incorrect types, unknown attribute names or a variety of other conditions), still another exception (TypeError) is raised.

Clearly, even simple rules can have profound effects, as long as they’re defined explicitly. Although the only requirement is that they be defined in the mind of the author, rules are of little use if not conveyed to anyone else. This becomes especially important in the case of a framework such as Django, built for distribution to the masses.

[3] http://prodjango.com/design-by-contract/

Documenting Rules

There are a number of appropriate ways to document the specific rules a piece of code was written to follow.
It’s even quite useful to specify them in more than one way, and in varying levels of complexity. There are four main places where people look for this information, so providing it in any or all of these locations would serve the purpose quite well.

• Documentation—As this should be the complete collection of information about the application, it stands to reason that these rules would be included.
• Docstrings—Regardless of stand-alone documentation, developers will often peek at the code itself to see how it works. Docstrings allow you to provide plain-text explanations of these rules right alongside the code that implements them.
• Tests—In addition to providing explanations of these rules for humans to understand, it’s a great idea to provide them in a way that Python can understand. This allows your rule to be verified on a regular basis. In addition, doctests—tests embedded inside docstrings—are also human-readable, and both purposes can be served at once.
• Comments—Sometimes, a function may be complicated enough that a broad overview, such as might be found in full documentation or even the docstring, doesn’t give sufficient information about what a particular chunk of code is expected to do. Python’s emphasis on readability makes this fairly infrequent, but it does still happen. When it does, comments can be a useful way of explaining to others what the code is intended for, and thus what should be considered an exception. In particular, comments should explain the purpose of the code, not merely outline what each line is actually doing. Think why, not how.

Regardless of how you choose to describe your rules, there’s one lesson that must always take precedence: be explicit. Remember, anything not laid out in your rule should be considered an exception, so defining the rule explicitly will help you decide how the code should behave in different situations, including when to raise exceptions. Also, be consistent.
Many classes and functions will look similar in name or interface, and where at all possible, they should behave similarly. Programmers who are accustomed to a particular behavior will expect similar behavior from similar components, and it’s best to meet those expectations. This is especially true when writing code that mimics types provided by Python or Django, as they’re already well-documented and well-understood by many programmers.

Community

Since being released to the public in 2005, Django has achieved great success, both technically and culturally. It has amassed a tremendous following throughout the world of Python Web development, among hobbyists and professionals alike. This community is one of the greatest assets to the framework and its users, and it’s most certainly worth discussing in some detail.

AN EVOLVING COMMUNITY

It’s important to realize that, like any social structure, the Django community will evolve and change over time. So the information in this section may not always accurately reflect current practices and expectations. There’s no reason to let that deter you, though. The one thing I don’t expect to change is the community’s willingness to embrace new members. You’ll always be able to get in touch with a variety of people, if you’re willing to put yourself out there.

Check Your Version

As mentioned previously, the official documentation keeps up with Django’s trunk development, so there’s a definite possibility that the documented features don’t match the features available in the code you’re using. This is more likely to occur if you’re using an official release, but it can still happen if you’re tracking trunk, depending on how often you update your local copy. When you’re tracking trunk, the article on backwards-incompatible[9] changes should be considered an essential part of the official documentation.
If you run into problems after updating, make sure that none of the features you’re using have changed.

Frequently Asked Questions (FAQ)

After a few years of answering questions using the methods that follow, the Django community has heard a variety of questions that come up on a regular basis. To help answer these questions more easily, there are two articles. Although the official FAQ[10] includes many questions not related to troubleshooting problems, there are still several common issues listed there. The Internet Relay Chat (IRC) channel has its own set of questions and answers and its own FAQ.[11]

Mailing Lists

One of the easiest ways to get help is to ask your question on the django-users mailing list.[12] Because it operates over standard email, it’s accessible to everyone, without requiring any special software. Simply join the list and you’ll be able to post your questions for thousands of other users to look at. There are no guarantees, but most questions get answered quickly.

One key advantage of the mailing list is that all conversations are archived for future reference. In addition to the FAQs, the django-users mailing list archive can be an invaluable resource when you’re trying to track down a problem that might have occurred to someone before. Be sure to search the archives before asking your question, though, because there’s a good chance someone else has run into it as well.

Internet Relay Chat (IRC)

If you need answers more quickly, the best option is the Django IRC channel,[13] where many knowledgeable members of the Django community are available for direct conversation. It’s a very helpful environment, but you should be prepared to provide specific details about the problem. This may include the exact error traceback, snippets of the models, views and other code that might be involved with the problem. This code is most often shared using an online pastebin—a place to temporarily put some code for others to look at.
Code can be pasted onto a public Web site for a limited time, allowing it to be shared with others. GitHub provides a tool for this purpose, called gist,[14] which is a simple tool for sharing code with users on IRC and elsewhere.

Now What?

Of course, learning about philosophy and community doesn’t get any code written. It helps to know how to put tools to good use, but that’s nothing without a set of tools to work with. The next chapter outlines many of the less commonly used tools that Python itself has to offer, while the remaining chapters explore much of Django’s own toolset.

[9] http://prodjango.com/backwards-incompatible-changes/
[10] http://prodjango.com/faq/
[11] http://prodjango.com/irc-faq/
[12] http://prodjango.com/django-users/
[13] http://prodjango.com/irc/
[14] http://prodjango.com/gist/

Chapter 2

Django Is Python

Django, like other frameworks, is built on an underlying programming language—in this case, Python—to do its work. Many people who are new to Django are also new to Python, and Python’s natural-feeling syntax combined with Django’s energy-saving features can make Django seem like it uses some kind of metalanguage, which isn’t the case. A proper understanding of what can be done in Django must begin with the knowledge that Django is simply Python, as are all of your applications. Anything that can be done in Python can be done in Django, which makes the possibilities nearly limitless.

This also means that Django applications have access not only to the entire Python standard library, but also to an immense collection of third-party libraries and utilities. Interfaces to some of these are provided along with Django itself, so for many cases, the existing code and documentation will be sufficient to quickly get an application up and running. Later in this book, some additional utilities are covered, along with some tips on how to integrate them into a Django application.
The possibilities aren’t limited to the options outlined in this book, so feel free to look around for Python utilities that will help support your business plan, and use the techniques listed in this book to integrate them into your application. Though learning Python is beyond the scope of this book, Django uses some of its advanced features. In this chapter, I’ll discuss many of those features to help you understand how Python can contribute to the goal of making things easier for everyone.

How Python Builds Classes

Some of the most advanced Python techniques that Django relies on are related to how Python constructs its classes. This process is often taken for granted by most developers—as well it should be—but since it’s at the heart of Django, it forms the basis of this exploration.

When the Python interpreter encounters a class definition, it reads its contents just as it would any other code. Python then creates a new namespace for the class and executes all the code within it, writing any variable assignments to that new namespace. Class definitions generally contain variables, methods and other classes, all of which are basically assignments to the namespace for the class. However, nearly any valid code is allowed here, including printing to console output, writing to log files or even triggering GUI interaction.

Once the contents have finished executing, Python will have a class object that is ordinarily placed in the namespace where it was defined (usually the global namespace for the module), where it is then passed around or called to create instances of that class.

>>> class NormalClass:
...     print('Loading NormalClass')
...     spam = 'eggs'
...     print('Done loading')
...
Loading NormalClass
Done loading
>>> NormalClass
<class '__main__.NormalClass'>
>>> NormalClass.spam
'eggs'

As you can see, code executes within the class definition, with any assigned variables showing up as class attributes once the class is ready.
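Because a class body is just executable code, its attribute values can even be computed at definition time. A minimal sketch (the Settings class and its attribute names are invented for illustration, not taken from Django):

```python
import platform

class Settings:
    # Any expression can run here; these are ordinary assignments
    # into the namespace that becomes the class's attributes.
    python_version = platform.python_version()
    major = int(python_version.split('.')[0])
    greeting = 'Hello from Python %s' % python_version

print(Settings.major)
print(Settings.greeting)
```

Names assigned earlier in the body (like python_version) are visible to later assignments, exactly as in any other block of code.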
Building a Class Programmatically

The process described in the previous section is used for any source-declared class, but the way Python goes about it offers the possibility of something far more interesting. Behind the scenes, details about the class declaration are sent off to the built-in type object, which takes care of creating an appropriate Python object for the class. This happens automatically, for every class, immediately when it finishes parsing the contents of the class declaration. The constructor for type accepts three arguments, which represent the entire class declaration.

• name—The name provided for the class, as a string
• bases—A tuple of classes in the inheritance chain of the class; may be empty
• attrs—A dictionary of the class namespace

COMPATIBILITY: NEW-STYLE CLASSES IN PYTHON 2

The process described in this section is true for new-style Python classes, a distinction introduced in Python 2.2.[1] Old-style classes have been completely removed from Python 3, but if you’re working with Python 2, you’ll need to make sure to force new-style classes. To do so, simply make sure that the class inherits from the built-in object type somewhere in its inheritance chain.

All the classes Django provides to be subclassed will already derive from object, so any further derivatives will automatically be new-style classes, without any extra effort on your part. Still, it’s important to keep the difference in mind, so that any custom classes your application may need will exhibit the behaviors outlined in this chapter.

Like any Python object, a new type can be instantiated at any time, from any block of code. This means that your code can construct a new class based on data collected at runtime. The following code demonstrates a way to declare a class at runtime, which is functionally equivalent to the example provided in the previous section.
>>> DynamicClass = type('DynamicClass', (), {'spam': 'eggs'})
>>> DynamicClass
<class '__main__.DynamicClass'>
>>> DynamicClass.spam
'eggs'

[1] http://prodjango.com/new-style-classes/

This declarative syntax has become an identifying feature of Django code, so many third-party applications that supply additional frameworks are written to use a syntax similar to that of Django itself. This helps developers easily understand and utilize new code by making it all feel more cohesive. Once you understand how to create a class using declarative syntax, you’ll easily be able to create classes using many Django features, both official and community-provided. Looking at declarative syntax on its own will demonstrate how easy it is to create an entirely new framework for Django that fits with this pattern.

Using declarative syntax in your own code will help you and your colleagues more easily adapt to the code, ensuring greater productivity. After all, developer efficiency is a primary goal of Django and of Python itself. While the next few sections describe declarative syntax in general, the examples shown are for Django’s object-relational mapper (ORM), detailed in Chapter 3.

Centralized Access

Typically, a package will supply a single module from which applications can access all the necessary utilities. This module may pull the individual classes and functions from elsewhere in its tree, so they can still use maintainable namespaces, but they will all be collected into one central location.

from django.db import models

Once imported, this module provides at least one class intended as the base class for subclasses based on the framework. Additional classes are provided to be used as attributes of the new subclass. Together, these objects will combine to control how the new class will work.

The Base Class

Each feature starts with at least one base class.
There may be more, depending on the needs of the framework, but at least one will always be required in order to make this syntax possible. Without it, every class you ask your users to define will have to include a metaclass explicitly, which is an implementation detail most users shouldn’t need to know about.

class Contact(models.Model):

In addition to inspecting the defined attributes, this base class will provide a set of methods and attributes that the subclass will automatically inherit. Like any other class, it can be as simple or complex as necessary to provide whatever features the framework requires.

Attribute Classes

The module supplying the base class will also provide a set of classes to be instantiated, often with optional arguments to customize their behavior and assigned as attributes of a new class.

class Contact(models.Model):
    name = models.CharField(max_length=255)
    email = models.EmailField()

The features these objects provide will vary greatly across frameworks, and some may behave quite differently from a standard attribute. Often they will combine with the metaclass to provide some additional, behind-the-scenes functionality beyond simply assigning an attribute. Options to these attribute classes are usually read by the metaclass when creating this extra functionality.

For example, Django’s Model uses the names and options of field attributes to describe an underlying database table, which can then be created automatically in the database itself. Field names are used to access individual columns in that table, while the attribute class and options convert native Python data types to the appropriate database values automatically. More information on how Django handles model classes and fields is available in the next chapter.
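The pieces described so far can be tied together in a minimal sketch. This is not Django's actual implementation (the Field and DeclarativeMeta names are invented here), but it mirrors the pattern: a metaclass gathers the attribute-class instances, and a base class hides that metaclass from users.

```python
class Field:
    """A stand-in attribute class; a real framework adds behavior here."""
    def __init__(self, **options):
        self.options = options

class DeclarativeMeta(type):
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        # Collect every Field instance assigned in the class body,
        # so the framework can work with them later.
        cls.fields = {key: value for key, value in attrs.items()
                      if isinstance(value, Field)}
        return cls

class Model(metaclass=DeclarativeMeta):
    pass

# Users only subclass Model; the metaclass is applied automatically.
class Contact(Model):
    name = Field(max_length=255)
    email = Field()

print(sorted(Contact.fields))  # ['email', 'name']
```

Defining Contact this way never mentions the metaclass, which is exactly the convenience the base class provides.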
Ordering Class Attributes

One potential point of confusion when using declarative syntax is that Python dictionaries are unordered, rather than respecting the order in which their values were assigned. Ordinarily this wouldn’t be a problem, but when inspecting a namespace dictionary it’s impossible to determine the order in which the keys were declared. If a framework needs to iterate through its special attributes, or display them to a user or programmer, it’s often useful to access these attributes in the same order they were defined. This gives the programmer final control over the order of the attributes, rather than some arbitrary ordering decided by the programming language.

A simple solution to this is to have the attributes themselves keep track of the instantiation sequence; the metaclass can then order them accordingly. This process works by having all attribute classes inherit from a particular base class, which can count how many times the class is instantiated and assign a number to each instance.

class BaseAttribute(object):
    creation_counter = 1

    def __init__(self):
        self.creation_counter = BaseAttribute.creation_counter
        BaseAttribute.creation_counter += 1

Object instances have a different namespace than classes, so all instances of this class will have a creation_counter, which can be used to sort the objects according to the order in which they were instantiated. This isn’t the only solution to this problem, but it’s how Django sorts fields for both models and forms.

Class Declaration

With all of these classes in a module, creating an application class is as simple as defining a subclass and some attributes. Different frameworks will have different names for the attribute classes, and will have different requirements as to which classes are required or the combinations in which they may be applied.
They may even have reserved names that will cause conflicts if you define an attribute with that name, but such problems are rare, and reserving names should generally be discouraged when developing new frameworks for use with this syntax. The general rule is to allow developers to be as flexible as they'd need to be, without the framework getting in the way.
from django.db import models

class Contact(models.Model):
    """
    Contact information provided when sending
    messages to the owner of the site.
    """
    name = models.CharField(max_length=255)
    email = models.EmailField()
This simple code alone is enough to allow the framework to imbue the new class with a wealth of additional functionality, without requiring the programmer to deal with that process manually. Also note how all the attribute classes are provided from that same base module and are instantiated when assigned to the model.
A class declaration is never limited to only those features provided by the framework. Since any valid Python code is allowed, your classes may contain a variety of methods and other attributes, intermingled with a framework's provided features.
Common Duck Typing Protocols
You've probably heard the old adage, "If it walks like a duck and talks like a duck, it's a duck." Shakespeare played on this idea a bit more romantically when he wrote in Romeo and Juliet, "That which we call a rose by any other name would smell as sweet." The recurring theme here is that the name given to an object has no bearing on its true nature. The idea is that, regardless of labels, you can be reasonably sure what something is just by looking at its behavior.
In Python, and in some other languages, this concept is extended to refer to object types. Rather than relying on some base class or interface to define what an object can do, it simply implements the attributes and methods necessary to behave as expected.
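As a contrived sketch of this idea, the following function never checks the type of its argument; anything that provides a readline() method will work. The FakeFile and report_first_line names are invented for this example.

```python
class FakeFile:
    """Not a real file, but it behaves like one where it matters."""
    def __init__(self, text):
        self.lines = text.splitlines(True)

    def readline(self):
        # Return the next stored line, or an empty string when exhausted,
        # mirroring how real file objects behave.
        return self.lines.pop(0) if self.lines else ''

def report_first_line(source):
    # Accepts a real file or anything file-like enough to offer readline().
    return source.readline().strip()

print(report_first_line(FakeFile('first\nsecond\n')))  # first
```

No isinstance() check is involved; the only requirement is the one method that actually gets called.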
A common example of this in Python is a file-like object, which is any object that implements at least some of the same methods as a Python file object. In this way, many libraries may return their own objects that can be passed to other functions that expect a file object but while retaining special abilities, such as being read-only, compressed, encrypted, pulled from an Internet-connected source or any number of other possibilities. Also, like interfaces in other languages, Python objects can be more than one type of duck at a time. It’s not uncommon, for instance, to have an object that can behave as a dictionary in some respects, while behaving like a list in others. Django’s HttpResponse object exhibits both of these behaviors, as well as mimicking an open file object. In Django, many features utilize duck typing by not providing a particular base class. Instead, each feature defines a protocol of sorts, a set of methods and attributes that an object must provide in order to function properly. Many of these protocols are presented in the official Django documentation, and this book will cover many more. You will also see some of the special abilities that can be provided by using this technique. The following sections describe a few common Python protocols that you’ll see throughout Django, and indeed throughout any large Python library. Callables Python allows code to be executed from a number of sources, and anything that can be executed in the same manner as a typical function is designated as callable. All functions, classes and methods are automatically callable, as would be expected, but instances of arbitrary object classes can be designated as callable as well, by providing a single method. __call__(self[, …]) This method will be executed when the instantiated object is called as a function. It works just like any other member function, differing only in the manner in which it’s called. >>> class Multiplier(object): ... def __init__(self, factor): ... 
self.factor = factor
... def __call__(self, value):
... return value * self.factor
...
>>> times2 = Multiplier(2)
>>> times2(5)
10
>>> times2(10)
20
>>> times3 = Multiplier(3)
>>> times3(10)
30
A VERY LOOSE PROTOCOL
File-like objects come in many varieties, because this protocol is one of the loosest defined in all of Python. There are quite a few features, from buffering output to allowing random access to data, that are inappropriate in some situations, so objects designed for those situations will typically just not implement the corresponding methods. For example, Django's HttpResponse object, described in Chapter 7, only allows writes in sequence, so it doesn't implement read(), seek() or tell(), causing errors when used with certain file-manipulation libraries.
The common approach in situations like this is to simply leave any inappropriate methods unimplemented so that trying to access them raises an AttributeError. In other cases, a programmer may decide it's more useful to implement them but simply raise a NotImplementedError to display a more descriptive message. Just make sure to always document how much of the protocol your object obeys, so users aren't surprised if these errors occur while trying to use them as standard files, especially in third-party libraries.
Iterables
An object is considered iterable if passing it to the built-in iter() returns an iterator. iter() is often called implicitly, as in a for loop. All lists, tuples and dictionaries are iterable, and any new-style class can be made iterable by defining the following method.
__iter__(self)
This method is called implicitly by iter() and is responsible for returning an iterator that Python can use to retrieve items from the object. The iterator returned is often implied by defining this method as a generator function, described in the upcoming "Generators" section.
>>> class Fibonacci(object):
... def __init__(self, count):
... self.count = count
...
def __iter__(self):
... a, b = 0, 1
... for x in range(self.count):
... if x < 2:
... yield x
... else:
... c = a + b
... yield c
... a, b = b, c
...
>>> for x in Fibonacci(5):
... print(x)
...
0
1
1
2
3
>>> for x in Fibonacci(10):
... print(x)
...
0
1
1
2
3
5
8
13
21
34
Iterators
When iter() is called with an object, it's expected to return an iterator, which can then be used to retrieve items for that object in sequence. Iterators are a simple method of one-way travel through the available items, returning just one at a time until there are no more to use. For large collections, accessing items one by one is much more efficient than first gathering them all into a list.
__next__(self)
The only method required for an iterator, this returns a single item. How that item is retrieved will depend on what the iterator is designed for, but it must return just one item. After that item has been processed by whatever code called the iterator, next() will be called again to retrieve the next item.
Once there are no more items to be returned, next() is also responsible for telling Python to stop using the iterator and to move on after the loop. This is done by raising the StopIteration exception. If no exception is ever raised, Python will continue calling next() indefinitely, resulting in an infinite loop. Either StopIteration should be raised to stop the loop gracefully, or another exception can be used to indicate a more serious problem.
class FibonacciIterator(object):
    def __init__(self, count):
        self.a = 0
        self.b = 1
        self.count = count
        self.current = 0

    def __next__(self):
        self.current += 1
        if self.current > self.count:
            raise StopIteration
        if self.current < 3:
            return self.current - 1
        c = self.a + self.b
        self.a = self.b
        self.b = c
        return c

    next = __next__

    def __iter__(self):
        # Since it's already an iterator, this can return itself.
        return self

class Fibonacci(object):
    def __init__(self, count):
        self.count = count

    def __iter__(self):
        return FibonacciIterator(self.count)
Note that iterators don't explicitly need to define __iter__() in order to be used properly, but including that method allows the iterator to be used directly in loops.
COMPATIBILITY: ITERATORS IN PYTHON 2
There's only one very minor change to iterators in Python 3. The __next__() method shown here used to be called next(). Note the missing underscores. This was changed to respect Python's convention of identifying magic methods like this with double underscores before and after the name of the method.
If you need to support Python 2 and 3 together, the solution is fairly simple. After you define __next__() as shown in our Fibonacci example, you can just assign the __next__() method to next directly in the class body: next = __next__. This can be done anywhere inside the class definition, but it's usually best right after the end of the __next__() method, to keep things tidy.
Generators
As illustrated in the Fibonacci examples, generators are a convenient shortcut to create simple iterators without having to define a separate class. Python uses the presence of the yield statement to identify a function as a generator, which makes it behave a bit differently from other functions.
When calling a generator function, Python doesn't execute any of its code immediately. Instead, it returns an iterator whose next() method will then call the body of the function, up to the point where the first yield statement occurs. The expression given to the yield statement is used as the next() method's return value, allowing whatever code called the generator to get a value to work with. The next time next() is called on the iterator, Python continues executing the generator function right where it left off, with all of its variables intact.
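This suspend-and-resume cycle can be watched directly by driving a generator with the built-in next(); the countdown() generator below is a hypothetical example, not from Django.

```python
def countdown(n):
    """Yield n, n-1, ... 1, pausing after each yield."""
    while n > 0:
        yield n
        n -= 1  # execution resumes here on the following next() call

gen = countdown(3)  # no body code has run yet
print(next(gen))    # 3
print(next(gen))    # 2
print(next(gen))    # 1
# A further next(gen) would raise StopIteration.
```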
This repeats as long as Python encounters yield statements, typically with the function using a loop to keep yielding values. Whenever the function finishes without yielding a value, the iterator automatically raises StopIteration to indicate that the loop should be ended and the rest of the code can continue. Sequences While iterables simply describe an object that retrieves one value at a time, these values are often all known in advance and collected on a single object. This is a sequence. The most common types are lists and tuples. As iterables, sequences also use the __iter__() method to return their values one by one, but since these values are also known in advance, some extra features are available. Chapter 2 ■ Django Is python 25 Also note that, while functions can accept any number of required and optional arguments, they may only define one of each of the excess argument types. Passing Argument Collections In addition to functions being able to receive arbitrary collections of arguments, Python code may call functions with any number of arguments, using the asterisk notation previously described. Arguments passed in this way are expanded by Python into a normal list of arguments, so that the function being called doesn’t need to plan for excess arguments in order to be called like this. Any Python callable may be called using this notation, and it may be combined with standard arguments using the same ordering rules. >>> def add(a, b, c): ... return a + b + c ... >>> add(1, 2, 3) 6 >>> add(a=4, b=5, c=6) 15 >>> args = (2, 3) >>> add(1, *args) 6 >>> kwargs = {'b': 8, 'c': 9} >>> add(a=7, **kwargs) 24 >>> add(a=7, *args) Traceback (most recent call last): ... TypeError: add() got multiple values for keyword argument 'a' >>> add(1, 2, a=7) Traceback (most recent call last): ... 
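The declaration rules above can be recapped in a short sketch; describe() is a hypothetical function used purely for illustration. One * parameter gathers excess positional arguments into a tuple, and one ** parameter gathers excess keyword arguments into a dictionary.

```python
def describe(a, b=2, *args, **kwargs):
    # a is required, b is optional, and the rest is collected automatically.
    return (a, b, args, kwargs)

print(describe(1))                # (1, 2, (), {})
print(describe(1, 3, 4, 5, x=6))  # (1, 3, (4, 5), {'x': 6})
```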
TypeError: add() got multiple values for keyword argument 'a' As illustrated in the final lines of this example, take special care if explicitly passing any keyword arguments while also passing a tuple as excess positional arguments. Since Python will expand the excess arguments using the ordering rules, the positional arguments would come first. In the example, the last two calls are identical, and Python can’t determine which value to use for a. Decorators Another common way to alter the way a function behaves is to “decorate” it with another function. This is also often called “wrapping” a function, as decorators are designed to execute additional code before or after the original function gets called. The key principle behind decorators is that they accept callables and return new callables. The function returned by the decorator is the one that will be executed when the decorated function is called later. Care must be taken to make sure that the original function isn’t lost in the process, as there wouldn’t be any way to get it back without reloading the module. Decorators can be applied in a number of ways, either to a function you’re defining directly or to a function that was defined elsewhere. As of Python 2.4, decorators on newly defined functions can use a special syntax. In previous versions of Python, a slightly different syntax is necessary, but the same code can be used in both cases; the only difference is the syntax used to apply the decorator to the intended function. Chapter 2 ■ Django Is python 26 >>> def decorate(func): ... print('Decorating %s...' % func.__name__) ... def wrapped(*args, **kwargs): ... print("Called wrapped function with args:", args) ... return func(*args, **kwargs) ... print('done!') ... return wrapped ... # Syntax for Python 2.4 and higher >>> @decorate ... def test(a, b): ... return a + b ... Decorating test... done! 
>>> test(13, 72) Called wrapped function with args: (13, 72) 85 # Syntax for Python 2.3 >>> def test(a, b): ... return a + b ... >>> test = decorate(test) Decorating test... done! >>> test(13, 72) Called wrapped function with args: (13, 72) 85 The older syntax in this example is another technique for decorating functions, which can be used in situations where the @ syntax isn’t available. Consider a function that’s been declared elsewhere but would benefit from being decorated. Such a function can be passed to a decorator, which then returns a new function with everything all wrapped up. Using this technique, any callable, regardless of where it comes from or what it does, can be wrapped in any decorator. Decorating with Extra Arguments Sometimes, a decorator needs additional information to determine what it should do with the function it receives. Using the older decorator syntax, or when decorating arbitrary functions, this task is fairly easy to perform. Simply declare the decorator to accept additional arguments for the required information so they can be supplied along with the function to be wrapped. >>> def test(a, b): ... return a + b ... >>> def decorate(func, prefix='Decorated'): ... def wrapped(*args, **kwargs): ... return '%s: %s' % (prefix, func(*args, **kwargs)) Chapter 2 ■ Django Is python 27 ... return wrapped ... >>> simple = decorate(test) >>> customized = decorate(test, prefix='Custom') >>> simple(30, 5) 'Decorated: 35' >>> customized(27, 15) 'Custom: 42' However, the Python 2.4 decorator syntax complicates things. When using this new syntax, the decorator always receives just one argument: the function to be wrapped. There is a way to get extra arguments into decorators, but first we’ll need to digress a bit and talk about “partials.” Partial Application of Functions Typically, functions are called with all the necessary arguments at the time the function should be executed. 
Sometimes, however, arguments may be known in advance, long before the function will be called. In these cases, a function can have one or more of its arguments applied beforehand so that the function can be called with fewer arguments. For this purpose, Python 2.5 includes the partial object as part of its functools module. It accepts a callable along with any number of additional arguments and returns a new callable, which will behave just like the original, only without having to specify those preloaded arguments at a later point. >>> import functools >>> def add(a, b): ... return a + b ... >>> add(4, 2) 6 >>> plus3 = functools.partial(add, 3) >>> plus5 = functools.partial(add, 5) >>> plus3(4) 7 >>> plus3(7) 10 >>> plus5(10) 15 For versions of Python older than 2.5, Django provides its own implementation of partial in the curry function, which lives in django.utils.functional. This function works on Python 2.3 and greater. Back to the Decorator Problem As mentioned previously, decorators using the Python 2.4 syntax present a problem if they accept additional arguments, since that syntax only provides a single argument on its own. Using the partial application technique, it’s possible to preload arguments even on a decorator. Given the decorator described earlier, the following example uses curry (described in Chapter 9) to provide arguments for decorators using the newer Python 2.4 syntax. >>> from django.utils.functional import curry >>> @curry(decorate, prefix='Curried') ... def test(a, b): Chapter 2 ■ Django Is python 30 This requires that all arguments passed to the decorator be passed as keyword arguments, which generally makes for more readable code. One downside is how much boilerplate would have to be repeated for each decorator that uses this approach. Thankfully, like most boilerplate in Python, it’s possible to factor it out into a reusable form, so new decorators can be defined more easily, using yet another decorator. 
The following function can be used to decorate other functions, providing all the functionality necessary to accept arguments, or it can be used without them. >>> def optional_arguments_decorator(real_decorator): ... def decorator(func=None, **kwargs): ... # This is the decorator that will be ... # exposed to the rest of your program ... def decorated(func): ... # This returns the final, decorated ... # function, regardless of how it was called ... def wrapper(*a, **kw): ... return real_decorator(func, a, kw, **kwargs) ... return wrapper ... if func is None: ... # The decorator was called with arguments ... def decorator(func): ... return decorated(func) ... return decorator ... # The decorator was called without arguments ... return decorated(func) ... return decorator ... >>> @optional_arguments_decorator ... def decorate(func, args, kwargs, prefix='Decorated'): ... return '%s: %s' % (prefix, func(*args, **kwargs)) ... >>> @decorate ... def test(a, b): ... return a + b ... >>> test(13, 17) 'Decorated: 30' >>> test = decorate(test, prefix='Decorated again') >>> test(13, 17) 'Decorated again: Decorated: 30' This makes the definition of individual decorators much simpler and more straightforward. The resulting decorator behaves exactly like the one in the previous example, but it can be used with or without arguments. The most notable change that this new technique requires is that the real decorator being defined will receive the following three values: • func—The function that was decorated using the newly generated decorator • args—A tuple containing positional arguments that were passed to the function • kwargs—A dictionary containing keyword arguments that were passed to the function An important thing to realize, however, is that the args and kwargs that the decorator receives are passed as positional arguments, without the usual asterisk notation. 
Then, when passing them on to the wrapped function, the asterisk notation must be used to make sure the function receives them without having to know about how the decorator works. Chapter 2 ■ Django Is python 31 Descriptors Ordinarily, referencing an attribute on an object accesses the attribute’s value directly, without any complications. Getting and setting attributes directly affects the value in the object’s instance namespace. Sometimes, additional work has to be done when accessing these values. Retrieving data from a complicated source, such as a database or configuration file• Transforming a simple value to a complicated object or data structure• Customizing a value for the object it’s attached to• Converting a value to a storage-ready format before saving to a database• In some programming languages, this type of behavior is made possible by creating extra instance methods for accessing those attributes that need it. While functional, this approach leads to a few problems. For starters, these behaviors are typically more associated with the type of data stored in the attribute than some aspect of the instance it’s attached to. By requiring that the object supply additional methods for accessing this data, every object that contains this behavior will have to provide the necessary code in its instance methods. One other significant issue is what happens when an attribute that used to be simple suddenly needs this more advanced behavior. When changing from a simple attribute to a method, all references to that attribute also need to be changed. To avoid this, programmers in these languages have adopted a standard practice of always creating methods for attribute access so that any changes to the underlying implementation won’t affect any existing code. It’s never fun to touch that much of your code for a change to how one attribute is accessed, so Python provides a different approach to the problem. 
Rather than requiring the object to be responsible for special access to its attributes, the attributes themselves can provide this behavior. Descriptors are a special type of object that, when attached to a class, can intervene when the attribute is accessed, providing any necessary additional behavior. >>> import datetime >>> class CurrentDate(object): ... def __get__(self, instance, owner): ... return datetime.date.today() ... def __set__(self, instance, value): ... raise NotImplementedError("Can't change the current date.") ... >>> class Example(object): ... date = CurrentDate() ... >>> e = Example() >>> e.date datetime.date(2008, 11, 24) >>> e.date = datetime.date.today() Traceback (most recent call last): ... NotImplementedError: Can't change the current date. Creating a descriptor is as simple as creating a standard new-style class (by inheriting from object under Python 2.x), and specifying at least one of the following methods. The descriptor class can include any other attributes or methods as necessary to perform the tasks it’s responsible for, while the following methods constitute a kind of protocol that enables this special behavior. Chapter 2 ■ Django Is python 32 __get__(self, instance, owner) When retrieving the value of an attribute (value = obj.attr), this method will be called instead, allowing the descriptor to do some extra work before returning the value. In addition to the usual self representing the descriptor object, this getter method receives two arguments. • instance—The instance object containing the attribute that was referenced. If the attribute was referenced as an attribute of a class rather than an instance, this will be None. • owner—The class where the descriptor was assigned. This will always be a class object. The instance argument can be used to determine whether the descriptor was accessed from an object or its class. If instance is None, the attribute was accessed from the class rather than an instance. 
This can be used to raise an exception if the descriptor is being accessed in a way that it shouldn’t. Also, by defining this method, you make the descriptor responsible for retrieving and returning a value to the code that requested it. Failing to do so will force Python to return its default return value of None. Note that, by default, descriptors don’t know what name they were given when declared as attributes. Django models provide a way to get around this, which is described in Chapter 3, but apart from that, descriptors only know about their data, not their names. __set__(self, instance, value) When setting a value to a descriptor (obj.attr = value), this method is called so that a more specialized process can take place. Like __get__, this method receives two arguments in addition to the standard self. • instance—The instance object containing the attribute that was referenced. This will never be None. • value—The value being assigned. Also note that the __set__ method of descriptors will only be called when the attribute is assigned on an object and will never be called when assigning the attribute on the class where the descriptor was first assigned. This behavior is by design, and prohibits the descriptor from taking complete control over its access. External code can still replace the descriptor by assigning a value to the class where it was first assigned. Also note that the return value from __set__ is irrelevant. The method itself is solely responsible for storing the supplied value appropriately. Keeping Track of Instance Data Since descriptors short-circuit attribute access, you need to take care when setting values on the attached object. You can’t simply set the value on the object using setattr; attempting to do so will call the descriptor again, resulting in infinite recursion. Python provides another way to access an object’s namespace: the __dict__ attribute. 
Available on all Python objects, __dict__ is a dictionary representing all values in the object’s namespace. Accessing this dictionary directly bypasses all of Python’s standard handling with regard to attributes, including descriptors. Using this, a descriptor can set a value on an object without triggering itself. Consider the following example. >>> class Descriptor(object): ... def __init__(self, name): ... self.name = name ... def __get__(self, instance, owner): ... return instance.__dict__[self.name] ... def __set__(self, instance, value): Chapter 2 ■ Django Is python 35 Checking for Specific Types More often, you’ll need to check for the influence of a particular type, whether a class descends from it or whether an object is an instance of it. This is a much more robust solution than using type, as it takes class inheritance into account when determining success or failure. Python provides two built-in functions for this purpose. • issubclass(cls, base)—Returns True if cls and base are the same, or if cls inherits from base somewhere in its ancestry • isinstance(obj, base)—Tests if the object is an instance of base or any of its ancestors >>> class CustomDict(dict): ... pass # Pretend there's something more useful here ... >>> issubclass(CustomDict, dict) True >>> issubclass(CustomDict, CustomDict) True >>> my_dict = CustomDict() >>> isinstance(my_dict, dict) True >>> isinstance(my_dict, CustomDict) True There’s a clear relationship between issubclass and isinstance: isinstance(obj, SomeClass) is equivalent to issubclass(obj.__class__, SomeClass). Function Signatures As described earlier in this chapter, Python functions can be declared in a number of ways, and it can be quite useful to have access to information about their declarations directly inside your code. Of particular importance when inspecting functions is inspect.getargspec(), a function that returns information about what arguments a function accepts. 
It accepts a single argument, the function object to be inspected, and returns a tuple of the following values: • args—A list of all argument names specified for the function. If the function doesn’t accept any arguments, this will be an empty list. • varargs—The name of the variable used for excess positional arguments, as described previously. If the function doesn’t accept excess positional arguments, this will be None. • varkwargs—The name of the variable used for excess keyword arguments, as described previously. If the function doesn’t accept excess keyword arguments, this will be None. • defaults—A tuple of all default values specified for the function’s arguments. If none of the arguments specify a default value, this will be None rather than an empty tuple. Together, these values represent everything necessary to know how to call the function in any way possible. This can be useful when receiving a function and calling it with just the arguments that are appropriate for it. Chapter 2 ■ Django Is python 36 >>> def test(a, b, c=True, d=False, *e, **f): ... pass ... >>> import inspect >>> inspect.getargspec(test) ArgSpec(args=['a', 'b', 'c', 'd'], varargs='e', keywords='f', defaults=(True, False)) Handling Default Values As the previous example illustrates, default values are returned in a separate list from argument names, so it may not seem obvious how to tell which arguments specify which defaults. However, there’s a relatively simple way to handle this situation, based on a minor detail from the earlier discussion of excess arguments: required arguments must always be declared before any optional arguments. This is key because it means the arguments and their defaults are specified in the order they were declared in the function. So in the previous example, the fact that there are two default values means that the last two arguments are optional, and the defaults line up with them in order. 
The following code could be used to create a dictionary mapping the optional argument names to the default values declared for them. >>> def get_defaults(func): ... args, varargs, varkwargs, defaults = inspect.getargspec(func) ... index = len(args) - len(defaults) # Index of the first optional argument ... return dict(zip(args[index:], defaults)) ... >>> get_defaults(test) {'c': True, 'd': False} Docstrings As mentioned previously, classes and functions all have a special __doc__ attribute, which contains the actual string specified as the code’s docstring. Unfortunately, this is formatted exactly as it was in the original source file, including extra line breaks and unnecessary indentation. To format docstrings in a more readable manner, Python’s inspect module provides another useful function, getdoc(). It removes unnecessary line breaks, as well as any extra indentation that was a side effect of where the docstring was written. The removal of indentation merits a bit of explanation. Essentially, getdoc() finds the leftmost non-whitespace character in the string, counts up all the whitespace between that character and the start of the line it’s in, and removes that amount of whitespace from all the other lines in the docstring. This way, the resulting string is left- justified but retains any additional indents that exist for the sake of formatting the documentation. >>> def func(arg): ... """ ... Performs a function on an argument and returns the result. ... ... arg ... The argument to be processed ... """ ... pass ... >>> print(func.__doc__) Performs a function on an argument and returns the result. Chapter 2 ■ Django Is python 37 arg The argument to be processed >>> print(inspect.getdoc(func)) Performs a function on an argument and returns the result. arg The argument to be processed In situations where docstrings should be displayed to users, such as automated documentation or help systems, getdoc() provides a useful alternative to the raw docstring. 
Applied Techniques There are innumerable combinations of Python features that can be used to accomplish a vast multitude of tasks, so the few shown here should by no means be considered an exhaustive list of what can be done by combining the many features of Python. However, these are useful tactics in terms of Django, and serve as a solid basis for the other techniques listed throughout this book. Tracking Subclasses Consider an application that must, at any given time, have access to a list of all subclasses of a particular class. Metaclasses are a terrific way to go about this, but they have one problem. Remember, each class with a metaclass option will be processed, including this new base class, which doesn’t need to be registered (only its subclasses should be registered). This requires some extra handling, but it’s fairly straightforward: >>> class SubclassTracker(type): ... def __init__(cls, name, bases, attrs): ... try: ... if TrackedClass not in bases: ... return ... except NameError: ... return ... TrackedClass._registry.append(cls) ... >>> class TrackedClass(metaclass=SubclassTracker) ... _registry = [] ... >>> class ClassOne(TrackedClass): ... pass ... >>> TrackedClass._registry [<class '__main__.ClassOne'>] >>> class ClassTwo(TrackedClass): ... pass ... >>> TrackedClass._registry [<class '__main__.ClassOne'>, <class '__main__.ClassTwo'>] Chapter 2 ■ Django Is python 40 class MinimumLength(PasswordValidator): def validate(self, password): "Raises ValueError if the password is too short." if len(password) < 6: raise ValueError('Passwords must be at least 6 characters.') class SpecialCharacters(PasswordValidator): def validate(self, password): "Raises ValueError if the password doesn't contain any special characters." if password.isalnum(): raise ValueError('Passwords must contain at least one special character.') Yes, it really is that easy! Here’s how these plugins would look in practice. >>> for password in ('pass', 'password', 'p@ssword!'): ... 
print(('Checking %r...' % password), end=' ') ... if is_valid_password(password): ... print('valid!') ... else: ... print() # Force a new line ... for error in get_password_errors(password): ... print(' %s' % error) ... Checking 'pass'... Passwords must be at least 6 characters. Passwords must contain at least one special character. Checking 'password'... Passwords must contain at least one special character. Checking 'p@ssword!'... valid! Now What? With a solid understanding of what Python has to offer, you’re ready to dive into some of the ways Django uses these tools for many of its features and how you can apply the same techniques in your own code. Forming the foundation of most Django applications, models make use of many of these advanced Python features. 41 Chapter 3 Models Data is at the center of most modern Web applications, and Django aims to provide support for a variety of data structures and persistence options. Models are the primary aspect of the traditional MVC model that Django uses as expected. Models are an essential part of any application that needs to persist data across multiple requests, sessions or even server instances. Django models are defined as standard Python classes, with a wealth of additional features added in automatically. Behind the scenes, an object-relational mapper (ORM) allows these classes and their instances access to databases. Without this ORM, developers would be required to deal with the database directly, using Structured Query Language (SQL), the standard way to access content in databases. The primary goal of SQL is to describe and access the relationships that are stored in a relational database. SQL does not generally provide high-level relationships for applications, so most applications include handwritten SQL for data activities. This is definitely possible, but it tends to lead toward lots of repetition, which in and of itself violates the DRY principle outlined in Chapter 1. 
These bits of SQL littered throughout an application’s code quickly become unmanageable, especially since the programmers who have to manage the code aren’t typically experts in relational databases. That also means that these bits of SQL are quite prone to bugs, which are often troublesome to track down and fix.

That still doesn’t factor in the biggest issue of all: security. SQL injection1 attacks are a common way for malicious attackers to access or even modify data they shouldn’t have access to. This occurs when hand-written SQL doesn’t take appropriate precautions with regard to the values that are passed into the database. The more SQL statements that are written by hand, the more likely they are to be susceptible to this type of attack.

All of these problems are extremely common in Web development, regardless of language, and ORMs are a common way for frameworks to mitigate them. There are other ways to avoid some of these problems, such as SQL injection, but Django’s ORM was written with these concerns in mind and handles much of it behind the scenes. By accessing data using standard Python objects, the amount of SQL is minimized, reducing the opportunity for problems to crop up.

How Django Processes Model Classes

As described in Chapter 2, one of Django’s most recognizable features is its declarative syntax for model definitions. With this, model definitions can be simple and concise, while still providing a vast array of functionality. The basic process of using metaclasses for declarative syntax is described in detail in Chapter 2, but there are more specific steps taken when handling models, which deserve some extra attention.

The metaclass responsible for processing model definitions is ModelBase, living at django.db.models.base. This provides a few key features, listed here in the order in which the actions are performed.

1http://prodjango.com/sql-injection/

1. A new class is generated to be used for the actual model, preserving the module location where the original model was defined.
2. If a custom app_label wasn’t provided for the model, it’s determined based on the module where it was declared.
3. Meta options are pulled out of the model and placed in a special Options object, which is described in more detail later in this chapter.
4. Two special exception classes, DoesNotExist and MultipleObjectsReturned, are created and customized for the new model.
5. A default manager is assigned to the model if one wasn’t already provided.
6. If the model was already defined—which can happen because of differences in how the module was imported at different stages—the existing model is retrieved from the application cache and returned, making sure that the same class object is always used.
7. Attributes and methods defined on the original model are added to the newly-created model class.
8. Settings from inherited parent models are set on the new model.
9. The new model is registered with the application cache for future reference.
10. The newly-created model is returned to be used in place of the class that was defined in the source file.

Abstract models and inherited models are special cases, where not all of these actions occur. Specific differences for these cases are covered later in this chapter.

Setting Attributes on Models

Python provides useful tools for getting and setting attributes on objects without knowing the name in advance, but while getattr() and setattr() represent the standard way of accessing attributes on objects, one of Django’s hooks for model fields requires some additional handling. Django provides a class method, add_to_class(), on all of its models, which should be used as a substitute for setattr(). The syntax and semantics of add_to_class() are slightly different from those of the traditional functions.
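As a rough sketch of that substitution, the core dispatch can be modeled in plain Python. This is only an approximation, not Django’s actual implementation; ModelSketch and FieldSketch are invented names standing in for a real model and field:

```python
class ModelSketch:
    """A toy stand-in for a Django model class (invented for illustration)."""

    @classmethod
    def add_to_class(cls, name, value):
        # Values that know how to install themselves get full control...
        if hasattr(value, 'contribute_to_class'):
            value.contribute_to_class(cls, name)
        else:
            # ...anything else is attached like a normal attribute.
            setattr(cls, name, value)


class FieldSketch:
    """A toy 'field' that installs itself into a registry on the class
    instead of becoming a plain attribute."""

    def contribute_to_class(self, cls, name):
        self.name = name
        fields = cls.__dict__.get('declared_fields', [])
        fields.append(self)
        cls.declared_fields = fields


ModelSketch.add_to_class('title', FieldSketch())  # routed to contribute_to_class()
ModelSketch.add_to_class('cache_timeout', 30)     # falls back to setattr()
```

After both calls, cache_timeout is an ordinary class attribute, while the toy field ends up in declared_fields rather than directly on the class, illustrating the two mutually exclusive paths.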
It’s actually a class method, rather than a built-in or even module-level function, which means the class is provided implicitly, rather than being an explicit first argument. This method checks the provided value for the presence of a contribute_to_class() method, and calls it if it exists. Otherwise, the standard setattr() function is used to add the value to the model. These behaviors are mutually exclusive; only one will happen in a given add_to_class() call.

It’s important to realize that this isn’t just for Django’s own internal code. If an application has need to add arbitrary objects as attributes to models, they must call add_to_class(). This way, developers working with the application can pass any object in, and be assured that it will be handled the same as if it had been applied directly on the model’s class definition.

This whole process changes what the classes look like when using the introspection techniques described in Chapter 2. In order to determine the declared fields, the database table being used or the display name for the model, some additional knowledge is required.

Getting Information About Models

Once a model has been processed by Python, along with Django’s ModelBase metaclass, its original structure can still be determined by using an attribute that exists on every Django model and its instances called _meta.

>>> from django.utils.text import capfirst
>>> for field in Product._meta.fields:
...     print('%s: %s' % (capfirst(field.verbose_name), field.__class__))
...
ID: <class 'django.db.models.fields.AutoField'>
SKU: <class 'django.db.models.fields.CharField'>
Name: <class 'django.db.models.fields.CharField'>
Price: <class 'django.db.models.fields.DecimalField'>
>>> Product._meta.get_field('name').__class__
<class 'django.db.models.fields.CharField'>

Primary Key Fields

Any field can be specified as a primary key, by setting primary_key=True in the field’s definition.
This means that if code is to handle a model or a model instance without prior knowledge of its definition, it’s often necessary to identify which field was defined as a primary key. Much like getting a field by name, it would be possible to just iterate over all the fields, looking for one with its primary_key attribute set to True. After all, Django only allows one field to be specified as a primary key. Unfortunately, this again introduces a fair amount of boilerplate that slows things down and makes it more difficult to maintain.

To simplify this task, Django provides another _meta attribute, pk, which contains the field object that will be used as the primary key for the model. This is also faster than iterating over all the fields, since pk is populated once, when the model is first processed. After all, Django needs to determine whether it needs to provide an implicit primary key. The _meta.pk attribute is also used to enable the pk shortcut property on model instances, which returns the primary key value for an instance, regardless of which field is the primary key.

Typically, models don’t need to declare an explicit primary key, and can instead let Django create one automatically. This can be a useful way to avoid repeating such a common declaration, while still allowing it to be overridden if necessary. One potential problem with this, however, is the task of determining whether a model was given an automatic field, and what that field looks like.

It’s possible to make certain assumptions about a model, based on how Django provides this automatic field, and what it would typically look like. However, it’s easy to create a custom field that looks a lot like the implicit field, and it’d be very difficult to tell the difference if your code only looks at its structure and options. Instead, Django provides two attributes on the _meta attribute that help with this situation.
The first, _meta.has_auto_field, is True if the model let Django provide an id field implicitly. If it’s False, the model has an explicit primary key, so Django didn’t have to intervene.

The second attribute related to the automatic primary key field is _meta.auto_field, which will be the actual field object Django provided for use as the primary key. If _meta.has_auto_field is True, this will be an AutoField, and will always be configured the same way for all models that use it. It’s important to look at this attribute instead of making assumptions about the field’s structure, in case Django makes any changes in the future. It’s an easy way to help make sure your application keeps working properly in the future. If a model provides its own primary key field, and thus _meta.has_auto_field is False, _meta.auto_field will be set to None.

Configuration Options

In addition to providing access to the fields declared on the model, _meta also acts as a container for all the various options that can be set on a model using the Meta inner class. These options allow a model to control a variety of things, such as what the model is named, what database table it should use, how records should be ordered, and a number of others. These options all have defaults, so that even those attributes that aren’t specified on the model are still available through the _meta attribute. The following is a list of the many options that are available in this way, along with their default values and a brief description of what each option is intended for.

• abstract—A Boolean that indicates whether the model was defined as abstract, a process that is described in more detail in Django’s model inheritance documentation.2 The default value is False.
• app_label—A string containing the name Django uses to recognize the application where the model was defined.
It’s easiest to understand what this means by looking at the default value, which is the name of the module containing the models.py the model is specified in. For a model located at corporate.accounts.models.Account, the app_label would be "accounts".
• db_table—The name of the database table that Django will use to store and retrieve data for the model. If not defined explicitly, it’s determined as a function of the model’s name and location. That is, the db_table for a model called Account with an app_label of accounts would be "accounts_account".
• db_tablespace—In the case of Oracle, and perhaps other database backends in the future, tables can be placed in different parts of the disk, or different disks entirely. By default, this is simply an empty string, which tells the database to store the table in its default location. This option is ignored for backends that don’t support it.
• get_latest_by—The name of a date-based field, such as a DateField or a DateTimeField, which should be used to determine the most recent instance of a model. If not provided, this will be an empty string.
• order_with_respect_to—An instance of a field relating to another model, which is used when ordering instances of this model. This defaults to None, which implies that the model’s ordering is determined solely by fields within the model itself, rather than any related models.
• ordering—A tuple containing the names of fields to be used when ordering instances of the model. By default, this is an empty tuple, which relies on the database to determine the ordering of model instances.
• permissions—A sequence of tuples of additional permissions to be added to the model. Each tuple in the sequence contains two values, the first being the name of the permission to be used in code and in the database, and the second being the text to be displayed in the admin interface when selecting permissions for a user or group.
• unique_together—A sequence of tuples indicating any groups of fields that must, when combined, be used in only one record in the database. Each tuple in the sequence contains the names of the fields that must be unique together for a particular index. Multiple tuples don’t have any relation to each other; they each represent a separate index at the database level.
• verbose_name—The display name for a single instance of the model. By default, this is determined by the name of the class itself, by splitting up each capitalized portion into a separate uncapitalized word; Article would become "article", while AddressBook would become "address book".
• verbose_name_plural—The display name for multiple instances of the model. By default, this will be simply the verbose_name with an “s” at the end. Article would be "articles" and AddressBook would be "address books".
• verbose_name_raw—The raw, untranslated version of verbose_name. Occasionally, it’s necessary to use the same display name for everyone, without Django applying a translation. This is particularly useful when storing it away in the cache or database for later access, especially if it’ll be translated at a later point in time.

2http://prodjango.com/model-inheritance/

Accessing the Model Cache

Once models have been processed by the ModelBase metaclass, they’re placed in a global registry called AppCache, located at django.db.models.loading. This is instantiated automatically, immediately when the module is imported, and is accessed using the name cache. This special cache provides access to the various models that are known to Django, as well as installs new ones if necessary. Because ModelBase handles registration of new models whenever the class is processed by Python, the models it contains aren’t guaranteed to be part of applications present in the INSTALLED_APPS setting.
This fact makes it even more important to remember that the _meta attribute on the model contains an installed attribute indicating whether the model belongs to an installed application.

Whenever code accesses one of the features in this section, AppCache will automatically load applications that are listed in INSTALLED_APPS, making sure that whenever some of the features are accessed, the cache includes all applications and models that should be made available. Without this, the results of these methods would be wildly unpredictable, based solely on which applications were loaded in which order.

As might seem obvious, the application cache can only be fully populated once all the applications have been loaded. Therefore, if an application’s models.py makes any calls to AppCache as part of this loading process, it’s possible that the cache might not be fully populated yet. To protect against this problem, AppCache provides a method to determine whether the cache itself has been populated and is ready to be accessed. Calling cache.app_cache_ready() will return True or False depending on whether all of the installed applications have been processed correctly. Using this, applications that could benefit from having their own cache of known models can check if this cache is available for that purpose. If so, it can use this cache directly, while if not, it can manually determine what it needs to know.

Retrieving All Applications

When looking to introspect a site’s contents, it’s also very useful to look at the structure of applications themselves. After all, looking at models is only useful if there are models to look at, and sometimes it’s necessary to just collect all the models currently in use. It’s also useful to have them arranged by the application that declares them. Django already needs to have this information handy, so AppCache is designed to specifically manage this information.

HOW DOES DJANGO SEE APPLICATIONS?
One important thing to keep in mind is that Django needs an object to use as a reference for the application. A Django application is essentially a standard Python package, which is just a collection of modules contained in a single folder. While Python provides an object to use as a reference for individual modules, it doesn’t offer anything to refer to a package. Because of this, the closest notion Django can have to an application object is the __init__.py module that Python uses to recognize it as a package. In that case, Django would be using a module object as an application reference.

Unfortunately, few projects store anything useful in __init__.py, so Django isn’t likely to find anything of interest in it. In order to get at anything really useful, it would have to perform some extra work to traverse the package structure to get a module that contained some pertinent information. Instead, since Django has to use a module object anyway, it makes more sense to use a module that contains useful information right off the bat. For the majority of applications, the most useful module in a package is models.py, where all the Django models are defined. Therefore, Django uses this module to recognize an application. Some of the following methods return an application, and in each case, it returns the models module within the application’s package.

Unfortunately, like many uses of simple iteration, that would require the loop to check each individual object in the module to see if it is in fact a model or if it’s something else entirely. After all, Python modules can contain anything, and many models make use of tuples and module-level constants to help do their work, so there’s no guarantee that each item in the module’s namespace is in fact a Django model. Instead, cache.get_models() retrieves a list of proper Django models that are specific to the given application module.
It’s no coincidence that both cache.get_apps() and cache.get_app() return application modules; cache.get_models() is suitable for use with both of these methods. That means that a list of models can be retrieved even without an application, but knowing the application in advance reduces the number of models retrieved. The following code demonstrates how these techniques can be used in combination to retrieve a list of models for each of the known applications in use on the site.

>>> from django.db.models.loading import cache
>>> for app in cache.get_apps():
...     app_label = app.__name__.split('.')[-2]
...     for model in cache.get_models(app):
...         print('%s.%s' % (app_label, model.__name__))
...
admin.LogEntry
auth.Message
auth.Group
auth.User
auth.Permission
contenttypes.ContentType
sessions.Session
sites.Site
news.News
customers.Customer
callcenter.Agent
callcenter.Call
callcenter.Case

As an additional option, get_models() can also be called with no argument, which will cause it to return all the models that are known to AppCache. This is a useful shortcut to avoid some of the overhead associated with the extra loop in this example, as a quick way to grab all the models. There’s a catch, however.

When using get_models() directly, with no argument, all registered models are returned. This may sound like a great idea, and sometimes it is, but remember that AppCache registers all models as they’re encountered, regardless of where they were found. The full list may include models that aren’t part of an installed application. Contrast that with the get_apps()/get_models() combination, which only retrieves models if their applications are found in the INSTALLED_APPS setting. In practice, get_models() may return different results if called without an argument than if it were called with each of the applications returned from get_apps(). Typically, this could mean that an application may get access to extra models that it might not want to know about.
Sometimes this is indeed the desired behavior, but it’s always important to understand the difference. One way a model could be in AppCache, but not be installed, is if the application is imported from a separate, installed application, which would cause its model classes to be processed by Django and registered, regardless of whether or not it was in INSTALLED_APPS. Also, if any model specifies an app_label on its Meta class and that application label doesn’t match up with any installed application, the same situation would occur. If an application does wish to access all the models, regardless of whether they’re installed or not, remember that it can use the _meta.installed attribute to identify which models were installed properly.

Sometimes, the names of both the application and the model are provided, perhaps as part of a URL or other configuration. In these cases, it doesn’t make much sense to iterate over all the models for the given application. For this case, AppCache provides another method, get_model(), which retrieves a model class based on an application label and model name. The application name is case-sensitive, but the model name isn’t.

>>> from django.db.models.loading import cache
>>> cache.get_model('auth', 'user')
<class 'django.contrib.auth.models.User'>

Using Model Fields

One of the most important aspects of models is the set of fields that are available to hold data. Without fields, a model would just be an empty container with no way to do anything useful. Fields provide a way to organize a model’s values and validate against specific data types, providing a bridge between the database and native Python data types.

Normally, when accessing a field as an attribute of a model instance, the value will be a standard Python object representing the value found in the database.
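The split between the converted value and the field object behind it can be imitated with an ordinary Python descriptor. This toy sketch is only an analogy (Django’s real fields are considerably more involved), and all names here are invented for illustration:

```python
class IntegerFieldSketch:
    """A toy 'field' descriptor that coerces values to int (illustration only)."""

    def __init__(self, name):
        self.attname = '_' + name  # where the raw value is stored

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class-level access returns the field object itself
        return getattr(instance, self.attname)

    def __set__(self, instance, value):
        # The coercion stands in for what a real field's to_python() would do
        setattr(instance, self.attname, int(value))


class ProductSketch:
    price = IntegerFieldSketch('price')


product = ProductSketch()
product.price = '42'  # assigned as a string, stored as a native int
```

Reading product.price yields the converted int, while ProductSketch.price yields the field object itself, loosely mirroring how instance attributes and field objects coexist on Django models.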
Previous sections in this chapter have described a variety of ways to get access to the actual field objects themselves, rather than this converted value. There are a variety of useful things that can be done with field objects.

Common Field Attributes

Different field types will have different attributes according to their needs, but there are several attributes that are common across most built-in Django fields. These can be used to generically access various details of fields, and by association, the values and behaviors they’re meant to interface with. Note that there are more attributes used internally than those listed here, but these are the most useful and stable, and will provide the greatest value to applications looking to work with fields.

The descriptions listed here are how Django itself uses these attributes, and how developers will expect them to behave. Other applications will likely find use for them as well, to control certain types of behaviors, so the following descriptions will help illustrate their intended usage. Some applications may find uses that are slightly different from what Django itself expects to use them for, but the general semantics of the values should remain intact. Remember that developers will build their expectations for these values based on how Django itself behaves, and third-party applications should avoid violating these expectations.

• attname—The name of the attribute on model instances where the database-related value is stored. This is typically the same as the name attribute, for simple cases where the value from the database is stored directly on the model. In other cases, it’s more appropriate to expose a more complex object, such as another model instance, to other code when the actual field name is accessed. For those cases, attname and name will be different, with the attribute referenced by name being the complex object, while the attribute referenced by attname contains the raw data required to create it.
• blank—A Boolean value indicating whether the field must have a value supplied when using a form generated automatically based on the model. This is purely validation-related behavior; the null attribute controls whether a model can actually be saved in the database without a value for the given field.
• choices—A sequence of 2-tuples indicating the valid choices for the field. The first item in each tuple is the actual value that would be stored in the database if selected, while the second item is the text that will be displayed to the user for that value.
• column—The name of the database column that will be used to hold the field’s value. This will either match db_column, if the field declared its database column explicitly, or will have been generated automatically, based on the field’s name. Normally, this can be ignored, since Django manages the database interaction directly, but some applications may have need to communicate directly with the database or interface with some other database adapter that will need this information.
• db_column—The name explicitly supplied as the database column name for the field’s values. This is different from column in that db_column refers to what the model itself declares, rather than what will actually be used. This will only have a value if the model field specified its db_column argument explicitly; it will be None otherwise.
• db_index—A Boolean indicating whether the field was declared to have an index created for it in the database. This only indicates whether the field was configured to instruct Django to create the index. Other indexes may have been added directly in the database itself, which won’t necessarily be reflected in the value of this attribute.
• db_tablespace—The tablespace directive indicating where the field’s data will be stored. Currently only supported for the Oracle backend, the format of its contents will depend on which database backend is in place. It will always have a string value, defaulting to the value of the DEFAULT_INDEX_TABLESPACE setting if not set explicitly.
• default—The default value for the field, to be used if no value has yet been supplied to the field itself. In addition to being inserted into the database in such a case, this value will be used as the field’s initial value for any forms generated based on the model. The type of value stored in this attribute will be whatever native Python data type the field is intended to interact with, such as a string or an integer.
• description—A simple text description of the field or its purpose. A docstring is generally useful as well, but this description can be used when displaying information about the field inside an application, such as admindocs.
• editable—A Boolean indicating whether the field should be presented to users for editing when generating forms based on the model. This doesn’t make the field itself read-only from within Python so this is far from a guarantee that the field won’t be edited. It’s simply a directive to control the default behavior of forms, though other applications can—and should—use it to control other behaviors as well, if they provide editing capabilities.
• empty_strings_allowed—A Boolean indicating whether the field allows an empty string as a possible value. This isn’t an option specified as the configuration of a specific field instance, but is rather defined in the field’s class itself. Many fields, such as CharField and EmailField, treat empty strings separately from None, so this attribute allows backends to decide how to handle empty strings for databases, such as Oracle, that might otherwise lose that distinction.
• help_text—The informative text provided in the field definition, to be displayed to users when the field is presented for editing. This will be passed in for forms that are generated based on the model, such as the provided admin interface.
• max_length—The maximum length the field’s value can contain. Most string-based fields, such as CharField and EmailField, use this to limit the length of string content, both in form fields and the underlying database column. Other field types, such as IntegerField and DateField, simply ignore it, as it has no meaning in those cases.

and blank_choice, a list of tuples containing the values and display text that should be used for the empty options. By default, these arguments are configured so that a single choice of ("", "---------") is included.
• get_db_prep_lookup(value, lookup_type, connection, prepared=False)—Returns a representation of the supplied value that’s suitable for comparing against existing values in the database.
• get_db_prep_save(value, connection)—Returns a representation of the supplied value that’s suitable to be stored in the database.
• get_db_prep_value(value, connection, prepared=False)—Returns a representation of the supplied value that’s ready for general use with the database. This is called internally by both get_db_prep_lookup() and get_db_prep_save().
• get_default()—Returns the default value that would be used for the field. This takes care of all the necessary logic, checking if a default value was provided, executing it if a callable was provided as the default, and differentiating between empty strings and None, for database backends needing that behavior.
• get_internal_type()—Returns a string representing a high-level idea of what type of data the field contains. This is primarily used, along with a mapping provided by each database backend, to determine the actual database column to be used.
• get_prep_lookup(lookup_type, value)—Like get_db_prep_lookup(), except that this method is used for simpler conversions that don’t require knowing which type of database is used.
• get_prep_value(value)—Like get_db_prep_value(), except that this method is used for simpler conversions that don’t require knowing which type of database is used.
• has_default()—Returns True if the field has a default value associated with it, or False if the default behavior will be left to the database backend.
• pre_save(model_instance, add)—Returns a value for the field just prior to being saved in the database. By default, this simply returns the value that is already set on the supplied model_instance, but it could return a value derived from some other field or perhaps completely unrelated to the instance, such as the current time. The add argument is a Boolean indicating whether the provided instance is being added for the first time.
• save_form_data(instance, data)—Stores the supplied data to the appropriate attribute on the supplied instance. This is a shortcut for forms to be able to adequately populate a model instance based on form data.
• set_attributes_from_name(name)—Uses the supplied name argument to set the field’s name, attname, column and verbose_name attributes as necessary. This method defers to get_attname_column() for the attname and column values, while verbose_name is only set here if it wasn’t explicitly defined when instantiating the field.
• to_python(value)—Coerces the supplied value to a native Python data type that can be used when accessing the field’s value on a model instance. See its description later in this chapter for further details.
• validate(value, instance)—Returns without error if the field’s value is appropriate for the field’s configuration and other data on a model instance, or raises django.core.exceptions.ValidationError otherwise. This is called internally by clean().
• value_from_object(obj)—Returns the field’s value as it appears on the supplied object.
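As a rough illustration of how has_default() and get_default() interact, the logic described for them can be approximated in plain Python. The NOT_PROVIDED sentinel mirrors the idea Django uses internally, but this FieldSketch class is an invented simplification, not a real Django field:

```python
import datetime


class NOT_PROVIDED:
    """Sentinel meaning 'no default was given' (a simplified stand-in)."""


class FieldSketch:
    """Toy default handling only; invented for illustration, not a real field."""
    empty_strings_allowed = True

    def __init__(self, default=NOT_PROVIDED):
        self.default = default

    def has_default(self):
        return self.default is not NOT_PROVIDED

    def get_default(self):
        if self.has_default():
            if callable(self.default):
                # Callable defaults are executed each time a default is needed
                return self.default()
            return self.default
        # With no default, string-friendly fields fall back to an empty string,
        # leaving None for fields that treat empty strings as missing data
        return '' if self.empty_strings_allowed else None


created = FieldSketch(default=datetime.date.today)  # callable default
name = FieldSketch(default='unknown')               # plain value default
plain = FieldSketch()                               # no default at all
```

Passing datetime.date.today without calling it means the date is computed fresh each time get_default() runs, which is exactly why callable defaults are useful.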
Chapter 3 ■ Models 56 Subclassing Fields One of the more useful things that can be done with Django models, particularly with regard to reusable applications, is to tie into a model’s ability to process individual types of fields in a generic fashion. This allows fields themselves to have considerable control over how they interact with the database, what native Python data type is used to access their contents and how they’re applied to the model classes that use them. The majority of this section assumes that the custom field will need to retain much of the same functionality of existing fields, such as interacting with the database and generated forms. There are many other applications, such as the historical records application described in Chapter 11, which use the hooks described in this section to provide much more functionality than just a simple field. The term “field” here is used loosely to describe any object that uses some of these techniques to present itself to a Django developer as something resembling a standard Django model field. In reality, such an object could encapsulate complex relationships, such as a tagging application, or even control the creation of entire new Django models on the fly, based on the model to which they’re assigned. The possibilities are nearly limitless. The key to remember is that Django uses duck typing principles with regard to fields. It simply accesses whatever attributes and methods it expects in each situation, without regard to what those actually do behind the scenes. In fact, there’s not even any requirement that objects be a subclass of django.db.models.fields.Field to make use of these hooks. Inheriting from Field simply provides an easy way to reuse much of the existing functionality, if that behavior is required. 
Deciding Whether to Invent or Extend One of the first things to consider when writing a new field is whether to try to invent an entire new type of field, starting perhaps from scratch without the aid of Field at all, or to extend some existing field type and inherit much of its behavior. There are advantages and disadvantages to each approach, and which is most appropriate depends very much on the demands of the new field being created. By inheriting from Field or one of its subclasses, most of the behaviors in the following sections will be inherited, potentially reducing the amount of new code the custom field must include. If its behavior is similar to an existing field type, this can be a very useful way not only to cut down on new code, which helps reduce bugs, but also to automatically receive any new or updated functionality provided by Django itself in future releases. After all, by relying on Django itself for much of this behavior, updates to that code will automatically be reflected in the behavior of the custom field. On the other hand, if the new field varies considerably from any existing field type, the standard behaviors will need to be rewritten for its own use anyway, negating any value of inheriting from a parent class. If most—or all—of these behaviors have to be written from scratch, inheriting from an existing field will simply create an extra step in the process Python uses to manage the class, even though that extra step offers little or no benefit. In these cases, it’s best, therefore, to simply start from scratch, implementing just those behaviors that make sense for the custom field, and Django will still process it properly, due to its use of duck typing. Of course, there is some middle ground between the two approaches. 
For instance, a custom field may interact with a completely unique data type, bearing little resemblance to any existing field types, but it may still store its data in the database like a standard field, and could benefit from reusing many of Django’s more basic field methods, such as assigning names and storing itself in _meta.fields. In these cases, it’s quite reasonable to inherit from Field itself, rather than a specific subclass, and inherit just this most basic functionality. Performing Actions During Model Registration The first step any field goes through is being processed by the ModelBase metaclass, whenever Python encounters a model class that utilizes the field in question. For standard Python objects, this means simply getting assigned to the model class as normal, with no additional processing. Fields take a different path, however, and each field gets the chance to customize how it’s applied to a model class. Chapter 3 ■ Models 57 contribute_to_class(self, cls, name) This is perhaps the most important method a field can contain, as it provides an essential feature: the ability for a field to know what class it was assigned to, and what name it was given. This may seem like a simple requirement, but Python itself doesn’t normally have a way to facilitate this. You may recall that descriptors, described in Chapter 2, have a way to identify what class—and even what instance of that class—was used to access the object, but this is only available at the time the attribute is accessed; there’s still no way to know this information at the time the assignment took place. More importantly, even descriptors don’t provide any way to identify what name was used to access them, which can be a considerable problem when trying to cache information or interact with other features that require the use of a name, such as that of a database column. 
Instead, by using a metaclass, Django can intercede at the point where Python is processing the class, and use the presence of a contribute_to_class() method to identify objects that need to be handled differently. If this method exists, it’s called instead of the standard setattr(), allowing the field to register itself in whatever way is most appropriate for its purpose. When doing so, Django also provides the class itself as an argument, as well as the name it was given, which was discovered while looking through the attributes assigned to the class. Therefore, in addition to the usual self, this method receives two arguments.
• cls—The actual class object of the model the field was assigned to. This can be used to customize the field based on the name or other attributes of the model itself.
• name—The name, as a string, of the attribute as it was assigned to the model’s class. Fields will typically store this away as an attribute of the field itself, for future reference.
Once these two arguments have been processed in whatever way is appropriate for the field, the method shouldn’t return anything, as its return value is ignored by Django.

CONTRIBUTE_TO_CLASS() VS SETATTR()

There is one very important thing to keep in mind when dealing with contribute_to_class(). It’s been mentioned a few times already in various places, but it’s so important that it merits driving home very explicitly. If Django identifies an object as having a contribute_to_class() method, only that method will be called. Normally, setattr() is used to set attributes on an object such as a class, but since model fields don’t get set in the standard namespace, that step is skipped intentionally. Therefore, if a custom field does in fact need to be set as an attribute on the model class itself, doing so is the sole responsibility of the field itself, during the execution of its contribute_to_class() method.
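The dispatch rule described here can be sketched in a few lines of plain Python. ModelBaseSketch and FieldSketch are simplified stand-ins for illustration only, not Django’s actual ModelBase or Field:

```python
class ModelBaseSketch(type):
    # Anything defining contribute_to_class() is pulled out of the attribute
    # dict and handed the new class and its attribute name; everything else
    # is handled by type() as usual, just as setattr() would have done.
    def __new__(mcs, name, bases, attrs):
        contributors = {key: value for key, value in attrs.items()
                        if hasattr(value, 'contribute_to_class')}
        for key in contributors:
            del attrs[key]
        cls = super().__new__(mcs, name, bases, attrs)
        for key, value in contributors.items():
            value.contribute_to_class(cls, key)
        return cls

class FieldSketch:
    def contribute_to_class(self, cls, name):
        # Record the name and owning class, the two pieces of information
        # Python alone wouldn't otherwise provide
        self.name = name
        self.model = cls
        if not hasattr(cls, 'fields'):
            cls.fields = []
        cls.fields.append(self)

class Article(metaclass=ModelBaseSketch):
    title = FieldSketch()
```

After this runs, the field knows it was assigned as title on Article, and title itself never appears as an attribute of the class, mirroring the behavior the sidebar describes.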
Sometimes, fields will instead need to set some other object, such as a descriptor, as the attribute on the class, to provide additional customizations for other types of access. This, too, is the responsibility of the field class, and the only time to do so in a way that will maintain the appearance of a standard field is during the execution of its contribute_to_class() method.

In the case of standard Django fields, and perhaps for many types of custom fields and other objects that behave as fields, this avoidance of setattr() is quite intentional. If that behavior is desired, contribute_to_class() should simply avoid setting anything on the model class, and Django’s own behavior will make sure that nothing is assigned to the class itself.

Unfortunately, to_python() is an extra method call that’s not always necessary, so it’s not always called when it seems like it would be. In particular, it’s provided mainly for validating data prior to committing to the database and when retrieving content from serialized data, so when retrieving from the database, it’s assumed that the data has already been validated, and the database backends generally suffice for returning the proper type. Because of this, Django doesn’t call to_python() when retrieving data from the database. For the built-in types, and many potential add-on fields, this is sufficient, but for other data types or complex objects, some more work will be done to convert the database value to something appropriate to work with. To support these types of fields, Django provides a special way to force to_python() to be called when populating the field’s value.

Supporting Complex Types with SubfieldBase

Sometimes databases just don’t have the necessary data types to support certain types of applications. For example, most databases don’t have a way to store a length of time and present it to Python as a datetime.timedelta4 object.
PostgreSQL has a column type called interval5 for this purpose, which does map directly to a Python timedelta as it should, but other databases don’t, which makes this impractical in terms of reusability. It would work suitably for PostgreSQL, but in order to make an application portable, it needs to be usable with more than one database. Thankfully, timedelta stores its values in days, seconds and microseconds, and can derive the entire value from just a number of seconds passed in as a float. Therefore, it’s possible for a new DurationField to use a DecimalField to store a value in the database, convert to a float in Python, then pass it into timedelta for use on the model instance.

import datetime
import re

from django.core.exceptions import ValidationError

def to_python(value):
    if isinstance(value, datetime.timedelta):
        return value
    match = re.match(r'(?:(\d+) days?, )?(\d+):(\d+):(\d+)(?:\.(\d+))?', str(value))
    if match:
        parts = list(match.groups())
        # The parts in this list are as follows:
        # [days, hours, minutes, seconds, microseconds]
        # But microseconds need to be padded with zeros to work properly.
        parts[4] = (parts[4] or '').ljust(6, '0')
        # And they all need to be converted to integers, defaulting to 0
        parts = [part and int(part) or 0 for part in parts]
        return datetime.timedelta(parts[0], parts[3], parts[4],
                                  hours=parts[1], minutes=parts[2])
    try:
        return datetime.timedelta(seconds=float(value))
    except (TypeError, ValueError):
        raise ValidationError('This value must be a real number.')
    except OverflowError:
        raise ValidationError('The maximum allowed value is %s' % datetime.timedelta.max)

4 http://prodjango.com/timedelta/
5 http://prodjango.com/postgresql-interval/

This is the type of process that simply can’t be handled without using to_python(), and it must take place every time the model is instantiated, even when coming from the database.
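Stripped of the Django-specific ValidationError handling, the same parsing logic can be exercised directly to confirm the round trip between a timedelta and its string form. This parse_duration function is a distillation for illustration; it is not part of Django:

```python
import datetime
import re

def parse_duration(value):
    # Accept an existing timedelta, a 'D days, H:MM:SS.ffffff' string as
    # produced by str(timedelta), or a plain number of seconds
    if isinstance(value, datetime.timedelta):
        return value
    match = re.match(r'(?:(\d+) days?, )?(\d+):(\d+):(\d+)(?:\.(\d+))?', str(value))
    if match:
        parts = list(match.groups())
        # Pad microseconds: the '5' in '0:00:01.5' means 500000 microseconds
        parts[4] = (parts[4] or '').ljust(6, '0')
        parts = [int(part) if part else 0 for part in parts]
        return datetime.timedelta(days=parts[0], hours=parts[1], minutes=parts[2],
                                  seconds=parts[3], microseconds=parts[4])
    return datetime.timedelta(seconds=float(value))
```

Feeding str(timedelta) back through the parser reproduces the original value, which is exactly the property the real to_python() relies on.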
However, making an extra method call on every access from the database can get quite expensive, so it’s essential to be able to handle this without penalizing those fields that don’t use it. As will be shown at the end of this chapter, a descriptor can be used to customize what happens when a field’s value is accessed, which can be an excellent way to control this type of behavior. Of course, descriptors can be tricky if they’re just a means to an end, and the to_python() behavior described here is a fairly common need for these complex data types, so Django provides a shortcut to ease the creation of this descriptor.

Located at django.db.models.fields.subclassing, the SubfieldBase metaclass is Django’s way of easing the creation of model fields whose to_python() method will always be called. By simply applying this to a field class, it takes care of the rest, setting up a descriptor that calls to_python() the first time the field is loaded. Therefore, the DurationField example would use this in the field definition as follows:

from django.db import models
from django.db.models.fields.subclassing import SubfieldBase

class DurationField(models.DecimalField, metaclass=SubfieldBase):
    pass

    # Field logic then continues here

Controlling Database Behavior

Another important aspect of fields is how they interact with the database. This can include how the data itself is stored, how it’s prepared before being sent to the database and how it’s prepared for comparison with values already in the database. This process is already handled by Django itself, with every existing field type providing a few methods to define this behavior. For custom fields, it’s often necessary to override this behavior, interacting with the database in ways other than how Django itself would expect to do so. The following methods define nearly every aspect of how a field works with the database, so fields have a great deal of control over how the database interaction is handled.
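Before looking at the individual database methods, the descriptor that SubfieldBase sets up is worth a closer look; its behavior can be approximated in isolation. Every assignment to the attribute is routed through the field’s to_python(), so reading it back always yields the coerced native type. CoercionDescriptor and SecondsFieldSketch are hypothetical names used here for illustration:

```python
import datetime

class CoercionDescriptor:
    # Simplified equivalent of the descriptor SubfieldBase installs
    def __init__(self, field):
        self.field = field

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.field.name]

    def __set__(self, obj, value):
        # All assignments pass through to_python(), even the initial one
        obj.__dict__[self.field.name] = self.field.to_python(value)

class SecondsFieldSketch:
    def __init__(self, name):
        self.name = name

    def to_python(self, value):
        if isinstance(value, datetime.timedelta):
            return value
        return datetime.timedelta(seconds=float(value))

class Run:
    duration = CoercionDescriptor(SecondsFieldSketch('duration'))

run = Run()
run.duration = '90'  # coerced to a timedelta on assignment
```

Because the descriptor defines both __get__() and __set__(), it takes priority over the instance dictionary, which is what lets it intercept every assignment.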
db_type(self, connection)

Rarely overridden by individual fields, this method returns a database-specific string that controls how the column is created for use with the given field. Django internally uses the result of the get_internal_type() method in conjunction with a mapping provided by each individual backend to provide a return value from this method. That functionality is enough for the vast majority of field applications.

The most important thing to remember when considering the use of this method is that its return value is specific to a particular database backend. In order to use this field in projects with different backends, the connection argument is provided to help you decide what to use. In a simple case, you can use connection.settings_dict['ENGINE'] to determine what type of database the field is being used on, and behave accordingly. For example, if DurationField could in fact use interval in PostgreSQL, while still supporting other databases:

class DurationField(models.Field):
    def db_type(self, connection):
        engine = connection.settings_dict['ENGINE']
        if engine == 'django.db.backends.postgresql_psycopg2':
            return 'interval'
        else:
            return connection.creation.data_types['DecimalField']

One other feature of this method is that if you return None instead of a string, Django will skip the creation of this particular field. This can be necessary if the field must be created in a more complicated fashion than a single string can represent. Django will still attempt to reference the column when executing queries, though, so you’ll need to make sure you do in fact create the column before attempting to use this field.

Most of the time, you’ll want to leave this method to Django, but it does provide a way to override the default behavior when you really need to. Just be careful doing this in a distributed application, because you’ll end up having to support multiple types of databases, not just the one you’re most familiar with.
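The engine check itself is easy to isolate and test as a plain function. Note that the 'numeric(20, 6)' fallback below is an illustrative stand-in, not the mapping any real Django backend provides:

```python
def duration_db_type(engine):
    # Engine-keyed column selection in the style of db_type() above
    if engine == 'django.db.backends.postgresql_psycopg2':
        return 'interval'
    # Illustrative fallback only; real code would look up the backend's
    # own data_types mapping for 'DecimalField' instead of hard-coding one
    return 'numeric(20, 6)'
```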
get_prep_value(self, value)

There are a few methods that deal with preparing a value for different kinds of use within the database, but they typically share the same code for preparing a value for use in the database at all. The get_prep_value() method is used by both of the following methods to perform this basic conversion. In most cases, converting a Python object to some more basic type will suffice to allow a custom field to pass values to the database. By overriding get_prep_value(), the other database preparation methods can typically use their default implementations without issue.

For example, DurationField requires this type of conversion, since timedelta objects can’t be passed directly to most databases, which led to using a DecimalField to control the column’s behavior. A custom get_prep_value() method can convert timedelta objects to Decimal values, which can then be passed to the database normally.

from django.db import models
from django.db.models.fields.subclassing import SubfieldBase
from django.utils import _decimal

class DurationField(models.DecimalField, metaclass=SubfieldBase):
    def get_prep_value(self, value):
        # Zero-pad the microseconds to six digits so that, for example,
        # 250 microseconds becomes '.000250' rather than '.250'
        return _decimal.Decimal('%d.%06d' % (value.days * 86400 + value.seconds,
                                             value.microseconds))

    # Field logic then continues here

get_db_prep_value(self, value, connection, prepared=False)

In cases when you need to prepare the value differently for different database connections, this method will allow you the flexibility to do so. The connection argument again represents the database connection being used, and can be used to make the necessary decisions about how to proceed. The prepared argument indicates whether the value has already been passed through get_prep_value(). If False, you should call that method before proceeding further.
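Isolated from Django, the timedelta-to-Decimal conversion can be verified directly. One detail worth testing is the zero-padding of the microseconds: without padding to six digits, 250 microseconds would render as '.250', a quarter of a second. This prep_duration function is a standalone sketch, not Django code:

```python
import datetime
from decimal import Decimal

def prep_duration(value):
    # timedelta -> Decimal, assuming a non-negative duration: whole seconds
    # before the point, microseconds zero-padded to six digits after it
    return Decimal('%d.%06d' % (value.days * 86400 + value.seconds,
                                value.microseconds))
```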
Here’s what DurationField could look like if it continued to split up its behavior between PostgreSQL and other databases:

from django.db import models
from django.db.models.fields.subclassing import SubfieldBase
from django.utils import _decimal

Dealing with Files

Many applications have need to manage content that goes beyond what’s traditionally stored in a database. Beyond the usual numbers and strings, there’s a world of other data formats, from audio and video to print-ready Portable Document Format (PDF) files and plenty more. Content like this isn’t well suited for being stored directly in the database—though in some cases it’s at least possible—but it’s still useful to tie it to other content that is in the database.

To handle this, Django provides a special FileField, with extra methods designed to facilitate access to files. It also uses many of the hooks described in this chapter to store a reference to the file in the database, as well as provide a special object that can access files in a portable manner. Django also provides an ImageField, which inherits much of its functionality from FileField, while adding some of its own, specifically tailored for dealing with the special needs of images.

Subclasses of FileField shouldn’t generally need to override many of its methods, since they’re mostly related to those features of a file that are common to all file types. This includes things like the filename and relative path, which don’t have anything to do with the specifics of a particular type of file. Some, however, such as save_file(), can be overridden to provide special handling of attributes related to a specific type of file.

get_directory_name(self)

This method simply returns a relative path that will be stored in the database along with the filename. By default, this looks at the upload_to attribute of the field to determine what the directory should be, and even subclasses should respect this behavior.
Exactly how that attribute is used, however, is where subclasses can customize this method to great effect. Normally, Django creates a directory name using two pieces of information: the upload_to string itself and the current date. The date the file was uploaded is applied to the directory name, replacing certain characters with portions of the date. This allows individual fields to more accurately control where their files are stored, which helps keep directories smaller, and can possibly even make better use of disk capacity. In a subclass, however, it may be more useful to generate the directory name based on some other type of information, such as the current site’s domain name in multisite setups, or the Internet Protocol (IP) address of the machine where the upload was received, in larger production environments where there are multiple Web servers sharing common storage. Essentially, anything’s fair game here, as long as it only requires information that can be determined by only having access to the FileField instance. The current site or IP address can be obtained without regard to the current model at all, as can the current time. Other information, however, such as the user who submitted the file, the IP address of his or her remote computer, or the object the file will be attached to, is not accessible from this function, and thus can’t be used. Of course, there is another option to specify some of this additional information, but doing so bypasses this method entirely. By specifying a callable for upload_to, as described in Django’s file documentation,7 the directory can be generated based on the object it will be attached to, which may include the User who owns the object. Note that when using a callable as upload_to, that callable is expected to return the entire path, including the directory and filename, so get_directory_name() won’t be called at all in such cases, unless that callable explicitly calls it. 
Also, the incoming request still isn’t available, even to that callable, so making directory naming decisions based on that information will require a custom view.

get_filename(self, filename)

This works in much the same way as get_directory_name(), except that it’s responsible for specifying the filename portion of the path instead of the directory. It receives the original filename that was specified with the incoming file, and returns a new filename that will be used in the database, as well as the underlying storage system.

7 http://prodjango.com/file-api/
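Taken together, the default directory and filename handling can be approximated with strftime(). This upload_path function is a simplification for illustration; the real methods also normalize the path and sanitize the filename:

```python
import datetime
import os

def upload_path(upload_to, filename, now=None):
    # strftime() codes in upload_to (e.g. '%Y', '%m') are expanded with the
    # upload date, then the directory and filename are joined into the
    # relative path that would be stored in the database
    now = now or datetime.datetime.now()
    return os.path.join(now.strftime(upload_to), filename)

# With a fixed date, 'articles/%Y/%m' expands predictably
path = upload_path('articles/%Y/%m', 'photo.jpg',
                   now=datetime.datetime(2013, 5, 1))
```

On a POSIX system the result is 'articles/2013/05/photo.jpg', which shows why date-based upload_to values keep individual directories from growing without bound.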
The default behavior is to use os.path.join() to combine the output of both the get_directory_name() and get_filename() methods, ignoring the model instance provided as an argument. If a FileField subclass needs the ability to specify the file’s entire path all at once, this method would be the place to do it. Of course, remember that if a callable was supplied as the upload_to argument, this method will get replaced. This is true regardless of what behavior is supplied by a FileField subclass; the needs of a specific instance always win over the behavior of its class. So, while overriding this behavior can provide a more useful default, it doesn’t remove an individual developer’s ability to replace it entirely. save_form_data(self, instance, data) This is a utility method for forms to use as a shortcut for saving a file associated with a model instance. It accepts an instance of the model the field was attached to, as well as the uploaded file data provided by the form. By default, it just extracts the necessary information from the uploaded file object, and passes it through to the standard file saving methods. The instance argument is an instance of the model where the FileField was defined, and the data argument is an UploadedFile object, as described in Chapter 8. The uploaded file contains a name attributes, which contains the filename and a read() method, which is used to access the file’s contents, so that it can be saved properly. As this is the primary way files are handled by most areas of Django itself, overriding this field provides an excellent opportunity to tie into extended functionality based on specific field types. For example, Django’s own ImageField uses this as an opportunity to store the width and height of an image in separate fields, so they can be indexed and searched in the database directly. Other file types could take this same approach, storing certain attributes of the file in other fields for easier access later on. 
Since this method gets access to the entire file’s contents, it’s possible to pass those contents into most libraries that deal with files. Anything that can read an open file object can process uploaded content by simply wrapping it in a StringIO8 object. That way, the contents can be accessed without having to write them to the storage system first, only to have to read them back again.

delete_file(self, instance, sender)

While this may look like simply a way to delete a file, it actually serves a very particular purpose, which is alluded to by the presence of a sender argument. The contribute_to_class() method of FileField sets up this method as a listener for the post_delete signal. It’s not intended to be called individually, but instead it gets called every time a model instance with a FileField is deleted. As described for post_delete, the instance argument is the object that was just deleted, and the sender argument is the model class for that instance.

8 http://prodjango.com/stringio/
from django.db import models class PermanentFileField(models.FileField): def delete_file(self, instance, sender, **kwargs): pass Of course, there are other possible use cases for this as well, but the specifics of what those would look like will depend very much on the needs of an individual application. attr_class As a simple attribute, rather than a method, attr_class might not seem like it would provide much power or flexibility. Thankfully, looks are often deceiving, as it’s actually the gateway to some very useful features. The attr_class attribute is set to a class that will be used to represent the field’s value when referenced in Python. That means that the value of this simple attribute is actually the primary way of specifying what features are available on the public API for data entered into a particular FileField instance. The following section describes the behavior of the class specified by default for this attribute, and how its methods can be overridden to provide additional functionality. Customizing the File Class When a model defines a FileField, the value made available as the attribute on actual model instances is a special object designed specifically for managing files. Located at django.db.models.fields.files, the File class provides a number of platform-independent and storage-independent methods for accessing a file’s content and properties of that content, as well as for saving new files and deleting existing ones. Because it’s the public-facing API for accessing files, it’s often quite useful to provide additional functionality for file types that have common qualities that will need to be referenced often. This provides a nice, clean, object- oriented way to encapsulate that common code in one place, rather than requiring the rest of the application to write it over and over again. 
For example, Django’s own ImageField provides its own subclass, ImageFile, which contains additional methods for accessing the width and height of an image, as well as caching it to speed up subsequent accesses. It’s an excellent example of how easy it is to provide this extra functionality.

In addition to providing new methods, though, there are a number of existing methods that could benefit from being overridden. These are a bit less likely to be of use directly, but as ImageFile shows, they can be used to perform some important tasks, such as updating or invalidating cached values. For the most part, the methods described next map directly to file storage methods described in Chapter 8. The main difference is that these are specific to a particular file type, and can be customized for aspects that are unique to that file type, while storage systems are just designed to work with files, without regard to what type of content gets handled.

Unlike most of the other signals listed in this section, class_prepared only sends the standard sender argument. Since there isn’t any instance available at the point in time when the signal is fired and the _meta attribute on the new model class contains all the information about how it was declared, the model itself is enough to obtain all the information that’s available at that point in time.

>>> from django.db import models
>>> def listener(sender, **kwargs):
...     print('%s.%s' % (sender._meta.app_label, sender._meta.object_name))
...
>>> models.signals.class_prepared.connect(listener)
>>> class Article(models.Model):
...     title = models.CharField(max_length=255)
...     class Meta:
...         app_label = 'news'
...
news.Article
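The connect() and send() mechanics these examples rely on can be modeled with a toy dispatcher. SignalSketch is illustrative only, far simpler than django.dispatch.Signal:

```python
class SignalSketch:
    # Listeners register with an optional sender filter; send() notifies
    # every listener whose filter is absent or matches the sender
    def __init__(self):
        self.listeners = []

    def connect(self, listener, sender=None):
        self.listeners.append((listener, sender))

    def send(self, sender, **kwargs):
        for listener, wanted_sender in self.listeners:
            if wanted_sender is None or wanted_sender is sender:
                listener(sender=sender, **kwargs)

class_prepared_sketch = SignalSketch()
prepared = []
class_prepared_sketch.connect(lambda sender, **kwargs: prepared.append(sender))

class FakeModel:
    pass

# Django fires this from its metaclass; here it's triggered by hand
class_prepared_sketch.send(sender=FakeModel)
```

The optional sender filter is what allows a listener to be registered for one specific model, or for every model at once.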
After all, if the listener must be registered prior to the signal being fired, and the signal is fired before Python even continues with the rest of the module, how can it possibly be registered with a class to listen for? Even if it could, what possible purpose could it serve? The answer to both of these questions is contribute_to_class(). Remember that attributes on a model are given the opportunity to customize how they’re applied to the model. When an object with a contribute_to_class() method is encountered, that’s called instead of the usual setattr(), where it’s passed the model class and the attribute name, allowing the object to perform whatever functionality it wants to. The key here is that contribute_to_class() receives the model class as an argument. It makes for an excellent opportunity to register a listener for class_prepared specifically for the class being processed. In fact, depending on the need at hand, this is not only possible, but could be downright essential. Consider a situation where a field-like object needs to know everything about the model it’s attached to in order to properly configure itself. Since there’s no guarantee that all the other fields have been processed by the time contribute_to_class() is called on the object in question, it’s necessary to defer the rest of the configuration until the class has finished processing. pre_init and post_init When a model is instantiated, pre_init fires before any other work is performed. It gets dispatched even before any of the arguments passed into the model are assigned to their appropriate attributes. This is a good opportunity to inspect the arguments that will be assigned to the instance prior to that actually happening, especially since this allows a listener to fire before encountering any errors that might come as a result of the arguments specified. 
Because this takes place prior to any of the field values being populated on the object itself, it doesn’t send the new object along when the signal is fired. Instead, it passes along two additional arguments besides sender that correspond to the positional and keyword arguments that were passed in to the model.

• args—A tuple containing the positional arguments that were passed to the model constructor
• kwargs—A dictionary containing the keyword arguments that were passed to the model constructor

Note that even though these are the same names as those usually given to the excess argument technique described in Chapter 2, these are passed to the listener as explicit keyword arguments, rather than using * and **. Listeners must define these arguments explicitly in order for them to work properly.

>>> from django.db.models.signals import pre_init
>>> from news.models import Article
>>> def print_args(sender, args, kwargs, **signal_kwargs):
...     print('%s(*%s, **%s)' % (sender._meta.object_name, args, kwargs))
...
>>> pre_init.connect(print_args, sender=Article)
>>> article = Article(title=u'Testing')
Article(*(), **{'title': u'Testing'})

Similarly, post_init gets fired as part of the model instantiation process, but at the end instead of the beginning, once all the arguments have been mapped to the appropriate attributes based on the fields that were defined on the model. Therefore, as the name implies, the object is completely initialized at this point. It would make sense, then, that when post_init fires, it gets passed the fully configured model instance as well as the standard sender, which is the model class. The new object is passed in as the instance argument to the listener, which can then do with it whatever is necessary, according to the application.

>>> from django.db.models.signals import post_init
>>> from news.models import Article
>>> def print_instance(sender, instance, **kwargs):
...     print('Instantiated %r' % instance)
...
>>> post_init.connect(print_instance, sender=Article)
>>> article = Article(title=u'Testing')
Instantiated <Article: Testing>

pre_save and post_save

When a model instance is being committed to the database, Django provides two ways to hook into that process, both at the beginning and at the end. The primary difference between the two is that pre_save is called before the object is committed to the database, while post_save is called afterward. This simple distinction can be very important, depending on the needs of the application.

When triggered by pre_save, a listener receives the model class as sender, and also the instance of the model as instance. This allows the listener to get access to—and even modify—the instance that’s about to be saved, before it hits the database. This can be a useful way to provide or override default arguments for models provided by third-party applications.

On the other hand, post_save is called after the save has been performed, and the instance has been committed to the database. This is a useful step in two ways, because it not only ensures that the data is in fact present in the database, which is necessary when dealing with related models, but it also occurs after Django has made the decision about whether to insert a new record into the database or update an existing record.

In addition to the sender and instance arguments that work the same way as in pre_save, listeners for post_save can receive another argument. The created argument is a Boolean indicating whether or not the instance had to be created from scratch. A value of True means it was newly inserted into the database, while False means an existing record was updated. When using the post_save signal to track database changes, this is an important distinction, and can be used to determine the behavior of other applications. To see this in action, see the history example in Chapter 11 of this book.
Because a model manager’s create() method does in fact commit a new instance to the database, it fires both of these signals. It’s also safe to assume that any time create() is used, the created argument will be True, but just remember that there may well be other times when that argument is also True.

>>> from django.db.models import signals
>>> from news.models import Article
>>> def before(instance, **kwargs):
...     print('About to save %s' % instance)
...
>>> signals.pre_save.connect(before, sender=Article)
>>> def after(instance, created, **kwargs):
...     print('%s was just %s' % (instance, created and 'created' or 'updated'))
...
>>> signals.post_save.connect(after, sender=Article)
>>> Article.objects.create(title='New article!')
About to save New article!
New article! was just created
<Article: New article!>

A NOTE ABOUT COMBINING PRE_SAVE() AND POST_SAVE()

There’s another very important difference between pre_save and post_save, because they’re not always called as a pair. Because pre_save is triggered at the beginning of the process, you can reliably assume that it will always be called every time a save() is initiated. However, post_save only happens at the end, so if anything goes wrong during the save itself, post_save won’t get triggered.

This is an important distinction, because it may seem convenient to register a pair of listeners for the model saving signals, expecting that both will always be called every time. While that may be true for the majority of cases, and certainly when nothing goes wrong, things do go wrong sometimes. Examples include an entry with a duplicate primary key or other unique column, data of the wrong type or a timeout connecting to the database. In situations where this type of behavior is required, the only reasonably sane way to go about it is to override the save() method on the model.
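The pairing problem that overriding save() solves can be sketched in plain Python, without Django. The base class, the in-memory "database" and the error type below are simplified stand-ins, not Django's real API:

```python
# A framework-free sketch of pairing before/after actions by overriding
# save(): unlike a pre_save/post_save listener pair, the override can see
# a failure in the middle and cancel its pending "after" work.

class DatabaseError(Exception):
    pass

class BaseModel:
    _db = {}  # stands in for the real database table

    def save(self):
        if getattr(self, 'pk', None) is None:
            raise DatabaseError('no primary key')
        BaseModel._db[self.pk] = self

class Article(BaseModel):
    def __init__(self, pk=None, title=''):
        self.pk, self.title = pk, title
        self.log = []

    def save(self):
        self.log.append('before save')       # the "pre_save" work
        try:
            super().save()
        except DatabaseError:
            self.log.append('save failed')   # pending actions get canceled
            raise
        self.log.append('after save')        # "post_save" work only on success

article = Article(pk=1, title='Testing')
article.save()

broken = Article(title='No key')
try:
    broken.save()
except DatabaseError:
    pass
```

The successful save logs both halves of the pair, while the failed save records the failure instead of the "after" step, something a post_save listener would never get the chance to do.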
This allows custom code to be run before and after the actual database interaction, but it also provides a way to identify problems that occurred in the process. In addition, it allows the code a better opportunity to pair the two pieces of functionality more fully, since if something does go wrong, it’s easier to identify, and thus any pending actions can be canceled as a result.

pre_delete and post_delete

Similar to the previous section in spirit, pre_delete and post_delete are the pair of signals relating to the deletion of model instances. They function almost identically to their saving counterparts, except that they both provide just the sender and instance arguments.

When using post_delete, keep in mind that the instance passed in to the listener will have already been removed from the database, so many of its methods will raise exceptions if used. This is especially true if it had previously related to instances of other models. Those relationships will have been lost by the time post_delete is triggered, so any handling of those situations should be done in pre_delete or by overriding the delete() method on the model. If you do override the model’s delete() method, you’ll need to make sure to access the model and its relationships prior to calling the delete() method on the parent class. Once you delete it through the parent class, you’ll be in the same situation as when using the post_delete signal.

Also, because the instance will have been deleted, its primary key value will no longer match up with anything in the database. However, in order to more accurately keep track of which object was deleted, the primary key value is left intact on the instance, and can be read using the pk shortcut described earlier in this chapter.

Storing Raw Data

The first step is to tell Django how to manage the raw data in the database, using a standard field.
Since pickled objects are just strings, some form of text field would clearly be prudent, and since there’s no way to know in advance how large the pickled representation will be, the nearly limitless TextField seems like an obvious choice. Of course, given that there will be some extra work going on for this new field, TextField alone won’t suffice. Instead, we’ll create a subclass that inherits the database functionality of TextField, while allowing extra customizations where necessary.

Since fields are just Python classes like any other, this works just like you’d expect, but with one addition. In order to interact with the database using a different value than is used to interact with other Python code, the attname attribute needs to be different than the name attribute. This is controlled by a custom get_attname() method.

from django.db import models

class PickleField(models.TextField):
    def get_attname(self):
        return '%s_pickled' % self.name

This much alone will suffice for getting the field set up properly for the database. At this point, it’s even possible to assign a PickleField instance to a model and sync it with the database, and the column created will be perfectly usable for the duration of this example. Of course, it only manages the raw data so far; it won’t be able to handle real Python objects at all, much less deal with pickling and unpickling as necessary.

Pickling and Unpickling Data

To make the translation between a full Python object and a string representation that can be stored in the database, Python’s pickling modules[10] will be the tool of choice. There are actually two separate modules provided by Python for this purpose: cPickle, written in C for improved performance, and pickle, written in pure Python for flexibility and portability. There are some minor differences between the two,[11] but they can be used interchangeably. Having two modules available makes importing a bit trickier than usual.
For obvious reasons, it’s very valuable to have the greater performance when it’s available, but a key aspect of Python and Django is the ability to be used across multiple platforms and environments. Therefore, when looking to import a pickling module, it’s best to try the more efficient module first, falling back to the more portable module when necessary.

try:
    import cPickle as pickle
except ImportError:
    import pickle

With a pickle module available, we can give PickleField the ability to actually pickle and unpickle data. By providing a couple of basic methods, it’s possible to interface with the underlying module in a more object-oriented manner. In addition, it’s safe to assume that when preparing to commit to the database, the field’s value will be the full Python object, which obviously must be pickled.

[10] http://prodjango.com/pickle/
[11] http://prodjango.com/cpickle/

On the other hand, when using a QuerySet’s filter() method to make comparisons against values in the database, pickled data will be quite useless. It would technically be possible to pickle the query’s value to compare against that found in the database, but it would be comparing the pickled values, not the original Python objects, which could lead to incorrect results. More importantly, even though a pickled value is guaranteed to be unpickled properly when necessary, it’s quite possible that the same value, pickled on different occasions or possibly on different machines, will have different strings representing the original object. This is a documented side effect of the way pickling works, and must be taken into account.

With all of this in mind, it’s unreasonable to allow any kind of comparison against pickled data, so an exception should be thrown if such a comparison is attempted. As described previously in this chapter, that behavior is controlled by get_db_prep_lookup(), which can be overridden to throw such an exception.
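Both properties described above, reliable round-trips but unreliable raw strings, can be observed with the pickle module directly. This standalone snippet assumes nothing about Django:

```python
# Standalone demonstration: equal objects may pickle to different byte
# strings, but every pickled string reliably round-trips to an equal object.
try:
    import cPickle as pickle  # Python 2 only; the faster C implementation
except ImportError:
    import pickle  # Python 3 uses its C accelerator automatically

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}  # the same value, built in a different order

p1 = pickle.dumps(d1)
p2 = pickle.dumps(d2)

# p1 and p2 are not guaranteed to be equal byte strings, which is why
# database-level comparisons against pickled data can silently fail ...
raw_strings_match = (p1 == p2)

# ... but unpickling either one always recovers an equal object.
round_trip_ok = (pickle.loads(p1) == pickle.loads(p2) == d1 == d2)
```

This is exactly why the field raises an exception for lookups rather than comparing pickled strings in the database.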
The full field thus far follows:

class PickleField(models.TextField):
    def pickle(self, obj):
        return pickle.dumps(obj)

    def unpickle(self, data):
        return pickle.loads(str(data))

    def get_attname(self):
        return '%s_pickled' % self.name

    def get_db_prep_lookup(self, lookup_type, value):
        raise ValueError("Can't make comparisons against pickled data.")

Note that pickle and cPickle only support pickled data strings as plain byte strings, not as full Unicode strings. Since everything in Django gets coerced to Unicode wherever possible, including retrieving from the database, unpickle() needs to take the extra step of forcing it back to a byte string in order to be unpickled properly.

WHY THE EXTRA METHODS?

It may seem odd to define separate pickle() and unpickle() methods, when the pickling module is already available in the module’s namespace. After all, it’s not only extra lines of code for you, the developer, to write, but it’s also an extra function call that Python has to go through to get the job done, which slows things down slightly, and seemingly unnecessarily.

The biggest advantage of doing it this way is that if any other application has need to subclass PickleField and wishes to override exactly how the data gets pickled and unpickled, having explicit methods for it makes that process considerably easier. They can just be overridden like normal, and as long as the rest of PickleField just references the methods, the subclass will work quite well.

This gets us one step closer, now that PickleField can store values in the database properly. However, it still doesn’t solve the main issue of loading data into a Python object, and doing so only when it’s really necessary.

Unpickling on Demand

If we weren’t concerned with performance, it’d be easy to perform the unpickling step in the to_python() method and just use SubfieldBase to make sure it happens every time an object is instantiated, regardless of where it came from.
Unfortunately, that would incur a good deal of unnecessary overhead for those cases where this field wouldn’t be accessed, so it’s still well worth loading it up on demand, only when it’s requested. As mentioned earlier, Python descriptors are particularly well suited for this scenario. They get called when an attribute is accessed, and can execute custom code at that time, replacing standard Python behavior with something designed for the task at hand.

The first step is determining how to instantiate the descriptor, which also means identifying what data it will need in order to get the job done. In order to retrieve the raw data from the model instance properly, it’ll need access to the field object, from which it can gather the name of the field itself.

class PickleDescriptor(property):
    def __init__(self, field):
        self.field = field

That will store references to all the features of the field that will be useful later on. With those in place, it’s possible to write the __get__() and __set__() methods that will actually do the hard work in the long run. Actually, __set__() is the easier of the two to implement; it just has to assign the Python object to the instance’s namespace directly and store a pickled copy under the field’s attname.

def __set__(self, instance, value):
    instance.__dict__[self.field.name] = value
    setattr(instance, self.field.attname, self.field.pickle(value))

With that in place, the trickiest bit of this whole process is the descriptor’s __get__() method, which must be able to perform the following tasks in order to work properly.
• Identify whether or not the full Python object needs to be created.
• Generate a full Python object, by way of unpickling the raw data, only when necessary.
• Cache the generated Python object for future use.
• Return the cached copy of the object if it’s available, or the new one otherwise.

That last one’s actually a bit of a red herring, since it’s easy to make sure that a Python object is available at the end of the method, and just return that, without regard to where it came from. The rest, though, may look like quite a laundry list, but it’s really not that difficult to perform all those tasks in a small, readable method.

def __get__(self, instance, owner):
    if instance is None:
        return self

    if self.field.name not in instance.__dict__:
        # The object hasn't been created yet, so unpickle the data
        raw_data = getattr(instance, self.field.attname)
        instance.__dict__[self.field.name] = self.field.unpickle(raw_data)

    return instance.__dict__[self.field.name]

Unfortunately, that’s actually a little too simplistic. Trying this out in Python will result in a KeyError, because Django expects the attribute dictionary to include a __module__ key, with its value being the import path of the module where the model was defined. This is normally populated by Python automatically for all classes defined in source files, but since we’re generating a model at runtime, it’s not available.

This is just one of the minor details that dynamic models have to face, and there’s really no way of avoiding it entirely. Instead, create_model() needs to be updated to provide a __module__ attribute directly. This is also another example of why it’s a good idea to put this code in one place; imagine having to deal with this every time a dynamic model is required.
Here’s what it looks like to include a module path for the class:

def create_model(name, module_path):
    return type(name, (models.Model,), {'__module__': module_path})

Now it can accept a module path and keep Django happy. Well, it can keep Django happy as long as the module path has already been imported, which means it has to actually exist. Under normal circumstances, the model’s __module__ attribute is set to the path of the module where it was defined. Since the model will only be processed while executing that module, it’s always guaranteed that the module will exist and have been imported successfully. After all, if it hadn’t, the model wouldn’t have been encountered in the first place.

For now, since the only requirement of the module path is that it be valid and already imported, Django’s own django.db.models will make a reasonable candidate. It should be overridden where appropriate, of course, but it’s a decent default until things get rolling.

def create_model(name, module_path='django.db.models'):
    return type(name, (models.Model,), {'__module__': module_path})

Clearly, these dynamic models shake things up quite a bit, bypassing much of how Python normally works with a process like this. The __module__ issue is just the first issue encountered, and one of the easiest to work around. Thankfully, even though there are a few others to be handled, it can be well worth it if used properly.

The next step in this basic example is to include a dictionary of attributes to be set as if they were declared directly on a class definition. This will allow fields to be included on the model, as well as custom managers and common methods like __unicode__(). Since we’re already passing a dictionary to be used as attributes, assigning additional items to that dictionary is a simple process.
def create_model(name, attrs={}, module_path='django.db.models'):
    attrs = dict(attrs, __module__=module_path)
    return type(name, (models.Model,), attrs)

Ordinarily, it’s not advisable to supply a mutable object, such as a dictionary, as a default argument, since modifications to it would affect all future executions of the function. In this example, however, it’s used only to populate a new dictionary, and is immediately replaced by that new dictionary. Because of this, it’s safe to use as the default argument, in an effort to keep the method reasonably succinct.

So far, we’ve set up a three-line function to create basic models with any number of attributes, which can then be used in other areas of Django. Technically, this function alone could be used to generate any model imaginable, but it already provides a shortcut for setting up __module__, so it would make sense to provide another shortcut for setting up the model configuration by way of a Meta inner class. That way, code to create a model won’t have to set up that class directly.

Adding Model Configuration Options

Django models accept configuration through an inner class called Meta, which contains attributes for all the options that are specified. That should sound familiar, since that’s basically what models themselves do as well. Unfortunately, because of how Django processes the Meta class, we have to take a different approach.

The attributes defined within Meta are passed along into a special Options object, which lives at django.db.models.options. As part of this process, Options makes sure that no attributes were supplied that it doesn’t know how to handle. Unfortunately, the fact that Meta is a class is just a way to separate its namespace from that of the main model; Options only knows how to handle old-style Python classes—that is, classes that don’t inherit from the built-in object type.
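By contrast, the three-argument form of type() always produces a new-style class, which is easy to verify in isolation. This standalone snippet runs under Python 3, where every class is new-style, but type() behaves the same way under Python 2:

```python
# type() with an empty bases tuple still yields a new-style class: the
# result is itself an instance of type, and it implicitly inherits from
# object even though no bases were given. The class name and attribute
# here are purely illustrative.
Dynamic = type('Dynamic', (), {'answer': 42})

is_instance_of_type = isinstance(Dynamic, type)     # the mark of a new-style class
inherits_from_object = issubclass(Dynamic, object)  # True despite the empty bases
```

That implicit object inheritance is precisely what Options rejects, which motivates the workaround shown next.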
This is an important distinction, because calling type() directly creates a new-style class, even if it doesn’t inherit from object, or any of its subclasses for that matter. This ends up creating two additional attributes on the class that Options doesn’t know how to deal with, so it raises a TypeError to indicate the problem. That leaves two options for creating a Meta class: removing the additional attributes or creating an old-style class using some other means.

While it would be possible to just remove the attributes that offend Options, an even better idea would be to provide it exactly what it expects: an old-style class. Clearly, using type() is out of the question, which leaves us with just declaring a class using standard syntax. Since this is possible even within functions, and its namespace dictionary can be updated with new attributes, it’s a decent way to go about solving this problem.

from django.db import models

def create_model(name, attrs={}, meta_attrs={}, module_path='django.db.models'):
    attrs = dict(attrs, __module__=module_path)

    class Meta:
        pass
    Meta.__dict__.update(meta_attrs, __module__=module_path)

    attrs['Meta'] = Meta
    return type(name, (models.Model,), attrs)

This will now accept two attribute dictionaries, one for the model itself, and another for the Meta inner class. This allows full customization of Django models that can be created at any time. While this may seem like a rather abstract concept at the moment, see Chapter 11 for a full example of how this can be used in practice to automatically record all changes to a model.

Now What?

With a solid foundation of Django’s models under your belt, the next step is to write some code that will allow users to interact with those models. The next chapter will show how views can provide your users with access to these models.
83

Chapter 4

URLs and Views

Much of this book is split into fairly self-contained chapters, but this one covers two seemingly unrelated concepts together, because each relies very much on the other. URLs are the primary entry points to your site, while views are the code that responds to incoming events. What goes on in a view is very open-ended. Aside from accepting a request and returning a response, there’s no particular protocol that views should adhere to, and no rules about what they are or aren’t allowed to do. The possibilities for views are too vast to consider describing in detail, and there aren’t any utilities designed explicitly for views to use while executing.

Instead, it’s possible to hook into the process Django uses to map Web addresses to the views they should execute. This makes the link between URLs and views extremely important, and a thorough understanding of it can enable further advanced techniques. Also, in terms of how Django manages incoming requests, URL configurations exist solely to dispatch a request to a view that can handle it. Discussing URLs and URL configurations independently of views would be of little value.

URLs

Since all incoming requests to a Web server originate with the Web browser accessing a URL, a discussion of URLs is an important place to start. The process taken by the browser to transform a URL into a message to be sent to the Web server is beyond the scope of this chapter, but Chapter 7 provides more information.

One common point of confusion is whether a Web address should be called a Uniform Resource Identifier (URI) or a Uniform Resource Locator (URL). Many people use these two terms interchangeably, regardless of whether they know the difference. In a nutshell, a URI is a complete addressing mechanism that includes two pieces of information.

• The name of the scheme or protocol to be used to connect to the resource. This is always followed by a single colon.
• The path where the resource can be found. The exact format of this path may be different for different schemes, so not all URI paths look alike.

URLs, on the other hand, are addresses from a small set of connection schemes whose path portions all conform to a single format. Included in this set are such common protocols as HTTP, HTTPS and FTP—essentially the common protocols found on the Web today. The path format shared by these protocols is as follows.

• The protocol to be used to access the resource, such as http:// for standard HTTP. This is a slight extension to the scheme portion of the URI because it is assumed that all URL protocols will include two forward slashes following the colon.
• The host domain where the resource can be found, such as prodjango.com or www.prodjango.com.

The url( ) Function

In an effort to provide better flexibility in the long run, URL pattern tuples have been deprecated in favor of the url() utility function. url() takes the same arguments that are passed into the tuple, but can also take an extra keyword argument to specify the name of the URL pattern being described. This way, a site can use the same view multiple times, yet still be able to be referenced using reverse URL lookups. More information on that can be found later in this section.

The include( ) Function

Rather than supplying all your URL patterns in a single file, the include() function allows them to be split up among multiple files. It takes a single argument: an import path where another URL configuration module can be found. This not only allows the URL configuration to be split across multiple files, but it also allows the regular expression to be used as a prefix for the included URL patterns.

One important thing to remember when using include() is to not specify the end of the string in the regular expression. The expression should never end in a dollar sign ($). The dollar sign ($) causes the expression to only match the full URL.
This wouldn’t leave any additional URL fragments to pass along to the included configuration. This means that the extra URL patterns would only be matched if they check specifically for an empty string.

Resolving URLs to Views

Views are rarely called directly by your own code but are instead invoked by Django’s URL dispatch mechanism. This allows views to be decoupled from the particular URLs that trigger them, and the details of how those two aspects are linked can be safely ignored for most projects. But since views don’t always have to just be simple functions, knowing how Django goes from URL to view is important in order to determine what views are truly capable of.

Mapping URLs to views is a simple, well-documented process, but it’s worth covering the basics here for reference. A typical URL pattern consists of a few distinct items:

• A regular expression to match against the incoming URL being requested
• A reference to the view to be called
• A dictionary of arguments to be passed along every time the view is accessed
• A name to be used to reference the view during reverse lookups

Since URL patterns are expressed in regular expressions, which can capture certain portions of a string for later use, Django uses this as a natural way to pull arguments out of a URL so they can be passed to a view. There are two ways these groups can be specified, which determine how their captured values are passed into the view.

If groups are specified without names, they’re pulled into a tuple, which is passed along as excess positional arguments. This approach makes the regular expression a bit smaller, but it has some drawbacks. Not only does it make the regular expression a bit less readable, it also means that the order of arguments in your view must always match the order of the groups in the URL, because Django sends them in as positional arguments.
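The capturing behavior itself is ordinary regular-expression machinery, so it can be observed with the re module alone; the pattern and URL below are purely illustrative:

```python
# Unnamed groups are collected in pattern order into a tuple, which is how
# Django ends up passing them to the view as positional arguments.
import re

match = re.match(r'^articles/(\d{4})/(\d{2})/$', 'articles/2013/05/')
args = match.groups()  # captured values, in the order the groups appear

# A view called with *args would therefore receive '2013' first and '05'
# second; swapping the groups in the pattern would swap the arguments too.
```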
This couples the URL to the view more than is usually preferable; in some situations, such as the object-based views described later in this chapter, it can still be quite useful. If groups are given names, Django will create a dictionary mapping those names to the values that were extracted from the URL. This alternative helps encourage looser coupling between URLs and views by passing captured values to the view as keyword arguments. Note that Django doesn’t allow named and unnamed groups to be used together in the same pattern.

Resolving Views to URLs

As alluded to in the previous section, there’s another URL resolution process that Django provides, which can be of even more use if applied properly. Applications often need to provide links or redirects to other parts of the application or elsewhere on the site, but it’s not usually a good idea to hard-code those links directly. After all, even proprietary applications can change their URL structure, and distributed applications may not have any idea what the URL structure looks like in the first place. In these situations, it’s important to keep the URLs out of the code.

Django offers three distinct ways to specify a location without needing to know its URL in advance. Essentially, these all work the same way, as they all use the same internal machinery, but each interface is suited for a particular purpose.

The permalink Decorator

One of the most obvious places for code to reference a URL is in the get_absolute_url() method of most models. Providing this method is a common convention, so templates can easily provide a direct link to an object’s detail page without having to know or care what URL or view is used to display that page. It doesn’t take any arguments and returns a string containing the URL to be used.
To accommodate this situation, Django provides a decorator, living at django.db.models.permalink, which allows a function to return a set of values describing a view to be called, transforming it into a URL that calls the view. These values are provided as the return value from a function such as the get_absolute_url() method and follow a specific structure—a tuple containing up to three values.

• The first value is the name of the view to be called. If the view was named, that name should be used here. If not, the import path of the view should be used instead. This is always required.
• The second value is a tuple of positional arguments that should be applied to the view. If there are no arguments to be applied to the view at all, this value doesn’t need to be provided, but if keywords are needed, this should be an empty tuple.
• The third value in this tuple is a dictionary mapping keyword arguments to their values, all of which will be passed to the specified view. If no keyword arguments are necessary, this value can be left out of the tuple.
Given the following URL configuration:

from django.conf.urls import patterns, url
from django.views.generic.detail import DetailView

from library import models

class LibraryDetail(DetailView):
    queryset = models.Article.objects.all()
    pk_url_kwarg = 'object_id'  # match the named group in the URL pattern

urlpatterns = patterns('',
    url(r'^articles/(?P<object_id>\d+)/$', LibraryDetail.as_view(),
        name='library_article_detail'),
)

a corresponding model (located in a library application) might look like this:

from django.db import models

class Article(models.Model):
    title = models.CharField(max_length=255)
    slug = models.SlugField()
    pub_date = models.DateTimeField()

    def get_absolute_url(self):
        return ('library_article_detail', (), {'object_id': self.id})
    get_absolute_url = models.permalink(get_absolute_url)

The url Template Tag

Another common need is to have templates provide links to views that aren’t based on models but still shouldn’t have a hard-coded URL. For instance, a link to a contact form doesn’t necessarily have any ties to the database or any models, but will still need to be linked to in a way that can accommodate future changes or distribution. The syntax for this template tag looks quite similar to the permalink decorator because it passes values to the same utility function. There are some slight differences, because as a template tag, it doesn’t use true Python code.

{% url 'library_article_detail' object_id=article.id %}

The reverse( ) Utility Function

Django also provides a Python function that provides the translation from a description of a view and its arguments to a URL that will trigger the specified view. Living at django.core.urlresolvers, the reverse() function does exactly that. It takes all the same arguments described for the previous two techniques, but also one other, allowing it to specify which URL configuration module should be used to resolve the URL. This function is used internally by both the permalink decorator and the url template tag.
The reverse() function takes up to four arguments.

• viewname—The name of the view to be called, or the import path if no name was specified. This is always required.
• urlconf—The import path of a URL configuration module to use for lookups. This is optional; if it’s absent or None, the value is taken from the ROOT_URLCONF setting.
• args—A tuple of any positional arguments that will be passed to the view.
• kwargs—A dictionary of any keyword arguments that will be passed to the view.

Using the same example as in the previous section, here’s how reverse() would be used to obtain a URL for a specific object.

>>> from django.core.urlresolvers import reverse
>>> reverse('library_article_detail', kwargs={'object_id': 1})
'/articles/1/'

Keep in mind that args and kwargs are separate, distinct arguments. The reverse() utility function does not use any form of the argument expansion described in Chapter 2.

Use Lots of Arguments

Typically, a view could perform quite a few different tasks, all combining to solve a particular problem. Each of these tasks often has to make assumptions about how it should work, but these assumptions can typically be pulled out into a configurable option using arguments. Consider the following view, designed to retrieve a blog post and pass it along to a template.

from django.shortcuts import render_to_response
from django.template import RequestContext
from blog.models import Post

def show_post(request, id):
    post = Post.objects.get(id=id)
    context = RequestContext(request, {'post': post})
    return render_to_response('blog/detail.html', context)

This view will work perfectly well for its intended purpose, but it’s quite tightly connected to a specific blog application. It’s still loosely coupled in the sense that it doesn’t need to deal with the details of how to retrieve the blog post or render the template, but it still relies on details specific to the blog application, such as the model and template.
Instead, it’s possible to move these assumptions into arguments that can be swapped out for other situations. While this will involve some extra work initially, it can save a lot of time later if this view is used in a great number of situations. More importantly, the more complex the view, the more code can be reused using this technique. Once these options have been moved out into arguments, specific values can be passed in with a URL configuration, so a view doesn’t have to be written for each purpose.

For this particular view, a few things can be factored out in this way. The model doesn’t need to be known in advance, and the view should also be able to work with a QuerySet, so that a particular URL could operate on a limited set of data. Also, the field name shouldn’t be hard-coded, and the template name should be provided outside the view.

from django.shortcuts import render_to_response
from django.template import RequestContext

def show_object(request, id, model, template_name):
    object = model._default_manager.get(pk=id)
    context = RequestContext(request, {'object': object})
    return render_to_response(template_name, context)

Then, when it comes time to use this view, it’s easy to customize by providing these details using a URL configuration. Simply supply the argument values as an extra dictionary in the URL configuration, and they’ll be passed along each time the view is called from that URL pattern.

from django.conf.urls.defaults import *
from blog.models import Post

urlpatterns = patterns('',
    (r'^post/(?P<id>\d+)/$', 'blog.views.show_object', {
        'model': Post,
        'template_name': 'blog/detail.html',
    }),
)

This approach can even be used with models that use other types of IDs, such as a music database using catalog numbers in the format of DJNG-001; anything that can be guaranteed unique among all objects can be used as an object’s primary key.
Since our new generic view simply passes the ID straight through to the database API, it’s easy to support these other types of IDs by simply adjusting the URL pattern appropriately.

r'^album/(?P<id>[A-Z]+-[0-9]+)/$'

This particular view shouldn’t have to be written in the first place, because Django provides one out of the box for this purpose, DetailView, and it’s even more versatile than the example shown here. It uses nearly a dozen different arguments, all of which are expected to be customized in URL configurations.

Once you have a view that accepts a number of arguments for customization, it can become quite easy to require that far too many arguments be specified in each URL configuration. If every use of a view requires all the configuration options to be specified, it could quickly become just as much work to use the generic view as it would be to write the view from scratch each time. Clearly, there needs to be a better way to manage all these arguments.

Provide Sensible Defaults

Since functions can define default values for any arguments that can use them, the most reasonable way to manage this complexity is to provide decent defaults wherever possible. Exactly what defaults can be provided and what they look like will be different for each view, but it’s usually possible to come up with some sensible values for them.

Sometimes you have a number of views that each serve a different purpose but have some code in common. This is often boilerplate, which every view needs to use but which isn’t geared toward the true functionality of any individual view. For example, views for private pages must always verify that users are logged in and that they have the appropriate permissions. An application may have a dozen different types of views, but if they’re all private, they must all use that same code every time. Thankfully, we’re working in Python, which provides a useful alternative.
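As a minimal sketch of such a default, the show_object view from earlier could fall back to a template name derived from the model’s metadata when none is given. The "<app_label>/<model_name>_detail.html" naming convention used here is an assumption for illustration, not something Django enforces.

```python
def default_template_name(model):
    # Build "<app_label>/<model_name>_detail.html" from the model's metadata.
    # This naming convention is an assumption for this sketch.
    meta = model._meta
    return '%s/%s_detail.html' % (meta.app_label, meta.object_name.lower())

def show_object(request, id, model, template_name=None):
    # Imported inside the view so the default-derivation helper above can be
    # reused without a configured Django environment.
    from django.shortcuts import render_to_response
    from django.template import RequestContext

    if template_name is None:
        template_name = default_template_name(model)
    object = model._default_manager.get(pk=id)
    context = RequestContext(request, {'object': object})
    return render_to_response(template_name, context)
```

With a default like this in place, a URL configuration only needs to override template_name when the convention doesn’t fit, which keeps most URL patterns short.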
View Decorators

Most boilerplate in views is either at the very beginning or the very end. Usually it handles such tasks as initializing various objects, testing standard prerequisites, handling errors gracefully or customizing the response before it goes out to the browser. The real meat of the view is what sits in the middle, and that’s the part that’s fun to write.

As described in Chapter 2, decorators are a great way to wrap several functions in some common code that can be written once and tested easily, which reduces both bugs and programmer fatigue. Since views are typically just standard Python functions, decorators can be used here as well.

Chapter 2 illustrated how decorators can be used to write a wrapper around the original function, which can then access all the arguments that were intended for that function, as well as the return value from the function itself. In terms of views, this means that decorators always have access to the incoming request object and the outgoing response object. In some cases, a decorator can be special-cased for a particular application, which would allow it to anticipate a greater number of arguments that are specific to that application.

There are a number of things decorators can offer views, and a few of them are common enough to warrant inclusion in Django itself. Living at django.views.decorators are a few packages containing decorators you can use on any view in any application. The following packages are listed with just the trailing portion of their full import path provided, given that they all live at the same location.

• cache.cache_page—Stores the output of the view into the server’s cache so that when similar requests come in later, the page doesn’t have to be re-created each time.
• cache.never_cache—Prevents caching for a particular view. This is useful if you have site-wide caching set up but certain views can’t afford to go stale.
• gzip.gzip_page—Compresses the output of the view and adds the appropriate HTTP headers so the Web browser knows how to handle it.
• http.conditional_page—Only sends the whole page to the browser if it has changed since the last time the browser got a copy of it.
• http.require_http_methods—Accepts a list of HTTP methods (described in detail in Chapter 7) that the view is limited to. If the view is called with any other method, it sends a response telling the browser it’s not allowed, without even calling the view. Two included shortcut variations are http.require_GET and http.require_POST, which don’t take any arguments and are hard-coded for GET and POST requests, respectively.
• vary.vary_on_header—Helps control browser-based caching of pages by indicating that the page’s content changes, depending on the values of the headers passed into the decorator. A simple variant specific to the Cookie header is available at vary.vary_on_cookie.

Additional decorators are provided as part of the bundled applications living at django.contrib. These decorators all live below that path, so as in the previous list, only the relevant portion of the path is supplied:

• admin.views.decorators.staff_member_required—A simple decorator that checks the current user to see if it has staff access. This is used automatically for all the views in Django’s built-in admin, but could also be used for any other staff-only views on your site. If the user doesn’t have staff permissions, the decorator redirects the browser to the admin’s login page.
• auth.decorators.user_passes_test—Accepts a single argument, which is a function to test the current user against some arbitrary condition. The provided function should accept just the User object and return True if the test passes or False if it fails. If the test passes, the user will be granted access to the page; if it fails, the browser will redirect to the site’s login page, as determined by the LOGIN_URL setting.
• auth.decorators.login_required—A specialized version of user_passes_test, this decorator simply checks that the user is logged in before allowing access to the view.
• auth.decorators.permission_required—Another specialization of user_passes_test, this checks that the user has a given permission before the view is loaded. The decorator takes a single argument: the permission to be checked.

These are just the decorators that are bundled with Django itself. There are many other uses for decorators, and third-party applications can provide their own as well. In order for these decorators to be of any use, however, they must be applied to views.

Applying View Decorators

Chapter 2 described how decorators can be applied to standard Python functions. Applying decorators to views works the same way, but there’s a notable difference: views aren’t always under your control. The techniques described in Chapter 2 assume that the functions you decorate are your own. While that’s often the case, the number of distributed applications means that many Django-powered Web sites will use code from other sources, with views of their own. Applying decorators as described previously would require changes to the third-party code.

The goal is to apply decorators to third-party views without actually modifying third-party code. The key to doing this lies in the older-style decorator syntax from Python 2.3 and earlier. Remember that the new syntax allows decorators to be applied above the function definition, but the older syntax relies on passing the function to the decorator directly. Since Python functions can be imported from anywhere and can be passed in as arguments at any time, this is an excellent way to create decorated views from third-party code.
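The mechanics can be sketched without any Django machinery at all. In this example, require_condition is a simplified, hypothetical stand-in for a decorator like user_passes_test, and the "third-party" view is likewise illustrative; only the old-style application pattern itself is the point.

```python
def require_condition(test_func):
    # Simplified stand-in for a decorator such as user_passes_test:
    # it only calls the wrapped view when the test passes.
    def decorator(view):
        def wrapper(request, *args, **kwargs):
            if not test_func(request):
                return 'redirect to login'  # a real decorator would redirect
            return view(request, *args, **kwargs)
        return wrapper
    return decorator

# Imagine this view was imported from a third-party application,
# so its source can't be edited directly.
def report_view(request):
    return 'rendered report'

# Old-style decorator application: call the decorator with the imported
# function and rebind the name to the wrapped result.
report_view = require_condition(lambda request: request == 'staff')(report_view)
```

The same pattern works with Django’s real decorators: after importing a third-party view, a line such as `view = login_required(view)` in your own URL configuration module wraps it at the point of use, leaving the third-party code untouched.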