Tuesday, October 25, 2011

Getting Started with MongoDB and Python

I'm going to go over some basic steps to get a feel for MongoDB. I will be using Python with the PyMongo module to interact with MongoDB. I covered the installation of these two in a previous post. Anyway, let's get to it.


MongoDB (from "humongous") is one of the so-called NoSQL databases. It is document-based and schema-free, has no joins, and supports indexing and ad hoc queries. For those of us who are used to RDBMS systems, document-based data stores are among the easier ones to understand and work with when making the transition into the NoSQL world.
Let's go over some concepts. MongoDB has the same "database" or "schema" concept as an RDBMS. A database can have zero or more collections (a collection is similar to a table in an RDBMS), a collection can have zero or more documents (a document is similar to a row), and a document can have zero or more key-value pair fields (fields are similar to columns). One of the characteristics of document-based data stores is that they are schema-less: in practice, they are not strict about what goes into a collection. A collection can hold documents that are completely different from each other, and that is perfectly fine.
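To make the schema-less idea concrete, here is a minimal sketch that models a collection as a plain Python list of dicts. Nothing here talks to MongoDB; the field names (`tags`, `comments`) are just made up for illustration.

```python
# A "collection" modelled as a plain list of dicts, purely for illustration.
# Real MongoDB collections live in the server; nothing here talks to MongoDB.
articles = [
    {"title": "some title", "desc": "some desc", "author": "jane"},
    {"title": "short title", "author": "abdel", "tags": ["nosql", "python"]},
    {"title": "petite title", "comments": [{"user": "mino", "comment": "I agree"}]},
]

# Every document can carry its own set of fields.
field_sets = [sorted(doc.keys()) for doc in articles]

# Querying therefore has to tolerate missing keys, e.g. via dict.get().
by_jane = [doc for doc in articles if doc.get("author") == "jane"]
```

The flip side of this freedom is that the application, not the database, has to cope with documents that lack a field.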
MongoDB uses BSON (which is not exclusive to MongoDB) as the storage and network transfer format for documents. BSON documents are very much like JSON, encoded in binary and with extra data types such as dates, timestamps, distinct integer and floating-point numbers, and binary data. We can query for these documents either by using native MongoDB queries or by crafting more advanced queries using MapReduce.
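A small sketch, using only Python's standard library, of why those extra types matter: plain JSON has no date type, so a document holding a `datetime` cannot be serialized without a workaround. The document below is a made-up example, and the ISO string conversion is just one common workaround, not what BSON does internally.

```python
import datetime
import json

doc = {"title": "short title", "date": datetime.datetime(2011, 10, 25, 12, 0, 0)}

try:
    json.dumps(doc)
    json_ok = True
except TypeError:
    # The standard json module refuses datetime objects.
    json_ok = False

# A common workaround: turn the date into an ISO 8601 string ourselves.
as_json = json.dumps(doc, default=lambda value: value.isoformat())
```

With BSON the date stays a date all the way to the database, so no such conversion is needed on the application side.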
Let's summarize and look at some other key features of MongoDB (based on the MongoDB home page):
  • A NoSQL data store
    • document based.
  • BSON documents are a binary-encoding serialization of JSON
    • Language independent data interchange format – not exclusive to MongoDB
    • Supports
      • Boolean, integer, float, date, string and binary types.
  • Protocol: programming language specific
  • Document-based query language
    • can leverage defined indexes.
  • GridFS links
    • GridFS is a storage specification for large objects.
      • Video, images, etc.
  • Supports indexing much like relational databases
    • including secondary and complex indexes.
    • Indexes are implemented as B-tree indexes.
    • Indexes are used by Mongo's query optimizer to quickly sort through and order the documents in a collection.
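To get an intuition for what an index buys us, here is a rough sketch in plain Python. A sorted list with binary search stands in for a B-tree (it is not one), and the documents and ids are invented for the example.

```python
import bisect

# Hypothetical data: document id -> document.
documents = {
    1: {"author": "jane", "title": "some title"},
    2: {"author": "abdel", "title": "short title"},
    3: {"author": "abdel", "title": "petite title"},
    4: {"author": "mino", "title": "another title"},
}

# The "index": (author, document id) pairs kept sorted by author.
# A sorted list with binary search stands in for a B-tree here.
author_index = sorted((doc["author"], doc_id) for doc_id, doc in documents.items())


def find_by_author(author):
    """Return matching document ids without scanning every document."""
    start = bisect.bisect_left(author_index, (author,))
    ids = []
    for key, doc_id in author_index[start:]:
        if key != author:
            break
        ids.append(doc_id)
    return ids
```

The lookup jumps straight to the first matching entry and stops at the first non-matching one, which is the same basic idea a query optimizer relies on.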
OK, enough terminology; let's write some code :) As I said before, I'll be using Python with PyMongo.
1.  I'll start my MongoDB server.
2.  I'll open my Python interactive shell.
3.  I'm creating a connection to MongoDB. I've explicitly "selected" a database ("newsdb") and a collection ("articles"); even if they don't yet exist in MongoDB, this works fine. Whenever we insert a document, Mongo checks whether these two are defined: if they are, it uses them; if not, it creates them.
>>>  import pymongo
>>>  from pymongo import Connection
>>>  connection = Connection('localhost', 27017) 
>>>  db = connection.newsdb
>>>  articles = db.articles


4.  Let's create an article
>>>  article = {"title": "some title", "desc": "some desc", "author": "jane"}
5.  Insert document - notice the auto-generated id created by MongoDB
>>>  articles.insert(article)
ObjectId('4ea75b857041ef105c000000')


6.  Collection was created automatically
>>>  db.collection_names()
[u'articles', u'system.indexes']


7.  Checking for the newly created document (notice the criteria is just a JSON doc itself)
>>>  articles.find_one({"title": "some title"})
{u'title': u'some title', u'_id': ObjectId('4ea75b857041ef105c000000'), 
  u'author': u'jane', u'desc': u'some desc'}


8. Let's create a new document with a different schema
>>>  import datetime
>>>  article = {"title": "short title", "desc": "a short desc", "author": "abdel", 
                     "date" :datetime.datetime.utcnow()}
>>> articles.insert(article)
ObjectId('4ea765ac7041ef105c000001')


9. We can have embedded docs
>>>  article = {"title": "petite title", "desc": "", "author": "abdel", 
                        "comments": [{"user": "mino", "comment": "I agree"}]}
>>>  articles.insert(article)
ObjectId('4ea766677041ef105c000002')


10.  We can insert articles in bulk by passing a list to insert()
>>>  li_articles = [{"title": "another title ", "desc": "another desc"}, 
                            {"title": "yet another title", "desc": "yet another desc",  "author": "jane"}]
>>>  articles.insert(li_articles)


11.  Let's iterate over all our documents
>>>  for article in articles.find():  article
12. Get the count
>>>  articles.count()
8
13.  Let's find the count of docs that match a specific query
>>>  articles.find({"author": "abdel"}).count()
2
14. Let's update a document
>>>  articles.update({"title":"some title"}, {"$set": {"desc": "updated some desc"}})
>>>  articles.find_one({"title":"some title"})
{u'title': u'some title', u'desc': u'updated some desc', 
u'_id': ObjectId('4ea75b857041ef105c000000'), u'author': u'jane'}


15. Let's delete/remove a document
>>>  articles.remove({"title":"some title"})
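The session above uses the PyMongo API as it looked in 2011. Later PyMongo releases replaced `Connection` with `MongoClient` and retired `insert`, `update`, `remove` and `count` in favour of more specific methods, so here is a rough sketch of the same steps with the newer names. It assumes a reasonably recent PyMongo (3.7 or later) and a MongoDB server listening on localhost:27017; the function name `crud_roundtrip` is my own.

```python
import datetime


def crud_roundtrip(host="localhost", port=27017):
    """Repeat the steps above with the newer PyMongo method names."""
    # Imported here so that merely defining the function does not need PyMongo.
    from pymongo import MongoClient

    client = MongoClient(host, port)
    articles = client.newsdb.articles

    # insert() -> insert_one() / insert_many()
    first = articles.insert_one(
        {"title": "some title", "desc": "some desc", "author": "jane"}
    )
    articles.insert_many([
        {"title": "short title", "author": "abdel",
         "date": datetime.datetime.now(datetime.timezone.utc)},
        {"title": "petite title", "author": "abdel",
         "comments": [{"user": "mino", "comment": "I agree"}]},
    ])

    # find().count() -> count_documents()
    by_abdel = articles.count_documents({"author": "abdel"})

    # update() -> update_one(); remove() -> delete_one()
    articles.update_one({"_id": first.inserted_id},
                        {"$set": {"desc": "updated some desc"}})
    articles.delete_one({"_id": first.inserted_id})

    client.close()
    return by_abdel
```

Calling `crud_roundtrip()` needs a running server; the returned count depends on what is already stored in the `articles` collection.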


I have just scratched the surface; for more information check out the MongoDB documentation. There we can find a nice description of Mongo's queries and how they compare to SQL queries. It also has a good explanation of how and when to use more advanced queries using MapReduce, which is useful for processing batches of data and for data aggregation operations.
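For readers who have not met MapReduce before, here is the idea sketched in plain Python rather than in MongoDB itself: a map step emits key-value pairs for each document, and a reduce step folds all the values that share a key. The documents and the function names are made up for the example.

```python
from collections import defaultdict

articles = [
    {"title": "some title", "author": "jane"},
    {"title": "short title", "author": "abdel"},
    {"title": "petite title", "author": "abdel"},
    {"title": "another title"},  # no author field at all
]


def map_article(doc):
    """Map step: emit one (key, value) pair per document."""
    yield doc.get("author", "unknown"), 1


def reduce_counts(key, values):
    """Reduce step: fold all values emitted for one key."""
    return sum(values)


# Group emitted values by key, then reduce each group.
grouped = defaultdict(list)
for doc in articles:
    for key, value in map_article(doc):
        grouped[key].append(value)

articles_per_author = {key: reduce_counts(key, values)
                       for key, values in grouped.items()}
```

Because each map call looks at one document and each reduce call at one key, the work can be split across many machines, which is what makes the pattern attractive for batch aggregation.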

Monday, October 24, 2011

NoSQL Databases the "new" Popular Kid on the Block

In this post, I'm going to go over NoSQL ("Not Only SQL") data stores. NoSQL can mean several things depending on who you ask, but it generally refers to a broad class of databases that differ from traditional Relational Database Management Systems (RDBMS). NoSQL data stores are about new approaches to storing and accessing data.


We live in an "information age", and the amount of digital information (data) has grown dramatically in the last decade. This trend will only continue as we move more and more of our lives into the digital world. Some organizations have taken notice and are now rethinking how to better organize their data. Traditionally, RDBMS systems were (and still are) an integral part of any application's back-end system. However, there is a "new" kid on the block offering alternative ways to store and access our data. The so-called NoSQL databases take a different approach to handling data that might not be well suited to a traditional RDBMS. Here are a few examples of the many different NoSQL solutions available.
Key-Value Stores - Redis, SimpleDB
Column Stores - HBase, Cassandra
Document Stores - MongoDB, CouchDB
Graph Stores - Neo4j
In general, they are all designed to handle huge amounts of data, have neither table schemas nor joins, and are particularly suited to scaling horizontally. It is this last one, horizontal scaling, that can make a huge difference (in performance overhead) when dealing with huge amounts of data.


A gentle note on scalability before we move on. This is important, even more so with today's applications, which must handle (read/write) enormous amounts of data.
  • Vertical scaling, or scaling up, essentially means adding more computing power (CPUs, RAM, disk space, etc.) to the single machine where the application or database resides.
  • Horizontal scaling, or scaling out, essentially means that when an application or database needs more computing power, we add additional servers and distribute the load across them.
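To make "distribute the load across them" a little more concrete, here is a minimal sketch of one common approach: hash each record's key and use the result to pick a server. The server names are hypothetical and the modulo scheme is the simplest possible one, not what any particular database does.

```python
import hashlib

# Hypothetical server names; in a real deployment these would be host addresses.
SERVERS = ["db-node-0", "db-node-1", "db-node-2"]


def server_for(key, servers=SERVERS):
    """Pick a server by hashing the key (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


placement = {user: server_for(user) for user in ["jane", "abdel", "mino", "zoe"]}
```

The same key always lands on the same server, so reads know where to look. One known weakness of plain modulo sharding is that adding a server moves most keys around; schemes such as consistent hashing exist to soften that.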
Traditional RDBMS tend to be limited in how well they scale horizontally. The reasons for this limitation can be better explained by going over a fundamental theorem.


CAP Theorem
The CAP Theorem was formulated by Eric A. Brewer, a UC Berkeley professor. It describes three desirable properties for a distributed system that shares data. Let's see these three in summary.
Consistency: all clients see the same version of the data; the data is correct at all times.
Availability: all clients can always find at least one copy of the requested data, even if some servers in the cluster are down. The system is always on; there is no downtime.
Partition-tolerance: the system should continue to function if the network is partitioned or disrupted. If nodes are added or nodes fail, the system still works.
The interesting aspect of the CAP Theorem is that it states that we can only have two of the three properties at the same time. Let's keep this in mind as we continue...


ACID
Most RDBMS databases have loosened up the Partition-tolerance property in favor of Consistency and Availability. Most if not all of them are ACID compliant, which practically means that database transactions are processed in a way that guarantees immediate data consistency.
Atomic - transactions are all-or-nothing. They either complete or do not, but never leave data in some in-between state.
Consistent - only valid data is persisted into the database. Any modifications change the data from one consistent state to another consistent state.  
Isolated - concurrent transactions must not interfere with one another. In other words, other transactions cannot access or see data that has been modified by a transaction that has not yet completed.
Durable - a completed action results in a permanent change of state of the system, even in case of system failure.
ACID operations are definitely of great value and are a crucial aspect of RDBMS; they are fundamentally important in many scenarios. For example, in a financial application it is extremely important that my transaction for withdrawing money from one account is ACID-compliant, even if this imposes some performance overhead. However, there are other applications that don't really need to pay for this performance overhead. Most if not all NoSQL databases sacrifice ACID compliance to eliminate this overhead.
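Here is a small sketch of the "all-or-nothing" part using SQLite from Python's standard library. The accounts table and the amounts are invented for the example; the point is only that a failed transfer leaves both balances untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("checking", 100), ("savings", 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, target))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "checking", "savings", 30)         # fits the balance
rejected = transfer(conn, "checking", "savings", 500)  # would overdraw
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the rejected transfer the credit to savings has already been executed when the debit fails, yet the rollback undoes it, so no money is created out of thin air.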


BASE
Most NoSQL data stores have loosened up the requirement for Consistency in order to achieve Availability and Partition-tolerance. The result is known as BASE:
Basically Available - the system seems to work all the time
Soft state - it doesn't have to be consistent all the time
Eventual consistency - it becomes consistent at some later time
So what does this mean, really? It essentially means that consistency is ensured, but not necessarily immediately. Let's use a hypothetical social networking application as an example. Suppose this system is distributed across multiple servers, in multiple sites and in multiple countries. If I change my relationship status from single to married, this change might not be immediately reflected everywhere (someone in China might still see my status as single), but it will eventually be reflected (consistent). So a BASE system simply guarantees consistency after some time, whereas an ACID system ensures consistency after every operation. Back to my hypothetical example: is it really that important for my relationship status to be ACID? I'd say no! It makes no difference in this case, but it sure does when I am withdrawing money from my bank's ATM.
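The relationship-status example can be simulated in a few lines of Python. This is only a toy model under my own assumptions: two in-memory "replicas", with replication modelled as a queue of pending updates that gets applied some time after the write.

```python
from collections import deque


class Replica:
    """One copy of the data, e.g. a server in another country."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.pending = deque()  # updates received but not yet applied

    def read(self, key):
        return self.data.get(key)

    def apply_pending(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value


def write(primary, replicas, key, value):
    """Apply the write locally and queue it for the other replicas."""
    primary.data[key] = value
    for replica in replicas:
        replica.pending.append((key, value))


us, china = Replica("us"), Replica("china")
write(us, [china], "status", "single")
china.apply_pending()

write(us, [china], "status", "married")
stale_read = china.read("status")   # replication has not caught up yet
china.apply_pending()
fresh_read = china.read("status")   # now the replicas agree
```

Between the write and `apply_pending()` the two replicas disagree, which is the "soft state" window; after it they converge, which is the "eventual" part.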


To conclude this long post :) NoSQL data stores are not a replacement for traditional RDBMS. They are not a "silver bullet". It is, however, great to have more options to choose from.

Thursday, October 20, 2011

Setting up a Django Environment (Eclipse, MongoDB, MySQL, Python)

I will be going over some of the steps to set up a Django development environment. Django is a popular Python Web framework that encourages rapid development; it is well documented and there are plenty of experienced users to turn to if one runs into trouble.
Note: I'm setting up all these packages in a Windows environment.


Installing Python
1.  Get the Python binaries from here.  
2.  If you get the installer, the installation becomes quite easy - simply, click and agree. 
     I put my installation in the root directory ("C:\Python27") but you can put it anywhere.
3.  To test the installation,
     Open the command prompt and type: python. If there are no errors you should get something 
     like this ">>>" at the beginning of the line. This means that the Python interactive 
     interpreter is ready to go. 
     If you don't see this, you might have to add Python to the system PATH variable.  
     There are several ways to get there. One is to go to Start->Run and type sysdm.cpl, after that 
     select the Advanced tab, click Environment Variables and under System Variables add the
     following to the path: C:\Python27;C:\Python27\Tools\Scripts
Installing Django 
1.  Get the packages from here.
2.  Extract the folder.
     Open a command prompt, navigate to this folder and type: python setup.py install. 
     This will install all the required files in the site-packages directory. On my system this is 
     in "C:\Python27\Lib\site-packages\django".
     To tell if Django is installed, open a command prompt, type python to run the
     Python interactive interpreter (>>>), and then type import django. If no errors are shown, 
     then Django is installed.
Installing PyDev Plugin
1.  I like working with the Eclipse IDE for my Java projects, and I am glad I can still use it with 
     my Python projects. With Eclipse open go to Help->Install New Software-> 
     Work with (add the link from here) -> select PyDev and install it.
     This is all we need to start with Django and Python in Eclipse.


Extras - I'm using the following databases so I'll add the installation instructions for these two as well.
MongoDB
1.  Get MongoDB from here
2.  Download and unzip the file. I put the extracted folder at the root ("C:\mongodb-2.x").
3.  Next, we have to create the data directories where MongoDB will store the data.
     The default location is "C:\data\db". You can create these directories via Windows
     Explorer or through the command prompt by navigating to the "C:\" drive, creating a data 
     folder (mkdir data), changing into it (cd data) and creating a new folder (mkdir db).
4.  To test the installation open a command prompt and navigate to "C:\mongodb-2.x\bin".
5.  To run the database server execute "mongod.exe". To run the administrative shell 
     execute "mongo.exe".
Python driver for MongoDB
1.  The driver needed to work with MongoDB is pymongo; you can get it from here.
2.  Install the downloaded binaries.
3.  To test the installation go to a command prompt and type python to get the Python 
     interactive interpreter.  Type import pymongo.  If no errors are shown the pymongo 
     module was installed correctly. 
4.  Try connecting to MongoDB by typing the following code: 
     connection  = pymongo.Connection("localhost", 27017) 
     The port number is the default port number used by MongoDB.
MongoEngine 
1.  This installation is really very much optional, since most DOM (ORM-like) solutions are very new.
     However, I'm going to experiment with MongoEngine, which is a Document-Object Mapper 
     (very similar to Django's ORM).
2.  Get the zip file from here and extract the files.
3.  Open a command prompt navigate to the root of the folder extracted and type:
     python setup.py install
4.  The first time I tried installing it I got an exception about "setuptools" missing.
5.  If that happens, get the setuptools installer from here.
6.  Install it and try step 3 again.
MySQL
1.  Get the installer from here.
2.  Install the downloaded binaries. The installation is pretty easy: just click next and 
     agree to the terms and conditions :)
3.  A MySQL command line client should be available under the MySQL installation 
     directory.
MySQL Python driver/adapter
1.  Get the adapter binaries from here.
2.  Install the adapter by executing the binaries.
3.  The best way to test this is to create a database in MySQL and a Django project.


That is it, we should be ready to start writing some Django apps.
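The individual "type import ..." checks above can be bundled into one small script. This is just a convenience sketch: it is written for Python 3 (`importlib.util.find_spec` does not exist in the Python 2.7 install described above), and `MySQLdb` is the import name I am assuming for the MySQL adapter.

```python
import importlib.util

# Import names of the packages installed in this post.
# MySQLdb is the import name assumed here for the MySQL adapter.
MODULES = ["django", "pymongo", "mongoengine", "MySQLdb"]


def check_modules(names):
    """Map each module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_modules(MODULES).items():
    print("%-12s %s" % (name, "installed" if found else "MISSING"))
```

A module reported as MISSING usually means the package was installed into a different Python than the one running the script.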

Saturday, October 15, 2011

Google's new programming language: Dart

This week Google released details of Dart, their new programming language for building web applications. According to its creators, the goal of Dart is not to replace good old JavaScript, but rather to offer a more modern alternative. A Dart compiler will be going into Chrome later this year, and for those browsers that do not support it, there is an option to compile Dart code to plain JavaScript. It seems to me that Dart is meant to be more of a JavaScript killer in the long run... only time will tell. OK, enough of this; let's get to some Dart code. Dart's syntax is very Java-like and C#-like, so familiarity with either of these will help you understand what's going on.

interface Shape {  
    num Area( );  
}  
   
class Rectangle implements Shape {  
    
    // The "_" underscore before field names makes them private  
    num  _height;   
    num  _width;  
    
    // Short form for boilerplate constructor  
    Rectangle(num this._height, num this._width);   
    
    num Area( ) =>  _height * _width;  
   
    // Getters and setters can be defined "similar" to property fields   
    get width( ) => _width;  
    set width(num value) {  
        if (value < 0)  
            throw "The width cannot be negative. ";  
        _width = value;  
    }  
    
    get height( ) => _height;  
    set height(num value) {  
        if (value < 0)   
            throw "The height cannot be negative. ";  
        _height = value;  
    }  
}  
   
main( ) {  
    // var rectangle = ... is ok too  
    Rectangle rectangle = new Rectangle(4, 5);  


    // Accessing getters and setters: there is no visible difference   
    // between this and the "real" class fields  
    print("Rectangle, height: " + rectangle.height + " width: " + rectangle.width);  
    print("Rectangle, area is: " + rectangle.Area());  
    
    rectangle.width = - 2;  
 }   
-----------------------------------------------  Output  -----------------------------------------------

I have placed comments in the example above for some of the things that I think are kind of neat. There are a lot of interesting features like Generics, Isolates, and Factory classes for interfaces that I'll have to explore next. Dart is not finished yet: enums and a reflection API are just some of the upcoming features. Anyway, whether or not Dart "takes over the world", I'll be giving it a fair chance... just like I did with other languages.

Friday, October 14, 2011

Pragmatic Programmer Tips, 44 - 70

Continuing with the tips given in the Pragmatic Programmer book. This is the third and final entry. 


Pragmatic Programmer reference tips: 44 - 70


Chapter 6 - While You Are Coding
44.  Don't Program by Coincidence
       Rely only on reliable things. Beware of accidental complexity, and don't confuse a happy
       coincidence with a purposeful plan.
45.  Estimate the Order of Your Algorithms
       Get a feel for how long things are likely to take before you write code.
46.  Test Your Estimates
       Mathematical analysis of algorithms doesn't tell you everything. Try timing your code in 
       its target environment.
47.  Refactor Early, Refactor Often
       Just as you might weed and rearrange a garden, rewrite, rework, and re-architect code
       when it needs it. Fix the root of the problem.
48.  Design to Test
       Start thinking about testing before you write a line of code.
49.  Test Your Software, or Your Users Will
       Test ruthlessly. Don't make your users find bugs for you.
50.  Don't Use Wizard Code You Don't Understand
       Wizards can generate reams of code. Make sure you understand all of it before you
       incorporate it into your project.
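Tips 45 and 46 go nicely together, so here is a quick sketch of both: first reason about the order of two search routines, then time them with Python's `timeit`. The data size and repeat count are arbitrary choices of mine.

```python
import timeit


def linear_search(items, target):
    """O(n): may have to look at every element."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


def binary_search(items, target):
    """O(log n): halves the sorted search range on every step."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


data = list(range(100000))
target = 99999  # worst case for the linear scan

linear_time = timeit.timeit(lambda: linear_search(data, target), number=20)
binary_time = timeit.timeit(lambda: binary_search(data, target), number=20)
print("linear: %.4fs  binary: %.6fs" % (linear_time, binary_time))
```

The actual numbers depend on the machine, which is exactly the point of tip 46: the estimate tells you what to expect, the measurement tells you what you got.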


Chapter 7 - Before The Project
51.  Don't Gather Requirements – Dig for Them
       Requirements rarely lie on the surface. They're buried deep beneath layers of assumptions, 
       misconceptions, and politics.
52.  Work with a User to Think Like a User
       It's the best way to gain insight into how the system will really be used.
53.  Abstractions Live Longer than Details
       Invest in the abstraction, not the implementation. Abstractions can survive the barrage of 
       changes from different implementations and new technologies.
54.  Use a Project Glossary
       Create and maintain a single source of all the specific terms and vocabulary for a project.
55.  Don't Think Outside the Box – Find the Box
       When faced with an impossible problem, identify the real constraints. Ask yourself: 
       "Does it have to be done this way? Does it have to be done at all?"
56.  Start When You're Ready
       You've been building experience all your life. Don't ignore niggling doubts.
57.  Some Things Are Better Done than Described
       Don't fall into the specification spiral—at some point you need to start coding.
58.  Don't Be a Slave to Formal Methods
       Don't blindly adopt any technique without putting it into the context of your development 
       practices and capabilities.
59.  Costly Tools Don't Produce Better Designs
       Beware of vendor hype, industry dogma, and the aura of the price tag. Judge tools on
       their merits.


Chapter 8 - Pragmatic Projects
60.  Organize Teams Around Functionality
       Don't separate designers from coders, testers from data modelers. Build teams the way 
       you build code.
61.  Don't Use Manual Procedures
       A shell script or batch file will execute the same instructions, in the same order, time 
       after time.
62.  Test Early. Test Often. Test Automatically
       Tests that run with every build are much more effective than test plans that sit on a shelf.
63.  Coding Ain't Done 'Til All the Tests Run
       'Nuff said.
64.  Use Saboteurs to Test Your Testing
       Introduce bugs on purpose in a separate copy of the source to verify that testing will catch 
       them.
65.  Test State Coverage, Not Code Coverage
       Identify and test significant program states. Just testing lines of code isn't enough.
66.  Find Bugs Once
       Once a human tester finds a bug, it should be the last time a human tester finds that bug. 
       Automatic tests should check for it from then on.
67.  English is Just a Programming Language
       Write documents as you would write code: honor the DRY principle, use metadata, 
       MVC, automatic generation, and so on.
68.  Build Documentation In, Don't Bolt It On
       Documentation created separately from code is less likely to be correct and up to date.
69.  Gently Exceed Your Users' Expectations
       Come to understand your users' expectations, then deliver just that little bit more.
70.  Sign Your Work
       Craftsmen of an earlier age were proud to sign their work. You should be, too.

Pragmatic Programmer Tips, 20 - 43

Continuing with the tips given in the Pragmatic Programmer book. This is the second part of three. 


Pragmatic Programmer reference tips: 20 - 43


Chapter 3 - The Basic Tools
20.  Keep Knowledge in Plain Text
       Plain text won't become obsolete. It helps leverage your work and simplifies debugging 
       and testing.
21.  Use the Power of Command Shells
       Use the shell when graphical user interfaces don't cut it.
22.  Use a Single Editor Well
       The editor should be an extension of your hand; make sure your editor is configurable, 
       extensible, and programmable.
23.  Always Use Source Code Control
       Source code control is a time machine for your work—you can go back.
24.  Fix the Problem, Not the Blame
       It doesn't really matter whether the bug is your fault or someone else's—it is still your 
       problem, and it still needs to be fixed.
25.  Don't Panic When Debugging
       Take a deep breath and THINK! about what could be causing the bug.
26.  "select" Isn't Broken.
       It is rare to find a bug in the OS or the compiler, or even a third-party product or library. 
       The bug is most likely in the application.
27.  Don't Assume It—Prove It
       Prove your assumptions in the actual environment, with real data and boundary conditions.
28.  Learn a Text Manipulation Language.
       You spend a large part of each day working with text. Why not have the computer do some 
       of it for you?
29.  Write Code That Writes Code
       Code generators increase your productivity and help avoid duplication.
30.  You Can't Write Perfect Software
       Software can't be perfect. Protect your code and users from the inevitable errors.


Chapter 4 - Pragmatic Paranoia
31.  Design with Contracts
       Use contracts to document and verify that code does no more and no less than it claims to do.
32.  Crash Early
       A dead program normally does a lot less damage than a crippled one.
33.  Use Assertions to Prevent the Impossible
       Assertions validate your assumptions. Use them to protect your code from an uncertain 
       world.
34.  Use Exceptions for Exceptional Problems
       Exceptions can suffer from all the readability and maintainability problems of classic 
       spaghetti code. Reserve exceptions for exceptional things.
35.  Finish What You Start
       Where possible, the routine or object that allocates a resource should be responsible for 
       deallocating it.
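A few of the Chapter 4 tips translate almost directly into code. The sketch below is my own illustration, not an example from the book: plain `assert` statements stand in for contracts (tips 31 and 33), and a context manager shows the routine that acquires a resource also releasing it (tip 35).

```python
from contextlib import contextmanager


def withdraw(balance, amount):
    """Design with Contracts: state what must hold before and after."""
    # Preconditions: the caller's side of the contract.
    assert amount > 0, "amount must be positive"
    assert amount <= balance, "cannot withdraw more than the balance"
    new_balance = balance - amount
    # Postcondition: what this function promises in return.
    assert 0 <= new_balance < balance
    return new_balance


released = []


@contextmanager
def resource(name):
    """Finish What You Start: whoever acquires the resource also releases it."""
    try:
        yield name
    finally:
        released.append(name)  # runs even if the body raises


with resource("report.txt") as handle:
    remaining = withdraw(100, 30)
```

Keep in mind that Python drops `assert` statements when run with the `-O` flag, so assertions like these document and check assumptions during development rather than replace real input validation.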


Chapter 5 - Bend, or Break
36.  Minimize Coupling Between Modules
       Avoid coupling by writing "shy" code and applying the Law of Demeter.
37.  Configure, Don't Integrate
       Implement technology choices for an application as configuration options, not through
       integration or engineering.
38.  Put Abstractions in Code, Details in Metadata
       Program for the general case, and put the specifics outside the compiled code base.
39.  Analyze Workflow to Improve Concurrency
       Exploit concurrency in your user's workflow.
40.  Design Using Services
       Design in terms of services—independent, concurrent objects behind well-defined, 
       consistent interfaces.
41.  Always Design for Concurrency
       Allow for concurrency, and you'll design cleaner interfaces with fewer assumptions.
42.  Separate Views from Models
       Gain flexibility at low cost by designing your application in terms of models and views.
43.  Use Blackboards to Coordinate Workflow
       Use blackboards to coordinate disparate facts and agents, while maintaining independence 
       and isolation among participants.

Pragmatic Programmer Tips, 1 - 19

A year ago a good friend recommended a great book to me called The Pragmatic Programmer. I ordered it immediately, and as soon as it arrived I read it cover to cover in a few hours. The book is really great: easy to read, not too technical (not that there is anything wrong with technical books), and full of good tips sprinkled throughout.
I immediately put this book next to my favorites. I highly recommend The Pragmatic Programmer to anyone who wants to become a better developer; in fact, even if you're not a software developer you'll get something valuable from it. Anyway, I recently re-read this book, and I thought I'd post some of the tips mentioned in it. There are 70 tips in total, so instead of posting all 70 in one post, I'll break them down and post them in at least three blog entries. I hope they can help anyone looking for a quick online reference for these tips... at the very least they will help me :)

Pragmatic Programmer reference tips: 1 - 19 


Preface
1.  Care About Your Craft
     Why spend your life developing software unless you care about doing it well?
2.  Think About Your Work
     Turn off the autopilot and take control. Constantly critique and appraise your work.


Chapter 1 - A Pragmatic  Philosophy
3.  Provide Options, Don't Make Lame Excuses
     Instead of excuses, provide options.  Don't say it can't be done: explain what can be 
     done.
4.  Don't Live with Broken Windows
     Fix bad designs, wrong decisions, and poor code when you see them. 
5.  Be a Catalyst for Change
     You can't force change on people.  Instead, show them how the future might be 
     and help them participate in creating it.
6.  Remember the Big Picture
     Don't get so engrossed in the details that you forget to check what's happening
     around you.
7.  Make Quality a Requirements Issue
     Involve your users in determining the project's real quality requirements.
8.  Invest Regularly in Your Knowledge Portfolio 
     Make learning a habit.
9.  Critically Analyze What You Read and Hear
     Don't be swayed by vendors, media hype, or dogma. Analyze information in terms 
     of you and your project.
10. It's Both What You Say and the Way You Say It
      There's no point in having great ideas if you don't communicate them effectively.


Chapter 2 - A Pragmatic Approach
11.  DRY–Don't Repeat Yourself: 
       Every piece of knowledge must have a single, unambiguous, authoritative representation 
       within a system.
12.  Make It Easy to Reuse
       If it's easy to reuse, people will. Create an environment that supports reuse.
13.  Eliminate Effects Between Unrelated Things
       Design components that are self-contained, independent, and have a single, well-defined
       purpose.
14.  There Are No Final Decisions
       No decision is cast in stone. Instead, consider each as being written in the sand at the 
       beach, and plan for change.
15.  Use Tracer Bullets to Find the Target
       Tracer bullets let you home in on your target by trying things and seeing how close they
       land.
16.  Prototype to Learn
       Prototyping is a learning experience. Its value lies not in the code you produce, but in 
       the lessons you learn.
17.  Program Close to the Problem Domain
       Design and code in your user's language.
18.  Estimate to Avoid Surprises
       Estimate before you start. You'll spot potential problems up front.
19.  Iterate the Schedule with the Code
       Use experience you gain as you implement to refine the project time scales.

Thursday, October 13, 2011

A short review of cohesion and coupling

Developing good quality software is something that we should always try to achieve. So in this post I'll try to gently review two concepts that will hopefully help us get closer to that goal. Cohesion and coupling are two factors that help to increase reliability, understandability, efficiency, and maintainability within and between classes.

Cohesion measures how strongly related the responsibilities within a class are. Low cohesion is a sign of bad design; high cohesion is the opposite. Low cohesion is when a class has a bunch of unrelated responsibilities. If a class has responsibilities that don't relate to its name, then maybe they don't belong there in the first place. A highly cohesive class has strongly related responsibilities.

Coupling is the measure of how interconnected classes are, that is, how much a class relies on other classes. Tightly coupled classes need to know about the internal details of each other. There are many disadvantages to tightly coupled systems, mainly because a change to one class might force a change in another, which can potentially create a ripple effect across other classes. Tightly coupled classes can also be difficult to unit test because the classes they depend on have to be included. If tightly coupled systems are not ideal, then how do we achieve loose coupling? Classes are loosely coupled when changes in one class rarely or never require changes in the other. One way to achieve loose coupling is to make classes interact with each other through a well-defined and stable interface; after all, they shouldn't be concerned with each other's internal implementation.
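Here is a small sketch of loose coupling through a stable interface, written in Python. All class names are invented for the example: `OrderService` depends only on a tiny `MessageSender` interface, so a real sender can be swapped for a fake one in a unit test without touching `OrderService`.

```python
from typing import Protocol


class MessageSender(Protocol):
    """The small, stable interface that callers depend on."""

    def send(self, recipient: str, text: str) -> None: ...


class EmailSender:
    def send(self, recipient: str, text: str) -> None:
        print("emailing %s: %s" % (recipient, text))


class FakeSender:
    """Test double: records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, recipient: str, text: str) -> None:
        self.sent.append((recipient, text))


class OrderService:
    """Knows only the MessageSender interface, not any concrete sender."""

    def __init__(self, sender: MessageSender) -> None:
        self._sender = sender

    def place_order(self, customer: str, item: str) -> None:
        self._sender.send(customer, "Your order for %s was received" % item)


fake = FakeSender()
OrderService(fake).place_order("jane", "book")
```

Changing how `EmailSender` works internally, or replacing it altogether, requires no change to `OrderService`, which is the practical payoff of coupling to the interface only.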

High cohesion and low coupling tend to lead to a good system design. Thinking about these concepts the next time we find ourselves writing some software will definitely make a difference.

Law of Demeter (LoD)

The LoD can be summarized as: "Each unit should only use a limited set of other units: only units "closely" related to the current unit". In other words, "each unit should only talk to its friends, and not to strangers". The main goal of the LoD is to organize and reduce dependencies between classes. The benefit is loose coupling, because a class only knows about its neighbors.

The more formal definition of the LoD states that "inside of a method M of a class C, data can be accessed in, and messages can be sent to, only the following objects" (namely, method M of an object O may only invoke the methods of the following kinds of objects):

  • Parameters of method M
  • this and super (the object itself)
  • Data fields (data members) of class C
  • Objects created by functions called by M
  • Global Variables
  • Temporary variables created in method M
Let's look at an example below that shows some of the cases listed above.
 public class Demeter { 

      private A a; 
      private B b; 
       
      // Any method of an object should call only methods  
      // belonging to: 
      public void example(B b) { 
           C c = new C(); 
           this.b = b; 
           int num = f();      // itself 
           b.invert();         // passed parameters 
           a = new A(); 
           a.setActive();      // directly held component objects 
           c.print();          // objects created in method  
      } 
       
      public int f() { 
           return 0;           // implementation omitted 
      } 
 }  

The LoD can also be stated as "use only one dot". That means we can do something like foo.someMethod(); however, x.foo.someMethod() breaks the LoD.
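The "one dot" rule is easy to see in a small Python sketch. The classes here are hypothetical: a cashier who needs to be paid asks the customer to pay, instead of reaching through the customer into the wallet.

```python
class Wallet:
    def __init__(self, balance):
        self.balance = balance

    def deduct(self, amount):
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class Customer:
    def __init__(self, wallet):
        self._wallet = wallet  # a directly held component

    def pay(self, amount):
        """Customer talks to its own wallet on the caller's behalf."""
        self._wallet.deduct(amount)

    def funds(self):
        return self._wallet.balance


class Cashier:
    def charge(self, customer, amount):
        # Violates the LoD: customer._wallet.deduct(amount)
        # (two dots, reaching through the customer into a stranger).
        # Respects the LoD: one dot, talking only to the parameter.
        customer.pay(amount)


customer = Customer(Wallet(50))
Cashier().charge(customer, 20)
```

If the customer later pays by card instead of from a wallet, only `Customer` changes; `Cashier` never knew a wallet existed.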