ICT704 Non-Relational Database Systems Essay

Question:

Background

Movie Maniacs is a site which lists top charts of movies and has put an emphasis on follower engagement. They are active on Facebook and Twitter and share the latest updates and news with their followers. However, they have recently decided to create a page where viewers can rate the movies on the list and leave comments for others to see. They have asked you to come up with a database using MongoDB to store their movie lists and allow viewers to easily rate and review the movies.

There are two parts to this assignment. Part A is the creation of the database in MongoDB and Part B is the report.

Part A - Database

  • Create a MongoDB database using the data provided to you in the
  • Insert data from the provided .xlsx file into MongoDB using the insert command
  • Create indexes which you think will be needed and beneficial
  • Create the following queries (all output should be displayed in a formatted way):
    • List all the movies in the collection
    • List the movies that are from Japan
    • List just the directors name(s) for every movie
    • List the distinct names of every director
    • Count the number of movies in the list
    • Return only the movies that have won at least one Oscar
    • List the movies that were released before 1980
    • Return the title and average rating of each movie
    • Return the title of movies that have had no ratings or comments
  • Update the title of movie 6 to “E.T.”
  • Add a new field called notes to the following movies:

    • 12: Terminator and Terminator 2 are rated together
    • 18: The trilogy consists of the three movies

Part B - Report

For the report you are required to explain the structure of the database you created. This includes justifying the indexes you created. You need to describe how the relationships were handled in the database. In your report, discuss potential alternatives to how the relationships could have been modeled and implemented in MongoDB and the benefits/issues of each. Provide recommendations to Movie Maniacs for any additional functionality for the database.

Answer:

Structure of the Database:

Our database has a single collection, Movies. Each movie document has a zero-to-many relationship with its ratings, so a movie may or may not have ratings. Ratings are not stored in a separate collection; we used the embedded relationship approach, so each rating is stored as an embedded document inside its movie document. Another important property of MongoDB documents is that missing values do not have to be stored as nulls, as they would be in a relational database. If an attribute exists for one movie but not for another, the field is simply omitted from the documents that have no value for it. In the embedded relationship, a movie document has an array named Rating. This field exists only for movies that users have rated. It is an array of rating documents and can hold any number of ratings for that movie, subject to the overall document size limit.

Eg.

db.Movies.insert({
  "MovieID": "11",
  "MovieName": "King Kong",
  "Director": "Ernest B Schoedsack, Merian C Cooper",
  "Leading actors": "Bruce Cabot, Ernest B Schoedsack, Fay Wray, Frank Reicher, James Flavin, John Armstrong, Noble Jhonson, Robert Armstrong",
  "ReleaseDate": "1933",
  "Country": "USA",
  "Rating": [
    {"MovieID": "11", "ReviewedBy": "Mitch", "Date": "6/9/18", "Rating": 5, "Comments": "Was ok, could have been better"},
    {"MovieID": "11", "ReviewedBy": "Matt", "Date": "6/7/18", "Rating": 9, "Comments": "Brilliant"}
  ]
});

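Here, the movie has been rated by two users, so the Rating field contains an array of two ratings. Each element of the array is a separate embedded document. Unlike top-level documents, embedded documents are not automatically given their own _id by MongoDB; only the movie document receives one.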
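The effect of an optional embedded array on read logic can be sketched in plain JavaScript. This is an in-memory illustration under stated assumptions, not the queries submitted for Part A: the second movie (ID "99") and the helper names averageRating and unratedTitles are hypothetical.

```javascript
// Two movie documents shaped like the example above. The second one is a
// hypothetical record with no "Rating" field at all, i.e. no reviews yet.
const movies = [
  {
    MovieID: "11",
    MovieName: "King Kong",
    Rating: [
      { ReviewedBy: "Mitch", Date: "6/9/18", Rating: 5, Comments: "Was ok, could have been better" },
      { ReviewedBy: "Matt", Date: "6/7/18", Rating: 9, Comments: "Brilliant" },
    ],
  },
  { MovieID: "99", MovieName: "Hypothetical Unrated Movie" },
];

// Average of the embedded ratings, or null when the array is missing or empty.
function averageRating(movie) {
  const ratings = movie.Rating || [];
  if (ratings.length === 0) return null;
  const total = ratings.reduce((sum, r) => sum + r.Rating, 0);
  return total / ratings.length;
}

// Titles of movies that have no embedded ratings.
function unratedTitles(list) {
  return list.filter((m) => averageRating(m) === null).map((m) => m.MovieName);
}

console.log(movies.map((m) => ({ title: m.MovieName, average: averageRating(m) })));
console.log(unratedTitles(movies));
```

Because the Rating field may be absent, any code that reads it has to treat "no field" and "empty array" the same way, which is what the `|| []` fallback does here.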

Indexes created for this collection: -

  1. db.Movies.createIndex({"MovieID":1});

The first index is created on MovieID. The value 1 specifies ascending order. Many of our queries look movies up by MovieID, so an index on this field reduces the time those queries take to return results.

  2. db.Movies.createIndex({"MovieName":1});

The second index is created on MovieName, again in ascending order. After the ID, the name is the most commonly used field of a movie, and many searches are made by name. We therefore chose MovieName as the second indexed field to make those searches faster.
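Why an ordered index speeds up lookups can be illustrated with a toy model. The sketch below is conceptual only: it uses a sorted array and binary search rather than MongoDB's actual index structures, and the sample IDs and the function names scanFind, buildIndex and indexFind are assumptions made for the illustration.

```javascript
// Hypothetical MovieID values, stored as strings like in the example document.
const docs = ["7", "11", "3", "18", "12", "6"].map((id) => ({
  MovieID: id,
  MovieName: "Movie " + id,
}));

// Without an index: examine documents one by one (a full collection scan).
function scanFind(collection, movieId) {
  let examined = 0;
  for (const doc of collection) {
    examined += 1;
    if (doc.MovieID === movieId) return { doc, examined };
  }
  return { doc: null, examined };
}

// A toy ascending "index": keys sorted once, each entry pointing at its document.
// The keys are strings, so they sort lexicographically ("11" comes before "3").
function buildIndex(collection) {
  return collection
    .map((doc) => ({ key: doc.MovieID, doc }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// With the index: binary search over the sorted keys.
function indexFind(index, movieId) {
  let lo = 0;
  let hi = index.length - 1;
  let examined = 0;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    examined += 1;
    if (index[mid].key === movieId) return { doc: index[mid].doc, examined };
    if (index[mid].key < movieId) lo = mid + 1;
    else hi = mid - 1;
  }
  return { doc: null, examined };
}

const index = buildIndex(docs);
console.log(scanFind(docs, "6").examined, indexFind(index, "6").examined);
```

The gap between the two counts widens as the collection grows, which is the motivation for indexing the fields that queries filter on most often.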

Handling of relationships in database: -

We used the embedded relationship approach in our database. The Movies collection has a zero-to-many relationship with ratings, so each movie can have zero or any number of ratings associated with it. Because the relationship is embedded, each movie's ratings are stored inside the movie document itself, in an array named Rating; only movies that have been rated have this array. In an embedded relationship, a child (embedded) document belongs to one and only one parent document, so each rating is associated with exactly one movie.
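How a new rating joins an embedded array can be sketched as follows. This is a plain JavaScript model of the document, not the update code used in the assignment; the reviewer names, dates and the addRating helper are hypothetical.

```javascript
// Append a rating to a movie document, creating the "Rating" array on first use.
// This roughly mirrors what the $push update operator does, e.g. in the shell:
//   db.Movies.updateOne({ MovieID: "6" }, { $push: { Rating: newRating } })
function addRating(movie, rating) {
  if (!Array.isArray(movie.Rating)) {
    movie.Rating = []; // first rating for this movie
  }
  movie.Rating.push(rating);
  return movie;
}

// A movie with no ratings yet: the Rating field does not exist at all.
const movie = { MovieID: "6", MovieName: "E.T." };

addRating(movie, { ReviewedBy: "Sam", Date: "7/1/18", Rating: 8, Comments: "Hypothetical review" });
addRating(movie, { ReviewedBy: "Alex", Date: "7/2/18", Rating: 6, Comments: "Another hypothetical review" });

console.log(movie.Rating.length);
```

The rating never exists outside its movie, which is what makes the relationship one-parent-only.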

Potential alternatives to how the relationships could have been modelled and implemented in MongoDB: -

  1. Embedded Relationship:

We used this approach in our database implementation. In this approach, one document is embedded inside another. A document can contain zero or any number of embedded documents. There is no fixed limit on the number of embedded documents, but the total size of a single document in MongoDB cannot exceed 16 MB.

Benefits of using Embedded Approach: -

  1. Direct association between documents. Once we have the main document, we already have the documents embedded inside it.
  2. A single query on the main document retrieves all of its embedded documents.
  3. Retrieval is more efficient, because no second query is needed to fetch the related documents.

Drawbacks of using Embedded Approach: -

  1. The size of the main document grows with every embedded document, because each embedded document is an integral part of it, and MongoDB limits a single document to 16 MB.
  2. An embedded document can be associated with only one main document at a time. If the same data is needed in more than one document, a copy has to be stored in each.
  3. Embedded documents have no separate life cycle. They exist only as long as their parent does: deleting a parent document deletes all of its embedded documents with it.
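The size drawback can be made concrete with a rough estimate. The sketch below measures the JSON text of a document as a stand-in for its stored size; BSON encodes values differently, so the numbers are approximate, and the helper names approxBytes and estimateRemainingRatings are assumptions for this illustration.

```javascript
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024; // MongoDB's 16 MB document limit

// Rough size estimate: UTF-8 length of the JSON text of a value.
// This only approximates the BSON size that MongoDB actually enforces.
function approxBytes(value) {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

// Estimate how many more ratings of a typical size fit before the limit.
function estimateRemainingRatings(movie, sampleRating) {
  const remaining = MAX_DOCUMENT_BYTES - approxBytes(movie);
  return Math.max(0, Math.floor(remaining / approxBytes(sampleRating)));
}

const sampleRating = { MovieID: "11", ReviewedBy: "Matt", Date: "6/7/18", Rating: 9, Comments: "Brilliant" };
const kingKong = { MovieID: "11", MovieName: "King Kong", Rating: [sampleRating] };

console.log(approxBytes(kingKong), estimateRemainingRatings(kingKong, sampleRating));
```

For short reviews the limit is far away, but it becomes relevant if comments are long or a popular movie collects a very large number of ratings.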

  2. Referenced Relationships:

We can also use a reference to a document to create a relationship. Rather than embedding the child document in the parent document, we store the child as a standalone document and add a reference to it in the parent document. This reduces the size of the parent document, because the parent stores only the reference and each child document has its own size.

Benefits of using Referenced Relationship: -

  1. The parent and child documents are only loosely coupled, because the parent stores just a reference. We can delete the parent document without deleting the child, which continues to exist as a separate document.
  2. Each child document has its own size and does not add to the size of the parent beyond the stored reference. Since MongoDB limits a document to 16 MB, this helps keep the parent document within the limit.
  3. Because child documents are separate documents, the same child can be referenced from more than one parent.

Drawbacks of using Referenced Relationship:

  1. Querying referenced data is less efficient. Fetching a parent together with its children takes more than one step: we first read the references and then fetch the child documents they point to, either with additional queries or with a $lookup stage in an aggregation pipeline.
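The extra step can be seen in a small in-memory model of the referenced design. Note one assumption: this sketch keeps the reference on the child side (each rating carries its movie's MovieID, as the example ratings already do) rather than storing a list of rating references in the movie. The collection and function names are illustrative only.

```javascript
// Referenced model: ratings live in their own collection and point back to
// their movie through MovieID. Names and values here are illustrative.
const moviesCollection = [
  { MovieID: "11", MovieName: "King Kong" },
  { MovieID: "99", MovieName: "Hypothetical Unrated Movie" },
];
const ratingsCollection = [
  { MovieID: "11", ReviewedBy: "Mitch", Rating: 5 },
  { MovieID: "11", ReviewedBy: "Matt", Rating: 9 },
];

let lookups = 0; // counts simulated round trips to the database

function findMovie(movieId) {
  lookups += 1;
  return moviesCollection.find((m) => m.MovieID === movieId) || null;
}

function findRatings(movieId) {
  lookups += 1;
  return ratingsCollection.filter((r) => r.MovieID === movieId);
}

// Assembling a movie with its ratings needs two lookups instead of one.
function movieWithRatings(movieId) {
  const movie = findMovie(movieId);
  if (movie === null) return null;
  return { ...movie, Rating: findRatings(movieId) };
}

console.log(movieWithRatings("11"), "lookups so far:", lookups);
```

With the embedded design the same result comes back from a single read of the movie document, which is the trade-off weighed above.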

Recommendations to Movie Maniacs for any additional functionality for the database:

We can store additional information in the database, such as:

  1. Link to movie trailers
  2. Information about this movie's sequels
  3. Cast information

