Data Deletion


This is more of an issue discussion. Data deletion is arguably one of those simplest concepts that gives database administrators and software engineers a headache. The problem is not a matter of how fast you can type "Delete *; it is whether a computer system should really delete data, as in make every byte available to be overwritten for any subsequent data.

The Front: From User's Point of View

When a user deals with a set amount of data, it is often a case that the user does not want to deal with a subset of the data, hence 'removing' them from the system and not to deal with them in any time in the future. That would be an ideal case, but what happens if the user removes a data subset accidentally?
There are tools nowadays that we can use to circumvent this incident. For a local file system, there are a number of un-(delete/remove) recovery tools available on the web to retrieve all of the removed data subset; or go to the (recycle bin/trash) and do a right-click restore.
For a local or remote RDBMS, a sufficient amount of firepower is needed to recover deleted data, most of the times it takes up a day or more to recover data from a backup server, provided that there is a backup server. In rare cases, the data can be lost beyond recovery.

The End: From Software Engineer's Point of View

Be it a database administrator or a simpleton software engineer, deleting data from a table in the database can be done easily, and this is dangerous. A computer system can remove a row from a table in the database by a set of simple statements.
The question is, do we really want to delete those (seemingly) unwanted rows? The answer obviously varies from one system to another. The determining factors may include, but not limited to: hardware resources, data sensitivity, project scope, and stakeholder requirements.

The Resolution: Do Not Delete Anything

This is not an absolute solution to this deletion issue. Not deleting anything in the database can help the recovery process. Does this mean that the user will always have to deal with 'junk' data? No. The strategy can be broken down as follows:

User Interface

  • Users can still mark data as deleted and remove it from their interface so they will not deal with them anymore.
  • The next time the user loads up the set of data, the rows that are marked as removed will not be queried.
  • The interface shall have a feature of recovering lost data; rows that are marked as removed are queried and recovered.

Database Control

Employ a data removal strategy. The strategy includes having a data field to deal with marking a subset of data as removed and unmark it to set them as recovered. An example of this strategy is as follows:

  • Add a boolean field on a data table called "Delete_Flag" defaulted to false. When a row needs to be removed, set the field value to true.
  • Add a nullable datetime field on the same data table called "Date_Deleted". When a row is removed, set it to the current date time.
  • Select and Update procedures exclude deleted data.
  • Delete procedure sets "Delete_Flag" field to true and "Date_Deleted" to current date, while no delete row function is called.
  • Recovery procedure sets "Delete_Flag" field to false and "Date_Deleted" to null.
  • Actual data deletion procedure (removing rows from database) can be done on scheduled basis, only when necessary.
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License