Sort by relevance (last access date)

Thomas Lohrum 11 years ago updated by Christof Deininger 9 years ago 18

I suggest adding support for sorting search results by relevance. As an indication of relevance, a last access date should be added to each note. Every time the note is read, its relevance increases.

notes-list searching


Closed due to inability to collect 10 votes for more than 2 years.

Since adding a column to the notes table itself is a bad idea, a separate table should handle the job. The table should include the note's id, the last access date and a usage count. The usage count(er) should reflect the number of updates and reads. This generates automatic favorites, which don't need to be maintained by hand.
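The separate table described above could look roughly like the following. This is only a sketch, assuming an SQLite-style database accessed through Python's `sqlite3` module; the table name `note_usage`, its column names, and the helper functions are hypothetical and not taken from CintaNotes.

```python
import sqlite3
import time


def create_usage_table(conn):
    # Hypothetical table: one row per note, kept apart from the notes table.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS note_usage (
               note_id     INTEGER PRIMARY KEY,
               last_access INTEGER NOT NULL,            -- Unix timestamp (assumption)
               usage_count INTEGER NOT NULL DEFAULT 0   -- reads plus updates
           )"""
    )


def record_access(conn, note_id, now=None):
    """Bump the counter and refresh the last access date for one note."""
    now = int(time.time()) if now is None else now
    cur = conn.execute(
        "UPDATE note_usage SET last_access = ?, usage_count = usage_count + 1 "
        "WHERE note_id = ?",
        (now, note_id),
    )
    if cur.rowcount == 0:
        # First access of this note: create its row.
        conn.execute(
            "INSERT INTO note_usage (note_id, last_access, usage_count) "
            "VALUES (?, ?, 1)",
            (note_id, now),
        )
    conn.commit()
```

Calling `record_access` on every read and every update of a note is what would produce the "automatic favorites" effect.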

The idea can be made even more advanced. Create a field "Relevance_Value", which is calculated from the last access date, represented by its integer value, plus the usage counter. Sorting by the Relevance_Value should give a smart and unique result set.
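A minimal sketch of that calculation, under one loud assumption: the "integer value" of the date is taken to be its day number (Python's `date.toordinal()`), so one extra access weighs as much as one day of recency. The function names are hypothetical.

```python
from datetime import date


def relevance_value(last_access: date, usage_count: int) -> int:
    # Assumption: the date's integer value is its day ordinal.
    return last_access.toordinal() + usage_count


def sort_by_relevance(notes):
    """notes: iterable of (note_id, last_access, usage_count) tuples.

    Returns the tuples ordered from most to least relevant.
    """
    # The note id is used as a tie-breaker, because two different notes
    # can end up with the same sum of date and counter.
    return sorted(
        notes,
        key=lambda n: (relevance_value(n[1], n[2]), n[0]),
        reverse=True,
    )
```

Since the sum alone can collide for different notes, a tie-breaker such as the note id is needed if the ordering really has to be unique.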

Thanks for the suggestion, Thomas!

I think it is a useful thing to have. The only problem that I see is that just reading the DB will change the db file - very bad for Dropbox syncing and R/O files support. We'll probably have to store this data somewhere else.

As for R/O files, I don't know. How is this feature available with CN? As for Dropbox, if only reading is all you did, there is no need for syncing, right? The problem could be that users don't understand the difference, namely that changes are written when data is read. You are right, there are drawbacks, but how many users will be affected? Could the feature be optional?

> if only reading is all you did, there is no need for syncing, right

Yes. But Dropbox will start syncing, because somewhere in the DB the "relevance" counter has been increased. This can lead to conflicts.

>but how many users will be affected? Could the feature be optional?

Pretty much everyone who uses some kind of file-based syncing. It is not only Dropbox, but also SugarSync, SpiderOak and a bunch of other services. Introducing another option - this is a road to bloatware and complexware which I want to avoid at all costs.

Quote: But Dropbox will start syncing, because somewhere in the DB the "relevance" counter has been increased. This can lead to conflicts.
How often will it happen that syncing starts because of a read (updating relevance_value)? What about the following issue in today's solution? The user edits and saves a note, thus syncing starts in the background. A minute later the user edits and saves a second note. Will syncing be restarted? Otherwise the integrity of the copied database file cannot be assured.

Yes, it will be restarted when CintaNotes releases the file lock - which happens after 30 seconds of inactivity.

Quote: I think it is a useful thing to have.
The idea comes from observing how I work. When working on project data, updates occur occasionally. Other notes of the project need to be referenced (read) sometimes. This made me aware of the need for a "usage counter". In combination with the last access date this could complement CintaNotes' powerful search and organization features with a smart ordering of relevant notes. This is something CintaNotes lacks so far. Sorting by relevance could actually bring up important notes over recently changed notes, which is what distinguishes it from sorting by modification date.

I guess this can be done by introducing an extra database, which would serve as a container for cached data and statistics. Actually some of the cached data could be moved there out of the current .db files (the NoteCache and TagCache tables, for example), so the size of the .db files could be reduced. The cache database could be stored e.g. in cintanotes\cache\{notebook-uid}.cache.db.

Also viewstate settings could be moved there from the settings file.
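To illustrate the extra-database idea, here is a rough sketch assuming the notebooks are SQLite files (SQLite can attach a second database file to a connection and join across both). The `notes` table with an `id` column, the `note_usage` table, and the function names are hypothetical; the real CintaNotes schema is not shown in this thread.

```python
import sqlite3


def open_with_cache(main_path, cache_path):
    """Open the notebook and attach its per-notebook cache file."""
    conn = sqlite3.connect(main_path)
    # Usage statistics live only in the attached file, so recording a read
    # does not modify the main .db file.
    conn.execute("ATTACH DATABASE ? AS cache", (cache_path,))
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cache.note_usage (
               note_id     INTEGER PRIMARY KEY,
               last_access INTEGER NOT NULL,
               usage_count INTEGER NOT NULL
           )"""
    )
    return conn


def notes_by_relevance(conn):
    # Notes without a usage row (e.g. after restoring a main db) sort last.
    return conn.execute(
        """SELECT n.id
           FROM notes AS n
           LEFT JOIN cache.note_usage AS u ON u.note_id = n.id
           ORDER BY COALESCE(u.last_access, 0) + COALESCE(u.usage_count, 0) DESC,
                    n.id"""
    ).fetchall()
```

With a `LEFT JOIN`, a missing or outdated cache file only costs the ordering information; the notes themselves are still listed.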

I disagree. Both the last access date and the usage counter, which make up the relevance_value, cannot be rebuilt from other data. It is not the sort of cached data which can be rebuilt at any time. Another great disadvantage is the dependency between the database cache file and the one-to-n database files. How do you update the cache when I restore a database file? Will it fail on database uids? Does it make things more complicated for the portable version? It sounds interesting to remove true cache information from the actual notes database. On the other hand, as things are now, the database file is always complete. No need to worry about any dependencies.

You have a point there with the ability to rebuild as a criterion. However, I still think that data which can change on read operations has no place in the db file itself. Anyone who got the "do you want to save changes?" prompt from Excel when he in fact hasn't changed anything would agree here.

The data is already somewhere else! In fact it is available in the backup file! Dropbox syncing and the like can be configured to sync with the hourly backup file. This avoids resyncing problems when the database is changed repeatedly within a period that is most likely shorter than the time it takes to sync the file with the server. Overlaps can be avoided, thus integrity of the database is assured. This supports syncing from the client to the server. I don't know how syncing back from the server to the client is handled right now. That's an issue that needs to be checked. An enhancement could be to make the period of the "hourly" backup configurable. For example, when a user wants to sync more often, he can configure a value of 30 minutes. The backup's filename would be cintanotes.30.db. This is just an extended suggestion, it is not mandatory. The main idea is to sync with the backup file! This should remove the shortcomings discussed in this thread.

CN would need to be changed, and drastically:

1) it would need to monitor when this backup file is changed by Dropbox

2) it would need to know that it was indeed changed by Dropbox and not by itself

3) since the backup is a lengthy background process, the probability of conflicts increases

4) on conflicts, Dropbox renames the conflicted file and CN would need to do something about it

5) when Dropbox updates the backup file, CN would need to read the changes back into the main file, which is also a new functionality that needs to be implemented.

In short, this creates many more problems than it solves, sorry)

I don't like the idea of a central (cache) database. It can cause some trouble. For example, with SQL Server I found that I could not access my databases when one of the central databases, e.g. Master.mdf, had a problem. What I would like most is to have the relevance_value inside my db, but I understand the shortcomings. How about the following summary: introduce CintaCache.db, a second db file that contains additional data for the main notes database. The file will contain a new table to store the last access date and usage counter for each note. Together they make up the relevance_value. (Remark: While writing this I realize the problem with this separate db approach: the relevance_value is needed in the main db to perform the order clause.) There will be one cache db for every main db. Because the db contains data that cannot be recreated, the cache file needs to be part of the backup operation. Last access date, usage counter and relevance value need to be added to the XML export. The file will NOT be part of the syncing process, because of the discussed issues. (Remark: After a client has received a new main db from a sync server, how can its local cache db be updated so that integrity between the main and the cache db is maintained?) The whole matter of not being able to store the data in the main table kills my original idea. IMO the relevance_value can really make sorting more efficient. It is a pain that it cannot be achieved because of the requirement that it must not interfere with (Dropbox) syncing, a feature I personally don't use. Would it really be so bad if syncing were triggered by a read (which causes internal updates)? Maybe a true update happens a minute later, making this entire discussion meaningless.

Well you have a point here, having one cache .db for all notebooks will certainly have its shortcomings. But we are already discussing technical details. Let's get back to this discussion when this issue gets to the top of the implementation list, because when it does, some of the circumstances may change and we'll need to discuss it again anyway.
