PELN: Plaintext Electronic Lab Notebook

I currently keep “lab notebooks” on my computers, both at home and at work. I prefer to use plaintext ASCII format, and I check my notebook into my version control software (VCS). My lab notebooks are simple running logs, in reverse-chronological order, and I manually timestamp each entry.

I usually view lab notes as permanent records. I try not to go back and modify entries. If I do, I usually add a timestamped note about the changes, and try to remember to check in (to VCS) both the “before” and “after” versions (though if it’s an addition w/o deletion/modification, the “before” isn’t necessary).

However, manual timestamping, manual VCS check in/out, and multiple logs get burdensome. I’d like an automated solution.

Apparently there are some electronic lab notebook (ELN) software packages available, but I haven’t found any that are both free and simple. Some are online-only or require a rather large server installation. Others are expensive professional collaboration packages. I’d really like an ASCII plaintext package with good version tracking and automated timestamping, but not much else.

So, while I continue to scour the web for an existing application, I’m also going to look into creating my own ELN, though I’m not committing to it. It does seem like a good learning project, though.

To that end, if I were going to create an ELN, what features would I expect, and how would it be implemented?

The minimal set of features is:

(1) Automated timestamps.
(2) Automatic tracking of changes to entries, with timestamps (implies viewing history and diffs).
(3) ASCII only.
(4) Use a standard text editor (my pick: Notepad++).
(5) Text search.

To implement something like that, one option is to use VCS as a backend, but I have some reservations about that, plus I’d like to work out a database (dB) alternative to VCS because it might be useful for asset-pipeline projects in the future. I’ll start with a dB, and SQLite is my first choice at the moment since I only need this to work locally on my PC/laptop; I’ll worry about client/server dBs later.

I’m not certain what I want to do for the GUI at the moment, though maybe I can start with a CLI. The GUI can be built later and “shell out” to the CLI to perform its functions (the Linux way, so to speak). Even without a GUI, I’ll be able to visually browse the database with the SQLite browser (as limiting as that may be).

The first order of business would be to create a content addressable store (CAS) which records each lab entry, along with a timestamp.

Initially, version tracking isn’t even necessary. Each entry could be recorded when it is complete. If another version of the “same” entry is recorded later, it just becomes another lab notebook entry.

In any case, the CAS needs to record all lab entries and versions as separate entities. It is not concerned with versions, other than as a possible optimization (i.e. store back-diffs instead of entire entries). However, storage optimization will come later if at all (plaintext is not large by today’s standards).

A question regarding the CAS is whether to store data in blobs or as separate files. Considering that most of the data is not expected to be excessively large, I think blobs may be the way to go, at least for the initial implementation.

Actually, it’s looking like this project would be a lot simpler to begin than I imagined. A tier 0 implementation could be as simple as two CLI commands: (i) create a new lab notebook dB, (ii) record a text file as an entry.

And, strictly speaking, I don’t really need a command to create a new lab notebook — it could be created manually, or the record command could create the notebook dB on first use.
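As a rough sketch of what that tier 0 “record” command might look like, here is a single Python function built on the standard sqlite3 module. The table name (entries), the column names, and the function name record_entry are all placeholders I made up for illustration; the notebook dB is created on first use simply because sqlite3.connect creates a missing file.

```python
import sqlite3
from datetime import datetime, timezone


def record_entry(db_path, text_path):
    """Record the contents of a text file as a new, timestamped entry.

    Returns the row id of the new entry. The notebook file and its
    single table are created on first use.
    """
    # ASCII only: a non-ASCII file raises UnicodeDecodeError here.
    with open(text_path, "r", encoding="ascii") as f:
        body = f.read()

    recorded_at = datetime.now(timezone.utc).isoformat()

    conn = sqlite3.connect(db_path)  # creates the file if it does not exist
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS entries ("
                " id INTEGER PRIMARY KEY,"
                " recorded_at TEXT NOT NULL,"
                " body BLOB NOT NULL)"
            )
            cur = conn.execute(
                "INSERT INTO entries (recorded_at, body) VALUES (?, ?)",
                (recorded_at, body.encode("ascii")),
            )
            return cur.lastrowid
    finally:
        conn.close()
```

Calling record_entry("notebook.db", "entry.txt") twice on the same file would simply produce two separate entries, which matches the idea that a later version of the “same” entry is just another entry.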

Recording text files as entries is still somewhat cumbersome (plus I would like to have 2 timestamps: one when the entry is begun and another when it is recorded, though the recording timestamp is more important).

Still, I can probably do a lot with that, for example, write a batch script which creates a file, opens it in a new Notepad++ window (with the command line options “-multiInst” and “-nosession”), then records the text file entry after Notepad++ exits.
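The same idea could be written in Python instead of a batch script. The sketch below is only a guess at how it might work: the Notepad++ install path is an assumption (adjust it for the local machine), the function and table names are hypothetical, and it relies on the editor process staying alive until its window is closed. It keeps both timestamps, one taken before the editor opens and one when the entry is recorded.

```python
import os
import sqlite3
import subprocess
import tempfile
from datetime import datetime, timezone

# Assumed install location; the two flags are the ones mentioned above.
NOTEPAD_CMD = [r"C:\Program Files\Notepad++\notepad++.exe",
               "-multiInst", "-nosession"]


def now():
    return datetime.now(timezone.utc).isoformat()


def edit_and_record(db_path, editor_cmd=None):
    """Open a scratch file in an editor, then record it once the editor exits.

    Returns the new row id, or None if the file was left empty.
    """
    cmd = list(editor_cmd or NOTEPAD_CMD)
    begun_at = now()
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # Blocks until the editor process exits.
        subprocess.run(cmd + [path], check=True)
        with open(path, "rb") as f:
            body = f.read()
    finally:
        os.remove(path)

    if not body.strip():
        return None  # nothing was written, so nothing is recorded

    conn = sqlite3.connect(db_path)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS entries ("
                " id INTEGER PRIMARY KEY,"
                " begun_at TEXT NOT NULL,"
                " recorded_at TEXT NOT NULL,"
                " body BLOB NOT NULL)"
            )
            cur = conn.execute(
                "INSERT INTO entries (begun_at, recorded_at, body)"
                " VALUES (?, ?, ?)",
                (begun_at, now(), body),
            )
            return cur.lastrowid
    finally:
        conn.close()
```

Passing a different editor_cmd list makes it possible to swap in another editor without touching the rest of the function.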

Any persistent state which is necessary can be recorded in the dB. This does not appear to be necessary for the proposed tier 0 implementation, but would be handy should entries be opened and later closed, or if version-on-save were added. The CLI could check what files were open and could include an option to record all currently open entries.
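To make the “record all currently open entries” idea a bit more concrete, here is one way the state might be kept in the dB. Everything here is hypothetical: the open_entries table, its columns, and the function names are just my illustration of the idea, not a settled design.

```python
import sqlite3
from datetime import datetime, timezone


def now():
    return datetime.now(timezone.utc).isoformat(timespec="microseconds")


def connect(db_path):
    """Open the notebook, creating both tables if they are missing."""
    conn = sqlite3.connect(db_path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS entries (
            id          INTEGER PRIMARY KEY,
            begun_at    TEXT NOT NULL,
            recorded_at TEXT NOT NULL,
            body        BLOB NOT NULL);
        CREATE TABLE IF NOT EXISTS open_entries (
            path     TEXT PRIMARY KEY,
            begun_at TEXT NOT NULL);
    """)
    return conn


def mark_open(conn, path):
    """Remember that a file has been opened as an entry in progress."""
    with conn:
        conn.execute("INSERT INTO open_entries VALUES (?, ?)", (path, now()))


def record_all_open(conn):
    """Record every open entry and clear it from the open list."""
    rows = conn.execute(
        "SELECT path, begun_at FROM open_entries ORDER BY begun_at"
    ).fetchall()
    for path, begun_at in rows:
        with open(path, "rb") as f:
            body = f.read()
        with conn:  # one transaction per entry
            conn.execute(
                "INSERT INTO entries (begun_at, recorded_at, body)"
                " VALUES (?, ?, ?)",
                (begun_at, now(), body),
            )
            conn.execute("DELETE FROM open_entries WHERE path = ?", (path,))
    return len(rows)
```

Because each entry is recorded in its own transaction, a failure partway through leaves the remaining files still listed as open.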

What would the dB tables look like?

I would need the CAS table, which would probably be: an ID, a timestamp, and a blob. Records are added and never deleted. That reminds me that I should review SQL constraints and other features so I write the dB (more-or-less) correctly this time. Actually, it might be better to eliminate the ID and use the timestamp as the key (though I need to make sure no two entries ever have the same timestamp, e.g. a previously mentioned option to record all currently open entries would need to wait between each recording).
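Here is a sketch of how the timestamp-as-key variant might look in SQLite, again via Python. The table name (cas), the trigger names, and the retry count are my own assumptions. The “never deleted” rule is enforced with two triggers, and a timestamp collision is handled by waiting briefly and trying again rather than by a fixed delay between recordings.

```python
import sqlite3
import time
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS cas (
    recorded_at TEXT NOT NULL PRIMARY KEY,  -- ISO 8601, UTC, microseconds
    body        BLOB NOT NULL
);
-- Records are added and never changed or deleted.
CREATE TRIGGER IF NOT EXISTS cas_no_update BEFORE UPDATE ON cas
BEGIN SELECT RAISE(ABORT, 'cas is append-only'); END;
CREATE TRIGGER IF NOT EXISTS cas_no_delete BEFORE DELETE ON cas
BEGIN SELECT RAISE(ABORT, 'cas is append-only'); END;
"""


def open_notebook(db_path):
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def record(conn, body, retries=5):
    """Insert one entry keyed by its timestamp; returns the timestamp used."""
    for _ in range(retries):
        # A fixed-width timestamp keeps text order equal to time order.
        stamp = datetime.now(timezone.utc).isoformat(timespec="microseconds")
        try:
            with conn:
                conn.execute("INSERT INTO cas VALUES (?, ?)", (stamp, body))
            return stamp
        except sqlite3.IntegrityError:
            # Same timestamp as an existing entry: wait and try again.
            time.sleep(0.001)
    raise RuntimeError("could not obtain a unique timestamp")
```

The explicit NOT NULL on the key is deliberate: SQLite allows NULL in a non-integer PRIMARY KEY column unless told otherwise.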

I was thinking about whether the timestamp should be stored separately, but it is so tightly bound to the recorded blob that it seems it should be maintained in the same table. This seems in keeping with 3NF.

That means the first implementation is just one dB table! Wow, this is getting easier all the time. I’m definitely going to give it a try.
