trfit - numerical fitting of time-resolved fluorescence decays

trfit is a program for fitting time-domain (Time Correlated Single Photon Counting, TCSPC) fluorescence data with various mathematical models. This page should help you to get up to speed with it. One caveat upfront: trfit is command-line driven, which means that you need to drop your mouse and type commands like

trfit mode=fit data=*.dat model=exp1 tau=2 izero=4

into a console window (on Windows this is called a command prompt). This may take some getting used to, but is in some ways more effective than a graphical user interface.


Features:


Installation

1. Install the prerequisites

trfit requires the following pieces of software:

These are not included in trfit, so you have to install them yourself. The installation procedure depends on your system.

Windows: If you haven't used Python previously, Python most likely isn't installed. Download the installers from the above websites and run them in sequence. Take care to match the versions of all libraries to that of the main Python version. As of July 2009, the main version of Python is 2.6. The matplotlib site offers an '.egg' installer up front - avoid this foul egg and instead click your way to the .exe installer.

After running the installers, open a command prompt (DOS) window and type python <enter>. If you see some message and a >>> prompt, Python started; if you get an error message instead, you have to add Python's root directory (c:\Python26 or similar by default) to the system's path: Control panel -> advanced -> environment variables, PATH -> edit: Append ;c:\Python26 and save. Close and re-open the DOS window and try again.

As an alternative to the stepwise installation, you might consider the Enthought or the Python(x,y) distributions, which include Python and all prerequisites in one single installer, along with huge amounts of other stuff you may never need. I have not tried either myself.

Linux: On most Linux distributions, Python will already be installed, and the libraries should be just a few mouse clicks away in your package manager. (Debian Linux has lots of software for scientific computing. Try it.)

Mac: I have no Mac and therefore I won't give any hypothetical instructions here. However, I have been told that trfit worked on a Python 2.6 install on a Mac.

2. Install trfit

On both Linux and Windows: Download the trfit zip file, save it in a convenient location and unzip it. Open a console window and change into the directory trfit-1.1 that you just created by unzipping. Type python setup.py install and hit Enter. Note that this step usually requires administrator privileges. cd back to your home directory and try to run trfit. If you see a short blurb describing the program, everything should work now. If you get an error instead, try appending the trfit directory to the system path (see above for Windows; Linux users are supposed to know how to do it). If it still doesn't work, the installation has failed, possibly because of a permission problem. Become root (Linux) or log on as a user with administrator privileges (Windows) and try the installation again.

Installation reportedly has succeeded on Macs, too, but I have not done it myself.


Tutorial

This tutorial assumes that you are already familiar with the TCSPC technique. Just to make certain that we are on the same page with respect to terminology:

Decay
A data series of an actual TCSPC experiment. The file containing this data series may also contain the associated instrument response function.
Instrument response function (IRF)
A data series, usually obtained with a light scatterer, that is used for correcting fluorescence decays in numerical fitting routines. This data series can either reside in a separate file or within the same file as the decay.
Model
A mathematical function that theoretically describes a decay. The main purpose of trfit is to optimize the parameters of a model to fit experimental decays, and to report the results of such fits in tabulated and in graphical form.
Report
A format for displaying the results of a fit. It will typically contain the fitted values of the parameters of the model used in the fit. It may also contain some information calculated from these parameters.

Now, let's get started.

The directory that you created when unzipping trfit also contains an example directory. Inside this directory, you will find a couple of actual experimental fluorescence decays, all named using the extension dat, another data file (named irf) that contains the matching instrument response function, and a couple of command files. We will use these to quickly illustrate the program's functionality.

  1. Open a console window, cd to the example directory, and run the command trfit cmd=showfiles. After a few moments, a window should open that displays one decay with the corresponding IRF. Also shown are the baselines for both curves (or, more precisely, what trfit assumes to be the baselines).

    The command you just typed caused trfit to read the instructions contained in the file named showfiles. Close the curve displays and have a look at the content of this file by typing the command cat showfiles (Linux) or type showfiles (Windows). The file contains both instructions and some explanations (prefixed with #). If you want to read more on an individual parameter, type e.g. trfit help timestart. For an overview of all parameters, type trfit help parameters.

    You should look at the content of the other files discussed here, too - the comments will give you additional hints on how to use trfit effectively.

  2. In the previous exercise, the time interval displayed contained the entire range covered by the data files. To limit the time interval, edit the corresponding lines in file showfiles (Attention Windows users: Use a plain-text editor for this, like Notepad, not a word processor like Word. You can load the file into Notepad directly from the command line by typing notepad showfiles).
    Alternatively, you can achieve the same effect without editing showfiles, just by typing trfit cmd=showfiles timestart=5 timeend=40. This works because parameters passed on the command line will always override any setting of this parameter in a command file.
  3. Type trfit cmd=fitshow. This will load one decay and fit it with a single-exponential decay. The wavy residuals, as well as the high chi-square value, tell us that the fit is poor. Try model=exp2 to apply a two-exponential model. The chi-square is much lower now, but you may notice that the fitted baseline is not ideally placed in the first part of the curve. To allow trfit to optimize the baseline, add floatbaseline=yes. You may want to try model=exp3 or model=exp4 as well.

    So far, the initial parameters for the fits were guessed by trfit automatically. Type trfit cmd=fitshow mode=calc,show to see the initial guesses applied without fitting. (Yeah, pretty bad.) File fitshow2 illustrates how to supply explicit values for the parameters. With simple models such as exp1 and exp2, there should be no need for this, but fits that use more complex models may require some reasonable starting values to succeed.

  4. In the graphics windows that pop up when running the above command, there are several controls that let you zoom in and out, move the graph around, and save it in PNG, SVG or EPS format. EPS format is great for use with LaTeX, and it can be converted to PDF with the proper tools. Unfortunately, importing EPS into MS Office is complicated and error-prone. Bitmap graphics such as PNG give up somewhat in resolution but have the benefit of broad compatibility.
    Run trfit cmd=fitshow mode=fit,dump dumpext=csv. No graphics will show up, but you will now find a new file with the extension .csv appended. This file contains all experimental data points, the corresponding fitted values, and the residuals, in a format that is easily imported into spreadsheets or other plotting software.
  5. Run trfit cmd=fitbatch. This fits all files that match the pattern eNOSpep*mM.dat to the two-exponential model.
  6. Now run trfit cmd=fitglobal. A global fit will be performed for parameters tau and tau2, meaning that the same values for these parameters are applied across all data files, and the overall chi-square is minimized. The other two parameters (izero and izero2) remain individually variable for each data file, however. Because each data file has to be fitted many times over, this global fit takes a while to run, but it's not too bad, really, thanks to the hard work of the people at Numeric Python - and the Fortran programmers that supplied the underlying libraries.
    Notice that the results table has changed - instead of the raw pre-exponentials (izero, izero2), we now see the fractional contributions, which nicely highlights a trend in the data. Type trfit help reports for more info on this topic.
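
The override rule from step 2 (command line beats command file) can be pictured with a few lines of Python. This is only a sketch and not trfit's actual code: it assumes that a command file consists of name=value settings plus comments prefixed with #, and the function names are made up for illustration.

```python
def parse_command_file(text):
    """Collect name=value settings; everything after '#' is a comment.
    (The file layout is an assumption made for this sketch.)"""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        for token in line.split():
            name, _, value = token.partition("=")
            settings[name] = value
    return settings


def parse_command_line(args):
    """Turn ['timestart=5', ...] into a dictionary."""
    return dict(arg.split("=", 1) for arg in args if "=" in arg)


def effective_settings(file_text, args):
    """Settings from the file first, then command-line overrides on top."""
    settings = parse_command_file(file_text)
    settings.update(parse_command_line(args))
    return settings


command_file = """
# display the whole time range
mode=show
timestart=0   # nanoseconds
timeend=50
"""
settings = effective_settings(command_file, ["timestart=5", "timeend=40"])
print(settings)
```

Because the command-line values are applied last, they replace whatever the file said, while settings that appear only in the file are left alone.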

This short tour of trfit has already illustrated most of its functionality. For additional info, have a look at the output of trfit help parameters, trfit help reports and trfit help models.


Using trfit with your own data

  1. File and directory names: My students have an unfortunate tendency to name their files something like The-sample-containing-some-stuff-or-other-that-I-measured-the-morning-my-cat-crapped-on-the-carpet.dat. You will notice very quickly that this naming convention does not work very well with a command line-driven program. Save yourself some pain and give your files short and sweet names. Name them so that you can easily specify groups of files that belong together (protein1.dat, protein2.dat, membrane1.dat, membrane2.dat ... )
  2. File formats: The program that you use to control your TCSPC instrument should let you save or 'export' your data in some kind of text format, for example as CSV (comma-separated values) for import into spreadsheets. trfit should be able to read most of these text files. Try to save one such file and to run trfit mode=show data=yourfile on it. You should see at least one curve, which should be the decay; if you see two curves, the second one should be the IRF. If you don't see anything, try opening the file in a text editor. If it is human-readable and not a garbled (binary) mess, see whether you can adjust the xcolumn, ycolumn and irfcolumn parameters to read the file (specifying a separate irf file as needed). If everything fails, send me an example file (mpalmer at uwaterloo dot ca).
  3. When setting the time interval for fitting your decays, don't crop away too much - include some stretch of flat baseline at the right hand side of the decay.
  4. Start fitting with simple models first and work your way up to more complex ones as required. Use the results of simple fits as a guideline when setting initial parameters for more complex models. This can prevent the fitting routines from going astray.
  5. Apply limits to parameters in order to keep them in line. For example, you can use tau=1,0.5,1.5 and tau2=4,3,6 to restrict the range of these parameters to 0.5-1.5 and 3-6 respectively; in this way, you can ensure that tau will always remain the shorter of the two lifetime components.
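
To make the value,lower,upper notation from tip 5 concrete, here is a minimal sketch of how such a specification might be read and checked. It is not taken from trfit; the function name and the error handling are assumptions made for illustration.

```python
def parse_parameter(spec):
    """Split 'value' or 'value,lower,upper' into floats.
    Hypothetical helper; trfit's own parsing may differ."""
    parts = [float(p) for p in spec.split(",")]
    if len(parts) == 1:
        return parts[0], None, None          # no limits given
    if len(parts) != 3:
        raise ValueError("expected 'value' or 'value,lower,upper': %r" % spec)
    value, lower, upper = parts
    if not lower <= value <= upper:
        raise ValueError("start value lies outside its limits: %r" % spec)
    return value, lower, upper


tau = parse_parameter("1,0.5,1.5")
tau2 = parse_parameter("4,3,6")

# tau stays the shorter lifetime as long as its upper limit
# is below the lower limit of tau2.
ordered = tau[2] < tau2[1]
print(tau, tau2, ordered)
```

The last check is the point of the example: non-overlapping ranges are what keep the two lifetime components from swapping places during a fit.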

Extending trfit

You may have noticed that the mathematical models that come with this distribution are a little limited. How can you use other models, tailored to your needs? There are two possibilities:

  1. Do it yourself - the recommended strategy, since it will serve you best in the long run. It requires familiarity with the Python programming language.
  2. Ask someone else. I'm interested in fluorescence, and if I find time I will try to help you out.

To pursue option 1, have a look at modules numericmodels.py and model.py. Your own models will most likely inherit from class model.ExponentialModel. This class works as follows:

  1. In each iterative round of fitting, the method model is passed the current values of the model's numerical parameters, in order of appearance in the class attribute parameters. For example, method numericmodels.DoubleExponential.model is called with the current values of parameters izero, tau, izero2, and tau2.
  2. Method calculateExponentials calculates a list of 2-tuples, each of which contains one pre-exponential and its associated lifetime. Most of the time, you will need to override this method; look at classes numericmodels.ConstrainedTriple and numericmodels.NormalDistribution for examples.
  3. Method applyExponentials will expand and accumulate all these lifetime components; it should not normally need to be overridden.
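
The division of labour between these three methods can be sketched as follows. The classes below are self-contained stand-ins written for this page, not the real model.ExponentialModel: the class names, the times argument and the missing IRF convolution are all simplifying assumptions. Only the method names and the (pre-exponential, lifetime) tuples follow the description above.

```python
import math


class ExponentialModelSketch(object):
    """Hypothetical stand-in for model.ExponentialModel (not trfit's code)."""
    parameters = ()

    def calculateExponentials(self, *values):
        """Return a list of (pre-exponential, lifetime) tuples."""
        raise NotImplementedError

    def applyExponentials(self, exponentials, times):
        """Expand and accumulate all lifetime components."""
        return [sum(a * math.exp(-t / tau) for a, tau in exponentials)
                for t in times]

    def model(self, times, *values):
        """Called with the current parameter values, in 'parameters' order."""
        return self.applyExponentials(self.calculateExponentials(*values), times)


class DoubleExponentialSketch(ExponentialModelSketch):
    parameters = ("izero", "tau", "izero2", "tau2")

    def calculateExponentials(self, izero, tau, izero2, tau2):
        return [(izero, tau), (izero2, tau2)]


class LinkedLifetimesSketch(ExponentialModelSketch):
    """Made-up custom model: second lifetime is a fixed multiple of the first."""
    parameters = ("izero", "tau", "izero2", "ratio")

    def calculateExponentials(self, izero, tau, izero2, ratio):
        return [(izero, tau), (izero2, tau * ratio)]


times = [0.0, 1.0, 2.0]
print(DoubleExponentialSketch().model(times, 3.0, 1.0, 2.0, 4.0))
print(LinkedLifetimesSketch().model(times, 3.0, 1.0, 2.0, 4.0))
```

Note that the custom model only overrides calculateExponentials; the summation of the components is inherited unchanged, which mirrors the advice in item 3.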

Where should you keep your own models? You might save them in the trfit source directory; the problem then is that they might get lost inadvertently when you upgrade trfit. So, you probably want to keep them in a location elsewhere along your Python path. trfit assists you with this by trying to run from usermodels import * during program startup, and likewise from userreports import *. If you are familiar with Python, you will know about the various ways to make any such modules visible to trfit.
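
For those less familiar with Python's module search, here is a small sketch of the mechanism. It writes a throwaway usermodels.py into a temporary directory, puts that directory on the Python path, and then attempts the import in a forgiving way. The directory, the class name MyModel and the try/except wrapper are assumptions made for illustration (trfit's own startup code may look different), and the sketch is written for a current Python 3.

```python
import importlib
import os
import sys
import tempfile

# Create a directory outside the trfit sources and put a model module there.
model_dir = tempfile.mkdtemp()
with open(os.path.join(model_dir, "usermodels.py"), "w") as handle:
    handle.write("class MyModel(object):\n    pass\n")

# Make the directory visible to Python. Setting the PYTHONPATH
# environment variable has the same effect without touching any code.
sys.path.insert(0, model_dir)
importlib.invalidate_caches()

# An optional import: carry on quietly if the module does not exist.
try:
    usermodels = importlib.import_module("usermodels")
    found = True
except ImportError:
    found = False

print(found, [name for name in dir(usermodels) if not name.startswith("_")])
```

In everyday use you would simply keep usermodels.py in a permanent directory and add that directory to PYTHONPATH once.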

You may also want to develop your own customized reports. A good, concise description of how reports work is beyond my powers of narration; however, a look at module reports.py and its immediate buddies may tell you enough about how to accomplish at least some simple customizations.

Lastly, if you have written a new model or report class, I'd like to hear from you, so that it can be included in future releases of trfit. Of course, if you find bugs or have suggestions for improvement, I'd appreciate a message, too.


Technical details

Here, I describe some of trfit's inner workings. This information may be of interest if you are considering extending trfit with added functionality. It will only make sense to you if you are familiar with the Python language, or rather, it won't if you aren't ;-)

Program structure

trfit really does not do much more than string together the mathematical library functions supplied by the Scientific Python / Numerical Python libraries. Likewise, the graphical display is adopted wholesale from matplotlib. These underlying libraries, in turn, rely to a large extent on C- and Fortran-coded libraries, which lets trfit benefit from the efficiency of those languages. Such library functions are used at various stages:

  1. Convolution of theoretical models with the instrument response function. Since this step has to be repeated in each iteration of fitting, it must run fast. The code is in module irfconvolver.py.
  2. Fitting of individual decays. This relies on the widely-used Levenberg-Marquardt algorithm, which is invoked in module decayanalyzer.py.
  3. Global fitting of parameters. This is a higher order procedure in which multiple decays are simultaneously fit, while sharing one or more parameters. Each of these global parameters is optimized to yield the lowest cumulative chi-square of all participating decays.
    Variation and optimization of global parameters is handled by the Numerical Python implementation of the Nelder-Mead simplex algorithm. In every iterative step of this optimization, each individual decay is fit over again, whereby the current values of the global parameters are treated as invariant (fixed); only the non-global parameters are optimized using the Levenberg-Marquardt algorithm as above.
    The code for global fitting is located in wrappers.py.
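
The nesting of the three stages can be illustrated with a deliberately simplified, pure-Python sketch. It is not trfit's code and it swaps in simpler techniques so that it runs without any libraries: a golden-section search over a single shared lifetime replaces the Nelder-Mead simplex, and a closed-form linear solve for one amplitude per decay replaces Levenberg-Marquardt. The chi-square is unweighted, and all function names and the synthetic data are made up.

```python
import math


def convolve_with_irf(irf, tau, dt):
    """Stage 1: discrete convolution of a unit exponential with the IRF."""
    n = len(irf)
    kernel = [math.exp(-i * dt / tau) for i in range(n)]
    return [sum(irf[j] * kernel[i - j] for j in range(i + 1)) for i in range(n)]


def fit_local(decay, shape):
    """Stage 2 (stand-in): best amplitude for one decay at a fixed lifetime."""
    amplitude = (sum(d * s for d, s in zip(decay, shape))
                 / sum(s * s for s in shape))
    chi2 = sum((d - amplitude * s) ** 2 for d, s in zip(decay, shape))
    return amplitude, chi2


def total_chi2(tau, decays, irf, dt):
    """Re-fit every decay with tau held fixed; add up the chi-squares."""
    shape = convolve_with_irf(irf, tau, dt)
    return sum(fit_local(decay, shape)[1] for decay in decays)


def fit_global(decays, irf, dt, lo, hi, tol=1e-6):
    """Stage 3 (stand-in): golden-section search over the shared lifetime."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if total_chi2(c, decays, irf, dt) < total_chi2(d, decays, irf, dt):
            b = d      # minimum lies in [a, d]
        else:
            a = c      # minimum lies in [c, b]
    tau = (a + b) / 2.0
    shape = convolve_with_irf(irf, tau, dt)
    return tau, [fit_local(decay, shape)[0] for decay in decays]


# Synthetic, noise-free test data: two decays sharing one lifetime.
dt = 0.1
irf = [math.exp(-((i - 10) / 3.0) ** 2) for i in range(120)]
true_shape = convolve_with_irf(irf, 2.0, dt)
decays = [[amp * s for s in true_shape] for amp in (100.0, 250.0)]

tau_fit, amplitudes = fit_global(decays, irf, dt, lo=0.5, hi=5.0)
print(tau_fit, amplitudes)
```

The structure is what matters here: every trial value of the global parameter triggers a complete re-fit of each decay, which is why global fits are so much slower than individual ones and why the convolution step has to be fast.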

Reading input files

trfit uses a heuristic approach to the parsing of data files, which employs the following rules:

  1. Data in a row should be separated by white space or by commas. If we find commas, use those, if not, use white space to break up the row into data.
  2. Files usually contain some lines with descriptive information preceding the data. Data lines are detected using the assumption that all of them have the same length (number of entries) and the same signature. By signature we mean the sequence of numeric or non-numeric (text) entries. We will simply use the longest continuous run of lines with the same length and signature.
  3. Time intervals should be linear. We will use the first numerical column that has constant intervals and use it for the times. If we don't find a column with such data, we will upchuck and die. If we do, we will use the next column as decay, and the next one as IRF. These automatic choices can be overruled by the xcolumn, ycolumn, and irfcolumn parameters.

Input parsing is implemented in module loaddata.py.
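
The three rules can be sketched in a few short functions. This is an independent illustration of the heuristic as described above, not the code in loaddata.py; the function names, the tolerance, the skipping of blank lines and the sample file are assumptions.

```python
def split_row(line):
    """Rule 1: use commas if present, otherwise white space."""
    if "," in line:
        return [field.strip() for field in line.split(",")]
    return line.split()


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def signature(fields):
    """'N' for each numeric entry, 'T' for each text entry."""
    return "".join("N" if is_number(f) else "T" for f in fields)


def find_data_block(lines):
    """Rule 2: longest continuous run of rows with the same signature.
    Blank lines are dropped first (a simplification of this sketch)."""
    rows = [split_row(line) for line in lines if line.strip()]
    best, start = (0, 0), 0
    for i in range(1, len(rows) + 1):
        if i == len(rows) or signature(rows[i]) != signature(rows[start]):
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = i
    return rows[best[0]:best[1]]


def find_time_column(block, rel_tol=1e-6):
    """Rule 3: first numeric column with constant, non-zero intervals."""
    for col, kind in enumerate(signature(block[0])):
        if kind != "N":
            continue
        values = [float(row[col]) for row in block]
        steps = [b - a for a, b in zip(values, values[1:])]
        if steps and steps[0] != 0 and all(
                abs(s - steps[0]) <= rel_tol * abs(steps[0]) for s in steps):
            return col
    raise ValueError("no column with constant time intervals found")


sample = """Instrument: hypothetical TCSPC
Channels, 5
0.0, 3, 1
0.5, 40, 90
1.0, 950, 12
1.5, 700, 2
2.0, 510, 1
"""
block = find_data_block(sample.splitlines())
xcolumn = find_time_column(block)
times = [float(row[xcolumn]) for row in block]
decay = [float(row[xcolumn + 1]) for row in block]
irf = [float(row[xcolumn + 2]) for row in block]
print(xcolumn, times, decay, irf)
```

Since the signature string has one character per entry, comparing signatures also compares row lengths, so the two header lines in the sample are rejected without any special handling.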

Output formatting

The results of numerical fits can be post-processed for presentation or evaluation. For an overview of all 'raw' output available, run a fit with the option report=fulltable. Examples of post-processed output can be seen with report=normtable. For each file that was fitted, this output is passed around as a list of (name, value) tuples. In addition, there is a list that contains global information, such as the average chi-square.

The base classes dealing with all of this are implemented in processresults.py, while the special subclasses are found in resultprocessors.py.
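
As an illustration of what post-processing a list of (name, value) tuples might look like, the sketch below converts raw pre-exponentials into fractional contributions and computes an average chi-square across files. It is not taken from processresults.py. The output names (fraction, fraction2), the name chisquare, the choice of amplitude fractions rather than intensity-weighted fractions, and the numbers themselves are all assumptions.

```python
def fractional_contributions(results):
    """Replace izero, izero2, ... by their share of the summed amplitudes.
    'results' is a list of (name, value) tuples for one fitted file."""
    amplitudes = [(n, v) for n, v in results if n.startswith("izero")]
    total = sum(v for _, v in amplitudes)
    processed = [(n, v) for n, v in results if not n.startswith("izero")]
    processed += [("fraction" + n[len("izero"):], v / total)
                  for n, v in amplitudes]
    return processed


def average_chisquare(all_results):
    """Global information: mean chi-square over all fitted files."""
    values = [dict(results)["chisquare"] for results in all_results]
    return sum(values) / len(values)


# Made-up numbers for two fitted files.
file1 = [("izero", 3.0), ("tau", 1.0), ("izero2", 1.0), ("tau2", 4.0),
         ("chisquare", 1.0)]
file2 = [("izero", 1.0), ("tau", 1.0), ("izero2", 1.0), ("tau2", 4.0),
         ("chisquare", 1.5)]

normalized = fractional_contributions(file1)
mean_chi2 = average_chisquare([file1, file2])
print(normalized, mean_chi2)
```

Keeping the results as plain (name, value) tuples means that a report only has to know the names it cares about and can pass everything else through untouched.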


Maintained by Michael Palmer, University of Waterloo