Back to the main page.

Bug 2887 - return all channels from an EDF file with multiple sampling rates

Status CLOSED FIXED
Reported 2015-05-01 09:46:00 +0200
Modified 2015-08-27 15:22:39 +0200
Product: FieldTrip
Component: fileio
Version: unspecified
Hardware: PC
Operating System: Mac OS
Importance: P5 normal
Assigned to: Robert Oostenveld
URL:
Tags:
Depends on:
Blocks:
See also: http://bugzilla.fieldtriptoolbox.org/show_bug.cgi?id=2954

Robert Oostenveld - 2015-05-01 09:46:46 +0200

Joe wrote I need to revise the EDF code but am not quite sure how to proceed. I?ve got some EDF files with different physiological measures with different sampling rates. The current code uses a rule-of-thumb in this situation to keep only the sampling rate with the greatest number of channels, which doesn?t work for me. I can see a couple ways forward. One is to implement code to allow for channel selection in ft_read_header not just ft_read_data. This would allow one to override the rule-of-thumb to choose whichever channels are of interest. Another approach would be to use interp1 to resample the channels so that they all have the same sampling rate (the highest).


Robert Oostenveld - 2015-05-01 09:47:09 +0200

I wrote Hi Joe, Coincidentally I happen just now to be working on some improvements of the code related to an edf+ file. It relates to this http://bugzilla.fieldtriptoolbox.org/show_bug.cgi?id=2736. Not directly sampling rate related, but I see that a general cleanup of some EDF related stuff is needed. I have made some local changes, and just committed them. mac011> svn commit fileio/ Sending fileio/ft_read_event.m Sending fileio/private/read_edf.m Transmitting file data .. Committed revision 10347. and Committed revision 10348. The same person that reporting the issue with the EDF+ annotations also ran into the multiple samping rates limitation. Let me share some random thoughts on this (no solution yet): Resampling on the fly is prone to edge artifacts if you were to read small segments and if you were to glue them back together. Adding chanindx to ft_read_header would be a cleaner option, but would not be available to high-level functions like ft_preprocessing and ft_databrowser. There is the complication that ft_read_data and ft_read_event both call ft_read_header. And events (i.e. annotations in the edf+ files) are defined in time but translated by ft_read_event in samples, which requires that they only apply to a single sampling rate. We could also add an option to ft_read_header (and data and event) for specifying the sampling rate rather than the channels. This idea stems from the consideration that you would not know a priori which channels have which sampling rate. There is now in read_edf.m line 230 elseif all(EDF.SampleRate(1:end-1)==EDF.SampleRate(1)) % only the last channel has a deviant sampling frequency % this is the case for EGI recorded datasets that have been converted % to EDF+, in which case the annotation channel is the last That code dates back to https://code.google.com/p/fieldtrip/source/detail?r=2076, but I don?t have a file any more that goes with it. Do you happen to have such an EDF+ file with annotations? I think that at this moment the code fails if the annotation channel has a sampling rate that is different from the rest. Upsampling the annotation channel (when resampling on the fly) would break the event handling, since that requires parsing it as character string in TAL segments (http://www.edfplus.info/specs/edfplus.html#tal). In general I feel more in favour of making a http://www.fieldtriptoolbox.org/getting_started/edf like the ones that are now on http://www.fieldtriptoolbox.org/reading_data. There it could be explained what the problem is and how to deal with it (once there is an agreed upon solution...) I just got a dataset from http://www.physionet.org/physiobank/database/sleep-edfx, which has >> hdr = ft_read_header('SC4001E0-PSG.edf?) >> hdr.orig.SampleRate ans = 100 100 100 1 1 1 1 >> hdr.orig.SPR ans = 3000 3000 3000 30 30 30 30 Quite extreme differences here. Upsamling on the fly (invisible for the user) can lead to unexpected results. I would rather use something more explicit like dathi = ft_preprocessing datlo = ft_preprocessing datup = ft_resampledata datlo to dathi datall = ft_appenddata dathi datup After the resampling and appending it would of course be fine to write the (continuous) data back to disk with consistent sampling rate.


Robert Oostenveld - 2015-05-01 09:47:44 +0200

Joe wrote What you say about possible resampling problems makes sense to me. How about having options for selecting based on both channels and sample rates? Sometimes one knows the desired channels but not the sampling rates. I do have an EDF+ file with annotations. I?ve posted it at: https://jh.box.com/s/xxx I agree about making a documentation page for edf. It could be modified from the bdf one. Also the changes would presumably be made to the bdf file I/O as well. For my own use, I?ve already made a first pass at implementing the channel selection option (using fieldtrip-20141123 as a start regretably). I?m attaching these files for the purpose of showing what I have in mind. I didn?t try to revise the EDF event code as I didn?t need it. While working on this, it occured to me that it might be better to separate the EDF header and EDF data reading code. Right now both are handled by the same read_edf function and it checks the number of input arguments to determine which is desired. If we start adding options like chanindx then that heuristic doesn?t work as well.


Robert Oostenveld - 2015-05-01 09:53:58 +0200

Better work in a more structured manner, i.e. here on bugzilla. I copied the data and made a test script. mac011> svn commit test_bug2887.m Adding test_bug2887.m Transmitting file data . Committed revision 10364. It now returns this warning (as expected) Warning: channels with different sampling rate not supported, using a subselection of 3 channels at 32.000000 Hz > In fileio/private/read_edf at 273 In ft_read_header at 659


Robert Oostenveld - 2015-05-01 09:58:32 +0200

(In reply to Robert Oostenveld from comment #3) reading events is very slow... First because read_edf goes over the network (but that does not explain it fully, network is not that slow). Then because tokenize is slow (not too slow). The output is not what I would expect, i.e. not in TAL format.


Robert Oostenveld - 2015-05-01 10:02:08 +0200

(In reply to Robert Oostenveld from comment #4) reading is slow because K>> hdr.orig ... NRec: 10131 Dur: 1 There is a huge number of 1-sample blocks, not what was expected in the code (common is to have 1 second blocks). The code loops over all blocks.


Joseph Dien - 2015-05-01 14:52:53 +0200

Just so you know what the test file is, it's from an Actiwave Cardio EKG sensor with EKG and actigraph channels. No idea why the annotation channel might be funky.


Joseph Dien - 2015-05-28 04:32:58 +0200

Just to keep things moving forwards, would you be good with my committing something along the lines we've discussed, with options to select by channel or by sampling rate and to split the function into separate readHeader and readData files?


Robert Oostenveld - 2015-05-28 11:06:23 +0200

(In reply to Joseph Dien from comment #7) Hi Joe, it is still not 100% clear to me how the API and user interface would look like. There is the low level interface in ft_read_xxx and the high level FT interface in ft_preprocessing, ft_databrowser, ft_definetrial, etc. I think you mainly care about the low-level interface, i.e. the ft_read_xxx functions. Already there is the need for a consistent representation of the data at th different sampling rates. Something like this needs to work hdr = ft_read_header(filename, 'changroup', xxx); dat = ft_read_data (filename, 'changroup', xxx, 'chanindx', yyy); evt = ft_read_event (filename, 'changroup', xxx); where hdr and evt must be consistent with dat. Also, yyy needs to be consistent with hdr.label and hdr.nChan. I don't care too much at the moment how xxx is specified, whether it is the sampling frequency or some other specification to identify the group of channels that is considered in the three functions. For the higher level FT code, an interesting thought experiment is this cfg = []; cfg.dataset = filename cfg.changroup = 1; data1 = ft_preprocessing(cfg) cfg.changroup = 2; data2 = ft_preprocessing(cfg) possibly with a trialfun and ft_definetrial, and then cfg = []; cfg.time = data1.time; data2resampled = ft_resampledata(cfg, data2) datacombined = ft_appenddata([], data1, data2resampled);


Robert Oostenveld - 2015-05-28 11:07:49 +0200

(In reply to Robert Oostenveld from comment #8) if your commit is going to result in something like hdr = ft_read_header(filename, 'changroup', xxx); dat = ft_read_data (filename, 'changroup', xxx, 'chanindx', yyy); evt = ft_read_event (filename, 'changroup', xxx); then it is fine to commit (as it won't break anything), and I will be able to look at it in more detail to see how to translate it to higher level FT functions.


Joseph Dien - 2015-05-30 07:16:59 +0200

Okay done! I went with a minimalistic strategy. I basically added a case where nargin equals three, which indicates reading a header with a chanindx parameter.


Robert Oostenveld - 2015-05-31 13:11:11 +0200

(In reply to Joseph Dien from comment #10) I made some cleanups and added chanindx to the input parameters of ft_read_header. I also added it to the documentation of the three functions. I have not tested any of this yet. Can you please see whether all three functions work as expected? thanks, Robert


Joseph Dien - 2015-06-03 21:36:45 +0200

There was some code I neglected to include in the read_data.m function so I just committed a revision. Otherwise seems to be working as intended.


Robert Oostenveld - 2015-06-25 10:55:43 +0200

I made a helper function and a faq for this http://www.fieldtriptoolbox.org/faq/how_can_i_read_all_channels_from_an_edf_file_that_contains_multiple_sampling_rates mac011> svn commit edf2fieldtrip.m Adding edf2fieldtrip.m Transmitting file data . Committed revision 10485.


Robert Oostenveld - 2015-07-15 13:31:06 +0200

closed several bugs at once that were recently fixed