Back to the main page.

Bug 2994 - ft_freqstatistics crash when dimord is 'subj_freq_time'

Status CLOSED FIXED
Reported 2015-10-27 13:11:00 +0100
Modified 2016-01-14 21:26:47 +0100
Product: FieldTrip
Component: core
Version: unspecified
Hardware: Macintosh
Operating System: Mac OS
Importance: P5 normal
Assigned to: Jan-Mathijs Schoffelen
URL:
Tags:
Depends on:
Blocks:
See also:

Niels - 2015-10-27 13:11:46 +0100

Since a while, the ft_freqstatistics does not work if I run it with dimord = 'subj_freq_time', ie on TFR's that are based on a pooling of sensors. I now tried ft version fieldtrip-20151005, but it works with a ~1 year old version (not sure exactly). I wonder if you can reproduce the problem with your own freq, otherwise I am happy to provide one of mine. Did maybe something change about how the function should be called? Here is my cfg: __________ cfg = []; cfg.latency = XLIM; % eg [0.2 0.4] cfg.frequency = YLIM; % eg [5 150] cfg.method = 'montecarlo'; cfg.statistic = 'depsamplesT'; cfg.correctm = 'cluster'; % no cfg.clusteralpha = 0.05; cfg.clusterstatistic = 'maxsum'; cfg.tail = 0; cfg.clustertail = 0; cfg.alpha = 0.025; cfg.numrandomization = 1000; cfg.neighbours = []; %in case no channel data present design = zeros(2,2*nsub); for i = 1:nsub design(1,i) = i; end for i = 1:nsub design(1,nsub+i) = i; end design(2,1:nsub) = 1; design(2,nsub+1:2*nsub) = 2; cfg.design = design; cfg.uvar = 1; cfg.ivar = 2; ________________________ And here is the command window output: -------------------- latstat{isoi, idrug, idiff} = ft_freqstatistics(cfg, freq2stats, freq2statszero); the call to "ft_selectdata" took 0 seconds and required the additional allocation of an estimated 1 MB using "ft_statistics_montecarlo" for the statistical testing using "ft_statfun_depsamplesT" for the single-sample statistics constructing randomized design total number of measurements = 36 total number of variables = 2 number of independent variables = 1 number of unit variables = 1 number of within-cell variables = 0 number of control variables = 0 using a permutation resampling approach repeated measurement in variable 1 over 18 levels number of repeated measurements in each level is 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 computing a parametric threshold for clustering computing statistic estimated time per randomization is 0.01 seconds computing statistic 1000 from 1000 Warning: adding /Users/kloosterman/gridmaster2012/MATLAB/fieldtrip-20151005/external/spm8 toolbox to your MATLAB path Error using findcluster (line 56) invalid dimension of spatdimneighbstructmat Error in clusterstat (line 194) posclusobs = findcluster(reshape(postailobs, [cfg.dim,1]),channeighbstructmat,cfg.minnbchan); Error in ft_statistics_montecarlo (line 356) [stat, cfg] = clusterstat(cfg, statrand, statobs); Error in ft_freqstatistics (line 190) [stat, cfg] = statmethod(cfg, dat, design); -----------


Robert Oostenveld - 2015-11-23 13:24:25 +0100

we discussed this shortly in the FT meeting, Jan-Mathijs will look into it


Jan-Mathijs Schoffelen - 2015-11-26 09:53:40 +0100

Hi Niels, Would it be possible to upload two very compact data structures for reproduction (i.e. multi-subject freq2stats structure with 2 frequency bins and optionally a few time bins)?


Niels - 2015-12-03 12:11:03 +0100

Created attachment 759 example freq2stats structure Hi Jan Matthijs, See attached. The problem occurs for me already when I replace freq.powspctrum with zeros and test the freq against freqzero using freqstatistics. Just let me know if you need more data. Niels


Jan-Mathijs Schoffelen - 2015-12-03 14:27:09 +0100

OK, confirmed...after updating the data a bit (convert to double + making the number of labels consistent with the 'chan' dimension in the data :o) ).... there's apparently unexpected behaviour when creating the channelconnectivity, with cfg.channel = 'all'.


Jan-Mathijs Schoffelen - 2015-12-03 14:28:10 +0100

there's an evaluation of the number of elements in the variable cfg.channel, which pre-specifies the dimensionality of the spatial neighbourhood matrix, the 'all' returns a value of 3, rather than the required 2 (which is the number of channels in the data).


Jan-Mathijs Schoffelen - 2015-12-03 14:29:24 +0100

for now, it should work when you explicitly specify cfg.channel = freq.label (assuming, of course, that the number of labels is equal to the channel dimension in your data). I'll think of a fix.


Niels - 2015-12-03 15:44:41 +0100

(In reply to Jan-Mathijs Schoffelen from comment #4) I don't have a 'chan' dimension anymore here right? I must be missing something...


Jan-Mathijs Schoffelen - 2015-12-03 16:12:11 +0100

you're absolutely right! my bad, what was I thinking :o)? back to the drawing table it is.


Jan-Mathijs Schoffelen - 2015-12-03 16:17:44 +0100

it works when I make size of powspctrm: [18 1 2 5], i.e. squeeze in a singleton 'chan' dimension, and change the dimord into 'subj_chan_freq_time'. Removing the 'label' field on the other hand does not work, because the function explicitly expects a structure with a 'label' field. in the past you probably got away with it with the less explicit dimensionality check, but as far as I know the lower level cluster function has always expected the first dimension to be spatial, even if it were singleton. Requires some further digging...


Jan-Mathijs Schoffelen - 2015-12-04 09:57:48 +0100

Hi Niels, I created the following test script (see below), which runs through fine. It implements various scenarios to get from single subject data to a 'stat'-structure, where I start with the outdated way you seem to have done it (which by the way is the way I wouldn't do it anymore, see the other suggested ways as shown in the script). The sole difference with your approach is that the subject-concatenated and channel-averaged data is slighlty different when computed from the script, as compared to the data you sent along. Although according to Fieldtrip's philosophy in itself your data structure would be (almost) an allowed data structure, because the dimord of 'subj_freq_time' matches the numeric data, the presence of a label-field in the structure, and the absence of a singleton 'chan' dimension in the data makes it slightly inconsistent. When the 'label' field is ditched, the data structure becomes fully consistent, but the high-level fieldtrip code (ft_freqstatistics) does not seem to like this, because it explicitly requires a label-field in the data structure in order for it to proceed. Although we think that backward compatibility is very important, I would vote in favour of not trying to patch this. I think the recent improvements in data bookkeeping and consistent data structure definition outweigh a potential patch in the current situation. Would you be OK with adjusting your pipelines with one of the snippets from the suggested code below? Below is the matlab-code: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Bug 2994 has been reported by Niels. Apparently once upon a time it was % possible to call ft_freqstatistics with ft_freqgrandaveraged data in the % input, where the dimord is 'subj_freq_time' (NOTE the lacking chan % dimord) with the data containing apparently an average across channels. % The input data contains a label field. ft_freqstatistics crashes when the % low level function tries to cluster across space. % Let's start this form the beginning, create a multisubject % freq-structure, with data averaged across channels, and see whether it % pulls through tmp.freq = [1 2 3]; tmp.time = [1 2 3 4 5]; tmp.label = {'a';'b'}; tmp.dimord = 'chan_freq_time'; freq = cell(1,10); freq0 = cell(1,10); for k = 1:10 tmp.powspctrm = rand(2,3,5); freq{k} = tmp; tmp.powspctrm(:) = 0; freq0{k} = tmp; end %% ------------------------------------------------------------------------ % here, the following is done: % -ft_freqgrandaverage (keepindividual = 'yes') % -ft_selectdata (avgoverchan = 'yes') % -ft_freqstatistics cfg1 = []; cfg1.keepindividual = 'yes'; cfg2 = []; cfg2.avgoverchan = 'yes'; cfg3 = []; cfg3.method = 'montecarlo'; cfg3.statistic = 'ft_statfun_depsamplesT'; cfg3.design = [ones(1,10) ones(1,10)*2;1:10 1:10]; cfg3.numrandomization = 100; cfg3.correctm = 'cluster'; cfg3.ivar = 1; cfg3.uvar = 2; freqall = ft_selectdata(cfg2, ft_freqgrandaverage(cfg1, freq{:})); freqall0 = ft_selectdata(cfg2, ft_freqgrandaverage(cfg1, freq0{:})); assert(isequal(size(freqall.powspctrm),[10 1 3 5])); % there should be a singleton channel dimension stat1 = ft_freqstatistics(cfg3, freqall, freqall0); %% ------------------------------------------------------------------------ % here, the following is done: % -ft_freqgrandaverage (keepindividual = 'yes') % -ft_freqstatistics (avgoverchan = 'yes') cfg1 = []; cfg1.keepindividual = 'yes'; cfg3 = []; cfg3.method = 'montecarlo'; cfg3.statistic = 'ft_statfun_depsamplesT'; cfg3.avgoverchan = 'yes'; cfg3.design = [ones(1,10) ones(1,10)*2;1:10 1:10]; cfg3.numrandomization = 100; cfg3.correctm = 'cluster'; cfg3.ivar = 1; cfg3.uvar = 2; freqall = ft_freqgrandaverage(cfg1, freq{:}); freqall0 = ft_freqgrandaverage(cfg1, freq0{:}); assert(isequal(size(freqall.powspctrm),[10 2 3 5])); % there should be two channels stat2 = ft_freqstatistics(cfg3, freqall, freqall0); %% ------------------------------------------------------------------------ % here, the following is done: % - NO ft_freqgrandaverage % -ft_freqstatistics (avgoverchan = 'yes') cfg3 = []; cfg3.method = 'montecarlo'; cfg3.statistic = 'ft_statfun_depsamplesT'; cfg3.avgoverchan = 'yes'; cfg3.design = [ones(1,10) ones(1,10)*2;1:10 1:10]; cfg3.numrandomization = 100; cfg3.correctm = 'cluster'; cfg3.ivar = 1; cfg3.uvar = 2; cfg3.avgoverchan = 'yes'; stat3 = ft_freqstatistics(cfg3, freq{:}, freq0{:}); %% ------------------------------------------------------------------------ % here, the following is done: % -ft_appendfreq % -ft_freqstatistics (avgoverchan = 'yes') cfg1 = []; cfg1.parameter = 'powspctrm'; cfg1.appenddim = 'rpt'; cfg3 = []; cfg3.method = 'montecarlo'; cfg3.statistic = 'ft_statfun_depsamplesT'; cfg3.avgoverchan = 'yes'; cfg3.design = [ones(1,10) ones(1,10)*2;1:10 1:10]; cfg3.numrandomization = 100; cfg3.correctm = 'cluster'; cfg3.ivar = 1; cfg3.uvar = 2; cfg3.avgoverchan = 'yes'; freqall = ft_appendfreq(cfg1, freq{:}); freqall0 = ft_appendfreq(cfg1, freq0{:}); stat4 = ft_freqstatistics(cfg3, freqall, freqall0); assert(isequal(stat1.stat,stat2.stat)); assert(isequal(stat1.stat,stat3.stat)); assert(isequal(stat1.stat,stat4.stat)); assert(isequal(stat2.stat,stat3.stat)); assert(isequal(stat2.stat,stat4.stat)); assert(isequal(stat3.stat,stat4.stat)); % the conclusion so far is that with the current state of the code, all % seems to work fine.


Arjen Stolk - 2015-12-05 07:31:34 +0100

(In reply to Jan-Mathijs Schoffelen from comment #10) Hey cap, I just got the same error in findcluster.m using the home/common/ version. Maybe it's a matter of updating of this depository still, so I'll try again tomorrow. Just in case you think it might be due to that change, I'm getting a 3x3 spatdimneighbstructmat (cfg.connectivity is a 3x3 matrix with zeros) whereas the onoff matrix is 17x21x17 (matching the data dimension). This violates the 2nd part of the statement at line 55. The context: using ft_sourcestatistics for comparing .pow fields across groups of subjects (indepsamplesT). Yours, Arjen


Jan-Mathijs Schoffelen - 2015-12-05 08:43:09 +0100

zoals aangegeven, komt dit waarschijnlijk door een foutieve interpretatie van cfg.channel = 'all' in channelconnectivity. Indien de leading dimensie van je data (die van 17) geen spatiele dimensie is, zou je kunnen proberen om cfg.connectivity op nan te zetten, voordat je ft_freqstatistics aanroept.


Arjen Stolk - 2015-12-05 09:01:11 +0100

(In reply to Jan-Mathijs Schoffelen from comment #12) Die cfg.connectivity wordt on the fly gedefinieerd. Dit is 'gewone' source power data, geen connectiviteit (er is dus ook geen cfg.channel). Stel je voor dat ik probeer te debuggen door cfg.connectiviteit te veranderen in een van die low-level functies?


Arjen Stolk - 2015-12-05 09:02:27 +0100

(In reply to Arjen Stolk from comment #13) Ik bedoel ipv 'low-level functie', tijdens het aanroepen van fr_sourcestatistics?


Arjen Stolk - 2015-12-05 09:08:58 +0100

(In reply to Arjen Stolk from comment #14) met cfg.connectivity = nan tijdens het aanroepen van ft_sourcestatistics loopt de functie door


Jan-Mathijs Schoffelen - 2015-12-05 09:47:09 +0100

Het idee achter de cfg.connectivity is om findcluster een idee te geven, hoe te clusteren over de spatiele dimensie van de data. In het geval van kanaal data (of source data gedefinieerd op een triangular mesh, waarbij het wenselijk zou zijn om de buren te definiƫren aan de hand van de edges) wordt cfg.connectivity gebruikt. Indien deze niet door de gebruiker is gedefinieerd, wordt deze on-the-fly gemaakt, aan de hand van cfg.neighbours (kanaaldata), of aan de hand van source.tri. Indien de data impliciet een representatie is van een reshapable 3D-grid, wordt de spatiele dimensie clusterbaar verondersteld als per spm_bwlabel. Ik kan me voorstellen, dat de automatische machinerie faalt als je slechts 1 spatiele component hebt (e.g. 1 voxel, zoals in jouw geval, of 1 kanaal, zoals in Niels' geval). Voor kanaal-achtige data zou een oplossing kunnen zijn om cfg.connectivity=nan te zetten. In jouw geval (de aap kwam weer eens lekker laat uit de mouw) van ft_sourcestatistics zou idealiter een consistente data structuur voldoende moeten zijn. Dus, een source structuur met 1-pos, 1-inside en een dim van [1 1 1].


Arjen Stolk - 2015-12-05 10:03:08 +0100

(In reply to Jan-Mathijs Schoffelen from comment #16) I see: structurele connectivity. Dit was trouwens volume data, geen surface (ander project :p). Elke voxel zou dan minstens 6 buren moeten hebben?


Jan-Mathijs Schoffelen - 2016-01-13 13:17:00 +0100

I haven't heard back for a while from Niels, so I assume all is fine. Changing status for now.


Niels - 2016-01-13 15:07:37 +0100

Ha Jan-Matthijs, sorry voor de late reactie! Goed idee om een singleton dimensie voor de channel te gebruiken en niet de code aan te passen. Bedankt voor je moeite. Groeten, Niels


Jan-Mathijs Schoffelen - 2016-01-13 15:15:07 +0100

Check! Het belangrijkste is, dat je vooruit kunt, en dat lijkt het geval te zijn. Als altijd bedankt voor je input, much appreciated.