Models, Algorithms, Analysis: Basics

Gene

The following script requests an Allen gene ID, builds the query text, and queries an Allen database. Gene data may also be retrieved by other identifiers, e.g., the acronym ('ACCN1') or the Entrez ID (40), instead of the Allen gene ID (37 for 'ACCN1'); download the script and amend it according to your preference.

geneQuery.m

Humans

The ABA data sets of a gene are associated with more than one person. This script downloads the details of all the donors associated with the ABA's gene data sets.

donorsHumanBrainAtlas.m

Expression Levels

The expression levels of a gene, per structure, may be studied on more than one probe. Here, expression levels are extracted from all the probes associated with a gene.

expressionLevels.m

Anatomical Structures

The anatomical structures wherein expression measurements were made.

anatomicalStructures.m

Descriptive Statistics

Because the ABI usually investigates genes on more than one probe, we have to make a decision about how to use the resulting expression data. There are a number of options, e.g., use the

probe that has the highest set of expressions, on average.
arithmetic or geometric mean of the [corresponding] expression levels of all the probes.
arithmetic or geometric mean of the [corresponding] expression levels of the two probes that correlate best.
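
The three options can be sketched on made-up numbers. The matrix `E` (rows are sample points, columns are probes) and every variable name here are hypothetical. Only base MATLAB functions are used: the geometric mean is written as exp(mean(log(x))), and the Spearman correlation is obtained by ranking each column and applying `corrcoef`, which is valid only when a column has no tied values. With the Statistics Toolbox, `geomean` and `corr(E, 'type', 'Spearman')` do the same job and handle ties.

```matlab
% Hypothetical expression matrix: 5 sample points (rows) x 3 probes (columns).
E = [8.1 7.9 4.2;
     8.6 8.0 4.6;
     9.0 8.4 4.1;
     7.5 7.2 4.9;
     8.8 8.3 4.4];

% Option 1: the probe with the highest expression, on average.
[~, iHighest] = max(mean(E, 1));
optionOne = E(:, iHighest);

% Option 2: arithmetic and geometric means across all probes, per point.
arithmeticAll = mean(E, 2);
geometricAll = exp(mean(log(E), 2));      % requires strictly positive values

% Option 3: means across the two probes that correlate best.
% Ranks per column via a double sort (assumes no ties within a column).
[~, order] = sort(E, 1);
[~, ranks] = sort(order, 1);
C = corrcoef(ranks);                      % Spearman correlation of the columns
upper = triu(true(size(C)), 1);           % each distinct pair of probes once
[rowIdx, colIdx] = find(upper);
[~, k] = max(C(upper));
bestPair = [rowIdx(k) colIdx(k)];
arithmeticPair = mean(E(:, bestPair), 2);
geometricPair = exp(mean(log(E(:, bestPair)), 2));
```

The geometric mean never exceeds the arithmetic mean, so it weighs a low reading on one probe more heavily than the arithmetic mean does.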

In any case, exploratory data analysis / descriptive statistics will help us to make an informed decision, e.g.,

1. Descriptive Statistics of Expression Levels per Probe of a Gene. There are three probes associated with gene ID 37 (entrez id: 40, acronym: ACCN1). The boxplots summarise the descriptive statistics of the expression data >>

Probe ID >>	'1059455'	'1059456'	'1059457'
'Median'	8.56701	7.92876	4.45727
'Maximum'	10.7729	10.2421	6.37936
'Minimum'	4.82021	3.85268	0.0528753
'# of Points'	893	893	893
'# of Outliers'	32	33	17

2. Line Graphs of Probes per Gene. This is the gene expression data summarised in the boxplots above >>

Acronyms -- PL: Parietal Lobe, TL: Temporal Lobe, OL: Occipital Lobe, CgG: Cingulate Gyrus, PHG: Parahippocampal Gyrus
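
A sketch of how statistics like those tabulated under item 1 might be computed for one probe. The data vector is made up, and the outlier rule is an assumption: points more than 1.5 interquartile ranges beyond the quartiles, with the quartiles taken as the medians of the lower and upper halves of the sorted data. MATLAB's `boxplot` interpolates percentiles slightly differently, so its outlier counts may not match exactly.

```matlab
% Hypothetical expression levels for one probe (a column vector).
x = [4.8; 7.9; 8.2; 8.4; 8.6; 8.7; 8.9; 9.1; 9.3; 10.8];

xs = sort(x);
n = numel(xs);
half = floor(n / 2);

% Quartiles as the medians of the lower and upper halves of the sorted data
% (the middle point is excluded from both halves when n is odd).
q1 = median(xs(1:half));
q3 = median(xs(n - half + 1:n));
iqrValue = q3 - q1;

% Points beyond 1.5 interquartile ranges from the quartiles count as outliers.
isOut = (x < q1 - 1.5 * iqrValue) | (x > q3 + 1.5 * iqrValue);

summary = {'Median', median(x);
           'Maximum', max(x);
           'Minimum', min(x);
           '# of Points', n;
           '# of Outliers', sum(isOut)};
disp(summary)
```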

Per Top Structure

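If interested in a particular top structure, the following code lists the top structures, asks for one of them, and plots each donor's expression levels, per probe, across the structures within it.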

Code >>


% List the top structures, numbered from 1 in the left column
disp([num2cell( (1:size(uniqueTopStructureID,1))' ) uniqueTopStructureNames])
disp(' ')

j = input('Input a unique structure ID from the list above (from the left column): ');
if ismember(j, (1:size(uniqueTopStructureID,1))')

    for r = 1:1:size(humanData.msg, 2)

        Indices = (donorID == humanData.msg{1,r}.donor_id);
        Parts = partsID(Indices,:);
        Expressions = explevels(Indices,:);
        N = (1:sum(Indices))';


        Indices = find(Parts(:,1) == uniqueTopStructureID(j));
        if isempty(Indices)
            continue;
        end
        Set = [sortrows([Parts(Indices,:) Expressions(Indices,:)], 2) (1:size(Indices,1))'];
        iParts = dsearchn(uniqueStructureID, Set(:,2));

        figure
        hold on
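        % One marker series per probe; the last column of Set is the running point index
        for n = 1:1:size(Expressions, 2)
            plot(Set(:,end), Set(:,size(Parts, 2) + n), 'x', 'Color', rand(1,3))
        end
        box on
        % Label the first point of each distinct structure with its abbreviation
        [~, I] = unique(Set(:,2), 'first');
        set(gca, 'XTick', Set(I,end), 'XTickLabel', structureAbbreviations(iParts(I)))
        % xticklabel_rotate(Set(I,end), 90, structureAbbreviations(iParts(I)))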


        String = uniqueTopStructureNames{j};
        iUpper = strfind(String, ' ');
        String([1 iUpper + 1]) = upper(String([1 iUpper + 1]));
        xlabel([String, ' Structures'])
        ylabel('Expression Level')
        title({String, ['Gene ID: ', num2str(geneID), ', Donor: ' humanData.msg{1,r}.name]})
        legendText = strcat(repmat({'Probe ID'}, numel(probesSet), 1), repmat({': '}, numel(probesSet), 1), cellstr(num2str(probesSet)));
        legend(legendText, 'location', 'northeastoutside')
        hold off

    end

else

    errordlg('Invalid Structure ID')

end

Comparative Analysis

For each donor, the code prints the Spearman correlations between the probes and draws a log-log graph of the two probe data sets that correlate best.

Code >>


if size(probesSet, 1) > 1

    compareProbes = cell(1, size(humanData.msg, 2));
    for r = 1:1:size(humanData.msg, 2)

        Indices = (donorID == humanData.msg{1,r}.donor_id);
        Parts = partsID(Indices,:);
        Expressions = explevels(Indices,:);
        N = (1:sum(Indices))';

        % Correlations
        Correlations = corr(Expressions, 'type', 'Spearman');
        disp(['Allen Gene ID: ' num2str(genesParameters.msg{1,1}.id), ', Donor: ' humanData.msg{1,r}.name, ' -->'])
        disp([[{'Probe ID'}; num2cell(probesSet)] num2cell([probesSet'; Correlations])])


        % The Most Correlative Pair (the diagonal is excluded so that a probe is never paired with itself)
        Correlations(logical(eye(size(Correlations)))) = -Inf;
        [Maximums, cIndices] = max(Correlations, [], 2);
        [~, rIndices] = max(Maximums);
        iCompare = [rIndices cIndices(rIndices)];

        % Check
        R = Correlations(rIndices,cIndices(rIndices));

        % Comparative Analysis
        figure
        maloglog( Expressions(:,iCompare(1)), Expressions(:, iCompare(2)));
        xlabel(['Probe ID: ', num2str(probesSet(iCompare(1)))])
        ylabel(['Probe ID: ', num2str(probesSet(iCompare(2)))])
        title({genesParameters.msg{1,1}.name, ['(Allen Gene ID: ' num2str(genesParameters.msg{1,1}.id), ', Donor: ' humanData.msg{1,r}.name, ', Correlation: ', num2str(R), ')']})

        compareProbes{1,r} = iCompare;
    end


end
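
Two ways of excluding the diagonal before searching for the best pair can be compared on a small matrix. This is a sketch with a made-up correlation matrix in which every off-diagonal value is negative; the variable names are hypothetical.

```matlab
% Hypothetical Spearman correlation matrix for three probes.
C = [ 1.0 -0.2 -0.6;
     -0.2  1.0 -0.4;
     -0.6 -0.4  1.0];

% Subtracting the identity leaves zeros on the diagonal, and a zero beats
% every negative off-diagonal entry.
Z = C - eye(size(C));
[rowMax, colOfMax] = max(Z, [], 2);
[~, r] = max(rowMax);
pairZero = [r colOfMax(r)];

% Masking the diagonal with -Inf always yields two different probes.
M = C;
M(logical(eye(size(M)))) = -Inf;
[rowMax, colOfMax] = max(M, [], 2);
[~, r] = max(rowMax);
pairMasked = [r colOfMax(r)];

disp(pairZero)
disp(pairMasked)
```

With these values the first approach pairs probe 1 with itself, whereas the masked search returns probes 1 and 2. The two approaches agree whenever at least one off-diagonal correlation is positive.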

Geometric Means

Observe the effect of using the geometric means of the expression levels (per point) of the two probes that correlate best.

Code >>


if size(probesSet, 1) > 1

    for r = 1:1:size(humanData.msg, 2)

        Indices = (donorID == humanData.msg{1,r}.donor_id);
        N = (1:sum(Indices))';
        Parts = partsID(Indices,:);
        Expressions = explevels(Indices,:);
        Expressions = Expressions(:, compareProbes{1,r});
        Expressions(Expressions < 0) = 0;

        uniqueParts = unique(Parts, 'rows');

        uniqueValues = zeros(size(uniqueParts, 1), 1);
        uniqueCorrelations = zeros(size(uniqueParts, 1), 1);
        valuesSeries = zeros(sum(Indices), 1);
        correlationsSeries = zeros(sum(Indices), 1);


        for n = 1:1:size(uniqueParts, 1)

            Index = ismember(Parts, uniqueParts(n,:), 'rows');

            iN = N(Index);

            Set = Expressions(Index, :);

            iSet = ~any(Set == 0, 2);

            if sum(iSet) == 1
                uniqueValues(n,1) = geomean(Set(iSet,:), 2);
                uniqueCorrelations(n,1) = 1 - erf(std(Set(iSet,:)));

                valuesSeries(iN(iSet)) = uniqueValues(n,1);
                correlationsSeries(iN(iSet)) = uniqueCorrelations(n,1);

                continue;

            elseif sum(iSet) == 0
                continue;
            end


            uniqueValues(n,1) = geomean(geomean(Set(iSet,:), 2));
            Correlation = corr(Set(iSet,:), 'type', 'Spearman');
            uniqueCorrelations(n,1) = Correlation(1,2);

            valuesSeries(iN(iSet)) = uniqueValues(n,1);
            correlationsSeries(iN(iSet)) = uniqueCorrelations(n,1);

        end

        correlationColour = [191 191 0]/255;

        figure
        plot(N, Expressions(:, 1), 'b+', N, Expressions(:, 2), 'k+', N, valuesSeries, 'g+')
        hold on
        plot(N, correlationsSeries, '.', 'Color', correlationColour)
        xlabel('Data Points of Probes')
        ylabel('Expression Levels')
        title({genesParameters.msg{1,1}.name, ['(Allen Gene ID: ' num2str(genesParameters.msg{1,1}.id), ', Donor: ' humanData.msg{1,r}.name, ')']})
        legendText = [strcat(repmat({'Probe ID'}, numel(probesSet(compareProbes{1,r})), 1), repmat({': '}, numel(probesSet(compareProbes{1,r})), 1), ...
                        cellstr(num2str(probesSet(compareProbes{1,r}))));...
                            'Geometric Mean per Smallest Distinct Structure'; 'Correlation r per Distinct Structure Values'];
        legend(legendText)


        figure
        plot(N, Expressions(:, 1), 'b+', N, Expressions(:, 2), 'k+', N, geomean(Expressions,2), 'g+')
        hold on
        plot(N, correlationsSeries, '.', 'Color', correlationColour)
        xlabel('Data Points of Probes')
        ylabel('Expression Levels')
        title({genesParameters.msg{1,1}.name, ['(Allen Gene ID: ' num2str(genesParameters.msg{1,1}.id), ', Donor: ' humanData.msg{1,r}.name, ')']})
        legendText = [strcat(repmat({'Probe ID'}, numel(probesSet(compareProbes{1,r})), 1), repmat({': '}, numel(probesSet(compareProbes{1,r})), 1), ...
                        cellstr(num2str(probesSet(compareProbes{1,r}))));...
                            'Geometric Mean of Corresponding Points per Probe'; 'Correlation r per Distinct Structure Values'];
        legend(legendText)

    end


end
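
The per-structure geometric mean can also be written without a loop. This is a sketch on made-up data, not the script's own method: it uses the identity geometric mean = exp(mean(log(x))) together with `accumarray`. The variable names are hypothetical, and the values must be strictly positive, which is why the loop above drops points with a zero reading first.

```matlab
% Hypothetical data: a structure ID per sample point, and the expression
% levels of two probes at each point (all strictly positive).
structureID = [10; 10; 20; 20; 20; 30];
E = [2 8;
     4 4;
     1 9;
     3 3;
     9 1;
     5 5];

% Geometric mean of the two probes at each point.
pointMeans = exp(mean(log(E), 2));

% Geometric mean of those values within each distinct structure.
[uniqueID, ~, group] = unique(structureID);
logMeans = accumarray(group, log(pointMeans), [], @mean);
structureMeans = exp(logMeans);

% Spread the per-structure value back over the sample points.
valuesSeries = structureMeans(group);

disp([uniqueID structureMeans])
```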