Data for all combinatorially-encoded STARmap experiments can be found here:
Per-cell read information:
Use the STARmapAnalysis object in https://github.com/weallen/STARmap/blob/master/python/analysis.py to load these files.
cell_barcode_count.csv: cell by gene count matrix
ncell x ngene
The same data as in cell_barcode_count.csv is also provided as a matrix in npy format.
cell_barcode_names.csv: names and colorspace sequence of each gene (corresponding to columns of cell_barcode_count)
each row is: GeneIdx, ColorSpaceSeq, GeneName
where ColorSpaceSeq is a sequence of Nround colors, each one of 1, 2, 3, or 4
genes.csv: genes used in the sequencing experiment and their DNA sequences
each row is: GeneName,BaseSequence
class_labels.csv: final cluster identities shown in the paper.
each row is: CellIndex, ClusterID, ClusterName
Some cells were excluded from clustering because too few or too many reads were detected (read-count ranges: 160-gene visual cortex [200, 2000]; 1020-gene visual cortex [200, 3000]; mPFC [100, 2000]). These cells have ClusterID = -1 and ClusterName = “NA”.
The cluster labels can be loaded by passing an optional argument to the “add_data” function.
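As a minimal sketch of how the excluded cells might be dropped before downstream analysis, the snippet below parses rows in the CellIndex, ClusterID, ClusterName layout described above using only the Python standard library, then keeps the cells whose ClusterID is not -1. The helper names read_class_labels and drop_excluded are hypothetical, and the header handling (any row whose first two fields are not integers is skipped) is an assumption, since the file layout beyond the three columns is not documented here. The example rows are made up.

```python
import csv
import io


def read_class_labels(handle):
    """Parse rows of CellIndex, ClusterID, ClusterName from an open text file.

    Hypothetical helper: rows whose first two fields are not integers
    (for example a header row, if the file has one) are skipped.
    """
    rows = []
    for fields in csv.reader(handle):
        if len(fields) < 3:
            continue
        try:
            cell_index = int(fields[0])
            cluster_id = int(fields[1])
        except ValueError:
            continue  # assumed to be a header or malformed row
        rows.append((cell_index, cluster_id, fields[2].strip()))
    return rows


def drop_excluded(rows):
    """Keep only cells that were assigned to a cluster (ClusterID != -1)."""
    return [row for row in rows if row[1] != -1]


# Tiny made-up example in the documented layout (not real data).
example = io.StringIO("0,3,ClusterA\n1,-1,NA\n2,5,ClusterB\n")
labels = read_class_labels(example)
clustered = drop_excluded(labels)
print(len(labels), len(clustered))
```

For the real file, the same functions could be applied to a handle opened with open("class_labels.csv", newline="").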
Cell position/morphology data:
Use code in https://github.com/weallen/STARmap/blob/master/python/viz.py to load.
labels.npz: cell locations and morphology
2D image encoding the cell segmentation, where each cell is represented as a block of pixels with the same numeric ID.
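To find cell locations in Python:
import numpy as np
# GetQHulls comes from viz.py (linked above); this import assumes viz.py is on the Python path
from viz import GetQHulls

labels = np.load("labels.npz")["labels"]
qhulls, coords = GetQHulls(labels)
# get centroids of cells
all_centroids = np.vstack([c.mean(0) for c in coords])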
NOTE: using regionprops will not work, because cells are filtered by size and the indexing differs.
Plot expression using: plot_poly_cells_expression(labels, qhulls, counts.iloc[:,n], cmap)
Raw read data:
bases: 1xNspots -- colorspace sequence of bases
qualScores: Nspots x Nrounds -- quality scores per spot, per round
allPoints: Nspots x 3 -- 3D spatial location of each spot
Data for the sequentially-encoded STARmap experiment can be found here: https://www.dropbox.com/sh/mjrlxo4ws1vm8eb/AADBckK4HnQfWXxrWZ2TxDf7a?dl=0
File format: MATLAB struct containing the fields:
goodLocs: Ncell x 3 locations in 3D
expr: Ncell x 28 expression of each gene
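One possible way to read this struct from Python is sketched below. It assumes the file is a MATLAB file that scipy.io.loadmat can read (not the v7.3 HDF5 format) and, because the variable name inside the file is not documented here, it checks every top-level variable for the goodLocs and expr fields. The helper names extract_fields and load_sequential are hypothetical. The runnable part at the bottom uses a stand-in object with made-up values to exercise the shape checks.

```python
from types import SimpleNamespace

import numpy as np


def extract_fields(struct):
    """Return (goodLocs, expr) as arrays after checking the documented shapes."""
    good_locs = np.asarray(struct.goodLocs, dtype=float)
    expr = np.asarray(struct.expr, dtype=float)
    if good_locs.ndim != 2 or good_locs.shape[1] != 3:
        raise ValueError("goodLocs should be Ncell x 3")
    if expr.ndim != 2 or expr.shape[1] != 28:
        raise ValueError("expr should be Ncell x 28")
    if good_locs.shape[0] != expr.shape[0]:
        raise ValueError("goodLocs and expr disagree on Ncell")
    return good_locs, expr


def load_sequential(path):
    """Load a .mat file and return the first struct with goodLocs and expr.

    Assumption: the file is readable by scipy.io.loadmat. The variable
    name is unknown, so all top-level variables are inspected.
    """
    # Imported here so the shape checks above work without SciPy installed.
    from scipy.io import loadmat

    contents = loadmat(path, squeeze_me=True, struct_as_record=False)
    for value in contents.values():
        if hasattr(value, "goodLocs") and hasattr(value, "expr"):
            return extract_fields(value)
    raise KeyError("no struct with goodLocs and expr found in " + str(path))


# Stand-in for a loaded struct, filled with made-up values (not real data).
fake = SimpleNamespace(goodLocs=np.zeros((5, 3)), expr=np.ones((5, 28)))
locs, expr = extract_fields(fake)
print(locs.shape, expr.shape)
```

Row i of goodLocs and row i of expr are taken here to describe the same cell, which follows from both being indexed by Ncell.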