Script for creating a dataset based on existing labels.

get_dataset

get_dataset(label_source, trange, resample=None, clean=True, samples=0, var_list=['mms1_dis_dist_fast'])

Get a dataset based on a given config.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| label_source | string | The source for the labels, either Olshevsky or Unlabeled. | required |
| trange | List | List with the start and end times for the dataset. The times should be strings in either the format YYYY-mm-DD or YYYY-mm-DD/HH:MM:SS. | required |
| resample | string | The resample frequency; this variable follows the rules of the pandas resample function. Cannot be used with label_source set to Olshevsky. | None |
| clean | Bool | Whether unknown (-1) labels should be removed. | True |
| samples | Integer | The number of samples per label; set to 0 for all samples. | 0 |
| var_list | List | List of variables to get from the CDF files. | ['mms1_dis_dist_fast'] |

Returns:

Type Description

A pandas DataFrame with the created dataset.
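
The two accepted trange formats and the resample restriction can be checked before calling get_dataset. The following is a minimal sketch using only the standard library. The helpers parse_time and check_arguments are hypothetical and not part of this module. The exact label_source string compared against, the requirement that the start precede the end, and the example dates are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical helpers for illustration; they are not part of the documented module.
_TIME_FORMATS = ("%Y-%m-%d/%H:%M:%S", "%Y-%m-%d")


def parse_time(value):
    """Parse one trange entry given in either documented format."""
    for fmt in _TIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported time format: {value!r}")


def check_arguments(label_source, trange, resample=None):
    """Validate arguments before they would be passed on to get_dataset."""
    if len(trange) != 2:
        raise ValueError("trange must hold exactly a start and an end time")
    start, end = (parse_time(t) for t in trange)
    # Assumption: a start time at or after the end time is treated as an error.
    if start >= end:
        raise ValueError("trange start must be earlier than its end")
    # Documented restriction: resample cannot be combined with Olshevsky labels.
    if resample is not None and label_source == "Olshevsky":
        raise ValueError("resample cannot be used with label_source='Olshevsky'")
    return start, end


# Both documented formats may be mixed within one trange.
start, end = check_arguments(
    "Unlabeled", ["2017-11-01", "2017-11-01/12:30:00"], resample="1min"
)
print(start, end)
```

A date-only entry is read as midnight of that day, which is how datetime.strptime fills in missing time fields. Whether get_dataset interprets it the same way is not stated here.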

create_dataset

create_dataset(dataset_path, trange, force=False, **kwargs)

Create a dataset file based on a given config.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dataset_path | string | Path where the dataset is stored; must end with either .csv or .feather. | required |
| trange | List | List with the start and end times for the dataset. The times should be strings in either the format YYYY-mm-DD or YYYY-mm-DD/HH:MM:SS. | required |
| force | Bool | Overwrite the existing file if one exists. | False |
| **kwargs | | Further arguments, passed directly to get_dataset(..). | {} |
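
How dataset_path, force and **kwargs might interact can be sketched as follows. This is only an illustration under stated assumptions, not the module's implementation. The names should_write and create_dataset_sketch are hypothetical. It is assumed that an unsupported file extension raises an error and that an existing file is left untouched unless force is True. The stand-in does not build any data; it only returns the arguments that would be forwarded to get_dataset.

```python
import tempfile
from pathlib import Path

ALLOWED_SUFFIXES = {".csv", ".feather"}


def should_write(dataset_path, force=False):
    """Return True if a dataset file should be (re)written at dataset_path."""
    path = Path(dataset_path)
    # Assumption: any extension other than .csv or .feather is rejected.
    if path.suffix not in ALLOWED_SUFFIXES:
        raise ValueError(
            f"dataset_path must end with .csv or .feather, got {path.name!r}"
        )
    # Assumption: an existing file is kept unless force=True.
    return force or not path.exists()


def create_dataset_sketch(dataset_path, trange, force=False, **kwargs):
    """Hypothetical stand-in that only reports what would be forwarded."""
    if not should_write(dataset_path, force):
        return None
    # Everything collected in kwargs would be handed to get_dataset unchanged.
    return {"trange": trange, **kwargs}


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "dataset.csv"
    print(should_write(target))              # no file at the target path yet
    target.write_text("placeholder")
    print(should_write(target))              # file exists, force is False
    print(should_write(target, force=True))  # overwrite explicitly requested
```

Keeping the overwrite decision in a separate function makes the force behaviour easy to test without reading any CDF files.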