Script for creating a dataset based on existing labels.
get_dataset
get_dataset(label_source, trange, resample=None, clean=True, samples=0, var_list=['mms1_dis_dist_fast'])
Get a dataset based on the given configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `label_source` | string | The source for the labels, either `Olshevsky` or `Unlabeled`. | required |
| `trange` | List | List with the start and end times for the dataset. The times should be strings in either the format `YYYY-mm-DD` or `YYYY-mm-DD/HH:MM:SS`. | required |
| `resample` | string | The resampling frequency; this variable follows the rules of the pandas `resample` function. Cannot be used when `label_source` is set to `Olshevsky`. | `None` |
| `clean` | Bool | Whether unknown (`-1`) labels should be removed. | `True` |
| `samples` | Integer | The number of samples per label; set to `0` to keep all samples. | `0` |
| `var_list` | List | List of variables to read from the CDF files. | `['mms1_dis_dist_fast']` |
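The constraints documented for `label_source`, `trange`, and `resample` can be checked before any data is loaded. The following is a minimal sketch of such validation, not the module's own code: `parse_time`, `check_arguments`, and the start-before-end rule are hypothetical additions for illustration.

```python
from datetime import datetime

# Hypothetical constants and helpers; not part of the documented module.
ALLOWED_LABEL_SOURCES = {"Olshevsky", "Unlabeled"}
# Try the longer format first; "%Y-%m-%d" alone rejects a trailing "/HH:MM:SS".
TIME_FORMATS = ("%Y-%m-%d/%H:%M:%S", "%Y-%m-%d")


def parse_time(value: str) -> datetime:
    """Parse a trange entry in either YYYY-mm-DD or YYYY-mm-DD/HH:MM:SS form."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unsupported time format: {value!r}")


def check_arguments(label_source, trange, resample=None):
    """Validate get_dataset-style arguments and return the parsed (start, end)."""
    if label_source not in ALLOWED_LABEL_SOURCES:
        raise ValueError(f"label_source must be one of {sorted(ALLOWED_LABEL_SOURCES)}")
    # Documented rule: resample cannot be combined with Olshevsky labels.
    if resample is not None and label_source == "Olshevsky":
        raise ValueError("resample cannot be used with label_source='Olshevsky'")
    if len(trange) != 2:
        raise ValueError("trange must contain exactly a start and an end time")
    start, end = (parse_time(t) for t in trange)
    # Assumption: the start time must come strictly before the end time.
    if start >= end:
        raise ValueError("trange start must be before its end")
    return start, end
```

For example, `check_arguments("Unlabeled", ["2017-12-28", "2017-12-28/12:30:00"], resample="1s")` accepts the mixed date formats, while the same call with `"Olshevsky"` raises a `ValueError`.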
Returns:
| Type | Description |
|---|---|
| `pandas.DataFrame` | A pandas DataFrame containing the created dataset. |
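The `clean` and `samples` options can be pictured as a post-processing step on the returned DataFrame. This is a minimal sketch under stated assumptions, not the function's actual implementation: the helper `filter_labels` and the column name `label` are hypothetical, and "samples per label" is taken to mean keeping the first N rows of each label, whereas the real code may draw samples randomly.

```python
import pandas as pd


def filter_labels(df: pd.DataFrame, clean: bool = True, samples: int = 0,
                  label_col: str = "label") -> pd.DataFrame:
    """Hypothetical helper mirroring the documented clean/samples options."""
    if clean:
        # Drop rows whose label is unknown (-1).
        df = df[df[label_col] != -1]
    if samples > 0:
        # Keep at most `samples` rows per label, preserving the original row order.
        df = df.groupby(label_col).head(samples)
    return df


frame = pd.DataFrame({"label": [0, 0, 0, 1, 1, -1], "value": range(6)})
reduced = filter_labels(frame, clean=True, samples=2)
```

With `samples=0` and `clean=False` the frame is returned unchanged, matching the documented meaning of `0` as "all samples".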
create_dataset
create_dataset(dataset_path, trange, force=False, **kwargs)
Create a dataset file based on the given configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_path` | string | Path where the dataset is stored; it should end with either `.csv` or `.feather`. | required |
| `trange` | List | List with the start and end times for the dataset. The times should be strings in either the format `YYYY-mm-DD` or `YYYY-mm-DD/HH:MM:SS`. | required |
| `force` | Bool | Overwrite the existing file if one exists. | `False` |
| `**kwargs` | | Further arguments, passed directly to `get_dataset(..)`. | `{}` |
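The documented rules for `dataset_path` and `force` suggest a write step that picks the format from the file extension. Below is a minimal sketch of that step, assuming a DataFrame like the one `get_dataset` returns; `save_dataset` is a hypothetical helper, and the choice to skip (rather than raise) when a file exists and `force` is `False` is an assumption, not documented behaviour.

```python
import os

import pandas as pd


def save_dataset(df: pd.DataFrame, dataset_path: str, force: bool = False) -> bool:
    """Hypothetical writer; returns True if a file was written, False if kept."""
    if not dataset_path.endswith((".csv", ".feather")):
        raise ValueError("dataset_path must end with .csv or .feather")
    if os.path.exists(dataset_path) and not force:
        # Assumption: keep the existing file unless force=True.
        return False
    if dataset_path.endswith(".csv"):
        # This sketch drops the index; use df.reset_index() first to keep it.
        df.to_csv(dataset_path, index=False)
    else:
        # Feather needs a default RangeIndex and an installed pyarrow package.
        df.reset_index(drop=True).to_feather(dataset_path)
    return True
```

Choosing the format from the extension keeps the caller's interface to a single path argument, which matches how `dataset_path` is described above.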