Class DatasetInfo

java.lang.Object
unifeat.dataset.DatasetInfo

public class DatasetInfo extends Object
This Java class is used to keep the input data, split the input data into train/test sets, and create the CSV file format.
Author:
Sina Tabakhi
  • Field Details

    • numData

      private int numData
    • numTrainSet

      private int numTrainSet
    • numTestSet

      private int numTestSet
    • numFeature

      private int numFeature
    • numClass

      private int numClass
    • checkDataSet

      private boolean checkDataSet
    • checkTrainTestSet

      private boolean checkTrainTestSet
    • checkClassLabels

      private boolean checkClassLabels
    • checkSamplesClass

      private boolean checkSamplesClass
    • Classlabel

      private String[] Classlabel
    • allNameFeatures

      private String allNameFeatures
    • nameFeatures

      private String[] nameFeatures
    • pathData

      private String pathData
    • pathTestSet

      private String pathTestSet
    • pathLabel

      private String pathLabel
    • originalData

      private String[] originalData
    • allData

      private double[][] allData
    • trainSet

      private double[][] trainSet
    • testSet

      private double[][] testSet
    • rand

      private Random rand
  • Constructor Details

    • DatasetInfo

      public DatasetInfo()
  • Method Details

    • isCorrectString

      private boolean isCorrectString(String str)
      This method checks that the input string does not contain semicolon (;) or tab characters. It also checks that the input string contains the comma character.
      Parameters:
      str - the input string
      Returns:
      true if the input string is in the correct format
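      The check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's actual source: the class name FormatCheckSketch is hypothetical, and the sketch assumes that a single comma is enough to satisfy the "contains the comma character" condition.

```java
// Hypothetical stand-in for the private isCorrectString check; not the original source.
final class FormatCheckSketch {

    // Returns true when the string has no semicolon or tab and has at least one comma.
    static boolean isCorrectString(String str) {
        boolean hasForbidden = str.indexOf(';') >= 0 || str.indexOf('\t') >= 0;
        boolean hasComma = str.indexOf(',') >= 0;
        return !hasForbidden && hasComma;
    }

    public static void main(String[] args) {
        System.out.println(isCorrectString("5.1,3.5,1.4,setosa")); // prints "true"
        System.out.println(isCorrectString("5.1;3.5;1.4;setosa")); // prints "false"
    }
}
```

      A line that uses semicolons or tabs as separators is rejected even if it also contains commas, which matches the description of a comma-only format.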
    • init

      private void init(String path1, String path2)
      This method sets the dataset and the class labels of samples
      Parameters:
      path1 - the path of the dataset file
      path2 - the path of the class label file
    • init

      private void init(String path1, String path2, String path3)
      This method reads and sets the train/test sets and the class labels of samples
      Parameters:
      path1 - the path of the train set file
      path2 - the path of the test set file
      path3 - the path of the class labels of samples
    • readExample1

      private void readExample1()
      This method reads the dataset
    • readExample2

      private void readExample2()
      This method reads the train/test sets
    • readLabel

      private void readLabel()
      This method reads the class labels of the samples
    • indexClass

      private int indexClass(String nameClass)
      This method converts the String name of a class label to its integer index
      Parameters:
      nameClass - the name of the class label as a String
      Returns:
      the index of the class
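      One plausible way to perform this name-to-index conversion is a linear search over the array of class labels. The sketch below is an assumption about the approach, not the original implementation: the class ClassIndexSketch, the extra classLabels parameter, and the -1 result for an unknown name are all hypothetical.

```java
// Hypothetical sketch of a name-to-index lookup; not the original source.
final class ClassIndexSketch {

    // Returns the position of nameClass in classLabels,
    // or -1 when the name is not present (assumed convention).
    static int indexClass(String[] classLabels, String nameClass) {
        for (int i = 0; i < classLabels.length; i++) {
            if (classLabels[i].equals(nameClass)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] labels = { "setosa", "versicolor", "virginica" };
        System.out.println(indexClass(labels, "versicolor")); // prints "1"
    }
}
```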
    • splitFeatureWithLabel

      private void splitFeatureWithLabel()
      This method converts the string values of the input data to double values and stores them
    • splitDataSetToTrainAndTest1

      private void splitDataSetToTrainAndTest1()
      This method randomly splits the input dataset into train/test sets: 2/3 of the dataset is used as the train set and 1/3 as the test set.
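      A random 2/3 to 1/3 split of this kind can be sketched as follows. This is an illustration only: the class RandomSplitSketch and its split method are hypothetical, the shuffle used here is a Fisher-Yates shuffle of row indices (the original may select rows differently), and rounding the train size down with integer division is an assumption.

```java
import java.util.Random;

// Hypothetical sketch of a random 2/3 : 1/3 split; not the original source.
final class RandomSplitSketch {

    // Returns a two-element array: index 0 is the train set, index 1 is the test set.
    static double[][][] split(double[][] allData, Random rand) {
        int n = allData.length;

        // Build the identity ordering, then shuffle it (Fisher-Yates).
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = rand.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }

        // Assumption: the train set size is rounded down.
        int numTrain = (2 * n) / 3;
        double[][] train = new double[numTrain][];
        double[][] test = new double[n - numTrain][];
        for (int i = 0; i < n; i++) {
            if (i < numTrain) {
                train[i] = allData[order[i]];
            } else {
                test[i - numTrain] = allData[order[i]];
            }
        }
        return new double[][][] { train, test };
    }

    public static void main(String[] args) {
        double[][] data = new double[9][];
        for (int i = 0; i < data.length; i++) {
            data[i] = new double[] { i, i % 2 };
        }
        double[][][] parts = split(data, new Random());
        System.out.println(parts[0].length + " train rows, " + parts[1].length + " test rows");
    }
}
```

      Passing the Random instance in as a parameter makes the split reproducible when a fixed seed is supplied.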
    • splitDataSetToTrainAndTest2

      private void splitDataSetToTrainAndTest2()
      This method sets the train/test sets when they have already been split by the user.
    • preProcessing

      public void preProcessing(String path1, String path2)
      This is used to read dataset and class label files, split datasets and set their values
      Parameters:
      path1 - the path of the datasets
      path2 - the path of the class labels
    • preProcessing

      public void preProcessing(String path1, String path2, String path3)
      This is used to read datasets and class labels, split datasets and set their values
      Parameters:
      path1 - the path of the train set
      path2 - the path of the test set
      path3 - the path of the class labels
    • isCorrectDataset

      public boolean isCorrectDataset()
      This is used to return the status of the dataset
      Returns:
      true if the dataset file is in the correct format
    • isCorrectClassLabel

      public boolean isCorrectClassLabel()
      This is used to return the status of the class label file
      Returns:
      true if the class labels file is in the correct format
    • isCorrectSamplesClass

      public boolean isCorrectSamplesClass()
      This is used to return the status of the samples' class
      Returns:
      true if the class labels of the samples are valid
    • isCompatibleTrainTestSet

      public boolean isCompatibleTrainTestSet()
      This is used to return the status of train/test sets
      Returns:
      true if the train and test sets are compatible
    • getNumData

      public int getNumData()
      This is used to return the number of samples in the dataset (train set + test set)
      Returns:
      number of samples
    • getNumFeature

      public int getNumFeature()
      This is used to return number of features in each sample
      Returns:
      number of features
    • getNumTrainSet

      public int getNumTrainSet()
      This is used to return number of samples in the train set
      Returns:
      number of samples in the train set
    • getNumTestSet

      public int getNumTestSet()
      This is used to return number of samples in the test set
      Returns:
      number of samples in the test set
    • getNumClass

      public int getNumClass()
      This is used to return number of classes in the dataset
      Returns:
      number of classes in the dataset
    • getClassLabel

      public String[] getClassLabel()
      This is used to return the names of class labels
      Returns:
      the array of class labels' names
    • getTrainSet

      public double[][] getTrainSet()
      This is used to return the train set values
      Returns:
      the matrix of train set
    • getTestSet

      public double[][] getTestSet()
      This is used to return the test set values
      Returns:
      the matrix of test set
    • getNameFeatures

      public String[] getNameFeatures()
      This is used to return the names of features
      Returns:
      the array of features' names
    • createFeatNames

      public String createFeatNames(int[] array)
      This method creates a string of the names of features in the selected feature array
      Parameters:
      array - the array of indices of the selected features
      Returns:
      a string containing the names of the features selected by the integer array
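      Building a string of feature names from an array of selected indices can be sketched as follows. The details are assumptions rather than documented behavior: the class FeatureNamesSketch is hypothetical, the indices are taken to be zero-based positions in the feature-name array, and the ", " separator is chosen only for illustration.

```java
// Hypothetical sketch of joining selected feature names; not the original source.
final class FeatureNamesSketch {

    // Joins the names at the given (assumed zero-based) indices with ", ".
    static String createFeatNames(String[] nameFeatures, int[] array) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < array.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(nameFeatures[array[i]]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = { "sepal_length", "sepal_width", "petal_length" };
        System.out.println(createFeatNames(names, new int[] { 0, 2 }));
        // prints "sepal_length, petal_length"
    }
}
```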