ricevur.blogg.se

One-hot encoding in R with dplyr




Machine Learning Algorithms: Logistic Regression, Naive Bayes, Decision Tree, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Tree, XGBoost, K-means
Programming Language: Python 2.7/3.5 (NumPy, SciPy, Pandas, Seaborn, scikit-learn, NLTK), R 3.0, SAS 9.1, SQL
Analytic Tools: Anaconda 4.0 (Jupyter Notebook 4.X, Spyder), RStudio
Hadoop Ecosystem: Hadoop 2.X, Spark 1.6+ (PySpark, Spark SQL, MLlib), MapReduce, Hive 1.X
Database: SQL Server 2008/2014, MongoDB 3.2
Data Visualization: Tableau 9.4, Python (matplotlib)

  • Ability to maintain a fun, casual, professional and productive team atmosphere.
  • Passionate about gleaning insightful information from massive data assets and developing a culture of sound, data-driven decision making.
  • Used version control tools such as Git 2.X.
  • Excellent understanding of Agile and Scrum development methodologies.
  • Experience with visualization tools such as Tableau 9.X and 10.X for creating dashboards.
  • Solid ability to write and optimize diverse SQL queries; working knowledge of RDBMSs such as SQL Server 2008.
  • Strong experience in Big Data technologies such as Spark 1.6, Spark SQL, PySpark, Hive 1.X, and AWS (S3, Redshift, EC2).
  • Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.X, R 3.0 (ggplot2, caret, dplyr), and Excel 2010/2013.
  • Worked extensively with Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK, and scikit-learn).
  • Strong skills in statistical methodologies such as A/B testing, experimental design, hypothesis testing, and ANOVA.
  • Proficiency in the neural network libraries TensorFlow and Keras.
  • Experience in deep learning (CNN, RNN, DNN), with an understanding of RNN and LSTM architectures.

    Implemented bagging and boosting to enhance model performance. Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means. Involved in the entire data science project life cycle, including data extraction, data cleaning, statistical modeling, and data visualization, with large sets of structured and unstructured data. Professionally qualified Data Scientist/Data Analyst with over 6 years of experience in data science and analytics, including machine learning, data mining, and statistical analysis.





