Manipulating Strings in Excel

Changing parts of data within an Excel cell, AKA String Manipulation, consists of a few simple functions. Once we understand the basics, we can look at how to combine them to do perform complex maneuvers.

When I use the term string, I am referring to a series of alphabetic, numeric, and other characters (e.g. comma, semi-colon, hashtag, at symbol, etc.) viewed as text.

LEFT

2017-07-23_222809

The LEFT function, as seen in the image above, consists of two inputs or arguments. The first is the text we are performing the function on, and the second is the number of characters to return.

The first argument, A2 points to the cell containing the word “unconventional”. The second argument, 2 means the resulting value will be “un”. If I change that to a 4, it will be “unco”. What do you think the value will be if I entered 1?

RIGHT

2017-07-23_222720

The RIGHT function is similar to the LEFT function, as it also consists of two inputs or arguments. The first is the text we are performing the function on, and the second is the number of characters to return.

The first argument, A2 points to the cell containing the word “unconventional”. The second argument, 3 means the resulting value will be “nal”. If I change that to a 7, it will be “ntional”.

LEN

2017-07-23_222605

LEN tells me the number of characters in a string, or length. So, while the prior two functions returned text, the LEN function returns a number. I will demonstrate the use of this function later in this post.

There is only one argument in LEN, which is the string we are measuring. The word “unconventional” contains 14 characters, so LEN will return 14. If I add a space after the last letter, the length will be 15.

MID

2017-07-23_222509

The MID function contains three arguments. The first is the string, while the second is the starting point in the string, and the last is the number of characters from the starting point. If I use the above arguments, B2 will contain “vent”.

If I change the second argument to 7, the result will be “enti”, so we see the third argument always counts characters from the second argument.

SUBSTITUTE

2017-07-23_222401

Substitute will replace one character for another. In the example above, all occurrences of the letter “n” will be replaced by an exclamation point “!”. The first argument is the text we are evaluating. The second argument contains the character we are looking for, and the third is the character we will replace it with. Note, that the second and third arguments could contain one or multiple characters. If “nc” were the second argument, and everything else remained the same, then it would return “u!onventional”.

Combining functions

LEFT and LEN

2017-07-23_221824

If I wanted to return all but the last letter of cell A2, I could use both LEFT and LEN combined. Excel will accept the LEN function because it returns an integer. I subtract one from the length to exclude the final letter. All of these functions can be combined together to help gain access to hard to reach string combinations. Try combining them and see what you get.

 

Importing Data into PowerPivot Part 1

Jumpstart your PowerPivot knowledge with this post. This Excel add-on can do some serious heavy lifting when it comes to crunching data. Here is an easy way to practice using PowerPivot with a free large dataset.

Microsoft PowerPivot extends the capabilities within Excel to BI Analytics. The process is to pull data in from a variety of sources, and then build pivot tables to explore this data. This can be done with OLAP data, Access, Excel files, DB2, and a list of other sources.

I find I have significantly more flexibility when pulling data using this tool. Additionally, unlike OLAP, as the author of the content, you can refresh the content or keep a snapshot in time of the data.

For this article, I will demonstrate how to import from Access, create a simple join between two tables, and pivot the data into Excel.

Years ago, I took a course in statistics, and a group of us obtained a large dataset for a project. It was the Major League Baseball database, and it was – and still is freely available here:

http://www.seanlahman.com/baseball-archive/statistics/

I downloaded the 2016 version for writing this piece. It contains information on players, batting, fielding, and other baseball related stats.

Once I downloaded and saved the database I opened a new Excel file. I am using Excel 2016 which has PowerPivot built in. If you are using Office 2010 or 2013, PowerPivot is available in those versions as well.

Upon opening, if you have PowerPivot available it will show in the ribbon. If your version of Excel is 2010 or 2013 and you do not see PowerPivot in the ribbon follow the steps as outlined here:

https://support.office.com/en-us/article/Start-the-Power-Pivot-in-Microsoft-Excel-add-in-a891a66d-36e3-43fc-81e8-fc4798f39ea8

2017-07-20_191003

I select the PowerPivot tab and click the green icon over the word Manage.

A new window opens for the PowerPivot extension. This is tied to the existing Excel document. If I save the Excel file, any changes I make within PowerPivot will be saved.

To import the Access file:
Select: Get External Data -> from Other Sources
From the popup menu select: Access

Then use the browse button to locate the Access file. Note that the version of Access available with my MLB database was an earlier version, and I had to use the dropdown to select that format and see my file.

Once the database is identified, the application prompts me to import by selecting tables or writing a query. For this article, I choose select tables.

Next, I am presented with a list of tables from the MLB database. I choose, Batting and Master tables. I click finish and I can see the data from the two tables I selected.

To enable the data between the two tables to populate a pivot table correctly, I now use the icon on the top right called Diagram View.

I now see representations of both tables in diagram form. I click and drag while holding down my mouse from PlayerID field in the Master Table to the PlayerID field in the Batting Table. The two tables are now have a relationhship via the PlayerID Field

2017-07-20_215535

I want to also include a custom field, so I click on Data View and I select the Batting Sheet. I scroll all the way to the right and start typing into any cell of the first empty column:

=YearID&TeamID

Then I label the column YEAR_TEAM by entering this in the top row of that same column.

Next I choose the pivot table option from the top menu and import data into a new pivot table. PowerPivot sends me back to my Excel file and I have a Pivot Table I can start to work with.

This is where some knowledge of baseball will come in handy. Do you know any player who has played on more than one team during his career? Well, if not, you can use Brian McCann. McCann was on the Braves and the Yankees. We are going to look at his Home Run records every year he played in the MLB through 2016.

Add the fields shown below into the PivotTable. Keep in mind, once you add playerID it will be easier to first filter on “mccannbr01” (or whichever player you choose) prior to adding additional dimensions or measures.

2017-07-20_220202

Note that through adding the link between the Master and Batting tables the data can be filtered through attributes in another table.

Additionally, the YEAR_TEAM dimension is one we created, yet it can be easily added as a column to note which year and team the player was on.

Using other measures such as RBIs, this free data source is a great way to learn more about what one can do with PowerPivot.

I plan to post more on PowerPivot in the near future, but this is a good place to start.

Where Analytics Fits Best in Business

Understanding the business you are in is critical to producing quality work. The further an analytical resource is from this knowledge, the more likely the results are churned out without vetting the validity of the data. Then again, individuals immersed too heavily in the culture or business can produce biases when analyzing for honest answers. This article in HBR speaks to the subject of where analytics fits within the structure of a team.

https://hbr.org/2017/06/3-things-are-holding-back-your-analytics-and-technology-isnt-one-of-them

 

Months in an Excel Dropdown

When I find I am performing the same task repeatedly, I look for ways to make my life easier. I prefer spending my time doing new and exciting things, not monotonous and redundant activities. Often, these experiences give me the chance to try something that will save time, and keep me entertained.

One example of this is a recent monthly report I am responsible for maintaining. It has sales numbers for the present year – Year to Date, as well as the previous year. I can change the month of my report in one place using a dropdown menu. This dropdown menu is referenced by any place in the file where it looks for current month.

unnamed

 

 

 

To do this:

  1. First I enter a list of all the months of the year into 12 cells
  2. I then choose where I want to add my dropdown list by selecting a cell
  3. Then I select the Data tab from the ribbon
  4. Next, I choose Data Validation from that menu (screenshot above)
  5. A Data Validation popup menu then gives some additional options – Choose list from the dropdown menu

image-2

4. Then the prompt asks to select data. I select the list of months from step 1

image-3

Now I have my dropdown box. I can always reference cell D3 whenever I want to populate the current month I want to pull data for. This part was fun to do, but the next part will make life even easier.