Need help figuring out what's wrong with my SQL queries

CodeDragon57 · November 19, 2023, 9:35pm

I don’t know that you can call SQL code, but I need help debugging some of my Postgres queries that I have for a CS assignment. I am completely lost as to why it is broken like it is.

I am performing the SQL queries on the DVD rental database from www.postgresqltutorial.com.

The assignment is pretty silly and unrealistic, in my opinion, but it’s a post-secondary assignment; so why have realistic expectations.

I am supposed to generate a “real world” business report using the data already within the database and insert it into a new pair of tables: a detailed table and a summary table. The business report I am making is one in which I determine which actors’ movies were most often rented by customers. The idea is to use the count function to determine how many rows in the rental table are associated with each actor_id in the actor table. I was thinking that the broken query is the following:

-- Detailed Table --
INSERT INTO detailed
SELECT a.actor_id, get_actor_name(a.actor_id) as actor_name, COUNT(DISTINCT r.rental_id) AS rental_count
FROM actor AS a
    JOIN film_actor AS fa ON a.actor_id = fa.actor_id
    JOIN film AS f ON fa.film_id = f.film_id
    JOIN inventory AS i ON f.film_id = i.film_id
    JOIN rental AS r ON i.inventory_id = r.inventory_id
GROUP BY a.actor_id
ORDER BY rental_count DESC;

However, I am beginning to think that this is not the case. (get_actor_name() just concatenates the actor_name using the built-in CONCAT() function. It was one of the assignment’s sillier requirements). The real problem comes down to this:

CREATE OR REPLACE FUNCTION refresh_count_data()
    RETURNS TRIGGER
    LANGUAGE plpgsql
AS
$$
BEGIN
    DELETE FROM detailed;
    DELETE FROM summary;
    INSERT INTO detailed
    SELECT a.actor_id, get_actor_name(a.actor_id), COUNT(f.film_id) AS rental_count
    FROM actor AS a
        JOIN film_actor AS fa ON a.actor_id = fa.actor_id
        JOIN film AS f ON fa.film_id = f.film_id
        JOIN inventory AS i ON f.film_id = i.film_id
        JOIN rental AS r ON i.inventory_id = r.inventory_id
    GROUP BY a.actor_id
    ORDER BY rental_count DESC;
    INSERT INTO summary (actor_name, rental_count)
            SELECT actor_name,rental_count FROM detailed LIMIT 15;
    RETURN NULL;
END;
$$;

CREATE TRIGGER refresh_on_new_rental
    AFTER INSERT
    ON public.rental
    FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_count_data();

So I created that trigger and stored procedure, then I performed some queries to insert some fake data into the rental table. I use the summary table to check things. Nothing changed; although, I am fairly certain the trigger is being triggered.

P.S. I don’t know why we cannot just use an UPDATE query in the stored procedure, but I wasted like an hour forming that stored procedure before I realized that the instructions explicitly state that I must delete the data and repopulate it completely. One of the more unrealistic requirements…

jode · November 19, 2023, 10:11pm

I assume the summary table is supposed to “sum” rentals for an actor across all its films. Also, it’s likely supposed to contain this aggregated form of data for all actors, not just the first 15.

To do that, change

to

MadMatt · November 19, 2023, 10:23pm

Have you tried to execute the select in the procedure? Looks to me like you are missing a column in the group by, also, why are you joining down to rentals and then counting the film IDs?

CodeDragon57 · November 20, 2023, 4:15am

Because of the database’s design, the detailed table queries the rentals table because its basically just a record of all the rentals. The idea is basically to count the rows in the rental table grouped by the author_id. To do that, because of the sheer insanity and over-engineering (for lack of a better word) of this database, I just link each rental row by inventory id with each film_id, and finally each film to each actor by actor_id. I could be thinking about this entirely wrong though, so perhaps your suggestion of using the sum function might be more effective. When I first set out to do this project, that is what I was attempting to do, but with the payments table too. I quickly realized that doing it this way was flawed because some movies rental fee were higher than others so it wouldn’t give a very good picture of the most popular actors.

In its current form, my summary table is just a copy of the detailed table. They really aren’t very different. The LIMIT of 15 is simply so that I can make them different. If I didn’t include that limit, they would be the same.

I am trying to count the rental rows in the database. So like, the rental table holds each entry for a time and date that a customer rented a DVD.

So, because of the assignment requirements, the SELECT is simply a copy-paste of my original insert query into the detailed and summary tables.

Perhaps r.inventory_id and r.customer_id? Nope nvm.

After some more thought, I have gotten closer to what I need to fix my query. I need to remove rows from the COUNT function that have both the same r.inventory_id and r.customer_id. I tried this, but it’s not valid SQL syntax:

INSERT INTO detailed
SELECT a.actor_id, get_actor_name(a.actor_id) as actor_name, COUNT(DISTINCT r.inventory_id, DISTINCT r.customer_id) AS rental_count
FROM actor AS a
    JOIN film_actor AS fa ON a.actor_id = fa.actor_id
    JOIN film AS f ON fa.film_id = f.film_id
    JOIN inventory AS i ON f.film_id = i.film_id
    JOIN rental AS r ON i.inventory_id = r.inventory_id
GROUP BY a.actor_id
ORDER BY rental_count DESC;

Perhaps, instead of using the concatenation as the custom transformation function, I can use this fact to create my custom transformation.

CodeDragon57 · November 20, 2023, 4:26am

Actually, so I executed this query:

SELECT get_actor_name(a.actor_id), f.title, r.customer_id
    FROM actor AS a
        JOIN film_actor AS fa ON a.actor_id = fa.actor_id
        JOIN film AS f ON fa.film_id = f.film_id
        JOIN inventory AS i ON f.film_id = i.film_id
        JOIN rental AS r ON i.inventory_id = r.inventory_id
WHERE a.actor_id = 107;

P.S. ~~Oof sorry guys that are on mobile.~~ But that’s not an incorrect result. I don’t know how I got it if there are only 500 rows in the rental table…

CodeDragon57 · November 20, 2023, 4:35am

Oh my god. I just figured out the entire problem. Datagrip limits SELECT query output to 500 rows by default. There may still be some issue though, because I am not getting the expected result from the trigger function that I created.

CodeDragon57 · December 1, 2023, 3:36am

I created a two new tables for the DVD rental database found here: Load PostgreSQL Sample Database
I have created a trigger and a stored procedure for updating those tables each time a new query inserts data into the rental table:

-- noinspection SqlWithoutWhereForFile

CREATE OR REPLACE FUNCTION refresh_count_data()
    RETURNS TRIGGER
    LANGUAGE plpgsql
AS
$$
BEGIN
    DELETE FROM detailed;
    DELETE FROM summary;
    INSERT INTO detailed
    SELECT a.actor_id, get_actor_name(a.actor_id), COUNT(f.film_id) AS rental_count
    FROM actor AS a
        JOIN film_actor AS fa ON a.actor_id = fa.actor_id
        JOIN film AS f ON fa.film_id = f.film_id
        JOIN inventory AS i ON f.film_id = i.film_id
        JOIN rental AS r ON i.inventory_id = r.inventory_id
    GROUP BY a.actor_id
    ORDER BY rental_count DESC;
    INSERT INTO summary (actor_name, rental_count)
            SELECT actor_name,rental_count FROM detailed LIMIT 15;
    RETURN NULL;
END;
$$;

CREATE TRIGGER refresh_on_new_rental
    AFTER INSERT
    ON public.rental
    FOR EACH STATEMENT
EXECUTE PROCEDURE refresh_count_data();

Finally, I created a fake query to test it:

INSERT INTO public.rental(rental_date, inventory_id, customer_id, return_date, staff_id) VALUES
                  (current_date, 892, 15, current_date + INTERVAL '7 days', 1),
                  (current_date, 450, 15, current_date + INTERVAL '7 days', 1),
                  (current_date, 555, 15, current_date + INTERVAL '7 days', 1),
                  (current_date, 570, 15, current_date + INTERVAL '7 days', 1),
                  (current_date, 125, 15, current_date + INTERVAL '7 days', 1);

The intent was to add 5 new entries that would give actor, Mary Keitel, with actor_id 198 enough movies to make her more popular than Matthew Carrey. But the rental_count numbers stay the same on the summary table.

I know it’s been awhile, but this was the symptom that I was trying to solve 11 days ago. I just gave a wrong diagnoses. Is this enough for anyone to help me? I am so over this problem.

MadMatt · December 1, 2023, 3:41am

you have a limit in your second insert?

CodeDragon57 · December 1, 2023, 3:45am

That’s intentional. I inserted a

RAISE NOTICE 'refresh_count_data Trigger executed';

Statement to the stored procedure, and I do not see that string in any output.

CodeDragon57 · December 9, 2023, 4:04am

Although it is intentional, I tried removing the limit; but that didn’t fix it, so I placed it back.